Data Distiller Application

Combining structured and unstructured data sources can unlock significant value for the enterprise. The TeraHelix platform gives application developers an extensible toolkit to consume, enrich and validate data from both kinds of source. The Data Distiller application demonstrates these capabilities by combining Twitter feeds, natural language processing (NLP) sentiment analysis, PDF annual reports and market volatility data to produce new benchmarks and insights. It features:

  • Pluggable natural language processing and sentiment analysis engine, with a reference implementation based on Stanford NLP.

  • Uniform data loading - identical pipeline into the TeraHelix Loading Bay, regardless of whether the feed contains PDF documents or real-time Twitter messages.

  • PDF Extraction View - drill down from an extracted field to the specific PDF page it came from.

Perception Score View and Drill Down

  • Institution Analysis based on Goodwill/Asset Ratio, Stock Price Volatility and Twitter Sentiment.

  • Drill Down / Access to linked underlying Twitter and PDF source data.
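
How the three inputs are combined into a perception score is not stated here. As a minimal sketch only, the Python below assumes an equally weighted blend in which a lower Goodwill/Asset Ratio, lower volatility and higher sentiment all raise the score. The names InstitutionMetrics and perception_score, the weights, the scaling and the direction of each input are illustrative assumptions, not the TeraHelix implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstitutionMetrics:
    """Hypothetical per-institution inputs (names are illustrative)."""
    name: str
    goodwill: float       # goodwill taken from the annual report
    total_assets: float   # total assets taken from the annual report
    volatility: float     # stock price volatility as a fraction, e.g. 0.25
    sentiment: float      # mean Twitter sentiment, assumed to lie in [-1, 1]


def _clamp01(value: float) -> float:
    """Restrict a value to the range [0, 1]."""
    return min(max(value, 0.0), 1.0)


def goodwill_asset_ratio(m: InstitutionMetrics) -> float:
    """Goodwill divided by total assets."""
    if m.total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return m.goodwill / m.total_assets


def perception_score(m: InstitutionMetrics,
                     w_goodwill: float = 1 / 3,
                     w_volatility: float = 1 / 3,
                     w_sentiment: float = 1 / 3) -> float:
    """Assumed scoring rule: a weighted blend of three terms in [0, 1]."""
    ratio_term = 1.0 - _clamp01(goodwill_asset_ratio(m))
    volatility_term = 1.0 - _clamp01(m.volatility)
    sentiment_term = _clamp01((m.sentiment + 1.0) / 2.0)  # map [-1, 1] to [0, 1]
    return (w_goodwill * ratio_term
            + w_volatility * volatility_term
            + w_sentiment * sentiment_term)


# Placeholder figures, for illustration only.
example = InstitutionMetrics("Example Bank", goodwill=20.0, total_assets=100.0,
                             volatility=0.25, sentiment=0.5)
score = perception_score(example)
```

Keeping each term in [0, 1] before weighting means the weights can be changed without rescaling the inputs, which suits a score intended for side-by-side comparison of institutions.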

Uniform Data Loading Pipeline

  • The same Loading Bay pipeline is used whether the feed is PDF or Twitter.

  • Visual inspection / management of the pipeline.
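
One way to picture a uniform pipeline is a single envelope type that wraps every feed item, so that the same stages run regardless of the source. The Python sketch below illustrates that idea only. FeedItem, LoadingBay and the two stages are hypothetical names chosen for this example and do not reflect the actual TeraHelix Loading Bay API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeedItem:
    """Hypothetical envelope shared by every feed type."""
    source: str          # e.g. "twitter" or "annual-report"
    content_type: str    # e.g. "application/json" or "application/pdf"
    payload: bytes       # raw content, left untouched by the pipeline
    annotations: Dict[str, object] = field(default_factory=dict)


Stage = Callable[[FeedItem], FeedItem]


class LoadingBay:
    """Runs the same ordered stages over every item, whatever its source."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)
        self.loaded: List[FeedItem] = []

    def load(self, item: FeedItem) -> FeedItem:
        for stage in self.stages:
            item = stage(item)
        self.loaded.append(item)
        return item


def validate(item: FeedItem) -> FeedItem:
    """Reject empty payloads and mark the item as validated."""
    if not item.payload:
        raise ValueError(f"empty payload from {item.source}")
    item.annotations["validated"] = True
    return item


def measure(item: FeedItem) -> FeedItem:
    """Enrich the item with its payload size in bytes."""
    item.annotations["size_bytes"] = len(item.payload)
    return item


bay = LoadingBay([validate, measure])

# Placeholder payloads: both feed types pass through identical stages.
tweet = bay.load(FeedItem("twitter", "application/json",
                          b'{"text": "Strong results"}'))
report = bay.load(FeedItem("annual-report", "application/pdf",
                           b"%PDF-1.7 placeholder"))
```

Because stages only see the envelope, a format-specific step such as PDF text extraction or tweet parsing could be added as just another stage that checks content_type, without changing the loading path itself.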

PDF Extraction View

  • Link entries to their original PDF / page location.

  • All 'raw' extraction entries remain available for further use and model refinement.
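
Linking an extracted field back to its page implies that each raw entry carries its own provenance. The Python sketch below shows one possible record shape and lookup under that assumption. ExtractionEntry, locate and the sample values are hypothetical and are not taken from the TeraHelix data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExtractionEntry:
    """Hypothetical raw extraction record that keeps its provenance."""
    field_name: str   # logical field, e.g. "goodwill"
    raw_text: str     # text exactly as extracted, kept for later refinement
    document: str     # source PDF identifier
    page: int         # 1-based page number within the PDF


def locate(entries: List[ExtractionEntry],
           field_name: str) -> List[Tuple[str, int]]:
    """Return (document, page) for every entry matching the field name."""
    return [(e.document, e.page) for e in entries if e.field_name == field_name]


# Placeholder entries, for illustration only.
entries = [
    ExtractionEntry("goodwill", "Goodwill 20", "example-report.pdf", 12),
    ExtractionEntry("total_assets", "Total assets 100", "example-report.pdf", 12),
    ExtractionEntry("goodwill", "Goodwill 18", "example-report-prior.pdf", 11),
]

goodwill_locations = locate(entries, "goodwill")
```

Storing the raw text alongside the document and page keeps every entry reusable: a refined extraction model can be re-run over the same entries while the drill-down links stay valid.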