Data is fundamental - Use the right model to unlock potential

Over the last decade, data has become an increasingly important part of the enterprise ecosystem. A tremendous amount of effort and expense goes into the curation and management of enterprise data, and it is becoming abundantly clear to many firms that to thrive in today’s competitive landscape they must gain an advantage through clever use of their data estates. The size, quality and speed of the enterprise’s data all help drive the development of ever more complex models [1] that inform, and in many cases drive, business decisions.

Yet, for all the reliance on data and its management, there is a surprising lack of tools in the marketplace that aid in the logical modelling of data structures and schemas.

Granted, there is no shortage of freely available tools and frameworks that aid with the physical modelling and management of enterprise data. Take schemas, for instance: there is XML Schema, JSON Schema or even SQL DDL. For optimised serialization and 'on-the-wire' formats you again have an abundance of choices in the form of ProtoBuf, Avro and others [2].
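
To make the cost of that approach concrete, here is a small illustration (the 'Trade' entity and all of its fields are hypothetical, invented for this sketch): the same logical entity captured twice, once as a JSON Schema document and once as SQL DDL.

    import json
    import sqlite3

    # The same logical 'Trade' entity, maintained once per technology.

    # 1. JSON Schema - governs JSON documents only.
    trade_json_schema = {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": "Trade",
        "type": "object",
        "properties": {
            "trade_id": {"type": "string"},
            "quantity": {"type": "integer", "minimum": 1},
            "price": {"type": "number"},
        },
        "required": ["trade_id", "quantity", "price"],
    }

    # 2. SQL DDL - governs relational storage only.
    trade_ddl = """
        CREATE TABLE trade (
            trade_id TEXT    NOT NULL PRIMARY KEY,
            quantity INTEGER NOT NULL CHECK (quantity >= 1),
            price    REAL    NOT NULL
        )
    """

    sqlite3.connect(":memory:").execute(trade_ddl)
    print(json.dumps(trade_json_schema, indent=2))

Every rule, such as the quantity being at least 1, now lives in two places, and any change to the logical model requires touching both.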

These frameworks, however, all tightly couple your data model to the implementation technology, bringing with them all the associated overheads [3]. Instead, how about a technology that enables you to define data models in a pure, logical and conceptual sense, coupled with the ability to express them automatically in any desired physical implementation platform - essentially ensuring the ubiquity and future-proofing of your most valuable asset?
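
As a toy sketch of that 'define once, generate many' idea (plain Python, not Spear syntax - Spear's own notation is not shown in this piece), a single logical field list can drive both of the physical artefacts from the previous example:

    # A single logical description of the 'Trade' entity (illustrative only).
    TRADE_FIELDS = [
        ("trade_id", "string"),
        ("quantity", "integer"),
        ("price", "number"),
    ]

    SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "number": "REAL"}

    def to_json_schema(fields):
        """Derive a JSON Schema 'properties' block from the logical model."""
        return {name: {"type": typ} for name, typ in fields}

    def to_sql_ddl(fields, table="trade"):
        """Derive a CREATE TABLE statement from the same logical model."""
        columns = ", ".join(f"{name} {SQL_TYPES[typ]} NOT NULL" for name, typ in fields)
        return f"CREATE TABLE {table} ({columns})"

    print(to_json_schema(TRADE_FIELDS))
    print(to_sql_ddl(TRADE_FIELDS))

The model is stated once; each physical form is derived from it rather than maintained by hand.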

That technology is Spear.

What is Spear?

Spear derives its name from Wotan’s (Odin’s) spear in Germanic mythology - the instrument on which all the world’s contracts are engraved and from which the King of the Gods draws his power. This is a fitting analogy for the role that Spear plays for TeraHelix: it is the pivotal piece of the platform that enables numerous features and gives TeraHelix its 'power'.

Technically, Spear is best described as a data definition tool. But unlike other data definition tools, Spear places particular emphasis on being implementation agnostic and on extensive support for large-scale data platform features such as built-in validations, referential integrity and polymorphic modelling techniques.

How does Spear work?

Spear definition source files can be authored by a domain expert, or generated from third-party libraries or existing data sets.

The Spear compiler (which can be invoked statically, dynamically or as a hosted service) validates the definitions and produces a Spear Parse Tree, or SPT [4]. The SPT codifies the data axioms and is the foundation from which all other features flow.
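
As a purely hypothetical illustration of what such a tree codifies (every name below is invented for this sketch; the real SPT is defined in Spear itself, per note [4]), a parse-tree node might capture a type's fields, validations, parent type and references:

    from __future__ import annotations
    from dataclasses import dataclass, field

    # Hypothetical parse-tree nodes; Spear's actual SPT schema is not shown here.

    @dataclass
    class FieldNode:
        name: str
        type_name: str
        constraints: list[str] = field(default_factory=list)  # built-in validations

    @dataclass
    class TypeNode:
        name: str
        extends: str | None        # polymorphic modelling: an optional parent type
        fields: list[FieldNode]
        references: list[str]      # referential integrity: links to other types

    # A toy tree for a 'VanillaOption' that extends a base 'Trade' type.
    spt = TypeNode(
        name="VanillaOption",
        extends="Trade",
        fields=[
            FieldNode("strike", "decimal", constraints=["strike > 0"]),
            FieldNode("counterparty_id", "string"),
        ],
        references=["Counterparty"],  # counterparty_id must resolve to one
    )
    print(spt)

From a structure like this, a compiler can check that the constraint is enforceable, that 'Counterparty' actually exists, and that 'VanillaOption' inherits everything a 'Trade' defines.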

Spear Features

The Spear feature set is rich and varied. It is effectively a combined toolbox of all the tried-and-tested utilities the TeraHelix team has built up over years of experience working on large-scale, enterprise-wide data platforms. While these warrant a full white paper of their own, here are the broad use cases that Spear addresses:

Cross Platform

  • Supports Python, Java, TypeScript and .NET. More to follow.

  • Data science tool integrations (including Jupyter, Apache Spark and RStudio).

  • Generation of OpenAPI-compliant REST APIs for cross-system communication (see the sketch after this list).
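
To make the OpenAPI bullet tangible, here is a hypothetical sketch of the kind of document such generation could emit for the 'Trade' type, built as a plain Python dict; the exact output Spear produces is not shown here.

    import json

    # Hypothetical shape of a generated OpenAPI 3 document for a 'Trade' type.
    openapi_doc = {
        "openapi": "3.0.3",
        "info": {"title": "Trade API", "version": "1.0.0"},
        "paths": {
            "/trades/{tradeId}": {
                "get": {
                    "parameters": [{
                        "name": "tradeId",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }],
                    "responses": {
                        "200": {
                            "description": "A single trade",
                            "content": {
                                "application/json": {
                                    "schema": {"$ref": "#/components/schemas/Trade"}
                                }
                            },
                        }
                    },
                }
            }
        },
        "components": {
            "schemas": {
                "Trade": {
                    "type": "object",
                    "properties": {
                        "trade_id": {"type": "string"},
                        "quantity": {"type": "integer"},
                        "price": {"type": "number"},
                    },
                }
            }
        },
    }
    print(json.dumps(openapi_doc, indent=2))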

Serialization and Integrations

  • Interoperability with common formats - XML, JSON, CSV and YAML - as well as reporting formats such as Parquet (see the sketch after this list).

  • Pluggable serialization - for instance, use Apache Arrow or ProtoBuf at the implementation layer.

  • Auto-generation of User Interfaces and other integrations.
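
As a minimal sketch of the interoperability bullet (using the standard json module and the pyarrow library as stand-ins; these are assumptions for illustration, not Spear's actual plumbing), the same batch of records can be written to a human-readable interchange format and a columnar reporting format:

    import json
    import pyarrow as pa
    import pyarrow.parquet as pq

    # One batch of 'Trade' records, serialized to two different targets.
    trades = [
        {"trade_id": "T-1", "quantity": 100, "price": 101.5},
        {"trade_id": "T-2", "quantity": 250, "price": 99.75},
    ]

    # Human-readable interchange format.
    json_payload = json.dumps(trades)
    print(json_payload)

    # Columnar reporting format (Parquet via Apache Arrow).
    table = pa.Table.from_pylist(trades)
    pq.write_table(table, "trades.parquet")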

Reporting and Data Quality

  • Type 'flattening' to relational structures, enabling ANSI SQL querying (see the sketch after this list).

  • Dynamic SQL support for variable column queries on sparse data sets.

  • Automated data quality metrics and validation checks.
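
As an illustration of the flattening bullet (a toy example using Python's built-in sqlite3 module rather than Spear's own machinery), a nested record becomes plain relational columns that standard SQL can then query:

    import sqlite3

    # A nested logical record...
    trade = {"trade_id": "T-1", "leg": {"quantity": 100, "price": 101.5}}

    # ...flattened into relational columns.
    flat = {
        "trade_id": trade["trade_id"],
        "leg_quantity": trade["leg"]["quantity"],
        "leg_price": trade["leg"]["price"],
    }

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trade (trade_id TEXT, leg_quantity INTEGER, leg_price REAL)")
    conn.execute("INSERT INTO trade VALUES (?, ?, ?)", tuple(flat.values()))

    # Standard SQL now works against the flattened structure.
    total = conn.execute("SELECT SUM(leg_quantity * leg_price) FROM trade").fetchone()[0]
    print(total)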

Developer

  • Rich set of development paradigms - including generics, functors and polymorphism.

  • Best practices incorporated as standard - including immutability, identity functions and advanced introspection.

  • Test data generation for functional and performance testing (see the sketch after this list).
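
A minimal sketch of the test-data idea (illustrative only; the random_trade helper is invented here, not part of Spear): given a record shape like the 'Trade' examples above, seeded random generation yields reproducible, schema-conformant test sets.

    import random
    import string

    def random_trade(rng: random.Random) -> dict:
        """Generate one schema-conformant 'Trade' record (illustrative)."""
        return {
            "trade_id": "T-" + "".join(rng.choices(string.digits, k=6)),
            "quantity": rng.randint(1, 10_000),
            "price": round(rng.uniform(0.01, 500.0), 2),
        }

    rng = random.Random(42)  # seeded, so test runs are reproducible
    test_data = [random_trade(rng) for _ in range(1_000)]
    print(test_data[0])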

How does Spear future-proof my data?

Spear focuses solely on providing logical modelling capabilities for data structures and consciously avoids coupling itself to any particular implementation technology. Data is elevated to a first-class concept in its own right, rather than being a concealed layer in the system.

By using Spear, the enterprise’s intellectual property is defined in terms of a 'data-first' paradigm rather than in terms of whatever implementation technology happens to be the most efficient and cost-effective on the market today.

As new technologies come to market, Spear simply adapts its output code generation to take advantage of the new implementation platform’s capabilities. The logical data model itself does not change, meaning the existing investment in modelling is preserved.

Notes

  1. While these models are becoming increasingly Machine Learning / Artificial Intelligence driven, they share many similarities with traditional analytical models in their need for data for validation, back-testing and calibration.

  2. This list is not exhaustive; there are many more technologies in this space, including Apache Thrift and Apache Arrow, to name but a few.

  3. A typical symptom of this tight coupling is the hefty cost of developers having to write 'glue code' to convert from one format to another or from one environment (say Java) to another (Python, for instance).

  4. The Spear Parse Tree is defined in terms of Spear itself - at TeraHelix we eat our own dog food.