Enterprise Data Model Discovery and Surveillance

Published on 18 July 2021 by J.Steenkamp

Increasingly complex data models in enterprises lead to duplication, inconsistencies, and integrity issues. Explore challenges, drawbacks of static documentation, and the benefits of continuous integration with TeraHelix Spear for automated data model management.

Challenges of Enterprise Data Model Management

As enterprises mature, they accumulate extensive data estates comprising multiple database instances, numerous data feeds between systems, and several reporting layers. Over time, the complexity of the firm’s overall data model grows, leading to various adverse effects:

  • Duplication of Data Field Values: The same data field value may be duplicated across multiple model representations with different field names (e.g., ‘trade_id’ and ‘trade_identifier’ referring to the same value).

  • Inconsistencies in Data Representations: Data may be represented inconsistently, such as the directionality of a trade being denoted by ‘Buy’/‘Sell’, ‘B’/‘S’, or ‘Yes’/‘No’.

  • Incorrect Data Types: Data fields may have incorrect data types (e.g., numeric values represented as strings).

  • Violation of Domain Constraints: Domain constraints may be violated (e.g., negative share price values).

  • Referential Integrity Issues: Missing or incorrect references may break the links between related records (e.g., an identifier not carried over to the next stage of processing).

These complexities lead to poorly understood data flows within the enterprise, making it expensive to evolve systems or react to changes in external market conditions.
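
Several of the issues listed above can be detected mechanically. The following is a minimal Python sketch of such checks for a single trade record. The field names (‘share_price’, ‘direction’), the alias table, and the function check_trade are hypothetical and chosen only to mirror the examples above; they are not taken from any particular system.

```python
from typing import Any

# Hypothetical mapping of direction encodings to one canonical form.
# 'Yes'/'No' is deliberately left out: its meaning cannot be known
# without source-specific knowledge, so it is reported as a problem.
DIRECTION_ALIASES = {"buy": "Buy", "b": "Buy", "sell": "Sell", "s": "Sell"}


def check_trade(record: dict[str, Any], known_trade_ids: set[str]) -> list[str]:
    """Return a list of human-readable problems found in one trade record."""
    problems = []

    # Incorrect data type: a numeric value delivered as a string.
    price = record.get("share_price")
    if isinstance(price, str):
        problems.append("share_price is a string, expected a number")
        try:
            price = float(price)
        except ValueError:
            price = None

    # Domain constraint: a share price must not be negative.
    if isinstance(price, (int, float)) and price < 0:
        problems.append("share_price is negative")

    # Inconsistent representation: direction must map to a known encoding.
    direction = str(record.get("direction", "")).strip().lower()
    if direction not in DIRECTION_ALIASES:
        problems.append(f"unrecognised direction: {record.get('direction')!r}")

    # Referential integrity: the identifier must point at a known trade.
    if record.get("trade_id") not in known_trade_ids:
        problems.append("trade_id does not reference a known trade")

    return problems
```

Calling check_trade on every record of a feed, rather than on a hand-picked sample, turns these categories of error into a report that can be regenerated whenever the data changes.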

Drawbacks of the ‘Static Documentation’ Approach

To address enterprise data model sprawl, a common approach is to manually capture and document all data fields and data flows. However, this static documentation approach has several drawbacks:

  • Manual creation of documentation is time-consuming.
  • Static documentation quickly becomes outdated as enterprise systems evolve.
  • Documentation risks being misinterpreted by individuals who were not involved in the original system design.
  • Manual analysis can cover only a limited sample of the data, so critical issues may be overlooked.
  • Manual transcription is prone to mistakes, such as spelling errors and copy-paste errors.

Continuous Integration for Data Model Discovery and Surveillance

To keep pace with an ever-evolving data estate, enterprises need to adopt widespread automation and embrace Continuous Integration principles. This involves frequent and repeatable validation of the current state of the data model through automated processes.
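
As an illustration of what such a repeatable check might look like, the sketch below compares the current version of a schema with a stored baseline and produces an exit code that a build pipeline could act on. It is a generic Python example, not the Spear implementation. The schema representation (a mapping from field name to type name), the function names, and the rule that removed or retyped fields count as breaking are all assumptions made for this example.

```python
def diff_schemas(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {field_name: type_name} mappings and report the differences."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "retyped": sorted(name for name in shared if baseline[name] != current[name]),
    }


def is_breaking(diff: dict[str, list[str]]) -> bool:
    """Assumed policy: removing or retyping a field breaks downstream consumers."""
    return bool(diff["removed"] or diff["retyped"])


def ci_exit_code(baseline: dict[str, str], current: dict[str, str]) -> int:
    """Return 0 if the change is safe and 1 if the build should fail."""
    return 1 if is_breaking(diff_schemas(baseline, current)) else 0
```

A pipeline step could pass the result of ci_exit_code to sys.exit on every commit, so that a renamed field (for example, ‘trade_id’ becoming ‘trade_identifier’) is flagged at the time of the change rather than discovered later in a downstream report.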

TeraHelix Spear Approach

TeraHelix Spear provides enterprises with powerful data modeling capabilities and automates the discovery and surveillance of changes to the data model over time. Here’s how:

  1. Data Model Discovery and Inference: Spear can infer data model structures from various sources, including data schemas, APIs, and sample data.

  2. Data Model Consistency: The Spear compiler performs automated checks and validations to ensure data model consistency.

  3. Attribute, Tag, and Annotation Management: Subject matter experts can add context, comments, and tags to data model definitions, ensuring consistency and coherence.

  4. Referential Schema Linking: Spear establishes linkages within and between data model definitions, enhancing understanding of the data estate.

  5. Change Impact Analysis: Spear evaluates new model versions against previous versions to analyze the impact of changes, including detecting common data model mistakes and version incompatibilities.
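
To make the first step more concrete, the following Python sketch shows one simple way a data model could be inferred from sample data. It does not use Spear and does not represent Spear's inference algorithm or API; the function names and the output format (observed types plus a ‘required’ flag per field) are assumptions chosen for illustration.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Map a Python value to a coarse, JSON-style type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def infer_schema(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Collect the observed types of each field and whether it appears in every record."""
    seen: dict[str, dict[str, Any]] = {}
    for record in records:
        for name, value in record.items():
            entry = seen.setdefault(name, {"types": set(), "count": 0})
            entry["types"].add(infer_type(value))
            entry["count"] += 1
    total = len(records)
    return {
        name: {"types": sorted(entry["types"]), "required": entry["count"] == total}
        for name, entry in seen.items()
    }
```

A field that is reported with more than one observed type, such as a price that arrives as a number in one record and as a string in another, is a candidate for the kind of inconsistency that the later consistency and change impact steps are meant to catch.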

Conclusion

An accurate and consistent data model is crucial for enterprises to remain competitive and drive down costs. TeraHelix Spear provides the tools required to streamline data model creation and ensure accuracy as the enterprise evolves.