Third-Party Analytics Library Integration Case Study - OpenGamma Strata

A key TeraHelix offering is the ability to integrate and run third-party analytics libraries as a standard part of the platform. So rather than taking the data to the code, TeraHelix allows you to bring your code to the data. There are many analytics libraries to choose from; in this blog we explore a concrete integration example using OpenGamma’s excellent open source Strata library.

OpenGamma’s Strata

Strata is an open source analytics and market risk library from OpenGamma. The available product coverage, while perhaps not as comprehensive as one would find with an in-house maintained library at a large financial institution, is nevertheless a valuable addition to the toolbox. Even if you do not intend to use Strata as your main pricing and risk calculator, you should still consider employing it for validation purposes. Implementing a process of 'checks-and-balances' on your in-house analytics will provide valuable insights and catch potential issues early in your product development life cycle.

As part of the work done on the Systemic Risk Report - FR Y-15, there was a requirement to be able to perform some calculations for FX Vanilla Options and Bullet Payments. The Strata domain classes were scanned by the TeraHelix integration software and a set of Spear definitions were generated. These Spear definitions were then used as the basis for the construction of the FR Y-15 application and associated reporting flows.

Spear Definition Examples

The Strata domain classes are essentially polymorphic object models with behaviors encapsulated within their definitions. The TeraHelix data tool has support for all the polymorphic concepts one would encounter in a general purpose programming language, albeit with the important distinction that its primary focus is on the data - not the behavior.

For instance, here is an example of the FxVanillaOptionTrade trade definition:

FxVanillaOptionTrade Spear

The trade has a strong association with the underlying FxVanillaOption:

FxVanillaOption Spear
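
To make the 'data, not behavior' distinction concrete, here is a minimal Python sketch of a polymorphic, data-only trade model. It is an illustration only: it is neither Spear syntax nor the Strata API, and the class and field names (Product, Trade, strike, notional and so on) are simplified assumptions rather than the actual generated definitions.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class Product:
    """Common base type. Subtypes only add fields, never pricing logic."""


@dataclass(frozen=True)
class FxVanillaOption(Product):
    long_short: str       # "LONG" or "SHORT"
    put_call: str         # "PUT" or "CALL"
    currency_pair: str    # e.g. "EUR/USD"
    strike: float
    notional: float
    expiry_date: date


@dataclass(frozen=True)
class BulletPayment(Product):
    pay_receive: str      # "PAY" or "RECEIVE"
    currency: str
    amount: float
    payment_date: date


@dataclass(frozen=True)
class Trade:
    trade_id: str
    trade_date: date
    product: Product      # polymorphic: any Product subtype fits here


def describe(trade):
    """Return the trade as plain data, tagged with the concrete product type."""
    return {
        "trade_id": trade.trade_id,
        "trade_date": trade.trade_date,
        "product_type": type(trade.product).__name__,
        "product": asdict(trade.product),
    }


# Hypothetical sample trades, one per product type.
option_trade = Trade(
    "T-1",
    date(2024, 1, 15),
    FxVanillaOption("LONG", "CALL", "EUR/USD", 1.10, 1_000_000.0, date(2024, 7, 15)),
)
payment_trade = Trade(
    "T-2",
    date(2024, 1, 15),
    BulletPayment("PAY", "USD", 250_000.0, date(2024, 3, 1)),
)
```

Calling describe on either trade yields a plain dictionary carrying the concrete product type, which is the kind of shape a data platform can store and query without knowing anything about pricing.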

Why Do We Require Spear Definitions?

At this point you may be wondering why there is a need to go through the process of generating Spear definition bindings to the Strata domain classes at all. Can we not use the Strata domain classes directly?

Well, when you build out a solution there are a number of platform-level considerations which analytics libraries on their own cannot address. Once integrated into a Spear data model, TeraHelix is able to provide the following platform services:

Multi-Platform, Multi-Language Support

Larger organizations and departments often have employees with diverse skill sets and differing preferences for the tools they use to interact with the firm’s data. You might, for instance, have a requirement from group operations to interact with the data using Microsoft Excel, while the data scientists may need to use Python and associated libraries to do their work. Then there are web developers who may need TypeScript bindings in order to build out the front-end user tools.

TeraHelix automatically generates language bindings for all of the above, allowing users to interact with the data using their platform and language of choice. Each generated structure also comes with a rich set of cross-platform utilities, plus support for a wide range of data formats and integrations.

Spear Generated Languages
Table 1. Generated Sources Examples
Python: Generated Python Example
C# (Used by Excel Add-In): Generated C# Example
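
The general idea of generating bindings for several languages from one definition can be sketched in a few lines of Python. This is a toy generator under stated assumptions, not the TeraHelix code generator: the SCHEMA layout, the type names ("string", "double", "date") and the functions to_python and to_typescript are all hypothetical.

```python
# Hypothetical, language-neutral structure definition.
SCHEMA = {
    "name": "BulletPayment",
    "fields": [("currency", "string"), ("amount", "double"), ("paymentDate", "date")],
}

# Assumed mapping from neutral types to target-language types.
PYTHON_TYPES = {"string": "str", "double": "float", "date": "datetime.date"}
TYPESCRIPT_TYPES = {"string": "string", "double": "number", "date": "string"}


def to_python(schema):
    """Emit Python source for a dataclass matching the schema."""
    lines = [
        "import datetime",
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {schema['name']}:",
    ]
    lines += [f"    {name}: {PYTHON_TYPES[kind]}" for name, kind in schema["fields"]]
    return "\n".join(lines) + "\n"


def to_typescript(schema):
    """Emit TypeScript source for an interface matching the schema."""
    lines = [f"export interface {schema['name']} {{"]
    lines += [f"  {name}: {TYPESCRIPT_TYPES[kind]};" for name, kind in schema["fields"]]
    lines.append("}")
    return "\n".join(lines) + "\n"


python_source = to_python(SCHEMA)
typescript_source = to_typescript(SCHEMA)
```

Because both outputs come from the same definition, a field added to the schema shows up in every target language at the next generation run, which is the property that matters when Excel, Python and web users share one model.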

Persistence, Audit Trail and Query

With the analytics model defined in Spear, persistence and a full audit trail are now possible. The TeraHelix Store not only keeps all revisions of the data bi-temporally, it also has an understanding of the object’s attributes. This allows for data processing in terms of field types that make business sense, rather than being limited to the generic queries that would be possible had the data been an opaque blob.

Table 2. Persistence, Audit Trail and Query
Full History: Persistence Full History
Calculation / Data Loading Audit Trail: Calculation Data Loading Audit Trail
Dynamic SQL Query: Dynamic SQL Query
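
For readers new to bi-temporal storage, the following Python sketch shows the core idea: every revision is kept, and each one records both the business date it applies from and the time it was recorded. It is a simplified in-memory illustration, not the TeraHelix Store; the names Revision, BiTemporalStore, put, get and history are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Revision:
    key: str
    valid_from: date        # business date from which the value applies
    recorded_at: datetime   # when the store learned about the value
    value: dict


class BiTemporalStore:
    """Append-only store: revisions are never updated or deleted."""

    def __init__(self):
        self._revisions = []

    def put(self, key, valid_from, recorded_at, value):
        self._revisions.append(Revision(key, valid_from, recorded_at, dict(value)))

    def get(self, key, valid_on, known_at):
        """Value applicable on `valid_on`, using only what was recorded by `known_at`."""
        candidates = [
            r for r in self._revisions
            if r.key == key and r.valid_from <= valid_on and r.recorded_at <= known_at
        ]
        if not candidates:
            return None
        # Prefer the latest business date, then the latest recording.
        return max(candidates, key=lambda r: (r.valid_from, r.recorded_at)).value

    def history(self, key):
        """Full audit trail for a key, in insertion order."""
        return [r for r in self._revisions if r.key == key]


store = BiTemporalStore()
# Original booking, then a correction recorded four days later.
store.put("T-1", date(2024, 1, 1), datetime(2024, 1, 1, 9, 0), {"strike": 1.10})
store.put("T-1", date(2024, 1, 1), datetime(2024, 1, 5, 9, 0), {"strike": 1.12})

as_first_reported = store.get("T-1", date(2024, 1, 2), datetime(2024, 1, 2, 0, 0))
as_corrected = store.get("T-1", date(2024, 1, 2), datetime(2024, 1, 6, 0, 0))
```

Here as_first_reported still sees the original strike while as_corrected sees the correction, so a report can be reproduced exactly as it looked on the day it was first run.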

Reporting

The generation of reports from analytics calculation results often runs into object-relational impedance mismatch problems. This can lead to an anti-pattern where developers encode bespoke data mapping logic into their reporting views - which quickly becomes a maintenance nightmare.

TeraHelix solves this problem by inferring a bi-directional 'flattened-out' relational view for the defined structures. This automatically bridges from object to relation space (and vice versa), enabling the creation of reports from virtually any object definition:

Relational Structure Generation - Object Definitions
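
A bi-directional flattened view can be illustrated with a short Python sketch that maps a nested object to a single row with dotted column names and back again. This is only an illustration of the concept under simplifying assumptions (nested dictionaries only, no lists, no empty sub-objects, no dots in field names); the flatten and unflatten functions are hypothetical and say nothing about how TeraHelix infers its views.

```python
def flatten(obj, prefix=""):
    """Object space -> relation space: one row keyed by dotted column names."""
    row = {}
    for name, value in obj.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=column + "."))
        else:
            row[column] = value
    return row


def unflatten(row):
    """Relation space -> object space: rebuild the nesting from column names."""
    obj = {}
    for column, value in row.items():
        *parents, leaf = column.split(".")
        node = obj
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return obj


# Hypothetical trade object with a nested product.
trade = {
    "tradeId": "T-1",
    "product": {
        "longShort": "LONG",
        "underlying": {"currencyPair": "EUR/USD", "strike": 1.10},
    },
}

row = flatten(trade)
```

The resulting row has columns such as product.underlying.strike, which a reporting tool can filter and aggregate directly, while unflatten(row) recovers the original object.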

Data Quality and Validations

Beyond calculation and reporting there are also data quality and validation concerns. By integrating with excellent libraries such as Amazon Deequ (as per this blog), TeraHelix provides data quality metrics automatically on the analytical data sets:

Data Quality Metrics - Vanilla Options
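
As a rough idea of what such metrics measure, here is a plain Python sketch of two of them: completeness (the fraction of rows with a value present) and compliance (the fraction of rows satisfying a rule). It does not use or reproduce the Deequ API; the function names and the sample rows are assumptions made for illustration.

```python
def completeness(rows, column):
    """Fraction of rows where `column` is present and not None."""
    if not rows:
        return 1.0  # assumption: an empty data set is treated as complete
    present = sum(1 for row in rows if row.get(column) is not None)
    return present / len(rows)


def compliance(rows, column, predicate):
    """Fraction of rows whose `column` value is present and satisfies `predicate`."""
    if not rows:
        return 1.0  # assumption: an empty data set is treated as compliant
    passing = sum(
        1 for row in rows
        if row.get(column) is not None and predicate(row[column])
    )
    return passing / len(rows)


# Hypothetical vanilla option rows, including one missing and one invalid strike.
options = [
    {"tradeId": "T-1", "strike": 1.10},
    {"tradeId": "T-2", "strike": None},
    {"tradeId": "T-3", "strike": 1.25},
    {"tradeId": "T-4", "strike": -1.0},
]

strike_completeness = completeness(options, "strike")
strike_positive = compliance(options, "strike", lambda s: s > 0)
```

Tracked over time, numbers like these make it easy to spot a feed that has started dropping fields or delivering values outside their expected range.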

Calculation Orchestration and Data Loading

The triggering of calculations and the loading of data into the platform are taken care of by TeraHelix’s built-in Snapshot Manager and its integration with open source ETL tools.

Snapshot Manager - Calculation Flow
Data Loading Flow

Conclusion

TeraHelix is the perfect analytics library companion. At the same time as providing essential data services and model management, it broadens the reach of enterprise data to more consumers, use cases and applications. TeraHelix’s automated code generation allows for seamless analytics library integration and accelerates the creation of robust full data platform solutions.