Introducing InGen
An Open Source ETT (Extract, Transform, Transfer) Python Tool
By: Swarna Dhakad, Senior Engineering Team Director, Aladdin Wealth Tech, & Piyush Ranjan, Engineer III, Aladdin Wealth Tech
From ETL to ELT to ETT
Interfaces allow two different systems to exchange data with each other. An example of an interface is a file that contains data in rows and columns. Think of an Environmental, Social and Governance (ESG) data provider sending an asset manager a daily feed of files containing security identifiers and their ESG scores. Overnight feeds of analytics data and custodial ingestion of accounts and positions are other examples where one system ‘interfaces’ with another by exchanging data in an agreed-upon format. Even in the age of APIs and real-time streaming data, such asynchronous data transfer is still an extremely prevalent use case in the fintech industry.
Therefore, a common requirement is to be able to generate these interfaces repeatedly and automatically. While the purpose of the generated data varies with the business logic, the fundamental operations needed to produce it typically come from a finite set: reading data from different sources, performing some data massaging, and writing the result to its destination in a predefined format.
The financial industry, and asset management specifically, has focused heavily on data integration problems and has built many solutions for ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) operations. Most of these tools demand considerable expertise, which means investing a lot of time in learning and upskilling. Often, we do not need such high-end tools and operations; we just want to extract data from some sources and write it in a format that the destination system can ingest. We call this pattern Extract, Transform, and Transfer (ETT). ETT can be thought of as extracting data from various sources, transforming it into the required format, and sending it to another system through files or APIs.
This was a very common use case at BlackRock in the Aladdin Wealth Tech Business, where we integrate with several external sponsor platforms, custodians, and other systems within the firm. Most of these data exchanges are daily and happen overnight. In the spirit of not reinventing the wheel, we first used existing, available tools for several use cases. However, we began to feel that it would be better for us in the long term to separate the business logic of extraction from the process of extraction. It was time for a better wheel, perhaps?
Introducing: InGen
Our ideas led us to create a new solution, InGen, a Python-based command line tool that allows the user to generate interface files from various sources like databases, files, and HTTP APIs, without writing any new code. The process is completely config-driven, requiring only a configuration file in YAML format, which declares the data sources, the formatting operations to be applied on the data, validations, and the output format.
An important aspect of InGen is that the configuration files are easy to write and can be built and maintained by non-developers as well. This opens the tool to a much broader set of users, who can create their own complex data extracts without needing to involve a software team or write a single line of code.
This tool has been built on top of the pandas and Great Expectations libraries. The data from the defined sources is read into a pandas DataFrame and goes through a series of transformations as described in the pre-processing and formatting stages of the configuration file. The data transformations can be thought of as a pipeline that you construct.
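To make the idea of a config-driven pipeline concrete, here is a minimal sketch in pandas. It is not InGen's implementation, and the config keys (`steps`, `type`, `columns`, `column`) are hypothetical rather than InGen's actual YAML schema; the dict simply stands in for what a YAML loader would typically return. The README remains the authority on the real syntax.

```python
import pandas as pd

# Hypothetical config: NOT InGen's real schema, only the same idea
# expressed as a Python dict.
CONFIG = {
    "steps": [
        {"type": "drop_duplicates"},
        {"type": "rename", "columns": {"Residence State": "State"}},
        {"type": "uppercase", "column": "State"},
    ]
}


def drop_duplicates(df, step):
    # Remove fully identical rows and renumber the index.
    return df.drop_duplicates().reset_index(drop=True)


def rename(df, step):
    # Rename columns according to the mapping given in the config.
    return df.rename(columns=step["columns"])


def uppercase(df, step):
    # Upper-case the string values of one column.
    column = step["column"]
    return df.assign(**{column: df[column].str.upper()})


# Registry mapping a step "type" in the config to the function behind it.
STEP_REGISTRY = {
    "drop_duplicates": drop_duplicates,
    "rename": rename,
    "uppercase": uppercase,
}


def run_pipeline(df, config):
    """Apply each configured step in order and return the final frame."""
    for step in config["steps"]:
        df = STEP_REGISTRY[step["type"]](df, step)
    return df


source = pd.DataFrame(
    {"Account": ["A1", "A1", "B2"], "Residence State": ["ny", "ny", "ca"]}
)
result = run_pipeline(source, CONFIG)
print(result)
```

Adding a new operation in this sketch means registering one more function; the config, not the code path, decides which steps run and in what order.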
Pre-processing steps combine data from multiple sources: SQL-like joins, concatenation of multiple data frames, and filtering of duplicates can all be performed in this stage. At the end of this stage, we should have a single data frame, which is then passed to the next stage, where column-level formatting is applied.
Once the data has gone through all these steps, the last step is to write it. The most common example is to write this data to a file in a tabular structure. However, InGen also supports transforming data into JSON and pushing it to a web API.
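The pre-processing operations named above map onto standard pandas calls. The sketch below is an illustration under assumed data, not InGen's code: the frames, column names (`security_id`, `quantity`, `esg_score`), and values are all made up for the example.

```python
import pandas as pd

# Two hypothetical position sources that overlap on security S2.
positions_a = pd.DataFrame({"security_id": ["S1", "S2"], "quantity": [100, 250]})
positions_b = pd.DataFrame({"security_id": ["S2", "S3"], "quantity": [250, 75]})

# A hypothetical ESG score feed keyed on the same identifier.
esg_scores = pd.DataFrame(
    {"security_id": ["S1", "S2", "S3"], "esg_score": [7.1, 5.4, 8.9]}
)

# Concatenation: stack the two sources into one frame.
positions = pd.concat([positions_a, positions_b], ignore_index=True)

# Duplicate filtering: keep the first row seen for each security.
positions = positions.drop_duplicates(subset="security_id").reset_index(drop=True)

# SQL-like join: a left join keeps every position and attaches its score.
combined = positions.merge(esg_scores, on="security_id", how="left")
print(combined)
```

The outcome is the single data frame that the formatting stage would then receive.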
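The two kinds of output described here, a tabular file and a JSON payload for a web API, can be sketched with the Python standard library alone. This is an assumption-laden illustration rather than InGen's writer code: the helper names `to_delimited` and `build_api_request`, the sample records, and the `https://example.com/...` URL are all hypothetical.

```python
import csv
import io
import json
import urllib.request

# Hypothetical rows as they might look after formatting.
records = [
    {"security_id": "S1", "esg_score": 7.1},
    {"security_id": "S2", "esg_score": 5.4},
]


def to_delimited(rows, delimiter=","):
    """Render rows as delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0]),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_api_request(rows, url):
    """Build (but do not send) a JSON POST request carrying the rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


file_text = to_delimited(records, delimiter="|")
request = build_api_request(records, "https://example.com/api/esg-scores")

print(file_text)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that call is left out so the sketch runs without network access. Authentication, which a real destination would need, is also omitted.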
Real-World Example
An example of where we used InGen was for a project to integrate one of our operational processes with an external system. We needed to pick up an Excel file sent to us, translate it into a specific JSON format that the external system can ingest, and do some validations along the way. This was solved by using InGen’s file reader, JSON formatter, API writer and source validations.
Here, you can see the config which includes the data source, pre-processing steps on the source, and the desired output format.
The destination configuration, shown below, describes where we want to send the data. Here, we configure the URL of the API that we are calling and the authentication method.
Finally, the formatting steps below show where the translation from source to destination happens. Here, the column ‘Residence State’ in the source is renamed to ‘State’. The ‘Portal Date’ column is renamed to ‘Date’ in the destination and also undergoes date formatting.
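For readers who think in pandas, the two formatting steps just described amount to a column rename followed by a date conversion. The sketch below shows that equivalent, not InGen's configuration syntax; the sample values and both date formats (`%m/%d/%Y` in, `%Y-%m-%d` out) are assumptions, since the article does not state them.

```python
import pandas as pd

# Hypothetical source rows using the column names from the example.
source = pd.DataFrame(
    {
        "Residence State": ["NY", "CA"],
        "Portal Date": ["03/15/2023", "12/01/2023"],
    }
)

# Rename the source columns to their destination names.
formatted = source.rename(
    columns={"Residence State": "State", "Portal Date": "Date"}
)

# Parse the assumed input format, then emit the assumed output format.
formatted["Date"] = pd.to_datetime(
    formatted["Date"], format="%m/%d/%Y"
).dt.strftime("%Y-%m-%d")

print(formatted)
```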
Open Sourcing InGen
BlackRock relies heavily on open-source software and increasingly aims to give back to the open-source community. To realize the full potential of this configurable command line tool we are making InGen an open-source project with two goals in mind:
- Allow others to use this tool for their interface generation processes.
- Welcome the open-source community to contribute to and enhance this project.
To use InGen, follow the guidelines in the Getting Started section of the README. To contribute to this project, check out our open issues or contribution guidelines.
Learn more about Aladdin and technology careers at BlackRock.

