It all began with ETL: Extract -> Transform -> Load
Data is pulled from various sources that house user data: multiple APIs, event streams, or even raw static files. These records are transformed to fit a predefined data model and then loaded into the data warehouse.
This turned out to be problematic for customers because it blocks iteration. The method forces you to commit to a data model up front, and any extra information you need later demands changing the transformation step, which means going back to the source data every time.
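A minimal sketch of this coupling, with made-up source and warehouse stand-ins: the data model is fixed inside the transform step, so a field dropped there never reaches the warehouse, and recovering it means editing the code and re-reading the source.

```python
def extract():
    # Pretend these rows came from an API or event stream.
    return [
        {"user_id": 1, "email": "a@example.com", "plan": "pro", "country": "US"},
        {"user_id": 2, "email": "b@example.com", "plan": "free", "country": "DE"},
    ]

def transform(rows):
    # The predefined data model keeps only user_id and plan.
    # Fields dropped here (email, country) are gone from the warehouse;
    # needing them later means changing this step and re-extracting.
    return [{"user_id": r["user_id"], "plan": r["plan"]} for r in rows]

def load(rows, warehouse):
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse[0])  # {'user_id': 1, 'plan': 'pro'} -- country was never loaded
```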
Enter ELT: Extract -> Load -> Transform
This is the newer way of ingesting data into the system, made practical by cloud data warehouses that can store huge volumes of data in raw form.
Staging tables let you iterate easily and make any extra information available to the final tables.
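The staging-table idea can be sketched with SQLite standing in for the warehouse (table and field names are illustrative). Raw JSON is loaded untouched into staging, and the "transform" is just a SQL query over it, so changing the model is a query edit, not a re-extract:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
con.execute("CREATE TABLE staging_events (raw TEXT)")  # raw JSON, no model yet

raw_events = [
    {"user_id": 1, "event": "purchase", "amount": 30, "device": "mobile"},
    {"user_id": 2, "event": "visit", "device": "web"},
]
con.executemany(
    "INSERT INTO staging_events VALUES (?)", [(json.dumps(e),) for e in raw_events]
)

# Transform happens inside the warehouse; iterating means editing SQL only.
rows = con.execute(
    """
    SELECT json_extract(raw, '$.user_id') AS user_id,
           json_extract(raw, '$.amount')  AS amount
    FROM staging_events
    WHERE json_extract(raw, '$.event') = 'purchase'
    """
).fetchall()
print(rows)  # [(1, 30)]
```

If purchases later need the `device` field too, only the SELECT changes; the raw rows are already in staging.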
While traditional systems were busy implementing this to bring data into the warehouse from various sources, a new data problem started arising for organisations:
- Marketing team: I need audience and visit data to send to my marketing tools, but only the required fields and no PII
- Sales team: I want to push purchase data into HubSpot for analytics, but only need the calculated numbers
- BI team: I want to load event analytics into Looker, but only mobile app events, so that I can show the data to my business stakeholders
...and many more.
These requests became increasingly frequent with the advent of AI and data tools that ease the work of non-technical teams.

Enter reverse ETL

This is the process of picking data up from the data warehouse and building custom data models for each application that other teams need supported.
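The marketing team's request above maps directly onto this pattern. A hypothetical sketch: read from the warehouse, shape a per-application model (an audience without PII), and hand it to a destination client. `MarketingClient` is a made-up stand-in for a real connector SDK.

```python
warehouse_users = [
    {"user_id": 1, "email": "a@example.com", "visits": 12, "segment": "active"},
    {"user_id": 2, "email": "b@example.com", "visits": 1, "segment": "dormant"},
]

def build_audience(rows):
    # Custom data model for the marketing app: keep only what it needs,
    # and strip PII such as email before anything leaves the warehouse.
    return [{"user_id": r["user_id"], "segment": r["segment"]} for r in rows]

class MarketingClient:
    """Stand-in for a marketing/advertising platform API client."""
    def __init__(self):
        self.synced = []

    def sync_audience(self, audience):
        self.synced.extend(audience)

client = MarketingClient()
client.sync_audience(build_audience(warehouse_users))
assert all("email" not in r for r in client.synced)  # no PII left the warehouse
```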
Many companies began addressing this problem by building connectors from the data warehouse to various marketing and advertising applications. However, this in itself became another big cost factor for organisations.
Imagine paying a large sum to set up a CDP, plus other ETL tools to bring data from various sources into the warehouse, and then paying even more to a reverse ETL provider to get data back out of your system for other applications. It becomes less and less feasible for companies to keep setting up these tools and then maintain each of them individually.
Enter the composable CDP -> One platform that does ETL/ELT + reverse ETL together.
This platform gets data from various sources into the system, and then from the data warehouse out to several downstream applications. Many CDP and reverse ETL vendors started expanding their capabilities, or rather pivoting, to this all-in-one architecture that lets you do everything in one place.
But let's come to reality now:
How easy is it, really, for organisations that have collected data over the years to consolidate it into a single data warehouse? In real life, the following issues are pretty common:
- Data sits in silos, with different datasets owned by different teams, which is a challenge to consolidate in itself
- Data is messy. Not all data in the warehouse is clean, and using these connectors directly still requires work from the engineering team to clean it up first. Imagine event data with missing keys
- Maintainability becomes a problem once you have 100 connectors used by several teams. Any schema change in the data warehouse needs to ensure that downstream reverse ETL connectors are not broken
- Data growth is organic and not always uniform across destinations, and knowledge is likewise unevenly shared across the data organisation
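The maintainability point can be sketched as a schema "contract" check (all names hypothetical): before each sync, verify the warehouse still exposes the columns each downstream connector expects, so a schema change fails loudly instead of silently breaking connectors.

```python
# Columns each downstream connector expects to find in the warehouse.
CONNECTOR_CONTRACTS = {
    "hubspot_purchases": {"user_id", "total_spend"},
    "looker_mobile_events": {"user_id", "event_name", "platform"},
}

def check_contracts(warehouse_columns, contracts):
    # Return a mapping of connector -> columns it needs that are missing.
    broken = {}
    for connector, required in contracts.items():
        missing = required - warehouse_columns
        if missing:
            broken[connector] = missing
    return broken

# Suppose a migration renamed event_name -> event_type:
current_columns = {"user_id", "total_spend", "event_type", "platform"}
print(check_contracts(current_columns, CONNECTOR_CONTRACTS))
# {'looker_mobile_events': {'event_name'}}
```

With 100 connectors, a check like this run on every schema migration is what keeps one rename from quietly breaking several teams' syncs.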
We at Aampe understand the various architectures our clients may have adopted, so we do not force you to adopt one single standard:
- You use a traditional CDP -> We will fetch data from the CDP and use it to ingest to our system
- You have a single data warehouse as the source of truth -> We will hook up to the warehouse and extract data from it.
- You have data scattered across various sources -> Aampe will connect to all of them simultaneously and combine the data.
We understand that every data organisation is different, and we can adapt to all your needs.
How do we make all this possible?
We essentially do a reverse “ELT”
Aampe extracts data from one or all of your data sources and loads it into our system, so the customer doesn't need to do any of the transformations. We do all the transformations on our end, keeping the integration effort for the company's data team as minimal as possible. Here is an architecture diagram of the approach: