The question of ETL vs ELT is a recurring one in the world of data analytics. Historically, data engineers provide data consumers with processed data ready for consumption in a data warehouse. The process of delivering such processed data to consumers can be thoughts of as Extract, Transform, and Load (ETL). This blog demonstrated hoe data consumer needs have cause a process change to Extract, Load and Transform (ELT). Further, we explore the reasons we need to make the shift to ELT.
A large retailer and their changing data needs
Let us consider a scenario. A large retailer gets business intelligence and data analytics insights from their existing Data Warehouse. The Data Warehouse extracts information from their POS (Point of ale) and ERP (Enterprise Resource Planning) systems. The information is transformed for consumption before being loaded into the Data Warehouse. Data engineers spend a lot of time getting to consumable data hosted in the Data Warehouse. Any changes or updates would require additional analysis, development, testing, and deployment time. For data consumers, this means they would have to wait for a couple of weeks before their request for data is available for consumption.
Let us now consider that the retailer’s marketing and customer service department wants to monitor Twitter, Facebook, and other online platforms to understand the customer landscape and positive or negative chatter about the company (social listening). The data from these online platforms is not in a pre-defined format, meaning the data is in a free format and is changing all the time.
Using the above example, we can review the three factors that have led to a change in thinking from Extract, Transform, and Load (ETL) to Extract, Load, and Transform ( ELT).
Data consumption requirements have changed
Our retailer must glean insights quickly from social media chatter. However, IT (Information technology) pre-defined process does not allow for consuming non-standard data formats. In addition, the velocity of data generated and data consumers (in our example, marketing and customer service department) expectations to get insights now. These change in data and demands has resulted in IT (Information Technology) to Extract and Load the data first and then worry about Transformation.
Data deluge leading to process optimization
Extraction of value at speed, from an ever-increasing volume and variety of data, requires the process to be efficient and effective, in short optimal. Social media sources have increased, and our retailer is plugged into all the different social media sources for optimal data points. The result is an increase in the variety of data sources, and an expectation to gain insights from the data sources. IT(Information Technology) takes the logical step of extract and load first and transform later
Speed of access leading to further optimization
Social media data have little shelf life, a typical social media feed loses value in a day. Data consumers, therefore, need to get access to data in near real-time. Real-time is a myth, so I am sticking to near real-time. This requirement of speed further validated the approach of Extract, Load, and Transform
Availability of Cheap Storage has accelerated the adoption of ELT
Cheap storage in form of Cloud Storage ( S3, Blob storage) is a blessing that has led to “Data Lakes” ( we will discuss this topic in our future blogs).
The confluence of the above factors has led to the shift from: