This report is brought to you by Info-Tech Research Group Inc. To learn more about Info-Tech, visit www.infotech.com.
Extract, transform and load (ETL) was long considered the most effective way to load information into a data warehouse. Early data warehouses were not viewed as capable of handling the extensive processing required for the complex transformations involved in the warehouse load process. Instead, third-party tools such as IBM's WebSphere DataStage and those from Informatica Corp. were used to orchestrate data movement between source systems and the data warehouse. With advances in both hardware and data warehouse software, warehouse designers can now consider extract, load and transform (ELT) a viable option.
Loading a data warehouse
Loading a data warehouse can be extremely intensive from a system resource perspective. In companies with data sets greater than 5 terabytes, a load can take as long as eight hours, depending on the complexity of the transformation rules. Most data warehousing teams schedule load jobs to start after working hours so that the load does not degrade the performance of user queries. However, as data volumes and the number of warehouse subject areas grow, load times can increase even further and spill over into regular working hours.
In the case of ETL, data is moved to an intermediate platform where the transformation rules are applied before the data is passed along to the data warehouse. By contrast, ELT uses standard transfer mechanisms such as FTP to move the data directly to the data warehouse infrastructure. The transformation rules are then applied inside the warehouse, and the data warehouse tables are loaded.
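The difference between the two approaches comes down to where the transformation rules run. The following is a minimal sketch of that contrast, not a production load process. It assumes an in-memory SQLite database standing in for the data warehouse, and the table names (`orders`, `staging_orders`), the sample rows and the cleanup rules are all hypothetical. The file transfer step (FTP in the ELT description above) is left out; the rows are simply handed to each function.

```python
import sqlite3

# Hypothetical source rows: (order_id, customer name, amount in cents as text).
SOURCE_ROWS = [
    (1, "  alice ", "1250"),
    (2, "BOB", "300"),
    (3, "carol", "99"),
]


def create_warehouse():
    """Create an in-memory SQLite database that stands in for the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount_dollars REAL)"
    )
    return conn


def etl_load(conn, rows):
    """ETL: apply the rules outside the warehouse, then load the clean rows."""
    transformed = [
        (order_id, name.strip().upper(), int(cents) / 100.0)
        for order_id, name, cents in rows
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
    conn.commit()


def elt_load(conn, rows):
    """ELT: load the raw rows first, then transform them with warehouse SQL."""
    conn.execute(
        "CREATE TABLE staging_orders ("
        "order_id INTEGER, customer TEXT, amount_cents TEXT)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)
    # The same rules as etl_load, expressed as SQL and run by the warehouse engine.
    conn.execute(
        "INSERT INTO orders (order_id, customer, amount_dollars) "
        "SELECT order_id, UPPER(TRIM(customer)), "
        "CAST(amount_cents AS INTEGER) / 100.0 "
        "FROM staging_orders"
    )
    conn.execute("DROP TABLE staging_orders")
    conn.commit()


def fetch_orders(conn):
    """Return the loaded warehouse rows in a stable order."""
    return conn.execute(
        "SELECT order_id, customer, amount_dollars FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    for name, load in (("ETL", etl_load), ("ELT", elt_load)):
        warehouse = create_warehouse()
        load(warehouse, SOURCE_ROWS)
        print(name, fetch_orders(warehouse))
```

Both functions are meant to leave the `orders` table in the same state. In the ETL version the cleanup work is done by the process that feeds the warehouse, while in the ELT version the raw rows land in a staging table and the warehouse engine does the work, which is why ELT depends on the warehouse having processing capacity to spare.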
This was first published in January 2008