First of all we should know what does ETL stand for? ETL is Extract, Transfer, and Load. When the data is fed into the system in the front end, it goes to the database after batch processing. Then the data is transfered to the ETL system. As per the business transformation rule, the ETL developer/tester run the jobs in ETL. To run these jobs they use proprietary language of ETL's company/or/Unix. As per the business need jobs are created. Once we run the jobs, the data from source file get transfered to the target file. We might also need to use a lot of basic and advanced SQL to map and verify those data and see whether the data is transfered in accordance with the business transforamtion rule or not. Normally, we have to map data from source file to stage table; stage table to development database table; development database table to production database table.
- Tool: Informatica / Business Object's Data Integrator (OR other tools in the market). Actually ETL tools are proprietary.
- Should know how to test mappings
- Should be familiar with at least one programming language or scripting language. As ETL languages are proprietary, it would be helpful to know some language in advance. Keep in mind that each ETL's proprietary language has interface to call SQL. Most of the ETL tools have interfaces to use the bulk loading/unloading tools associated with the major databases.
- If we have Unix Platform, we should have knowledge of UNIX shell scripting
- We should have a knowledge of SQL. SQL can be used to extract data in most of the cases.
- Strong database (Oracle/SQL Server/Sybase whatever, one of them) knowledge
- Knowledge of working with large data sets
- Know how to analyse data quality
- Generally ETL tester involve in validating the source database, target database, data extraction, data transformation and data loading
- In addition, ETL tester creates UNIX shell scripts to access data and move data from production to development
- Further, they test different transformations (rules) to move data to the Staging area, ODS and finally Target Data base using Informatica or Data Integrator Designer.
- Should know how and where to use DDL and DML Statements
- Should be able to test ETL code
Although, ETL tools are common in market today, many Data Warehouse projects (especially in the past ) did not use the external ETL tool. In those cases, load utility is used to extract the data and populate tables which gives the flat file structure. So, in the absense of ETL tool, its job of transform and load of the staging and production table is still possible by using Transact SQL or PL/SQL or DB2/SQL.