Wednesday, October 15, 2008

What should we know as a ETL Tester?

First of all we should know what does ETL stand for? ETL is Extract, Transfer, and Load. When the data is fed into the system in the front end, it goes to the database after batch processing. Then the data is transfered to the ETL system. As per the business transformation rule, the ETL developer/tester run the jobs in ETL. To run these jobs they use proprietary language of ETL's company/or/Unix. As per the business need jobs are created. Once we run the jobs, the data from source file get transfered to the target file. We might also need to use a lot of basic and advanced SQL to map and verify those data and see whether the data is transfered in accordance with the business transforamtion rule or not. Normally, we have to map data from source file to stage table; stage table to development database table; development database table to production database table.

  • Tool: Informatica / Business Object's Data Integrator (OR other tools in the market). Actually ETL tools are proprietary.
  • Should know how to test mappings
  • Should be familiar with at least one programming language or scripting language. As ETL languages are proprietary, it would be helpful to know some language in advance. Keep in mind that each ETL's proprietary language has interface to call SQL. Most of the ETL tools have interfaces to use the bulk loading/unloading tools associated with the major databases.
  • If we have Unix Platform, we should have knowledge of UNIX shell scripting
  • We should have a knowledge of SQL. SQL can be used to extract data in most of the cases.
  • Strong database (Oracle/SQL Server/Sybase whatever, one of them) knowledge
  • Knowledge of working with large data sets
  • Know how to analyse data quality
  • Generally ETL tester involve in validating the source database, target database, data extraction, data transformation and data loading
  • In addition, ETL tester creates UNIX shell scripts to access data and move data from production to development
  • Further, they test different transformations (rules) to move data to the Staging area, ODS and finally Target Data base using Informatica or Data Integrator Designer.
  • Should know how and where to use DDL and DML Statements
  • Should be able to test ETL code

Although, ETL tools are common in market today, many Data Warehouse projects (especially in the past ) did not use the external ETL tool. In those cases, load utility is used to extract the data and populate tables which gives the flat file structure. So, in the absense of ETL tool, its job of transform and load of the staging and production table is still possible by using Transact SQL or PL/SQL or DB2/SQL.

Thursday, October 2, 2008

Unix Platform

"vi is a visual editor (default) in Unix. Because of this we will be able to see the document we are editing. We can edit files really quickly on vi editor, as it is extremely economical with the keystrokes. Due to its different modes for inserting and issuing commands, it is much faster than most non-mode based editors." (ref:roxanne.org). So, vi editor is basically used to create a new file or edit the existing file.

To open a vi editor, first we have to go to the Unix shell (black screen).

Commands:
vi filename (press enter) - to open the vi editor with the given filename - we can see the filename at the bottom of the vi editor screen. At first, when we open a new file we can observe that blank lines started with tilda sign (~).
At the botoom of the screen; type :q! (press enter) - to quit and come out of the vi editor
Before typing in vi editor, we should know that there are 2 modes in vi, i.e. Insert Mode and Command Mode. In the Insert Mode, typed input is recognized only as text and goes into the file ie opened. In the command mode, whateve we typed is interpreted as command.
When we first open the file, it will be in command mode. To go back to insert mode we have to type 'i' or 'a'; and to go to command mode we have to hit 'Esc'.
To save (:w) anything we typed, we have to go back to the command mode (type 'Esc') to execute it. To save and exit at a time, type ':wq' and enter.
To go to the next line - type 'o', and it will take your cursor to the beginning of the 2nd line in order to ADD a NEW LINE. Uppercase 'O' will add a new line before our existing line, the line where the cursor is blinking.
To navigate between lines - to go up type 'j' to go down, 'k' to go up, 'l' to go right, 'h' to go left
'Shift ^' will take your cursor to the beginning of the line. [We Should Be in the Command Mode]
d - will delete the letter [We Should Be in the Command Mode]
dw - will delete the entire word (or, Shift + d) [We Should Be in the Command Mode]
dd - will delter the entire line [We Should Be in the Command Mode]
u - will undo the things [We Should Be in the Command Mode]
2 d - will delete 2 lines [We Should Be in the Command Mode]
In unix, yanking is kind of cut and paste. To do so, we have to be in command mode. To copy a line and paste it, we have put the cursor at the end of the line and type # (how many lines we want to paste) e.g. if we want to copy 3 lines from the line where the cursor is we have to type 3yy and go to the line where we want to paste it and type 'p'.

To insert a file - :r (filename)
To search forward - / e.g. if we have a number of word 'box' in the file and want to search 'box'; at the bottom of the screen we do forward slash /box
To search backward - ? e.g. ?box
To repeat the last search - ?? or //
To find the next occurance - n or next
To repeat the last search - N
To undo the changes in last command - u
To undo the changes in current line - U
To reload the current document without saving any changes - :e!
To put back the last nth deletion - "p
To go to the end of the file (say when we are at the beginning of the file) - just type press colon : and $
To go to the end of the line - just type $
To get back to the beginning of line - just type caret '^'
To replace any word in the line with another word - e.g. say the word 'budda' which is in 3rd line of the file has to be replaced by the word 'budhi'; then type => :3s/budda/budhi and ENTER (colon, s-substitute, / is forward slash for the word ahead of us substituted with the word budhi). If we want to do the same thing not for the particular line, then we have to use % instead of 3 (line#) and also need to add /g at the end to mention that it has to be acted globally. So, the command would be like :%s/budda/budhi/g and ENTER