ETL - Interview Questions & Answers - 2
SQL Queries (Null Values, Duplicate Records, Joins, Where, Group By, Count, NVL
Functions)
ETL Testing Concepts / Data Warehouse Concepts (SCD Type 1, 2, 3, Fact & Dimension
Tables, Star Schema, Snowflake Schema)
ETL Pre/Post Validations (Before/After Transformation)
Data Warehouse Project Workflow
Data Validations:
1st Level: Row Count, Null Values, Duplicates
2nd Level: Source-to-Target Validations
3rd Level (Data Model Level): Column Size, Date/Time Size
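A minimal sketch of these checks in SQL (src_customer, tgt_customer and customer_id are hypothetical names):
-- 1st level: row count comparison between source and target
select count(*) from src_customer;
select count(*) from tgt_customer;
-- null check on a key column
select count(*) from tgt_customer where customer_id is null;
-- duplicate check on the business key
select customer_id, count(*)
from tgt_customer
group by customer_id
having count(*) > 1;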
Big Data Project Workflow
HIVE Queries
Azure Cloud Testing
Azure Devops Test Plans(Epic, Bug, User Story, Task)
JIRA Tool Whole Procedure (How to raise a bug in JIRA)
Bug Life Cycle/ Software Testing Life Cycle
Testing: Regression Testing, Integration Testing, Retesting, Load Testing, Database
Testing, Functional Testing
Writing Test Cases/ Test Plans
Important Questions:
1) Tell me about yourself --- (Explain your years of experience, what kind of testing
you have experience in, which tools you have hands-on experience with, and finally
your roles and responsibilities as an ETL Tester.)
2) What is a Product Owner in Scrum?
Ans) A Product Owner is responsible for ensuring the success of a project in Scrum. The Product Owner
manages and optimizes the product backlog in order to maximize the value of the product. Scrum itself
is an Agile framework that facilitates communication and self-organization within a team.
(OR)
A Product Owner is part of the scrum team. The key responsibilities of a Product Owner are to define user
stories and create a product backlog. The Product Owner is the primary point of contact on behalf of the
customer to identify the product requirements for the development team.
JIRA Related Interview Questions:
4) What phases are there within a Sprint?
Ans) To Do --> In Progress --> Complete
5) What is an Epic in Agile?
Ans) In Agile, an epic is simply a collection of user stories. These stories are related to one another
and combine to form one large story. Epics can work across different teams and projects, but
they will be united under a broad banner label, known as a theme. An initiative groups
similar epics under one common objective within an organization.
Benefits of epics
Better organization
Improved time management
Clear client priorities
7) What options are available under Issue Type in the JIRA tool?
Ans) Epic, User Stories, Tasks, Sub Tasks, Bug, Test
8) Where do you execute Test Cases?
Ans) In the JIRA tool we have apps called Zephyr Squad and Xray.
Project Scenario Related Interview Questions:
The STTM (Source-to-Target Mapping) document is the blueprint of your ETL solution. In addition to
mapping the fields from source to target, the STTM document also captures other important
information, such as the transformation rules applied to each field.
9) Explain the Defect/Bug Life Cycle?
Ans) Defect Status or Bug Status in the defect life cycle is the present state of the defect or bug. The
goal of defect status is to precisely convey the current state or progress of a defect or bug in order to
better track and understand the actual progress of the defect life cycle.
The number of states that a defect goes through varies from project to project; the list below
covers all possible states.
New: When a defect is logged and posted for the first time, it is assigned the status
NEW.
Assigned: Once the bug is posted by the tester, the tester's lead approves the bug
and assigns it to the developer team.
Open: The developer starts analyzing and works on the defect fix.
Fixed: When the developer makes the necessary code change and verifies the change, he
or she can set the bug status to "Fixed."
Pending retest: Once the defect is fixed, the developer hands the code back to the tester
for retesting. Since the testing is still pending on the tester's end, the status
assigned is "Pending retest."
Retest: The tester retests the code at this stage to check whether the defect has been
fixed by the developer or not and changes the status to "Retest."
Verified: The tester re-tests the bug after it has been fixed by the developer. If no bug is
detected in the software, then the bug is fixed and the status assigned is "Verified."
Reopen: If the bug persists even after the developer has fixed it, the tester
changes the status to "Reopened." Once again the bug goes through the life cycle.
Closed: If the bug no longer exists, the tester assigns the status "Closed."
Duplicate: If the defect is repeated or corresponds to the same concept as another
bug, the status is changed to "Duplicate."
Rejected: If the developer feels the defect is not a genuine defect, they change the
status to "Rejected."
Deferred: If the present bug is not of prime priority and is expected to be fixed in
the next release, the status "Deferred" is assigned to such bugs.
Not a bug: If it does not affect the functionality of the application, the status
assigned to a bug is "Not a bug."
10) Where will you execute your Test Cases?
Ans) In the JIRA tool, using Zephyr Squad or Xray.
11) Who will execute Test Cases?
Ans) You (the tester) or the Team Lead.
12) Explain left join, right join, and the types of joins.
13) What is your project source and where is it located?
Ans)
14) What is the job, and who executes the job?
15) What is your company and where is it located?
16) To whom do you assign the job?
17) What is your source data and what type of data is it?
18) Explain the architecture of an ETL/DWH project.
19) How will you validate your source data and where will you validate it?
20) Where will you execute queries in your target system?
21) What kind of validations did you do in your project, and on which platform did you execute
those validations (Snowflake / HIVE Query Editor / Informatica Tool / Oracle)?
22) What type of databases are you using in your project?
11) How do you hide the columns that are not required for testing? (Views)
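A minimal sketch, assuming a hypothetical customer table: a view exposes only the columns approved for testing.
-- sensitive or unneeded columns are simply left out of the view
create view v_customer_test as
select customer_id, customer_name, city
from customer;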
12) WAQ (Write a Query) for RANK, ROW_NUMBER, DENSE_RANK
Scenarios: On Joins
Scenarios: Group By
Scenarios: Duplicates
Scenarios: Null Values
13) Difference between Rank and Dense Rank
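A quick SQL sketch for questions 12 and 13 (employees and salary are hypothetical names): RANK() leaves gaps after ties, DENSE_RANK() does not, and ROW_NUMBER() is always unique.
select emp_name, salary,
       rank()       over (order by salary desc) as rnk,
       dense_rank() over (order by salary desc) as dense_rnk,
       row_number() over (order by salary desc) as row_num
from employees;
-- for salaries 900, 900, 800: rank gives 1,1,3; dense_rank gives 1,1,2; row_number gives 1,2,3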
14) Write a query to join tables; types of joins.
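A short sketch of the common join types, assuming hypothetical emp and dept tables:
-- inner join: only matching rows
select e.emp_name, d.dept_name
from emp e
inner join dept d on e.dept_id = d.dept_id;
-- left join: all rows from emp, NULLs where dept has no match
select e.emp_name, d.dept_name
from emp e
left join dept d on e.dept_id = d.dept_id;
-- right join: all rows from dept, NULLs where emp has no match
select e.emp_name, d.dept_name
from emp e
right join dept d on e.dept_id = d.dept_id;
-- FULL OUTER JOIN returns all rows from both sides; CROSS JOIN returns the Cartesian product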
15) What are constraints?
16) How do you identify duplicate values?
17) WAQ to get the last 500 records out of 1000 records.
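One possible approach using ROW_NUMBER(), assuming an id column that defines the record order (my_table is a placeholder name):
select *
from (
  select t.*, row_number() over (order by id desc) as rn
  from my_table t
) x
where rn <= 500;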
18) Find out duplicate records.
-- completing the truncated query (employees is a hypothetical table):
select emp_id, count(*) from employees group by emp_id having count(*) > 1;
What is OLAP?
Online Analytical Processing (OLAP) is a category of software tools that provide analysis of data
for business decisions. OLAP systems allow users to analyze information from multiple
database systems at one time.
What is OLTP?
Online Transaction Processing (OLTP) supports transaction-oriented applications in a
3-tier architecture. OLTP administers the day-to-day transactions of an
organization.
Example of OLAP
Any data warehouse system is an OLAP system. Uses of OLAP are as follows:
A company might compare their mobile phone sales in September with sales in October,
then compare those results with another location, which may be stored in a separate
database.
Amazon analyzes purchases by its customers to come up with a personalized
homepage with products that are likely to interest the customer.
Example of OLTP
A classic example is an ATM withdrawal from a joint account, where two account holders try to
withdraw the full amount at the same time. The person who completes the authentication process
first will be able to get the money. In this case, the OLTP system makes sure that the withdrawn
amount is never more than the amount present in the bank. The key point to note here is that
OLTP systems are optimized for transaction processing rather than data analysis.
Other examples of OLTP applications are:
Online banking
Online airline ticket booking
Sending a text message
Order entry
Adding a book to a shopping cart
Key differences between OLTP and OLAP:
- OLTP manages database modification; OLAP queries the database to support analysis and
decision making.
- OLTP is characterized by a large number of short online transactions; OLAP is characterized
by complex queries over large volumes of data.
- OLTP systems are the data source for OLAP.
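To make the contrast concrete, a hedged sketch (accounts and orders are hypothetical tables; transaction syntax varies by database): an OLTP statement touches one row inside a short transaction, while an OLAP query aggregates history.
-- OLTP: short, single-row transaction
begin transaction;
update accounts set balance = balance - 100 where account_id = 42;
commit;
-- OLAP: aggregate analysis over a large history
select region, extract(year from order_date) as yr, sum(order_amount) as total_sales
from orders
group by region, extract(year from order_date);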
Informatica transformation types (name and description):
Input: A passive transformation that passes data into a mapplet. Can be used in a mapplet, but not in a
mapping.
Java: Executes user logic coded in Java. Can be active or passive.
Joiner: An active transformation that joins data from two sources.
Labeler: A passive transformation that adds a labeler asset that you created in Data Quality to a
mapping or mapplet. Use a labeler asset to identify the types of information in an input field and to
assign labels for each type to the data.
Lookup: Looks up data from a lookup object. Defines the lookup object and lookup connection. Also
defines the lookup condition and the return values. A passive Lookup transformation returns one row.
An active Lookup transformation returns more than one row.
Machine Learning: Runs a machine learning model and returns predictions to the mapping.
Mapplet: Inserts a mapplet into a mapping or another mapplet. A mapplet contains transformation logic
that you can create and use to transform data before it is loaded into the target. Can be active or
passive based on the transformation logic in the mapplet.
Normalizer: An active transformation that processes data with multiple-occurring fields and returns a
row for each instance of the multiple-occurring data.
Output: A passive transformation that passes data from a mapplet to a downstream transformation.
Can be used in a mapplet, but not in a mapping.
Parse: A passive transformation that adds a parse asset that you created in Data Quality to a mapping
or mapplet. Use a parse asset to parse the words or strings in an input field into one or more discrete
output fields based on the types of information that the words or strings contain.
Python: Runs Python code that defines transformation functionality. Can be active or passive.
Rank: An active transformation that limits records to a top or bottom range.
Router: An active transformation that you can use to apply a condition to incoming data.
Rule Specification: A passive transformation that adds a rule specification asset that you created in
Data Quality to a mapping or mapplet. Use a rule specification asset to apply the data requirements of
a business rule to a data set.
Sequence Generator: A passive transformation that generates a sequence of values.
Sorter: A passive transformation that sorts data in ascending or descending order, according to a
specified sort condition.
SQL: Calls a stored procedure or function or executes a query against a database. Passive when it calls
a stored procedure or function. Can be active or passive when it processes a query.
Structure Parser: A passive transformation that analyzes unstructured data from a flat file source and
writes the data in a structured format.
Transaction Control: An active transformation that commits or rolls back sets of rows during a
mapping run.
Union: An active transformation that merges data from multiple input groups into a single output
group.
Velocity: A passive transformation that executes a Velocity script to convert JSON or XML hierarchical
input from one format to another without flattening the data.
Verifier: A passive transformation that adds a verifier asset that you created in Data Quality to a
mapping or mapplet. Use a verifier asset to verify and enhance postal address data.
Web Services: An active transformation that connects to a web service as a web service client to
access, transform, or deliver data.
3) What is Informatica?
Ans) Informatica is a data integration tool based on ETL architecture. It provides
data integration software and services for various businesses.
(OR)
Informatica is a data processing tool that is widely used for ETL: extract, transform,
and load processing.
The latest version of Informatica PowerCenter available (at the time of writing) is 9.6.0. The
different editions of PowerCenter are:
Standard edition
Advanced edition
Premium edition
4) What is Normalisation?
Ans)
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy from a relation or set of relations. It is also used
to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in the database table.
The main reason for normalizing relations is removing these anomalies. Failure to eliminate anomalies
leads to data redundancy and can cause data integrity and other problems as the database grows.
Normalization consists of a series of guidelines that help you create a good database
structure.
Types of Normal Forms:
Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies that form's
constraints.
1NF: A relation is in 1NF if it contains only atomic (indivisible) values.
2NF: A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent
on the primary key.
3NF: A relation will be in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the
primary key.
BCNF: A stronger version of 3NF; a relation is in Boyce-Codd normal form if, for every functional
dependency, the determinant is a super key.
4NF: A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
5NF: A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be
lossless.
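As a small illustration (hypothetical tables): splitting a repeated department name out of an employee table removes the redundancy that normalization targets.
-- unnormalized: dept_name repeated on every employee row
-- employee(emp_id, emp_name, dept_id, dept_name)
-- normalized: split into two related tables
create table dept (
  dept_id   int primary key,
  dept_name varchar(50)
);
create table employee (
  emp_id   int primary key,
  emp_name varchar(50),
  dept_id  int references dept(dept_id)
);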
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
5) What is Redundancy?
Ans) Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized.
(OR)
Data redundancy refers to the practice of keeping data in two or more places within a
database or data storage system. Data redundancy ensures an organization can
provide continued operations or services in the event something happens to its data
-- for example, in the case of data corruption or data loss. The concept applies to
areas such as databases, computer memory and file storage systems.
Star Schema vs Snowflake Schema:
In a star schema, queries take less time to execute; in a snowflake schema, queries take more
time to execute than in a star schema.
A lookup table is a reference table: it supplies values to a referencing table at run time.
Lookup tables are also used extensively to validate input values by matching them against a list of
valid (or invalid) items.
Lookup tables act as a key-value store for faster lookups by key (e.g. use a lookup table to get all
users who clicked on a specific ad in a specific timeframe).
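A minimal sketch of validating input against a lookup table (stg_orders and lkp_country are hypothetical names):
-- flag staging rows whose country code is not in the lookup table
select s.*
from stg_orders s
left join lkp_country c on s.country_code = c.country_code
where c.country_code is null;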
12) Explain Data Warehouse Architectures.
1) Centralized or Enterprise Architecture
2) Federated Architecture
3) Multi-Tiered Architecture
12) What is a Fact Table & Dimension Table?
Ans) In data warehousing, a fact table consists of the measurements, metrics or facts of a business process.
(or)
A fact table consists of facts of a particular business process.
Facts are also known as measurements or metrics.
Keys: A fact table has a primary key which is the accumulation (concatenation) of the primary keys of
all the dimension tables linked with it. That key is known as a concatenated key and helps to uniquely
identify each row of the fact table.
Fact Table Grain: The grain of a table describes the level of detail, or the depth of the information,
that is contained in that table. The finer the grain, the more detailed the information in the table.
Additivity: Measures are either fully additive or semi-additive. Fully additive measures can be
summed across all of the dimensions; semi-additive measures can be summed across some of the
dimensions but not all.
Sparse Data: There are records that have attributes or measures containing null values.
Types of Fact Table
Fact tables are categorized under three fundamental measurement events:
Transactional
Periodic Snapshot
Accumulating Snapshots
Transactional: A transaction fact table records the occurrence of an event at an instantaneous point in
time. The facts measured are valid only for that particular instant and only for that event. The grain
associated with a transaction fact table is "one row per line in a transaction". It usually contains data
at the most detailed level, which leads it to have a large number of dimensions associated with it.
Because it captures measurements at the most basic or atomic level of dimension, it gives users
robust dimensional grouping, roll-up and drill-down reporting capabilities. It can be dense or sparse,
and it can be large, maybe containing billions of records.
Periodic Snapshot: A periodic snapshot fact table captures the state of the business "at a moment". It
normally includes more non-additive and semi-additive facts. It helps to review the cumulative
performance of the business at regular and predictable intervals of time. The performance of an
activity at the end of each day, week, month or other time interval is represented, unlike the
transaction fact table where a new row is added for the occurrence of every event. Periodic snapshot
tables depend on the transaction fact table for the detailed data they summarize. They are mostly
dense and can be as large as transaction fact tables.
Accumulating Snapshot: An accumulating snapshot fact table describes the activity of a business
process that has a clear starting and end. Accumulating snapshots mostly have multiple date stamps
that represent the predictable phases or events that occur during the lifetime of the process.
Sometimes there is an extra column containing the date when the row was last updated.
13) Explain the types of Facts?
(Additive, Semi-Additive, Non-Additive)
Types of Facts
There are three types of facts:
1. Additive (summative) facts: Additive facts can be used with aggregation functions such as sum(),
average(), etc.
2. Semi-additive (semi-summative) facts: Only a small number of aggregation functions can be
applied to semi-additive facts.
For example, consider bank account details. We cannot apply sum() to a bank balance, as that will not
give useful results, but the minimum() and maximum() functions return useful information.
3. Non-additive facts: We cannot use numerical aggregation functions such as sum() or average() on
non-additive facts. For non-additive facts, a ratio or percentage is used.
1. Additive Type:
Examples: product-wise sales, daily sales.
2. Semi-Additive Type:
Measurements in a fact table that can be summed up across only a few dimension keys. For example,
a table used to record the current balance and profit margin for each account id at a particular instant
of time (day end): we cannot sum up the current balance across Acct Id. If we ask for the balance of
Id 21653, we will say 22000, not 22000 + 80000 (the sum with another account's balance).
3. Non-Additive Fact:
Note: A fact table without any measures is called a factless fact table.
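A small SQL illustration (sales and daily_balance are hypothetical tables): sales amounts are additive across any dimension, while balances should be taken at a single snapshot date rather than summed over time.
-- additive: summing sales across any dimension is meaningful
select product_id, sum(sale_amount)
from sales
group by product_id;
-- semi-additive: summing balances across time is not meaningful;
-- take the balance at a specific snapshot date instead
select acct_id, balance
from daily_balance
where snapshot_date = date '2020-01-31';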
SCD types summary (Type / Title / Allows Updates / Preserves History / Description):
Type 2: Add a new record. Allows updates: yes. Preserves history: yes. Track changes by creating
multiple records (e.g. ValidFrom / ValidTo columns).
Type 3: Add a new field. Allows updates: yes. Preserves history: partial. Track changes using separate
columns (e.g. CurrentValue / PreviousValue).
Very simply, there are six commonly cited types of Slowly Changing Dimension; the three most widely used (Types 1, 2 and 3) are described below:
SCD Type 1
SCD type 1 methodology is used when there is no need to store historical data in the dimension
table. This method overwrites the old data in the dimension table with the new data. It is used to
correct data errors in the dimension.
Surrogate_Key  Customer_ID  Customer_Name  Location
------------------------------------------------
1              1            Mark           Chicago
Here the customer location is Chicago, and the customer has moved to another location, New York. If
you use the Type 1 method, it simply overwrites the data. The updated table will be:
Surrogate_Key  Customer_ID  Customer_Name  Location
------------------------------------------------
1              1            Mark           New York
The advantage of Type 1 is ease of maintenance and less space occupied. The disadvantage is that
no historical data is kept in the data warehouse.
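A minimal SCD Type 1 sketch in SQL (dim_customer and stg_customer are hypothetical names): the changed attribute is simply overwritten in place.
-- overwrite the location with the latest value from staging; no history is kept
update dim_customer d
set location = (select s.location from stg_customer s where s.customer_id = d.customer_id)
where exists (select 1 from stg_customer s where s.customer_id = d.customer_id);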
SCD Type 2
SCD Type 2 stores the entire history of the data in the dimension table. With Type 2 we can store
unlimited history in the dimension table. In Type 2, you can store the data in three different ways:
Versioning
Flagging
Effective Date
SCD Type 2 Versioning
As an example, let's use the same example of a customer who changes location. Initially the
customer is in the Illinois location, and the data in the dimension table will look like:
Surrogate_Key  Customer_ID  Customer_Name  Location  Version
--------------------------------------------------------
1              1            Marston        Illinois  1
The customer moves from Illinois to Seattle and the version number is incremented. The
dimension table will look like:
Surrogate_Key  Customer_ID  Customer_Name  Location  Version
--------------------------------------------------------
1              1            Marston        Illinois  1
2              1            Marston        Seattle   2
Now again, if the customer moves to another location, a new record will be inserted into the
dimension table with the next version number.
SCD Type 2 Flagging
In the flagging method, a flag column is created in the dimension table. The current record has
the flag value 1 and the previous records have the flag value 0.
Surrogate_Key  Customer_ID  Customer_Name  Location  Flag
--------------------------------------------------------
1              1            Marston        Illinois  1
Now when the customer moves to a new location, the old record is updated with flag value 0
and the latest record gets flag value 1:
Surrogate_Key  Customer_ID  Customer_Name  Location  Flag
--------------------------------------------------------
1              1            Marston        Illinois  0
2              1            Marston        Seattle   1
SCD Type 2 Effective Date
In the effective date method, Start_Date and End_Date columns track the period during which each
version of the record was active. For example:
Surrogate_Key  Customer_ID  Customer_Name  Location  Start_Date   End_Date
-------------------------------------------------------------------------
1              1            Marston        Illinois  01-Jan-2020  31-May-2020
2              1            Marston        Seattle   01-Jun-2020  NULL
The NULL in the End_Date indicates the current version of the data, and the remaining records
indicate the past data.
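A hedged SQL sketch of a Type 2 load using the effective-date style (table and column names follow the hypothetical example above): expire the current row, then insert the new version.
-- 1) close out the current record for the changed customer
update dim_customer
set end_date = current_date
where customer_id = 1
  and end_date is null;
-- 2) insert the new version as the current record
insert into dim_customer (surrogate_key, customer_id, customer_name, location, start_date, end_date)
values (3, 1, 'Marston', 'New York', current_date, null);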
SCD Type 3
In the Type 3 method, only the current status and the previous status of the row are maintained in the
table. To track these changes, two separate columns are created in the table. The customer dimension
table in the Type 3 method will look like (after the move from Illinois to Seattle):
Customer_ID  Customer_Name  Current_Location  Previous_Location
--------------------------------------------------------------------------
1            Marston        Seattle           Illinois
Now again, if the customer moves from Seattle to New York, the updated table will be:
Customer_ID  Customer_Name  Current_Location  Previous_Location
--------------------------------------------------------------------------
1            Marston        New York          Seattle
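A minimal Type 3 sketch (same hypothetical columns): shift the current value into the previous column, then overwrite.
update dim_customer
set previous_location = current_location,
    current_location  = 'New York'
where customer_id = 1;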
3) Explain how you have been doing your Quality Assurance process for your ETL/DWH
project?
ETL testing is quite different from conventional testing. There are many challenges we
faced while performing data warehouse testing. Here is a list of a few ETL testing
challenges I experienced on my project:
- Incompatible and duplicate data.
- Loss of data during the ETL process.
- Unavailability of an inclusive test bed.
- Testers have no privileges to execute ETL jobs on their own.
- The volume and complexity of the data are very high.
- Faults in business processes and procedures.
- Trouble acquiring and building test data.
- Missing business flow information.
ETL Tester Responsibilities:
QA will validate whether all column names match the PDM document or not.
Expected Result: Column names in the table should match the PDM document.
Actual Result: After test case execution we have to specify the actual result.
Source SQL: the PDM document acts as the source here.
Target SQL: show table tablename;
9: Scenario Check:
We need to do this check based on business knowledge. Example 1: Every policy should have only one
policy term as an open record; we need to write a test case to validate this scenario. Example 2: The
load timestamp should be lower than the update timestamp; we need to write a test case to validate
this scenario.
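Hedged sketches of the two scenario checks above (policy_term, policy, load_ts and update_ts are hypothetical names):
-- Example 1: policies with more than one open policy term (should return no rows)
select policy_id, count(*)
from policy_term
where record_status = 'OPEN'
group by policy_id
having count(*) > 1;
-- Example 2: rows where the load timestamp is not lower than the update timestamp (should return no rows)
select *
from policy
where load_ts > update_ts;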
Test Planning
a) Changes in the scrum team, if required
b) Identify tactical risks
Note: Please share interview questions with me once you complete your interview. I
will add those questions, as they could be helpful for future candidates.