Parser Transformations
The Parser transformation is one of the most important transformations used in IDQ.
Parsing is a core function of any data quality tool, and IDQ provides rich parsing
functionality to handle complex patterns.
A Parser transformation can be created in two modes: token-based parsing and
pattern-based parsing.
Token Based Parsing: It is used to parse strings that match token sets, regular
expressions, or reference table entries.
We will use a simple example to create a token-based Parser transformation. Suppose
an email ID arrives in a field in the format "Name@company.domain" and we want to
parse it and store it in multiple fields:
NAME
COMPANY_NAME
DOMAIN
Suppose the input data is as below:
Rahul@gmail.com
Sachin@yahoo.com
Stuart@yahoo.co.uk
Step 1: Create a token-based Parser transformation with the email ID as input. After
creating the transformation, go to Properties, open the Strategies tab, and click New.
Step 3: Select Regular expression (because we want multiple output ports).
Step 4: Select the email parser, or create your own regular expression to parse
other types of data.
Step 5: Create three output ports, click OK, and then click Finish.
The output of the Parser transformation contains the name, company, and domain
parsed into separate fields.
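The effect of the regular-expression strategy can be sketched outside IDQ. The Python snippet below is only an illustration. It assumes a simple pattern with three capture groups, one per output port. It is not the expression behind IDQ's built-in email parser, and the name parse_email is made up for this example.

```python
import re

# Assumed pattern (not IDQ's built-in email parser): one capture group per output port.
# Group 1 = NAME, group 2 = COMPANY_NAME, group 3 = DOMAIN.
EMAIL_PATTERN = re.compile(r"^([^@\s]+)@([^.@\s]+)\.([^@\s]+)$")


def parse_email(email_id):
    """Return (NAME, COMPANY_NAME, DOMAIN), or None when the input does not match."""
    match = EMAIL_PATTERN.match(email_id.strip())
    return match.groups() if match else None


for email_id in ["Rahul@gmail.com", "Sachin@yahoo.com", "Stuart@yahoo.co.uk"]:
    print(email_id, "->", parse_email(email_id))
```

With this pattern, everything after the first dot following the @ sign goes to DOMAIN, so Stuart@yahoo.co.uk yields yahoo as COMPANY_NAME and co.uk as DOMAIN.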
Pattern Based Parsing: Pattern-based parsers are useful when the data needs to be
parsed apart or sorted and contains a moderately high number of easily recognized
patterns.
A pattern-based Parser transformation needs the output of a Labeler transformation
as its input. The Labeler provides two outputs: the labeled data and the tokenized data.
Suppose the source has a field named PATTERN_DATA that contains a name, an employee
number, and a date, and we need to parse it into three separate fields.
Step 1: Create a Labeler transformation with a comma (,) as the delimiter, and define
its properties by creating a new strategy.
Then create three new output ports in the Ports tab.
Step 3: In the Patterns tab, define the patterns according to the labels defined in
the Labeler transformation.
You can then preview the parsed data, broken into three fields: NAME, EMPNO, and DOB.
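The label-then-parse flow can be sketched in Python to show the idea. This is an illustration under assumptions, not IDQ's implementation: the sample value, the label names WORD, NUMBER, and DATE, the dd/mm/yyyy date format, and both function names are hypothetical.

```python
import re

# Hypothetical labels and the checks used to assign them, tried in order.
LABEL_RULES = [
    ("DATE", re.compile(r"^\d{2}/\d{2}/\d{4}$")),
    ("NUMBER", re.compile(r"^\d+$")),
    ("WORD", re.compile(r"^[A-Za-z]+$")),
]

# Pattern defined on the parser side: label sequence -> output port names.
PATTERNS = {
    ("WORD", "NUMBER", "DATE"): ("NAME", "EMPNO", "DOB"),
}


def label_tokens(value, delimiter=","):
    """Labeler step: split on the delimiter and assign a label to each token."""
    tokens = [token.strip() for token in value.split(delimiter)]
    labels = []
    for token in tokens:
        label = next((name for name, rule in LABEL_RULES if rule.match(token)), "UNKNOWN")
        labels.append(label)
    return labels, tokens


def parse_by_pattern(value):
    """Parser step: map tokens to ports when the label sequence matches a pattern."""
    labels, tokens = label_tokens(value)
    ports = PATTERNS.get(tuple(labels))
    if ports is None:
        return None  # no defined pattern matches this label sequence
    return dict(zip(ports, tokens))


# Made-up sample value for PATTERN_DATA.
print(parse_by_pattern("John,1001,12/05/1990"))
```

The first function plays the role of the Labeler and returns both the labels and the tokens. The second plays the role of the parser and only assigns tokens to ports when the whole label sequence matches a defined pattern.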
==================================================
Q1. EXPLAIN THE IDQ FUNCTIONALITY?
Use IDQ to design and run processes that complete the following tasks:
Profile data : Profiling reveals the content and structure of data. Profiling is a key
step in any data project, as it can identify strengths and weaknesses in data and
help you define a project plan.
Parse data : Parsing reads a field composed of multiple values and creates a field
for each value according to the type of information it contains. Parsing can also add
information to records. For example, you can define a parsing operation to add units
of measurement to product data.
Validate postal addresses : Address validation evaluates and enhances the accuracy
and deliverability of postal address data. Address validation corrects errors in
addresses and completes partial addresses by comparing address records against
address reference data from national postal carriers. Address validation can also
add postal information that speeds mail delivery and reduces mail costs.
Create reference data tables : Informatica provides reference data that can enhance
several types of data quality process, including standardization and parsing. You can
create reference tables using data from profile results.
Create and run data quality rules : Informatica provides rules that you can run or
edit to meet your project objectives. You can create mapplets and validate them as
rules in the Developer tool.
Collaborate with Informatica users : The Model repository stores reference data and
rules, and this repository is available to users of the Developer tool and Analyst tool.
Users can collaborate on projects, and different users can take ownership of objects
at different stages of a project.
Q1 What is the difference between the PowerCenter Integration Service and the
Data Integration Service?
The PowerCenter Integration Service is an application service that runs sessions
and workflows.
The Data Integration Service is an application service that performs data integration
tasks for the Analyst tool, the Developer tool, and external clients.
The Analyst tool and the Developer tool send data integration task requests to the
Data Integration Service to preview or run data profiles, SQL data services, and
mappings.
Commands from the command line or an external client send data integration task
requests to the Data Integration Service to run SQL data services or web services.
Q2. What is the difference between the PowerCenter Repository Service and the
Model Repository Service?
The PowerCenter application services and PowerCenter application clients use the
PowerCenter Repository Service. The PowerCenter repository has folder-based
security.
The other application services, such as the Data Integration Service, Analyst
Service, Developer tool, and Analyst tool, use the Model Repository Service.
The Model Repository Service has project-based security.
You can migrate some Model repository objects to the PowerCenter repository.
You can validate a mapplet as a rule. A rule is business logic that defines conditions
applied to source data when you run a profile.
You can validate a mapplet as a rule when the mapplet meets the following
requirements: