Informatica Data Quality 10.0 Getting Started Guide
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; http://benalman.com/about/license/; https://github.com/CreateJS/EaselJS/blob/master/src/easeljs/display/Bitmap.js; http://www.h2database.com/html/license.html#summary; http://jsoncpp.sourceforge.net/LICENSE; http://jdbc.postgresql.org/license.html; http://protobuf.googlecode.com/svn/trunk/src/google/protobuf/descriptor.proto; https://github.com/rantav/hector/blob/master/LICENSE; http://web.mit.edu/Kerberos/krb5-current/doc/mitK5license.html; http://jibx.sourceforge.net/jibx-license.html; https://github.com/lyokato/libgeohash/blob/master/LICENSE; https://github.com/hjiang/jsonxx/blob/master/LICENSE; https://code.google.com/p/lz4/; https://github.com/jedisct1/libsodium/blob/master/LICENSE; http://one-jar.sourceforge.net/index.php?page=documents&file=license; https://github.com/EsotericSoftware/kryo/blob/master/license.txt; http://www.scala-lang.org/license.html; https://github.com/tinkerpop/blueprints/blob/master/LICENSE.txt; http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html; https://aws.amazon.com/asl/; https://github.com/twbs/bootstrap/blob/master/LICENSE; https://sourceforge.net/p/xmlunit/code/HEAD/tree/trunk/LICENSE.txt; https://github.com/documentcloud/underscore-contrib/blob/master/LICENSE, and https://github.com/apache/hbase/blob/master/LICENSE.txt.
This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the new BSD License (http://opensource.org/licenses/BSD-3-Clause), the MIT License (http://www.opensource.org/licenses/mit-license.php), the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0), and the Initial Developer's Public License Version 1.0 (http://www.firebirdsql.org/en/initial-developer-s-public-license-version-1-0/).
This product includes software copyright 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.
This product includes software Copyright (c) 2013 Frank Balluffi and Markus Moeller. All rights reserved. Permissions and limitations regarding this software are subject
to terms of the MIT license.
See patents at https://www.informatica.com/legal/patents.html.
DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied
warranties of noninfringement, merchantability, or use for a particular purpose. Informatica LLC does not warrant that this software or documentation is error free. The
information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is
subject to change at any time without notice.
NOTICES
This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software
Corporation ("DataDirect") which are subject to the following terms and conditions:
1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT
INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT
LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.
Part Number: DQ-GSG-10000-0001
Table of Contents
Preface
    Informatica Resources
        Informatica My Support Portal
        Informatica Documentation
        Informatica Product Availability Matrixes
        Informatica Web Site
        Informatica How-To Library
        Informatica Knowledge Base
        Informatica Support YouTube Channel
        Informatica Marketplace
        Informatica Velocity
        Informatica Global Customer Support
Index
Preface
The Data Quality Getting Started Guide is written for data quality developers and analysts. It provides
tutorials to help first-time users learn how to use Informatica Developer and Informatica Analyst. This guide
assumes that you have an understanding of data quality concepts, flat file and relational database concepts,
and the database engines in your environment.
Informatica Resources
Informatica My Support Portal
As an Informatica customer, the first step in reaching out to Informatica is through the Informatica My Support
Portal at https://mysupport.informatica.com. The My Support Portal is the largest online data integration
collaboration platform with over 100,000 Informatica customers and partners worldwide.
As a member, you can:
Search the Knowledge Base, find product documentation, access how-to documents, and watch support
videos.
Find your local Informatica User Group Network and collaborate with your peers.
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you
have questions, comments, or ideas about this documentation, contact the Informatica Documentation team
through email at infa_documentation@informatica.com. We will use your feedback to improve our
documentation. Let us know if we can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your
product, navigate to Product Documentation from https://mysupport.informatica.com.
Informatica Marketplace
The Informatica Marketplace is a forum where developers and partners can share solutions that augment,
extend, or enhance data integration implementations. By leveraging any of the hundreds of solutions
available on the Marketplace, you can improve your productivity and speed up time to implementation on
your projects. You can access Informatica Marketplace at http://www.informaticamarketplace.com.
Informatica Velocity
You can access Informatica Velocity at https://mysupport.informatica.com. Developed from the real-world
experience of hundreds of data management projects, Informatica Velocity represents the collective
knowledge of our consultants who have worked with organizations from around the world to plan, develop,
deploy, and maintain successful data management solutions. If you have questions, comments, or ideas
about Informatica Velocity, contact Informatica Professional Services at ips@informatica.com.
Informatica Global Customer Support
The telephone numbers for Informatica Global Customer Support are available from the Informatica web site
at http://www.informatica.com/us/services-and-training/support-services/global-support-centers/.
CHAPTER 1
Application clients. A group of clients that you use to access underlying Informatica functionality.
Application clients make requests to the Service Manager or application services.
Application services. A group of services that represent server-based functionality. An Informatica domain
can contain a subset of application services. You create and configure the application services that the
application clients require.
Application services include system services that can have a single instance in the domain. When you
create the domain, the system services are created for you. You can configure and enable a system
service to use the functionality that the service provides.
Profile warehouse. A relational database that the Data Integration Service uses to store profile results.
Repositories. A group of relational databases that store metadata about objects and processes required to
handle user requests from application clients.
Service Manager. A service that is built in to the domain to manage all domain operations. The Service
Manager runs the application services and performs domain functions including authentication,
authorization, and logging.
The following table lists the application clients, not including the Administrator tool, and the application services and repositories that each client requires:
Data Analyzer. Application services: Reporting Service. Repositories: Jaspersoft repository.
Informatica Analyst. Application services: Analyst Service, Content Management Service, Data Integration Service, Model Repository Service, Search Service. Repositories: Model repository.
Informatica Developer. Application services: Analyst Service, Content Management Service, Data Integration Service, Model Repository Service. Repositories: Model repository.
Metadata Manager. Repositories: PowerCenter repository.
PowerCenter Client. Repositories: PowerCenter repository.
The following application services are not accessed by an Informatica application client:
PowerExchange Listener Service. Manages the PowerExchange Listener for bulk data movement and
change data capture. The PowerCenter Integration Service connects to the PowerExchange Listener
through the Listener Service.
PowerExchange Logger Service. Manages the PowerExchange Logger for Linux, UNIX, and Windows to
capture change data and write it to the PowerExchange Logger Log files. Change data can originate from
DB2 recovery logs, Oracle redo logs, a Microsoft SQL Server distribution database, or data sources on an
i5/OS or z/OS system.
SAP BW Service. Listens for RFC requests from SAP BI and requests that the PowerCenter Integration
Service run workflows to extract from or load to SAP BI.
Feature Availability
Informatica products use a common set of applications. The product features you can use depend on your
product license.
The following table describes the licensing options and the application features available with each option:
Licensing Option
Data Quality
Data Services
Overview. Click the Overview button to get an overview of data quality and data services solutions.
First Steps. Click the First Steps button to learn more about setting up the Developer tool and accessing
Informatica Data Quality and Informatica Data Services lessons.
Tutorials. Click the Tutorials button to see tutorial lessons for data quality and data services solutions.
Web Resources. Click the Web Resources button for a link to mysupport.informatica.com, where you can
access the Informatica How-To Library. The Informatica How-To Library contains articles about
Informatica Data Quality, Informatica Data Services, and other Informatica products.
What's New. Click the What's New button to view the latest features in the Developer tool.
Cheat Sheets
The Developer tool includes cheat sheets as part of the online help. A cheat sheet is a step-by-step guide
that helps you complete one or more tasks in the Developer tool.
When you complete a cheat sheet, you complete the tasks and see the results. For example, after you
complete a cheat sheet to import and preview a relational data object, you have imported a relational
database table and previewed the data in the Developer tool.
To access cheat sheets, click Help > Cheat Sheets.
Profile data. Profiling reveals the content and structure of your data. Profiling is a key step in any data
project as it can identify strengths and weaknesses in your data and help you define your project plan.
Create scorecards to review data quality. A scorecard is a graphical representation of the quality
measurements in a profile.
Standardize data values. Standardize data to remove errors and inconsistencies that you find when you
run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you
can ensure that the city, state, and ZIP code values are consistent.
Parse records. Parse data records to improve record structure and derive additional information from your
data. You can split a single field of freeform data into fields that contain different information types. You
can also add information to your records. For example, you can flag customer records as personal or
business customers.
Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of
your postal address data. Address validation corrects errors in addresses and completes partial
addresses by comparing address records against reference data from national postal carriers. Address
validation can also add postal information that speeds mail delivery and reduces mail costs.
Find duplicate records. Duplicate record analysis compares a set of records against each other to find
similar or matching values in selected data columns. You set the level of similarity that indicates a good
match between field values. You can also set the relative weight assigned to each column in match
calculations. For example, you can prioritize surname information over forename information (see the sketch after this list).
Create and run data quality rules. Informatica provides pre-built rules that you can run or edit to suit your
project objectives. You can create rules in the Developer tool.
Collaborate with Informatica users. The rules and reference data tables you add to the Model repository
are available to users in the Developer tool and the Analyst tool. Users can collaborate on projects, and
different users can take ownership of objects at different stages of a project.
Export mappings to PowerCenter. You can export mappings to PowerCenter to reuse the metadata for
physical data integration or to create web services.
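To make the weighting idea concrete, here is a minimal Python sketch (an illustration only, not the product's matching engine); the similarity function, column names, weights, and records are hypothetical:

    from difflib import SequenceMatcher

    def similarity(a, b):
        # Ratio between 0 (no match) and 1 (identical strings).
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def match_score(rec1, rec2, weights):
        # Weighted average of per-column similarities.
        total = sum(weights.values())
        return sum(similarity(rec1[col], rec2[col]) * w
                   for col, w in weights.items()) / total

    # Prioritize surname over forename, as described above.
    weights = {"Surname": 3.0, "Forename": 1.0}
    r1 = {"Surname": "Smith", "Forename": "John"}
    r2 = {"Surname": "Smyth", "Forename": "Jon"}
    print(match_score(r1, r2, weights))  # ~0.81; flag as a duplicate above a chosen threshold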
Examine the Boston and Los Angeles data for data quality issues.
Standardize address information across the Boston and Los Angeles data.
Validate the accuracy of the postal address information in the data for CRM purposes.
Lessons
Each lesson introduces concepts that will help you understand the tasks to complete in the lesson. The
lesson provides business requirements from the overall story. The objectives for the lesson outline the tasks
that you will complete to meet business requirements. Each lesson provides an estimated time for
completion. When you complete the tasks in the lesson, you can review the lesson summary.
If the environment within the tool is not configured, the first lesson in each tutorial helps you do so.
Tasks
The tasks provide step-by-step instructions. Complete all tasks in the order listed to complete the lesson.
Description
Product
Log in to the Analyst tool and create a project and folder for
the tutorial lessons.
Data Quality
Data Quality
Data Quality
Data Quality
Data Quality
Data Quality
Data Quality
Data Quality
Data Services
Data Services
Data Services
Profiling includes join analysis, a form of analysis that determines if a valid join is possible between two data
columns.
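As a rough illustration of what join analysis computes (a Python sketch, not the product's implementation), the following compares the key values in two of the tutorial files; the CustomerID column name is taken from the lessons below:

    import csv

    def column_values(path, column):
        with open(path, newline="") as f:
            return {row[column] for row in csv.DictReader(f)}

    left = column_values("Boston_Customers.csv", "CustomerID")
    right = column_values("LA_customers.csv", "CustomerID")

    overlap = left & right  # rows that would participate in a join
    print(len(left - right), "left-only values")
    print(len(right - left), "right-only values")
    print(len(overlap), "join rows")  # zero means the key columns share no values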
Tutorial Prerequisites
Before you can begin the tutorial lessons, the Informatica domain must be running with at least one node set
up.
The installer includes tutorial files that you will use to complete the lessons. You can find all the files in both
the client and server installations:
You can find the tutorial files in the following location in the Developer tool installation path:
<Informatica Installation Directory>\clients\DeveloperClient\Tutorials
You can find the tutorial files in the following location in the services installation path:
<Informatica Installation Directory>\server\Tutorials
All_Customers.csv
Boston_Customers.csv
LA_customers.csv
CHAPTER 2
Objectives
In this lesson, you complete the following tasks:
Create a project to store the assets that you create in the Analyst tool.
Prerequisites
Before you start this lesson, verify the following prerequisites:
An administrator has configured a Model Repository Service and an Analyst Service in the Administrator
tool.
You have the host name and port number for the Analyst tool.
You have a user name and password to access the Analyst Service. You can get this information from an
administrator.
Timing
Set aside 5 to 10 minutes to complete this lesson.
2.
3.
If the domain uses LDAP or native authentication, enter your user name and password on the login
page.
4.
5.
2.
3.
4.
Click OK.
2.
3.
4.
Click OK.
The folder appears under the tutorial project.
CHAPTER 3
Story
HypoStores keeps the Los Angeles customer data in flat files. HypoStores needs to profile and analyze the
data and perform data quality tasks.
Objectives
In this lesson, you complete the following tasks:
1.
Upload the flat file to the flat file cache location and create a data object.
2.
Prerequisites
Before you start this lesson, verify the following prerequisites:
You have the LA_Customers.csv flat file. You can find this file in the <Installation Root Directory>
\<Release Version>\clients\DeveloperClient\Tutorials folder.
Timing
Set aside 5 to 10 minutes to complete this task.
In the Analyst tool, click New > Flat File Data Object.
The Add Flat File wizard appears.
2.
3.
4.
Click Next.
The Choose type of import panel displays the Delimited and Fixed-width options. Select the Delimited
option, which is the default.
5.
Click Next.
6.
Under Specify the delimiters and text qualifiers used in your data, select Double quotes as a text
qualifier.
7.
Under Specify lines to import, select Import from first line to import column names from the first
nonblank line.
The Preview panel updates to show the column headings from the first row.
8.
Click Next.
The Column Attributes panel shows the datatype, precision, scale, and format for each column.
9.
Click Next.
The Name field displays LA_Customers.
10.
11.
Click Finish.
The data object appears in the folder contents for the Customers folder.
2.
3.
4.
In the Data Preview panel, review the structure and content of the LA_Customers data object.
The Analyst tool displays the first 100 rows of the flat file data object.
5.
Click Properties.
The Properties panel displays the name, type, description, and location of the data object. You can also
see the column names and column properties for the data object.
CHAPTER 4
Story
HypoStores wants to incorporate data from the newly acquired Los Angeles office into its data warehouse.
Before the data can be incorporated into the data warehouse, it needs to be cleansed. You are the analyst
who is responsible for assessing the quality of the data and passing the information on to the developer who
is responsible for cleansing the data. You want to view the profile results quickly and get a basic idea of the
data quality.
Objectives
In this lesson, you complete the following tasks:
1.
Create and run a default profile for the LA_Customers flat file data object.
2.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 5 to 10 minutes to complete this lesson.
2.
Click Next.
3.
In the Specify General Properties screen, enter a name and an optional description for the profile.
4.
In the Location field, select the project or folder where you want to create the profile. Click Next.
5.
In the Select Source screen, click Choose. Navigate to LA_Customers in the Choose Data Object
dialog box. Click OK.
6.
Click Next.
7.
In the Specify Settings screen, the following options are selected by default:
8.
Click Next.
9.
In the Specify Rules and Filters screen, click Save and Run to create and run the profile.
The Analyst tool creates the profile and the profile results appear in the summary view.
In the Library Navigator > Assets > Profiles pane, click the LA_Customers profile.
The profile results appear in the summary view.
2.
In the summary view, click Columns in the Filter By pane to view the profile results for columns.
You can view the profile results based on the default filters. You can view all the profile results for the
profile by using the Columns and rules filter.
3.
Hover the mouse over the horizontal bar charts to view the values in percentages.
4.
In the Data Type and Data Domain sections, you can view all the inferred data types and documented
data type for a column when you hover the mouse over the values.
5.
Click Pattern outlier or Value frequency outlier filters to view the outliers in the profile results.
Note: You must run outlier detection explicitly to view the outlier data. Click Actions > Detect Outlier to
run outlier detection on the profile results.
6.
Click Name to view the profile results for this column in the detailed view.
Create a custom profile to exclude columns from the profile and only include the columns you are
interested in.
CHAPTER 5
Story
HypoStores needs to incorporate data from the newly acquired Los Angeles office into its data warehouse.
HypoStores wants to assess the quality of the customer tier data in the LA customer data file. You are the
analyst responsible for assessing the quality of the data and passing the information on to the developer
responsible for cleansing the data.
Objectives
In this lesson, you complete the following tasks:
1.
Create a custom profile for the flat file data object and exclude the columns with null values.
2.
Run the profile to analyze the content and structure of the CustomerTier column.
3.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 5 to 10 minutes to complete this lesson.
2.
Click Next.
3.
In the Specify General Properties screen, enter Profile_LA_Customers_Custom in the Name field, and
enter an optional description for the profile.
4.
Click Next.
5.
6.
In the Choose Data Object dialog box, select LA_Customers. Click OK.
7.
In the Select Source screen, clear the Address2, Address3, and City2 columns.
8.
Click Next.
9.
10.
Verify that the Exclude approved data types and data domains from the data type and data domain
inference in the subsequent profile runs option is selected. This setting excludes columns with an
approved data type from the data type inference of the next profile run.
11.
Click Next.
12.
In the Specify Rules and Filters screen, click Save and Finish to create the profile.
The Analyst tool creates the profile and displays the profile in the Discovery workspace. You need to
run the profile to view the results.
2.
3.
The profile screen appears where you can choose to edit the profile or run the profile. Click Run.
4.
Verify that you are in the summary view of the profile results for the Profile_LA_Customers_Custom
profile.
2.
3.
In the detailed view, select the Diamond, Ruby, Emerald, and Bronze values. Right-click the values in
the Values pane, and select Drilldown.
The rows for the column with a value of Diamond, Ruby, Emerald, or Bronze appear in the Data Preview
panel.
The following image shows the drilldown results in the Data Preview pane when you drill down on the
values Diamond, Ruby, Emerald, or Bronze:
The Data Preview panel displays the first 100 rows for the selected column. The title of the Data
Preview panel shows the logic used for the source column.
CHAPTER 6
Story
HypoStores wants to incorporate data from the newly acquired Los Angeles office into its data warehouse.
HypoStores wants to analyze the customer names and separate customer names into first name and last
name. HypoStores wants to use expression rules to parse a column that contains first and last names into
separate virtual columns and then profile the columns. HypoStores also wants to make the rules available to
other analysts who need to analyze the output of these rules.
Objectives
In this lesson, you complete the following tasks:
1.
Create expression rules to separate the FullName column into first name and last name columns. You
create a rule that separates the first name from the full name. You create another rule that separates the
last name from the first name. You create these rules for the Profile_LA_Customers_Custom profile.
2.
Run the profile and view the output of the rules in the profile.
3.
Edit the rules to make them usable for other Analyst tool users.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 10 to 15 minutes to complete this lesson.
2.
Click Edit.
The Profile wizard appears.
3.
4.
5.
6.
In the Expression section, enter the following expression to separate the first name from the FullName
column:
SUBSTR(FullName,1,INSTR(FullName,' ',-1,1))
7.
Click Validate.
8.
Click OK.
9.
Repeat steps 4 through 8 to create a rule named LastName. Enter the following expression to separate
the last name from the FullName column (a rough Python equivalent of both rules appears after these steps):
SUBSTR(FullName,INSTR(FullName,' ',-1,1),LENGTH(FullName))
10.
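For reference, here is a rough Python equivalent of the two rules (a sketch only; the product evaluates the rules in its own expression language, and this assumes a single space separates the names):

    def parse_full_name(full_name):
        # INSTR(FullName, ' ', -1, 1) returns the 1-based position of the
        # last space, searching backward from the end of the string.
        pos = full_name.rfind(" ") + 1
        first_name = full_name[:pos].strip()     # SUBSTR(FullName, 1, pos)
        last_name = full_name[pos - 1:].strip()  # SUBSTR(FullName, pos, LENGTH(FullName))
        return first_name, last_name

    print(parse_full_name("Elizabeth Jones"))  # ('Elizabeth', 'Jones')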
2.
Click Edit.
The Profile wizard appears.
3.
4.
In the Select Source screen, select the check box next to Name on the toolbar to clear all columns.
You can see that one of the columns is selected by default because you need to select at least one
column in the Columns section.
5.
Select the FullName column and the FirstName and LastName rules.
6.
7.
8.
9.
10.
Click the FirstName rule. The profile results for the rule appear in the detailed view.
Select a value in the Values pane. Right-click on the value and click Drilldown.
The values for the FullName column and the FirstName and LastName rules appear in the Data Preview
panel along with other column values. Notice that the Analyst tool separates the FullName column into
first name and last name.
Click Edit in the summary view where the Profile_LA_Customers_Custom profile results appear.
The profile wizard appears.
2.
3.
In the Specify Rules and Filters screen, select the FirstName rule and click Actions > Edit Rule.
The Edit Rule dialog box appears.
4.
Select the Do you want to save this rule as a reusable rule? option, and choose a location to save the
rule.
5.
Click OK.
6.
7.
CHAPTER 7
Story
HypoStores wants to incorporate data from the newly acquired Los Angeles office into its data warehouse.
Before the organization merges the data, they want to verify that the data in different customer tiers and
states is analyzed for data quality. You are the analyst who is responsible for monitoring the progress of
performing the data quality analysis. You want to create a scorecard from the customer tier and state profile
columns, configure thresholds for data quality, and view the score trend charts to determine how the scores
improve over time.
Objectives
In this lesson, you will complete the following tasks:
1.
Create a scorecard from the results of the Profile_LA_Customers_Custom profile to view the scores for
the CustomerTier and State columns.
2.
Run the scorecard to generate the scores for the CustomerTier and State columns.
3.
4.
Edit the scorecard to specify different valid values for the scores.
5.
6.
View score trend charts to determine how scores improve over time.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 15 minutes to complete the tasks in this lesson.
Verify that you are in the summary view of the Profile_LA_Customers_Custom profile results.
2.
Select the CustomerTier column, right-click the column, and select Add to > Scorecard.
The Add to Scorecard wizard appears. The New Scorecard option is selected by default.
3.
Click Next.
4.
In the Step 2 of 7 screen, enter sc_LA_Customer for the scorecard name, and navigate to the
Customers folder for the scorecard location.
5.
Click Next.
6.
7.
Click Next.
8.
In the Step 4 of 7 screen, you can create, edit, or delete filters for the metrics. In this tutorial, we will not
create a scorecard filter. Click Next.
9.
10.
In the Score using: Values pane, select all the values, and click the Add All button to move the values
to the Valid Values section.
Use the Shift key to select multiple values.
11.
In the Metrics section, select the State metric, and select the values that have two-letter state codes
in the Score using: Values section.
12.
Click the Add button to move the values to the Valid Values section.
You can see the total number of valid values and valid value percentage at the top of the section.
13.
For each metric in the Metrics section, accept the default settings for the score thresholds in the Metric
Thresholds section.
14.
Click Next.
15.
Optionally, select a metric group to add the metrics. By default, the Analyst tool adds the metrics to the
Default metric group.
16.
Click Next.
17.
In the Default - Metrics pane, double-click the Weight column for the CustomerTier metric.
When you run a scorecard, the Analyst tool calculates the weighted average for each metric group based
on the metric score and the weight that you assign to each metric. A small sketch of this calculation follows these steps.
18.
19.
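To make the weighted-average calculation concrete, here is a minimal Python sketch (the metric scores and weights below are hypothetical):

    # Each metric: (score as a percentage, weight assigned in the Weight column).
    metrics = {"CustomerTier": (98.0, 2.0), "State": (85.0, 1.0)}

    total_weight = sum(weight for _, weight in metrics.values())
    group_score = sum(score * weight for score, weight in metrics.values()) / total_weight
    print(round(group_score, 2))  # 93.67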
Verify that you are in the Scorecards workspace. You can see the scorecard sc_LA_Customer.
2.
3.
Select the State row that contains the State score you want to view.
In the sc_LA_Customer - metrics section, you can view the following properties of the scorecard:
Scorecard name.
Score trend. You can click on the score trend to view a graphical representation in the Trend Chart
Detail screen.
Cost trend.
Data object. Click the data object to view the data preview of the data object in the Discovery
workspace.
Type of source.
Drilldown icon.
3.
Select Valid Rows to view the scores that are valid for the State column.
4.
Verify that you are in the Scorecards workspace, and that the sc_LA_Customer scorecard is open.
2.
3.
4.
In the Score using: Values section, move Ruby from the Valid Values section to the Available Values
section.
Accept the default settings in the Metric Thresholds section.
5.
Click Save & Run to save the changes to the scorecard and run it.
6.
Verify that you are in the Scorecards workspace, and that the sc_LA_Customer scorecard is open.
2.
3.
4.
In the Metric Thresholds section, enter the following ranges for the scores: Good, 90% to 100%;
Acceptable, 51% to 89%; Unacceptable, 0% to 50%.
The thresholds represent the lower bounds of the Acceptable and Good ranges. A short sketch of this classification follows these steps.
5.
Click Save & Run to save the changes to the scorecard and run it.
In the Scorecard panel, view the changes to the score percentage and the score displayed as a bar for
the State score.
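A short Python sketch of how these lower-bound thresholds classify a score (an illustration, not the Analyst tool's code):

    def classify(score):
        # Thresholds are lower bounds: Good >= 90%, Acceptable >= 51%.
        if score >= 90:
            return "Good"
        if score >= 51:
            return "Acceptable"
        return "Unacceptable"

    for s in (95.0, 75.0, 42.0):
        print(s, classify(s))  # Good, Acceptable, Unacceptable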
Verify that you are in the Scorecards workspace, and that the sc_LA_Customer scorecard is open.
2.
3.
Click Actions > Show Trend Chart, or click the arrow under the Score Trend column.
The Trend Chart Detail dialog box appears. You can view the Good, Acceptable, and Unacceptable
thresholds for the score. The thresholds change each time you run the scorecard after editing the values
for scores in the scorecard.
4.
Point to any circle in the chart to view the valid values in the Valid Values section at the bottom of the
chart.
5.
CHAPTER 8
Story
HypoStores wants to profile the data to uncover anomalies and standardize the data with valid values. You
are the analyst who is responsible for standardizing the valid values in the data. You want to create a
reference table based on valid values from profile columns.
Objectives
In this lesson, you complete the following tasks:
1.
Create a reference table from the CustomerTier column in the Profile_LA_Customers_Custom profile by
selecting valid values for columns.
2.
Edit the reference table to configure different valid values for columns.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 15 minutes to complete the tasks in this lesson.
2.
3.
In the summary view, select the CustomerTier column that you want to add to the reference table. Right-click and select Add to Reference Table.
The Add to Reference Table dialog box appears.
4.
5.
Click Next.
6.
7.
8.
Click Next.
9.
In the Column Attributes section, configure the following column properties for the CustomerTier
column:
Name: CustomerTier
Datatype: String
Precision: 10
10.
Optionally, choose to create a description column for rows in the reference table. Enter the name and
precision for the column.
11.
12.
Click Next.
The Reftab_CustomerTier_HypoStores reference table name appears. You can enter an optional
description.
13.
In the Save in section, select your tutorial project where you want to create the reference table.
The Reference Tables: panel lists the reference tables in the location you select.
14.
15.
Click Finish.
2.
3.
To edit a row, select the row and click Actions > Edit or click the Edit icon.
The Edit Row dialog box appears. Optionally, select multiple rows to add the same alternate value to
each row.
4.
Enter the following alternate values for the Diamond, Emerald, Gold, Silver, and Bronze rows: 1, 2, 3, 4,
5.
Enter an optional audit note.
5.
6.
Click Close.
The changed reference table values appear in the Design workspace.
CHAPTER 9
Story
HypoStores wants to standardize data with valid values. You are the analyst who is responsible for
standardizing the valid values in the data. You want to create a reference table to define standard customer
tier codes that reference the LA customer data. You can then share the reference table with a developer.
Objectives
In this lesson, you complete the following task:
Create a reference table using the reference table editor to define standard customer tier codes that
reference the LA customer data.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 10 minutes to complete the task in this lesson.
2.
3.
Click Next.
4.
For each column you want to include in the reference table, click the Add New Column icon and
configure the column properties for each column.
Add the following column names: CustomerID, CustomerTier, and Status. You can reorder the columns
or delete columns.
5.
6.
Click Next.
7.
8.
In the Folders section, select the Customers folder in the tutorial project.
9.
Click Finish.
The reference table appears in the Design workspace.
10.
From the Actions menu, select Add Row to populate each reference table column with the following
values:
CustomerID = LA1, LA2, LA3, LA4
CustomerTier = 1, 2, 3, 4
Status = Active, Inactive
CHAPTER 10
Objectives
In this lesson, you complete the following tasks:
Create a project to store the objects that you create in the Developer tool.
Prerequisites
Before you start this lesson, verify the following prerequisites:
You have a domain name, host name, and port number to connect to a domain. You can get this
information from a domain administrator.
A domain administrator has configured a Model Repository Service in the Administrator tool.
You have a user name and password to access the Model Repository Service. You can get this
information from a domain administrator.
Timing
Set aside 5 to 10 minutes to complete the tasks in this lesson.
2.
3.
Click Add.
The New Domain dialog box appears.
4.
5.
Click Finish.
6.
Click OK.
2.
3.
Click OK.
4.
Click Next.
5.
6.
Select a namespace.
7.
Click Finish.
The Model repository appears in the Object Explorer view.
2.
3.
4.
Click Finish.
The project appears under the Model Repository Service in the Object Explorer view.
In the Object Explorer view, select the project that you want to add the folder to.
2.
3.
4.
Click Finish.
The Developer tool adds the folder under the project in the Object Explorer view. Expand the project to
see the folder.
2.
3.
4.
5.
6.
Click OK.
CHAPTER 11
Story
HypoStores Corporation stores customer data from the Los Angeles office and Boston office in flat files. You
want to work with this customer data in the Developer tool. To do this, you need to import each flat file as a
physical data object.
Objectives
In this lesson, you import flat files as physical data objects. You also set the source file directory so that the
Data Integration Service can read the source data from the correct directory.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 10 to 15 minutes to complete the tasks in this lesson.
2.
3.
Select Physical Data Objects > Flat File Data Object and click Next.
The New Flat File Data Object dialog box appears.
4.
5.
6.
Click Open.
The wizard names the data object Boston_Customers.
7.
Click Next.
8.
Verify that the code page is MS Windows Latin 1 (ANSI), superset of Latin 1.
9.
10.
Click Next.
11.
12.
13.
Click Finish.
The Boston_Customers physical data object appears under Physical Data Objects in the tutorial
project.
14.
15.
16.
Set the Source File Directory to the following directory on the Data Integration Service machine:
<Informatica Installation Directory>\server\Tutorials
17.
2.
3.
Select Physical Data Objects > Flat File Data Object and click Next.
The New Flat File Data Object dialog box appears.
4.
5.
6.
Click Open.
The wizard names the data object LA_Customers.
7.
Click Next.
8.
Verify that the code page is MS Windows Latin 1 (ANSI), superset of Latin 1.
9.
10.
Click Next.
11.
12.
13.
Click Finish.
The LA_Customers physical data object appears under Physical Data Objects in the tutorial project.
14.
15.
16.
Set the Source File Directory to the following directory on the Data Integration Service machine:
<Informatica Installation Directory>\server\Tutorials
17.
2.
3.
Select Physical Data Objects > Flat File Data Object and click Next.
The New Flat File Data Object dialog box appears.
4.
5.
Click Browse and navigate to All_Customers.csv in the following directory: <Informatica Installation
Directory>\clients\DeveloperClient\Tutorials.
6.
Click Open.
The wizard names the data object All_Customers.
7.
Click Next.
8.
Verify that the code page is MS Windows Latin 1 (ANSI), superset of Latin 1.
9.
10.
Click Next.
11.
12.
13.
Click Finish.
The All_Customers physical data object appears under Physical Data Objects in the tutorial project.
14.
15.
16.
Set the Source File Directory to the following directory on the Data Integration Service machine:
<Informatica Installation Directory>\server\Tutorials
17.
CHAPTER 12
The number of unique and null values in each column, expressed as a number and a percentage.
The patterns of data in each column, and the frequencies with which these values occur.
Statistics about the column values, such as the maximum and minimum lengths of values and the first and
last values in each column.
For join analysis profiles, the degree of overlap between two data columns, displayed as a Venn diagram
and as a percentage value. Use join analysis profiles to identify possible problems with column join
conditions.
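As a rough illustration of the statistics listed above (a Python sketch, not the profiling engine), using one of the tutorial files and a column name from the lessons:

    import csv
    from collections import Counter

    def profile_column(path, column):
        with open(path, newline="") as f:
            values = [row[column].strip() for row in csv.DictReader(f)]
        non_null = [v for v in values if v]
        freq = Counter(non_null)
        return {
            "rows": len(values),
            "null %": 100.0 * (len(values) - len(non_null)) / len(values),
            "unique": len(freq),
            "min length": min(map(len, non_null)),
            "max length": max(map(len, non_null)),
            "top values": freq.most_common(5),
        }

    print(profile_column("All_Customers.csv", "CustomerTier"))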
You can run a column profile at any stage in a project to measure data quality and to verify that changes to
the data meet your project objectives. You can run a column profile on a transformation in a mapping to
indicate the effect that the transformation will have on data.
Story
HypoStores wants to verify that customer data is free from errors, inconsistencies, and duplicate information.
Before HypoStores designs the processes to deliver the data quality objectives, it needs to measure the
quality of its source data files and confirm that the data is ready to process.
Objectives
In this lesson, you complete the following tasks:
Perform a join analysis on the Boston_Customers data source and the LA_Customers data source.
View the results of the join analysis to determine whether you can successfully merge data from the
two offices.
View the column profiling results to observe the values and patterns contained in the data.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Time Required
Select the tutorial folder and click File > New > Profile.
2.
3.
Click Next.
4.
5.
Click Finish.
The Tutorial_Profile profile appears in the Object Explorer.
6.
Drag the Boston_Customers and LA_Customers data sources to the editor on the right.
Tip: Hold down the Shift key to select multiple data objects.
7.
8.
9.
Verify that Boston_Customers and LA_Customers appear as data objects, and click Next.
10.
Scroll down the wizard pane to view the columns in both data sets.
Click Next.
11.
12.
13.
Double-click the first row in the left column and select CustomerID.
14.
Double-click the first row in the right column and select CustomerID.
15.
16.
If the Developer tool prompts you to save the changes, click Yes.
The Developer tool runs the profile.
Note: Do not close the profile. You view the profile results in the next task.
2.
3.
Verify that the Join Rows column shows zero as the number of rows that contain a join.
This value indicates that the two data sources have no CustomerID values in common. You can
successfully merge the two data sources.
4.
To view the CustomerID values for the LA_Customers data object, double-click the circle named
LA_Customers in the Venn diagram.
Tip: Double-click the circles in the Venn diagram to view the data rows. If the circles intersect in the
Venn diagram, double-click the intersection to view data values common to both data sets.
The Data Viewer displays the CustomerID values from the LA_Customers data object.
In the Object Explorer view, browse to the data objects in your tutorial project.
2.
3.
4.
Select Profile.
5.
Click Next.
6.
7.
Click Finish.
The All_Customers profile opens in the editor and the profile runs.
Click Window > Show View > Progress to view the progress of the All_Customers profile.
The Progress view opens.
2.
When the Progress view reports that the All_Customers profile finishes running, click the Results view in
the editor.
3.
4.
5.
6.
In the Details section, click the Show list and select Patterns.
The Details section shows the patterns found in the OrderAmount column. The string 9(5) in the Pattern
column refers to records that contain five-figure order amounts. The string 9(4) refers to records
containing four-figure order amounts.
7.
8.
In the Details section, click the Show list and select Statistics.
The Details section shows statistics for the OrderAmount column including the average value, standard
deviation, maximum and minimum lengths, the five most common values, and the five least common
values.
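For illustration, here is a small Python sketch of how a 9(n) pattern string can be derived from column values (an illustration of the notation only, in which 9(5) stands for a run of five digits; this is not the profiling engine's algorithm):

    import re
    from collections import Counter

    def pattern(value):
        # Collapse each run of digits into 9(n), e.g. "12345" -> "9(5)".
        return re.sub(r"\d+", lambda m: "9({})".format(len(m.group())), value)

    amounts = ["12345", "98760", "4321", "500"]
    print(Counter(pattern(v) for v in amounts))
    # Counter({'9(5)': 2, '9(4)': 1, '9(3)': 1})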
You created the All_Customers profile and ran a column profile on the All_Customers data object. You
viewed the results of this profile to discover values, patterns, and statistics for columns in the All_Customers
data object. Finally, you ran the Data Viewer to view rows containing values and patterns that you selected,
enabling you to verify the quality of the data.
CHAPTER 13
Story
HypoStores wants the format of customer data files from the Los Angeles office to match the format of the
data files from the Boston office. The customer data from the Los Angeles office stores the customer name in
a FullName column, while the customer data from the Boston office stores the customer name in separate
FirstName and LastName columns. HypoStores needs to parse the Los Angeles FullName column data into
first names and last names so that the format of the Los Angeles data will match the format of the Boston
data.
Objectives
In this lesson, you complete the following tasks:
Create a mapping to parse the FullName column into separate FirstName and LastName columns.
Add the LA_Customers data object to the mapping to connect to the source data.
Add the LA_Customers_tgt data object to the mapping to create a target data object.
Add a Parser transformation to the mapping and configure it to use a token set to parse full names into
first names and last names.
Run a profile on the Parser transformation to review the data before you generate the target data source.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 20 minutes to complete the tasks in this lesson.
2.
Configure the read and write options for the data object, including file locations and file names.
3.
2.
3.
4.
5.
Click Open.
6.
7.
Click Next.
8.
Click Next.
9.
10.
In the Preview Options section, select Import column names from first line and click Next.
Click Finish.
The LA_Customers_tgt data object appears in the editor.
2.
3.
4.
5.
In the Value column, double-click the source file name and type LA_Customers_tgt.csv.
6.
7.
8.
9.
10.
11.
Right-click and select Paste to paste the directory location you copied from the Read view.
12.
In the Value column, double-click the Header options entry and choose Output Field Names.
13.
In the Value column, double-click the Output file name entry and type LA_Customers_tgt.csv.
14.
In the Object Explorer view, browse to the data objects in your tutorial project.
2.
3.
4.
Select the FullName column and click the New button to add a column.
A column named FullName1 appears.
5.
Rename the column to Firstname. Click the Precision field and enter 30.
6.
Select the Firstname column and click the New button to add a column.
A column named FirstName1 appears.
7.
Rename the column to Lastname. Click the Precision field and enter 30.
8.
Create a mapping.
2.
3.
4.
Configure the Parser transformation to parse the source column containing the full customer name into
separate target columns containing the first name and last name.
2.
3.
4.
Click Finish.
The mapping opens in the editor.
In the Object Explorer view, browse to the data objects in your tutorial project.
2.
3.
4.
In the Object Explorer view, browse to the data objects in your tutorial project.
5.
6.
7.
Select the CustomerID, CustomerTier, and FullName ports in the LA_Customers data object. Drag the
ports to the CustomerID port in the LA_Customers_tgt data object.
Tip: Hold down the CTRL key to select multiple ports.
The ports of the LA_Customers data object connect to corresponding ports in the LA_Customers_tgt data
object.
2.
3.
4.
5.
Select the FullName port in the LA_Customers data object and drag the port to the Input group of the
Parser transformation.
The FullName port appears in the Parser transformation and is connected to the FullName port in the
data object.
2.
3.
4.
5.
6.
Click the selection arrow in the Inputs column, and choose the FullName port.
7.
8.
Click Next.
9.
Select the Parse using Token Set operation, and click Next.
10.
Select Fixed Token Sets (Single Output Only) and choose the Undefined token set.
11.
12.
In the Operation Outputs dialog box, change the output name to Undefined_Output.
13.
Click Finish.
14.
In the Parser transformation, click the Undefined_Output port and drag it to the FirstName port in the
LA_Customers_tgt data object.
A connection appears between the ports.
15.
In the Parser transformation, click the OverflowField port and drag it to the LastName port in the
LA_Customers_tgt data object.
A connection appears between the ports.
16.
2.
3.
In the editor, click the Results view to display the result of the profiling operation.
4.
Select the Undefined_Output column to display information about the column in the Details section.
The values contained in the Undefined_Output column appear in the Details section, along with
frequency and percentage statistics for each value.
5.
View the data and verify that only first names appear in the Undefined_Output column.
2.
In the Object Explorer view, locate the LA_Customers_tgt data object in your tutorial project and
double-click the data object.
The data object opens in the editor.
2.
3.
4.
Verify that the FirstName and LastName columns display correctly parsed data.
CHAPTER 14
Incorrect values
Use the Standardizer transformation to search for these values in data. You can choose one of the following
search operation types:
Text. Search for custom strings that you enter. Remove these strings or replace them with custom text.
Reference table. Search for strings contained in a reference table that you select. Remove these strings,
or replace them with reference table entries or custom text.
For example, you can configure the Standardizer transformation to standardize address data containing the
custom strings Street and St. using the replacement string ST. The Standardizer transformation replaces
the search terms with the term ST. and writes the result to a new data column.
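A minimal Python sketch of this kind of token replacement (an illustration only; in the product you configure the Standardizer transformation instead, as the tasks in this chapter show):

    replacements = {"STREET": "ST.", "ST": "ST."}

    def standardize(address, table):
        # Replace whole tokens only, comparing case-insensitively.
        tokens = address.replace(",", " , ").split()
        out = [table.get(t.upper().rstrip("."), t) for t in tokens]
        return " ".join(out).replace(" , ", ", ")

    print(standardize("123 Main Street", replacements))      # 123 Main ST.
    print(standardize("45 Oak St., Suite 9", replacements))  # 45 Oak ST., Suite 9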
Story
HypoStores needs to standardize its customer address data so that all addresses use terms consistently. The
address data in the All_Customers data object contains inconsistently formatted entries for common terms
such as Street, Boulevard, Avenue, Drive, and Park.
Objectives
In this lesson, you complete the following tasks:
Create a mapping to standardize the address terms Street, Boulevard, Avenue, Drive, and Park to a
consistent format.
Add the All_Customers data object to the mapping to connect to the source data.
Add the All_Customers_Stdz_tgt data object to the mapping to create a target data object.
Add a Standardizer transformation to the mapping and configure it to standardize the address terms.
Prerequisites
Before you start this lesson, verify the following prerequisite:
Timing
Set aside 15 minutes to complete this lesson.
2.
Configure the read and write options for the data object, including file locations and file names.
2.
3.
4.
5.
Click Open.
6.
7.
Click Next.
8.
Click Next.
9.
In the Preview Options section, select Import column names from first line and click Next.
10.
Click Finish.
The All_Customers_Stdz_tgt data object appears in the editor.
2.
3.
4.
5.
In the Value column, double-click the source file name and type All_Customers_Stdz_tgt.csv.
6.
7.
8.
9.
10.
11.
Right-click and select Paste to paste the directory location you copied from the Read view.
12.
In the Value column, double-click the Header options entry and choose Output Field Names.
13.
In the Value column, double-click the Output file name entry and type All_Customers_Stdz_tgt.csv.
14.
Create a mapping.
2.
3.
4.
Configure the Standardizer transformation to standardize common address terms to consistent formats.
2.
3.
4.
Click Finish.
The mapping opens in the editor.
In the Object Explorer view, browse to the data objects in your tutorial project.
2.
3.
4.
In the Object Explorer view, browse to the data objects in your tutorial project.
5.
6.
7.
Select all ports in the All_Customers data object. Drag the ports to the CustomerID port in the
All_Customers_Stdz_tgt data object.
Tip: Hold down the Shift key to select multiple ports. You might need to scroll down the list of ports to
select all of them.
The ports of the All_Customers data object connect to corresponding ports in the
All_Customers_Stdz_tgt data object.
2.
3.
4.
To rename the Standardizer transformation, double-click the title bar of the transformation and type
AddressStandardizer.
5.
Select the Address1 port in the All_Customers data object, and drag the port to the Input group of the
Standardizer transformation.
A port named Address1 appears in the input group. The port connects to the Address1 port in the
All_Customers data object.
Note: You add an output port to the transformation when you configure a standardization strategy.
2.
3.
4.
5.
6.
Click the selection arrow in the Inputs column, and choose the Address1 input port.
The Outputs field shows Address1 as the output port.
7.
Select the character space and comma delimiters [\s] and [,]. Optionally, select the options to remove
trailing spaces.
8.
Click Next.
9.
10.
11.
Edit the Custom Strings and Replace With fields so that they contain the first pair of strings from the
following table:
Custom Strings    Replace With
STREET            ST.
BOULEVARD         BLVD.
AVENUE            AVE.
DRIVE             DR.
PARK              PK.
12.
Repeat steps 9 through 12 to define standardization operations for all strings in the table.
13.
Drag the Address1 output port to the Address1 port in the All_Customers_Stdz_tgt data object.
14.
2.
In the Object Explorer view, locate the All_Customers_Stdz_tgt data object in your tutorial project and
double-click the data object.
The data object opens in the editor.
2.
3.
4.
Verify that the Address1 column displays correctly standardized data. For example, all instances of the
string STREET should be replaced with the string ST.
CHAPTER 15
Story
HypoStores needs correct and complete address data to ensure that its direct mail campaigns and other
consumer mail items reach its customers. Correct and complete address data also reduces the cost of
mailing operations for the organization. In addition, HypoStores needs its customer data to include addresses
in a printable format that is flexible enough to include addresses of different lengths.
To meet these business requirements, the HypoStores ICC team creates an address validation mapping in
the Developer tool.
Objectives
In this lesson, you complete the following tasks:
Create a target data object that will contain the validated address fields and match codes.
Create a mapping with a source data object, a target data object, and an Address Validator
transformation.
Configure the Address Validator transformation to validate the address data of your customers.
Run the mapping to validate the address data, and review the match code outputs to verify the validity of
the address data.
Prerequisites
Before you start this lesson, verify the following prerequisites:
United States address reference data is installed in the domain and registered with the Administrator tool.
Contact your Informatica administrator to verify that United States address data is installed on your
system. The reference data installs through the Data Quality Content Installer.
Timing
Set aside 25 minutes to complete this lesson.
2.
Configure the read and write options for the data object, including the file locations and file names.
3.
Add ports to the data object to receive the match code values generated by the Address Validator
transformation.
2.
3. Verify that Create from an existing flat file is selected. Click Browse next to this selection, find the All_Customers.csv file, and click Open.
5. Click Next.
6. Click Next.
7. In the Preview Options section, select Import column names from first line and click Next. The sketch after this procedure shows the effect of this option.
8. Click Finish.
The All_Customers_av_tgt data object appears in the editor.
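Selecting Import column names from first line tells the wizard to take the port names from the header row of the file. Conceptually it behaves like the following sketch, an illustration only, which assumes All_Customers.csv is readable from the current directory; the wizard performs this step for you.

import csv

# The first line of the file supplies the column names, which become the
# ports of the data object.
with open("All_Customers.csv", newline="") as f:
    reader = csv.DictReader(f)   # header row becomes the field names
    print(reader.fieldnames)     # the names the wizard imports as ports
    print(next(reader, None))    # first data record, keyed by column name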
5. In the Value column, double-click the source file name and type All_Customers_av_tgt.csv.
6. In the Value column, double-click to highlight the source file directory path.
11. Right-click this entry and select Paste to add the path you copied from the Read view.
12. In the Value column, double-click the Header options entry and choose Output Field Names.
13. In the Value column, double-click the Output file name entry and type All_Customers_av_tgt.csv.
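The Write settings above determine the file that the mapping produces: the Output Field Names option writes the port names as the first row, and the output file name is All_Customers_av_tgt.csv. In plain-Python terms the result resembles this sketch; it is an illustration only, and the abbreviated fieldnames list and sample row are hypothetical.

import csv

# Sketch of the effect of the Write settings; not Informatica code.
fieldnames = ["Address1", "City", "ZIP", "State", "Country"]  # abbreviated
sample = {"Address1": "100 MAIN ST.", "City": "SPRINGFIELD",
          "ZIP": "62704", "State": "IL", "Country": "USA"}    # hypothetical row

with open("All_Customers_av_tgt.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()   # the "Output Field Names" header row
    writer.writerow(sample)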
In the Object Explorer view, browse to the data objects in your tutorial project.
4. Select the final port in the port list. This port is named MiscDate.
5. Click New.
8. Click New.
A port named MailabilityScore1 appears.
4. Click Finish.
The mapping opens in the editor.
In the Object Explorer view, browse to the data objects in your tutorial project.
4. In the Object Explorer view, browse to the data objects in your tutorial project.
5. Select the All_Customers_av_tgt data object and drag it onto the editor.
7. Click Save.
4. Expand the Hybrid input port group and select the following ports:
Address Line 1
Locality Complete 1
Postcode 1
Province 1
Country Name
Note: Hold the Ctrl key to select multiple ports in a single operation.
5. On the toolbar above the port names list, click Add port to transformation.
This toolbar is visible when you select Templates.
The selected ports appear in the transformation in the mapping editor.
6. Connect the source ports to the Address Validator transformation ports as follows:

Source Port    Address Validator Transformation Port
Address1       Address Line 1
City           Locality Complete 1
ZIP            Postcode 1
State          Province 1
Country        Country Name
4. Expand the Address Elements output port group and select the following port:
Street Complete 1
5. Expand the Last Line Elements output port group and select the following ports:
Locality Complete 1
Postcode 1
Province Abbreviation 1
Note: Hold the Ctrl key to select multiple ports in a single operation.
8. Expand the Country output port group and select the following port:

Port Name         Description
Country Name 1    Country name.

Expand the Status Info output port group and select the following ports:

Port Name            Description
Mailability Score
Match Code           Code that represents the degree of similarity between the input address and the reference data.
On the toolbar above the port names list, click Add port to transformation.
This toolbar is visible when you select Templates.
9. Connect the Address Validator transformation ports to the All_Customers_av_tgt ports as follows:
Address Validator Transformation Port    Target Port
Street Complete 1                        Address1
Locality Complete 1                      City
Postcode 1                               ZIP
Province Abbreviation 1                  State
Country Name 1                           Country
Mailability Score                        MailabilityScore
Match Code                               MatchCode
Connect the unused ports on the data source to the ports with the same names on the data target.
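Taken together, steps 6 through 9 define a simple routing: five source columns feed the validator inputs, seven validator outputs feed the target, and everything else passes straight through. The sketch below restates that routing in plain Python as an illustration only; the route_row function and the validate argument are hypothetical stand-ins, not Informatica code.

# Illustration of the port routing above.
SOURCE_TO_VALIDATOR = {
    "Address1": "Address Line 1",
    "City": "Locality Complete 1",
    "ZIP": "Postcode 1",
    "State": "Province 1",
    "Country": "Country Name",
}

VALIDATOR_TO_TARGET = {
    "Street Complete 1": "Address1",
    "Locality Complete 1": "City",
    "Postcode 1": "ZIP",
    "Province Abbreviation 1": "State",
    "Country Name 1": "Country",
    "Mailability Score": "MailabilityScore",
    "Match Code": "MatchCode",
}

def route_row(source_row, validate):
    """Route one record; validate stands in for the transformation itself."""
    inputs = {SOURCE_TO_VALIDATOR[name]: value
              for name, value in source_row.items() if name in SOURCE_TO_VALIDATOR}
    outputs = validate(inputs)
    target = {VALIDATOR_TO_TARGET[name]: value for name, value in outputs.items()}
    # Unused source ports connect directly to same-named target ports.
    for name, value in source_row.items():
        if name not in SOURCE_TO_VALIDATOR:
            target[name] = value
    return target

# Example with an identity-style stub in place of real validation:
stub = lambda ports: {
    "Street Complete 1": ports["Address Line 1"],
    "Locality Complete 1": ports["Locality Complete 1"],
    "Postcode 1": ports["Postcode 1"],
    "Province Abbreviation 1": ports["Province 1"],
    "Country Name 1": ports["Country Name"],
    "Mailability Score": "", "Match Code": "",
}
print(route_row({"CustomerID": "1", "Address1": "100 MAIN ST",
                 "City": "SPRINGFIELD", "ZIP": "62704",
                 "State": "IL", "Country": "USA"}, stub))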
In the Object Explorer view, find the All_Customers_av_tgt data object in your tutorial project and double-click the data object.
The data object opens in the editor.
4. Scroll across the mapping results so that the Mailability Score and Match Code columns are visible.
Code    Description
A1      Address code lookup found a partial address or a complete address for the input code.
A0
C4
C3
C2      Corrected, but the delivery status is unclear due to absent reference data.
C1
I4
I3      Data cannot be corrected completely, and there are multiple matches with addresses in the reference data.
I2
I1
N7      Validation error. Validation did not take place because single-line validation is not unlocked.
N6      Validation error. Validation did not take place because single-line validation is not supported for the destination country.
N5      Validation error. Validation did not take place because the reference database is out of date.
N4      Validation error. Validation did not take place because the reference data is corrupt or badly formatted.
N3      Validation error. Validation did not take place because the country data cannot be unlocked.
N2      Validation error. Validation did not take place because the required reference database is not available.
N1      Validation error. Validation did not take place because the country is not recognized or not supported.
Q3      Suggestion List mode. Address validation can retrieve one or more complete addresses from the address reference data that correspond to the input address.
Q2      Suggestion List mode. Address validation can combine the input address elements and elements from the address reference data to create a complete address.
Q1
Q0
RB
RA
R9
R8
R7      Country recognized from the country name, but the transformation identified errors in the country data.
R6
R5
R4
R3
R2
R1
R0
S4
S3
S1      Parse mode. There was a parsing error due to an input format mismatch.
V4      Verified. The input data is correct. Address validation checked all postally relevant elements, and inputs matched perfectly.
V3      Verified. The input data is correct, but some or all elements were standardized, or the input contains outdated names or exonyms.
V2      Verified. The input data is correct, but some elements cannot be verified because of incomplete reference data.
V1      Verified. The input data is correct, but user standardization has negatively impacted deliverability. For example, the post code length is too short.
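Because the first character of each code identifies its status family, you can produce a coarse summary of a run by grouping on that character. The following sketch is a hypothetical convenience for reviewing the target file, assuming All_Customers_av_tgt.csv is available locally with a MatchCode column; the family labels are taken from the table above, and none of this is part of the product.

import csv
from collections import Counter

# Status families summarized from the table above. Hypothetical helper.
FAMILY = {
    "V": "verified",
    "C": "corrected",
    "I": "incomplete correction",
    "Q": "suggestion list",
    "R": "country recognition",
    "S": "parse",
    "N": "validation error",
    "A": "address code lookup",
}

with open("All_Customers_av_tgt.csv", newline="") as f:
    counts = Counter(row["MatchCode"] for row in csv.DictReader(f))

for code, count in counts.most_common():
    print(code, FAMILY.get(code[:1], "unknown"), count)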
APPENDIX A
What is the difference between a mapping in PowerCenter and a mapping in the Developer tool?
A mapping in the Developer tool can be one of the following types:
Mapping that moves data between sources and targets. This type of mapping differs from a PowerCenter mapping only in that it cannot use shortcuts and does not use a source qualifier.
Logical data object mapping. A mapping in a logical data object model. A logical data object mapping
can contain a logical data object as the mapping input and a data object as the mapping output. Or, it
can contain one or more physical data objects as the mapping input and logical data object as the
mapping output.
Virtual table mapping. A mapping in an SQL data service. It contains a data object as the mapping
input and a virtual table as the mapping output.
Virtual stored procedure mapping. Defines a set of business logic in an SQL data service. It contains
an Input Parameter transformation or physical data object as the mapping input and an Output
Parameter transformation or physical data object as the mapping output.
What is the difference between a mapplet in PowerCenter and a mapplet in the Developer tool?
A mapplet in PowerCenter and in the Developer tool is a reusable object that contains a set of
transformations. You can reuse the transformation logic in multiple mappings.
A PowerCenter mapplet can contain source definitions or Input transformations as the mapplet input. It
must contain Output transformations as the mapplet output.
A Developer tool mapplet can contain data objects or Input transformations as the mapplet input. It can
contain data objects or Output transformations as the mapplet output. A mapplet in the Developer tool
also includes the following features:
Index
I
importing physical data object
overview 51
P
profiling data
overview 55
R
reference tables
overview 44
S
setting up Analyst tool
overview 20
setting up Developer tool
overview 47