Tutorial
© 2020 SAP SE or an SAP affiliate company. All rights reserved.
1  Documentation changes  7
4.9  Ensuring that the Job Server is running  46
4.10  Executing the job  47
4.11  Summary and what to do next  49
8.1  Adding the SalesFact job, work flow, and data flow  82
8.2  Creating the SalesFact data flow  82
8.3  Defining the details of the Query transform  83
8.4  Using a lookup_ext function for order status  85
8.5  Validating the SalesFact data flow  89
8.6  Executing the SalesFact job  89
8.7  Viewing Impact and Lineage Analysis for the SALES_FACT target table  91
8.8  Summary and what to do next  93
     Adding the conditional  125
     Specifying the If-Then work flows  126
11.5  Creating the script that updates the status  126
11.6  Verify the job setup  128
11.7  Executing the job  129
11.8  Data Services automated recovery properties  129
11.9  Summary and what to do next  130
13.5  Repopulating the material dimension table  173
      Adding the material dimension job, work flow, and data flow  173
      Adding ABAP data flow to Material Dimension job  174
      Defining the DF_SAP_MtrlDim ABAP data flow  175
      Executing the JOB_SAP_MtrlDim job  178
13.6  Repopulating the Sales Fact table  179
      Adding the Sales Fact job, work flow, and data flow  180
      Adding ABAP data flow to Sales Fact job  180
      Defining the DF_ABAP_SalesFact ABAP data flow  181
      Executing the JOB_SAP_SalesFact job  187
13.7  Summary  188
1 Documentation changes
The following list contains changes to the documentation and the related SAP Data Services version in which the changes were made. The list begins with the most recent changes.

Change: Added the following topics:
● Welcome
● SAP information resources
Notes: Standard topics in all Data Services documents.
Version: 4.2 SP 9 Patch 1

Change: Removed the following topics:
● System configurations
● Windows and UNIX implementation
Notes: Topics were under Product overview. Removed because the concepts were for advanced users.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Removed: Environment requirements
Notes: Topic was under Preparation for this tutorial. Removed because the concept was for advanced users.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Removed the following topics:
● BI platform and the Central Management Server (CMS)
● Opening the Central Management Console
● Installing SAP Data Services
● Verifying the Windows service
● Creating a new Data Services user account
Notes: Topics were under Setting up for the tutorial. Removed because an administrator should perform the tasks.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Renamed the topic Setting up for the tutorial to Tasks required to prepare for the tutorial.
Notes: Topic is located under Preparation for this tutorial.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Added the following topics to the Tutorial:
● SAP Information resources
● Accessing documentation from the Web
● Documentation set for SAP Data Services
Notes: These topics make it easier for readers to find additional documents that we reference in other topics.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Removed all Terminology topics.
Notes: These topics were not consistent, and many terms were used in more than one section.
Version: 4.2 SP 8 Patch 2
2 Introduction to the tutorial
This tutorial introduces you to the basic use of SAP Data Services Designer by explaining key concepts and
providing a series of related exercises and sample data.
Data Services Designer is a graphical user interface (GUI) development environment in which you extract,
transform, and load batch data from flat-file and relational database sources for use in a data warehouse. You
can also use Designer for real-time data extraction and integration.
The tutorial introduces core SAP Data Services Designer functionality. We wrote the tutorial assuming that you have experience in areas such as database management, SQL, and Microsoft Windows.
After you complete this tutorial, you will be able to extract, transform, and load data from various source and
target types, and understand the concepts and features of SAP Data Services Designer.
You will know about the various Data Services objects such as datastores and transforms, and you will be able
to define a file format, import data, and analyze data results.
You will learn how to use Data Services Designer features and functions to accomplish these goals.
2.3 Product overview
Data Services extracts, transforms, and loads (ETL) data from heterogeneous sources into a target database or
data warehouse. You specify data mappings and transformations by using Data Services Designer.
Data Services combines industry-leading data quality and integration into one platform. It transforms your
data in many ways. For example, it standardizes input data, adds additional address data, cleanses data, and
removes duplicate entries.
Data Services provides additional support for real-time data movement and access. It performs predefined operations in real time, as it receives information. The Data Services real-time components also provide services to Web applications and other client applications.
For a complete list of Data Services resources, see the Designer Guide.
Data Services consists of the following components:
Job Server: Application that launches the Data Services processing engine and serves as an interface to the engine and other components in the Data Services suite.
Engine: Executes individual jobs that you define in the Designer to effectively accomplish the defined tasks.
Repository: Database that stores Designer predefined system objects and user-defined objects, including source and target metadata and transformation rules. Create a local repository, and then a central repository to share objects with other users and for version control.
Access Server: Passes messages between Web applications and the Data Services Job Server and engines. Provides a reliable and scalable interface for request-response processing.
Administrator: Web administrator that provides browser-based administration of Data Services resources.
The following diagram illustrates Data Services product components and relationships.
Use the many tools in SAP Data Services Designer to create objects, projects, data flows, and workflows to
process data.
The Designer interface contains key work areas that help you set up and run jobs. The following illustration
shows the key areas of the Designer user interface.
SAP Data Services objects are entities that you create, add, define, modify, or work with in the software.
Each Data Services object has similar characteristics for creating and configuring objects.
Properties: Text that describes the object. For example, the name, description, and creation date describe aspects of an object.
Attributes: Properties that organize objects and make them easier for you to find. For example, organize objects by attributes such as object type.
The Designer contains a Local Object Library that is divided by tabs. Each tab is labeled with an object type.
Objects in a tab are listed in groups. For example, the Project tab groups projects by project name and then by the job names that exist in each project. The tabs include the following object types:
● Projects
● Jobs
● Workflows
● Data flows
● Transforms
● Datastores
● Formats
● Functions
2.3.3.1 Object hierarchy
Object relationships are hierarchical.
The highest object in the hierarchy is the project. Subordinate objects appear as nodes under a project. You add subordinate objects to the project in a specific order: a project contains jobs, jobs contain workflows, and workflows contain data flows.
The following diagram shows the hierarchical relationships for the key object types within Data Services.
A project is the highest-level object in the Designer hierarchy. Projects provide a way to organize the subordinate objects: jobs, workflows, and data flows.
A project is open when you can view it in the project area. If you open a different project from the Project tab in
the object library, the project area closes the current project and shows the project that you just opened.
A work flow specifies the order in which SAP Data Services processes subordinate data flows.
Arrange the subordinate data flows under the work flow so that the output from one data flow is ready for input
to the intended data flow.
A work flow is a reusable object that executes only within a job. Use work flows to organize the order of steps in a job.
The Data Services objects that you can use to create work flows appear as icons on the tool palette to the right of the workspace. If an object isn't applicable to what you have open in the workspace, the software disables the icon. Each object has a programming analogy that describes the role it plays in a work flow; a work flow itself, for example, is analogous to a procedure.
Data flows process data in the order in which they are arranged in a work flow.
A data flow defines the basic task that Data Services accomplishes. The basic task is moving data from one or
more sources to one or more target tables or files.
You define data flows by identifying the sources from which to extract data, the transformations that the data
should undergo, and the targets.
Use data flows to extract data from one or more sources, transform it, and load it into one or more targets. A data flow is a reusable object; it is always called from a work flow or a job.
A consistent naming convention for Data Services objects helps you easily identify objects listed in an object hierarchy. For example, a datastore name carries the suffix DS, as in ODS_DS.
To delete an object, first decide whether to delete the object from the project or delete the object from the
repository.
When you delete an object from a project in the project area, the software removes the object from the project.
The object is still available in the object library and the repository.
When you delete the object from the object library, the software deletes all occurrences of the object from the
repository. If the object is called in separate data flows, the software deletes the object from each data flow.
The deletion may adversely affect all related objects.
To protect you from deleting objects unintentionally, the software issues a notice before it deletes the object
from the repository. The notice states that the object is used in multiple locations, and it provides the following
options:
● Yes: Continues with the delete of the object from the repository.
● No: Discontinues the delete process.
● View Where Used: Displays a list of the related objects in which the object will be deleted.
The preparation may include some steps that your administrator has already completed. You may need to
contact your administrator for important connection information and access information related to those
tasks.
We have a complete documentation set for SAP Data Services available on our User Assistance Customer
Portal. If you are unclear about a process in the tutorial, or if you don't understand a concept, refer to the online
documentation at http://help.sap.com/bods.
Note
If your administrator has already completed these steps, you may be able to skip the tutorial set up section.
You must have sufficient user permission to perform the exercises in the tutorial. For information about
permissions, see the Administrator Guide.
You or an administrator sets up your system for this tutorial. Instructions for administrator-only tasks are not included in the tutorial. The following list shows each task and who performs it.
● Install the Central Management Server (CMS) by installing either the SAP BusinessObjects Business Intelligence platform (BI platform) or the Information platform services platform (IPS platform). Performed by an administrator; more information is in the Installation Guide.
● Install SAP Data Services. Performed by an administrator; steps are in the Installation Guide.
● Create user accounts for tutorial participants. Performed by an administrator; steps are in the Administrator Guide.
● Create the tutorial repository, source, and target databases. Performed by you, or by a user who has permission to perform these tasks in your RDBMS; steps are in the tutorial.
● Establish the tutorial repository as your local repository by using the Repository Manager, the Server Manager, and the Central Management Console (CMC). Performed by an administrator, or by you if you have sufficient permission; steps are in the tutorial.
● Run the tutorial scripts to create source and target tables. Performed by an administrator or you; steps are in the tutorial.
1. Creating repository, source, and target databases on an existing RDBMS [page 20]
Create the three databases using your preferred RDBMS.
2. Creating a local repository [page 21]
Use the repository database that you created earlier in your RDBMS to create a local repository.
3. Defining a job server and associating your repository [page 21]
Use the Data Services Server Manager to configure a new job server and associate the job server with
the local repository.
4. Configuring the local repository in the CMC [page 22]
To continue preparing the SAP Data Services local repository, you enter connection information in the Central Management Console (CMC).
5. Running the provided SQL scripts [page 23]
Run the tutorial SQL scripts to create the sample source and target tables.
2.4.1.1 Creating repository, source, and target databases on an existing RDBMS
An administrator, or a user with sufficient permissions to your RDBMS, must perform these steps.
4. Grant access privileges for the user account. For example, grant the connect and resource roles for Oracle; a sketch of the statements follows this list.
5. Use the following worksheet to note the connection names, database versions, user names, and passwords for the three databases that you create. We refer to this information in several of the exercises in the tutorial.
   ○ Database version
   ○ User name
   ○ Password
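The exact statements depend on your RDBMS. A minimal Oracle sketch of the grant step, assuming the suggested names and passwords (repo/repo, ods/ods, and target/target) that the tutorial uses elsewhere; adapt the statements to your own database and naming:

    -- Oracle example only; adjust names and passwords to your worksheet values.
    CREATE USER repo IDENTIFIED BY repo;
    GRANT CONNECT, RESOURCE TO repo;
    CREATE USER ods IDENTIFIED BY ods;
    GRANT CONNECT, RESOURCE TO ods;
    CREATE USER target IDENTIFIED BY target;
    GRANT CONNECT, RESOURCE TO target;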
Task overview: Tasks required to prepare for the tutorial [page 18]
2.4.1.2 Creating a local repository
Use the repository database that you created earlier in your RDBMS to create a local repository.
1. Select Start > Programs > SAP Data Services 4.2 > Data Services Repository Manager.
The remaining connection options are based on the database type you choose.
4. Enter the connection information for the RDBMS repository database that you created.
Use the information in the worksheet that you completed in Creating repository, source, and target
databases on an existing RDBMS [page 20].
5. Type repo for both User and Password.
6. Click Create.
The repository database that you created earlier is now your local repository.
Next, define a Job Server and associate the repository with it.
Task overview: Tasks required to prepare for the tutorial [page 18]
Previous task: Creating repository, source, and target databases on an existing RDBMS [page 20]
Next task: Defining a job server and associating your repository [page 21]
Use the Data Services Server Manager to configure a new job server and associate the job server with the local
repository.
1. Select Start > Programs > SAP Data Services 4.2 > Data Services Server Manager.
Enter a port number that is not used by another process on the computer. If you are unsure of which port
number to use, increment the default port number.
6. Click Add in the Associated Repositories group on the left.
The remaining connection options that appear are applicable to the database type you choose.
8. Enter the remaining connection information based on the information that you noted in the worksheet in
Creating repository, source, and target databases on an existing RDBMS [page 20].
Example
Select Default repository only for the local repository; there can be only one default repository. If you are following these steps to set up a repository other than the local repository, do not select the Default repository option.
10. Click Apply to save your entries.
Task overview: Tasks required to prepare for the tutorial [page 18]
Next task: Configuring the local repository in the CMC [page 22]
Before you can grant repository access to your user, configure the repository in the Central Management
Console (CMC).
1. Log in to the Central Management Console using your tutorial user name and password, tutorial_user
and tutorial_pass.
2. Click Data Services from the Organize list at left.
The Data Services management view opens.
5. Enter the connection information for the database you created for the local repository.
6. Click Test Connection.
A dialog appears indicating whether the connection to the repository database was successful. Click OK. If the connection failed, verify your database connection information and retest the connection.
7. Click Save.
The Add Data Services Repository view closes.
8. In the Data Services view, click the Repositories folder node at left.
The existing configured repositories appear. Verify that the new repository is included in the list.
9. Click Log Off to exit the Central Management Console.
Task overview: Tasks required to prepare for the tutorial [page 18]
Previous task: Defining a job server and associating your repository [page 21]
Run the tutorial SQL scripts to create the sample source and target tables.
Data Services installation includes a batch file (CreateTables_<databasetype>.bat) for several of the
supported database types. The batch files run SQL scripts that create and populate tables on your source
database and create the target schema on the target database. If you used the suggested file names, user
names, and passwords for the “ods” and “target” databases, you only add the connection name to the
appropriate areas in the script.
1. Locate the CreateTables batch file for your specific RDBMS in the Data Services installation directory.
The default location is <LINK_DIR>\Tutorial Files\Scripts.
2. Right-click and select Edit.
Tip
Use a copy of the original script file. Rename the original script file indicating that it is the original.
3. If you are not using the suggested user names and passwords (ods/ods and target/target), update the script file with the user names and passwords that you used for the ods and target databases.
The Microsoft SQL Server batch file, for example, is CreateTables_MSSQL2005.bat; it runs the SQL scripts and writes the results to output files.
Note
The output files provide logs that contain success or error notifications that you can examine.
4. Save and close the .bat file.
5. Double-click the batch file name to run the SQL scripts.
6. Use the applicable RDBMS query tool to check your source ODS database.
The following tables should exist on your source database after you run the script, and they should include a few rows of sample data. A sample verification query follows the list.
○ Customer: ods_customer
○ Material: ods_material
○ Employee: ods_employee
○ Region: ods_region
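For a quick check from a SQL prompt, a minimal query against one of the tables listed above might look like the following; any of the ods_ tables works the same way:

    -- Verify that the source table exists and contains the sample rows.
    SELECT COUNT(*) FROM ods_customer;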
Task overview: Tasks required to prepare for the tutorial [page 18]
Previous task: Configuring the local repository in the CMC [page 22]
2.4.2 Tutorial structure
We use a simplified data model for the exercises in this tutorial to introduce you to SAP Data Services features.
The tutorial data model is a sales data warehouse with a star schema that contains one fact table and some
dimension tables.
In the tutorial, you perform tasks on the sales data warehouse. We divided the tasks into the following segments:
Tutorial segments
● Populate the Sales Organization dimension from a flat file: Introduces basic data flows, query transforms, and source and target tables. The exercise populates the Sales Organization dimension table from flat-file data.
● Populate the Time dimension table using a transform: Introduces Data Services functions. This exercise creates a data flow for populating the Time dimension table.
● Populate the Customer dimension from a relational table: Introduces data extraction from relational tables. This exercise defines a job that populates the Customer dimension.
● Populate the Material dimension from an XML file: Introduces data extraction from nested sources. This exercise defines a job that populates the Material dimension.
● Populate the Sales Fact table from multiple relational tables: Continues data extraction from relational tables and introduces joins and the lookup function. The exercise populates the Sales Fact table.
Complete each segment before going on to the next segment. Each segment creates the jobs and objects that you need in the next segment, and we reinforce each skill in subsequent segments. As you progress, we eliminate detailed steps for some of the basic skills that we introduced earlier.
Parent topic: Preparation for this tutorial [page 18]
If you haven't saved your changes, the software prompts you to save your work before you exit.
2. Click Yes to save your work.
If you exited the tutorial, you can resume the tutorial at any point.
1. Log in to the Designer and select the repository in which you saved your work.
The Designer window opens.
2. From the Project menu, click Open.
3. Click the name of the tutorial project you want to work with, then click Open.
The Designer window opens with the project and the objects within it displayed in the project area.
Tutorial
26 PUBLIC Introduction to the tutorial
Task overview: Preparation for this tutorial [page 18]
3 Source and target metadata
The software uses metadata for the source and target objects to connect to the data location and to access the
data.
Source and target metadata is especially important when you access data that is in a different environment from your Data Services environment.
For the tutorial, you set up logical connections between Data Services, a flat-file source, and a target data
warehouse.
You need to log in to the Designer to perform the exercises in the tutorial. After you log in to the Designer a few times, you won't need to refer to these steps because you will remember your login credentials.
Obtain the required repository user credentials from your administrator. Before you begin, review the options in
step 2 to ensure you have the correct information.
● System-host[:port]: The name of the Central Management Server (CMS) system. You may also need to specify the port when applicable.
● User name: The name that your administrator used to define you as a user in the Central Management Console (CMC).
● Password: The password that your administrator used to define you as a user in the CMC.
Data Services Designer opens. See an example of the Designer interface in The Designer user interface
[page 12].
Next you learn about how to use a datastore to define the connections to the source and target databases.
After you create a datastore, import metadata from the database or application for which you created the
datastore. Use the objects from the import for sources or targets in jobs. Keep in mind that you are importing
only metadata and not the data itself. On a basic level, use imported objects in SAP Data Services as follows:
● As a source object, Data Services accesses the data through the connection information in the datastore
and loads the data into the data flow.
● As a target object, Data Services outputs processed data from the data flow into the target object and, if
configured to do so, uploads the data to the database or application using the datastore connection
information.
In addition to other elements such as functions and connection information, the metadata in a datastore
consists of the following table elements:
● Table name
● Column names
● Column data types
● Primary key columns
● Table attributes
Data Services datastores can connect to any of the following databases or applications:
● Databases
● Mainframe file systems
● Applications that have prepackaged or user-written adapters
● J.D. Edwards One World, J.D. Edwards World, Oracle applications, PeopleSoft, SAP applications, SAP Data Quality Management, microservices for location data, Siebel applications, and Google BigQuery.
● Remote servers using FTP, SFTP, and SCP
● SAP systems: SAP applications, SAP NetWeaver Business Warehouse (BW) Source, and BW Target
For complete information about datastores, see the Designer Guide. See the various supplements for
information about specific databases and applications. For example, for applications with adapters, see the
Supplement for Adapters.
Create a database datastore to use as a connection to the ODS database that you created when you set up for
the tutorial.
Use the information from the worksheet that you completed in Creating repository, source, and target
databases on an existing RDBMS [page 20] for completing the options in the datastore that you create in the
following steps.
1. Open the Datastores tab in the Local Object Library in Designer and right-click in the blank area.
2. Select New from the popup menu.
The remaining options change based on the database type you choose.
6. Complete the remaining options that appear after you choose the database type.
The exact set of options depends on the database type that you selected. Typical options include Use TNS name, Use data source name (DSN), Database Subtype, Database Version, SID or Service Name, Database name, Database server name, and Database Name. If the options for your database type are not described here, see the Reference Guide for the list of options for each supported database type.
7. Click OK.
Data Services saves a datastore for your source in the repository.
Related Information
Object naming conventions [page 17]
Create a database datastore to use as a connection to the database that you named “target” when you set up
for the tutorial.
Define a datastore for the “target” database using the same procedure as for the source (ODS) database.
Name the datastore Target_DS.
After you create a database datastore, you can import table information from the database in the following ways:
● By browsing
● By file name
● By searching
For the tutorial, we take you through the steps to browse for metadata.
Related Information
Defining a file format [page 34]
Summary and what to do next [page 36]
Access the ODS datastore external metadata in Designer to import all of the table metadata.
1. In the Datastores tab, right-click the ODS_DS datastore and click Open.
The names of all the tables in the database defined by the datastore named ODS_DS display in the
workspace. Notice that the External Metadata option at the top of the workspace is automatically selected.
2. Optional. Resize the Metadata column by double-clicking with the resize cursor on the column separator.
3. Select all of the tables to highlight them:
○ ods.ods_customer
○ ods.ods_delivery
○ ods.ods_employee
○ ods.ods_material
○ ods.ods_region
○ ods.ods_salesitem
○ ods.ods_salesorder
Data Services imports the metadata for each table into the local repository.
Note
For Microsoft SQL Server databases, the owner prefix might be “dbo” instead of “ods”. For example,
dbo.ods_customer instead of ods.ods_customer.
4. In the Object Library Datastores tab, expand the Tables node under ODS_DS to verify that the tables have
been imported into the repository.
Access the Target datastore external metadata in Designer to import all of the table metadata.
1. In the Datastores tab, right-click the Target_DS datastore and click Open.
The names of all the tables in the database defined by the datastore named Target_DS display in a window
in the workspace. Notice that the External Metadata option at the top of the workspace is automatically
selected.
2. Optional. Resize the Metadata column by double-clicking with the resize cursor on the column separator.
3. Select all of the tables to highlight them:
○ target.CDC_time
○ target.cust_dim
○ target.employee_dim
○ target.mtrl_dim
○ target.sales_fact
○ target.salesorg_dim
○ target.status_table
○ target.time_dim
Data Services imports the metadata for each table into the local repository.
Note
For Microsoft SQL Server databases, the owner prefix might be “dbo” instead of “target”. For example,
dbo.cust_dim instead of target.cust_dim.
4. In the Object Library Datastores tab, expand the Tables node under Target_DS to verify the tables have
been imported into the repository.
Previous task: Importing metadata for ODS source tables [page 33]
A file format is a set of properties that describes the structure of a flat file.
Use the Data Services file format editor to create a flat-file format for sales_org.txt.
1. Open the Formats tab in the Local Object Library and right-click in a blank area in the tab.
7. In the Default Format group select ddmmyyyy from the Date parameter. If ddmmyyyy is not in the
dropdown list, type it in the Default Format space.
The date format matches the date data under the Field3 column in the data pane in the lower right.
8. In the Input/Output group, set Skip row header to Yes.
9. Select Yes at the prompt asking whether to overwrite the current schema.
The software replaces the original column headers (Fieldx) in the data pane with the contents of the second row. The software also changes the values in the Field Name column in the upper right to the column names.
10. In the schema attributes pane in the upper right, click the cell under the Data Type column in the DateOpen
row. Select date from the dropdown list to change the data type.
The following screen capture shows the completed File Format Editor.
After you complete the tasks in the Source and target metadata section, make sure that you save the project. The information that you created in this section is saved to the local repository and is available the next time you log in to Data Services.
Also, before you exit Designer, close any open workspace tabs by clicking the X icon in the upper right of each
workspace.
What you have learned in the Source and target metadata section:
● How to define a datastore from Data Services to your target data warehouse
● How to import metadata from target tables into the local repository
● How to define a flat file format and a connection to flat-file source data
What is next: In the next section you populate the Sales Org. dimension table with data from the sales_org.txt flat file.
4 Populate the Sales Organization dimension from a flat file
Populate the Sales Org. dimension table with data from a source flat file named Format_SalesOrg.
The following diagram shows the Star Schema with the Dimension file circled.
Each task in this segment builds a Data Services project. Each project contains objects in a specific hierarchical
order.
At the end of each task, save your work. You can either proceed to the next task or exit Data Services. If you exit
Data Services before you save your work, the software asks that you save your work before you exit.
If there are warnings or errors after you validate your job, fix the errors. The job does not execute with existing errors. You do not need to fix the cause of the warnings because warnings do not prevent the job from running.
8. Saving the project [page 46]
You can save the steps you have completed and close Data Services at any time.
9. Ensuring that the Job Server is running [page 46]
Before you execute a job (either as an immediate or scheduled task), ensure that the Job Server is
associated with the repository where the client is running.
10. Executing the job [page 47]
Execute the job to move data from your source to your target.
11. Summary and what to do next [page 49]
In the exercises to populate the Sales Organization dimension table, you learned new skills that you will
use for just about any data flow, and you learned about using functions in an output schema and much
more.
Begin the tutorial by creating a new project and opening it in the Project Area of the SAP Data Services
Designer.
Log in to the Designer and follow these steps to create a new project:
A list of your existing projects appears. If you do not have any projects created, the list is empty.
2. Enter the following name in Project name: Class_Exercises.
3. Click Create.
The project Class_Exercises appears in the Project Area of the Designer, and in the Project tab of the Local
Object Library.
Next, create a job for the new project. If you plan to exit Data Services, save the project.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
If you are logged out of SAP Data Services Designer, log in and open the project named Class_Exercises.
Follow these steps to create a new job for the Class_Exercises project:
The job appears in the Project Area under Class_Exercises, and in the Jobs tab under the Batch Jobs node
in the Local Object Library.
Save the new job and proceed to the next exercise. Next you add a workflow to the job JOB_SalesOrg.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
Workflows contain the order of steps in which the software executes a job.
In Designer, open the Class_Exercises project and expand it to view the JOB_SalesOrg job.
The job opens in the workspace and the tool palette appears to the right of the workspace.
2. Select the workflow button from the tool palette and click the blank workspace area.
A workflow icon appears in the workspace. The workflow also appears in the Project Area hierarchy under
the job JOB_SalesOrg.
Note
Workflows are easiest to read in the workspace from left to right and from top to bottom. Keep this
arrangement in mind as you add objects to the workflow workspace.
An empty view of the workflow appears in a new workspace tab. Use this area to define the elements of the
workflow.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
Make sure the workflow is open in the workspace. If it is not open, click the WF_SalesOrg workflow in the
Project Area.
1. Click the data flow button on the tool palette to the right of the workspace.
2. Click the workspace.
The data flow icon appears in the workspace and the data flow icon also appears in the Project Area.
3. Enter DF_SalesOrg as the name for the dataflow in the text box above the icon in the workspace.
The project, job, workflow, and data flow objects display in hierarchical form in the Project Area. To navigate
to these levels, expand each node in the project area.
4. Click DF_SalesOrg in the Project Area to open a blank definition area in the workspace.
Next, define the data flow DF_SalesOrg in the definition area that appears in the workspace.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
To define the instructions for building the sales organization dimension table, add objects to DF_SalesOrg in the workspace area.
Build the sales organization dimension table by adding a source file, a query object, and a target table to the DF_SalesOrg data flow in the workspace. The next three tasks guide you through the steps necessary to define the content of the data flow.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
4.5.1 Adding objects to the DF_SalesOrg data flow
Add objects to the DF_SalesOrg data flow workspace to start building the data flow.
1. Open the Formats tab in the Local Object Library and expand the Flat Files node.
2. Click and drag Format_SalesOrg to the workspace and release it.
Position the object to the left of the workspace area to make room for other objects.
A prompt appears asking whether to make the object a source or a target.
3. Click Make Source.
4. Click the Query icon on the tool palette, and click in the workspace.
The Query icon appears in the workspace. Drag it to the right of the Format_SalesOrg source object in
the workspace.
5. Open the Datastores tab in the Local Object Library and expand the Target_DS node.
6. Click and drag SALESORG_DIM to the workspace and drop it to the right of the Query icon.
A prompt appears asking whether to make the object a source or a target.
7. Click Make Target.
All the objects necessary to create the sales organization dimension table are now in the workspace. In the next
section, you connect the objects in the order in which you want the data to flow.
To define the sequence for the data flow DF_SalesOrg, connect the objects in a specific order.
1. Click the square on the right edge of the Format_SalesOrg source file and drag your pointer to the
triangle on the left edge of the query transform.
When you drag from the square to the triangle, the software connects the two objects with a line. If you
start with the triangle and go to the square, the software won't connect the two objects.
2. Use the same drag technique to connect the square on the right edge of the query transform to the triangle
on the left edge of the SALESORG_DIM target table.
The order of operation is established after you connect all of the objects. Next you configure the query
transform.
Configure the query transform by mapping columns from the source to the target object.
Before you can configure the query transform, you connect the source to the query, and the query to the target
in the workspace. Learn more about creating a data flow and configuring the query transform in the Reference
guide.
When you connect the objects in the data flow, the column information from the source and target files appears
in the Query transform to help you set up the query.
The query editor opens. The query editor contains the following areas:
○ Schema In pane: Lists the columns in the source file
○ Schema Out pane: Lists the columns in the target file
○ Options pane: Contains tabs for defining the query
Because the query is connected to the target in the data flow, the software automatically copies the target
schema to the Schema Out pane.
2. Select the column icon of the specified input column in the Schema In pane and drag it to the
corresponding column in the Schema Out pane. Map the columns as listed in the following table.
○ SalesOffice → SALESOFFICE
○ DateOpen → DATEOPEN
○ Region → REGION
Note
After you drag the input column to the output column, an arrow icon appears next to the source column to
indicate that the column has been mapped.
The following list names the areas of the query editor marked with red letters in the image.
○ A. Target schema
○ B. Source schema
○ C. Query option tabs
○ D. Column mapping definition
3. Select a field in the Schema Out area and view the column mapping definition in the Mapping tab of the
options pane. For example, in the image above the mapping for the SalesOffice input column to the
SALESOFFICE output column is: Format_SalesOrg.SalesOffice.
4. Click the Type column cell for the SALESOFFICE column in the Schema Out pane and select Decimal.
5. Set Precision to 10 and Scale to 2 in the Type:Decimal popup. Click OK.
6. Click the Back arrow icon from the toolbar at the top of the page to close the query editor and return to
the data flow worksheet.
7. Save your work.
Perform a design-time validation, which checks for construction errors such as syntax errors.
The Validation menu provides design-time validation options. You can check for runtime errors later in the
process.
Note
You can alternatively use the icon bar and click Validate Current or Validate All to perform the same validations.
After the validation completes, Data Services displays the Output dialog with the Warning tab indicating any
warnings.
Note
Two warning messages appear indicating that Data Services will convert the data type for the SALESOFFICE column.
An Error tab contains any validation errors. You must fix the errors before you can proceed.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
After you validate a job, an output window appears listing warnings and errors if applicable.
Data Services displays the Message window in which you can read the expanded notification text.
2. For errors, double-click the error notification to open the editor of the object containing the error.
After you validate the job with no errors, you have completed the description of the data movement for the
sales organization dimension table.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
You can save the steps you have completed and close Data Services at any time.
● To save all changed objects from the current session, click the Save All icon in the toolbar.
● Or, simply exit Designer. Data Services presents a list of all changed objects that haven't been saved. Click
Yes to save all objects in the list, or select specific objects to save. Data Services does not save the objects
that you deselect.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
Before you execute a job (either as an immediate or scheduled task), ensure that the Job Server is associated
with the repository where the client is running.
When the Designer starts, it displays the status of the Job Server for the repository to which you are
connected.
The name of the active Job Server and its port number appear in the status bar when the cursor is over the Job Server icon.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
Execute the job to move data from your source to your target.
Complete all of the steps to populate the Sales Organization Dimension from a flat file. Ensure that all errors
are fixed and that you save the job. If you exited Data Services, log back in to Data Services, and ensure that the
Job Server is running.
The software validates the job and displays the Execution Properties.
Note
If you followed the previous steps to validate your job and fix errors, you should not have errors.
The Execution Properties window includes parameters and options for executing the job and for setting traces and global variables. Do not change the default settings for this exercise.
4. Click OK.
Data Services displays a job log in the workspace. Trace messages appear while the software executes the
job.
5. Change the log view by clicking the applicable log button at the top of the job log.
Log files:
○ Trace log: A list of the job steps in the order they started.
○ Monitor log: A list of each step in the job, the number of rows processed by that step, and the time required to complete the operation.
Note
The error icon is not active when there are no errors.
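If you want to confirm the load outside of Designer, a minimal check in your RDBMS, assuming the salesorg_dim target table that you imported earlier, is:

    -- Confirm that rows were loaded into the Sales Organization dimension table.
    SELECT COUNT(*) FROM salesorg_dim;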
Note
Remember that you should periodically close the tabs in the workspace when you are finished working with
the objects in the tab. To close a tab, click the X icon in the upper right of the workspace.
Task overview: Populate the Sales Organization dimension from a flat file [page 37]
What is next: Populate the Time Dimension table with the following time attributes:
● Year number
● Month number
● Business quarter
You can now exit Data Services or go to the next group of tutorial exercises. If you exit, the software reminds
you to save your work if you did not save it before. The software saves all projects, jobs, workflows, data flows,
and results in the local repository.
Parent topic: Populate the Sales Organization dimension from a flat file [page 37]
5 Populate the Time dimension table
Time dimension tables contain date and time-related attributes such as season, holiday period, fiscal quarter, and other attributes that are not directly ascertainable from traditional SQL-style date and time data types.
The Time dimension table in this example is simple in that it contains only the year number, month number,
and business quarter as Time attributes. It uses a Julian date as a primary key.
8. Summary and what to do next [page 56]
In the exercises to populate the Time Dimension table, you practiced the skills that you learned in the
first group of exercises, plus you learned how to use different objects as source and target in a data
flow.
We use the Class_Exercises project for all of the jobs created in the tutorial.
If you closed Data Services after the last exercise, log in to Data Services and follow these steps to open the
tutorial project.
Next task: Adding a job and data flow to the project [page 52]
Prepare a new job and data flow to populate the Time dimension table.
1. Right-click the project name Class_Exercises in the Project Area and select New Batch Job.
The new job appears under the Class_Exercises project node in the Project Area and an empty
workspace opens.
2. Rename the job JOB_TimeDim.
3. Right-click in the empty JOB_TimeDim workspace and select Add New > Data Flow.
4. Rename the new data flow DF_TimeDim.
Note
A work flow is an optional object; you can add a data flow directly to a job without one. For this job, we do not add a work flow.
Previous task: Opening the Class_Exercises project [page 52]
Next task: Adding the components of the time data flow [page 53]
The components of the DF_TimeDim data flow consist of a transform as a source and a datastore as a target.
2. Open the Transforms tab in the Local Object Library and expand the Data Integrator node.
3. Drag the Date_Generation transform onto the data flow workspace.
The transforms in the Transform tab are predefined. The transform on your workspace is a copy of the
predefined Date_Generation transform.
4. Click the query button on the tool palette and click in the workspace.
A query object appears in the workspace. Arrange the query to the right of the Date Generation transform.
5. Open the Datastore tab in the Local Object Library and expand the Tables node under Target_DS.
6. Drag the TIME_DIM table onto the workspace and drop it to the right of the query.
7. Click Make Target from the popup menu.
All of the objects to create the time dimension table are in the workspace.
Previous task: Adding a job and data flow to the project [page 52]
Connect the objects in the DF_TimeDim data flow in the order in which you want Data Services to process
them.
1. Click the square on the right edge of the Date_Generation transform and drag a line to the triangle on the
left edge of the query.
2. Use the same drag technique to connect the query to the TIME_DIM target.
The connections indicate the flow of data. Now you provide instructions in each object of the data flow so the
software knows how to process the data.
Previous task: Adding the components of the time data flow [page 53]
Next task: Defining the output of the Date_Generation transform [page 54]
Define the Date_Generation transform so it produces a column of dates for a specific range and increment.
Connect all of the objects in the data flow in the correct order before you configure them.
1. Click the name of the Date_Generation transform in the Class_Exercises project in the Project Area.
2. Set Increment to daily. Make sure that Join rank is set to 0 and that Cache is not selected.
Note
The Start Date and End Date options have a dropdown arrow, but you must type the values in for this
exercise.
3. Click the Back arrow in the upper toolbar to close the transform editor and return to the data flow.
4. Save the project.
The software moves the specified data to the Query transform as input.
Previous task: Defining the flow of data [page 53]
1. Click the Query object in the project area under the DF_TimeDim data flow.
The Query editor opens. The Query editor has an input schema section with a single column, an output
schema that is copied from the target datastore, and an options section.
2. Drag the DI_GENERATED_DATE column from the input schema to the NATIVEDATE column in the output
schema.
3. Map each of the remaining output columns (the year number, month number, and business quarter columns) by entering the corresponding function for each column in the Mapping tab.
Note
For this tutorial, the business year is the same as the calendar year.
These columns become the input schema for the TIME_DIM target table.
Previous task: Defining the output of the Date_Generation transform [page 54]
Next task: Saving and executing the job [page 56]
After you save the data flow DF_TimeDim, execute the JOB_TimeDim job to populate the TIME_DIM dimension
table with the changed data.
For instructions to validate and execute the job, see Validating the DF_SalesOrg data flow [page 44] and
Executing the job [page 47].
After the job successfully completes, view the output data using your database management tool. Compare
the output to the input data and see how the functions that you set up in the query affected the output data.
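For example, a quick way to inspect the generated rows, assuming the time_dim target table and the NATIVEDATE column used in the mapping, is:

    -- Inspect the generated dates and the derived time attributes.
    SELECT * FROM time_dim ORDER BY nativedate;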
Note
Remember that you should periodically close the tabs in the workspace when you are finished working with
the objects in the tab.
In the exercises to populate the Time Dimension table, you practiced the skills that you learned in the first
group of exercises, plus you learned how to use different objects as source and target in a data flow.
You have now populated the following tables in the sales data warehouse:
In the next section, you will extract data to populate the Customer dimension table.
You can now exit Data Services or go to the next group of tutorial exercises. If you exit, the software reminds
you to save your work if you did not save it before. The software saves all projects, jobs, workflows, data flows,
and results in the local repository.
Related Information
Populate the Customer dimension table from a relational table [page 58]
6 Populate the Customer dimension table from a relational table
In this exercise, you populate the Customer dimension table in the Sales star schema with data from a
relational table.
In the past exercises you have used a flat file to populate the Sales Org. dimension table and a transform to
populate the Time dimension table. In this exercise you use a relational table to populate the Customer
dimension table.
You also use the interactive debugger to examine the data after each transform or object in the data flow.
Before you continue with this exercise, make sure that you imported the source and target tables as instructed
in the Importing metadata [page 32] section.
The Designer interactive debugger allows you to examine and modify data row by row using filters and
breakpoints on lines in a data flow diagram.
7. Summary and what to do next [page 67]
In the exercise to populate the Customer dimension table with a relational table, you learned to use
some basic features of the interactive debugger.
1. Right-click the Class_Exercises project name and select New Batch Job.
A tab opens in the workspace area for the new batch job.
2. Rename this job JOB_CustDim.
3. Select the workflow button from the tool palette at right and click the workspace area.
Task overview: Populate the Customer dimension table from a relational table [page 58]
1. Click the data flow button in the tool palette at right and click in the workspace.
The project, job, workflow, and data flow objects display in hierarchical form in the Project Area. To navigate
to these levels, click their names in the project area.
3. Click DF_CustDim in the Project Area.
A blank definition area for the data flow appears in the workspace.
Task overview: Populate the Customer dimension table from a relational table [page 58]
Previous task: Adding the CustDim job and workflow [page 59]
Add objects to DF_CustDim in the workspace area to define the data flow instructions for populating the
Customer dimension table.
In this exercise, you build the data flow by adding the following objects:
● Source table
● Query transform
● Target table
Parent topic: Populate the Customer dimension table from a relational table [page 58]
1. Open the Datastores tab in the Local Object Library and expand the Tables node under ODS_DS.
2. Drag and drop the ODS_CUSTOMER table to the workspace and click Make Source.
3. Click the query button on the tool palette at right and click in the workspace to the right of the
CUSTOMER table.
You configure the query transform by mapping columns from the source to the target objects.
Note
If your database manager is Microsoft SQL Server or Sybase ASE, specify the columns in the order
shown in the table.
4. Click the Back arrow in the icon bar to return to the data flow.
5. Save your work.
Next you will verify that the data flow has been constructed properly.
From the menu bar, click Validation > Validate All Objects in View.
Note
You can alternatively use the icon bar and click Validate Current or Validate All to perform the same
validations.
If your design contains syntax errors, a dialog box appears with a message describing the error. Warning
messages usually do not affect proper execution of the job.
Task overview: Populate the Customer dimension table from a relational table [page 58]
6.5 Executing the CustDim job
You execute the CustDim job in the same way that you execute the other jobs in the tutorial. However, this
time we also show you how to view the output data.
1. In the Project Area, right-click the JOB_CustDim job and click Execute.
2. Click OK.
1. Click the DF_CustDim data flow in the Project Area. The data flow workspace opens.
2. Click the magnifying glass that appears on the lower right corner of the target object.
A sample view of the output data appears in the lower pane. Notice that there is not a CUST_TIMESTAMP
column in the output file. However, the software added the CUST_ID column to the output file.
For information about the icon options above the sample data, see the Designer Guide.
Task overview: Populate the Customer dimension table from a relational table [page 58]
6.6 The interactive debugger
The Designer interactive debugger allows you to examine and modify data row by row using filters and
breakpoints on lines in a data flow diagram.
The debugger allows you to examine what happens to the data after each transform or object in the flow.
● Debug filter: Functions as a simple query transform with a WHERE clause. Use a filter to reduce a data set
in a debug job execution.
● Breakpoint: Location where a debug job execution pauses and returns control to you.
When you start a job in the interactive debugger, Designer displays three additional panes as well as the View
Data panes beneath the workspace area. The following diagram shows the default locations for these panes.
1. View data panes, left and right
2. Call Stack pane
3. Trace pane
4. Debug Variables pane
The left View Data pane shows the data in the CUSTOMER source table, and the right pane shows one row at a
time (the default) that has passed to the query.
Optionally, set a condition in a breakpoint to search for specific rows. For example, you can set a condition to
stop the data flow when the debugger reaches a row in the data with a Region_ID value of 2.
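Entered in the breakpoint editor, such a condition is simply an expression on a source column, for example:

CUSTOMER.REGION_ID = 2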
In the next exercise, we show you how to set a breakpoint and debug your DF_CustDim data flow.
Parent topic: Populate the Customer dimension table from a relational table [page 58]
A breakpoint is a location in the data flow where a debug job execution pauses and returns control to you.
Ensure that you have the Class_Exercises project open in the Project Area.
The Breakpoint settings are in the right pane of the Breakpoint editor.
4. Select the Set checkbox.
5. Click OK.
1. In the Designer Project Area, right-click Job_CustDim and select Start debug.
Click OK if you see a prompt to save your work.
The Debug Properties editor opens. See The interactive debugger [page 63] for an explanation of the
Debug Properties editor.
2. Click OK to close the Debug Properties editor.
The debugging stops after the first row and displays the View data left and right panes.
3. To process the next row, click the Get Next Row icon on the toolbar at the top of the workspace area.
The next row replaces the existing row in the right view data pane.
4. To see all debugged rows, select the All checkbox in the upper right of the right view data pane.
The right pane shows the first two rows that it has debugged.
5. To stop the debug mode, click Stop Debug from the Debug menu, or click the Stop Debug button on the
toolbar.
For example, add a breakpoint condition for the Customer Dimension job to break when the debugger reaches
a row in the data with a Region_ID value of 2.
1. Open the breakpoint dialog box by double-clicking the breakpoint icon in the data flow.
2. Click the cell under the Column heading and click the down arrow to display a dropdown list of columns.
3. Click CUSTOMER.REGION_ID.
4. Click the cell under the Operator heading and click the down arrow to display a dropdown list of operators.
Click = .
5. Click the cell under the Value heading and type 2.
6. Click OK.
7. Right-click the job name and click Start debug.
The debugger stops after processing the first row with a Region_ID of 2. The right View Data pane shows
the row at the breakpoint.
8. To stop the debug mode, from the Debug menu, click Stop Debug, or click the Stop Debug button on the
toolbar.
6.7 Summary and what to do next
In the exercise to populate the Customer dimension table with a relational table, you learned to use some basic
features of the interactive debugger.
In the next section, you learn about document type definitions (DTD) and extracting data from an XML file.
For more information about the topics covered in this section, see the Designer Guide.
Parent topic: Populate the Customer dimension table from a relational table [page 58]
Related Information
7 Populate the Material Dimension from an
XML File
In this exercise, we use a DTD to define the format of an XML file, which has a hierarchical structure. The
software can process the data only after you have flattened the hierarchy.
An XML file represents hierarchical data using XML tags instead of rows and columns as in a relational table.
There are two methods for flattening the hierarchy of an XML file so that the software can process your data. In
this exercise we first use a Query transform and systematically flatten the input file structure. Then we use an
XML_Pipeline transform to select portions of the nested data to process.
To help you understand the goal for the tasks in this section, read about nested data in the Designer Guide.
After unnesting the source data using the Query in the last exercise, validate the DF_MtrlDim to make
sure there are no errors.
6. Executing the MtrlDim job [page 76]
After you save the MtrlDim data flow, execute the MtrlDim job.
7. Leveraging the XML_Pipeline [page 76]
The main purpose of the XML_Pipeline transform is to extract parts of the XML file.
8. Summary and what to do next [page 79]
In this section you learned two ways to process an XML file: With a Query transform and with the XML
Pipeline transform.
Related Information
The software provides a way to view and manipulate hierarchical relationships within data flow sources, targets,
and transforms using Nested Relational Data Modeling (NRDM).
In this tutorial, we use a document type definition (DTD) schema to define an XML source. XML files have a
hierarchical structure. The DTD describes the data contained in the XML document and the relationships
among the elements in the data.
You imported the mtrl.dtd file when you ran the script for this tutorial. It is located in the Formats tab of the
Local Object Library under Nested Schemas.
For complete information about nested data, see the Designer Guide.
Parent topic: Populate the Material Dimension from an XML File [page 68]
Next task: Adding MtrlDim job, workflow, and data flow [page 69]
To create the objects for this task, we omit the details and rely on the skills that you learned in the first few
exercises of the tutorial.
1. Add a new job to the Class_Exercises project and name it JOB_MtrlDim. To remind you of the steps,
see Adding the CustDim job and workflow [page 59].
2. Add a workflow and name it WF_MtrlDim. To remind you of the steps, see Adding the CustDim job and
workflow [page 59].
3. Click WF_MtrlDim in the Project Area to open it in the workspace.
4. Add a data flow to the workflow definition and name it DF_MtrlDim. To remind you of the steps, see
Adding a data flow [page 40].
Task overview: Populate the Material Dimension from an XML File [page 68]
Import the document type definition (DTD) schema named mtrl.dtd as described in the following steps.
The software adds the DTD file Mtrl_List to the Nested Schemas group in the Local Object Library.
Task overview: Populate the Material Dimension from an XML File [page 68]
Previous task: Adding MtrlDim job, workflow, and data flow [page 69]
7.4 Define the MtrlDim data flow
In this exercise you add specific objects to the DF_MtrlDim data flow workspace and connect them in the
order in which the software should process them.
Follow the tasks in this exercise to configure the objects in the DF_MtrlDim data flow so that the data flow
correctly processes hierarchical data from an XML source file.
Parent topic: Populate the Material Dimension from an XML File [page 68]
Next task: Validating that the MtrlDim data flow has been constructed properly [page 75]
Build the DF_MtrlDim data flow with a source, target, and query transform.
The Source File Editor opens containing the Schema Out options in the upper pane and the Source options
in the lower pane.
6. In the Source options, leave File Location set to {none}.
7. Click the File dropdown arrow and select <Select file>.
The File option populates with the file name and location of the XML file.
9. Select XML.
10. Select Enable validation to enable comparison of the incoming data to the stored document type definition
(DTD) format.
The software automatically populates the following options in the Source tab in the lower pane:
○ Format name: The software automatically populates with the schema name Mtrl_List
○ Root element name: The software automatically populates with the primary node name
MTRL_MASTER_LIST
11. Click the back arrow icon to return to the DF_MtrlDim data flow workspace.
12. Click the query transform icon in the tool palette and then click to the right of the table in the workspace.
Use the query transform to unnest the hierarchical Mtrl_List XML source data properly.
This process is lengthy. Make sure that you take your time and try to understand what you accomplish in each
step.
1. Click qryunnest in the Project Area to open the query editor in the workspace.
Note
Notice the nested structure of the source in the Schema In pane. Notice the differences in column
names and data types between the input and output schemas.
Instead of dragging individual columns to the Schema Out pane like you usually do, you use specific
configuration settings and systematically unnest the table.
2. Multiselect the five column names in the Schema out pane so they are highlighted: MTRL_ID, MTRL_TYP,
IND_SECTOR, MTRL_GRP, and DESCR.
3. Right-click and select Cut.
The software cuts the five columns and saves the column names and data types to your clipboard.
Note
Do not use Delete. By selecting Cut instead of Delete, the software captures the correct column names
and data types from the target schema. In a later step, we instruct you to paste the clipboard
information to the Schema Out pane of the MTRL_Master target table schema.
4. From the Schema In pane of the qryunnest query editor, drag the MTRL_MASTER schema to the Schema
Out pane.
The software adds the MTRL_Master table and subtable structure to the qryunnest query in the Schema
Out pane.
5. In the Designer Project Area, click the MTRL.DIM target table.
Note
The Schema In pane in the MTRL_DIM target table editor contains the MTRL_MASTER schema that you
just moved to the Schema Out pane in qryunnest query editor. Now you flatten the Schema Out in the
qryunnest query to fit the target table.
6. Follow these substeps to flatten the columns in the qryunnest Schema Out pane to fit the target table:
a. Select the qryunnest tab in the Workspace area to open the query editor.
b. Right-click MTRL_MASTER in the Schema Out pane and choose Make Current.
e. Right-click the MTRL_MASTER schema in the Schema Out pane and select Paste.
You have added all of the columns back that you cut from the Schema Out pane in qryunnest
earlier. This time, however, the columns appear under the MTRL_MASTER schema that you copied
from the Schema In pane.
f. Map the MTRL_ID, MTRL_TYPE, IND_SECTOR, and MRTL_GROUP columns in the Schema In pane to
the corresponding columns in the Schema Out pane using drag and drop.
7. Now follow these substeps to map the DESCR column in the Schema Out pane to SHORT_TEXT in the
TEXT nested schema in the Schema In pane of qryunnest:
a. Right-click the DESCR column in the Schema Out pane and select Cut.
Now the TEXT schema in the Schema Out pane contains only the DESCR column.
9. Open the MTRL_DIM target table tab in the workspace area to view the schema.
The Schema In pane shows the same schemas and columns that appear in the qryunnest query
Schema Out pane. However, the Schema In of the MTRL_DIM target table is still not flat, and it will not
produce the flat schema that the target requires. Perform the following substeps to flatten the Schema In
schema:
10. Select the qryunnest tab in the workspace area to view the query editor.
11. In the Schema Out pane, right-click the TEXT schema and click Unnest.
Notice that the Schema In pane contains two levels: qryunnest as parent level, and MTRL_MASTER as
child. The Schema Out pane shows one level. We still have to reduce the Schema In of the qryunnest
query to one level.
13. Open the qryunnest tab in the Workspace area to open the query editor.
14. Right-click MTRL_MASTER in the Schema Out pane and click Make Current.
The Schema In and Schema Out panes show one level for each.
17. From the Project menu, click Save All.
7.5 Validating that the MtrlDim data flow has been constructed properly
After unnesting the source data using the Query in the last exercise, validate the DF_MtrlDim to make sure
there are no errors.
1. In the Project Area, click the DF_MtrlDim data flow name to open the editor at right.
2. Click the Validate All icon, or select Validation > Validate All Objects in View from the menu bar.
You should see warning messages indicating that data type conversion will be used to convert from varchar(1024)
to the data type and length of the target column. The query's output schema was set to preserve the correct
data types, so you do not have to change anything because of the warnings.
If your design contains any errors in the Errors tab, you must fix them. Go back over the steps in the
exercise to make sure you didn't miss any steps. If you have syntax errors, a dialog box appears with a
message describing the error. Address all errors before continuing.
If you get the error message: "The flat loader...cannot be connected to NRDM," right-click the error
message and click Go to error, which opens the editor for the object in question. In this case, the source
schema is still nested. Return to the qryunnest query editor and unnest the output schema(s).
Task overview: Populate the Material Dimension from an XML File [page 68]
7.6 Executing the MtrlDim job
After you save the MtrlDim data flow, execute the MtrlDim job.
Or, use a query tool in your RDBMS to check the contents of the MTRL.DIM table.
Task overview: Populate the Material Dimension from an XML File [page 68]
Previous task: Validating that the MtrlDim data flow has been constructed properly [page 75]
The main purpose of the XML_Pipeline transform is to extract parts of the XML file.
When you extract data from an XML file to load into a target data warehouse, you usually obtain only parts of
the XML file. The Query transform does partial extraction (as the previous exercise shows), and it does much
more because it has many of the clauses of a SQL SELECT statement.
Because the XML_Pipeline transform focuses on partial extraction, it utilizes memory more efficiently and
performs better than the Query transform for this purpose.
● The XML_Pipeline transform uses less memory because it processes each instance of a repeatable
schema within the XML file, rather than building the whole XML structure first.
● The XML_Pipeline transform continually releases and reuses memory to steadily flow XML data through
the transform.
You can use the XML_Pipeline transform as an alternate way to build the Material dimension table. The data
flow components for this alternate way will consist of the following objects:
Setting up a job and data flow that uses the XML_Pipeline transform [page 77]
In this exercise, you will achieve the same outcome as in the previous exercise, but you use the XML
Pipeline transform for more efficient configuration and processing.
Configuring the XML_Pipeline and Query_Pipeline transforms [page 78]
Open the transform and the query to map input columns to output columns.
Task overview: Populate the Material Dimension from an XML File [page 68]
In this exercise, you will achieve the same outcome as in the previous exercise, but you use the XML Pipeline
transform for more efficient configuration and processing.
21. Save all files.
Open the transform and the query to map input columns to output columns.
Set up the job as instructed in Setting up a job and data flow that uses the XML_Pipeline transform [page 77].
Note
Unlike the qryunnest Query, the XML_Pipeline transform allows you to map a nested column directly to a
flat target.
The Schema In pane shows the nested structure of the source file.
2. Multiselect the following columns from the Schema In pane and drag them to the XML_Pipeline
transform Schema Out pane.
○ MTRL_ID
○ MTRL_TYPE
○ IND_SECTOR
○ MRTL_GROUP
○ SHORT_TEXT
3. Click the back arrow from the icon menu.
4. Click Query_Pipeline to open the query editor.
5. Map each Schema In column to the corresponding columns in the Schema Out pane.
When you drag each column from the Schema In pane to the Schema Out pane, the Type in the Schema
Out pane remains the same even though the input fields have the type varchar(1024).
Optional. For an experiment, remap one of the fields. After you drop the field into the Schema Out pane, a
popup menu appears. Choose Remap Column. The Remap Column option preserves the name and data
type in Schema Out.
6. In the Project Area, click the MTRL_DIM target table to open the target editor.
7. Open the Options tab in the lower pane and select Delete data from table before loading.
This option deletes existing data in the table before loading new data. If you do not select this option, the
software appends data to the existing table.
8. In the project area, click DF_MTRL_Pipe to return to the data flow.
9. Select the Validate icon from the menu.
The Warnings tab opens. The warnings indicate that each column will be converted to the data type in the
Schema Out pane.
There should not be any errors. If there are errors, you may have missed a step. Fix the errors and try to
validate again.
10. In the Project Area, right-click JOB_Mtrl_Pipe and click Execute.
11. If prompted to save your work, click OK.
12. Accept the default settings in Execution Properties and click OK.
13. After the job completes, ensure that there are no error or warning messages.
14. To view the captured sample data, in the project area select the data flow to open it in the workspace. Click
the magnifying glass on the target MTRL.DIM table to view the six rows of data.
Alternately, use a query tool in your RDBMS to check the contents of the MTRL.DIM table.
In this section you learned two ways to process an XML file: With a Query transform and with the XML Pipeline
transform.
We walked you through using a Query transform to flatten a nested schema. And we worked with a data type
definition (DTD) file for a source XML file.
If you are unclear about how Data Services processes XML files, and about nested data, see the Designer Guide
for more details.
At this point in the tutorial you have populated the following four tables in the sample data warehouse:
In the next section you will populate the sales fact table from more than one source.
Parent topic: Populate the Material Dimension from an XML File [page 68]
Related Information
8 Populate the Sales Fact Table from
Multiple Relational Tables
In this exercise you learn about using joins and functions to populate the Sales Fact table in the Sales star
schema with data from multiple relational tables.
The exercise joins data from two source tables and loads it into a target table.
1. Adding the SalesFact job, work flow, and data flow [page 82]
Use the basic skills that you have learned in earlier exercises to set up a new job named JOB_SalesFact.
2. Creating the SalesFact data flow [page 82]
Add objects to DF_SalesFact and connect the objects to set the flow of data.
3. Defining the details of the Query transform [page 83]
Set up a table join, a filter, and a Lookup expression in the query transform, and then map columns
from the Schema In columns to the Schema Out columns.
4. Using a lookup_ext function for order status [page 85]
Create a Lookup expression to select a column from the ODS_DELIVERY table to include in the
SALES_FACT output table based on two conditions.
5. Validating the SalesFact data flow [page 89]
Use the skills you obtained from previous exercises to validate the data flow.
6. Executing the SalesFact job [page 89]
After you have performed the validation step and fixed any errors, the SalesFact job should execute
without errors.
7. Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]
Use the metadata reporting tool to browse reports about metadata associated with the SalesFact job.
The metadata reporting tool is a Web-based application.
8. Summary and what to do next [page 93]
In this section you joined two source tables using a filter, and you used a Lookup expression to add a
column from a related table that was not one of the source tables.
Related Information
Reference Guide: Transforms, Platform transforms, Query transform, Joins in the Query transform
Designer Guide: Nested data, Operations on nested data
8.1 Adding the SalesFact job, work flow, and data flow
Use the basic skills that you have learned in earlier exercises to set up a new job named JOB_SalesFact.
Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
Related Information
Add objects to DF_SalesFact and connect the objects to set the flow of data.
Optional. The data flow has two sources. To make the workspace look more organized, change the appearance
of the data flow by following these steps:
Follow these steps to set up the data flow:
1. Click the DF_SalesFact data flow in the Project Area to open the data flow workspace.
2. Open the Datastores tab in the Local Object Library and expand the Tables node under ODS_DS.
3. Move the ODS_SALESITEM table to the left side of the workspace using drag and drop. Click Make Source.
4. Move the ODS_SALESORDER table to the left of the ODS_SALESITEM table in the workspace using drag and
drop. Click Make Source.
Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
Previous task: Adding the SalesFact job, work flow, and data flow [page 82]
Next task: Defining the details of the Query transform [page 83]
Set up a table join, a filter, and a Lookup expression in the query transform, and then map columns from the
Schema In columns to the Schema Out columns.
1. Expand DF_SalesFact in the Project Area and click the query to open the editor.
2. Open the FROM tab in the options pane.
3. Click the dropdown arrow under the Left column heading in the Join pairs area and select
ODS_SALESORDER.
The ODS_SALESITEM is now the right portion of the join. Leave the Join Type set to Inner join.
The software defines the relationship between the SalesItem and SalesOrder tables by using the key
column Sales_Order_Number. The inner join type generates a join expression based on primary and foreign
keys and column names. The SALES_ORDER_NUMBER column is the primary key in the ODS_SALESORDER
table and the foreign key in the ODS_SALESITEM table. The relationship states that the fields in each table
should match before the record is joined.
SALESITEM.SALES_ORDER_NUMBER = SALESORDER.SALES_ORDER_NUMBER
5. Click the ellipsis icon next to the Right table name ODS_SALESITEM.
These lines filter the sales orders by date. All orders that are from January 1, 2007 up to and including
December 31, 2007 are moved into the target.
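A minimal sketch of what such a date filter can look like when typed in the Smart Editor, assuming the order
date column in ODS_SALESORDER is named ORDER_DATE (an assumption; use the actual column name in your
source table):

ODS_SALESORDER.ORDER_DATE >= '2007.01.01'
AND ODS_SALESORDER.ORDER_DATE <= '2007.12.31'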
Tip
As you type the function names, the Smart Editor prompts you with options. Either ignore the prompts
and keep typing or select an option that is highlighted and press Enter. You can alternately double-click
the prompt to accept it.
8. Click OK.
The join conditions that you added in the Smart Editor appear in the Join Condition column and in the
FROM Clause area.
9. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop.
Source table Source column Target column Column description
10. Keep the Query Editor open for the next task.
Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
Next task: Using a lookup_ext function for order status [page 85]
Create a Lookup expression to select a column from the ODS_DELIVERY table to include in the SALES_FACT
output table based on two conditions.
Note
We use the following two methods to add expressions in the Select Parameters dialog box:
○ Drag column names into the target columns under Condition, Output, and Order by sections.
○ Click the ellipsis button to open the Smart Editor.
Lookup_ext option settings

Lookup table
The lookup table is where the LOOKUP_EXT() function obtains the value to put into the ORD_STATUS column.
1. Select the Lookup table dropdown arrow. The Input Parameter dialog box opens.
2. Select Datastore from the Look in dropdown arrow.
3. Select ODS_DS and click OK.
4. Select the ODS_DELIVERY table and click OK.

Available parameters
You choose parameters to build conditions from the tables that you define here.
1. Expand the Lookup table node at left and then expand the ODS_DELIVERY node to expose the columns in the table.
2. Expand the Input Schema node and then expand the ODS_SALESITEM node to expose the columns in the table.

Conditions
Conditions identify the rules the software follows to determine what value to output for the ORD_STATUS
column. Set up two conditions for this expression because there is a one-to-many relationship between the
SALES_ORDER_NUMBER column and the SALES_LINE_ITEM_ID column. For example, the SALES_ORDER_NUMBER
column value PT22221000 has two SALES_LINE_ITEM_ID values: IT100 and IT102.
Condition 1: ODS_DELIVERY.DEL_SALES_ORDER_NUMBER = ODS_SALESITEM.SALES_ORDER_NUMBER
1. Under ODS_DELIVERY, move the DEL_SALES_ORDER_NUMBER column to the Conditions area under the Column
in lookup table column using drag and drop.
2. Verify that the operator column, OP.(&), automatically sets to =.
3. Click the ellipsis under the Expressions column to open the Smart Editor.
4. Expand the ODS_SALESITEM node and move the SALES_ORDER_NUMBER column to the right side using drag and drop.
5. Click OK.
Condition 2: ODS_DELIVERY.DEL_ORDER_ITEM_NUMBER = ODS_SALESITEM.SALES_LINE_ITEM_ID

Output
Output parameters specify the column in the Lookup table that contains the value to put in the ORD_STATUS
column in the query. Leave all other options as they are.
The following image shows the completed Select Parameters dialog box.
The final lookup function displays in the Mapping tab and looks as follows:
lookup_ext([ODS_DS.DBO.ODS_DELIVERY,'PRE_LOAD_CACHE','MAX'],
 [DEL_ORDER_STATUS],[NULL],
 [DEL_SALES_ORDER_NUMBER,'=',ODS_SALESITEM.SALES_ORDER_NUMBER,
  DEL_ORDER_ITEM_NUMBER,'=',ODS_SALESITEM.SALES_LINE_ITEM_ID])
SET ("run_as_separate_process"='no',
 "output_cols_info"='<?xml version="1.0" encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/></output_cols_info>' )
7. Click Finish.
To look at the expression for ORD_STATUS again, select the ORD_STATUS column from the Schema Out
pane in the Query editor and open the Mapping tab in the options pane.
Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
Previous task: Defining the details of the Query transform [page 83]
Use the skills you obtained from previous exercises to validate the data flow.
Possible errors could result from an incorrect join condition clause or other syntax error.
Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
Previous task: Using a lookup_ext function for order status [page 85]
After you have performed the validation step and fixed any errors, the SalesFact job should execute without
errors.
No error notifications should appear in the status window. You might see a warning notification indicating
that a conversion from a date to datetime value occurred.
2. Accept the settings in the Execution Properties dialog box and click OK.
3. Click DF_SalesFact in the Project Area to open it in the workspace.
4. Click the magnifying-glass icon on the target table SALES_FACT to view 17 rows of data. Compare the output to the input data.
Example
The following diagram shows how all of the tables are related, and breaks up the steps that you
completed in the Query editor to help you understand the relationships of the three tables and why you
set up conditions for the Lookup expression.
Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
Previous task: Validating the SalesFact data flow [page 89]
Next task: Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]
Use the metadata reporting tool to browse reports about metadata associated with the SalesFact job. The
metadata reporting tool is a Web-based application.
View information about the Sales_Fact target table to find out when the table was last updated and used. Also
see the related source tables and column mappings.
Use the Settings options to make sure that you are viewing the applicable repository and to refresh source
and column data.
5. Check the name in Repository to make sure that it contains the current repository.
6. Open the Refresh Usage Data tab to make sure that it lists the current job server.
7. Click Calculate Column Mapping.
The software calculates the current column mapping and notifies you when it is successfully complete.
8. Click Close.
9. In the file tree at left, expand Datastores and then Target_DS to view the list of tables.
10. Expand Data Flow Column Mapping Calculation in the right pane to view the calculation status of each data
flow.
11. Double-click the SALES_FACT table under Target_DS in the file tree.
The Overview tab for SALES_FACT table opens at right. The Overview tab displays general information
about the table such as the table datastore name and the table type.
12. Click the Lineage tab.
The following Lineage tab displays the sources for the SALES_FACT target table. When you move the
pointer over a source table icon, the name of the datastore, data flow, and owner appear.
13. In the SALES_FACT table in the file tree, double-click the ORD_STATUS column.
The Lineage tab in the right-pane refreshes to show the lineage for the column. For example, you should
see that the SALES_FACT.ORD_STATUS column is related to information in the following source columns:
○ ODS_DELIVERY.DEL_ORDER_STATUS
○ ODS_SALESITEM.SALES_LINE_ITEM_ID
○ ODS_SALESITEM.SALES_ORDER_NUMBER
These relationships show the source columns that you defined in the LOOKUP_EXT() function in Using a
lookup_ext function for order status [page 85].
14. Print the reports by selecting the print option in your browser. For example, for Windows Internet Explorer,
select File > Print.
Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
8.8 Summary and what to do next
In this section you joined two source tables using a filter, and you used a Lookup expression to add a column
from a related table that was not one of the source tables.
We covered a lot of information in this section. Feel free to go back over the steps, examine the data in the
source and target tables using the magnifying glass icon in the data flow, and look at the example provided to
really understand the results of the settings you made in the query transform.
At this point in the tutorial, you have populated all five tables in the sales data warehouse:
The next section shows you how to use the change data capture feature in Data Services.
For more information about Impact and Lineage reports, see the Management Console Guide.
For more information about the Lookup expression, functions, and filters, see the Designer Guide.
Parent topic: Populate the Sales Fact Table from Multiple Relational Tables [page 81]
Previous task: Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]
Related Information
9 Changed data capture
Changed data capture (CDC) extracts only new or modified data after you process an initial load of the data to
the target system.
● Initial load job: A job that loads all of the rows from a source to a target.
● Delta load job: A job that uses CDC to load changed and new data to the target table.
● Initialization script that sets date values for global variables named $GV_STARTTIME and $GV_ENDTIME.
● Data flow that loads only the rows changed or added between the $GV_STARTTIME and $GV_ENDTIME
● Termination script that updates a database table that stores the last $GV_ENDTIME
In the initial load job, the software establishes a baseline by assigning the date and time for each row in the data
source. In the delta load job, the software determines which rows are new or changed based on the last date
and time data.
The target database contains a job status table called CDC_time. The software stores the last date and time
data for each row in CDC_time. When you execute the delta load job, it updates that date and time for the next
execution.
Adding the initial load job and defining global variables [page 95]
Create a job and then create two global variables for the job. The global variables serve as placeholders
for job execution start and end time stamps.
9.1 Global variables
The initial job contains the usual objects that other jobs contain, but it also serves as a baseline for the source
data by using global variables.
Global variables are global within a job only. For example, you create a global variable while you set up a specific
job, and the variable is not available for other jobs.
Global variables provide you with maximum flexibility at runtime. For example, during production you can
change default values for global variables at runtime from a job's schedule or SOAP call without having to open
a job in the Designer.
For this exercise, you set values for global variables in script objects. You can also set values for global variables
in external jobs, job execution, or job schedule properties. For complete information about using global
variables in Data Services, see the Designer Guide.
Related Information
Adding the initial load job and defining global variables [page 95]
Replicating the initial load data flow [page 100]
Building the delta load job [page 101]
Execute the initial and delta load jobs [page 103]
Summary and what to do next [page 105]
9.2 Adding the initial load job and defining global variables
Create a job and then create two global variables for the job. The global variables serve as placeholders for job
execution start and end time stamps.
1. Create a new batch job in the Class_Exercises project and name it JOB_CDC_Initial.
2. With the job name selected in the Project Area, select Tools > Variables.
The Variables and Parameters editor opens. It displays the job name in the Context header.
3. Right-click Global Variables and click Insert.
8. Create another global variable following the same steps. Name the second global variable $GV_ENDTIME
and set the Data Type to datetime.
9. Close the Variables and Parameters editor by clicking the X in the upper right corner.
10. Save your work.
Related Information
Use scripts in the job to assign values to the global variables that you just created.
1. Select the JOB_CDC_Initial job in the Project Area to open it in the workspace.
2. Add a new workflow to the job using the tool palette and name it WF_CDC_Initial.
3. Click WF_CDC_Initial in the Project Area to open it in the workspace.
4. Click the script icon in the tool palette and click in the workspace to add a script.
5. Name the script SET_START_END_TIME.
6. Add a data flow from the tool palette to the workflow workspace and name it DF_CDC_Initial.
7. Add another script to the right of the data flow object using the tool palette.
8. Name the script UPDATE_CDC_TIME_TABLE.
9. Connect the objects to set the direction of the data flow.
Designate values for the global variables and add functions to instruct how the job should output data.
2. Defining the data flow [page 98]
Add a query and a target template table to the data flow.
3. Defining the QryCDC query [page 99]
Map the output schema in the QryCDC query and add a function that checks the date and time.
Related Information
Designate values for the global variables and add functions to instruct how the job should output data.
When you define scripts, make sure that you follow your database management syntax rules. For more
information about creating scripts in Data Services, see the Designer Guide.
Before you define the scripts, check the date and time in the existing database to make sure you use a date in
the script that includes all of the records.
● Open the Datastores tab in the Local Object Library and expand the Tables node under ODS_DS.
● Double-click the ODS_CUSTOMER source table to open the editor in the workspace.
● Open the View Data tab in the lower pane to see a sample of the source data.
● Look in the CUST_TIMESTAMP column and see that the timestamp for all records is 2008.03.27 00:00:00.
● Close the editor.
1. Expand the WF_CDC_Initial node in the Project Area and click the SET_START_END_TIME script to open
it in the workspace.
2. Enter the following script directly in the text area using the syntax applicable for your database. As you
start to type the global variable name, a dropdown list of variables appears. Double-click the variable name
from the list to add it to the string.
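As a minimal sketch, an initialization script of this kind can look like the following in the Data Services
scripting language, assuming a start date early enough to include the 2008.03.27 timestamps noted above
(adjust the date literal format to your database):

# Baseline: a start date that covers all existing rows (timestamps are 2008.03.27)
$GV_STARTTIME = '2008.01.01 00:00:00';
# End time is the current system date and time
$GV_ENDTIME = sysdate();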
4. Close the script editor for SET_START_END_TIME.
5. Select the UPDATE_CDC_TIME_TABLE script in the Project Area to open the script editor in the workspace.
6. Enter the following script directly in the text area using the syntax applicable for your database.
Note
Our example assumes that the user name is ODS. Ensure that you use the applicable user name that
you defined for the table. For example, the file name ODS.CDC_TIME indicates that the owner is ODS. If
the owner name is DBO, type the name as DBO.CDC_TIME.
The script resets the $GV_ENDTIME value in the CDC_TIME job status table with a new end time based on
when the software completes the job execution.
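A minimal sketch of such a script, assuming the CDC_TIME table is reachable through the Target_DS datastore,
is owned by DBO, and stores the timestamp in a column named LAST_TIME (all three names are assumptions;
substitute your own datastore, owner, and column names):

# Record the end time of this run so that the delta job can read it later
sql('Target_DS', 'UPDATE DBO.CDC_TIME SET LAST_TIME = {$GV_ENDTIME}');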
7. Click the Validate icon to validate the script.
You may receive a validation warning that the DATETIME data type will be converted to VARCHAR. Data
Services always preserves the data type in output schema, so ignore the warning.
8. Close the UPDATE_CDC_TIME_TABLE script file and save your work.
Task overview: Adding a workflow, scripts, and data flow [page 96]
With a target template table, you do not have to specify the table schema or import metadata. Instead, during
job execution, Data Services has the DBMS create the table with the schema defined by the data flow.
Template tables appear in the Local Object Library under each datastore.
6. Add the Template Tables icon to the right of the query in the workspace using drag and drop.
The Drop and re-create table option is selected by default. You can leave the option selected.
11. Click the back arrow icon in the upper toolbar to close the target table definition and return to the data flow
workspace.
12. Save your work.
Task overview: Adding a workflow, scripts, and data flow [page 96]
Map the output schema in the QryCDC query and add a function that checks the date and time.
Note
You can drag the column CUSTOMER.CUST_TIMESTAMP from the Schema In pane to the Where tab, or
you can select the table name and the column name from the list that appears as you start to type the
script. The software also offers suggestions to choose from when you type the global variable name.
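A minimal sketch of the kind of WHERE clause this step builds, using the column name from the note above
and the two global variables (the exact clause in your job may differ):

CUSTOMER.CUST_TIMESTAMP >= $GV_STARTTIME
AND CUSTOMER.CUST_TIMESTAMP <= $GV_ENDTIME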
4. Click the Validate icon on the toolbar to validate the query statement.
Fix any syntax errors and revalidate if necessary.
5. Select JOB_CDC_Initial in the Project Area and click the Validate All icon on the toolbar.
Fix any errors and revalidate if necessary. Ignore the warning about DATETIME converted to VARCHAR.
6. Save your work.
Task overview: Adding a workflow, scripts, and data flow [page 96]
A new data flow appears in the Data Flow list with “Copy_1” added to the data flow name.
3. Rename the copied data flow DF_CDC_Delta.
4. Double-click the DF_CDC_Delta data flow to open it in the workspace.
5. Double-click CUST_CDC target table in the workspace to open the Template Target Table editor.
6. Open the Options tab in the lower pane.
7. Deselect the following options: Delete data from table before loading and Drop and re-create table.
○ Delete data from table before loading: Does not delete any of the data in the CUST_CDC target table
before loading changed data to it from the JOB_CDC_Delta job.
○ Drop and re-create table: Does not drop the existing CUST_CDC table and create a new table with the
same name. Preserves the existing CUST_CDC table.
Related Information
Adding the initial load job and defining global variables [page 95]
Building the delta load job [page 101]
Execute the initial and delta load jobs [page 103]
Summary and what to do next [page 105]
Build the job JOB_CDC_Delta by adding the WF_CDC_Delta workflow and two new scripts.
Adding the job and defining the global variables [page 102]
Create global variables specifically for the delta-load job.
Related Information
9.4.1 Adding the job and defining the global variables
1. Create a new batch job for Class_Exercises and name the job JOB_CDC_Delta.
2. Select the JOB_CDC_Delta job name in the Project Area and click Tools > Variables.
The Variables and Parameters editor opens. Notice that the job name displays in the Context box.
3. Right-click Global Variables and click Insert.
4. Double-click the new variable to open the editor. Rename the variable $GV_STARTTIME.
5. Select datetime for the Data type and click OK.
6. Follow the same steps to create the global variable $GV_ENDTIME.
7. Close the Variables and Parameters dialog box and save your work.
Note
Because you create global variables for a specific delta-load job, the software does not consider them
as duplicates to the variables that you created for the initial-load job.
1. Double-click the SET_NEW_START_END_TIME script in the workspace to open the script editor.
2. Enter the following text directly in the text area using the syntax applicable for your database.
This script defines the start time global variable to be the last date and time stamp recorded in the
CDC_Time table. The end time global variable equals the system date.
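A minimal sketch of such a script, again assuming a Target_DS datastore, a DBO.CDC_TIME table, and a
LAST_TIME column (assumptions; adjust to your environment, including the to_date format, which depends on
how the timestamp is stored):

# Start from the last recorded end time, end at the current system time
$GV_STARTTIME = to_date(sql('Target_DS', 'SELECT LAST_TIME FROM DBO.CDC_TIME'), 'YYYY.MM.DD HH24:MI:SS');
$GV_ENDTIME = sysdate();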
3. Validate the script by clicking the Validate icon. Fix any syntax errors.
4. Select the UPDATE_CDC_TIME_TABLE script in the Project Area to open the script editor in the workspace.
5. Enter the following text directly in the text area using the syntax applicable for your database.
For Oracle, type the following text:
This script replaces the end time in the CDC_Time table to the end time for the Delta job.
6. Validate the script and fix any syntax errors.
Ignore any warnings about the datetime data type being converted to VarChar.
7. Click the JOB_CDC_Delta job name in the Project Area and click the Validate All icon. Correct any errors,
and ignore any warnings for this exercise.
8. Save your work.
After you carefully set up both the initial load job and the delta load job, it is time to execute them.
Execute the initial load job named JOB_CDC_Initial and view the results by looking at the sample data.
Because this is the initial job, the software should return all of the rows.
To see how CDC works, open the source data in your DBMS and alter the data as instructed. Then execute the
JOB_CDC_Delta job. The job extracts the changed data and updates the target table with the changed data.
View the results to see the different time stamps and to verify that only the changed data was loaded to the
target table.
Related Information
9.5.1 Executing the initial load job
The initial load job outputs all of the source data to the target table, and adds the job execution date and time to
the data.
Use your DBMS to view the data in the ODS_CUSTOMER table. There are a total of 12 rows. The columns are the
same as the columns that appear in the Schema In pane in the QryCDC object.
If the job fails, click the error icon to read the error messages. You can have script errors even if your script
validation was successful.
3. After successful execution, click the monitor icon and view the Row Count column to determine how
many rows were loaded into the target table. The job should return all 12 rows.
You can also check this row count by opening the data flow and clicking the View Data icon (magnifying
glass) on the target table.
To see CDC in action, change the CUSTOMER table and execute the delta load job.
Note
If your database does not allow nulls for some fields, copy the data from another row.
Cust_ID: ZZ01
Cust_Classf: ZZ
Name1: EZ BI
ZIP: ZZZZZ
Cust_Timestamp: A date and time that is equal to or later than the time shown in the CDC_TIME
status table after the initial job execution. To be sure that you enter a valid time,
look at the value in the CDC_TIME table using your DBMS.
You can now go on to the next section, which describes how to verify and improve the quality of your source
data.
For more information about the topics covered in this section, see the Designer Guide.
Related Information
Building the delta load job [page 101]
Execute the initial and delta load jobs [page 103]
Data Assessment [page 107]
10 Data Assessment
Use data assessment features to identify problems in your data, separate out bad data, and audit data to
improve the quality and validity of your data.
Data Assessment provides features that enable you to trust the accuracy and quality of your source data.
The exercises in this section introduce the following Data Services features:
● Data profiling that pulls specific data statistics about the quality of your source data.
● Validation transform in which you apply your business rules to data and separate bad data from good data.
● Audit dataflow that outputs invalid records to a separate table.
● Auditing tools in the Data Services Management Console.
Related Information
10.1 Default profile statistics
The Data Profiler executes on a profiler server to provide column and relationship information about your data.
The software reveals statistics for each column that you choose to evaluate. The Data Profiler provides two
categories of column profile statistics:
● Basic profiling: Minimum value, maximum value, average value, minimum string length, and maximum string length.
● Detailed profiling: Distinct count, distinct percent, median, median string length, pattern count, and pattern percent.
The following describes some of the default statistics.
Distincts: The total number of distinct values out of all records for the column.
Nulls: The total number of NULL values out of all records in the column.
Min:
● If the column contains alpha data, the minimum value is the string that comes first alphabetically.
● If the column contains numeric data, the minimum value is the lowest numeral in the column.
● If the column contains alphanumeric data, the minimum value is the string that comes first alphabetically and lowest numerically.
Max:
● If the column contains alpha data, the maximum value is the string that comes last alphabetically.
● If the column contains numeric data, the maximum value is the highest numeral in the column.
● If the column contains alphanumeric data, the maximum value is the string that comes last alphabetically and highest numerically.
Related Information
Viewing audit details in Operational Dashboard reports [page 117]
Summary and what to do next [page 119]
Designer Guide: Data Assessment, Using the Data Profiler
Use profile statistics to determine the quality of your source data before you extract, transform, and load it.
1. Open the Datastores tab in the Local Object Library and expand ODS_DS > Tables.
2. Right-click the ODS_CUSTOMER table and select View Data.
3. Click the Profile tab icon, which is the second tab from the left in View Data.
The Profile tab opens showing all of the columns in the table, and the basic column profile information.
4. Select the checkboxes next to the following column names:
○ CUST_ID
○ CUST_CLASSF
○ NAME1
○ ZIP
○ CUST_TIMESTAMP
These columns are the columns that you worked with in the previous exercise.
5. Click Update.
The profile statistics appear for the selected columns. The columns that were not selected have “<Blank>”
in the statistics columns.
6. After you examine the statistics, close the View Data dialog.
For this exercise, we comply with a business rule from a fictional company. The rule requires that a target ZIP
column contain numeric data. In the last exercise for changed data capture, you added a new row of data with a
ZIP column value of “ZZZZZ”. Now we set up a validation job that changes that value to blank.
Task overview: Data Assessment [page 107]
Related Information
The Validation transform qualifies a data set based on rules for input schema columns.
The Validation transform can output up to three data outputs: Pass, Fail, and RuleViolation. Data outputs are
based on the condition that you specify in the transform. You set the data outputs when you connect the
output of the Validation transform with a Pass object and a Fail object in the workspace.
For this exercise, we set up a Pass target table for the first job execution. Then we alter the first job by adding a
Fail target table with audit rules.
Related Information
Summary and what to do next [page 119]
1. Add a new job to the Class_Exercises project. Name the job JOB_CustGood.
2. With the job opened in the workspace, add a Data Flow icon to the job from the tool palette. Name the data
flow DF_CustGood.
3. With DF_CustGood opened in the workspace, expand ODS_DS > Tables in the Datastores tab in the
Local Object Library.
4. Move the ODS_CUSTOMER table to the DF_CustGood data flow in the workspace using drag and drop.
5. Select Make Source.
6. Open the Transform tab in the Local Object Library and expand the Platform node.
7. Move the Validation icon for the Validation transform to the right of the ODS_CUSTOMER table in the data
flow using drag and drop.
8. Expand TARGET_DS > Template Tables in the Datastores tab of the Local Object Library.
9. Move the Template Tables icon to the right of the Validation transform in the workspace using drag and
drop.
10. In the Create Template dialog box, name the template table Cust_Good.
11. Connect the ODS_CUSTOMER source table to the Validation transform.
12. Connect the Validation transform to the CUST_GOOD target table and select Pass from the dialog box.
The Pass option requires that the software pass all rows, even rows that fail the validation rules, to the
target table.
13. Save your work.
Related Information
Create a validation rule to find records that contain data in the ZIP column that does not comply with your
format rules.
1. With DF_CustGood open in the workspace, double-click the Validation transform to open the Transform
Editor.
2. Open the Validation Rules tab in the lower pane and click Add located in the upper right of the pane.
The Rule Editor dialog box opens.
3. Create a rule that requires data in the ZIP column to contain 5-digit strings. Complete the options as
instructed in the following table.
Option Instruction
Enabled Select.
Note
When you enable a validation rule for a column, a check mark appears next to the column in the
Schema In pane.
Action on Fail Select Send to Pass.
Note
Send to Pass causes the software to pass the row to the target table even if it fails the
5_Digit_ZIP_Column_Rule rule.
Condition Define the condition that the ZIP column must meet.
Note
This condition causes the software to check that the ZIP column contains 5-digit strings.
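As an illustration only (not necessarily how the tutorial's Rule Editor options express it), an equivalent
custom validation condition could use the match_pattern function, which returns a nonzero value when the
input matches the pattern:

match_pattern(ODS_CUSTOMER.ZIP, '99999') <> 0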
4. Click OK.
The rule appears in the Rules list of the Rule Editor pane.
5. In the Rule Editor, select the checkbox under the Enabled column in the lower pane under If any rule fails
and Send to Pass, substitute with.
6. Double-click the cell next to the checked Enabled cell, under Column.
This substitutes “<Blank>” for the ZIP values that do not pass the rule because they do not contain a 5-
digit string in the ZIP column.
9. Click the Validate icon in the Designer tool bar to verify that the Validation transform does not have syntax
errors.
10. Execute the JOB_CustGood job.
Open DF_CustGood in the workspace. Click the magnifying-glass icons on the ODS_CUSTOMER source table
and the CUST_GOOD target table to view the data in the lower pane. The value in the ZIP column for the record
CUST_ID ZZ01 shows “<Blank>” in the target table.
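Although you build the rule through Rule Editor options rather than by typing code, the underlying check is an ordinary Data Services expression. The following is a minimal sketch of one way to express a 5-digit ZIP check; the function, column reference, and pattern are illustrative assumptions, and the condition generated by your rule may differ.
Sample Code
# Returns 1 when the ZIP value is exactly five digits ('9' matches any single digit), 0 otherwise.
match_pattern(ODS_CUSTOMER.ZIP, '99999')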
Collect audit statistics on the data that flows out of any object, such as a source, transform, or target.
In the next exercise, we set up the validation job JOB_CustGood to output failed records to a fail target table
instead of to the pass target table named CUST_GOOD.
For this exercise, we define the following objects in the Audit dialog box:
Audit point: The object in a data flow where you collect audit statistics.
Audit function: The audit statistic that the software collects for the audit points. For this exercise, we set up a Count audit function on the source and pass target tables. Count collects a good row count and an error row count.
Audit label: The unique name in the data flow that the software generates for the audit statistics collected for each audit function that you define.
Audit rule: A Boolean expression in which you use audit labels to verify the job.
Audit action on failure: The action that the software takes when there is a failure.
For a complete list of audit objects and descriptions, see the Designer Guide.
Add an additional target table to the DF_CustGood data flow to contain the records that fail the rule.
Send to Fail causes Data Services to send the rows that fail the 5_Digit_ZIP_Column_Rule to a fail target
table.
6. Close the Transform Editor.
7. With the DF_CustGood data flow open in the workspace, expand TARGET_DS Template Tables .
8. Move the Template Tables icon to the data flow using drag and drop.
9. Enter Cust_Bad_Format in Template name in the Create Template dialog box. Click OK.
10. Draw a connection from the Validation transform to the Cust_Bad_Format target table and select the Fail
option.
Your data flow now contains the ODS_CUSTOMER source table, the Validation transform, and the Cust_Good and Cust_Bad_Format target tables.
11. Save your work.
The following steps set up two audit rule expressions in the DF_CustGood data flow that compare the source
table ODS_CUSTOMER with the pass target table CUST_GOOD.
1. With DF_CustGood opened in the workspace, click the Audit icon in the upper tool bar.
The audit rules compare the audit labels for the source and target tables:
$Count_ODS_CUSTOMER = $Count_CUST_GOOD
$CountError_ODS_CUSTOMER = $CountError_CUST_GOOD
7. In the Action on failure group at the right of the pane, deselect Raise exception.
If there is an exception, the job stops processing. We want the job to continue processing so we deselect
Raise exception.
8. Click Close to close the Audit pane.
Notice that the DF_CustGood data flow indicates the audit points by displaying the audit icon on the right
side of the ODS_Customer source table and Cust_Good target table. The audit points are where the
software collects audit statistics.
9. Click the Validate All icon in the upper tool bar to verify that there are no errors.
10. Save your work.
11. Execute JOB_CustGood.
View audit details, such as an audit rule summary and audit labels and values, in the SAP Data Services
Management Console.
When your administrator installs SAP Data Services, they also install the Management Console. If you do not
have access credentials, contact your system administrator. For more information about the Management
Console, see the Management Console Guide.
1. In the Designer menu bar, click Tools Data Services Management Console .
2. Log in to Management Console using your access credentials.
The Dashboard opens with statistics and data. For more information about the Dashboard, see the
Management Console Guide.
4. Select the JOB_CustGood job in the table at right.
If the job doesn't appear in the table, adjust the Time Period dropdown list to a longer or shorter time
period as applicable.
The Job Execution Details pane opens showing the job execution history and a chart depicting the
execution history.
5. Select the JOB_CustGood in the Job Execution History table.
The data flow should contain Yes in the Contains Audit Data column.
Three graphs appear at right: Buffer Used, Row Processed, and CPU Used. Read about these graphs in the
Management Console Guide.
7. Select View Audit Data located just above the View Audit Data table.
The Audit Details window opens containing the following information:
Audit Rule Summary: Shows the audit rules that you created in the Validation transform.
Audit Rule Failed: The audit rule that was violated after data processing.
The validation rule required that all records comply with the 5_Digit_ZIP_Column_Rule, and one record failed the
rule. The audit rules that you created require that the row counts be equal. Because one row failed the validation
rule, the details show that the values are not equal: 13 > 12.
After the job executes, open the fail target table to view the failed record.
Open the data flow in the workspace and click the magnifying-glass icon in the lower right corner of the
CUST_BAD_FORMAT target table. The CUST_BAD_FORMAT target table contains one record. In addition to the
fields selected for output, the software added and populated three additional fields for error information:
● DI_ERRORACTION = F
● DI_ERRORCOLUMNS = Validation failed rule(s): ZIP
● DI_ROWID = 1.000000
These are the rule violation output fields that are automatically included in the Validation transform. For
complete information about the Validation transform, see the Reference Guide.
In this exercise you learned a few methods to view data profile information, set audit rules, and assess your
output data.
There are many more ways to use the software for data assessment. Learn more about profiling your data and
data assessment in the Designer Guide. The following lists the methods that we used in this exercise to profile
and audit data details:
● View table data and use the profile tools to view the default profile statistics.
● Use the Validation transform in a data flow to find records in your data that violate a data format
requirement in the ZIP field.
● Add an additional template table for records that fail an audit rule.
● Create an audit expression and an action for when a record fails the expression.
● View audit details in the Management Console Operational Dashboard reports.
The next section shows how to design jobs that are recoverable if the job malfunctions, crashes, or does not
complete.
11 Recovery Mechanisms
Use Data Services recovery mechanisms to set up automatic recovery or to recover jobs manually that do not
complete successfully.
A recoverable work flow is one that can run repeatedly after failure without loading duplicate data. Examples of
failure include source or target server crashes or target database errors that could cause a job or work flow to
terminate prematurely.
This exercise creates a recoverable job that loads the sales organization dimension table that was loaded in the
section Populate the Sales Organization dimension from a flat file [page 37]. You reuse the data flow
DF_SalesOrg from that exercise to complete this exercise.
11.1 Recoverable job
Create a job that contains three objects that are configured so that the job is recoverable.
The recoverable job that you create in this section contains the following objects: the GetWFStatus script, the recovery_needed conditional, and the UpdateWFStatus script.
Local variables contain information that you can use in a script to determine when a job must be recovered.
In previous exercises you defined global variables. Local variables differ from global variables. Use local
variables in a script or expression that is defined in the job or work flow that calls the script.
1. Open the Class_Exercises project in the Project Area and add a new job named JOB_Recovery.
A new variable appears named $NewVariableX where X indicates the new variable number.
4. Double-click $NewVariableX and enter $recovery_needed for Name.
5. Select int from the Data type dropdown list.
6. Follow the same steps to create another local variable.
7. Name the variable $end_time and select varchar(20) from the Data type dropdown list.
Task overview: Recovery Mechanisms [page 121]
Create a script that checks the $end_time variable to determine if the job completed properly.
The script reads the ending time in the status_table table that corresponds to the most recent start time. If
there is no ending time for the most recent starting time, the software determines that the prior data flow must
not have completed properly.
1. With JOB_Recovery opened in the workspace, add a script to the left side of the workspace and name it
GetWFStatus.
2. Open the script in the workspace and type the script directly into the Script Editor. Make sure that the
script complies with syntax rules for your DBMS.
For Microsoft SQL Server or SAP ASE, enter the following script:
Sample Code
# Read the end time of the most recent run; this sketch assumes that status_table is in the TARGET_DS datastore.
$end_time = sql('TARGET_DS', 'select end_time from status_table where start_time = (select max(start_time) from status_table)');
if ($end_time IS NULL or $end_time = '') $recovery_needed = 1;
else $recovery_needed = 0;
11.4 Conditionals
Conditionals are single-use objects, which means they can only be used in the job for which they were created.
Define a conditional for this exercise to specify a recoverable data flow. To define a conditional, you specify a
condition and two logical branches:
Then: Work flow elements to execute when the “If” expression evaluates to TRUE.
Else: (Optional) Work flow elements to execute when the “If” expression evaluates to FALSE.
2. Click the conditional icon on the tool palette, and then click in the workspace to the right of the script
GetWFStatus.
3. Name the conditional recovery_needed.
4. Double-click the conditional in the workspace to open the Conditional Editor.
○ A space for specifying the work flow to execute when the if condition evaluates to TRUE. For example, if
condition = true, then perform the task in the Then space.
○ A space for specifying the work flow to execute when the if condition evaluates to FALSE. For example,
if condition does not equal true, run the task in the Else space.
5. Type the following text into the if text box to state the condition.
($recovery_needed = 1)
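Conceptually, the finished conditional behaves like the following script-style pseudocode. This is a sketch for orientation only; in Designer you build the branches graphically in the next steps rather than writing them as code, and data flows cannot actually be called from a script.
Sample Code
# Pseudocode sketch of the recovery_needed conditional:
if ($recovery_needed = 1)
    run ACDF_SalesOrg;   # Then branch: auto correct load version of the data flow
else
    run DF_SalesOrg;     # Else branch: the normal data flow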
Complete the conditional by specifying the data flows to execute when the condition evaluates to true or false.
Follow these steps with the recovery_needed conditional open in the workspace:
1. Open the Data Flow tab in the Local Object Library and move DF_SalesOrg to the Else portion of the
Conditional Editor using drag and drop.
You use this data flow for the “false” branch of the conditional.
2. Right-click DF_SalesOrg in the Data Flow tab in the Local Object Library and select Replicate.
3. Name the replicated data flow ACDF_SalesOrg.
4. Move ACDF_SalesOrg to the Then area of the conditional using drag and drop.
Auto correct loading ensures that the same row is not duplicated in a target table by matching primary key
fields. See the Reference Guide for more information about how auto correct load works.
This script updates the status_table table with the current timestamp after the work flow in the conditional has
completed. The timestamp indicates a successful execution.
1. With JOB_Recovery opened in the workspace, add the script icon to the right of the recovery_needed
conditional.
2. Name the script UpdateWFStatus.
3. Double-click UpdateWFStatus to open the Script Editor in the workspace.
4. Enter text using the syntax for your RDBMS.
For Microsoft SQL Server and SAP ASE, enter the appropriate update script; a rough sketch appears after these steps.
Connect the GetWFStatus script to the recovery_needed conditional, and then connect the recovery_needed
conditional to the UpdateWFStatus script.
8. Save your work.
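The exact script text for step 4 is not reproduced in this excerpt. As a rough sketch only, an update script for Microsoft SQL Server or SAP ASE could look like the following; the TARGET_DS datastore name and the status_table column names are assumptions rather than values taken from this tutorial.
Sample Code
# Hypothetical sketch: stamp the most recent run with the database server time.
# TARGET_DS and the status_table columns (start_time, end_time) are assumed names.
sql('TARGET_DS', 'update status_table set end_time = getdate() where start_time = (select max(start_time) from status_table)');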
11.6 Verify the job setup
Make sure that job configuration for JOB_Recovery is complete by verifying that the objects are ready.
The JOB_Recovery job contains the following objects:
recovery_needed conditional: Specifies the work flow to execute when the “If” statement is true or false.
UpdateWFStatus script: Updates the status table with the current timestamp after the work flow in the conditional has completed. The timestamp indicates a successful execution.
DF_SalesOrg data flow: The data flow to execute when the conditional equals false.
ACDF_SalesOrg data flow: The data flow to execute when the conditional equals true.
11.7 Executing the job
Execute the job to see how the software functions with the recovery mechanism.
Edit the status table status_table in your DBMS and make sure that the end_time column is NULL or blank.
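If you prefer not to edit the table directly in your DBMS, one way to blank the column is with a short ad hoc script; the following sketch assumes that status_table is reachable through the TARGET_DS datastore used earlier.
Sample Code
# Hypothetical reset: clear end_time so that the next execution takes the recovery branch.
sql('TARGET_DS', 'update status_table set end_time = NULL');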
1. Execute JOB_Recovery.
2. View the Trace messages and the Monitor data to see that the conditional chose ACDF_SalesOrg to
process.
ACDF_SalesOrg is the data flow that runs when the condition is true. The condition is true because there
was no date in the end_time column in the status table. The software concludes that the previous job did
not complete and needs recovery.
3. Now execute the JOB_Recovery again.
4. View the Trace messages and the Monitor data to see that the conditional chose DF_SalesOrg to process.
DF_SalesOrg is the data flow that runs when the condition is false. The condition is false for this job
because the end_time column in the status_table contained the date and time of the last execution of the
job. The software concludes that the previous job completed successfully, and that it does not require
recovery.
Data Services provides automated recovery methods to use as an alternative to the job setup for
JOB_Recovery.
With automatic recovery, Data Services records the result of each successfully completed step in a job. If a job
fails, you can choose to run the job again in recovery mode. During recovery mode, the software retrieves the
results for successfully completed steps and reruns incomplete or failed steps under the same conditions as
the original job.
Data Services has the following automatic recovery settings that you can use to recover jobs:
● Select Enable recovery and Recover from last failed execution in the job Execution Properties dialog.
● Select Recover as a unit in the work flow Properties dialog.
For more information about how to use the automated recovery properties in Data Services, see the Designer
Guide.
In this section, you learned about the recovery mechanisms that you can use to recover jobs that fail or run only
partially. You built a manually recoverable job using local variables, a status script, and a conditional, and you
learned about the automated recovery properties that Data Services provides.
As with most features that you learn about in the tutorial, there is more information about the features that you
can learn in the product documentation such as the Designer Guide and the Reference Guide.
The next three sections are optional. They provide information about advanced features available in Data
Services.
Related Information
Creating local variables [page 122]
Creating the script that determines the status [page 123]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Multiuser Development [page 132]
Extracting SAP application data [page 161]
Real-time jobs [page 190]
12 Multiuser Development
Data Services enables teams of developers working on separate local repositories to store and share their work
in a central repository.
Each individual developer or team works on the application in their unique local repository. Each team uses a
central repository to store the master copy of its application. The central repository preserves all versions of all
objects in the application so you can revert to a previous version if necessary.
You can implement optional security features for central repositories. For more information about
implementing Central Repository security, see the Designer Guide.
We base the exercises for multiuser development on the following use case: Two developers use a Data
Services job to collect data for the HR department. Each developer has their own local repository and they
share a central repository. Throughout the exercises, the developers modify the objects in the job and use the
central repository to store and manage the modified versions of the objects. You can perform these exercises
by acting as both developers, or work with another person with each of you assuming one of the developer
roles.
The central object library acts as source control for managing changes to objects in an environment with
multiple users.
Display the central object library in Designer after you create it. The central object library is a dockable and
movable pane just like the project area and object library.
Through the central object library, authorized users access a library repository that contains versions of
objects saved there from their local repositories. The central object library enables administrators to manage
who can add, view and modify the objects stored in the central repository.
Users must belong to a user group that has permission to perform tasks in the central object library.
Administrators can assign permissions to an entire group of users as well as assign various levels of
permissions to the users in a group.
Example
All users in Group A can get objects from the central object library. Getting an object means you place a
copy of the object in your local repository. If the object exists in your local repository, Data Services updates
your local copy with the most recent changes. User01 in Group A has administrator rights and can add,
check in, edit, and check out objects. Plus User01 can set other user permissions for the objects.
In Designer, users check out an object from the central repository using the central object library. Once
checked out, no other user can work on that object until it is checked back into the central repository. Other
users can change their local copy of the object, but that does not affect the version in the central repository.
In the central object library, several icons at the top of the pane provide quick access to common tasks such as
getting, adding, checking out, and checking in objects.
The top of the pane also displays the current user group to which you belong and the name of the central
repository.
The main area of the pane displays the following:
● A list of the object types based on the lower tab that you choose.
● When you check out an object, a red check mark appears over the object name in the left column.
The main area also contains several columns with information for each object:
● Check out user: The name of the user who currently has the object checked out of the library, or blank
when the object is not checked out.
● Check out repository: The name of the repository that contains the checked-out object, or blank when the
object is not checked out.
● Permission: The authorization type for the group that appears in the Group Permission box at the top.
When you add a new object to the central object library, the current group gets FULL permission to the
object and all other groups get READ permission.
● Latest version: A version number and a timestamp that indicate when the software saved this version of the
object.
● Description: Information about the object that you entered when you added the object to the library.
The central repository retains a history for all objects stored there. Developers use their local repositories to
create, modify, or execute objects such as jobs.
Use the central object library to perform the following tasks:
● Get objects
● Add objects
● Check out objects
● Check in objects
Get objects: Copy objects from the central repository to your local repository. If the object already exists in your local repository, the file from the central repository overwrites the object in your local repository.
Check out objects: The software locks the object when you check it out from the central repository. No one else can work on the object when you have it checked out. Other users can copy a locked object and put it into their local repository, but it is only a copy. Any changes that they make cannot be uploaded to the central repository.
Check in objects: When you check the object back into the central repository, Data Services creates a new version of the object and saves the previous version. Other users can check out the object after you check it in. Other users can also view the object history to view changes that you made to the object.
Add objects: Add objects from your local repository to the central repository any time, as long as the object does not already exist in the central repository.
The central repository works like file collaboration and version control software. The central repository retains a
history for each object. The object history lists all versions of the object. Revert to a previous version of the
object if you want to undo your changes. Before you revert an object to a previous version, make sure that you
are not mistakenly undoing changes from other users.
12.3 Preparation
Your system administrator sets up the multiuser environment to include two repositories and a central
repository.
Create three repositories using the user names and passwords listed in the following table.
User Name Password
central central
user1 user1
user2 user2
Example
For example, with Oracle use the same database for the additional repositories. However, first add the users
listed in the table to the existing database. Make sure that you assign the appropriate access rights for each
user. When you create the additional repositories, Data Services qualifies the names of the repository
tables with these user names.
Example
For Microsoft SQL Server, create a new database for each of the repositories listed in the table. When you
create the user names and passwords, ensure that you specify appropriate server and database roles to
each database.
Consult the Designer Guide and the Management Console Guide for additional details about multiuser
environments.
Follow these steps to configure a central repository. If you created a central repository during installation, use
that central repository for the exercises.
2. From your Windows Start menu, click Programs SAP Data Services 4.2 Data Services Repository
Manager .
4. Select the database type from the Database type dropdown list.
5. Enter the remaining connection information for the central repository based on the database type you
chose.
6. Enter central for both User name and Password.
7. Click Create.
Data Services creates repository tables in the database that you identified.
8. Click Close.
Configure the two local repositories using the Data Services Repository Manager.
Repeat these steps to configure the user1 repository and the user2 repository.
2. From the Start menu, click Programs SAP Data Services 4.2 Data Services Repository Manager .
3. Enter the database connection information for the local repository.
4. Type the following user name and password based on which repository you are creating:
○ For the user1 repository: user name user1, password user1.
○ For the user2 repository: user name user2, password user2.
12.3.3 Associating repositories to your job server
You assign a Job Server to each repository to enable job execution in Data Services.
1. From the Start menu, click Programs SAP Data Services 4.2 Data Services Server Manager .
2. Click Configuration Editor in the Job Server tab.
The Job Server Properties dialog box opens. A list of current associated repositories appears in the
Associated Repositories list, if applicable.
4. Click Add under the Associated Repositories list.
The Repository Information options become active on the right side of the dialog box.
5. Select the appropriate database type for your local repository from the Database type dropdown list.
6. Complete the appropriate connection information for your database type as applicable.
7. Type user1 in both the User name and Password fields.
8. Click Apply.
The software resyncs the job server with the repositories that you just set up.
Assign the central repository named central to user1 and user2 repositories.
1. Start the Designer, enter your log in credentials, and click Log on.
2. Select the repository user1 and click OK.
3. Enter the password for user1.
4. Select Tools Central Repositories .
If a prompt appears asking to overwrite the Job Server option parameters, select Yes.
10. Exit Designer.
11. Perform the same steps to connect user2 to the central repository.
As you perform the tasks in this section, Data Services adds all objects to your local repositories.
Adding objects to the central repository [page 142]
After you import objects to the user1 local repository, add the objects to the central repository for
storage.
Activate the central repository for the user1 and user2 local repositories so that the local repository has central
repository connection information.
The Central Repository Connections option is selected by default in the Designer list.
2. In the Central repository connections list, select Central and click Activate.
Data Services activates a link between the user1 repository and the central repository.
3. Select the option Activate automatically.
This option enables you to move back and forth between user1 and user2 local repositories without
reactivating the connection to the central repository each time.
4. Open the Central Object Library by clicking the Central Object Library icon on the Designer toolbar.
For the rest of the exercises in this section, we assume that you have the Central Object Library available in the
Designer.
Import objects from the multiusertutorial.atl file so the objects are ready to use for the exercises.
Before you can import objects into the local repository, complete the tasks in the section Preparation [page
134].
1. Log in to Data Services and select the user1 repository.
2. In the Local Object Library, right-click in a blank space and click Repository Import From File .
3. Select multiusertutorial.atl located in <LINK_DIR>\Tutorial Files and click Open.
A prompt opens explaining that the chosen ATL file is from an earlier release of Data Services. The older ATL
version does not affect the tutorial exercises. Therefore, click Yes.
Another prompt appears asking if you want to overwrite existing data. Click Yes.
The multiusertutorial.atl file contains a batch job with previously created work flows and data flows.
6. Open the Project tab in the Local Object Library and double-click MU to open the project in the Project
Area.
The MU project contains the following objects:
○ JOB_Employee
○ WF_EmpPos
○ DF_EmpDept
○ DF_EmpLoc
○ WF_PosHireDate
○ DF_PosHireDate
12.4.3 Adding objects to the central repository
After you import objects to the user1 local repository, add the objects to the central repository for storage.
When you add objects to the central repository, add a single object or the object and its dependents. All
projects and objects in the object library can be stored in a central repository.
After importing objects into the user1 local repository, you can add them to the central repository for storage.
Follow these steps to add a single object from the user1 repository to the central repository:
Note
Verify that you are using the correct library by reading the header information.
2. Expand the Flat Files node to display the file names.
A status dialog box opens to indicate that Data Services added the object successfully.
Note
If the object already exists in the central repository, the Add to Central Repository option is not active.
6. Open the Central Object Library and open the Formats tab.
Expand Flat Files to see that the NameDate_Format file is now in the central repository.
You can add an object and its dependents from the local repository to the central repository.
Log in to Data Services Designer, select the user1 repository, and enter user1 for the repository password.
The WF_EmpPos work flow contains the following data flows:
○ DF_EmpDept
○ DF_EmpLoc
3. Right-click WF_EmpPos in the Local Object Library and select Add to Central Repository Object and
dependents .
Instead of choosing the right-click options, you can move objects from your local repository to the central
repository using drag and drop. The Version Control Confirmation dialog box opens. Click Next, and then click
Next again so that all dependent objects are included in the addition.
The comment appears for the object and all dependents when you view the history in the central
repository.
6. Click Continue.
The Output dialog box displays with a message that states “Add object completed”. Close the dialog box.
7. Verify that the Central Object Library contains the WF_EmpPos, DF_EmpDept, and DF_EmpLoc objects in
their respective tabs.
When you include the dependents of the WF_EmpPos, you add other dependent objects, including dependents
of the two data flows DF_EmpDept and DF_EmpLoc.
● Open the Datastores tab in the Central Object Library to see the NAMEDEPT and POSLOC tables.
● Open the Format tab in the Central Object Library to see the flat file objects PosDept_Format,
NamePos_Format, and NameLoc_Format.
Add an object and dependents that has dependent objects that were already added to the central repository
through a different object.
This topic continues from Adding an object and dependents to the central repository [page 143]. We assume
that you are still logged in to the user1 repository in Designer.
2. Right-click WF_PosHireDate and select Add to Central Repository Objects and dependents .
The Add to Central Repository Alert dialog box appears listing the objects that already exist in the central
repository:
○ DW_DS
○ NameDate_Format
○ NamePos_Format
○ POSHDATE(DW_DS.USER1)
3. Click Yes to continue.
It is okay to continue with the process because you haven't changed the existing objects yet.
4. Enter a comment and select Apply comments to all objects.
5. Click Continue.
6. Close the Output dialog box.
The central repository now contains all objects in the user1 local repository. Developers who have access to the
central repository can check out, check in, label, and get those objects.
When you check out an object from the central repository, it becomes unavailable for other users to change.
You can check out a single object or check out an object with dependents.
● If you check out a single object such as WF_EmpPos, no other user can change it. However, the dependent
object DF_EmpDept remains in the central repository, and other users can check it out.
● If you check out WF_EmpPos and the dependent DF_EmpDept, no one else can check out those objects.
Change the objects and save your changes locally, and then check the objects with your changes back into
the central repository. The repository creates a new version of the objects that include your changes.
After you make your changes and check the changed objects back into the central repository, other users can
view your changes, and check out the objects to make additional changes.
Perform the following steps while you are logged in to the user1 repository.
1. Open the Central Object Library and open the Work Flow tab.
A warning appears telling you that checking out WF_EmpPos does not include the datastores. To include the
datastores in the checkout, use the Check Out with Filtering check out option.
Note
The software does not include the datastore DW_DS in the checkout as the message states. However,
the tables NAMEDEPT and POSLOC, which are listed under the Tables node of DW_DS, are included in the
dependent objects that are checked out.
3. Click Yes to continue.
Alternatively, you can select the object in the Central Object Library and then click the Check out object and dependents icon on the toolbar.
Data Services copies the most recent version of WF_EmpPos and its dependent objects from the central
repository into the user1 local repository. A red check mark appears on the icon for objects that are checked
out in both the local and central repositories.
User1 can modify the WF_EmpPos work flow and the checked out dependents in the local repository while it is
checked out of the central repository.
12.4.5 Checking in objects to the central repository
You can check in an object by itself or check it in along with all associated dependent objects. When an object
and its dependents are checked out and you check in the single object without its dependents, the dependent
objects remain checked out.
After you change an existing object, check it into the central repository so that other users can access it.
Data Services copies the object from the user1 local repository to the central repository and removes the
check-out marks.
6. In the Central Object Library window, right-click DF_EmpLoc and click Show History.
The History dialog box contains the user name, date, action, and version number for each time the file was
checked out and checked back in. The dialog box also lists the comments that the user included when they
checked the object into the central repository. This information is helpful for many reasons, including:
○ Providing information to the next developer who checks out the object.
○ Helping you decide what version to choose when you want to roll back to an older version.
○ Viewing the difference between versions.
For more information about viewing history, see the Designer Guide.
7. After you have reviewed the history, click Close.
Set up the environment for user2 so that you can perform the remaining tasks in Multiuser development.
Log into SAP Data Services Designer and choose the user2 repository. Enter user2 for the password.
Set up the user2 developer environment in the same way that you set up the environment for user1. The
following is a summary of the steps:
Related Information
Activating a connection to the central repository [page 139]
Undo a checkout to restore the object in the central repository to the condition it was in when you checked it out.
In this exercise, you check out DF_PosHireDate from the central repository, modify it, and save your changes
to your local repository. Then you undo the checkout of DF_PosHireDate from the central repository.
When you undo a checkout, you restore the object in the central repository to the way it was when you checked
it out. SAP Data Services does not save changes or create a new version in the central repository. Your local
repository, however, retains the changes that you made. To undo changes in your local repository, “get” the
object from the central repository after you undo the checkout. The software overwrites your local copy and
replaces it with the restored copy of the object in the central repository.
Undo checkout works for both a single object as well as objects with dependents.
12.4.7.1 Checking out and modifying an object
Check out the DF_PosHireDate and modify the output mapping in the query.
The DF_PosHireDate object appears with a red checkmark in both the Local Object Library and the
Central Object Library indicating that it is checked out.
4. In the local object library, double-click DF_PosHireDate to open it in the workspace.
5. Double-click the query in the data flow to open the Query Editor.
6. In the Schema Out pane, right-click LName and click Cut.
Undo an object checkout when you do not want to save your changes and you want to revert the object to the
content it had when you checked it out.
1. Open the Data Flow tab in the Central Object Library and expand Data Flows.
Data Services removes the check-out symbol from DF_PosHireDate in the Local and Central Object Library,
without saving your changes in the central repository. The object in your local repository still has the output
mapping change.
Compare two objects, one from the local repository, and the same object from the central repository to view
the differences between the objects.
Make sure that you have followed all of the steps in the Undo checkout section.
1. Expand the Data Flow tab in the Local Object Library and expand Data Flows.
The Difference Viewer opens in the workspace. It shows the local repository contents for DF_PosHireDate
on the left and the central repository contents for DF_PosHireDate on the right.
3. Examine the data in the Difference Viewer.
The Difference Viewer helps you find the differences between the local object and the object in the central
repository.
Expand the Query node and then expand the Query table icon. The Difference Viewer indicates that the
LName column was removed in the local repository on the left, but it was added back in the central
repository. The text is in green, and the green icon appears signifying that there was an insertion.
The Difference Viewer shows the difference between an object in the local repository and the central repository.
For example, when you compare the local and central versions of DF_PosHireDate, the local repository contents
appear in the left pane and the central repository contents appear in the right pane.
The Difference Viewer contains a status line at the bottom of the dialog box. The status line indicates the
number of differences; if there are no differences, the status line indicates Difference [ ] of 0. To the left of the
status line is a key to the colored status icons.
Deleted: The item does not appear in the object in the right pane.
Changed: The differences between the items are highlighted in blue (the default) text.
Inserted: The item has been added to the object in the right pane.
Consolidated: The items within the line have differences. Expand the item by clicking its plus sign to view the differences.
Example
For example, you may need to use the checkout without replacement option when you change an object in
your local repository before you check it out from the central repository.
The option prevents Data Services from overwriting the changes that you made in your local copy.
After you have checked out the object from the central repository the object in both the central and local
repository has a red check out icon. But the local copy is not replaced with the version in the central repository.
You can then check your local version into the central repository so that it is updated with your changes.
Do not use the check out without replacement option if another user checked out the file from the central
repository, made changes, and then checked in the changes.
Example
For example, you make changes to your local copy of Object-A without realizing you are working in your
local copy.
Meanwhile, another developer checks out Object-A from the central repository, makes extensive changes
and checks it back in to the central repository.
You finally remember to check out Object-A from the central repository. Instead of checking the object
history, you assume that you were the last developer to work in the master of Object-A, so you check
Object-A out of the central repository using the without replacement option. When you check your local
version of Object-A into the central repository, all changes that the other developer made are overwritten.
Caution
Before you use the Object without replacement option in a multiuser environment, check the history of the
object in the central repository. Make sure that you are the last person who worked on the object.
In the next exercise, user2 uses the check out option Object without replacement to be able to update the
master version in the central repository with changes from the version in the local repository.
Related Information
Undo checkout [page 149]
Comparing objects [page 150]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]
Use the checkout option without replacement to check out an object from the central repository without
overwriting the local copy that has changed.
1. Open the Data Flow tab in the Local Object Library and expand Data Flows.
2. Double-click DF_EmpLoc to open it in the workspace.
3. Double-click the query in the workspace to open the Query Editor.
4. Right-click FName in the Schema Out pane and click Cut.
5. Save your work.
6. Open the Data Flow tab of the Central Object Library and expand Data Flows.
The software marks the DF_EmpLoc object in the Central Object Library and the Local Object Library as
checked out. The software does not overwrite the object in the Local Object Library, but preserves the
object as is.
Check in the local version of DF_EmpLoc to update the central repository version to include your changes.
These steps continue from the topic Checking out an object without replacement [page 154].
1. In the Central Object Library, right-click DF_EmpLoc and select Check in Object .
2. Type a comment in the Comment dialog box and click Continue.
Now the central repository contains a third version of DF_EmpLoc. This version is the same as the copy of
DF_EmpLoc in the user2 local object library.
3. Right-click DF_EmpLoc in your Local Object Library and select Compare Object to central .
The Difference Viewer should show the two objects as the same.
12.4.9.3 Checking in DF_EmpDept and WF_EmpPos
When you get an object from the central repository, you are making a copy of a specific version for your local
repository.
You might want to copy a specific version of an object from the central repository into your local repository.
Getting objects allows you to select a version other than the most recent version to copy. When you get an
object, you replace the version in your local repository with the version that you copied from the central
repository. The object is not checked out of the central repository, and it is still available for others to lock and
check out.
Obtain a copy of the latest version of an object from the central repository.
Perform the following steps in Designer. You can use either the user1 or user2 repository.
7. Right-click DF_EmpLoc and select Get Latest Version Object from the dropdown menu.
Data Services copies the most recent version of the data flow from the central repository to the local
repository.
8. Open the DF_EmpLoc data flow in the Local Object Library.
9. Open the query to open the Query Editor.
10. Notice that there are now three columns in the Schema Out pane: LName, Pos, and Loc.
The latest version of DF_EmpLoc from the central repository overwrites the previous copy in the local
repository.
11. Click the Back arrow in the icon menu bar to return to the data flow.
Obtain a copy of a select previous version of an object from the central repository.
Perform the following steps in Designer. You can use either the user1 or user2 repository.
When you get a previous version of an object, you get the object but not its dependents.
Version 1 of DF_EmpLoc is the version that you first added to the central repository at the beginning of this
section. The software overwrote the altered version in your local repository with Version 1 from the central
repository.
Use filtering to select the dependent objects to include, exclude, or replace when you add, check out, or check
in objects in a central repository.
When multiple users work on an application, some objects can contain repository-specific information. For
example, datastores and database tables might refer to a particular database connection unique to a user or a
phase of development. After you check out an object with filtering, you can change or replace the following
configurations:
Related Information
Checking out objects with filtering [page 158]
Deleting objects [page 158]
The Version Control Confirmation dialog box opens with a list of dependent object types. Expand each node
to see a list of dependent objects of that object type.
4. Select NamePos_Format under Flat Files.
5. Select Exclude from the Target status dropdown list.
The word “excluded” appears next to NamePos_Format in the Action column. Data Services excludes the
flat file NamePos_Format from the dependent objects to be checked out.
6. Click Next.
The Datastore Options dialog box opens listing the datastores that are used by NamePos_Format.
7. Click Finish.
You may see a Check Out Alert dialog box stating that there are some dependent objects checked out by other
users. For example, if user1 checked in the WF_EmpPos workflow to the central repository without selecting to
include the dependent objects, the dependent objects could still be checked out. The Check Out Alert lists the
reasons why each listed object cannot be checked out. For example, “The object is checked out by the
repository: user1”. This reason provides you with the information to decide what to do next:
● Select Yes to get copies of the latest versions of the selected objects into your repository.
● Select No to check out the objects that are not already checked out by another user.
● Select Cancel to cancel the checkout.
You can delete objects from the local or the central repository.
When you delete objects from the central repository, dependent objects and objects in your local repositories
are not always deleted.
When you delete objects from the central repository, you delete only the selected object and all versions of
it; you do not delete any dependent objects.
7. Open the Work Flows tab in the local object library to verify that WF_PosHireDate was not deleted from the
user2 local object library.
When you delete an object from a central repository, it is not automatically deleted from the connected
local repositories.
When you delete an object from a local repository, it is not deleted from the central repository.
When you delete an object from a local repository, the software does not delete it from the central
repository. If you delete an object from your local repository by accident, recover the object by selecting to
“Get” the object from the central repository, if it exists in your central repository.
4. Open the Central Object Library.
5. Click the Refresh icon on the object library toolbar.
6. Open the Data Flows tab in the Central Object Library and verify that DF_EmpDept was not deleted from
the central repository.
7. Exit Data Services.
12.5 Summary
For more information about the topics covered in this section, see the Designer Guide.
13 Extracting SAP application data
In this section you learn how to use the SAP Data Services objects that extract data from SAP applications.
To extract data from your SAP applications using Data Services, you use an SAP application datastore, an ABAP data flow, and a data transport.
Note
To perform the exercises in this section, your implementation of Data Services must be able to connect to
an SAP remote server. Ask your administrator for details.
Note
The structure of standard SAP tables varies between versions. Therefore, the sample tables for these
exercises may not work with all versions of SAP applications. If the exercises in this section are not working
as documented, it may be because of the versions of your SAP applications.
In this section, we work with the customer dimension, material dimension, and sales fact data sources of the star schema.
Importing metadata [page 163]
Import SAP application tables into the new datastore SAP_DS for the exercises in this section.
SAP applications are the main building blocks of the SAP solution portfolios for industries.
SAP applications provide the software foundation with which organizations address their business issues. SAP
delivers several types of applications. Ask your system administrator about the types of SAP applications that
your organization uses.
Related Information
Repopulating the Sales Fact table [page 179]
Summary [page 188]
Use the SAP application datastore to connect Data Services to the SAP application server.
Log on to Designer and to the tutorial repository. Do not use the user1, user2, or central repositories that you
created for the multiuser exercises.
The new datastore appears in the Datastore tab of the Local Object Library.
Import SAP application tables into the new datastore SAP_DS for the exercises in this section.
Create and configure the SAP application datastore named SAP_DS before you import the metadata.
2. Right-click SAP_DS and click Import by Name.
Import the following tables:
○ MAKT
○ MARA
○ VBAK
○ VBUP
The software adds the tables to the Datastores tab of the Local Object Library under Tables.
13.4 Repopulate the customer dimension table
Repopulate the customer dimension table by configuring a data flow that outputs SAP application data to a
datastore table.
Configure a Data Services job that includes a work flow and an ABAP data flow. The ABAP data flow extracts
SAP data and loads it into the customer dimension table.
To configure the Data Services job so that it communicates with the SAP application, configure an ABAP data
flow. The ABAP data flow contains Data Services-supplied commands, so you do not need to know ABAP.
For more information about configuring an ABAP data flow, see the Supplement for SAP.
1. Adding the SAP_CustDim job, work flow, and data flow [page 166]
The job for repopulating the customer dimension table includes a work flow and a data flow.
2. Adding ABAP data flow to Customer Dimension job [page 166]
Add the ABAP data flow to JOB_SAP_CustDim and set options in the ABAP data flow.
3. Defining the DF_SAP_CustDim ABAP data flow [page 167]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_CustDim job [page 171]
Validate and then execute the JOB_SAP_CustDim job.
5. ABAP job execution errors [page 172]
There are some common ABAP job execution errors that have solutions.
13.4.1 Adding the SAP_CustDim job, work flow, and data flow
The job for repopulating the customer dimension table includes a work flow and a data flow.
Next task: Adding ABAP data flow to Customer Dimension job [page 166]
The SAP_CustDim data flow needs an ABAP data flow to extract SAP application data.
The ABAP data flow interacts directly with the SAP application database layer. Because the database layer is
complex, Data Services accesses it using ABAP code.
Data Services executes the SAP_CustDim batch job by generating an ABAP program, uploading it to the SAP application server, and running it there.
Add the ABAP data flow to JOB_SAP_CustDim and set options in the ABAP data flow.
2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.
Generated ABAP file name: Specify a file name for the generated ABAP code. The software stores the file in the ABAP directory that you specified in the SAP_DS datastore.
ABAP program name: Specify the name for the ABAP program that the Data Services job uploads to the SAP application. Adhere to the following name requirements:
○ Begins with the letter Y or Z
○ Cannot exceed 8 characters
Job name: Type SAP_CustDim. The name is for the job that runs in the SAP application.
4. Open the General tab and name the data flow DF_SAP_CustDim.
5. Click OK.
6. Open the Datastores tab in the Local Object Library and expand Target_DS Tables .
7. Move the CUST_DIM table onto the workspace using drag and drop.
Previous task: Adding the SAP_CustDim job, work flow, and data flow [page 166]
Next task: Defining the DF_SAP_CustDim ABAP data flow [page 167]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
Perform the following group of tasks to define the ABAP data flow:
2. Defining the query [page 169]
Complete the output schema in the query to define the data to extract from the SAP application.
3. Defining the details of the data transport [page 170]
A data transport defines a staging file for the data that is extracted from the SAP application.
4. Setting the execution order [page 171]
Set the order of execution by joining the objects in the data flow.
Previous task: Adding ABAP data flow to Customer Dimension job [page 166]
Add the necessary objects to complete the DF_SAP_CustDim ABAP data flow.
2. Open the Datastores tab in the Local Object Library and expand SAP_DS Tables .
3. Move the KNA1 table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Add a query from the tool palette to the right of the KNA1 table in the workspace.
6. Add a data transport from the tool palette to the right of the query in the workspace.
7. Connect the icons in the data flow to indicate the flow of data from the source table through the query to the data transport.
Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]
13.4.3.2 Defining the query
Complete the output schema in the query to define the data to extract from the SAP application.
1. Open the query in the workspace to open the Query Editor dialog box.
2. Expand the KNA1 table in the Schema In pane to see the columns.
3. Click the column head (above the table name) to sort the list in alphabetical order.
4. Map the following seven source columns to the target schema. Use Ctrl + Click to select multiple
columns and drag them to the output schema.
KUKLA
KUNNR
NAME1
ORT01
PSTLZ
REGIO
STRAS
The icon next to the source column changes to an arrow to indicate that the column has been mapped. The
Mapping tab in the lower pane of the Query Editor shows the mapping relationships.
5. Rename the target columns and verify or change the data types and descriptions using the information in
the following table. To change these settings, right-click the column name and select Properties from the
dropdown list.
Note
Microsoft SQL Server and Sybase ASE DBMSs require that you specify the columns in the order shown
in the following table and not alphabetically.
6. Click the Back arrow icon in the icon toolbar to return to the data flow and to close the Query Editor.
7. Save your work.
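For reference, the expression that the Mapping tab displays for each column you mapped in step 4 is simply the qualified source column. For example, the output column sourced from the customer number would show an expression like the following; the exact target column name depends on how you rename it in step 5.
Sample Code
# Mapping expression for the output column sourced from KNA1.KUNNR.
KNA1.KUNNR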
Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]
Previous task: Adding objects to DF_SAP_CustDim ABAP data flow [page 168]
Next task: Defining the details of the data transport [page 170]
A data transport defines a staging file for the data that is extracted from the SAP application.
4. Select Replace File.
Replace File truncates this file each time the data flow is executed.
5. Click the Back icon in the icon toolbar to return to the data flow.
6. Save your work.
Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]
Set the order of execution by joining the objects in the data flow.
The data flow contains the ABAP data flow and the target table named Cust_Dim.
2. Connect the ABAP data flow to the target table.
3. Save your work.
Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]
Previous task: Defining the details of the data transport [page 170]
1. With the job selected in the Project Area, click the Validate All icon on the icon toolbar.
If your design contains errors, a message appears describing the error. The software requires that you
resolve the error before you can proceed.
If the job has warning messages, you can continue. Warnings do not prohibit job execution.
If your design does not have errors, the following message appears:
2. Right-click the job name in the project area and click Execute.
If you have not saved your work, a save dialog box appears. Save your work and continue. The Execution
Properties dialog box opens.
3. Leave the default selections and click OK.
After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the cust_dim table in your DBMS.
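If you want a quick check from your query tool, a simple count such as the following works in most DBMSs; the exact table name and schema qualification depend on how you created the target database, so adjust as needed:
SELECT COUNT(*) FROM CUST_DIM;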
Previous task: Defining the DF_SAP_CustDim ABAP data flow [page 167]
There are some common ABAP job execution errors that have solutions.
The following table lists a few common ABAP job execution errors, their probable causes, and how to fix them.
Error: Cannot open ABAP output file
Probable cause: Lack of permissions for the Job Server service account.
Solution: 1. Open the Services Control Panel. 2. Double-click the Data Services service and select a user account that has permissions to the working folder on the SAP server.

Error: Cannot create ABAP output file
Probable cause: Working directory on the SAP server is specified incorrectly.
Solution: Open the Datastores tab in the Local Object Library and correct the working directory specified for the SAP server in the SAP_DS datastore definition.
If you have other ABAP errors, read about debugging and testing ABAP jobs in the Supplement for SAP.
13.5 Repopulating the material dimension table
For this exercise, you create a data flow that is similar to the data flow that you created to repopulate the
customer dimension table. However, in this process, the data for the material dimension table is the result of a
join between two SAP application tables.
1. Adding the material dimension job, work flow, and data flow [page 173]
Create the material dimension job and add a work flow and a data flow.
2. Adding ABAP data flow to Material Dimension job [page 174]
Add the ABAP data flow to JOB_SAP_MtrlDim and set options in the ABAP data flow.
3. Defining the DF_SAP_MtrlDim ABAP data flow [page 175]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_MtrlDim job [page 178]
Validate and then execute the JOB_SAP_MtrlDim job.
Related Information
13.5.1 Adding the material dimension job, work flow, and data
flow
Create the material dimension job and add a work flow and a data flow.
Log on to Designer and open the Class Exercises project in the Project Area.
Task overview: Repopulating the material dimension table [page 173]
Next task: Adding ABAP data flow to Material Dimension job [page 174]
Add the ABAP data flow to JOB_SAP_MtrlDim and set options in the ABAP data flow.
2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.
Generated ABAP file name: Specify a file name for the generated ABAP code. The software stores the file in the ABAP directory that you specified in the SAP_DS datastore.
ABAP program name: Specify the name for the ABAP program that the Data Services job uploads to the SAP application. Adhere to the following name requirements:
○ Begins with the letter Y or Z
○ Cannot exceed 8 characters
Job name: Type SAP_MtrlDim. The name is for the job that runs in the SAP application.
4. Open the General tab and name the data flow DF_SAP_MtrlDim.
5. Click OK.
6. Open the Datastores tab in the Local Object Library and expand Target_DS > Tables.
7. Move the MTRL_DIM table to the workspace using drag and drop.
Previous task: Adding the material dimension job, work flow, and data flow [page 173]
Next task: Defining the DF_SAP_MtrlDim ABAP data flow [page 175]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
Perform the following group of tasks to define the ABAP data flow:
Defining the query with a join between source tables [page 176]
Set up a join between the two source tables and complete the output schema to define the data to
extract from the SAP application
Previous task: Adding ABAP data flow to Material Dimension job [page 174]
13.5.3.1 Adding objects to the DF_SAP_MtrlDim ABAP data
flow
Add the necessary objects to complete the DF_SAP_MtrlDim ABAP data flow.
2. Open the Datastores tab in the Local Object Library and expand SAP_DS > Tables.
3. Move the MARA table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Move the MAKT table to the workspace using drag and drop. Position it under the MARA table.
6. Select Make Source.
7. Add a query from the tool palette to the right of the tables in the workspace.
8. Add a data transport from the tool palette to the right of the query in the workspace.
9. Connect the icons in the data flow to indicate the flow of data as shown.
Set up a join between the two source tables and complete the output schema to define the data to extract from
the SAP application
1. Double-click the query in the workspace to open the Query Editor dialog box.
2. Open the FROM tab in the lower pane.
3. In the Join pairs group, select MARA from the Left dropdown list.
4. Select MAKT from the Right dropdown list.
The source rows must meet the requirements of the condition to be passed to the target, including the join
relationship between sources. The MARA and MAKT tables are related by a common column named
MATNR. The MATNR column contains the material number and is the primary key between the two tables.
(MARA.MATNR = MAKT.MATNR)
A second condition filters the material descriptions by language so that only the records with English material descriptions are output to the target (a sketch of the combined condition appears after this procedure).
7. Click OK to close the Smart Editor.
8. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop.
MARA: MATNR, MTART, MBRSH, MATKL
MAKT: MAKTX
9. Rename the target columns, verify data types, and add descriptions based on the information in the
following table.
10. Click the Back arrow in the icon toolbar to return to the data flow.
11. Save your work.
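For reference, the condition entered in the Smart Editor might look like the following sketch. The join on MATNR comes from the steps above; the language filter on the MAKT language key column (SPRAS) with the value 'E' for English is an assumption about how the exercise restricts the descriptions, so use whatever filter your exercise specifies:
MARA.MATNR = MAKT.MATNR AND MAKT.SPRAS = 'E'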
13.5.3.3 Defining the details of the data transport
A data transport defines a staging file for the data that is extracted from the SAP application.
Set the order of execution by joining the objects in the data flow.
The data flow contains the ABAP data flow and the target table named Mtrl_Dim.
2. Connect the ABAP data flow to the target table.
3. Save your work.
1. With JOB_SAP_MtrlDim selected in the Project Area, click the Validate All icon on the icon toolbar.
If your design contains errors, a message appears describing the error. You must resolve the error before you can proceed.
If your design contains warnings, a warning message appears. Warnings do not prohibit job execution.
If your design does not have errors, the following message appears:
2. Right-click the job name in the Project Area and click the Execute icon in the toolbar.
If you have not saved your work, a save dialog box appears. Save your work and continue.
After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the Mtrl_Dim table in your DBMS.
Previous task: Defining the DF_SAP_MtrlDim ABAP data flow [page 175]
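13.6 Repopulating the Sales Fact table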
Repopulate the Sales Fact table from two SAP application sources.
This task extracts data from two source tables, and it extracts a single column from a third table using a lookup
function.
1. Adding the Sales Fact job, work flow, and data flow [page 180]
Create the Sales Fact job and add a work flow and a data flow.
2. Adding ABAP data flow to Sales Fact job [page 180]
Add the ABAP data flow to JOB_SAP_SalesFact and set options in the ABAP data flow.
3. Defining the DF_ABAP_SalesFact ABAP data flow [page 181]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_SalesFact job [page 187]
Validate and then execute the JOB_SAP_SalesFact job.
Related Information
Repopulating the material dimension table [page 173]
Summary [page 188]
Adding the Sales Fact job, work flow, and data flow [page 180]
13.6.1 Adding the Sales Fact job, work flow, and data flow
Create the Sales Fact job and add a work flow and a data flow.
Log on to Designer and open the Class Exercises project in the Project Area.
Next task: Adding ABAP data flow to Sales Fact job [page 180]
Add the ABAP data flow to JOB_SAP_SalesFact and set options in the ABAP data flow.
2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.
Generated ABAP file name: Specify a file name for the generated ABAP code. The software stores the file in the ABAP directory that you specified in the SAP_DS datastore.
ABAP program name: Specify a name for the ABAP program that the Data Services job uploads to the SAP application. Adhere to the following naming requirements:
○ Begins with the letter Y or Z
○ Cannot exceed 8 characters
Job name: Type SAP_SalesFact. The name is for the job that runs in the SAP application.
4. Open the General tab and name the ABAP data flow DF_ABAP_SalesFact.
5. Click OK.
6. Open the Datastores tab in the Local Object Library and expand Target_DS > Tables.
7. Move the SALES_FACT table to the workspace using drag and drop.
Previous task: Adding the Sales Fact job, work flow, and data flow [page 180]
Next task: Defining the DF_ABAP_SalesFact ABAP data flow [page 181]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
Perform the following group of tasks to define the ABAP data flow:
Defining the query with a join between source tables [page 183]
Set up a join between the two source tables and complete the output schema to define the data to
extract from the SAP application
Defining the lookup function to add output column with a value from another table [page 184]
Use a lookup function to extract data from a table that is not defined in the job.
Task overview: Repopulating the Sales Fact table [page 179]
Previous task: Adding ABAP data flow to Sales Fact job [page 180]
Related Information
Add the necessary objects to complete the DF_ABAP_SalesFact ABAP data flow.
2. Open the Datastores tab in the Local Object Library and expand SAP_DS > Tables.
3. Move the VBAP table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Move the VBAK table to the workspace using drag and drop. Place it under the VBAP table.
6. Select Make Source.
7. Add a query from the tool palette to the right of the tables in the workspace.
8. Add a data transport from the tool palette to the right of the query in the workspace.
9. Connect the icons in the data flow to indicate the flow of data as shown.
10. Save your work.
Defining the query with a join between source tables [page 183]
Set up a join between the two source tables and complete the output schema to define the data to extract from
the SAP application
VBAP.VBELN = VBAK.VBELN
A second condition filters the sales orders by date so that only the sales orders from one year are brought into the target table (a sketch of the combined condition appears after this procedure).
7. Click OK.
8. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop:
VBAP: VBELN, POSNR, MATNR, NETWR
VBAK: KVGR1, AUDAT
9. Rename the target columns, verify data types, and add descriptions as shown in the following table:
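For reference, the condition entered in the Smart Editor in the earlier steps might look like the following sketch. The join on VBELN comes from the procedure above; the date range on VBAK.AUDAT is shown with placeholder values because the exact dates used by the tutorial data are not listed here:
VBAP.VBELN = VBAK.VBELN AND VBAK.AUDAT >= <first day of the year> AND VBAK.AUDAT <= <last day of the year>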
Use a lookup function to extract data from a table that is not defined in the job.
Name: ord_status
Length: 1
4. Click OK.
Restriction
The LOOKUP function is case sensitive. Enter the values using the case as listed in the following table.
Type the entries in the text boxes instead of using the dropdown arrow or the Browse button.
Result column: GBSTA (the column from the VBUP table that contains the value for the target column ord_status)
Default value: 'none' (the value used if the lookup isn't successful; use single quotes as shown)
Cache spec: 'NO_CACHE' (specifies whether to cache the table; use single quotes as shown)
Note
The value for the ord_status column comes from the GBSTA column in the VBUP table. The value in
the GBSTA column indicates the status of a specific item in the sales document. The software needs
both an order number and an item number to determine the correct value to extract from the table.
The function editor provides fields for only one dependency, which you defined using the values from
the table.
The Lookup function can process any number of comparison value pairs. To include the dependency on the item number in the Lookup expression, add the item number column from the translation table and the item number column from the input (source) schema as follows (a sketch of the complete expression appears after this procedure):
POSNR, VBAP.POSNR
13. Click the Back arrow in the icon toolbar to close the Query Editor.
14. Save your work.
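For reference, the completed mapping expression for ord_status might look like the following sketch. It assumes the classic lookup function signature (translate table, result column, default value, cache specification, then pairs of compare column and expression) and that the VBUP table is reached through the SAP_DS datastore; the exact table qualification in your editor may differ:
lookup(SAP_DS..VBUP, GBSTA, 'none', 'NO_CACHE', VBELN, VBAP.VBELN, POSNR, VBAP.POSNR)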
13.6.3.4 Defining the details of the data transport
A data transport defines a staging file for the data that is extracted from the SAP application.
Set the order of execution by joining the objects in the data flow.
1. With JOB_SAP_SalesFact selected in the Project Area, click the Validate All icon in the toolbar.
If your design contains errors, a message appears describing the error. You must resolve the error before you can proceed.
If your design contains warnings, a warning message appears. Warnings do not prohibit job execution.
If your design does not have errors, the following message appears:
2. Right-click JOB_SAP_SalesFact in the Project Area and click the Execute icon in the toolbar.
If you have not saved your work, a save dialog box appears. Save your work and continue.
After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the Sales_Fact table in your DBMS.
Previous task: Defining the DF_ABAP_SalesFact ABAP data flow [page 181]
13.7 Summary
This section showed you how to use SAP applications as a source by creating a datastore that contains
connection information to a remote server.
In this section you used some advanced features in the software to work with SAP application data:
● Use ABAP code in an ABAP data flow to define the data to extract from SAP applications.
● Use a data transport object to carry data from the SAP application into Data Services.
● Use a lookup function and additional lookup values in the mapping expression to obtain data from a source
not included in the job.
For more information about using SAP application data in Data Services, see the Supplement for SAP.
Related Information
Repopulate the customer dimension table [page 165]
Repopulating the material dimension table [page 173]
Repopulating the Sales Fact table [page 179]
14 Real-time jobs
In this section you execute a real-time job to see the basic functionality.
For real-time jobs, Data Services receives requests from ERP systems and Web applications and sends replies
immediately after receiving the requested data. Requested data comes from a data cache or a second
application. You define operations for processing on-demand messages by building real-time jobs in the
Designer.
Real-time jobs involve the following:
● A single real-time data flow (RTDF) that runs until explicitly stopped
● Requests in XML message format and SAP applications using IDoc format
Note
The tutorial exercise focuses on a simple XML-based example that you import.
For more information about real-time jobs, see the Reference Guide.
1. Copy the following files from <LINK_DIR>\ConnectivityTest and paste them into your temporary
directory. For example, C:\temp:
○ TestOut.dtd
○ TestIn.dtd
○ TestIn.xml
○ ClientTest.txt
2. Copy the file ClientTest.exe from <LINK_DIR>\bin and paste it to your temporary directory.
Note
ClientTest.exe uses DLLs in your <LINK_DIR>\bin directory. If you encounter problems, ensure
that you have included <LINK_DIR>\bin in the Windows environment variables path statement.
3. Log on to SAP Data Services Designer and the tutorial repository.
4. Right-click in a blank space in the Local Object Library and select Repository > Import From File.
Run a real-time job that transforms an input string of Hello World to World Hello.
Use the files that you imported previously to create a real-time job.
The workspace contains one XML message source named TestIn (XML request) and one XML message
target named TestOut (XML reply).
6. Double-click TestIn to open it. Verify that the Test file option in the Source tab is C:\temp\TestIn.XML.
7. In Windows Explorer, open Testin.XML in your temporary directory. For example, C:\temp\TestIn.XML.
Confirm that it contains the following message:
<test>
<Input_string>Hello World</Input_string>
</test>
8. Back in Designer, double-click TestOut in the workspace to open it. Verify that the Test file option in the
Target tab is C:\temp\TestOut.XML.
9. Execute the job Job_TestConnectivity
10. Click Yes to save all changes if applicable.
11. Accept the default settings in the Execution Properties dialog box and click OK.
12. When the job completes, open Windows Explorer and open C:\temp\TestOut.xml. Verify that the file
contains the following text:
<test>
<output_string>World Hello</output_string>
</test>
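The imported job already contains the mapping that produces this result, so you do not need to build anything. Purely as an illustration of how such a swap could be written in a query mapping, an expression along these lines would reverse a two-word input; it assumes the input column is named Input_string and uses the word function to pick out each word, so treat it as a sketch rather than the exact expression in the imported job:
word(TestIn.Input_string, 2) || ' ' || word(TestIn.Input_string, 1)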