
PUBLIC

SAP Data Services


Document Version: 4.2 Support Package 13 (14.2.13.0) – 2020-03-31

Tutorial
© 2020 SAP SE or an SAP affiliate company. All rights reserved.



Content

1 Documentation changes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Introduction to the tutorial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9


2.1 Audience and assumptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Tutorial objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Product overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Product components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
The Designer user interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12
About objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Preparation for this tutorial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Tasks required to prepare for the tutorial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Tutorial structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Exiting the tutorial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Resuming the tutorial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .26

3 Source and target metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28


3.1 Logging in to the Designer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 About datastores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Defining a datastore for the source (ODS) database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Defining a datastore for the target database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Importing metadata. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Importing metadata for ODS source tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Importing metadata for target tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 Defining a file format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .34
3.5 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 Populate the Sales Organization dimension from a flat file. . . . . . . . . . . . . . . . . . . . . . . . . . . . 37


4.1 Creating a new project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Adding a new job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 Adding a workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Adding a data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Define the data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41
Adding objects to the DF_SalesOrg data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Defining the order of steps in a data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Configuring the query transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.6 Validating the DF_SalesOrg data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.7 Addressing warnings and errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.8 Saving the project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46

4.9 Ensuring that the Job Server is running. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46
4.10 Executing the job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.11 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5 Populate the Time dimension table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51


5.1 Opening the Class_Exercises project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52
5.2 Adding a job and data flow to the project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Adding the components of the time data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.4 Defining the flow of data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.5 Defining the output of the Date_Generation transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.6 Defining the output of the query. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.7 Saving and executing the job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.8 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6 Populate the Customer dimension table from a relational table. . . . . . . . . . . . . . . . . . . . . . . . 58


6.1 Adding the CustDim job and workflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .59
6.2 Adding the CustDim data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.3 Define the data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Adding objects to a data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Configuring the query transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.4 Validating the CustDim data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.5 Executing the CustDim job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .63
6.6 The interactive debugger. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .63
Setting a breakpoint in a data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Debugging Job_CustDim with interactive debugger. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Setting a breakpoint condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.7 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

7 Populate the Material Dimension from an XML File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68


7.1 Nested data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.2 Adding MtrlDim job, workflow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.3 Importing a document type definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.4 Define the MtrlDim data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Adding objects to DF_MtrlDim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Configuring the qryunnest query. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.5 Validating that the MtrlDim data flow has been constructed properly. . . . . . . . . . . . . . . . . . . . . . . . 75
7.6 Executing the MtrlDim job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.7 Leveraging the XML_Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .76
Setting up a job and data flow that uses the XML_Pipeline transform. . . . . . . . . . . . . . . . . . . . . 77
Configuring the XML_Pipeline and Query_Pipeline transforms. . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.8 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8 Populate the Sales Fact Table from Multiple Relational Tables. . . . . . . . . . . . . . . . . . . . . . . . . 81

8.1 Adding the SalesFact job, work flow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.2 Creating the SalesFact data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.3 Defining the details of the Query transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .83
8.4 Using a lookup_ext function for order status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.5 Validating the SalesFact data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.6 Executing the SalesFact job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.7 Viewing Impact and Lineage Analysis for the SALES_FACT target table. . . . . . . . . . . . . . . . . . . . . . .91
8.8 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

9 Changed data capture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94


9.1 Global variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
9.2 Adding the initial load job and defining global variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .95
Adding a workflow, scripts, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
9.3 Replicating the initial load data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
9.4 Building the delta load job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .101
Adding the job and defining the global variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Defining the scripts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
9.5 Execute the initial and delta load jobs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Executing the initial load job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Changing the source data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Executing the delta-load job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.6 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

10 Data Assessment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107


10.1 Default profile statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
10.2 Viewing profile statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
10.3 The Validation transform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Creating a validation job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .111
Adding a job and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Configuring the Validation transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
10.4 Audit objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Adding a fail target table to the data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Creating audit functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
10.5 Viewing audit details in Operational Dashboard reports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
View audit results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.6 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

11 Recovery Mechanisms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121


11.1 Recoverable job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .122
11.2 Creating local variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
11.3 Creating the script that determines the status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
11.4 Conditionals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Adding the conditional. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Specifying the If-Then work flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
11.5 Creating the script that updates the status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
11.6 Verify the job setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
11.7 Executing the job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .129
11.8 Data Services automated recovery properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
11.9 Summary and what to do next. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

12 Multiuser Development. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132


12.1 Central Object Library. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Central Object Library layout. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
12.2 How multiuser development works. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
12.3 Preparation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .134
Configuring the central repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Configuring two local repositories. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Associating repositories to your job server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Defining connections to the central repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .137
12.4 Working in a multiuser environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Activating a connection to the central repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Importing objects into your local repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Adding objects to the central repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Check out objects from the central repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Checking in objects to the central repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Setting up the user2 environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Undo checkout. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .149
Comparing objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Check out object without replacement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Get objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Filter dependent objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Deleting objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
12.5 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

13 Extracting SAP application data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161


13.1 SAP applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
13.2 Defining an SAP application datastore. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
13.3 Importing metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
13.4 Repopulate the customer dimension table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Adding the SAP_CustDim job, work flow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Adding ABAP data flow to Customer Dimension job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .166
Defining the DF_SAP_CustDim ABAP data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Executing the JOB_SAP_CustDim job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
ABAP job execution errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

13.5 Repopulating the material dimension table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Adding the material dimension job, work flow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Adding ABAP data flow to Material Dimension job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Defining the DF_SAP_MtrlDim ABAP data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Executing the JOB_SAP_MtrlDim job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
13.6 Repopulating the Sales Fact table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Adding the Sales Fact job, work flow, and data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Adding ABAP data flow to Sales Fact job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .180
Defining the DF_ABAP_SalesFact ABAP data flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Executing the JOB_SAP_SalesFact job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
13.7 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

14 Real-time jobs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .190


14.1 Importing a real-time job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
14.2 Running a real time job in test mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

1 Documentation changes

Significant changes to the Tutorial since the last update.

The following table contains changes to the documentation, and the related SAP Data Services version in which
the changes were made. The list begins with the most recent changes.

Change: Removed the following topics:
● Accessing documentation from the Web
● Documentation set for SAP Data Services
Notes: Topic obsolete.
Version: 4.2 SP 9 Patch 1

Change: Added the following topics:
● Welcome
● SAP information resources
Notes: Standard topics in all Data Services documents.
Version: 4.2 SP 9 Patch 1

Change: Removed the following topics:
● System configurations
● Windows and UNIX implementation
Notes: Topics were under Product overview. Removed because the concepts were for advanced users.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Removed: Blueprints
Notes: Topic was under Product overview, About objects. Tutorial participants do not need to download blueprints.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Removed: Environment requirements
Notes: Topic was under Preparation for this tutorial. Removed because the concept was for advanced users.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Removed the following topics:
● BI platform and the Central Management Server (CMS)
● Opening the Central Management Console
● Installing SAP Data Services
● Verifying the Windows service
● Creating a new Data Services user account
Notes: Topics were under Setting up for the tutorial. Removed because an administrator should perform the tasks.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Renamed the topic Setting up for the tutorial to Tasks required to prepare for the tutorial.
Notes: Topic is located under Preparation for this tutorial.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Added the following topics to the Tutorial:
● SAP information resources
● Accessing documentation from the Web
● Documentation set for SAP Data Services
Notes: These topics make it easier for readers to find additional documents that we reference in other topics.
Version: 4.2 SP 8 Patch 2, 4.2 SP 9

Change: Removed all Terminology topics
Notes: These topics were not consistent, and many terms were used in more than one section.
Version: 4.2 SP 8 Patch 2

2 Introduction to the tutorial

This tutorial introduces you to the basic use of SAP Data Services Designer by explaining key concepts and
providing a series of related exercises and sample data.

Data Services Designer is a graphical user interface (GUI) development environment in which you extract,
transform, and load batch data from flat-file and relational database sources for use in a data warehouse. You
can also use Designer for real-time data extraction and integration.

2.1 Audience and assumptions

The tutorial is for users experienced in many areas of database management, SQL, and Microsoft Windows.

The tutorial introduces core SAP Data Services Designer functionality. We wrote the tutorial assuming that you
have experience in some of the following areas:

● Database management or administration


● Data extraction, data warehousing, data integration, or data quality
● Source data systems
● Business intelligence

2.2 Tutorial objectives

After you complete this tutorial, you will be able to extract, transform, and load data from various source and
target types, and understand the concepts and features of SAP Data Services Designer.

You will know about the various Data Services objects such as datastores and transforms, and you will be able
to define a file format, import data, and analyze data results.

You will learn how to use Data Services Designer features and functions to do the following:

● Verify and improve your source data quality


● Capture changed data
● View and print metadata reports
● Examine data through a job using the debugger
● Recover from runtime errors
● Set up a multiuser development environment
● Set up and run real-time data processing

2.3 Product overview

Data Services extracts, transforms, and loads (ETL) data from heterogeneous sources into a target database or
data warehouse. You specify data mappings and transformations by using Data Services Designer.

Data Services combines industry-leading data quality and integration into one platform. It transforms your
data in many ways. For example, it standardizes input data, adds additional address data, cleanses data, and
removes duplicate entries.

Data Services provides additional support for real-time data movement and access. It performs predefined
operations in real time, as it receives information. The Data Services real-time components also provide
services to Web applications and other client applications.

For a complete list of Data Services resources, see the Designer Guide.

Product components [page 10]


Descriptions of the components that are a part of SAP Data Services.

The Designer user interface [page 12]


Use the many tools in SAP Data Services Designer to create objects, projects, data flows, and
workflows to process data.

About objects [page 13]


SAP Data Services objects are entities that you create, add, define, modify, or work with in the software.

2.3.1 Product components

Descriptions of the components that are a part of SAP Data Services.

Data Services component descriptions


Component Description

Designer Data Services user interface that enables users to:

● Create, test, and execute jobs that populate a data warehouse


● Create objects and combine them by dragging their icons onto the workspace to create source-to-
target flow diagrams
● Configure objects by opening their editors from the data flow diagram
● Define data mappings, transformations, and control logic
● Create applications by combining objects into workflows (job execution definitions) and data flows
(data transformation definitions)

Job Server Application that launches the Data Services processing engine and serves as an interface to the engine
and other components in the Data Services suite.

Engine Executes individual jobs that you define in the Designer to effectively accomplish the defined tasks.

Component Description

Repository Database that stores Designer predefined system objects and user-defined objects including source and
target metadata and transformation rules. Create a local repository and then a central repository to
share objects with other users and for version control.

Access Server Passes messages between Web applications and the Data Services Job Server and engines. Provides a
reliable and scalable interface for request-response processing.

Administrator Web administrator that provides the following browser-based administration of Data Services resources:

● Scheduling, monitoring, and executing batch jobs


● Configuring, starting, and stopping real-time services
● Configuring Job Server, Access Server, and repository usage
● Configuring and managing adapters
● Managing users
● Publishing batch jobs and real-time services via web services

The following diagram illustrates Data Services product components and relationships.

Parent topic: Product overview [page 10]

Related Information

The Designer user interface [page 12]


About objects [page 13]

2.3.2 The Designer user interface

Use the many tools in SAP Data Services Designer to create objects, projects, data flows, and workflows to
process data.

The Designer interface contains key work areas that help you set up and run jobs. The following illustration
shows the key areas of the Designer user interface.

Parent topic: Product overview [page 10]

Related Information

Product components [page 10]


About objects [page 13]

2.3.3 About objects

SAP Data Services objects are entities that you create, add, define, modify, or work with in the software.

Each Data Services object has similar characteristics for creating and configuring objects.

Characteristics of Data Services objects

Characteristic Description

Properties Text that describes the object. For example, the name, description, and creation date describe aspects of an object.

Attributes Properties that organize objects and make them easier for
you to find. For example, organize objects by attributes such
as object types.

Classes Determines whether an object can be used again in a different job. Object classes are “reusable” and “single-use”.

The Designer contains a Local Object Library that is divided by tabs. Each tab is labeled with an object type.
Objects in a tab are listed in groups. For example, the Project tab groups projects by project name and further
by job names that exist in the project.

Local Object Library tabs:

● Projects
● Jobs
● Workflows
● Data flows
● Transforms
● Datastores
● Formats
● Functions

Parent topic: Product overview [page 10]

Related Information

Product components [page 10]


The Designer user interface [page 12]

2.3.3.1 Object hierarchy
Object relationships are hierarchical.

The highest object in the hierarchy is the project. The subordinate objects appear as nodes under a project. You
add subordinate objects to the project in a specific order. For example, a project contains jobs, jobs contain
workflows, and workflows contain data flows.

The following diagram shows the hierarchical relationships for the key object types within Data Services.

Related Information

About objects [page 13]


Projects and subordinate objects [page 15]
Data flows [page 16]

2.3.3.1.1 Projects and subordinate objects

Projects contain jobs, workflows, and data flows as subordinate objects.

A project is the highest-level object in Designer hierarchy. Projects provide a way to organize the subordinate
objects, which are jobs, workflows, and data flows.

A project is open when you can view it in the project area. If you open a different project from the Project tab in
the object library, the project area closes the current project and shows the project that you just opened.

Projects and subordinates

Object: Project. Subordinate object: Job.
The smallest unit of work that you can schedule independently for execution. Jobs are made up of workflows and data flows that direct the software in the order and manner of processing.

Object: Job. Subordinate object: Workflow.
Incorporates data flows into a coherent flow of work for an entire job.

Object: Job. Subordinate object: Data flow.
Process flow by which the software transforms source data into target data.

2.3.3.1.2 Work flows

A work flow specifies the order in which SAP Data Services processes subordinate data flows.

Arrange the subordinate data flows under the work flow so that the output from one data flow is ready for input
to the intended data flow.

A work flow is a reusable object. It executes only within a Job. Use work flows to:

● Call data flows


● Call another work flow
● Define the order of steps to be executed in your job
● Pass parameters to and from data flows
● Define conditions for executing sections of the project
● Specify how to handle errors that occur during execution

Work flows are optional.

The Data Services objects you can use to create work flows appear as icons on the tool palette to the right of
the workspace. If the object isn't applicable to what you have open in the workspace, the software disables the
icon. The following table contains the programming analogy of each object to describe the role the object plays
in the work flow.

Object        Programming analogy

Workflow      Procedure
Data flow     Declarative SQL select statement
Script        Subset of lines in a procedure
Conditional   If, then, else logic
While loop    A sequence of steps that repeats as long as a condition is true
Try           Try block indicator
Catch         Try block terminator and exception handler
Annotation    Description of a job, work flow, data flow, or a diagram in a workspace

2.3.3.1.3 Data flows


A data flow is the process by which the software transforms source data into target data.

Data flows process data in the order in which they are arranged in a work flow.

A data flow defines the basic task that Data Services accomplishes. The basic task is moving data from one or
more sources to one or more target tables or files.

You define data flows by identifying the sources from which to extract data, the transformations that the data
should undergo, and the targets.

Use data flows to:

● Identify the source data to read


● Define the transformations to perform on the data
● Identify the target table to load data

A data flow is a reusable object. It is always called from a work flow or a job.

2.3.3.2 Object naming conventions

A consistent naming convention for Data Services objects helps you easily identify objects listed in an object
hierarchy.

SAP uses the following naming conventions:

Object       Prefix   Suffix   Example

Job          JOB               JOB_SalesOrg
Work flow    WF                WF_SalesOrg
Data flow    DF                DF_Currency
Datastore             DS       ODS_DS

Related Information

Naming conventions for objects in jobs

2.3.3.3 About deleting objects

To delete an object, first decide whether to delete the object from the project or delete the object from the
repository.

When you delete an object from a project in the project area, the software removes the object from the project.
The object is still available in the object library and the repository.

When you delete the object from the object library, the software deletes all occurrences of the object from the
repository. If the object is called in separate data flows, the software deletes the object from each data flow.
The deletion may adversely affect all related objects.

To protect you from deleting objects unintentionally, the software issues a notice before it deletes the object
from the repository. The notice states that the object is used in multiple locations, and it provides the following
options:

● Yes: Continues with the delete of the object from the repository.
● No: Discontinues the delete process.
● View Where Used: Displays a list of the related objects from which the object will be deleted.

2.4 Preparation for this tutorial


Ensure that you perform all of the preparation for the tutorial so that you can successfully complete each
exercise.

The preparation may include some steps that your administrator has already completed. You may need to
contact your administrator for important connection information and access information related to those
tasks.

We have a complete documentation set for SAP Data Services available on our User Assistance Customer
Portal. If you are unclear about a process in the tutorial, or if you don't understand a concept, refer to the online
documentation at http://help.sap.com/bods.

Tasks required to prepare for the tutorial [page 18]


An overview of the steps to prepare for the SAP Data Services tutorial exercises, and who should
perform the steps.

Tutorial structure [page 25]


We use a simplified data model for the exercises in this tutorial to introduce you to SAP Data Services
features.

Exiting the tutorial [page 26]


You can exit the tutorial at any point in this tutorial.

Resuming the tutorial [page 26]


If you exited the tutorial, you can resume the tutorial at any point.

2.4.1 Tasks required to prepare for the tutorial


An overview of the steps to prepare for the SAP Data Services tutorial exercises, and who should perform the
steps.

 Note

If your administrator has already completed these steps, you may be able to skip the tutorial setup section.

You must have sufficient user permission to perform the exercises in the tutorial. For information about
permissions, see the Administrator Guide.

You or an administrator sets up your system for this tutorial. Instructions for administrator-only tasks are not
included in the tutorial. The following table lists each task and who performs the task.

Task: Install Central Management Server (CMS) by installing either the SAP BusinessObjects Business Intelligence platform (BI platform) or the Information platform services platform (IPS platform).
Who performs: Administrator. More information in the Installation Guide.

Task: Install SAP Data Services.
Who performs: Administrator. Steps are in the Installation Guide.

Task: Create user account for tutorial participants.
Who performs: Administrator. Steps are in the Administrator Guide.

Task: Create tutorial repository, source, and target databases.
Who performs: You or a user who has permission to perform these tasks in your RDBMS. Steps are in the tutorial.

Task: Establish the tutorial repository as your local repository by using the Repository Manager, the Server Manager, and the Central Management Console (CMC).
Who performs: Administrator or you, if you have sufficient permission. Steps are in the tutorial.

Task: Run the tutorial scripts to create source and target tables.
Who performs: Administrator or you. Steps are in the tutorial.

1. Creating repository, source, and target databases on an existing RDBMS [page 20]
Create the three databases using your preferred RDBMS.
2. Creating a local repository [page 21]
Use the repository database that you created earlier in your RDBMS to create a local repository.
3. Defining a job server and associating your repository [page 21]
Use the Data Services Server Manager to configure a new job server and associate the job server with
the local repository.
4. Configuring the local repository in the CMC [page 22]
To continue preparing the SAP Data Services local repository, you enter connection information in the
Central Management Console (CMC).
5. Running the provided SQL scripts [page 23]
Run the tutorial SQL scripts to create the sample source and target tables.

Task overview: Preparation for this tutorial [page 18]

Related Information

Tutorial structure [page 25]


Exiting the tutorial [page 26]
Resuming the tutorial [page 26]

2.4.1.1 Creating repository, source, and target databases on an existing RDBMS

Create the three databases using your preferred RDBMS.

An administrator or a user with sufficient permissions to your RDBMS must perform these steps.

1. Log on to your RDBMS.


2. (Oracle only) Optionally create a service name alias.
Set the protocol to TCP/IP and enter a service name; for example, training.sap. The service name can
act as your connection name.
3. Create three databases and create a user account and password for each one.
We suggest that you use the values in the following table for the databases. We use these values in the
tutorial SQL scripts and throughout the exercises in the tutorial:

Database user names and passwords

Database type User name Password

Repository repo repo

Source ods ods

Target target target

4. Grant access privileges for the user account. For example, grant connect and resource roles for Oracle (a SQL sketch follows the worksheet table below).
5. Use the following table as a worksheet to note the connection names, database versions, user names, and
passwords for the three databases that you create. We refer you to this information in several of the
exercises in the tutorial.

Value Repository Source Target

Database connection name (Oracle) or database server name

Database name (such as Oracle or MS-SQL Server)

Database version

User name

Password
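
For Oracle, the database creation in step 3 and the grants in step 4 might look like the following sketch. This is a minimal example that assumes the suggested user names and passwords (repo/repo, ods/ods, target/target); in Oracle, each user owns a schema that serves as one of the three tutorial databases. Your DBA may require different roles or tablespace quotas, and other RDBMS types use different syntax.

-- Create one user (schema) per tutorial database.
CREATE USER repo IDENTIFIED BY repo;
CREATE USER ods IDENTIFIED BY ods;
CREATE USER target IDENTIFIED BY target;

-- Grant the connect and resource roles mentioned in step 4.
GRANT CONNECT, RESOURCE TO repo;
GRANT CONNECT, RESOURCE TO ods;
GRANT CONNECT, RESOURCE TO target;

-- Depending on your Oracle version, you may also need to grant a tablespace quota, for example:
-- ALTER USER repo QUOTA UNLIMITED ON USERS;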

Task overview: Tasks required to prepare for the tutorial [page 18]

Next task: Creating a local repository [page 21]

2.4.1.2 Creating a local repository

Use the repository database that you created earlier in your RDBMS to create a local repository.

1. Select Start > Programs > SAP Data Services 4.2 > Data Services Repository Manager.

The SAP Data Services Repository Manager opens.


2. Choose Local from the Repository type dropdown list.
3. Select the name of the RDBMS that you used to create the repository database from the Database type
dropdown list.

The remaining connection options are based on the database type you choose.
4. Enter the connection information for the RDBMS repository database that you created.

Use the information in the worksheet that you completed in Creating repository, source, and target
databases on an existing RDBMS [page 20].
5. Type repo for both User and Password.
6. Click Create.

The repository database that you created earlier is now your local repository.

Next, define a Job Server and associate the repository with it.

Task overview: Tasks required to prepare for the tutorial [page 18]

Previous task: Creating repository, source, and target databases on an existing RDBMS [page 20]

Next task: Defining a job server and associating your repository [page 21]

2.4.1.3 Defining a job server and associating your repository

Use the Data Services Server Manager to configure a new job server and associate the job server with the local
repository.

1. Select Start > Programs > SAP Data Services 4.2 > Data Services Server Manager.

The SAP Data Services Server Manager opens.


2. In the Job Server tab click Configuration Editor.

The Job Server Configuration Editor opens.


3. Click Add.

The Job Server Properties opens.


4. Enter a unique name in Job Server name.
5. Enter a port number in Job Server port.

Enter a port number that is not used by another process on the computer. If you are unsure of which port
number to use, increment the default port number.

6. Click Add in the Associated Repositories group on the left.

The options under Repository Information at right become active.


7. Select the RDBMS database type that you used for the repository from the Database type dropdown list.

The remaining connection options that appear are applicable to the database type you choose.
8. Enter the remaining connection information based on the information that you noted in the worksheet in
Creating repository, source, and target databases on an existing RDBMS [page 20].

 Example

Type repo for the User name and Password.

9. Select the Default repository checkbox.

Only select Default repository for the local repository. There can be only one default repository. If you are
following these steps to set up a different repository other than the local repository, do not select the
Default repository option.
10. Click Apply to save your entries.

You should see <database_server>_repo_repo in the Associated Repositories group at left.


11. Click OK to close the Job Server Properties.
12. Click OK to close the Job Server Configuration Editor.
13. Click Close and Restart in the Server Manager.
14. Click OK to confirm that you want to restart the Data Services service.

Task overview: Tasks required to prepare for the tutorial [page 18]

Previous task: Creating a local repository [page 21]

Next task: Configuring the local repository in the CMC [page 22]

2.4.1.4 Configuring the local repository in the CMC


To continue preparing the SAP Data Services local repository, you enter connection information in the Central
Management Console (CMC).

Before you can grant repository access to your user, configure the repository in the Central Management
Console (CMC).

1. Log in to the Central Management Console using your tutorial user name and password, tutorial_user
and tutorial_pass.
2. Click Data Services from the Organize list at left.
The Data Services management view opens.

3. Select Manage > Configure Repository from the menu at the top.


The Add Data Services Repository view opens.
4. Enter a name in Repository Name.
For example, enter Tutorial Repository.

5. Enter the connection information for the database you created for the local repository.
6. Click Test Connection.
A dialog appears indicating whether or not the connection to the repository database was successful. Click
OK. If the connection failed, verify your database connection information and retest the connection.
7. Click Save.
The Add Data Services Repository view closes.
8. In the Data Services view, click the Repositories folder node at left.
The existing configured repositories appear. Verify that the new repository is included in the list.
9. Click Log Off to exit the Central Management Console.

Task overview: Tasks required to prepare for the tutorial [page 18]

Previous task: Defining a job server and associating your repository [page 21]

Next task: Running the provided SQL scripts [page 23]

2.4.1.5 Running the provided SQL scripts

Run the tutorial SQL scripts to create the sample source and target tables.

Data Services installation includes a batch file (CreateTables_<databasetype>.bat) for several of the
supported database types. The batch files run SQL scripts that create and populate tables on your source
database and create the target schema on the target database. If you used the suggested file names, user
names, and passwords for the “ods” and “target” databases, you only add the connection name to the
appropriate areas in the script.

1. Locate the CreateTables batch file for your specific RDBMS in the Data Services installation directory.
The default location is <LINK_DIR>\Tutorial Files\Scripts.
2. Right-click and select Edit.

 Tip

Use a copy of the original script file. Rename the original script file indicating that it is the original.

3. If you are not using the suggested user names and passwords, ods/ods and target/target, update the script file with the user names and passwords that you used for the ods and target databases.

The Oracle batch file is CreateTables_ORA.bat. It contains the following commands:

sqlplus ods/ods@<connection> @ODS_ORA.sql > CreateTables_ORA.out
sqlplus target/target@<connection> @Target_ORA.sql >> CreateTables_ORA.out

The Microsoft SQL Server batch is CreateTables_MSSQL2005.bat. It contains the following commands:

isql /e /n /U ods /S <servername> /d ods /P ods /i ODS_MSSQL.sql /o Tutorial_MSSQL.out
isql /e /n /U target /S <servername> /d target /P target /i Target_MSSQL.sql /o Target_MSSQL.out

 Note

For Microsoft SQL Server 2008, use the CreateTables_MSSQL2005.bat file.

The output files provide logs that contain success or error notifications that you can examine.
4. Save and close the .bat file.
5. Double-click the batch file name to run the SQL scripts.
6. Use the applicable RDBMS query tool to check your source ODS database.
The following tables should exist on your source database after you run the script. These tables should
include a few rows of sample data.

Descriptive name Table name in database

Customer ods_customer

Material ods_material

Sales Order Header ods_salesorder

Sales Order Line Item ods_salesitem

Sales Delivery ods_delivery

Employee ods_employee

Region ods_region

7. Use an RDBMS query tool to check your target data warehouse.

The following tables should exist on your target database after you run the script. A spot-check query sketch follows the table.

Descriptive name Table name in database

Sales Org Dimension salesorg_dim

Customer Dimension cust_dim

Material Dimension mtrl_dim

Time Dimension time_dim

Employee Dimension employee_dim

Sales Fact sales_fact

Recovery Status status_table

CDC Status CDC_time
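
If you want to spot-check the results from a SQL prompt, the following sketch uses generic SQL against the table names listed above. Adjust the connection method and syntax for your RDBMS; the expected row counts for the target tables are an assumption, since most of them stay empty until you run the tutorial jobs.

-- Against the ods (source) database: each table should return a few rows of sample data.
SELECT COUNT(*) FROM ods_customer;
SELECT COUNT(*) FROM ods_salesorder;

-- Against the target database: the tables should exist but return no or few rows until the tutorial jobs load them.
SELECT COUNT(*) FROM salesorg_dim;
SELECT COUNT(*) FROM sales_fact;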

Task overview: Tasks required to prepare for the tutorial [page 18]

Previous task: Configuring the local repository in the CMC [page 22]

2.4.2 Tutorial structure

We use a simplified data model for the exercises in this tutorial to introduce you to SAP Data Services features.

The tutorial data model is a sales data warehouse with a star schema that contains one fact table and some
dimension tables.

In the tutorial, you perform tasks on the sales data warehouse. We divided the tasks into the following
segments:

Tutorial segments

● Populate the Sales Organization Dimension from a flat file: Introduces basic data flows, query transforms, and source and target tables. The exercise populates the Sales Organization Dimension table from flat-file data.
● Populate the Time Dimension table using a transform: Introduces Data Services functions. This exercise creates a data flow for populating the Time Dimension table.
● Populate the Customer Dimension from a relational table: Introduces data extraction from relational tables. This exercise defines a job that populates the Customer Dimension.
● Populate the Material Dimension from an XML File: Introduces data extraction from nested sources. This exercise defines a job that populates the Material Dimension.
● Populate the Sales Fact table from multiple relational tables: Continues data extraction from relational tables and introduces joins and the lookup function. The exercise populates the Sales Fact table.

Complete each segment before going on to the next segment. Each segment creates the jobs and objects that you need in the next segment, and we reinforce each skill in subsequent segments. As you progress, we eliminate detailed steps for some of the basic skills that we introduced earlier.

Parent topic: Preparation for this tutorial [page 18]

Related Information

Tasks required to prepare for the tutorial [page 18]


Exiting the tutorial [page 26]
Resuming the tutorial [page 26]
Product overview [page 10]

2.4.3 Exiting the tutorial

You can exit the tutorial at any point in this tutorial.

To exit the tutorial, follow these steps:

1. From the Project menu, click Exit.

If you haven't saved your changes, the software prompts you to save your work before you exit.
2. Click Yes to save your work.

After the software saves your work, it exits Data Services.

Task overview: Preparation for this tutorial [page 18]

Related Information

Tasks required to prepare for the tutorial [page 18]


Tutorial structure [page 25]
Resuming the tutorial [page 26]

2.4.4 Resuming the tutorial

If you exited the tutorial, you can resume the tutorial at any point.

1. Log in to the Designer and select the repository in which you saved your work.
The Designer window opens.
2. From the Project menu, click Open.
3. Click the name of the tutorial project you want to work with, then click Open.

The Designer window opens with the project and the objects within it displayed in the project area.

Task overview: Preparation for this tutorial [page 18]

Related Information

Tasks required to prepare for the tutorial [page 18]


Tutorial structure [page 25]
Exiting the tutorial [page 26]

3 Source and target metadata

The software uses metadata for the source and target objects to connect to the data location and to access the
data.

Source and target metadata is especially important when you access data that is in an environment separate from your Data Services environment.

For the tutorial, you set up logical connections between Data Services, a flat-file source, and a target data
warehouse.

Logging in to the Designer [page 28]


You need to log in to the Designer to perform the exercises in the tutorial. After you log in to the Designer a few times, you won't need to refer to these steps, and you will remember your login credentials.

About datastores [page 29]


Datastores contain connection configurations to databases and applications in which you have data.

Importing metadata [page 32]


Import metadata for individual tables using a datastore object.

Defining a file format [page 34]


File formats are a set of properties that describe the structure of a flat file.

Summary and what to do next [page 36]


After you complete the tasks in the Source and target metadata section, make sure that you save the project. The information that you created in this section is saved to the local repository and is available the next time you log in to Data Services.

3.1 Logging in to the Designer

You need to log in to the Designer to perform the exercises in the tutorial. After you log in to the Designer a few times, you won't need to refer to these steps, and you will remember your login credentials.

Obtain the required repository user credentials from your administrator. Before you begin, review the options in
step 2 to ensure you have the correct information.

1. Open SAP Data Services Designer.

The SAP Data Services Repository Login opens.


2. Enter the repository user credentials as described in the following table.

Option Description

System-host[:port] The name of the Central Management Server (CMS) system. You may also
need to specify the port when applicable.

Option Description

User name The name that your administrator used to define you as a user in the Central
Management Console (CMC).

Password The password that your administrator used to define you as a user in the CMC.

Authentication The authentication type used by the CMS.

3. Click Log On.

The software displays a list of existing local repositories.


4. Select the applicable repository and click OK.
A prompt appears asking for the repository password.
5. Enter the repository password and click OK.

Data Services Designer opens. See an example of the Designer interface in The Designer user interface
[page 12].

Next you learn about how to use a datastore to define the connections to the source and target databases.

Task overview: Source and target metadata [page 28]

Related Information

About datastores [page 29]


Importing metadata [page 32]
Defining a file format [page 34]
Summary and what to do next [page 36]

3.2 About datastores


Datastores contain connection configurations to databases and applications in which you have data.

After you create a datastore, import metadata from the database or application for which you created the
datastore. Use the objects from the import for sources or targets in jobs. Keep in mind that you are importing
only metadata and not the data itself. On a basic level, use imported objects in SAP Data Services as follows:

● As a source object, Data Services accesses the data through the connection information in the datastore
and loads the data into the data flow.
● As a target object, Data Services outputs processed data from the data flow into the target object and, if
configured to do so, uploads the data to the database or application using the datastore connection
information.

In addition to other elements such as functions and connection information, the metadata in a datastore
consists of the following table elements:

● Table name
● Column names
● Column data types
● Primary key columns
● Table attributes

Data Services datastores can connect to any of the following databases or applications:

● Databases
● Mainframe file systems
● Applications that have prepackaged or user-written adapters
● J.D. Edwards OneWorld, J.D. Edwards World, Oracle applications, PeopleSoft, SAP applications, SAP Data
Quality Management, microservices for location data, Siebel applications, and Google BigQuery.
● Remote servers using FTP, SFTP, and SCP
● SAP systems: SAP applications, SAP NetWeaver Business Warehouse (BW) Source, and BW Target

For complete information about datastores, see the Designer Guide. See the various supplements for
information about specific databases and applications. For example, for applications with adapters, see the
Supplement for Adapters.
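
Behind the Designer dialogs, a database datastore boils down to a named set of connection values like the ones
listed above. Purely as an illustration, and not as a description of how Data Services handles datastores
internally, the following Python sketch collects such values for a hypothetical Microsoft SQL Server source and
opens a connection with them. It assumes the pyodbc package and the named ODBC driver are installed, and every
value shown is a placeholder.

  # Illustrative only: a datastore is, in essence, reusable connection settings.
  # Assumes pyodbc and the named ODBC driver are installed; all values are placeholders.
  import pyodbc

  ods_ds = {                          # hypothetical "ODS_DS" configuration
      "database_type": "Microsoft SQL Server",
      "server": "my_db_server",
      "database": "ods",
      "user": "ods_user",
      "password": "ods_password",
  }

  connection = pyodbc.connect(
      "DRIVER={ODBC Driver 17 for SQL Server};"
      f"SERVER={ods_ds['server']};DATABASE={ods_ds['database']};"
      f"UID={ods_ds['user']};PWD={ods_ds['password']}"
  )
  print("Connected to", ods_ds["database"])
  connection.close()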

Defining a datastore for the source (ODS) database [page 30]


Create a database datastore to use as a connection to the ODS database that you created when you set
up for the tutorial.

Defining a datastore for the target database [page 32]


Create a database datastore to use as a connection to the database that you named “target” when you
set up for the tutorial.

Parent topic: Source and target metadata [page 28]

Related Information

Logging in to the Designer [page 28]


Importing metadata [page 32]
Defining a file format [page 34]
Summary and what to do next [page 36]

3.2.1 Defining a datastore for the source (ODS) database

Create a database datastore to use as a connection to the ODS database that you created when you set up for
the tutorial.

Use the information from the worksheet that you completed in Creating repository, source, and target
databases on an existing RDBMS [page 20] for completing the options in the datastore that you create in the
following steps.

1. Open the Datastores tab in the Local Object Library in Designer and right-click in the blank area.
2. Select New from the popup menu.

The Create New Datastore window opens.


3. Type ODS_DS for Datastore name.
4. Select Database from the Datastore type dropdown list.
5. Select the applicable database from the Database type dropdown list.
Choose the database type that you used to create the source database. For example, if you created the
ODS database file using Microsoft SQL Server, select Microsoft SQL Server from the list.

The remaining options change based on the database type you choose.
6. Complete the remaining options that appear after you choose the database type.

The following lists show options that are present for most database types, grouped by database type. Find
your database type and review the options for it. If your database type isn't listed, see the Reference
Guide for a list of options for each supported database type.

Oracle
○ Database version
○ CDC Options: No CDC, Native CDC, Replication Server CDC
○ Use TNS name
○ Hostname
○ SID or Service Name
○ Port
○ User Name
○ Password
○ Enable Automatic Data Transfer

DB2
○ Database Version
○ CDC Options: No CDC, Replication Server CDC
○ Use data source name (DSN)
○ Database server name
○ Database name
○ Port
○ User Name
○ Password
○ Enable Automatic Data Transfer

Microsoft SQL Server
○ Database Subtype: Azure PaaS, Azure VM, On Premise
○ Database Version
○ CDC Options: No CDC, Native CDC, Replication Server CDC
○ Database server name
○ Database Name
○ User name
○ Password

Sybase ASE
○ Database version
○ CDC Options: No CDC, Replication Server CDC
○ Database server name
○ Database Name
○ User Name
○ Password
○ Enable Automatic Data Transfer

7. Click OK.
Data Services saves a datastore for your source in the repository.

Related Information

Importing metadata [page 32]

Object naming conventions [page 17]

3.2.2 Defining a datastore for the target database

Create a database datastore to use as a connection to the database that you named “target” when you set up
for the tutorial.

Define a datastore for the “target” database using the same procedure as for the source (ODS) database.
Name the datastore Target_DS.

Related Information

Defining a datastore for the source (ODS) database [page 30]

3.3 Importing metadata

Import metadata for individual tables using a datastore object.

After you create a database datastore for the database, you can import table information from the database.

You can import metadata in the following ways:

● By browsing
● By file name
● By searching

All of these methods are explained in the Reference Guide.

For the tutorial, we take you through the steps to browse for metadata.
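
To make the idea of importing metadata (rather than data) concrete, here is a small, self-contained Python
sketch. It uses a throwaway SQLite table as a stand-in for ods_customer, so the table and the few columns shown
are only illustrative; the point is that the import captures the table name, column names, data types, and
primary key, not the rows.

  # Illustrative only: metadata describes the table's structure, not its rows.
  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute(
      "CREATE TABLE ods_customer ("       # stand-in for the real source table
      " cust_id   VARCHAR(10) PRIMARY KEY,"
      " name1     VARCHAR(35),"
      " region_id INTEGER)"
  )

  # Column names, data types, and primary key columns for the table.
  for cid, name, data_type, notnull, default, pk in conn.execute(
      "PRAGMA table_info(ods_customer)"
  ):
      print(name, data_type, "PRIMARY KEY" if pk else "")
  conn.close()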

1. Importing metadata for ODS source tables [page 33]


Access the ODS datastore external metadata in Designer to import all of the table metadata.
2. Importing metadata for target tables [page 33]
Access the Target datastore external metadata in Designer to import all of the table metadata.

Parent topic: Source and target metadata [page 28]

Related Information

Logging in to the Designer [page 28]


About datastores [page 29]

Defining a file format [page 34]
Summary and what to do next [page 36]

3.3.1 Importing metadata for ODS source tables

Access the ODS datastore external metadata in Designer to import all of the table metadata.

1. In the Datastores tab, right-click the ODS_DS datastore and click Open.

The names of all the tables in the database defined by the datastore named ODS_DS display in the
workspace. Notice that the External Metadata option at the top of the workspace is automatically selected.
2. Optional. Resize the Metadata column by double-clicking with the resize cursor on the column separator.
3. Select all of the tables to highlight them:
○ ods.ods_customer
○ ods.ods_delivery
○ ods.ods_employee
○ ods.ods_material
○ ods.ods_region
○ ods.ods_salesitem
○ ods.ods_salesorder

Right-click and select Import.

Data Services imports the metadata for each table into the local repository.

 Note

For Microsoft SQL Server databases, the owner prefix might be “dbo” instead of “ods”. For example,
dbo.ods_customer instead of ods.ods_customer.

4. In the Object Library Datastores tab, expand the Tables node under ODS_DS to verify that the tables have
been imported into the repository.

Task overview: Importing metadata [page 32]

Next task: Importing metadata for target tables [page 33]

3.3.2 Importing metadata for target tables

Access the Target datastore external metadata in Designer to import all of the table metadata.

1. In the Datastores tab, right-click the Target_DS datastore and click Open.

The names of all the tables in the database defined by the datastore named Target_DS display in a window
in the workspace. Notice that the External Metadata option at the top of the workspace is automatically
selected.

2. Optional. Resize the Metadata column by double-clicking with the resize cursor on the column separator.
3. Select all of the tables to highlight them:
○ target.CDC_time
○ target.cust_dim
○ target.employee_dim
○ target.mtrl_dim
○ target.sales_fact
○ target.salesorg_dim
○ target.status_table
○ target.time_dim

Right-click and select Import.

Data Services imports the metadata for each table into the local repository.

 Note

For Microsoft SQL Server databases, the owner prefix might be “dbo” instead of “target”. For example,
dbo.cust_dim instead of target.cust_dim.

4. In the Object Library Datastores tab, expand the Tables node under Target_DS to verify the tables have
been imported into the repository.

Task overview: Importing metadata [page 32]

Previous task: Importing metadata for ODS source tables [page 33]

3.4 Defining a file format

File formats are a set of properties that describe the structure of a flat file.

Use the Data Services file format editor to create a flat file format for sales_org.txt.

1. Open the Formats tab in the Local Object Library and right-click in a blank area in the tab.

2. Select New > File Format.


The File Format Editor opens.
3. In the General group at left, make the following settings:
○ Type: Delimited
○ Name: Format_SalesOrg
4. In the Data File(s) group, set the Location option to Local.
5. For the File name(s) option, click the file icon to browse for and select %LINK_DIR%\Tutorial Files\sales_org.txt. Click Open.
6. Select Yes for the prompt asking to overwrite the current schema.
The file format editor displays sample data from the sales_org.txt file in the data pane in the lower
right.

7. In the Default Format group select ddmmyyyy from the Date parameter. If ddmmyyyy is not in the
dropdown list, type it in the Default Format space.

The date format matches the date data under the Field3 column in the data pane in the lower right.
8. In the Input/Output group, set Skip row header to Yes.
9. Select Yes for the prompt asking to overwrite the current schema.

The software replaces the original column headers, Fieldx, in the data pane with contents from the 2nd row.
The software also changes the values in the Field Name column in the upper right to the column names.
10. In the schema attributes pane in the upper right, click the cell under the Data Type column in the DateOpen
row. Select date from the dropdown list to change the data type.
The following screen capture shows the completed File Format Editor.

11. Click Save & Close.
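
The settings above tell Data Services how to interpret sales_org.txt: it is delimited, its first row holds the
column names, and DateOpen carries a ddmmyyyy date. As a rough, standalone illustration of what those settings
mean, the following Python sketch reads a file of the same shape. It assumes a tab delimiter and the column
names SalesOffice and DateOpen, so adjust both to match your copy of the file.

  # Illustrative sketch of the flat-file settings: delimited data, a header row
  # that supplies column names, and DateOpen parsed with the ddmmyyyy format.
  import csv
  from datetime import datetime

  with open("sales_org.txt", newline="") as f:
      reader = csv.DictReader(f, delimiter="\t")      # assumption: tab-delimited
      for row in reader:
          # ddmmyyyy corresponds to %d%m%Y in Python's strptime
          date_open = datetime.strptime(row["DateOpen"], "%d%m%Y").date()
          print(row["SalesOffice"], date_open)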

Task overview: Source and target metadata [page 28]

Related Information

Logging in to the Designer [page 28]


About datastores [page 29]
Importing metadata [page 32]
Summary and what to do next [page 36]

3.5 Summary and what to do next

After you complete the tasks in the Source and target metadata section, make sure that you save the project.
The information that you created in this section is saved to the local repository and is available the next
time you log in to Data Services.

Also, before you exit Designer, close any open workspace tabs by clicking the X icon in the upper right of each
workspace.

What you have learned in the Source and target metadata section:

● How to define datastores that connect Data Services to your source and target databases
● How to import metadata from source and target tables into the local repository
● How to define a flat file format and a connection to flat-file source data

What is next: In the next section you populate the Sales Org. dimension table with data from the
sales_org.txt flat file.

Parent topic: Source and target metadata [page 28]

Related Information

Logging in to the Designer [page 28]


About datastores [page 29]
Importing metadata [page 32]
Defining a file format [page 34]
Populate the Sales Organization dimension from a flat file [page 37]

4 Populate the Sales Organization
dimension from a flat file

Populate the Sales Org. dimension table with data from a flat file source that uses the Format_SalesOrg file format.

The following diagram shows the Star Schema with the Dimension file circled.

The tasks in this segment build a Data Services project. A project contains objects in a specific hierarchical
order.

At the end of each task, save your work. You can either proceed to the next task or exit Data Services. If you
exit Data Services before you save your work, the software prompts you to save before you exit.

1. Creating a new project [page 38]


Begin the tutorial by creating a new project and opening it in the Project Area of the SAP Data Services
Designer.
2. Adding a new job [page 39]
Create a job for the Class_Exercises project.
3. Adding a workflow [page 39]
Workflows contain the order of steps in which the software executes a job.
4. Adding a data flow [page 40]
Create a data flow named DF_SalesOrg inside the workflow WF_SalesOrg.
5. Define the data flow [page 41]
To define the instructions for building the sales organization dimension table, add objects to
DF_SalesOrg in the workspace area.
6. Validating the DF_SalesOrg data flow [page 44]
Perform a design-time validation, which checks for construction errors such as syntax errors.
7. Addressing warnings and errors [page 45]

If there are warnings and errors after you validate your job, fix the errors. The job does not execute with
existing errors. You do not need to fix the cause of the warnings because warnings do not prohibit the
job from running.
8. Saving the project [page 46]
You can save the steps you have completed and close Data Services at any time.
9. Ensuring that the Job Server is running [page 46]
Before you execute a job (either as an immediate or scheduled task), ensure that the Job Server is
associated with the repository where the client is running.
10. Executing the job [page 47]
Execute the job to move data from your source to your target.
11. Summary and what to do next [page 49]
In the exercises to populate the Sales Organization dimension table, you learned new skills that you will
use for just about any data flow, and you learned about using functions in an output schema and much
more.

Related Information

Object hierarchy [page 14]

4.1 Creating a new project

Begin the tutorial by creating a new project and opening it in the Project Area of the SAP Data Services
Designer.

Log in to the Designer and follow these steps to create a new project:

1. Select Project > New Project.

A list of your existing projects appears. If you do not have any projects created, the list is empty.
2. Enter the following name in Project name: Class_Exercises.
3. Click Create.

The project Class_Exercises appears in the Project Area of the Designer, and in the Project tab of the Local
Object Library.

Next, create a job for the new project. If you plan to exit Data Services, save the project.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Next task: Adding a new job [page 39]

Related Information

Projects and subordinate objects [page 15]


Object naming conventions [page 17]

4.2 Adding a new job

Create a job for the Class_Exercises project.

If you are logged out of SAP Data Services Designer, log in and open the project named Class_Exercises.

Follow these steps to create a new job for the Class_Exercises project:

1. Right-click in the Project Area and select New Batch Job.


2. Rename the new job JOB_SalesOrg.

The job appears in the Project Area under Class_Exercises, and in the Jobs tab under the Batch Jobs node
in the Local Object Library.

Save the new job and proceed to the next exercise. Next you add a workflow to the job JOB_SalesOrg.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Creating a new project [page 38]

Next task: Adding a workflow [page 39]

Related Information

Projects and subordinate objects [page 15]


Object naming conventions [page 17]

4.3 Adding a workflow

Workflows contain the order of steps in which the software executes a job.

In Designer, open the Class_Exercises project and expand it to view the JOB_SalesOrg job.

Follow these steps to add a workflow to the JOB_SalesOrg job:

1. Select JOB_SalesOrg in the Project Area.

The job opens in the workspace and the tool palette appears to the right of the workspace.

2. Select the workflow button from the tool palette and click the blank workspace area.

A workflow icon appears in the workspace. The workflow also appears in the Project Area hierarchy under
the job JOB_SalesOrg.

 Note

Workflows are easiest to read in the workspace from left to right and from top to bottom. Keep this
arrangement in mind as you add objects to the workflow workspace.

3. Rename the workflow WF_SalesOrg.


4. Expand the JOB_SalesOrg job in the Project Area and click WF_SalesOrg to open it in the workspace.

An empty view of the workflow appears in a new workspace tab. Use this area to define the elements of the
workflow.

Next you create a data flow to add to WF_SalesOrg.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Adding a new job [page 39]

Next task: Adding a data flow [page 40]

Related Information

Work flows [page 15]

4.4 Adding a data flow

Create a data flow named DF_SalesOrg inside the workflow WF_SalesOrg.

Make sure the workflow is open in the workspace. If it is not open, click the WF_SalesOrg workflow in the
Project Area.

1. Click the data flow button on the tool palette to the right of the workspace.
2. Click the workspace.

The data flow icon appears in the workspace and the data flow icon also appears in the Project Area.
3. Enter DF_SalesOrg as the name for the data flow in the text box above the icon in the workspace.

The project, job, workflow, and data flow objects display in hierarchical form in the Project Area. To navigate
to these levels, expand each node in the project area.
4. Click DF_SalesOrg in the Project Area to open a blank definition area in the workspace.

Next, define the data flow DF_SalesOrg in the definition area that appears in the workspace.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Adding a workflow [page 39]

Next task: Define the data flow [page 41]

Related Information

Data flows [page 16]

4.5 Define the data flow

To define the instructions for building the sales organization dimension table, add objects to DF_SalesOrg in
the workspace area.

Build the sales organization dimension table by adding a source file, a query object, and a target table to the
DF_SalesOrg data flow in the workspace.

The next three tasks guide you through the steps necessary to define the content of a data flow:

1. Add objects to the data flow.


2. Connect them in the order that data flows through them.
3. Define the query that maps the source columns to the target columns.

Adding objects to the DF_SalesOrg data flow [page 42]


Add objects to the DF_SalesOrg data flow workspace to start building the data flow.

Defining the order of steps in a data flow [page 42]


To define the sequence for the data flow DF_SalesOrg, connect the objects in a specific order.

Configuring the query transform [page 43]


Configure the query transform by mapping columns from the source to the target object.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Adding a data flow [page 40]

Next task: Validating the DF_SalesOrg data flow [page 44]

4.5.1 Adding objects to the DF_SalesOrg data flow

Add objects to the DF_SalesOrg data flow workspace to start building the data flow.

Make sure the workspace for DF_SalesOrg is open.

1. Open the Formats tab in the Local Object Library and expand the Flat Files node.
2. Click and drag Format_SalesOrg to the workspace and release it.

Position the object to the left of the workspace area to make room for other objects.

A prompt appears asking you to indicate whether to make the object a source or a target.
3. Click Make Source.

4. Click the Query icon on the tool palette and click in the workspace.

The Query icon appears in the workspace. Drag it to the right of the Format_SalesOrg source object in
the workspace.
5. Open the Datastores tab in the Local Object Library and expand the Target_DS node.
6. Click and drag SALESORG_DIM to the workspace and drop it to the right of the Query icon.

A prompt appears asking you to indicate whether to make the object a source or a target.
7. Click Make Target.

All the objects necessary to create the sales organization dimension table are now in the workspace. In the next
section, you connect the objects in the order in which you want the data to flow.

4.5.2 Defining the order of steps in a data flow

To define the sequence for the data flow DF_SalesOrg, connect the objects in a specific order.

1. Click the square on the right edge of the Format_SalesOrg source file and drag your pointer to the
triangle on the left edge of the query transform.

When you drag from the square to the triangle, the software connects the two objects with a line. If you
start with the triangle and go to the square, the software won't connect the two objects.
2. Use the same drag technique to connect the square on the right edge of the query transform to the triangle
on the left edge of the SALESORG_DIM target table.

The order of operation is established after you connect all of the objects. Next you configure the query
transform.

4.5.3 Configuring the query transform

Configure the query transform by mapping columns from the source to the target object.

Before you can configure the query transform, you connect the source to the query, and the query to the target
in the workspace. Learn more about creating a data flow and configuring the query transform in the Reference
guide.

When you connect the objects in the data flow, the column information from the source and target files appears
in the Query transform to help you set up the query.

1. Double-click the query in the Project Area.

The query editor opens. The query editor contains the following areas:
○ Schema In pane: Lists the columns in the source file
○ Schema Out pane: Lists the columns in the target file
○ Options pane: Contains tabs for defining the query
Because the query is connected to the target in the data flow, the software automatically copies the target
schema to the Schema Out pane.
2. Select the column icon of the specified input column in the Schema In pane and drag it to the
corresponding column in the Schema Out pane. Map the columns as listed in the following table.

Input column Output column

SalesOffice SALESOFFICE

DateOpen DATEOPEN

Region REGION

 Note

We do not use the Country column for this data flow.

After you drag the input column to the output column, an arrow icon appears next to the source column to
indicate that the column has been mapped.

The following list names the main areas of the query editor.
○ A. Target schema
○ B. Source schema
○ C. Query option tabs
○ D. Column mapping definition

3. Select a field in the Schema Out area and view the column mapping definition in the Mapping tab of the
options pane. For example, the mapping for the SalesOffice input column to the SALESOFFICE output column
is Format_SalesOrg.SalesOffice.
4. Click the Type column cell for the SALESOFFICE column in the Schema Out pane and select Decimal.
5. Set Precision to 10 and Scale to 2 in the Type:Decimal popup. Click OK.

6. Click the Back arrow icon from the toolbar at the top of the page to close the query editor and return to
the data flow worksheet.
7. Save your work.
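
Stripped of the Designer user interface, the mapping you just defined selects three source columns, renames
them to the target column names, drops Country, and converts SalesOffice to a decimal with precision 10 and
scale 2 (the conversion behind the warnings you see in the next task). A minimal Python sketch of that logic,
using hypothetical row values, looks like this:

  # Minimal sketch of the Query transform mapping for the sales organization row:
  # three columns pass through under their target names, SalesOffice becomes a
  # decimal(10,2) value, and Country is dropped.
  from datetime import date
  from decimal import Decimal

  def map_row(source_row):
      return {
          "SALESOFFICE": Decimal(source_row["SalesOffice"]).quantize(Decimal("0.01")),
          "DATEOPEN": source_row["DateOpen"],
          "REGION": source_row["Region"],
      }

  sample = {"SalesOffice": "700", "DateOpen": date(1986, 1, 22),
            "Region": "West", "Country": "US"}       # hypothetical values
  print(map_row(sample))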

4.6 Validating the DF_SalesOrg data flow

Perform a design-time validation, which checks for construction errors such as syntax errors.

The Validation menu provides design-time validation options. You can check for runtime errors later in the
process.

1. Click DF_SalesOrg in the Project Area.

2. Select Validation > Validate > Current View.

 Note

There are two validation options to choose:


○ Current View validates the object definition open in the workspace.
○ All Objects in View validates the object definition open in the workspace and all of the objects that it
calls.

You can alternatively use the icon bar and click Validate Current and Validate All to perform the same
validations.

After the validation completes, Data Services displays the Output dialog with the Warning tab indicating any
warnings.

 Note

Two warning messages appear indicating that Data Services will convert the data type for the SALESOFFICE
column.

An Error tab contains any validation errors. You must fix the errors before you can proceed.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Define the data flow [page 41]

Next task: Addressing warnings and errors [page 45]

Related Information

Executing the job [page 47]

4.7 Addressing warnings and errors


If there are warnings and errors after you validate your job, fix the errors. The job does not execute with existing
errors. You do not need to fix the cause of the warnings because warnings do not prohibit the job from running.

After you validate a job, an output window appears listing warnings and errors if applicable.

1. Right-click an error or warning notification and click View.

Data Services displays the Message window in which you can read the expanded notification text.
2. For errors, double-click the error notification to open the editor of the object containing the error.

After you validate the job with no errors, you have completed the description of the data movement for the
sales organization dimension table.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Validating the DF_SalesOrg data flow [page 44]

Next task: Saving the project [page 46]

4.8 Saving the project

You can save the steps you have completed and close Data Services at any time.

● To save objects in a project, select Project > Save All.
● To save objects that display in the workspace, select Project > Save.

● To save all changed objects from the current session, click the Save All icon in the toolbar.
● Or, simply exit Designer. Data Services presents a list of all changed objects that haven't been saved. Click
Yes to save all objects in the list, or select specific objects to save. Data Services does not save the objects
that you deselect.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Addressing warnings and errors [page 45]

Next: Ensuring that the Job Server is running [page 46]

4.9 Ensuring that the Job Server is running

Before you execute a job (either as an immediate or scheduled task), ensure that the Job Server is associated
with the repository where the client is running.

When the Designer starts, it displays the status of the Job Server for the repository to which you are
connected.

Designer displays one of two icons: one indicating that the Job Server is running, and one indicating that the
Job Server is inactive.

The name of the active Job Server and port number appears in the status bar when the cursor is over the icon.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Saving the project [page 46]

Next task: Executing the job [page 47]

4.10 Executing the job

Execute the job to move data from your source to your target.

Complete all of the steps to populate the Sales Organization Dimension from a flat file. Ensure that all errors
are fixed and that you save the job. If you exited Data Services, log back in to Data Services, and ensure that the
Job Server is running.

1. Select Project > Open and select Class_Exercises.

The project appears in the Project Area.


2. Right-click the job JOB_SalesOrg in the Project Area and select Execute.
3. If you have not saved changes that you made to the job, the software prompts you to save them. Click Yes.

The software validates the job and displays the Execution Properties.

 Note

If you followed the previous steps to validate your job and fix errors, you should not have errors.

Execution Properties includes parameters and options for executing the job and for setting traces and global
variables. Do not change the default settings for this exercise.

4. Click OK.

Data Services displays a job log in the workspace. Trace messages appear while the software executes the
job.
5. Change the log view by clicking the applicable log button at the top of the job log.

Log files

Log file Description

Trace log A list of the job steps in the order they started.

Monitor log A list of each step in the job, the number of rows processed by that step, and the time required to
complete the operation.

Error log A list of any errors produced by the RDBMS, Data Services, or the computer operating system during
the job execution.

 Note
The error icon is not active when there are no errors.

 Note

Remember that you should periodically close the tabs in the workspace when you are finished working with
the objects in the tab. To close a tab, click the X icon in the upper right of the workspace.

Task overview: Populate the Sales Organization dimension from a flat file [page 37]

Previous: Ensuring that the Job Server is running [page 46]

Next: Summary and what to do next [page 49]

4.11 Summary and what to do next


In the exercises to populate the Sales Organization dimension table, you learned new skills that you will use for
just about any data flow, and you learned about using functions in an output schema and much more.

What you have learned in these exercises:

● How to output data to a datastore target
● How to define a query transform, including configuring input and output schemas and setting processing options
● How to validate a job and fix errors
● How to execute a job to populate the Sales Org. dimension table in the target data warehouse

What is next: Populate the Time Dimension table with the following time attributes:

● Year number
● Month number
● Business quarter

You can now exit Data Services or go to the next group of tutorial exercises. If you exit, the software reminds
you to save your work if you did not save it before. The software saves all projects, jobs, workflows, data flows,
and results in the local repository.

Parent topic: Populate the Sales Organization dimension from a flat file [page 37]

Previous task: Executing the job [page 47]

Related Information

Populate the Time dimension table [page 51]

5 Populate the Time dimension table

Time dimension tables contain date and time-related attributes such as season, holiday period, fiscal quarter,
and other attributes that are not directly ascertainable from traditional SQL style date and time data types.

The Time dimension table in this example is simple in that it contains only the year number, month number,
and business quarter as Time attributes. It uses a Julian date as a primary key.

1. Opening the Class_Exercises project [page 52]


We use the Class_Exercises project for all of the jobs created in the tutorial.
2. Adding a job and data flow to the project [page 52]
Prepare a new job and data flow to populate the Time dimension table.
3. Adding the components of the time data flow [page 53]
The components of the DF_TimeDim data flow consist of a transform as a source and a datastore as a
target.
4. Defining the flow of data [page 53]
Connect the objects in the DF_TimeDim data flow in the order in which you want Data Services to
process them.
5. Defining the output of the Date_Generation transform [page 54]
Define the Date_Generation transform so it produces a column of dates for a specific range and
increment.
6. Defining the output of the query [page 55]
Configure the query to apply functions to the output columns and to map those columns to an internal
data set.
7. Saving and executing the job [page 56]
After you save the data flow DF_TimeDim, execute the JOB_TimeDim job to populate the TIME_DIM
dimension table with the changed data.

8. Summary and what to do next [page 56]
In the exercises to populate the Time Dimension table, you practiced the skills that you learned in the
first group of exercises, plus you learned how to use different objects as source and target in a data
flow.

5.1 Opening the Class_Exercises project

We use the Class_Exercises project for all of the jobs created in the tutorial.

If you closed Data Services after the last exercise, log in to Data Services and follow these steps to open the
tutorial project.

1. Click Project > Open.


2. Click Class_Exercises.
3. Click OK.

The Class_Exercises project opens in the Project Area.

Task overview: Populate the Time dimension table [page 51]

Next task: Adding a job and data flow to the project [page 52]

5.2 Adding a job and data flow to the project

Prepare a new job and data flow to populate the Time dimension table.

1. Right-click the project name Class_Exercises in the Project Area and select New Batch Job.

The new job appears under the Class_Exercises project node in the Project Area and an empty
workspace opens.
2. Rename the job JOB_TimeDim.

3. Right-click in the empty JOB_TimeDim workspace and select Add New > Data Flow.
4. Rename the new data flow DF_TimeDim.

5. Select Project > Save.

The data flow is now ready for you to define.

 Note

A workflow is an optional object; a job can contain a data flow directly without one. For this job we do not add a
workflow.

Task overview: Populate the Time dimension table [page 51]

Previous task: Opening the Class_Exercises project [page 52]

Next task: Adding the components of the time data flow [page 53]

5.3 Adding the components of the time data flow

The components of the DF_TimeDim data flow consist of a transform as a source and a datastore as a target.

1. Click DF_TimeDim in the Class_Exercises project in the Project Area.


A blank data flow workspace opens.

2. Open the Transforms tab in the Local Object Library and expand the Data Integrator
node.
3. Drag the Date_Generation transform onto the data flow workspace.

The transforms in the Transforms tab are predefined. The transform in your workspace is a copy of the
predefined Date_Generation transform.

4. Click the query button on the tool palette and click in the workspace.

A query object appears in the workspace. Arrange the query to the right of the Date_Generation transform.
5. Open the Datastore tab in the Local Object Library and expand the Tables node under Target_DS.
6. Drag the TIME_DIM table onto the workspace and drop it to the right of the query.
7. Click Make Target from the popup menu.

All of the objects to create the time dimension table are in the workspace.

Task overview: Populate the Time dimension table [page 51]

Previous task: Adding a job and data flow to the project [page 52]

Next task: Defining the flow of data [page 53]

5.4 Defining the flow of data

Connect the objects in the DF_TimeDim data flow in the order in which you want Data Services to process
them.

1. Click the square on the right edge of the Date_Generation transform and drag a line to the triangle on the
left edge of the query.

2. Use the same drag technique to connect the query to the TIME_DIM target.

The connections indicate the flow of data. Now you provide instructions in each object of the data flow so the
software knows how to process the data.

Task overview: Populate the Time dimension table [page 51]

Previous task: Adding the components of the time data flow [page 53]

Next task: Defining the output of the Date_Generation transform [page 54]

5.5 Defining the output of the Date_Generation transform

Define the Date_Generation transform so it produces a column of dates for a specific range and increment.

Connect all of the objects in the data flow in the correct order before you configure them.

1. Click the name of the Date_Generation transform in the Class_Exercises project in the Project Area.

The transform editor opens at right.


2. Type the following values in the Date Generation tab in the editor:

Start date 2002.01.01

End date 2008.12.31

Increment daily

Make sure that Join rank is set at 0 and Cache is not selected.

 Note

The Start Date and End Date options have a dropdown arrow, but you must type the values in for this
exercise.

3. Click the Back arrow in the upper toolbar to close the transform editor and return to the data flow.
4. Save the project.

The software moves the specified data to the Query transform as input.
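
Conceptually, the Date_Generation transform with these settings emits one row per day from 2002.01.01 through
2008.12.31. A short Python sketch of the same output:

  # Sketch of the Date_Generation output: one date per day in the range.
  from datetime import date, timedelta

  start, end = date(2002, 1, 1), date(2008, 12, 31)
  generated_dates = [start + timedelta(days=n) for n in range((end - start).days + 1)]
  print(len(generated_dates), generated_dates[0], generated_dates[-1])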

Task overview: Populate the Time dimension table [page 51]

Previous task: Defining the flow of data [page 53]

Next task: Defining the output of the query [page 55]

5.6 Defining the output of the query


Configure the query to apply functions to the output columns and to map those columns to an internal data
set.

1. Click the Query object in the project area under the DF_TimeDim data flow.

The Query editor opens. The Query editor has an input schema section with a single column, an output
schema that is copied from the target datastore, and an options section.
2. Drag the DI_GENERATED_DATE column from the input schema to the NATIVEDATE column in the output
schema.
3. Map each of the other output columns in the output schema by following these substeps:

1. Select the column name in the output schema.


2. Open the Mapping tab in the options section and type the corresponding function in the text area.
3. Select the next column and type the corresponding function.

The following list gives each output column name, the function to enter in its Mapping tab, and a description of
what the function does.

○ Date_ID: julian(di_generated_date). Use the JULIAN function to set the Julian date for that date value.
○ YearNum: to_char(di_generated_date,'yyyy'). Use the TO_CHAR function to select only the year out of the date value. Enclose yyyy in single quotes.
○ MonthNum: month(di_generated_date). Use the MONTH function to set the month number for that date value.
○ BusQuarter: quarter(di_generated_date). Use the QUARTER function to set the quarter for that date value.

 Note

For this tutorial, the business year is the same as the calendar year.

4. Click the Back arrow on the toolbar.


5. Save the project.

These columns become the input schema for the TIME_DIM target table.
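
To see what the four mapping functions produce, the following Python sketch derives the same columns for one
generated date. The ordinal-based day number is only a stand-in for the Data Services JULIAN function, which
may count from a different epoch; the year, month, and quarter calculations mirror TO_CHAR, MONTH, and QUARTER
directly.

  # Sketch of the TIME_DIM output columns for a single generated date.
  from datetime import date

  d = date(2002, 1, 1)                       # one row from the Date_Generation output
  row = {
      "Date_ID": d.toordinal(),              # stand-in for julian(d); epoch may differ
      "YearNum": d.strftime("%Y"),           # to_char(d, 'yyyy')
      "MonthNum": d.month,                   # month(d)
      "BusQuarter": (d.month - 1) // 3 + 1,  # quarter(d); business year = calendar year
      "NativeDate": d,
  }
  print(row)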

Task overview: Populate the Time dimension table [page 51]

Previous task: Defining the output of the Date_Generation transform [page 54]

Next task: Saving and executing the job [page 56]

5.7 Saving and executing the job

After you save the data flow DF_TimeDim, execute the JOB_TimeDim job to populate the TIME_DIM dimension
table with the changed data.

For instructions to validate and execute the job, see Validating the DF_SalesOrg data flow [page 44] and
Executing the job [page 47].

After the job successfully completes, view the output data using your database management tool. Compare
the output to the input data and see how the functions that you set up in the query affected the output data.

 Note

Remember that you should periodically close the tabs in the workspace when you are finished working with
the objects in the tab.

Task overview: Populate the Time dimension table [page 51]

Previous task: Defining the output of the query [page 55]

Next: Summary and what to do next [page 56]

5.8 Summary and what to do next

In the exercises to populate the Time Dimension table, you practiced the skills that you learned in the first
group of exercises, plus you learned how to use different objects as source and target in a data flow.

You have now populated the following tables in the sales data warehouse:

● Sales Org dimension from a flat file


● Time dimension from a transform

What you have learned:

● How to use a project for multiple jobs
● How to create a job without an optional workflow
● How to use a predefined, software-supplied transform in your job
● How the Date_Generation transform is used as a source
● How to set up an output schema using functions

In the next section, you will extract data to populate the Customer dimension table.

You can now exit Data Services or go to the next group of tutorial exercises. If you exit, the software reminds
you to save your work if you did not save it before. The software saves all projects, jobs, workflows, data flows,
and results in the local repository.

Parent topic: Populate the Time dimension table [page 51]

Previous task: Saving and executing the job [page 56]

Related Information

Populate the Customer dimension table from a relational table [page 58]

6 Populate the Customer dimension table
from a relational table

In this exercise, you populate the Customer dimension table in the Sales star schema with data from a
relational table.

In the past exercises you have used a flat file to populate the Sales Org. dimension table and a transform to
populate the Time dimension table. In this exercise you use a relational table to populate the Customer
dimension table.

You also use the interactive debugger to examine the data after each transform or object in the data flow.

Before you continue with this exercise, make sure that you imported the source and target tables as instructed
in the Importing metadata [page 32] section.

1. Adding the CustDim job and workflow [page 59]


Add a new job and workflow to the Class Exercises project.
2. Adding the CustDim data flow [page 59]
Create a data flow named DF_CustDim inside the workflow WF_CustDim.
3. Define the data flow [page 60]
Add objects to DF_CustDim in the workspace area to define the data flow instructions for populating
the Customer dimension table.
4. Validating the CustDim data flow [page 62]
5. Executing the CustDim job [page 63]
You execute the CustDim job in the same way that you execute the other jobs in the tutorial.
However, we also show you how to view the output data.
6. The interactive debugger [page 63]

The Designer interactive debugger allows you to examine and modify data row by row using filters and
breakpoints on lines in a data flow diagram.
7. Summary and what to do next [page 67]
In the exercise to populate the Customer dimension table with a relational table, you learned to use
some basic features of the interactive debugger.

6.1 Adding the CustDim job and workflow

Add a new job and workflow to the Class Exercises project.

Open the Class_Exercises project so it appears in the Project Area in Designer.

1. Right-click the Class_Exercises project name and select New Batch Job.

A tab opens in the workspace area for the new batch job.
2. Rename this job JOB_CustDim.
3. Select the workflow button from the tool palette at right and click the workspace area.

The workflow icon appears in the workspace.


4. Rename the workflow WF_CustDim.
5. Save your work.

Task overview: Populate the Customer dimension table from a relational table [page 58]

Next task: Adding the CustDim data flow [page 59]

Related Information

Work flows [page 15]

6.2 Adding the CustDim data flow

Create a data flow named DF_CustDim inside the workflow WF_CustDim.

Make sure the workspace is open for the WF_CustDim workflow.

1. Click the data flow button in the tool palette at right and click in the workspace.

A new data flow appears in the workspace.


2. Rename the data flow DF_CustDim.

The project, job, workflow, and data flow objects display in hierarchical form in the Project Area. To navigate
to these levels, click their names in the project area.

3. Click DF_CustDim in the Project Area.

A blank definition area for the data flow appears in the workspace.

Task overview: Populate the Customer dimension table from a relational table [page 58]

Previous task: Adding the CustDim job and workflow [page 59]

Next: Define the data flow [page 60]

Related Information

Data flows [page 16]

6.3 Define the data flow

Add objects to DF_CustDim in the workspace area to define the data flow instructions for populating the
Customer dimension table.

In this exercise, you build the data flow by adding the following objects:

● Source table
● Query transform
● Target table

Adding objects to a data flow [page 60]


Add three objects to the DF_CustDim data flow workspace.

Configuring the query transform [page 61]


You configure the query transform by mapping columns from the source to the target objects.

Parent topic: Populate the Customer dimension table from a relational table [page 58]

Previous task: Adding the CustDim data flow [page 59]

Next task: Validating the CustDim data flow [page 62]

6.3.1 Adding objects to a data flow

Add three objects to the DF_CustDim data flow workspace.

Make sure that the DF_CustDim data flow workspace is open.

1. Open the Datastore tab in the Local Object Library and expand the Tables node under ODS_DS.
2. Drag and drop the ODS_CUSTOMER table to the workspace and click Make Source.

3. Click the query button on the tool palette at right and click in the workspace to the right of the
CUSTOMER table.

The query icon appears in the workspace.


4. Open the Datastore tab in the Local Object Library and expand the Tables node under Target_DS.
5. Drag and drop the CUST_DIM table to the right of the query and click Make Target.
6. Connect the objects to indicate the flow of data: source to query, and query to target.

6.3.2 Configuring the query transform

You configure the query transform by mapping columns from the source to the target objects.

1. Double-click the query in the workspace to open the query editor.


2. Drag the CUST_ID key column from Schema In to the Schema Out column area.

The software adds CUST_ID as a column in the query output schema.


3. Remap the following source columns to the target schema, leaving the names and data types as they are in
the target.

 Note

Do not map CUST_TIMESTAMP.

Schema In column Schema Out column Description

CUST_CLASSF CUST_CLASSF Customer classification

NAME1 NAME1 Customer name

ADDRESS ADDRESS Address

CITY CITY City

REGION_ID REGION_ID Region

ZIP ZIP Postal code

 Note

If your database manager is Microsoft SQL Server or Sybase ASE, specify the columns in the order
shown in the table.

4. Click the Back arrow in the icon bar to return to the data flow.
5. Save your work.

6.4 Validating the CustDim data flow

Next you will verify that the data flow has been constructed properly.

From the menu bar, select Validation > Validate > All Objects in View.

 Note

○ Current View validates the object definition open in the workspace.


○ All Objects in View validates the object definition open in the workspace and all of the objects that it
calls.

You can alternatively use the icon bar and click Validate Current and Validate All to perform the same
validations.

If your design contains syntax errors, a dialog box appears with a message describing the error. Warning
messages usually do not affect proper execution of the job.

If your data flow contains no errors, the following message appears:

Validate: No Errors Found

Task overview: Populate the Customer dimension table from a relational table [page 58]

Previous: Define the data flow [page 60]

Next task: Executing the CustDim job [page 63]

6.5 Executing the CustDim job

You execute the CustDim job in the same way that you execute the other jobs in the tutorial. However, we also
show you how to view the output data.

1. In the Project Area, right-click the JOB_CustDim job and click Execute.
2. Click OK.

The software opens Execution Properties.


3. Leave all of the default settings and click OK.

The software executes the JOB_CustDim job.


4. Check for any warnings or errors after the job execution completes. If errors exist, fix the errors and
execute the job again.
5. After successful execution, view the output data by following these substeps.

1. Click the DF_CustDim data flow in the Project Area. The data flow workspace opens.
2. Click the magnifying glass that appears on the lower right corner of the target object.

A sample view of the output data appears in the lower pane. Notice that there is no CUST_TIMESTAMP
column in the output. However, the software added the CUST_ID column to the output.

For information about the icon options above the sample data, see the Designer Guide.

Task overview: Populate the Customer dimension table from a relational table [page 58]

Previous task: Validating the CustDim data flow [page 62]

Next: The interactive debugger [page 63]

6.6 The interactive debugger

The Designer interactive debugger allows you to examine and modify data row by row using filters and
breakpoints on lines in a data flow diagram.

The debugger allows you to examine what happens to the data after each transform or object in the flow.

● Debug filter: Functions as a simple query transform with a WHERE clause. Use a filter to reduce a data set
in a debug job execution.
● Breakpoint: Location where a debug job execution pauses and returns control to you.

When you start a job in the interactive debugger, Designer displays three additional panes as well as the View
Data panes beneath the workspace area:

1. View data panes, left and right
2. Call Stack pane
3. Trace pane
4. Debug Variables pane

The left View Data pane shows the data in the CUSTOMER source table, and the right pane shows one row at a
time (the default) that has passed to the query.

Optionally, set a condition in a breakpoint to search for specific rows. For example, you can set a condition to
stop the data flow when the debugger reaches a row in the data with a Region_ID value of 2.

In the next exercise, we show you how to set a breakpoint and debug your DF_CustDim data flow.

Learn more about the interactive debugger in the Designer Guide.
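
If it helps to picture the debugger in plain code, a debug filter behaves like a WHERE clause applied to the
rows entering the flow, and a conditional breakpoint pauses the run the first time a row satisfies its
predicate. The following rough Python sketch uses a few hypothetical customer rows to show the difference:

  # Rough sketch of a debug filter versus a conditional breakpoint.
  rows = [                                        # hypothetical CUSTOMER rows
      {"CUST_ID": "C001", "REGION_ID": 1},
      {"CUST_ID": "C002", "REGION_ID": 2},
      {"CUST_ID": "C003", "REGION_ID": 2},
  ]

  def debug_filter(row):                          # like a WHERE clause on the input
      return row["REGION_ID"] in (1, 2)

  def breakpoint_condition(row):                  # pause when this is true
      return row["REGION_ID"] == 2

  for row in filter(debug_filter, rows):
      if breakpoint_condition(row):
          print("Paused on:", row)                # control returns to you here
          break
      print("Passed to query:", row)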

Setting a breakpoint in a data flow [page 65]


A breakpoint is a location in the data flow where a debug job execution pauses and returns control to
you.

Debugging Job_CustDim with interactive debugger [page 66]


Run an interactive debugging session on the Customer Dimension job to see the basic functionality of the
interactive debugger.

Setting a breakpoint condition [page 66]


Set a condition on the breakpoint to stop processing when a specific condition is met.

Parent topic: Populate the Customer dimension table from a relational table [page 58]

Previous task: Executing the CustDim job [page 63]

Next: Summary and what to do next [page 67]

6.6.1 Setting a breakpoint in a data flow

A breakpoint is a location in the data flow where a debug job execution pauses and returns control to you.

Ensure that you have the Class_Exercises project open in the Project Area.

Follow these steps to set a breakpoint in the DF_CustDim data flow.

1. Expand JOB_CustDim and click DF_CustDim in the Project Area.

The DF_CustDim definition opens in the workspace.


2. Right-click the connector line between the source table and the query and select Set Breakpoint.

A red breakpoint icon appears on the connector line.


3. Double-click the breakpoint icon on the connector to open the Breakpoint editor.

The Breakpoint settings are in the right pane of the Breakpoint editor.
4. Select the Set checkbox.

Leave the other options set at the default settings.

5. Click OK.

6.6.2 Debugging Job_CustDim with interactive debugger


Run an interactive debugging session on the Customer Dimension job to see the basic functionality of the interactive
debugger.

1. In the Designer Project Area, right-click Job_CustDim and select Start debug.
Click OK if you see a prompt to save your work.

The Debug Properties editor opens. See The interactive debugger [page 63] for an explanation of the
Debug Properties editor.
2. Click OK to close the Debug Properties editor.

The debugger stops after the first row and displays the left and right View Data panes.

3. To process the next row, click the next-row icon on the toolbar at the top of the workspace area.

The next row replaces the existing row in the right view data pane.
4. To see all debugged rows, select the All checkbox in the upper right of the right view data pane.

The right pane shows the first two rows that it has debugged.
5. To stop the debug mode, click Stop Debug from the Debug menu, or click the Stop Debug button on the
toolbar.

Now debug the job with a breakpoint condition.

6.6.3 Setting a breakpoint condition


Set a condition on the breakpoint to stop processing when a specific condition is met.

For example, add a breakpoint condition for the Customer Dimension job to break when the debugger reaches
a row in the data with a Region_ID value of 2.

1. Open the breakpoint dialog box by double-clicking the breakpoint icon in the data flow.
2. Click the cell under the Column heading and click the down arrow to display a dropdown list of columns.
3. Click CUSTOMER.REGION_ID.
4. Click the cell under the Operator heading and click the down arrow to display a dropdown list of operators.
Click =.
5. Click the cell under the Value heading and type 2.
6. Click OK.
7. Right-click the job name and click Start debug.
The debugger stops after processing the first row with a Region_ID of 2. The right View Data pane shows
the row at the breakpoint.
8. To stop the debug mode, from the Debug menu, click Stop Debug, or click the Stop Debug button on the
toolbar.

6.7 Summary and what to do next

In the exercise to populate the Customer dimension table with a relational table, you learned to use some basic
features of the interactive debugger.

What you learned:

● Extract data from a relational table


● View a sample of the data by clicking the magnifying glass in the lower right corner of the source or target
icon in the data flow
● Remap source columns to specific data types in the query transform
● Use the basic features of the interactive debugger

In the next section, you learn about document type definitions (DTD) and extracting data from an XML file.

For more information about the topics covered in this section, see the Designer Guide.

Parent topic: Populate the Customer dimension table from a relational table [page 58]

Previous: The interactive debugger [page 63]

Related Information

Populate the Material Dimension from an XML File [page 68]

7 Populate the Material Dimension from an
XML File

In this exercise, we use a DTD to define the format of an XML file, which has a hierarchical structure. The
software can process the data only after you have flattened the hierarchy.

An XML file represents hierarchical data using XML tags instead of rows and columns as in a relational table.

There are two methods for flattening the hierarchy of an XML file so that the software can process your data. In
this exercise we first use a Query transform and systematically flatten the input file structure. Then we use an
XML_Pipeline transform to select portions of the nested data to process.

To help you understand the goal for the tasks in this section, read about nested data in the Designer Guide.

1. Nested data [page 69]


The software provides a way to view and manipulate hierarchical relationships within data flow sources,
targets, and transforms using Nested Relational Data Modeling (NRDM).
2. Adding MtrlDim job, workflow, and data flow [page 69]
To create the objects for this task, we omit the details and rely on the skills that you learned in the first
few exercises of the tutorial.
3. Importing a document type definition [page 70]
Import the document type definition (DTD) schema named mtrl.dtd as described in the following
steps.
4. Define the MtrlDim data flow [page 71]
In this exercise you add specific objects to the DF_MtrlDim data flow workspace and connect them in
the order in which the software should process them.
5. Validating that the MtrlDim data flow has been constructed properly [page 75]

After unnesting the source data using the Query in the last exercise, validate the DF_MtrlDim to make
sure there are no errors.
6. Executing the MtrlDim job [page 76]
After you save the MtrlDim data flow, execute the MtrlDim job.
7. Leveraging the XML_Pipeline [page 76]
The main purpose of the XML_Pipeline transform is to extract parts of the XML file.
8. Summary and what to do next [page 79]
In this section you learned two ways to process an XML file: With a Query transform and with the XML
Pipeline transform.

Related Information

Designer Guide: Nested Data


Designer Guide: Nested Data, Operations on nested data, Overview of nested data and the Query transform
Designer Guide: Nested Data, Operations on nested data, Unnesting nested data

7.1 Nested data

The software provides a way to view and manipulate hierarchical relationships within data flow sources, targets,
and transforms using Nested Relational Data Modeling (NRDM).

In this tutorial, we use a document type definition (DTD) schema to define an XML source. XML files have a
hierarchical structure. The DTD describes the data contained in the XML document and the relationships
among the elements in the data.

You imported the mtrl.dtd file when you ran the script for this tutorial. It is located in the Formats tab of the
Local Object Library under Nested Schemas.

For complete information about nested data, see the Designer Guide.
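To picture what flattening a nested schema produces, here is a minimal Python sketch included as an illustration only. It is not part of the tutorial files and does not use Data Services; the element names mirror the material list, but the sample values are invented.

import xml.etree.ElementTree as ET

# Hypothetical snippet shaped like a material list: a repeatable MTRL_MASTER
# element with a nested TEXT element, similar in spirit to mtrl.dtd.
xml_data = """
<MTRL_MASTER_LIST>
  <MTRL_MASTER>
    <MTRL_ID>M-100</MTRL_ID>
    <MTRL_TYPE>FERT</MTRL_TYPE>
    <TEXT><LANGUAGE>EN</LANGUAGE><SHORT_TEXT>Mountain bike</SHORT_TEXT></TEXT>
  </MTRL_MASTER>
</MTRL_MASTER_LIST>
"""

rows = []
for master in ET.fromstring(xml_data).iter("MTRL_MASTER"):
    # Flattening pulls the nested SHORT_TEXT up to the same level as the
    # top-level columns, producing one relational-style row per material.
    rows.append({
        "MTRL_ID": master.findtext("MTRL_ID"),
        "MTRL_TYPE": master.findtext("MTRL_TYPE"),
        "DESCR": master.findtext("TEXT/SHORT_TEXT"),
    })

print(rows)  # [{'MTRL_ID': 'M-100', 'MTRL_TYPE': 'FERT', 'DESCR': 'Mountain bike'}]

This one-row-per-material result is what the qryunnest query produces later in this section through unnesting.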

Parent topic: Populate the Material Dimension from an XML File [page 68]

Next task: Adding MtrlDim job, workflow, and data flow [page 69]

7.2 Adding MtrlDim job, workflow, and data flow

To create the objects for this task, we omit the details and rely on the skills that you learned in the first few
exercises of the tutorial.

1. Add a new job to the Class_Exercises project and name it JOB_MtrlDim. To remind you of the steps,
see Adding the CustDim job and workflow [page 59].

2. Add a workflow and name it WF_MtrlDim. To remind you of the steps, see Adding the CustDim job and
workflow [page 59].
3. Click WF_MtrlDim in the Project Area to open it in the workspace.
4. Add a data flow to the workflow definition and name it DF_MtrlDim. To remind you of the steps, see
Adding a data flow [page 40].

Task overview: Populate the Material Dimension from an XML File [page 68]

Previous: Nested data [page 69]

Next task: Importing a document type definition [page 70]

7.3 Importing a document type definition

Import the document type definition (DTD) schema named mtrl.dtd as described in the following steps.

1. Open the Formats tab in the Local Object Library.

2. Right-click Nested Schemas and click New DTD .

The Import DTD Format dialog opens.


3. Type Mtrl_List for the DTD definition name.
4. Click Browse and open <LINK_DIR>\Tutorial Files\mtrl.dtd.

The directory and file name appear in File name.


5. For File type, keep the default value DTD.
6. Click the dropdown arrow in Root element name and select MTRL_MASTER_LIST.
7. Click OK.

The software adds the DTD file Mtrl_List to the Nested Schemas group in the Local Object Library.

Task overview: Populate the Material Dimension from an XML File [page 68]

Previous task: Adding MtrlDim job, workflow, and data flow [page 69]

Next: Define the MtrlDim data flow [page 71]

7.4 Define the MtrlDim data flow

In this exercise you add specific objects to the DF_MtrlDim data flow workspace and connect them in the
order in which the software should process them.

Follow the tasks in this exercise to configure the objects in the DF_MtrlDim data flow so that the data flow
correctly processes hierarchical data from an XML source file.

The following objects make up the DF_MtrlDim data flow:

● Source XML file


● Query transform
● Target table

Adding objects to DF_MtrlDim [page 71]


Build the DF_MtrlDim data flow with a source, target, and query transform.

Configuring the qryunnest query [page 72]


Use the query transform to unnest the hierarchical Mtrl_List XML source data properly.

Parent topic: Populate the Material Dimension from an XML File [page 68]

Previous task: Importing a document type definition [page 70]

Next task: Validating that the MtrlDim data flow has been constructed properly [page 75]

7.4.1 Adding objects to DF_MtrlDim

Build the DF_MtrlDim data flow with a source, target, and query transform.

1. Click DF_MtrlDim in the Project Area.

The DF_MtrlDim data flow workspace opens at right.


2. Open the Formats tab in the Local Object Library and expand Nested Schemas.
3. Place the Mtrl_List file in the DF_MtrlDim workspace using drag and drop.
4. Select Make File Source from the dialog box notice.
5. Double-click Mtrl_List in the workspace.

The Source File Editor opens containing the Schema Out options in the upper pane and the Source options
in the lower pane.
6. In the Source options, leave File Location set to {none}.
7. Click the File dropdown arrow and select <Select file>.

The Open dialog appears.


8. Select mtrl.xml located in <LINK_DIR>\Tutorial Files and click Open.

The File option populates with the file name and location of the XML file.

9. Select XML.
10. Select Enable validation to enable comparison of the incoming data to the stored document type definition (DTD) format.

The software automatically populates the following options in the Source tab in the lower pane:
○ Format name: The software automatically populates with the schema name Mtrl_List
○ Root element name: The software automatically populates with the primary node name
MTRL_MASTER_LIST

 Note

You cannot edit these values.

11. Click the back arrow icon to return to the DF_MtrlDim data flow workspace.
12. Click the query transform icon in the tool palette and then click to the right of the table in the workspace.

The new query transform appears in the workspace area.


13. Name the query qryunnest.
14. Open the Datastores tab in the Local Object Library and expand the Tables node under Target_DS.
15. Add the MTRL_DIM table to the right of qryunnest in the workspace using drag and drop.
16. Click Make Target from the dialog box notice.
17. Connect the objects in the data flow to indicate the flow of data from the source XML file through the query
to the target table.
18. Save your work.

The DF_MtrlDim data flow is ready for configuration.

7.4.2 Configuring the qryunnest query

Use the query transform to unnest the hierarchical Mtrl_List XML source data properly.

This process is lengthy. Make sure that you take your time and try to understand what you accomplish in each
step.

1. Click qryunnest in the Project Area to open the query editor in the workspace.

 Note

Notice the nested structure of the source in the Schema In pane. Notice the differences in column
names and data types between the input and output schemas.

Instead of dragging individual columns to the Schema Out pane like you usually do, you use specific
configuration settings and systematically unnest the table.

2. Multiselect the five column names in the Schema out pane so they are highlighted: MTRL_ID, MTRL_TYP,
IND_SECTOR, MTRL_GRP, and DESCR.
3. Right-click and select Cut.

The software cuts the five columns and saves the column names and data types to your clipboard.

 Note

Do not use Delete. By selecting Cut instead of Delete, the software captures the correct column names
and data types from the target schema. In a later step, we instruct you to paste the clipboard
information to the Schema Out pane of the MTRL_Master target table schema.

4. From the Schema In pane of the qryunnest query editor, drag the MTRL_MASTER schema to the Schema
Out pane.

The software adds the MTRL_Master table and subtable structure to the qryunnest query in the Schema
Out pane.
5. In the Designer Project Area, click the MTRL_DIM target table.

The target table editor opens in the workspace area.

 Note

The Schema In pane in the MTRL_DIM target table editor contains the MTRL_MASTER schema that you
just moved to the Schema Out pane in qryunnest query editor. Now you flatten the Schema Out in the
qryunnest query to fit the target table.

6. Follow these substeps to flatten the columns in the qryunnest Schema Out pane to fit the target table:
a. Select the qryunnest tab in the Workspace area to open the query editor.
b. Right-click MTRL_MASTER in the Schema Out pane and choose Make Current.

The MTRL_MASTER schema is available to edit.


c. Multiselect the following columns in the Schema Out pane under MTRL_Master: MTRL_ID,
MTRL_TYPE, IND_SECTOR, MTRL_GROUP, UNIT, TOLERANCE, and HAZMAT_IND.
d. Right-click and select Delete.

e. Right-click the MTRL_MASTER schema in the Schema Out pane and select Paste.

You have added back all of the columns that you cut from the Schema Out pane in qryunnest
earlier. This time, however, the columns appear under the MTRL_MASTER schema that you copied
from the Schema In pane.
f. Map the MTRL_ID, MTRL_TYPE, IND_SECTOR, and MTRL_GROUP columns in the Schema In pane to
the corresponding columns in the Schema Out pane using drag and drop.
7. Now follow these substeps to map the DESCR column in the Schema Out pane to SHORT_TEXT in the
TEXT nested schema in the Schema In pane of qryunnest:
a. Right-click the DESCR column in the Schema Out pane and select Cut.

The DESCR information is saved to your clipboard.


b. In the Schema Out pane, right-click the TEXT schema, and click Make Current.
c. In the Schema Out pane, right-click the LANGUAGE column, select Paste, and select Insert Below. The
software places the DESCR column at the same level as the SHORT_TEXT column.
d. Map the SHORT_TEXT column from the Schema In pane to the DESCR column in the Schema Out
pane using drag and drop.
8. In the Schema Out pane, multiselect LANGUAGE, SHORT_TEXT, and TEXT_nt_1, right-click and select
Delete.

Now the TEXT schema in the Schema Out pane contains only the DESCR column.
9. Open the MTRL_DIM target table tab in the workspace area to view the schema.

The Schema In pane shows the same schemas and columns that appear in the qryunnest query
Schema Out pane. However, the Schema In of the MTRL_DIM target table is still not flat, and it will not
produce the flat schema that the target requires. Perform the following steps to flatten the schema:
10. Select the qryunnest tab in the workspace area to view the query editor.
11. In the Schema Out pane, right-click the TEXT schema and click Unnest.

The table icon changes to one with an arrow.


12. Select the MTRL_DIM target table tab in the workspace area to open the target table schema.

Notice that the Schema In pane contains two levels: qryunnest as parent level, and MTRL_MASTER as
child. The Schema Out pane shows one level. We still have to reduce the Schema In of the qryunnest
query to one level.
13. Open the qryunnest tab in the Workspace area to open the query editor.
14. Right-click MTRL_MASTER in the Schema Out pane and click Make Current.

The MTRL_Master table is now editable.


15. Right-click MTRL_MASTER in the Schema Out pane and click Unnest.
16. Open the MTRL_DIM target table tab in the workspace area.

The Schema In and Schema Out panes show one level for each.

17. From the Project menu, click Save All.

7.5 Validating that the MtrlDim data flow has been constructed properly

After unnesting the source data using the Query in the last exercise, validate the DF_MtrlDim to make sure
there are no errors.

1. In the Project Area, click the DF_MtrlDim data flow name to open the editor at right.

2. Click the Validate All icon or select Validate All Objects in View .

You should see warning messages indicating that data type conversion will be used to convert from varchar
(1024) to the data type and length of the target columns. The software preserves the data types in the output
schema, so you do not have to change anything because of the warnings.

If your design contains any errors in the Errors tab, you must fix them. Go back over the steps in the
exercise to make sure you didn't miss any steps. If you have syntax errors, a dialog box appears with a
message describing the error. Address all errors before continuing.

If you get the error message: "The flat loader...cannot be connected to NRDM," right-click the error
message and click Go to error, which opens the editor for the object in question. In this case, the source
schema is still nested. Return to the qryunnest query editor and unnest the output schema(s).

The next section describes how to execute the job.

Task overview: Populate the Material Dimension from an XML File [page 68]

Previous: Define the MtrlDim data flow [page 71]

Next task: Executing the MtrlDim job [page 76]

7.6 Executing the MtrlDim job

After you save the MtrlDim data flow, execute the MtrlDim job.

1. In the project area, right-click JOB_MtrlDim and click Execute.


2. If prompted to save your work, click OK.
3. In the Execution Properties dialog box, click OK.
4. After the job completes, ensure there are no error or warning messages.
5. To view the captured sample data, in the project area select the data flow to open it in the workspace. Click
the magnifying glass on the target MTRL.DIM table to view its six rows.

Or, use a query tool in your RDBMS to check the contents of the MTRL.DIM table.

The next section describes an alternate way to capture XML data.

Task overview: Populate the Material Dimension from an XML File [page 68]

Previous task: Validating that the MtrlDim data flow has been constructed properly [page 75]

Next task: Leveraging the XML_Pipeline [page 76]

7.7 Leveraging the XML_Pipeline

The main purpose of the XML_Pipeline transform is to extract parts of the XML file.

When you extract data from an XML file to load into a target data warehouse, you usually obtain only parts of
the XML file. The Query transform does partial extraction (as the previous exercise shows), and it does much
more because it has many of the clauses of a SQL SELECT statement.

Because the XML_Pipeline transform focuses on partial extraction, it utilizes memory more efficiently and
performs better than the Query transform for this purpose.

● The XML_Pipeline transform uses less memory because it processes each instance of a repeatable
schema within the XML file, rather than building the whole XML structure first.
● The XML_Pipeline transform continually releases and reuses memory to steadily flow XML data through
the transform.

You can use the XML_Pipeline transform as an alternate way to build the Material dimension table. The data
flow components for this alternate way will consist of the following objects:

● The source XML file


● An XML_Pipeline transform to obtain a repeatable portion of the nested source schema
● A query to map the output of the XML_Pipeline transform to the flat target schema
● The target table into which the material dimension data loads

Setting up a job and data flow that uses the XML_Pipeline transform [page 77]
In this exercise, you will achieve the same outcome as in the previous exercise, but you use the XML
Pipeline transform for more efficient configuration and processing.

Configuring the XML_Pipeline and Query_Pipeline transforms [page 78]
Open the transform and the query to map input columns to output columns.

Task overview: Populate the Material Dimension from an XML File [page 68]

Previous task: Executing the MtrlDim job [page 76]

Next: Summary and what to do next [page 79]

7.7.1 Setting up a job and data flow that uses the XML_Pipeline transform

In this exercise, you will achieve the same outcome as in the previous exercise, but you use the XML Pipeline
transform for more efficient configuration and processing.

1. Add a new job and name it JOB_Mtrl_Pipe.


2. Add a workflow to the job and name it WF_Mtrl_Pipe.
3. Add a data flow to the work flow definition and name it DF_Mtrl_Pipe.
4. Click the name of the data flow to open the data flow definition.
5. In the object library on the Formats tab, expand Nested Schemas.
6. Drag the Mtrl_List file into the DF_Mtrl_Pipe definition workspace, drop it on the left side, and click Make
Source.
7. Click the Mtrl_List name in the workspace to configure it.
8. On the Source tab, ensure XML is selected.
9. Click the down arrow in File and click Select file.
10. Go to <LINK_DIR>\Tutorial Files\ and select mtrl.xml. Click Open to import the file.
11. Select Enable Validation to enable comparison of the incoming data to the stored DTD format.
12. Click the back arrow to return to the data flow.
13. Select DF_Mtrl_Pipe to open the data flow workspace.
14. In the Local Object Library, open the Transforms tab and expand the Data Integrator transforms.
15. Drag the XML_Pipeline transform into the DF_Mtrl_Pipe definition workspace, and drop it to the right of
Mtrl_List source.
16. In the Transforms tab of the Local Object Library, expand the Platform transforms.
17. Drag the Query transform into the workspace, drop it to the right of XML_Pipeline , and name the query
Query_Pipeline.
18. Open the Datastores tab in the Local Object Library and expand the table node under Target_DS.
19. Drag and drop the MTRL_DIM table to the workspace and click Make Target.
20.Connect the icons to indicate the flow of data from the source XML file through the XML_Pipeline and
Query_Pipeline transforms to the target table.

21. Save all files.

 Note

Remember to periodically close the tabs in the workspace area.

7.7.2 Configuring the XML_Pipeline and Query_Pipeline transforms

Open the transform and the query to map input columns to output columns.

Set up the job as instructed in Setting up a job and data flow that uses the XML_Pipeline transform [page 77].

 Note

Unlike the qryunnest Query, the XML_Pipeline transform allows you to map a nested column directly to a
flat target.

1. Click XML_Pipeline in the Project Area to open the transform editor.

The Schema In pane shows the nested structure of the source file.
2. Multiselect the following columns from the Schema In pane and drag them to the XML_Pipeline
transform Schema Out pane.
○ MTRL_ID
○ MTRL_TYPE
○ IND_SECTOR
○ MTRL_GROUP
○ SHORT_TEXT
3. Click the back arrow from the icon menu.
4. Click Query_Pipeline to open the query editor.
5. Map each Schema In column to the corresponding columns in the Schema Out pane.
When you drag each column from the Schema In pane to the Schema Out pane, the Type in the Schema
Out pane remains the same even though the input fields have the type varchar(1024).
Optional. For an experiment, remap one of the fields. After you drop the field into the Schema Out pane, a
popup menu appears. Choose Remap Column. The Remap Column option preserves the name and data
type in Schema Out.
6. In the Project Area, click the MTRL_DIM target table to open the target editor.

7. Open the Options tab in the lower pane and select Delete data from table before loading.

This option deletes existing data in the table before loading new data. If you do not select this option, the
software appends data to the existing table.
8. In the project area, click DF_Mtrl_Pipe to return to the data flow.
9. Select the Validate icon from the menu.

The Warnings tab opens. The warnings indicate that each column will be converted to the data type in the
Schema Out pane.

There should not be any errors. If there are errors, you may have missed a step. Fix the errors and try to
validate again.
10. In the Project Area, right-click JOB_Mtrl_Pipe and click Execute.
11. If prompted to save your work, click OK.
12. Accept the default settings in Execution Properties and click OK.
13. After the job completes, ensure that there are no error or warning messages.
14. To view the captured sample data, in the project area select the data flow to open it in the workspace. Click
the magnifying glass on the target MTRL.DIM table to view the six rows of data.
Alternately, use a query tool in your RDBMS to check the contents of the MTRL.DIM table.

7.8 Summary and what to do next

In this section you learned two ways to process an XML file: With a Query transform and with the XML Pipeline
transform.

We walked you through using a Query transform to flatten a nested schema, and we worked with a document type
definition (DTD) file for a source XML file.

If you are unclear about how Data Services processes XML files, and about nested data, see the Designer Guide
for more details.

At this point in the tutorial you have populated the following four tables in the sample data warehouse:

● Sales organization dimension from a flat file


● Time dimension using the Date Generation transform
● Customer dimension from a relational table
● Material dimension from a nested XML file

In the next section you will populate the sales fact table from more than one source.

Parent topic: Populate the Material Dimension from an XML File [page 68]

Previous task: Leveraging the XML_Pipeline [page 76]

Related Information

Designer Guide: Nested Data


Reference Guide: Transforms

8 Populate the Sales Fact Table from Multiple Relational Tables

In this exercise you learn about using joins and functions to populate the Sales Fact table from the Sales star
schema with data from multiple relational tables.

The exercise joins data from two source tables and loads it into a target table.

1. Adding the SalesFact job, work flow, and data flow [page 82]
Use the basic skills that you have learned in earlier exercises to set up a new job named JOB_SalesFact.
2. Creating the SalesFact data flow [page 82]
Add objects to DF_SalesFact and connect the objects to set the flow of data.
3. Defining the details of the Query transform [page 83]
Set up a table join, a filter, and a Lookup expression in the query transform, and then map columns
from the Schema In columns to the Schema Out columns.
4. Using a lookup_ext function for order status [page 85]
Create a Lookup expression to select a column from the ODS_DELIVERY table to include in the
SALES_FACT output table based on two conditions.
5. Validating the SalesFact data flow [page 89]
Use the skills you obtained from previous exercises to validate the data flow.
6. Executing the SalesFact job [page 89]
After you have performed the validation step and fixed any errors, the SalesFact job should execute
without errors.
7. Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]
Use the metadata reporting tool to browse reports about metadata associated with the SalesFact job.
The metadata reporting tool is a Web-based application.
8. Summary and what to do next [page 93]

In this section you joined two source tables using a filter, and you used a Lookup expression to add a
column from a related table that was not one of the source tables.

Related Information

Reference Guide: Transforms, Platform transforms, Query transform, Joins in the Query transform
Designer Guide: Nested data, Operations on nested data

8.1 Adding the SalesFact job, work flow, and data flow

Use the basic skills that you have learned in earlier exercises to set up a new job named JOB_SalesFact.

1. Add a new job to the Class_Exercises project and name it JOB_SalesFact.


2. Add a workflow to JOB_SalesFact and name it WF_SalesFact.
3. Add a data flow to the workflow definition WF_SalesFact and name it DF_SalesFact.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Next task: Creating the SalesFact data flow [page 82]

Related Information

Adding a new job [page 39]


Adding a workflow [page 39]
Adding a data flow [page 40]

8.2 Creating the SalesFact data flow

Add objects to DF_SalesFact and connect the objects to set the flow of data.

Optional. The data flow has two sources. To make the workspace look more organized, change the appearance
of the data flow by following these steps:

1. Select Tools > Options.


2. Expand the Designer node and click Graphics.
3. Click the dropdown arrow in the Workspace Flow Type option and select Data Flow.
4. Select Horizontal/Vertical for the Line Type option. Click OK.

Follow these steps to set up the data flow:

1. Click the DF_SalesFact data flow in the Project Area to open the data flow workspace.
2. Open the Datastores tab in the Local Object Library and expand the Tables node under ODS_DS.
3. Move the ODS_SALESITEM table to the left side of the workspace using drag and drop. Click Make Source.
4. Move the ODS_SALESORDER table to the left of the ODS_SALESITEM table in the workspace using drag and
drop. Click Make Source.

5. Add a query to the data flow from the tool palette .


6. Open the Datastores tab in the Local Object Library and expand the Tables node under Target_DS.
7. Move the SALES_FACT table to the workspace using drag and drop. Select Make Target.
8. In the data flow workspace, connect the icons to indicate the flow of data as shown in the following
diagram.

9. Save your work.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Adding the SalesFact job, work flow, and data flow [page 82]

Next task: Defining the details of the Query transform [page 83]

8.3 Defining the details of the Query transform

Set up a table join, a filter, and a Lookup expression in the query transform, and then map columns from the
Schema In columns to the Schema Out columns.

1. Expand DF_SalesFact in the Project Area and click the query to open the editor.
2. Open the FROM tab in the options pane.

3. Click the dropdown arrow under the Left column heading in the Join pairs area and select
ODS_SALESORDER.

The ODS_SALESORDER table is now the left portion of the join.


4. Select the dropdown arrow under the Right column heading and select ODS_SALESITEM.

The ODS_SALESITEM is now the right portion of the join. Leave the Join Type set to Inner join.

The software defines the relationship between the SalesItem and SalesOrder tables by using the key
column Sales_Order_Number. The inner join type generates a join expression based on primary and foreign
keys and column names. The SALES_ORDER_NUMBER column is the primary key in the ODS_SALESORDER
table and the foreign key in the ODS_SALESITEM table. The relationship states that the fields in each table
should match before the record is joined.

The resulting relationship appears in the From clause text box:

SALESITEM.SALES_ORDER_NUMBER = SALESORDER.SALES_ORDER_NUMBER

5. Click the ellipses icon next to the Right table name ODS_SALESITEM.

The Smart Editor opens.


6. Place your cursor at the end of the first line and press Enter.
7. Type the following two lines, using the casing as shown:

AND ODS_SALESORDER.ORDER_DATE >= to_date('2007.01.01','yyyy.mm.dd')
AND ODS_SALESORDER.ORDER_DATE <= to_date('2007.12.31','yyyy.mm.dd')

These lines filter the sales orders by date: all orders from January 1, 2007 through December 31, 2007 are
moved into the target. (A plain-code sketch of this join and filter appears after this procedure.)

 Tip

As you type the function names, the Smart Editor prompts you with options. Either ignore the prompts
and keep typing, or select a highlighted option and press Enter. You can also double-click a prompt to
accept it.

8. Click OK.

The join conditions that you added in the Smart Editor appear in the Join Condition column and in the
FROM Clause area.
9. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop.

Source table    Source column         Target column      Column description

SALESITEM       SALES_ORDER_NUMBER    SLS_DOC_NO         Sales order number
                SALES_LINE_ITEM_ID    SLS_DOC_LINE_NO    Sales line item number
                MTRL_ID               MATERIAL_NO        Material ID
                PRICE                 NET_VALUE          Order item price
SALESORDER      CUST_ID               CUST_ID            Customer ID
                ORDER_DATE            SLS_DOC_DATE       Order date

10. Keep the Query Editor open for the next task.
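The sketch below restates, in plain Python, what the inner join and the date filter from this procedure do together. The row values are made up for illustration; only the column names follow the exercise, and the snippet is not Data Services code.

from datetime import date

# Hypothetical rows standing in for ODS_SALESORDER and ODS_SALESITEM.
sales_orders = [
    {"SALES_ORDER_NUMBER": "SO-1", "CUST_ID": "C-10", "ORDER_DATE": date(2007, 3, 15)},
    {"SALES_ORDER_NUMBER": "SO-2", "CUST_ID": "C-11", "ORDER_DATE": date(2006, 9, 1)},
]
sales_items = [
    {"SALES_ORDER_NUMBER": "SO-1", "SALES_LINE_ITEM_ID": "IT1", "MTRL_ID": "M-100", "PRICE": 525.0},
    {"SALES_ORDER_NUMBER": "SO-2", "SALES_LINE_ITEM_ID": "IT1", "MTRL_ID": "M-200", "PRICE": 80.0},
]

start, end = date(2007, 1, 1), date(2007, 12, 31)
orders_by_number = {o["SALES_ORDER_NUMBER"]: o for o in sales_orders}

rows = []
for item in sales_items:
    order = orders_by_number.get(item["SALES_ORDER_NUMBER"])   # inner join on the key column
    if order and start <= order["ORDER_DATE"] <= end:          # date filter from the FROM clause
        rows.append({"SLS_DOC_NO": item["SALES_ORDER_NUMBER"],
                     "SLS_DOC_LINE_NO": item["SALES_LINE_ITEM_ID"],
                     "MATERIAL_NO": item["MTRL_ID"],
                     "NET_VALUE": item["PRICE"],
                     "CUST_ID": order["CUST_ID"],
                     "SLS_DOC_DATE": order["ORDER_DATE"]})

print(rows)  # only the 2007 order passes the filter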

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Creating the SalesFact data flow [page 82]

Next task: Using a lookup_ext function for order status [page 85]

8.4 Using a lookup_ext function for order status

Create a Lookup expression to select a column from the ODS_DELIVERY table to include in the SALES_FACT
output table based on two conditions.

Continue configuring the Query transform by setting up a lookup_ext function.
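If it helps to see the logic outside of the dialog boxes, the following Python sketch mimics what the finished lookup does: it matches a delivery record on both the sales order number and the line item ID and returns its status. The order and item values echo the example used later in this task; the status values are invented, and the snippet is an illustration, not Data Services code.

# Hypothetical delivery rows keyed by (sales order number, line item ID).
deliveries = {
    ("PT22221000", "IT100"): "delivered",
    ("PT22221000", "IT102"): "open",
}

def lookup_order_status(sales_order_number, sales_line_item_id, default=None):
    # Both keys are required: one order number can have many line items,
    # which is why the lookup needs two conditions.
    return deliveries.get((sales_order_number, sales_line_item_id), default)

print(lookup_order_status("PT22221000", "IT102"))  # open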

1. Select the ORD_STATUS column in the Schema Out pane.

You haven't mapped the ORD_STATUS column.


2. In the Mapping tab in the lower pane, click Functions....

The Select Function editor opens.


3. In the Function categories column at left, click Lookup Functions.
4. In the Function name column at right, select lookup_ext.
5. Click Next.

The dialog changes with options to define the LOOKUP_EXT() function.


6. The following table contains instructions for completing the LOOKUP_EXT() function.

 Note

We use the following two methods to add expressions in the Select Parameters dialog box:
○ Drag column names into the target columns under Condition, Output, and Order by sections.
○ Click the ellipses button to open the Smart Editor.

Lookup_ext option settings

Lookup table

 Note
The lookup table is where the LOOKUP_EXT() function obtains the value to put into the ORD_STATUS column.

1. Select the Lookup table dropdown arrow. The Input Parameter dialog box opens.
2. Select Datastore from the Look in dropdown arrow.
3. Select ODS_DS and click OK.
4. Select the ODS_DELIVERY table and click OK.

Leave Cache spec set to PRE_LOAD_CACHE.

Available parameters

You choose parameters to build conditions from the tables that you define here.

1. Expand the Lookup table node at left and then expand the ODS_DELIVERY node to expose the columns in the table.
2. Expand the Input Schema node and then expand the ODS_SALESITEM node to expose the columns in the table.

Conditions

 Note
Conditions identify the rules the software follows to determine what value to output for the ORD_STATUS column.

 Note
Set up two conditions for this expression because there is a one-to-many relationship between the SALES_ORDER_NUMBER column and the SALES_LINE_ITEM_ID column. For example, the SALES_ORDER_NUMBER column value PT22221000 has two SALES_LINE_ITEM_ID values: IT100 and IT102.

Condition 1:

ODS_DELIVERY.DEL_SALES_ORDER_NUMBER = ODS_SALESITEM.SALES_ORDER_NUMBER

1. Under ODS_DELIVERY, move the DEL_SALES_ORDER_NUMBER column to the Conditions area under the Column in lookup table column using drag and drop.
2. Verify that the operator column, OP.(&), automatically sets to =.
3. Click the ellipses under the Expressions column to open the Smart Editor.
4. Expand the ODS_SALESITEM node and move the SALES_ORDER_NUMBER column to the right side using drag and drop.
5. Click OK.

Condition 2:

ODS_DELIVERY.DEL_ORDER_ITEM_NUMBER = ODS_SALESITEM.SALES_LINE_ITEM_ID

The steps are similar to the steps for Condition 1:

1. Move the DEL_ORDER_ITEM_NUMBER column to the Conditions area under the Column in lookup table column using drag and drop.
2. Verify that the operator column, OP.(&), automatically sets to =.
3. Click the ellipses under the Expressions column and expand ODS_SALESITEM.
4. Move SALES_LINE_ITEM_ID to the right side using drag and drop.
5. Click OK.

Output

 Note
Output parameters specify the column in the Lookup table that contains the value to put in the ORD_STATUS column in the query.

1. Move the DEL_ORDER_STATUS column from ODS_DELIVERY to the Output area under the Column in lookup table column using drag and drop.
2. Leave all other options as they are.

The following image shows the completed Select Parameters dialog box.

The final lookup function displays in the Mapping tab and looks as follows:

lookup_ext([ODS_DS.DBO.ODS_DELIVERY,'PRE_LOAD_CACHE','MAX'],
[DEL_ORDER_STATUS],[NULL],
[DEL_SALES_ORDER_NUMBER,'=',ODS_SALESITEM.SALES_ORDER_NUMBER,
DEL_ORDER_ITEM_NUMBER,'=',ODS_SALESITEM.SALES_LINE_ITEM_ID])
SET ("run_as_separate_process"='no', "output_cols_info"='<?xml version="1.0"
encoding="UTF-8"?><output_cols_info><col index="1" expression="no"/>
</output_cols_info>' )

7. Click Finish.

8. Click the Back icon in the upper toolbar.


9. Save your work.

To look at the expression for ORD_STATUS again, select the ORD_STATUS column from the Schema Out
pane in the Query editor and open the Mapping tab in the options pane.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Defining the details of the Query transform [page 83]

Next task: Validating the SalesFact data flow [page 89]

8.5 Validating the SalesFact data flow

Use the skills you obtained from previous exercises to validate the data flow.

1. Select DF_SalesFact in the Project Area.

2. Click Validation > Validate Current View.


3. If you followed all of the steps in the exercise correctly, there should be no errors or warnings.

Possible errors could result from an incorrect join condition clause or other syntax error.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Using a lookup_ext function for order status [page 85]

Next task: Executing the SalesFact job [page 89]

8.6 Executing the SalesFact job

After you have performed the validation step and fixed any errors, the SalesFact job should execute without
errors.

1. Right-click JOB_SalesFact in the Project Area and select Execute.

No error notifications should appear in the status window. You might see a warning notification indicating
that a conversion from a date to datetime value occurred.
2. Accept the settings in the Execution Properties dialog box and click OK.
3. Click DF_SalesFact in the Project Area to open it in the workspace.
4. Click the magnifying-glass icon on the target table SALES_FACT to view 17 rows of data. Compare the results with the following example.

 Example

The following diagram shows how all of the tables are related, and breaks up the steps that you
completed in the Query editor to help you understand the relationships of the three tables and why you
set up conditions for the Lookup expression.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Validating the SalesFact data flow [page 89]

Next task: Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]

8.7 Viewing Impact and Lineage Analysis for the SALES_FACT target table

Use the metadata reporting tool to browse reports about metadata associated with the SalesFact job. The
metadata reporting tool is a Web-based application.

View information about the Sales_Fact target table to find out when the table was last updated and used. Also
see the related source tables and column mappings.

1. In Designer, select Tools > Data Services Management Console.


2. Log in using the same credentials as you used to log in to Designer.

The Management Console main page opens.


3. Click the Impact and Lineage Analysis icon.

A browser window opens showing the repository information.


4. Click Settings in the upper right corner.

Use the Settings options to make sure that you are viewing the applicable repository and to refresh source
and column data.
5. Check the name in Repository to make sure that it contains the current repository.
6. Open the Refresh Usage Data tab to make sure that it lists the current job server.
7. Click Calculate Column Mapping.

The software calculates the current column mapping and notifies you when it is successfully complete.
8. Click Close.
9. In the file tree at left, expand Datastores and then Target_DS to view the list of tables.
10. Expand Data Flow Column Mapping Calculation in the right pane to view the calculation status of each data
flow.
11. Double-click the SALES_FACT table under Target_DS in the file tree.

The Overview tab for SALES_FACT table opens at right. The Overview tab displays general information
about the table such as the table datastore name and the table type.
12. Click the Lineage tab.

The Lineage tab displays the sources for the SALES_FACT target table. When you move the
pointer over a source table icon, the names of the datastore, data flow, and owner appear.

13. In the SALES_FACT table in the file tree, double-click the ORD_STATUS column.

The Lineage tab in the right-pane refreshes to show the lineage for the column. For example, you should
see that the SALES_FACT.ORD_STATUS column is related to information in the following source columns:
○ ODS_DELIVERY.DEL_ORDER_STATUS
○ ODS_SALESITEM.SALES_LINE_ITEM_ID
○ ODS_SALESITEM.SALES_ORDER_NUMBER

These relationships show the source columns that you defined in the LOOKUP_EXT() function in Using a
lookup_ext function for order status [page 85].
14. Print the reports by selecting the print option in your browser. For example, in Windows Internet Explorer,
select File > Print.

Task overview: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Executing the SalesFact job [page 89]

Next: Summary and what to do next [page 93]

8.8 Summary and what to do next

In this section you joined two source tables using a filter, and you used a Lookup expression to add a column
from a related table that was not one of the source tables.

We covered a lot of information in this section. Feel free to go back over the steps, examine the data in the
source and target tables using the magnifying glass icon in the data flow, and look at the example provided to
really understand the results of the settings you made in the query transform.

At this point in the tutorial, you have populated all five tables in the sales data warehouse:

● Sales org dimension from a flat file


● Time dimension using the Date Generation transform
● Customer dimension from a relational table
● Material dimension from a nested XML file
● Sales fact table from two relational tables

The next section shows you how to use the change data capture feature in Data Services.

For more information about Impact and Lineage reports, see the Management Console Guide.

For more information about the Lookup expression, functions, and filters, see the Designer Guide.

Parent topic: Populate the Sales Fact Table from Multiple Relational Tables [page 81]

Previous task: Viewing Impact and Lineage Analysis for the SALES_FACT target table [page 91]

Related Information

Changed data capture [page 94]

9 Changed data capture

Changed data capture (CDC) extracts only new or modified data after you process an initial load of the data to
the target system.

In this exercise we create two jobs:

● Initial load job: A job that loads all of the rows from a source to a target.
● Delta load job: A job that uses CDC to load changed and new data to the target table.

Both jobs contain the following objects:

● Initialization script that sets date values for global variables named $GV_STARTTIME and $GV_ENDTIME.
● Data flow that loads only the rows changed or added between the $GV_STARTTIME and $GV_ENDTIME
● Termination script that updates a database table that stores the last $GV_ENDTIME

In the initial load job, the software establishes a baseline by assigning the date and time for each row in the data
source. In the delta load job, the software determines which rows are new or changed based on the last date
and time data.

The target database contains a job status table called CDC_time. The software stores the last date and time
data for each row in CDC_time. When you execute the delta load job, it updates that date and time for the next
execution.

For complete information about CDC, see the Designer Guide.
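The following Python sketch is a tool-independent way to picture the moving time window that the two jobs maintain. It is not how Data Services implements CDC, and the customer rows are invented; only the names mirror the exercise ($GV_STARTTIME, $GV_ENDTIME, CDC_time).

from datetime import datetime

# Stand-ins for the CDC_time status table and the customer source table.
cdc_time = {"LAST_TIME": datetime(2008, 1, 1)}
customers = [
    {"CUST_ID": "C-1", "CUST_TIMESTAMP": datetime(2008, 3, 27)},
    {"CUST_ID": "C-2", "CUST_TIMESTAMP": datetime(2008, 4, 2)},
]

def delta_load(now):
    start, end = cdc_time["LAST_TIME"], now                # like $GV_STARTTIME and $GV_ENDTIME
    changed = [row for row in customers
               if start <= row["CUST_TIMESTAMP"] <= end]   # rows added or changed in the window
    cdc_time["LAST_TIME"] = end                            # like the termination script updating CDC_time
    return changed

print(delta_load(datetime(2008, 3, 31)))  # first run picks up C-1 only
print(delta_load(datetime(2008, 4, 30)))  # the next run picks up only the newer C-2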

Global variables [page 95]


The initial job contains the usual objects that other jobs contain, but it also serves as a baseline for the
source data by using global variables.

Adding the initial load job and defining global variables [page 95]
Create a job and then create two global variables for the job. The global variables serve as placeholders
for job execution start and end time stamps.

Replicating the initial load data flow [page 100]


Use the replicate feature to build the delta data flow more quickly.

Building the delta load job [page 101]


Build the job JOB_CDC_Delta by adding the WF_CDC_Delta workflow and two new scripts.

Execute the initial and delta load jobs [page 103]


After you carefully set up both the initial load job and the delta load job, it is time to execute them.

Summary and what to do next [page 105]

9.1 Global variables

The initial job contains the usual objects that other jobs contain, but it also serves as a baseline for the source
data by using global variables.

Global variables are global within a job only. For example, you create a global variable while you set up a specific
job, and the variable is not available for other jobs.

Global variables provide you with maximum flexibility at runtime. For example, during production you can
change default values for global variables at runtime from a job's schedule or SOAP call without having to open
a job in the Designer.

For this exercise, you set values for global variables in script objects. You can also set values for global variables
in external jobs, job execution, or job schedule properties. For complete information about using global
variables in Data Services, see the Designer Guide.

Parent topic: Changed data capture [page 94]

Related Information

Adding the initial load job and defining global variables [page 95]
Replicating the initial load data flow [page 100]
Building the delta load job [page 101]
Execute the initial and delta load jobs [page 103]
Summary and what to do next [page 105]

9.2 Adding the initial load job and defining global variables

Create a job and then create two global variables for the job. The global variables serve as placeholders for job
execution start and end time stamps.

1. Create a new batch job in the Class_Exercises project and name it JOB_CDC_Initial.

2. With the job name selected in the Project Area, select Tools > Variables.

The Variables and Parameters editor opens. It displays the job name in the Context header.
3. Right-click Global Variables and click Insert.

The editor adds a variable under Global Variables.


4. Double-click the new variable to open the Global Variable Properties dialog box.
5. Enter $GV_STARTTIME in Name.
6. Select datetime from the Data type dropdown arrow.
7. Click OK.

8. Create another global variable following the same steps. Name the second global variable $GV_ENDTIME
and set the Data Type to datetime.
9. Close the Variables and Parameters editor by clicking the X in the upper right corner.
10. Save your work.

Task overview: Changed data capture [page 94]

Related Information

Global variables [page 95]


Replicating the initial load data flow [page 100]
Building the delta load job [page 101]
Execute the initial and delta load jobs [page 103]
Summary and what to do next [page 105]
Adding a new job [page 39]

9.2.1 Adding a workflow, scripts, and data flow

Use scripts in the job to assign values to the global variables that you just created.

1. Select the JOB_CDC_Initial job in the Project Area to open it in the workspace.
2. Add a new workflow to the job using the tool palette and name it WF_CDC_Initial.
3. Click WF_CDC_Initial in the Project Area to open it in the workspace.

4. Click the script icon in the tool palette and click in the workspace to add a script.
5. Name the script SET_START_END_TIME.
6. Add a data flow from the tool palette to the workflow workspace and name it DF_CDC_Initial.
7. Add another script to the right of the data flow object using the tool palette.
8. Name the script UPDATE_CDC_TIME_TABLE.
9. Connect the objects to set the direction of the data flow.

10. Save your work.

1. Defining the scripts [page 97]

Designate values for the global variables and add functions to instruct how the job should output data.
2. Defining the data flow [page 98]
Add a query and a target template table to the data flow.
3. Defining the QryCDC query [page 99]
Map the output schema in the QryCDC query and add a function that checks the date and time.

Related Information

Adding a workflow [page 39]


Adding a data flow [page 40]

9.2.1.1 Defining the scripts

Designate values for the global variables and add functions to instruct how the job should output data.

When you define scripts, make sure that you follow your database management syntax rules. For more
information about creating scripts in Data Services, see the Designer Guide.

Before you define the scripts, check the date and time in the existing database to make sure you use a date in
the script that includes all of the records.

● Open the Datastores tab in the Local Object Library and expand the Tables node under ODS_DS.
● Double-click the ODS_CUSTOMER source table to open the editor in the workspace.
● Open the View Data tab in the lower pane to see a sample of the source data.
● Look in the CUST_TIMESTAMP column and see that the timestamp for all records is 2008.03.27 00:00:00.
● Close the editor.

1. Expand the WF_CDC_Initial node in the Project Area and click the SET_START_END_TIME script to open
it in the workspace.
2. Enter the following script directly in the text area using the syntax applicable for your database. As you
start to type the global variable name, a dropdown list of variables appears. Double-click the variable name
from the list to add it to the string.

For Microsoft SQL Server, type the following text:

$GV_STARTTIME = '2008.01.01 00:00:000';
$GV_ENDTIME = sysdate();

The script does the following:


○ Sets the $GV_STARTTIME global variable as datetime 2008.01.01 00:00:000
○ Captures all rows in the source data because we included a start time set before the existing
CUST_TIMESTAMP values in the database.
○ Sets the $GV_ENDTIME global variable as datetime sysdate(), which populates the job status table,
CDC_TIME, with the date and time of the job execution.
3. Click the Validate icon in the upper menu to validate the script.
Fix any syntax errors and revalidate if necessary.

4. Close the script editor for SET_START_END_TIME.
5. Select the UPDATE_CDC_TIME_TABLE script in the Project Area to open the script editor in the workspace.
6. Enter the following script directly in the text area using the syntax applicable for your database.

For Microsoft SQL Server, type the following text:

sql('Target_DS', 'DELETE FROM ODS.CDC_TIME');
sql('Target_DS', 'INSERT INTO ODS.CDC_TIME VALUES ({$GV_ENDTIME})');

For Oracle, type the following text:

sql('Target_DS', 'DELETE FROM TARGET.CDC_TIME');
sql('Target_DS', 'INSERT INTO TARGET.CDC_TIME VALUES (to_date({$GV_ENDTIME}, \'YYYY.MM.DD HH24:MI:SS\'))');

 Note

Our example assumes that the user name is ODS. Ensure that you use the applicable user name that
you defined for the table. For example, the table name ODS.CDC_TIME indicates that the owner is ODS. If
the owner name is DBO, type the name as DBO.CDC_TIME.

The script resets the $GV_ENDTIME value in the CDC_TIME job status table with a new end time based on
when the software completes the job execution.
7. Click the Validate icon to validate the script.
You may receive a validation warning that the DATETIME data type will be converted to VARCHAR. Data
Services always preserves the data type in output schema, so ignore the warning.
8. Close the UPDATE_CDC_TIME_TABLE script file and save your work.

Task overview: Adding a workflow, scripts, and data flow [page 96]

Next task: Defining the data flow [page 98]

9.2.1.2 Defining the data flow

Add a query and a target template table to the data flow.

With a target template table, you do not have to specify the table schema or import metadata. Instead, during
job execution, Data Services has the DBMS create the table with the schema defined by the data flow.

Template tables appear in the Local Object Library under each datastore.

1. Open the DF_CDC_Initial data flow in the workspace.


2. Open the Datastores tab in the Local Object Library and expand the Tables node under ODS_DS.
3. Add the ODS_Customer table to the left side of the workspace using drag and drop and click Make Source.
4. Add a query from the tool palette to the right of the source table in the workspace using drag and drop.
Name the query QryCDC.
5. Expand Target_DS in the Datastores tab in the Local Object Library.

6. Add the Template Tables icon to the right of the query in the workspace using drag and drop.

The Template editor opens.


7. Type CUST_CDC in Template name and click OK.

The software automatically completes the Owner name.


8. Connect the objects from left to right to designate the flow of data.
9. Open the target table CUST_CDC in the workspace to define it.
10. Open the Options tab and select Delete data from table before loading.

The Drop and re-create table option is selected by default. You can leave the option selected.
11. Click the back arrow icon in the upper toolbar to close the target table definition and return to the data flow
workspace.
12. Save your work.

Task overview: Adding a workflow, scripts, and data flow [page 96]

Previous task: Defining the scripts [page 97]

Next task: Defining the QryCDC query [page 99]

9.2.1.3 Defining the QryCDC query

Map the output schema in the QryCDC query and add a function that checks the date and time.

1. Open QryCDC in the workspace.


2. Drag the following columns from the Schema In pane to the Schema Out pane:
○ CUST_ID
○ CUST_CLASSF
○ NAME1
○ ZIP
○ CUST_TIMESTAMP
3. Open the Where tab in the lower pane and enter the following query statement using the syntax applicable
for your database.

Type the following text for Microsoft SQL Server:

(ODS_CUSTOMER.CUST_TIMESTAMP >= $GV_STARTTIME) and
(ODS_CUSTOMER.CUST_TIMESTAMP <= $GV_ENDTIME)

 Note

You can drag the column CUSTOMER.CUST_TIMESTAMP from the Schema In pane to the Where tab, or
you can select the table name and the column name from the list that appears as you start to type the
script. The software also offers suggestions to choose from when you type the global variable name.

4. Click the Validate icon on the toolbar to validate the query statement.

Fix any syntax errors and revalidate if necessary.
5. Select JOB_CDC_Initial in the Project Area and click the Validate All icon on the toolbar.

Fix any errors and revalidate if necessary. Ignore the warning about DATETIME converted to VARCHAR.
6. Save your work.

Task overview: Adding a workflow, scripts, and data flow [page 96]

Previous task: Defining the data flow [page 98]

9.3 Replicating the initial load data flow


Use the replicate feature to build the delta data flow more quickly.

1. Open the Data Flows tab in the Local Object Library.


2. Right-click DF_CDC_Initial and click Replicate.

A new data flow appears in the data flow list with “Copy_1” added to the data flow name.
3. Rename the copied data flow DF_CDC_Delta.
4. Double-click the DF_CDC_Delta data flow to open it in the workspace.
5. Double-click CUST_CDC target table in the workspace to open the Template Target Table editor.
6. Open the Options tab in the lower pane.
7. Deselect the following options: Delete data from table before loading and Drop and re-create table.

The following table explains these options.

Option What deselecting does

Delete data from table before loading Does not delete any of the data in the CUST_CDC target
table before loading changed data to it from the
JOB_CDC_Delta job.

Drop and re-create table Does not drop the existing CUST_CDC table and create a
new table with the same name. Preserves the existing
CUST_CDC table.

8. Save your work.

Task overview: Changed data capture [page 94]

Related Information

Global variables [page 95]

Adding the initial load job and defining global variables [page 95]
Building the delta load job [page 101]
Execute the initial and delta load jobs [page 103]
Summary and what to do next [page 105]

9.4 Building the delta load job

Build the job JOB_CDC_Delta by adding the WF_CDC_Delta workflow and two new scripts.

1. Click JOB_CDC_Delta in the Project Area to open it in the workspace.


2. Add a workflow from the tool palette to the workspace.
3. Name the workflow WF_CDC_Delta.
4. Open the work flow in the workspace.
5. Add a script from the tool palette to the left side of the WF_CDC_Delta workspace.
6. Name the script SET_NEW_START_END_TIME.
7. Open the Data Flows tab in the Local Object Library and move DF_CDC_Delta to the right of the
SET_NEW_START_END_TIME script in the workspace.
8. Click the script icon in the tool palette and add it to the right of DF_CDC_Delta in the WF_CDC_Delta
workspace.
9. Name the script UPDATE_CDC_TIME_TABLE.
10. Connect the objects in the workspace from left to right.
11. Save your work.

Adding the job and defining the global variables [page 102]
Create global variables specifically for the delta-load job.

Defining the scripts [page 102]

Task overview: Changed data capture [page 94]

Related Information

Global variables [page 95]


Adding the initial load job and defining global variables [page 95]
Replicating the initial load data flow [page 100]
Execute the initial and delta load jobs [page 103]
Summary and what to do next [page 105]

9.4.1 Adding the job and defining the global variables

Create global variables specifically for the delta-load job.

1. Create a new batch job for Class_Exercises and name the job JOB_CDC_Delta.

2. Select the JOB_CDC_Delta job name in the Project Area and click Tools Variables .

The Variables and Parameters editor opens. Notice that the job name displays in the Context box.
3. Right-click Global Variables and click Insert.
4. Double-click the new variable to open the editor. Rename the variable $GV_STARTTIME.
5. Select datetime for the Data type and click OK.
6. Follow the same steps to create the global variable $GV_ENDTIME.
7. Close the Variables and Parameters dialog box and save your work.

 Note

Because you create global variables for a specific delta-load job, the software does not consider them
as duplicates to the variables that you created for the initial-load job.

9.4.2 Defining the scripts

Define the $GV_STARTTIME and $GV_ENDTIME global variables in the scripts.

Open the WF_CDC_Delta workflow in the workspace.

1. Double-click the SET_NEW_START_END_TIME script in the workspace to open the script editor.
2. Enter the following text directly in the text area using the syntax applicable for your database.

For MS SQL Server type the following text:

$GV_STARTTIME = to_date(sql('Target_DS', 'SELECT LAST_TIME FROM ODS.CDC_TIME'), 'YYYY-MM-DD HH24:MI:SS');
$GV_ENDTIME = sysdate();

For Oracle, type the following text:

$GV_STARTTIME = sql('Target_DS','SELECT to_char(LAST_TIME,\'YYYY.MM.DD HH24:MI:SS\') FROM TARGET.CDC_TIME');
$GV_ENDTIME = sysdate();

This script defines the start time global variable to be the last date and time stamp recorded in the
CDC_Time table. The end time global variable equals the system date.
3. Validate the script by clicking the Validate icon. Fix any syntax errors.
4. Select the UPDATE_CDC_TIME_TABLE script in the Project Area to open the script editor in the workspace.
5. Enter the following text directly in the text area using the syntax applicable for your database.

For MS SQL Server, type the following text:

sql('Target_DS', 'UPDATE ODS.CDC_TIME SET LAST_TIME ={$GV_ENDTIME}');

For Oracle, type the following text:

sql('Target_DS', 'INSERT INTO TARGET.CDC_TIME VALUES (to_date({$GV_ENDTIME}, \'YYYY.MM.DD HH24:MI:SS\'))');

This script replaces the end time in the CDC_Time table with the end time for the Delta job.
6. Validate the script and fix any syntax errors.

Ignore any warnings about the datetime data type being converted to VarChar.
7. Click the JOB_CDC_Delta job name in the Project Area and click the Validate All icon. Correct any errors,
and ignore any warnings for this exercise.
8. Save your work.

9.5 Execute the initial and delta load jobs

After you carefully set up both the initial load job and the delta load job, it is time to execute them.

Execute the initial load job named JOB_CDC_Initial and view the results by looking at the sample data.
Because this is the initial job, the software should return all of the rows.

To see how CDC works, open the source data in your DBMS and alter the data as instructed. Then execute the
JOB_CDC_Delta job. The job extracts the changed data and updates the target table with the changed data.
View the results to see the different time stamps and to verify that only the changed data was loaded to the
target table.

Executing the initial load job [page 104]


The initial load job outputs all of the source data to the target table, and adds the job execution date
and time to the data.

Changing the source data [page 104]


To see CDC in action, change the CUSTOMER table and execute the delta load job.

Executing the delta-load job [page 105]

Task overview: Changed data capture [page 94]

Related Information

Global variables [page 95]


Adding the initial load job and defining global variables [page 95]
Replicating the initial load data flow [page 100]
Building the delta load job [page 101]
Summary and what to do next [page 105]

9.5.1 Executing the initial load job

The initial load job outputs all of the source data to the target table, and adds the job execution date and time to
the data.

Use your DBMS to view the data in the ODS_CUSTOMER table. There are a total of 12 rows. The columns are the
same as the columns that appear in the Schema In pane in the QryCDC object.
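
For example, you could run a query like the following in your DBMS to check the source rows before executing the job. This is a sketch only: the ODS schema qualifier is an assumption based on the ODS.CDC_TIME reference used in the scripts, so adjust the table owner to match your environment.

-- List the source rows and their timestamps before running the initial load.
SELECT CUST_ID, CUST_CLASSF, NAME1, ZIP, CUST_TIMESTAMP
FROM ODS.ODS_CUSTOMER
ORDER BY CUST_ID;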

1. In Data Services, right-click JOB_CDC_Initial and click Execute.


2. Leave all of the default settings in the Execution Properties window and click OK.

If the job fails, click the error icon to read the error messages. You can have script errors even if your script
validation was successful.

3. After successful execution, click the monitor icon and view the Row Count column to determine how
many rows were loaded into the target table. The job should return all 12 rows.

You can also check this row count by opening the data flow and clicking the View Data icon (magnifying
glass) on the target table.

9.5.2 Changing the source data

To see CDC in action, change the CUSTOMER table and execute the delta load job.

Use your DBMS to edit the CUSTOMER table.

1. Add a row to the table with the following data.

 Note

If your database does not allow nulls for some fields, copy the data from another row.

Column name         Value
Cust_ID             ZZ01
Cust_Classf         ZZ
Name1               EZ BI
ZIP                 ZZZZZ
Cust_Timestamp      A date and time that is equal to or later than the time shown in the
                    CDC_TIME status table after the initial job execution. To be sure that
                    you enter a valid time, look at the value in the CDC_TIME table using
                    your DBMS. For example, if the initial-load job execution time is
                    2016-04-25 12:25:00, that is the value that appears in the CDC_TIME
                    table. Enter a time for the new record that is equal to or later than
                    that time, such as 2016-04-25 12:30:00.

2. Save the table.
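
If you prefer to add the row with a SQL statement instead of editing the table in a grid, a statement like the following would work. Treat it as a sketch: the ODS schema qualifier and the exact table name (the source table is referred to as both CUSTOMER and ODS_CUSTOMER in this section) are assumptions, and your table may have additional NOT NULL columns that also need values, as the note above explains.

-- Insert the new customer row used to demonstrate changed data capture.
-- Adjust the schema, table name, and timestamp to match your environment.
INSERT INTO ODS.ODS_CUSTOMER (CUST_ID, CUST_CLASSF, NAME1, ZIP, CUST_TIMESTAMP)
VALUES ('ZZ01', 'ZZ', 'EZ BI', 'ZZZZZ', '2016-04-25 12:30:00');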

9.5.3 Executing the delta-load job

1. In Data Services, execute JOB_CDC_Delta.


2. View the results by opening the monitor log. The row count should be 1.
3. Open the data flow editor and view the data in the target table. The new row should appear in the target
data after the 12 rows that JOB_CDC_Initial loaded.

9.6 Summary and what to do next

In this section you:

● Learned one way to capture changed data


● Used global variables
● Used a template table
● Worked with a job status table (CDC_time) and scripts
● Populated global variables in a script

You can now go on to the next section, which describes how to verify and improve the quality of your source
data.

For more information about the topics covered in this section, see the Designer Guide.

Parent topic: Changed data capture [page 94]

Related Information

Global variables [page 95]


Adding the initial load job and defining global variables [page 95]
Replicating the initial load data flow [page 100]

Building the delta load job [page 101]
Execute the initial and delta load jobs [page 103]
Data Assessment [page 107]

10 Data Assessment

Use data assessment features to identify problems in your data, separate out bad data, and audit data to
improve the quality and validity of your data.

Data Assessment provides features that enable you to trust the accuracy and quality of your source data.

The exercises in this section introduce the following Data Services features:

● Data profiling that pulls specific data statistics about the quality of your source data.
● Validation transform in which you apply your business rules to data and separate bad data from good data.
● Audit data flow that outputs invalid records to a separate table.
● Auditing tools in the Data Services Management Console.

Default profile statistics [page 108]


The Data Profiler executes on a profiler server to provide column and relationship information about
your data.

Viewing profile statistics [page 109]


Use profile statistics to determine the quality of your source data before you extract, transform, and
load it.

The Validation transform [page 110]


The Validation transform qualifies a data set based on rules for input schema columns.

Audit objects [page 114]


Use audit objects to set up a job that indicates if there are bad records in your data that do not pass
validation rules.

Viewing audit details in Operational Dashboard reports [page 117]


View audit details, such as an audit rule summary and audit labels and values, in the SAP Data Services
Management Console.

Summary and what to do next [page 119]


In this exercise you learned a few methods to view data profile information, set audit rules, and assess
your output data.

Related Information

Designer Guide: Data Assessment

10.1 Default profile statistics

The Data Profiler executes on a profiler server to provide column and relationship information about your data.

The software reveals statistics for each column that you choose to evaluate. The following table describes the
default statistics.

Statistic     Description

Column        The Data Profiler provides two types of column profiles:
              ● Basic profiling: minimum value, maximum value, average value, minimum
                string length, and maximum string length.
              ● Detailed profiling: distinct count, distinct percent, median, median
                string length, pattern count, and pattern percent.

Distincts     The total number of distinct values out of all records for the column.

Nulls         The total number of NULL values out of all records in the column.

Min           The minimum value in the column out of all records.
              ● If the column contains alpha data, the minimum value is the string that
                comes first alphabetically.
              ● If the column contains numeric data, the minimum value is the lowest
                numeral in the column.
              ● If the column contains alphanumeric data, the minimum value is the string
                that comes first alphabetically and lowest numerically.

Max           The maximum value in the column out of all records.
              ● If the column contains alpha data, the maximum value is the string that
                comes last alphabetically.
              ● If the column contains numeric data, the maximum value is the highest
                numeral in the column.
              ● If the column contains alphanumeric data, the maximum value is the string
                that comes last alphabetically and highest numerically.

Timestamp     The time that the statistic is calculated.

Parent topic: Data Assessment [page 107]

Related Information

Viewing profile statistics [page 109]


The Validation transform [page 110]
Audit objects [page 114]

Viewing audit details in Operational Dashboard reports [page 117]
Summary and what to do next [page 119]
Designer Guide: Data Assessment, Using the Data Profiler

10.2 Viewing profile statistics

Use profile statistics to determine the quality of your source data before you extract, transform, and load it.

1. Open the Datastores tab in the Local Object Library and expand ODS_DS > Tables.
2. Right-click the ODS_CUSTOMER table and select View Data.

The View Data dialog opens.

3. Click the Profile tab, which is the second tab from the left in the View Data dialog.

The Profile tab opens showing all of the columns in the table, and the basic column profile information.
4. Select the checkboxes next to the following column names:

○ CUST_ID
○ CUST_CLASSF
○ NAME1
○ ZIP
○ CUST_TIMESTAMP

These columns are the columns that you worked with in the previous exercise.
5. Click Update.

The profile statistics appear for the selected columns. The columns that were not selected have “<Blank>”
in the statistics columns.
6. After you examine the statistics, close the View Data dialog.

For this exercise, we comply with a business rule from a fictional company. The rule requires that a target ZIP
column contain numeric data. In the last exercise for changed data capture, you added a new row of data with a
ZIP column value of “ZZZZZ”. Now we set up a validation job that changes that value to blank.

Task overview: Data Assessment [page 107]

Related Information

Default profile statistics [page 108]


The Validation transform [page 110]
Audit objects [page 114]
Viewing audit details in Operational Dashboard reports [page 117]
Summary and what to do next [page 119]
The Validation transform [page 110]
Creating a validation job [page 111]

10.3 The Validation transform

The Validation transform qualifies a data set based on rules for input schema columns.

The Validation transform can output up to three data outputs: Pass, Fail, and RuleViolation. Data outputs are
based on the condition that you specify in the transform. You set the data outputs when you connect the
output of the Validation transform with a Pass object and a Fail object in the workspace.

For this exercise, we set up a Pass target table for the first job execution. Then we alter the first job by adding a
Fail target table with audit rules.

Creating a validation job [page 111]


Use a Validation transform in a job to define business rules that sort good data from bad data.

Adding a job and data flow [page 111]


Create a job and add a data flow to start the validation and auditing process.

Configuring the Validation transform [page 112]


Create a validation rule to find records that contain data in the ZIP column that does not comply with
your format rules.

Parent topic: Data Assessment [page 107]

Related Information

Default profile statistics [page 108]


Viewing profile statistics [page 109]
Audit objects [page 114]
Viewing audit details in Operational Dashboard reports [page 117]

Summary and what to do next [page 119]

10.3.1 Creating a validation job


Use a Validation transform in a job to define business rules that sort good data from bad data.

The following steps outline the tasks involved in this exercise.

1. Create a new job.


2. Add a data flow to the job that contains a Validation transform.
3. Set up audit rules based on profile statistics in the Validation transform.
4. Apply audit rules to the data flow so you can view audit details.

10.3.2 Adding a job and data flow


Create a job and add a data flow to start the validation and auditing process.

1. Add a new job to the Class_Exercises project. Name the job JOB_CustGood.
2. With the job opened in the workspace, add a Data Flow icon to the job from the tool palette. Name the data
flow DF_CustGood.

3. With DF_CustGood opened in the workspace, expand ODS_DS > Tables in the Datastores tab in the
Local Object Library.
4. Move the ODS_CUSTOMER table to the DF_CustGood data flow in the workspace using drag and drop.
5. Select Make Source.
6. Open the Transform tab in the Local Object Library and expand the Platform node.
7. Move the Validation icon for the Validation transform to the right of the ODS_CUSTOMER table in the data
flow using drag and drop.

The software creates a copy of the Validation transform in the workspace.

8. Expand TARGET_DS > Template Tables in the Datastores tab of the Local Object Library.
9. Move the Template Tables icon to the right of the Validation transform in the workspace using drag and
drop.
10. In the Create Template dialog box, name the template table Cust_Good.
11. Connect the ODS_CUSTOMER source table to the Validation transform.
12. Connect the Validation transform to the CUST_GOOD target table and select Pass from the dialog box.

The Pass option requires that the software pass all rows, even rows that fail the validation rules, to the
target table.

13. Save your work.

Related Information

Adding a new job [page 39]

10.3.3 Configuring the Validation transform

Create a validation rule to find records that contain data in the ZIP column that does not comply with your
format rules.

1. With DF_CustGood open in the workspace, double-click the Validation transform to open the Transform
Editor.
2. Open the Validation Rules tab in the lower pane and click Add located in the upper right of the pane.
The Rule Editor dialog box opens.
3. Create a rule that requires data in the ZIP column to contain 5-digit strings. Complete the options as instructed in the following table.

Option                    Instruction

Name                      Enter 5_Digit_ZIP_Column_Rule.

Description               Optional. Enter a description such as “ZIP values must have a
                          5-digit format.”

Enabled                   Select.
                          Note: When you enable a validation rule for a column, a check
                          mark appears next to the column in the Schema In pane.

Action on Fail            Select Send to Pass from the dropdown list.
                          Note: Send to Pass causes the software to pass the row to the
                          target table even if it fails the 5_Digit_ZIP_Column_Rule rule.

Column Validation         Select.

Column dropdown           Select ODS_Customer.ZIP.

Condition dropdown list   Select Match Pattern.

Condition text box        Type 99999.
                          Note: This condition causes the software to check that the ZIP
                          column contains 5-digit strings.

4. Click OK.

The rule appears in the Rules list of the Rule Editor pane.
5. In the Rule Editor, select the checkbox under the Enabled column in the lower pane under If any rule fails
and Send to Pass, substitute with.
6. Double-click the cell next to the checked Enabled cell, under Column.

A dropdown list appears showing column names.


7. Select ODS_Customer.ZIP.
8. Type '' (two single quotes with no space between) in the cell under Expression.

This substitutes “<Blank>” for the ZIP values that do not pass the rule because they do not contain a 5-
digit string in the ZIP column.

9. Click the Validate icon in the Designer tool bar to verify that the Validation transform does not have syntax
errors.
10. Execute the Job_CustGood job.

Open DF_CustGood in the workspace. Click the magnifying-glass icons on the ODS_CUSTOMER source table
and the CUST_GOOD target table to view the data in the lower pane. The value in the ZIP column for the record
CUST_ID ZZ01 shows “<Blank>” in the target table.
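
For reference, the column validation you configured is conceptually the same as checking the ZIP value against a digit pattern. The following one-line sketch uses the Data Services match_pattern function; it is an illustration of the Match Pattern condition, not the exact expression the transform generates, and the comparison style shown is an assumption.

# Conceptual equivalent of the Match Pattern condition:
# evaluates to 1 when ZIP is exactly five digits, 0 otherwise.
match_pattern(ODS_CUSTOMER.ZIP, '99999') = 1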

Related Information

Audit objects [page 114]

10.4 Audit objects


Use audit objects to set up a job that indicates if there are bad records in your data that do not pass validation
rules.

Collect audit statistics on the data that flows out of any object, such as a source, transform, or target.

In the next exercise, we set up the validation job JOB_CustGood to output failed records to a fail target table
instead of to the pass target table named CUST_GOOD.

For this exercise, we define the following objects in the Audit dialog box:

Object                     Description

Audit point                The object in a data flow where you collect audit statistics.

Audit function             The audit statistic that the software collects for the audit points.
                           For this exercise, we set up a Count audit function on the source and
                           pass target tables. Count collects the following statistics:
                           ● Good count for rows that pass the rule.
                           ● Error count for rows that do not pass the rule.

Audit label                The unique name in the data flow that the software generates for the
                           audit statistics collected for each audit function that you define.

Audit rule                 A Boolean expression in which you use audit labels to verify the job.

Audit action on failure    The action that the software takes when there is a failure.

For a complete list of audit objects and descriptions, see the Designer Guide.
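
To make the audit rule concept concrete, a rule that requires a source table and a target table to load the same number of rows could look like the following. This is an illustration only; the actual labels in your rule depend on the audit points and functions that you define in the next steps.

$Count_ODS_CUSTOMER = $Count_CUST_GOOD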

Adding a fail target table to the data flow [page 115]

Add an additional target table to the DF_CustGood data flow to contain the records that fail the rule.

Creating audit functions [page 116]


Create an audit function in this exercise for failed data.

Parent topic: Data Assessment [page 107]

Related Information

Default profile statistics [page 108]


Viewing profile statistics [page 109]
The Validation transform [page 110]
Viewing audit details in Operational Dashboard reports [page 117]
Summary and what to do next [page 119]

10.4.1 Adding a fail target table to the data flow

Add an additional target table to the DF_CustGood data flow to contain the records that fail the rule.

1. Open DF_CustGood in the workspace.


2. Double-click the Validation transform in the data flow to open the Transform Editor.
3. Select the ZIP column in the Schema In pane.
4. Open the Validation Rules tab in the lower pane and double-click the rule name match_pattern to open
the Rule Editor.
5. Select Send To Fail from the Action on Fail dropdown list and click OK.

Send to Fail causes Data Services to send the rows that fail the 5_Digit_ZIP_Column_Rule to a fail target
table.
6. Close the Transform Editor.

7. With the DF_CustGood data flow open in the workspace, expand TARGET_DS > Template Tables.
8. Move the Template Tables icon to the data flow using drag and drop.
9. Enter Cust_Bad_Format in Template name in the Create Template dialog box. Click OK.
10. Draw a connection from the Validation transform to the Cust_Bad_Format target table and select the Fail
option.
Your data flow now shows the ODS_CUSTOMER source connected to the Validation transform, with the Validation transform connected to both the Cust_Good (Pass) and Cust_Bad_Format (Fail) target tables.

11. Save your work.

Continue with the exercise by creating audit functions.

10.4.2 Creating audit functions

Create an audit function in this exercise for failed data.

The following steps set up two rule expressions in the DF_CustGood data flow: one for the source table ODS_CUSTOMER and one for the pass target table CUST_GOOD.

1. With DF_CustGood opened in the workspace, click the Audit icon in the upper tool bar.

The Audit pane opens.


2. In the Label tab, right-click ODS_CUSTOMER and select the Count audit function.
3. Right-click CUST_GOOD and select the Count audit function.
The Designer automatically generates the audit labels $Count_ODS_CUSTOMER for ODS_CUSTOMER and
$Count_CUST_GOOD for CUST_GOOD in the Audit Label column.
4. Open the Rule tab of the Audit pane.
5. Click Add in the upper right to enable the Expression Editor.
6. There are three text boxes in the Expression Editor to create a Boolean expression. Select values from each
dropdown list as instructed in the following table. Create the following Boolean expression for each table.

Text box selections for expressions

Text box    ODS_CUSTOMER expression       CUST_GOOD expression
1           $Count_ODS_CUSTOMER           $Count_CUST_GOOD
2           =                             =
3           $CountError_ODS_CUSTOMER      $CountError_CUST_GOOD

7. In the Action on failure group at the right of the pane, deselect Raise exception.

If there is an exception, the job stops processing. We want the job to continue processing so we deselect
Raise exception.

8. Click Close to close the Audit pane.

Notice that the DF_CustGood data flow indicates the audit points by displaying the audit icon on the right
side of the ODS_Customer source table and Cust_Good target table. The audit points are where the
software collects audit statistics.

9. Click the Validate All icon in the upper tool bar to verify that there are no errors.
10. Save your work.
11. Execute Job_CustGood.

Related Information

Viewing audit details in Operational Dashboard reports [page 117]

10.5 Viewing audit details in Operational Dashboard reports

View audit details, such as an audit rule summary and audit labels and values, in the SAP Data Services
Management Console.

When your administrator installs SAP Data Services, they also install the Management Console. If you do not
have access credentials, contact your system administrator. For more information about the Management
Console, see the Management Console Guide.

1. In the Designer menu bar, click Tools > Data Services Management Console.
2. Log in to Management Console using your access credentials.

The Management Console home page opens.


3. Click Operational Dashboard in the home page.

The Dashboard opens with statistics and data. For more information about the Dashboard, see the
Management Console Guide.
4. Select the JOB_CustGood job in the table at right.

If the job doesn't appear in the table, adjust the Time Period dropdown list to a longer or shorter time
period as applicable.

The Job Execution Details pane opens showing the job execution history and a chart depicting the
execution history.
5. Select the JOB_CustGood in the Job Execution History table.

The Job Details table opens.


6. Select DF_CustGood in the Data Flow Name column.

The data flow should contain Yes in the Contains Audit Data column.
Three graphs appear at right: Buffer Used, Row Processed, and CPU Used. Read about these graphs in the
Management Console Guide.
7. Select View Audit Data located just above the View Audit Data table.
The Audit Details window opens. The following table explains the information in the Audit Details pane.

Audit Details            Information

Audit Rule Summary       Shows the audit rules that you created in the Audit pane for the data flow.

Audit Rule Failed        The audit rule that was violated after data processing.

$Count_ODS_CUSTOMER      The row count = 13

$Count_CUST_GOOD         The row count = 12

The validation rule required that all records comply with the 5_Digit_ZIP_Column_Rule, and one record failed the rule. The audit rules that you created require that the row counts be equal. The audit details show that the values are not equal: 13 > 12.

View audit results [page 119]


After the job executes, open the fail target table to view the failed record.

Task overview: Data Assessment [page 107]

Related Information

Default profile statistics [page 108]


Viewing profile statistics [page 109]
The Validation transform [page 110]
Audit objects [page 114]
Summary and what to do next [page 119]
View audit results [page 119]

10.5.1 View audit results

After the job executes, open the fail target table to view the failed record.

Open the data flow in the workspace and click the magnifying icon in the lower right corner of the
CUST_BAD_FORMAT target table. The CUST_BAD_FORMAT target table contains one record. In addition to the
fields selected for output, the software added and populated three additional fields for error information:

● DI_ERRORACTION = F
● DI_ERRORCOLUMNS = Validation failed rule(s): ZIP
● DI_ROWID = 1.000000

These are the rule violation output fields that are automatically included in the Validation transform. For
complete information about the Validation transform, see the Reference Guide.
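
As an alternative to the View Data pane, you could query the fail table directly in your DBMS with something like the following. This is a sketch: the unqualified table name assumes your session's default schema owns the table; qualify it with the owner used by your target datastore if needed.

-- Inspect the failed record and the rule violation fields that the software added.
SELECT CUST_ID, ZIP, DI_ERRORACTION, DI_ERRORCOLUMNS, DI_ROWID
FROM CUST_BAD_FORMAT;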

10.6 Summary and what to do next

In this exercise you learned a few methods to view data profile information, set audit rules, and assess your
output data.

There are many more ways to use the software for data assessment. Learn more about profiling your data and
data assessment in the Designer Guide. The following lists the methods that we used in this exercise to profile
and audit data details:

● View table data and use the profile tools to view the default profile statistics.
● Use the Validation transform in a data flow to find records in your data that violate a data format
requirement in the ZIP field.
● Add an additional template table for records that fail an audit rule.
● Create an audit expression and an action for when a record fails the expression.
● View audit details in the Management Console Operational Dashboard reports

The next section shows how to design jobs that are recoverable if the job malfunctions, crashes, or does not
complete.

Parent topic: Data Assessment [page 107]

Related Information

Default profile statistics [page 108]


Viewing profile statistics [page 109]
The Validation transform [page 110]
Audit objects [page 114]
Viewing audit details in Operational Dashboard reports [page 117]
Designer Guide: Data Assessment
Recovery Mechanisms [page 121]

11 Recovery Mechanisms

Use Data Services recovery mechanisms to set up automatic recovery or to recover jobs manually that do not
complete successfully.

A recoverable work flow is one that can run repeatedly after failure without loading duplicate data. Examples of
failure include source or target server crashes or target database errors that could cause a job or work flow to
terminate prematurely.

In the following exercise, you learn how to:

● Design and implement recoverable work flows.


● Use Data Services conditionals.
● Specify and use the table auto correction option.

This exercise creates a recoverable job that loads the sales organization dimension table that was loaded in the
section Populate the Sales Organization dimension from a flat file [page 37]. You reuse the data flow
DF_SalesOrg from that exercise to complete this exercise.

Recoverable job [page 122]


Create a job that contains three objects that are configured so that the job is recoverable.

Creating local variables [page 122]


Local variables contain information that you can use in a script to determine when a job must be
recovered.

Creating the script that determines the status [page 123]


Create a script that checks the $end_time variable to determine if the job completed properly.

Conditionals [page 124]


Use conditionals to implement if-then-else logic in a work flow.

Creating the script that updates the status [page 126]


This script updates the status_table table with the current timestamp after the work flow in the
conditional has completed. The timestamp indicates a successful execution.

Verify the job setup [page 128]


Make sure that job configuration for JOB_Recovery is complete by verifying that the objects are ready.

Executing the job [page 129]


Execute the job to see how the software functions with the recovery mechanism.

Data Services automated recovery properties [page 129]


Data Services provides automated recovery methods to use as an alternative to the job setup for
JOB_Recovery.

Summary and what to do next [page 130]


In this section you learned about the job recovery mechanisms that you can use to recover jobs that
only partially ran, and failed for some reason.

11.1 Recoverable job

Create a job that contains three objects that are configured so that the job is recoverable.

The recoverable job that you create in this section contains the following objects:

● A script that determines when recovery is required


● A conditional that triggers the appropriate data flow
● Two data flows: A regular data flow and a recovery data flow
● A script that updates a status table that indicates successful execution

Parent topic: Recovery Mechanisms [page 121]

Related Information

Creating local variables [page 122]


Creating the script that determines the status [page 123]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Summary and what to do next [page 130]

11.2 Creating local variables

Local variables contain information that you can use in a script to determine when a job must be recovered.

In previous exercises you defined global variables. Local variables differ from global variables. Use local
variables in a script or expression that is defined in the job or work flow that calls the script.

1. Open the Class_Exercises project in the Project Area and add a new job named JOB_Recovery.

2. Select the job name and select Tools > Variables.

The Variables and Parameters dialog box opens.


3. Right-click Variables and select Insert.

A new variable appears named $NewVariableX where X indicates the new variable number.
4. Double-click $NewVariableX and enter $recovery_needed for Name.
5. Select int from the Data type dropdown list.
6. Follow the same steps to create another local variable.
7. Name the variable $end_time and select varchar(20) from the Data type dropdown list.

Task overview: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]


Creating the script that determines the status [page 123]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Summary and what to do next [page 130]

11.3 Creating the script that determines the status

Create a script that checks the $end_time variable to determine if the job completed properly.

The script reads the ending time in the status_table table that corresponds to the most recent start time. If
there is no ending time for the most recent starting time, the software determines that the prior data flow must
not have completed properly.

1. With JOB_Recovery opened in the workspace, add a script to the left side of the workspace and name it
GetWFStatus.
2. Open the script in the workspace and type the script directly into the Script Editor. Make sure that the
script complies with syntax rules for your DBMS.

For Microsoft SQL Server or SAP ASE, enter the following script:

 Sample Code

$end_time = sql('Target_DS', 'select convert(char(20), end_time, 0) from status_table where start_time = (select max(start_time) from status_table)');
if ($end_time IS NULL or $end_time = '') $recovery_needed = 1;
else $recovery_needed = 0;

For Oracle, enter the following script:

 Sample Code

$end_time = sql('Target_DS', 'select to_char(end_time, \'YYYY-MM-DD HH24:MI:SS\') from status_table where start_time = (select max(start_time) from status_table)');
if (($end_time IS NULL) or ($end_time = '')) $recovery_needed = 1;
else $recovery_needed = 0;

3. Validate the script.

Task overview: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]


Creating local variables [page 122]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Summary and what to do next [page 130]

11.4 Conditionals

Use conditionals to implement if-then-else logic in a work flow.

Conditionals are single-use objects, which means that they can be used only in the job for which they were created.

Define a conditional for this exercise to specify a recoverable data flow. To define a conditional, you specify a
condition and two logical branches:

Conditional branch    Description

If                    A Boolean expression that evaluates to TRUE or FALSE. Use functions,
                      variables, and standard operators to construct the expression.

Then                  Work flow elements to execute when the If expression evaluates to TRUE.

Else                  (Optional) Work flow elements to execute when the If expression
                      evaluates to FALSE.

Adding the conditional [page 125]


Add a conditional expression to JOB_Recovery to determine the execution path.

Specifying the If-Then work flows [page 126]


Complete the conditional by specifying the data flows to use if the conditional equals true or false.

Parent topic: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]


Creating local variables [page 122]
Creating the script that determines the status [page 123]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Summary and what to do next [page 130]

11.4.1 Adding the conditional

Add a conditional expression to JOB_Recovery to determine the execution path.

1. Open JOB_Recovery in the workspace.

2. Click the conditional icon on the tool palette then click in the workspace to the right of the script
GetWFStatus.
3. Name the conditional recovery_needed.
4. Double-click the conditional in the workspace to open the Conditional Editor.

The Conditional Editor contains three areas:


○ The if expression text box

○ A space for specifying the work flow to execute when the if condition evaluates to TRUE. For example, if the condition is true, the software performs the task in the Then space.
○ A space for specifying the work flow to execute when the if condition evaluates to FALSE. For example, if the condition is not true, the software runs the task in the Else space.
5. Type the following text into the if text box to state the condition.

($recovery_needed = 1)

Complete the conditional by specifying the work flows to execute for the Then and Else branches.

11.4.2 Specifying the If-Then work flows

Complete the conditional by specifying the data flows to use if the conditional equals true or false.

Follow these steps with the recovery_needed conditional open in the workspace:

1. Open the Data Flows tab in the Local Object Library and move DF_SalesOrg to the Else portion of the Conditional Editor using drag and drop.

You use this data flow for the “false” branch of the conditional.
2. Right-click DF_SalesOrg in the Data Flow tab in the Local Object Library and select Replicate.
3. Name the replicated data flow ACDF_SalesOrg.
4. Move ACDF_SalesOrg to the Then area of the conditional using drag and drop.

This data flow is for the “true” branch of the conditional.


5. Double-click the ACDF_SalesOrg data flow in the Data Flow tab to open it in the workspace.
6. Double-click the SALESORG_DIM target table to open it in the workspace.
7. Open the Options tab in the lower pane of the Target Table Editor.
8. Find the Update control category in the Advanced section and set Auto correct load to Yes.

Auto correct loading ensures that the same row is not duplicated in a target table by matching primary key
fields. See the Reference Guide for more information about how auto correct load works.
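
Conceptually, auto correct load behaves like an upsert against the target. The following SQL sketch illustrates the idea only: Data Services generates its own load logic, and the staging source NEW_ROWS and the columns SalesOffice and Region are hypothetical placeholders for the SALESORG_DIM key and attribute columns.

-- Illustration of upsert behavior: update on key match, insert otherwise,
-- so rerunning the load does not create duplicate rows.
MERGE INTO SALESORG_DIM t
USING NEW_ROWS s
    ON (t.SalesOffice = s.SalesOffice)        -- match on the primary key
WHEN MATCHED THEN
    UPDATE SET t.Region = s.Region            -- existing row: update in place
WHEN NOT MATCHED THEN
    INSERT (SalesOffice, Region)
    VALUES (s.SalesOffice, s.Region);         -- new row: insert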

Related Information

Creating the script that updates the status [page 126]

11.5 Creating the script that updates the status

This script updates the status_table table with the current timestamp after the work flow in the conditional has
completed. The timestamp indicates a successful execution.

1. With JOB_Recovery opened in the workspace, add the script icon to the right of the recovery_needed
conditional.

2. Name the script UpdateWFStatus.
3. Double-click UpdateWFStatus to open the Script Editor in the workspace.
4. Enter text using the syntax for your RDBMS.

For Microsoft SQL Server and SAP ASE, enter the following text:

sql('Target_DS', 'update status_table set end_time = getdate() where start_time = (select max(start_time) from status_table)');

For Oracle, enter the following text:

sql('Target_DS', 'update status_table set end_time = SYSDATE where start_time = (select max(start_time) from status_table)');

For DB2, enter the following text:

sql('Target_DS','update status_table set end_time = current timestamp where start_time = (select max(start_time) from status_table)');

5. Validate the script.


6. Close the Script Editor.
7. Open JOB_Recovery in the workspace and connect the objects to indicate execution order.

Connect the GetWFStatus script to the recovery_needed conditional, and then connect the recovery_needed conditional to the UpdateWFStatus script.
8. Save your work.

Task overview: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]


Creating local variables [page 122]
Creating the script that determines the status [page 123]
Conditionals [page 124]
Verify the job setup [page 128]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Summary and what to do next [page 130]

11.6 Verify the job setup

Make sure that job configuration for JOB_Recovery is complete by verifying that the objects are ready.

Objects in JOB_Recovery

Object                          Purpose

GetWFStatus script              Determines if recovery is required.

recovery_needed conditional     Specifies the work flow to execute when the If statement is
                                true or false:
                                ● If true, run the ACDF_SalesOrg data flow.
                                ● Else, run the DF_SalesOrg data flow.

UpdateWFStatus script           Updates the status table with the current timestamp after the
                                work flow in the conditional has completed. The timestamp
                                indicates a successful execution.

Objects in the recovery_needed conditional

Object                          Purpose

DF_SalesOrg data flow           The data flow to execute when the conditional evaluates to false.

ACDF_SalesOrg data flow         The data flow to execute when the conditional evaluates to true.

Parent topic: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]


Creating local variables [page 122]
Creating the script that determines the status [page 123]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Summary and what to do next [page 130]

11.7 Executing the job

Execute the job to see how the software functions with the recovery mechanism.

Edit the status table status_table in your DBMS and make sure that the end_time column is NULL or blank.
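
For example, you could clear the most recent end_time with a statement like the following in your DBMS. Treat this as a sketch: the unqualified table name assumes your session's default schema, and how you restrict the WHERE clause depends on how status_table was populated in your environment.

-- Blank out the latest end_time so that JOB_Recovery detects that recovery is needed.
UPDATE status_table
SET end_time = NULL
WHERE start_time = (SELECT MAX(start_time) FROM status_table);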

1. Execute JOB_Recovery.
2. View the Trace messages and the Monitor data to see that the conditional chose ACDF_SalesOrg to
process.

ACDF_SalesOrg is the data flow that runs when the condition evaluates to true. The condition is true because there is no date in the end_time column of the status table, so the software concludes that the previous job did not complete and needs recovery.
3. Now execute the JOB_Recovery again.
4. View the Trace messages and the Monitor data to see that the conditional chose DF_SalesOrg to process.

DF_SalesOrg is the data flow that runs when the condition evaluates to false. The condition is false for this run because the end_time column in status_table contains the date and time of the last execution of the job. The software concludes that the previous job completed successfully and does not require recovery.

Task overview: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]


Creating local variables [page 122]
Creating the script that determines the status [page 123]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Data Services automated recovery properties [page 129]
Summary and what to do next [page 130]

11.8 Data Services automated recovery properties

Data Services provides automated recovery methods to use as an alternative to the job setup for
JOB_Recovery.

With automatic recovery, Data Services records the result of each successfully completed step in a job. If a job
fails, you can choose to run the job again in recovery mode. During recovery mode, the software retrieves the
results for successfully completed steps and reruns incomplete or failed steps under the same conditions as
the original job.

Data Services has the following automatic recovery settings that you can use to recover jobs:

● Select Enable recovery and Recover from last failed execution in the job Execution Properties dialog.
● Select Recover as a unit in the work flow Properties dialog.

For more information about how to use the automated recovery properties in Data Services, see the Designer
Guide.

Parent topic: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]


Creating local variables [page 122]
Creating the script that determines the status [page 123]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Executing the job [page 129]
Summary and what to do next [page 130]

11.9 Summary and what to do next

In this section you learned about the job recovery mechanisms that you can use to recover jobs that only
partially ran, and failed for some reason.

In this section, you learned the following methods for recovering jobs:

● Design and implement recoverable work flows.


● Use Data Services Conditional Editor to create if-then-else statements using two data flows.
● Use the Auto correct load option so that rerunning a job during recovery does not load duplicate data.

As with most features covered in this tutorial, you can learn more in the product documentation, such as the Designer Guide and the Reference Guide.

The next three sections are optional. They provide information about advanced features available in Data
Services.

Parent topic: Recovery Mechanisms [page 121]

Related Information

Recoverable job [page 122]

Creating local variables [page 122]
Creating the script that determines the status [page 123]
Conditionals [page 124]
Creating the script that updates the status [page 126]
Verify the job setup [page 128]
Executing the job [page 129]
Data Services automated recovery properties [page 129]
Multiuser Development [page 132]
Extracting SAP application data [page 161]
Real-time jobs [page 190]

12 Multiuser Development

Data Services enables teams of developers working on separate local repositories to store and share their work
in a central repository.

Each individual developer or team works on the application in their unique local repository. Each team uses a
central repository to store the master copy of its application. The central repository preserves all versions of all
objects in the application so you can revert to a previous version if necessary.

You can implement optional security features for central repositories. For more information about
implementing Central Repository security, see the Designer Guide.

We base the exercises for multiuser development on the following use case: Two developers use a Data
Services job to collect data for the HR department. Each developer has their own local repository and they
share a central repository. Throughout the exercises, the developers modify the objects in the job and use the
central repository to store and manage the modified versions of the objects. You can perform these exercises
by acting as both developers, or work with another person with each of you assuming one of the developer
roles.

12.1 Central Object Library

The central object library acts as source control for managing changes to objects in an environment with
multiple users.

Display the central object library in Designer after you create it. The central object library is a dockable and
movable pane just like the project area and object library.

Through the central object library, authorized users access a library repository that contains versions of
objects saved there from their local repositories. The central object library enables administrators to manage
who can add, view and modify the objects stored in the central repository.

Users must belong to a user group that has permission to perform tasks in the central object library.
Administrators can assign permissions to an entire group of users as well as assign various levels of
permissions to the users in a group.

 Example

All users in Group A can get objects from the central object library. Getting an object means you place a
copy of the object in your local repository. If the object exists in your local repository, Data Services updates
your local copy with the most recent changes. User01 in Group A has administrator rights and can add, check in, edit, and check out objects. In addition, User01 can set permissions on the objects for other users.

In Designer, users check out an object from the central repository using the central object library. Once
checked out, no other user can work on that object until it is checked back into the central repository. Other
users can change their local copy of the object, but that does not affect the version in the central repository.

Related Information

Multi user Development


Central versus local repository
Working in a Multi-user Environment

12.1.1 Central Object Library layout


The information in the central object library is different from the information in the local object library.

In the central object library, there are several icons located at the top of the pane to perform the following
tasks:

● Select various object checkout types


● Show history
● Edit central repository connection
● Refresh the content of the central object library

The top of the pane also displays the current user group to which you belong and the name of the central
repository.

The main area of the central object library contains:

● A list of objects of the type that corresponds to the lower tab that you choose.
● A red check mark over the object name in the left column for each object that you have checked out.

The main area also contains several columns with information for each object:

● Check out user: The name of the user who currently has the object checked out of the library, or blank
when the object is not checked out.
● Check out repository: The name of the repository that contains the checked-out object, or blank when the
object is not checked out.
● Permission: The authorization type for the group that appears in the Group Permission box at the top.
When you add a new object to the central object library, the current group gets FULL permission to the
object and all other groups get READ permission.
● Latest version: A version number and a timestamp that indicate when the software saved this version of the
object.
● Description: Information about the object that you entered when you added the object to the library.

12.2 How multiuser development works


Data Services uses a central repository as a storage location and a version control tool for all objects uploaded
from local repositories.

The central repository retains a history for all objects stored there. Developers use their local repositories to
create, modify, or execute objects such as jobs.

A central repository enables you to perform the following tasks:

● Get objects
● Add objects
● Check out objects
● Check in objects

Task                 Description

Get objects          Copy objects from the central repository to your local repository. If the
                     object already exists in your local repository, the file from the central
                     repository overwrites the object in your local repository.

Check out objects    The software locks the object when you check it out from the central
                     repository. No one else can work on the object while you have it checked
                     out. Other users can copy a locked object and put it into their local
                     repository, but it is only a copy; any changes that they make cannot be
                     uploaded to the central repository.

Check in objects     When you check the object back into the central repository, Data Services
                     creates a new version of the object and saves the previous version. Other
                     users can check out the object after you check it in. Other users can also
                     view the object history to see the changes that you made to the object.

Add objects          Add objects from your local repository to the central repository at any
                     time, as long as the object does not already exist in the central
                     repository.

The central repository works like file collaboration and version control software. It retains a history for each object, and the object history lists all versions of the object. Revert to a previous version of the object if you want to undo your changes, but before you do, make sure that you are not mistakenly undoing changes from other users.

12.3 Preparation

Your system administrator sets up the multiuser environment to include two local repositories and a central repository.

Create three repositories using the user names and passwords listed in the following table.

User Name Password

central central

user1 user1

user2 user2

Create the repositories based on the rules for your DBMS.

 Example

With Oracle, use the same database for the additional repositories. However, first add the users listed in the table to the existing database and assign the appropriate access rights to each user. When you create the additional repositories, Data Services qualifies the names of the repository tables with these user names.
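
A minimal sketch of the Oracle user setup follows, assuming you are connected as a DBA. The CONNECT and RESOURCE grants are an assumption; the exact privileges and tablespace quotas that the repositories need depend on your database and Data Services version, so check the installation documentation.

-- Hypothetical example: create the tutorial repository users in the existing database.
CREATE USER central IDENTIFIED BY central;
CREATE USER user1 IDENTIFIED BY user1;
CREATE USER user2 IDENTIFIED BY user2;

-- Grant basic rights; adjust to the privileges your installation requires.
GRANT CONNECT, RESOURCE TO central;
GRANT CONNECT, RESOURCE TO user1;
GRANT CONNECT, RESOURCE TO user2;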

 Example

For Microsoft SQL Server, create a new database for each of the repositories listed in the table. When you
create the user names and passwords, ensure that you specify appropriate server and database roles to
each database.

Consult the Designer Guide and the Management Console Guide for additional details about multiuser
environments.

1. Configuring the central repository [page 135]


Configure a central repository using the Data Services Repository Manager.
2. Configuring two local repositories [page 136]
Configure the two local repositories using the Data Services Repository Manager.
3. Associating repositories to your job server [page 137]
You assign a Job Server to each repository to enable job execution in Data Services.
4. Defining connections to the central repository [page 137]
Assign the central repository named central to user1 and user2 repositories.

12.3.1 Configuring the central repository

Configure a central repository using the Data Services Repository Manager.

Follow these steps to configure a central repository. If you created a central repository during installation, use
that central repository for the exercises.

1. If you have Designer open, close it before proceeding.

2. From your Windows Start menu, click Programs > SAP Data Services 4.2 > Data Services Repository Manager.

The Repository Manager opens.


3. Select Central from the Repository type dropdown list.

4. Select the database type from the Database type dropdown list.
5. Enter the remaining connection information for the central repository based on the database type you
chose.
6. Enter central for both User name and Password.
7. Click Create.

Data Services creates repository tables in the database that you identified.
8. Click Close.

Task overview: Preparation [page 134]

Next task: Configuring two local repositories [page 136]

12.3.2 Configuring two local repositories

Configure the two local repositories using the Data Services Repository Manager.

Repeat these steps to configure the user1 repository and the user2 repository.

1. If you have Designer open, close it before proceeding.

2. From the Start menu, click Programs > SAP Data Services 4.2 > Data Services Repository Manager.
3. Enter the database connection information for the local repository.
4. Type the following user name and password based on which repository you are creating:

Repository User name Password

1 user1 user1

2 user2 user2

5. For Repository type, click Local.


6. Click Create.
7. Click Close.

Task overview: Preparation [page 134]

Previous task: Configuring the central repository [page 135]

Next task: Associating repositories to your job server [page 137]

12.3.3 Associating repositories to your job server

You assign a Job Server to each repository to enable job execution in Data Services.

1. From the Start menu, click Programs > SAP Data Services 4.2 > Data Services Server Manager.
2. Click Configuration Editor in the Job Server tab.

The Job Server Configuration Editor dialog box opens.


3. Select the Job Server name and click Edit.

The Job Server Properties dialog box opens. A list of current associated repositories appears in the
Associated Repositories list, if applicable.
4. Click Add under the Associated Repositories list.

The Repository Information options become active on the right side of the dialog box.
5. Select the appropriate database type for your local repository from the Database type dropdown list.
6. Complete the appropriate connection information for your database type as applicable.
7. Type user1 in both the User name and Password fields.
8. Click Apply.

<databasename>_user1 appears in the Associated Repositories list.


9. After the software completes processing, a message appears stating that the local repository was
successfully created.
10. Repeat steps 4 through 8 to associate user2 to your job server.
11. Click OK to close the Job Server Properties dialog box.
12. Click OK to close the Job Server Configuration Editor.
13. Click Close and Restart on the Server Manager dialog box.
14. Click OK to confirm that you want to restart the Data Services Service.

The software resyncs the job server with the repositories that you just set up.

Task overview: Preparation [page 134]

Previous task: Configuring two local repositories [page 136]

Next task: Defining connections to the central repository [page 137]

12.3.4 Defining connections to the central repository

Assign the central repository named central to user1 and user2 repositories.

1. Start the Designer, enter your log in credentials, and click Log on.
2. Select the repository user1 and click OK.
3. Enter the password for user1.

If you created the user1 repository as instructed, the password is user1.

4. Select Tools > Central Repositories.

The Options dialog box opens.


5. Select Central Repository Connections at left.

The Central Repository Connections option is selected by default.


6. Click Add.
7. Enter your log in credentials and click Log on.

A list of the repositories appears if applicable.


8. Select central from the list of available repositories and click OK.
9. Click OK to close the Options dialog box.

If a prompt appears asking to overwrite the Job Server option parameters, select Yes.
10. Exit Designer.
11. Perform the same steps to connect user2 to the central repository.

Task overview: Preparation [page 134]

Previous task: Associating repositories to your job server [page 137]

Related Information

Activating a connection to the central repository [page 139]

12.4 Working in a multiuser environment

As you perform the tasks in this section, Data Services adds all objects to your local repositories.

For this exercise, you will learn the following:

● Activating a connection to the central repository


● Importing objects to your local repository
● Adding objects to the central repository
● Checking out and checking in objects in the central repository
● Comparing objects
● Filtering objects

Activating a connection to the central repository [page 139]


Activate the central repository for the user1 and user2 local repositories so that the local repository has
central repository connection information.

Importing objects into your local repository [page 140]


Import objects from the multiusertutorial.atl file so the objects are ready to use for the
exercises.

Adding objects to the central repository [page 142]
After you import objects to the user1 local repository, add the objects to the central repository for
storage.

Check out objects from the central repository [page 144]


When you check out an object from the central repository, it becomes unavailable for other users to change.

Checking in objects to the central repository [page 147]


You can check in an object by itself or check it in along with all associated dependent objects. When an
object and its dependents are checked out and you check in the single object without its dependents,
the dependent objects remain checked out.

Setting up the user2 environment [page 148]


Set up the environment for user2 so that you can perform the remaining tasks in Multiuser
development.

Undo checkout [page 149]


Undo a checkout to restore the object in the central repository to the condition it was in when you checked it out.

Comparing objects [page 150]


Compare an object from the local repository with the same object from the central repository to view the differences between them.

Check out object without replacement [page 152]


Check out an object from the central repository so that SAP Data Services does not overwrite your
local copy.

Get objects [page 155]


When you get an object from the central repository, you are making a copy of a specific version for your
local repository.

Filter dependent objects [page 157]


Use filtering to select the dependent objects to include, exclude, or replace when you add, check out, or
check in objects in a central repository.

Deleting objects [page 158]

Related Information

Defining connections to the central repository [page 137]

12.4.1 Activating a connection to the central repository

Activate the central repository for the user1 and user2 local repositories so that the local repository has central
repository connection information.

Log in to Designer and select the user1 repository.

1. From the Tools menu, click Central Repositories.

The Central Repository Connections option is selected by default in the Designer list.
2. In the Central repository connections list, select Central and click Activate.

Data Services activates a link between the user1 repository and the central repository.
3. Select the option Activate automatically.

This option enables you to move back and forth between user1 and user2 local repositories without
reactivating the connection to the central repository each time.
4. Open the Central Object Library by clicking the Central Object Library icon on the Designer toolbar.

For the rest of the exercises in this section, we assume that you have the Central Object Library available in the
Designer.

Task overview: Working in a multiuser environment [page 138]

Related Information

Importing objects into your local repository [page 140]


Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]
Central Object Library [page 132]

12.4.2 Importing objects into your local repository

Import objects from the multiusertutorial.atl file so the objects are ready to use for the exercises.

Before you can import objects into the local repository, complete the tasks in the section Preparation [page
134].

1. Log in to Data Services and select the user1 repository.

2. In the Local Object Library, right-click in a blank space and click Repository > Import From File.
3. Select multiusertutorial.atl located in <LINK_DIR>\Tutorial Files and click Open.

A prompt opens explaining that the chosen ATL file is from an earlier release of Data Services. The older ATL version does not affect the tutorial exercises, so click Yes.

Another prompt appears asking if you want to overwrite existing data. Click Yes.

The Import Plan window opens.


4. Click Import.
5. Enter dstutorial for the passphrase and click Import.

The multiusertutorial.atl file contains a batch job with previously created work flows and data flows.
6. Open the Project tab in the Local Object Library and double-click MU to open the project in the Project
Area.
The MU project contains the following objects:
○ JOB_Employee
○ WF_EmpPos
○ DF_EmpDept
○ DF_EmpLoc
○ WF_PosHireDate
○ DF_PosHireDate

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.3 Adding objects to the central repository

After you import objects to the user1 local repository, add the objects to the central repository for storage.

When you add objects to the central repository, you can add a single object or the object and its dependents. All projects and objects in the object library can be stored in a central repository.

Adding a single object to the central repository [page 142]


After importing objects into the user1 local repository, you can add them to the central repository for
storage.

Adding an object and dependents to the central repository [page 143]


Select to add an object and object dependents from the local repository to the central repository.

Adding dependent objects that already exist [page 144]


Add an object and its dependents when some of the dependent objects were already added to the central repository through a different object.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.3.1 Adding a single object to the central repository

After importing objects into the user1 local repository, you can add them to the central repository for storage.

Follow these steps to add a single object from the user1 repository to the central repository:

1. Click the Formats tab in the user1 Local Object Library.

 Note

Verify that you are using the correct library by reading the header information.

2. Expand the Flat Files node to display the file names.

3. Right-click NameDate_Format and select Add to Central Repository > Object.


4. Optional. Add any comments about the object.
5. Click Continue.

A status dialog box opens to indicate that Data Services added the object successfully.

 Note

If the object already exists in the central repository, the Add to Central Repository option is not active.

6. Open the Central Object Library and open the Formats tab.

Expand Flat Files to see that the NameDate_Format file is now in the central repository.

12.4.3.2 Adding an object and dependents to the central repository

Select to add an object and object dependents from the local repository to the central repository.

Log in to Data Services Designer, select the user1 repository, and enter user1 for the repository password.

1. Open the Work Flows tab in the Local Object Library.


2. Double-click WF_EmpPos.
Dependent objects include the following:

○ DF_EmpDept
○ DF_EmpLoc

3. Right-click WF_EmpPos in the Local Object Library and select Add to Central Repository > Object and dependents.

Instead of choosing the right-click options, you can move objects from your local repository to the central repository using drag and drop. The Version Control Confirmation dialog box opens. Click Next, and then click Next again so that all dependent objects are included in the addition.

The Add - Object and dependents dialog box opens.


4. Type a comment as in the following example.

Adding WF_EmpPos, DF_EmpDept, and DF_EmpLoc to the central repository.

5. Click Apply Comments to all Objects.

The comment appears for the object and all dependents when you view the history in the central
repository.
6. Click Continue.

The Output dialog box opens with a message that states “Add object completed”. Close the dialog box.
7. Verify that the Central Object Library contains the WF_EmpPos, DF_EmpDept, and DF_EmpLoc objects in
their respective tabs.

When you include the dependents of WF_EmpPos, you add other dependent objects, including dependents of the two data flows DF_EmpDept and DF_EmpLoc.

● Open the Datastores tab in the Central Object Library to see the NAMEDEPT and POSLOC tables.
● Open the Formats tab in the Central Object Library to see the flat file objects PosDept_Format, NamePos_Format, and NameLoc_Format.

12.4.3.3 Adding dependent objects that already exist

Add an object and its dependents when some of the dependent objects were already added to the central repository through a different object.

This topic continues from Adding an object and dependents to the central repository [page 143]. We assume
that you are still logged in to the user1 repository in Designer.

1. Open the Work Flow tab in the Local Object Library.

2. Right-click WF_PosHireDate and select Add to Central Repository > Object and dependents.

The Add to Central Repository Alert dialog box appears listing the objects that already exist in the central
repository:
○ DW_DS
○ NameDate_Format
○ NamePos_Format
○ POSHDATE(DW_DS.USER1)
3. Click Yes to continue.

It is okay to continue with the process because you haven't changed the existing objects yet.
4. Enter a comment and select Apply comments to all objects.

Adding WF_PosHireDate and DF_PosHireDate to central repository.

5. Click Continue.
6. Close the Output dialog box.

The central repository now contains all objects in the user1 local repository. Developers who have access to the
central repository can check out, check in, label, and get those objects.

12.4.4 Check out objects from the central repository

When you check out an object from the central repository, it becomes unavailable for other users to change.

You can check out a single object or check out an object with dependents.

● If you check out a single object such as WF_EmpPos, it is not available for any other user to change. However, the dependent object DF_EmpDept remains in the central repository and can be checked out by other users.

● If you check out WF_EmpPos and the dependent DF_EmpDept, no one else can check out those objects. Change the objects and save your changes locally, and then check the objects with your changes back into the central repository. The repository creates a new version of the objects that includes your changes.

After you make your changes and check the changed objects back into the central repository, other users can
view your changes, and check out the objects to make additional changes.

Checking out an object and its dependent objects [page 145]


Check out an object and dependent objects from the central repository using menu options or icon
tools.

Modifying dependent objects [page 146]


Modify the data flows that are dependents of WF_EmpPos.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.4.1 Checking out an object and its dependent objects


Check out an object and dependent objects from the central repository using menu options or icon tools.

Perform the following steps while you are logged in to the user1 repository.

1. Open the Central Object Library and open the Work Flow tab.

2. Right-click WF_EmpPos and select Check Out > Object and dependents.

A warning appears telling you that checking out WF_EmpPos does not include the datastores. To include the
datastores in the checkout, use the Check Out with Filtering check out option.

 Note

The software does not include the datastore DW_DS in the checkout as the message states. However,
the tables NAMEDEPT and POSLOC, which are listed under the Tables node of DW_DS, are included in the
dependent objects that are checked out.

3. Click Yes to continue.

Alternatively, you can select the object in the Central Object Library and click the Check out object and dependents icon on the Central Object Library toolbar.


4. Close the Object dialog box.

Data Services copies the most recent version of WF_EmpPos and its dependent objects from the central
repository into the user1 local repository. A red check mark appears on the icon for objects that are checked
out in both the local and central repositories.

User1 can modify the WF_EmpPos work flow and the checked-out dependents in the local repository while they are checked out of the central repository.

Related Information

Filter dependent objects [page 157]

12.4.4.2 Modifying dependent objects

Modify the data flows that are dependents of WF_EmpPos.

1. In the local object library, click the Data Flow tab.


2. Open the DF_EmpDept data flow.
3. In the DF_EmpDept workspace, double-click the query to open the Query Editor.
4. Change the mapping in the Schema Out pane: Right-click FName and click Cut.
5. Click the Back arrow in the icon bar to return to the data flow.
6. Open the DF_EmpLoc data flow.
7. In the DF_EmpLoc workspace, double-click the query to open the Query Editor.
8. Cut the following rows from the Schema Out pane:
a. Right-click FName and click Cut.
b. Right-click LName and click Cut.
9. Save your work by clicking the Save All icon.

Related Information

Checking in a single object [page 147]

12.4.5 Checking in objects to the central repository

You can check in an object by itself or check it in along with all associated dependent objects. When an object
and its dependents are checked out and you check in the single object without its dependents, the dependent
objects remain checked out.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.5.1 Checking in a single object

After you change an existing object, check it into the central repository so that other users can access it.

1. In SAP Data Services Designer, open the Central Object Library.


2. Open the Data Flow tab and right-click DF_EmpLoc.

3. Select Check In > Object.

The Comment dialog box opens.


4. Type the following text in the Comments field:

Removed FName and LName columns from POSLOC target table

5. Click Continue. Close the Output dialog box.

Data Services copies the object from the user1 local repository to the central repository and removes the
check-out marks.
6. In the Central Object Library window, right-click DF_EmpLoc and click Show History.

The History dialog box contains the user name, date, action, and version number for each time the file was
checked out and checked back in. The dialog box also lists the comments that the user included when they
checked the object into the central repository. This information is helpful for many reasons, including:

○ Providing information to the next developer who checks out the object.
○ Helping you decide what version to choose when you want to roll back to an older version.
○ Viewing the difference between versions.

For more information about viewing history, see the Designer Guide.
7. After you have reviewed the history, click Close.

The next portion of this exercise involves a second developer, user2.

Related Information

Setting up the user2 environment [page 148]

12.4.6 Setting up the user2 environment

Set up the environment for user2 so that you can perform the remaining tasks in Multiuser development.

Log in to SAP Data Services Designer and choose the user2 repository. Enter user2 for the password.

Set up the user2 developer environment in the same way that you set up the environment for user1. The
following is a summary of the steps:

1. Import the multiusertutorial.atl file.


2. Activate the connection to the central repository.
3. Open the Central Object Library and dock it along with the Local Object Library and the Project Area.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.7 Undo checkout

Undo a checkout to restore the object in the central repository to the condition it was in when you checked it out.

In this exercise, you check out DF_PosHireDate from the central repository, modify it, and save your changes
to your local repository. Then you undo the checkout of DF_PosHireDate from the central repository.

When you undo a checkout, you restore the object in the central repository to the way it was when you checked
it out. SAP Data Services does not save changes or create a new version in the central repository. Your local
repository, however, retains the changes that you made. To undo changes in your local repository, “get” the
object from the central repository after you undo the checkout. The software overwrites your local copy and
replaces it with the restored copy of the object in the central repository.

Undo checkout works for a single object as well as for objects with dependents.

Checking out and modifying an object [page 150]


Check out the DF_PosHireDate data flow and modify the output mapping in the query.

Undoing an object checkout [page 150]


Undo an object checkout when you do not want to save your changes and you want to revert the object to the content it had when you checked it out.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.7.1 Checking out and modifying an object

Check out the DF_PosHireDate data flow and modify the output mapping in the query.

Log on to Designer and the user2 repository.

1. Open the Central Object Library.


2. Open the Data Flow tab, expand Data Flows, and right-click DF_PosHireDate.

3. Select Check Out > Object.

The DF_PosHireDate object appears with a red checkmark in both the Local Object Library and the
Central Object Library indicating that it is checked out.
4. In the local object library, double-click DF_PosHireDate to open it in the workspace.
5. Double-click the query in the data flow to open the Query Editor.
6. In the Schema Out pane, right-click LName and click Cut.

You have changed the mapping in the data flow.


7. Save your work.

12.4.7.2 Undoing an object checkout

Undo an object checkout when you do not want to save your changes and you want to revert the object to the content it had when you checked it out.

Log on to Designer and the user2 repository.

1. Open the Data Flow tab in the Central Object Library and expand Data Flows.

2. Right-click DF_PosHireDate and click Undo Check Out > Object.

Data Services removes the check-out symbol from DF_PosHireDate in the Local and Central Object Library,
without saving your changes in the central repository. The object in your local repository still has the output
mapping change.

Related Information

Comparing objects [page 150]

12.4.8 Comparing objects

Compare an object from the local repository with the same object from the central repository to view the differences between them.

Make sure that you have followed all of the steps in the Undo checkout section.

Log on to Designer and the user2 repository.

1. Open the Data Flow tab in the Local Object Library and expand Data Flows.

2. Right-click DF_PosHireDate and click Compare > Object with dependents to Central.

The Difference Viewer opens in the workspace. It shows the local repository contents for DF_PosHireDate
on the left and the central repository contents for DF_PosHireDate on the right.
3. Examine the data in the Difference Viewer.

The Difference Viewer helps you find the differences between the local object and the object in the central
repository.

Expand the Query node and then expand the Query table icon. The Difference Viewer indicates that the LName column was removed in the local repository on the left, but it still appears in the central repository on the right. The text is in green, and the green icon appears, signifying an insertion.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.8.1 Difference Viewer data

The Difference Viewer shows the difference between an object in the local repository and the central repository.

The Difference Viewer shows the differences between the DF_PosHireDate objects in the left and right panes. Notice the following areas of the dialog box:

● Each line represents an object or item in the object.


● The red bars on the right indicate where data is different. Click a red bar on the right and the viewer
highlights the line that contains the difference.
● The changed lines contain a colored status icon on the object icon that shows the status: Deleted,
changed, inserted, or consolidated. There is a key at the bottom of the Difference Viewer that lists the
status that corresponds to each colored status icon.

The Difference Viewer contains a status line at the bottom of the dialog box. The status line indicates the number of differences. If there are no differences, the status line indicates Difference [ ] of 0. To the left of the status line is a key to the colored status icons.

Colored status icons and descriptions

Deleted: The item does not appear in the object in the right pane.

Changed: The differences between the items are highlighted in blue (the default) text.

Inserted: The item has been added to the object in the right pane.

Consolidated: The items within the line have differences. Expand the item by clicking its plus sign to view the differences.

12.4.9 Check out object without replacement


Check out an object from the central repository so that SAP Data Services does not overwrite your local copy.

 Example

For example, you may need to use the checkout without replacement option when you change an object in
your local repository before you check it out from the central repository.

The option prevents Data Services from overwriting the changes that you made in your local copy.

After you check out the object from the central repository, the object in both the central and local repository has a red check-out icon, but the local copy is not replaced with the version in the central repository. You can then check your local version into the central repository so that it is updated with your changes.

Do not use the check out without replacement option if another user checked out the file from the central repository, made changes, and then checked in the changes.

 Example

For example, you make changes to your local copy of Object-A without realizing you are working in your
local copy.

Meanwhile, another developer checks out Object-A from the central repository, makes extensive changes
and checks it back in to the central repository.

You finally remember to check out Object-A from the central repository. Instead of checking the object
history, you assume that you were the last developer to work in the master of Object-A, so you check
Object-A out of the central repository using the without replacement option. When you check your local
version of Object-A into the central repository, all changes that the other developer made are overwritten.

 Caution

Before you use the Object without replacement option in a multiuser environment, check the history of the
object in the central repository. Make sure that you are the last person who worked on the object.

In the next exercise, user2 uses the check out option Object without replacement to be able to update the
master version in the central repository with changes from the version in the local repository.

Checking out an object without replacement [page 154]


Use the checkout option without replacement to check out an object from the central repository
without overwriting the local copy that has changed.

Checking in the DF_EmpLoc data flow [page 154]


Check in the local version of DF_EmpLoc to update the central repository version to include your
changes.

Checking in DF_EmpDept and WF_EmpPos [page 155]


Check in files from the user1 repository.

Parent topic: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]

Undo checkout [page 149]
Comparing objects [page 150]
Get objects [page 155]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.9.1 Checking out an object without replacement

Use the checkout option without replacement to check out an object from the central repository without
overwriting the local copy that has changed.

Log in to Data Services Designer and the user2 repository.

1. Open the Data Flow tab in the Local Object Library and expand Data Flows.
2. Double-click DF_EmpLoc to open it in the workspace.
3. Double-click the query in the workspace to open the Query Editor.
4. Right-click FName in the Schema Out pane and click Cut.
5. Save your work.
6. Open the Data Flow tab of the Central Object Library and expand Data Flows.

7. Right-click DF_EmpLoc and click Check Out > Object without replacement.


8. Close the Output window.

The software marks the DF_EmpLoc object in the Central Object Library and the Local Object Library as
checked out. The software does not overwrite the object in the Local Object Library, but preserves the
object as is.

12.4.9.2 Checking in the DF_EmpLoc data flow

Check in the local version of DF_EmpLoc to update the central repository version to include your changes.

These steps continue from the topic Checking out an object without replacement [page 154].

1. In the Central Object Library, right-click DF_EmpLoc and select Check In > Object.
2. Type a comment as follows in the Comment dialog box and click Continue.

Removed FName from POSLOC

Now the central repository contains a third version of DF_EmpLoc. This version is the same as the copy of
DF_EmpLoc in the user2 local object library.

3. Right-click DF_EmpLoc in your Local Object Library and select Compare > Object to Central.

The Difference Viewer should show the two objects as the same.

12.4.9.3 Checking in DF_EmpDept and WF_EmpPos

Check in files from the user1 repository.

1. Log on to Designer and the user1 repository.


2. Open the Central Object Library, open the Data Flow tab and expand Data Flows.

3. Right-click DF_EmpDept and select Check In > Object and dependents.


4. Enter a comment like the following example and click Continue.

Removed FName from NAMEDEPT

5. Click Yes in the Check In Warning window.


6. Confirm that the central repository contains the following versions by right-clicking each object and
clicking Show History.

○ Three versions of DF_EmpLoc


○ Two versions of DF_EmpDept
○ One version of DF_PosHireDate
7. Save your work and log out of Designer.

Related Information

Get objects [page 155]

12.4.10 Get objects

When you get an object from the central repository, you are making a copy of a specific version for your local
repository.

You might want to copy a specific version of an object from the central repository into your local repository.
Getting objects allows you to select a version other than the most recent version to copy. When you get an
object, you replace the version in your local repository with the version that you copied from the central
repository. The object is not checked out of the central repository, and it is still available for others to lock and
check out.

Getting the latest version of an object [page 156]


Obtain a copy of the latest version of an object from the central repository.

Getting a previous version of an object [page 156]


Obtain a copy of a select previous version of an object from the central repository.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Filter dependent objects [page 157]
Deleting objects [page 158]

12.4.10.1 Getting the latest version of an object

Obtain a copy of the latest version of an object from the central repository.

Perform the following steps in Designer. You can use either the user1 or user2 repository.

1. Open the Data Flow tab of the Local Object Library.


2. Open DF_EmpLoc in the workspace.
3. Open the query to open the Query Editor.
4. Notice that Pos and Loc are the only two columns in the Schema Out pane.
5. Click the Back icon in the icon menu bar to close the Query Editor.
6. Open the Data Flow tab in the Central Object Library.

7. Right-click DF_EmpLoc and select Get Latest Version > Object from the dropdown menu.

Data Services copies the most recent version of the data flow from the central repository to the local
repository.
8. Open the DF_EmpLoc data flow in the Local Object Library.
9. Open the query to open the Query Editor.
10. Notice that there are now three columns in the Schema Out pane: LName, Pos, and Loc.

The latest version of DF_EmpLoc from the central repository overwrites the previous copy in the local
repository.
11. Click the Back arrow in the icon menu bar to return to the data flow.

12.4.10.2 Getting a previous version of an object

Obtain a copy of a select previous version of an object from the central repository.

Perform the following steps in Designer. You can use either the user1 or user2 repository.

When you get a previous version of an object, you get the object but not its dependents.

1. Open the Data Flows tab of the Central Object Library.


2. Right-click DF_EmpLoc and select Show History.
3. In the History window, click Version 1 of the DF_EmpLoc data flow.
4. Click Get Object By Version.
5. Click Close to close the History dialog box.
6. Open the Data Flows tab of the Local Object Library and open DF_EmpLoc.
7. Open the query.
8. Notice that all of the original columns are listed in the Schema Out pane.

Version 1 of DF_EmpLoc is the version that you first added to the central repository at the beginning of this
section. The software overwrote the altered version in your local repository with Version 1 from the central
repository.

12.4.11 Filter dependent objects

Use filtering to select the dependent objects to include, exclude, or replace when you add, check out, or check
in objects in a central repository.

When multiple users work on an application, some objects can contain repository-specific information. For
example, datastores and database tables might refer to a particular database connection unique to a user or a
phase of development. After you check out an object with filtering, you can change or replace the following
configurations:

● Change datastore and database connection information to your local repository


● Change the root directory for files associated with a particular file format to a directory on your local
repository
● Replace or exclude specific dependent objects when you check in the object to the central repository

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]

Deleting objects [page 158]
Checking out objects with filtering [page 158]

12.4.11.1 Checking out objects with filtering

1. Log on to Designer and the user2 repository.


2. Open the Workflow tab of the Central Object Library.

3. Right-click WF_EmpPos and click Check Out > With filtering.

The Version Control Confirmation dialog box opens with a list of dependent object types. Expand each node
to see a list of dependent objects of that object type.
4. Select NamePos_Format under Flat Files.
5. Select Exclude from the Target status dropdown list.

The word “excluded” appears next to NamePos_Format in the Action column. Data Services excludes the
flat file NamePos_Format from the dependent objects to be checked out.
6. Click Next.

The Datastore Options dialog box opens listing the datastores that are used by NamePos_Format.
7. Click Finish.

You may see a Check Out Alert dialog box stating that there are some dependent objects checked out by other
users. For example, if user1 checked in the WF_EmpPos workflow to the central repository without selecting to
include the dependent objects, the dependent objects could still be checked out. The Check Out Alert lists the
reasons why each listed object cannot be checked out. For example, “The object is checked out by the
repository: user1”. This reason provides you with the information to decide what to do next:

● Select Yes to get copies of the latest versions of the selected objects into your repository.
● Select No to check out the objects that are not already checked out by another user.
● Select Cancel to cancel the checkout.

12.4.12 Deleting objects

You can delete objects from the local or the central repository.

Deleting an object from the central repository [page 159]


When you delete objects from the central repository, dependent objects and objects in your local repositories are not always deleted.

Deleting an object from a local repository [page 159]


When you delete an object from a local repository, it is not deleted from the central repository.

Task overview: Working in a multiuser environment [page 138]

Related Information

Activating a connection to the central repository [page 139]


Importing objects into your local repository [page 140]
Adding objects to the central repository [page 142]
Check out objects from the central repository [page 144]
Checking in objects to the central repository [page 147]
Setting up the user2 environment [page 148]
Undo checkout [page 149]
Comparing objects [page 150]
Check out object without replacement [page 152]
Get objects [page 155]
Filter dependent objects [page 157]

12.4.12.1 Deleting an object from the central repository

When you delete objects from the central repository, dependent objects and objects in your local repositories are not always deleted.

1. Log on to Designer and the user2 repository.


2. Open the Work Flows tab in the Central Object Library.
3. Right-click WF_PosHireDate and click Delete.
4. Click OK to confirm the deletion.
5. Open the Data Flows tab in the Central Object Library.
6. Verify that DF_PosHireDate was not deleted from the central repository.

When you delete objects from the central repository, you delete only the selected object and all versions of
it; you do not delete any dependent objects.
7. Open the Work Flows tab in the local object library to verify that WF_PosHireDate was not deleted from the
user2 local object library.

When you delete an object from a central repository, it is not automatically deleted from the connected
local repositories.

12.4.12.2 Deleting an object from a local repository

When you delete an object from a local repository, it is not deleted from the central repository.

1. Log on to Designer and the user2 repository.


2. Open the Data Flows tab in the Local Object Library.
3. Right-click DF_EmpDept and click Delete.

When you delete an object from a local repository, the software does not delete it from the central repository. If you delete an object from your local repository by accident, you can recover it by getting the object from the central repository, if it still exists there.
4. Open the Central Object Library.
5. Click the Refresh icon on the object library toolbar.
6. Open the Data Flows tab in the Central Object Library and verify that DF_EmpDept was not deleted from
the central repository.
7. Exit Data Services.

12.5 Summary

In this section, you learned many aspects of working in a multiuser environment, including the following skills:

● Connecting a local repository to the central repository.


● Adding objects from your local repository to the central repository.
● Checking out single objects or an object and its dependents from the central repository.
● Rolling back to a previous version of an object.
● Working with more than one user in a central repository.

For more information about the topics covered in this section, see the Designer Guide.

13 Extracting SAP application data

In this section you learn how to use the SAP Data Services objects that extract data from SAP applications.

To extract data from your SAP applications using Data Services, use the following objects:

● SAP application datastore


● ABAP data flow
● Data transport

 Note

To perform the exercises in this section, your implementation of Data Services must be able to connect to
an SAP remote server. Ask your administrator for details.

 Note

The structure of standard SAP tables varies between versions. Therefore, the sample tables for these
exercises may not work with all versions of SAP applications. If the exercises in this section are not working
as documented, it may be because of the versions of your SAP applications.

In this section, we work with the data sources from the star schema that populate the customer dimension, material dimension, and sales fact tables.

SAP applications [page 162]


SAP applications are the main building blocks of the SAP solution portfolios for industries.

Defining an SAP application datastore [page 163]


Use the SAP application datastore to connect Data Services to the SAP application server.

Importing metadata [page 163]
Import SAP application tables into the new datastore SAP_DS for the exercises in this section.

Repopulate the customer dimension table [page 165]


Repopulate the customer dimension table by configuring a data flow that outputs SAP application data
to a datastore table.

Repopulating the material dimension table [page 173]


Repopulate the material dimension table using the SAP_DS datastore.

Repopulating the Sales Fact table [page 179]


Repopulate the Sales Fact table from two SAP application sources.

Summary [page 188]


This section showed you how to use SAP applications as a source by creating a datastore that contains
connection information to a remote server.

13.1 SAP applications

SAP applications are the main building blocks of the SAP solution portfolios for industries.

SAP applications provide the software foundation with which organizations address their business issues. SAP
delivers the following types of applications:

General-purpose applications: These include applications provided within SAP Business Suite software, such as the SAP Customer Relationship Management application and the SAP ERP application.

Industry-specific applications: These applications perform targeted, industry-specific business functions. Examples:

● SAP Apparel and Footwear Solution for Consumer Products
● SAP Reinsurance Management application for the insurance industry

Ask your system administrator about the types of SAP applications that your organization uses.

Parent topic: Extracting SAP application data [page 161]

Related Information

Defining an SAP application datastore [page 163]


Importing metadata [page 163]
Repopulate the customer dimension table [page 165]
Repopulating the material dimension table [page 173]

Repopulating the Sales Fact table [page 179]
Summary [page 188]

13.2 Defining an SAP application datastore

Use the SAP application datastore to connect Data Services to the SAP application server.

Log on to Designer and to the tutorial repository. Do not use the user1, user2, or central repositories that you
created for the multiuser exercises.

1. Open the Datastores tab in the Local Object Library.


2. Right-click on a blank space in the tab and click New.

The Create New Datastore dialog box opens.


3. Type SAP_DS for Datastore name.

This name identifies the database connection inside the software.


4. Select SAP Applications from the Datastore type dropdown list.
5. Type the name of the remote SAP application computer (host) in Database server name.
6. Enter the applicable information for the SAP application server in User Name and Password.
7. Click OK.

The new datastore appears in the Datastore tab of the Local Object Library.

Task overview: Extracting SAP application data [page 161]

Related Information

SAP applications [page 162]


Importing metadata [page 163]
Repopulate the customer dimension table [page 165]
Repopulating the material dimension table [page 173]
Repopulating the Sales Fact table [page 179]
Summary [page 188]

13.3 Importing metadata

Import SAP application tables into the new datastore SAP_DS for the exercises in this section.

Create and configure the SAP application datastore named SAP_DS before you import the metadata.

1. Open the Datastores tab in the Local Object Library.

2. Right-click SAP_DS and click Import by Name.

The Import By Name dialog box opens.


3. Select Table from the Type dropdown list.
4. Type KNA1 in Name.

The software automatically completes the owner name.


5. Click Import.
6. Repeat steps 1 through 5 for the following additional SAP tables:

MAKT
MARA
VBAK
VBUP

The software adds the tables to the Datastores tab of the Local Object Library under Tables.

Task overview: Extracting SAP application data [page 161]

Related Information

SAP applications [page 162]


Defining an SAP application datastore [page 163]
Repopulate the customer dimension table [page 165]
Repopulating the material dimension table [page 173]
Repopulating the Sales Fact table [page 179]
Summary [page 188]

13.4 Repopulate the customer dimension table

Repopulate the customer dimension table by configuring a data flow that outputs SAP application data to a
datastore table.

Configure a Data Services job that includes a work flow and an ABAP data flow. The ABAP data flow extracts
SAP data and loads it into the customer dimension table.

This exercise differs from previous exercises in the following ways:

● You access data through a remote server.


● You communicate with the data source using ABAP code.

To configure the Data Services job so that it communicates with the SAP application, configure an ABAP data flow. The ABAP data flow contains Data Services-supplied commands, so you do not need to know ABAP.

For more information about configuring an ABAP data flow, see the Supplement for SAP.

1. Adding the SAP_CustDim job, work flow, and data flow [page 166]
The job for repopulating the customer dimension table includes a work flow and a data flow.
2. Adding ABAP data flow to Customer Dimension job [page 166]
Add the ABAP data flow to JOB_SAP_CustDim and set options in the ABAP data flow.
3. Defining the DF_SAP_CustDim ABAP data flow [page 167]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_CustDim job [page 171]
Validate and then execute the JOB_SAP_CustDim job.
5. ABAP job execution errors [page 172]
There are some common ABAP job execution errors that have solutions.

Parent topic: Extracting SAP application data [page 161]

Related Information

SAP applications [page 162]


Defining an SAP application datastore [page 163]
Importing metadata [page 163]
Repopulating the material dimension table [page 173]
Repopulating the Sales Fact table [page 179]
Summary [page 188]

13.4.1 Adding the SAP_CustDim job, work flow, and data flow

The job for repopulating the customer dimension table includes a work flow and a data flow.

1. Open the Class_Exercises project so that it displays in the Project Area.


2. Create a new batch job and name it JOB_SAP_CustDim.
3. Create a new work flow and name it WF_SAP_CustDim.
4. Create a new data flow and name it DF_SAP_CustDim.
5. Save your work.

Task overview: Repopulate the customer dimension table [page 165]

Next task: Adding ABAP data flow to Customer Dimension job [page 166]

13.4.1.1 A data flow within a data flow

The SAP_CustDim data flow needs an ABAP data flow to extract SAP application data.

The ABAP data flow interacts directly with the SAP application database layer. Because the database layer is
complex, Data Services accesses it using ABAP code.

Data Services executes the SAP_CustDim batch job in the following way:

● Data Services generates the ABAP code.


● Data Services connects to the SAP application server via remote function call (RFC).
● The ABAP code executes on the SAP application server.
● The SAP application generates the ABAP program results and communicates the results to Data Services.
● Data Services loads the target data cache.

Learn more about this process in the Supplement for SAP.

13.4.2 Adding ABAP data flow to Customer Dimension job

Add the ABAP data flow to JOB_SAP_CustDim and set options in the ABAP data flow.

1. Open the DF_SAP_CustDim data flow in the workspace.

2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.

The Properties window of the ABAP data flow opens.


3. Complete the fields in the Options tab as described in the following table:

Datastore: Select SAP_DS from the dropdown list.

Generated ABAP file name: Specify a file name for the generated ABAP code. The software stores the file in the ABAP directory that you specified in the SAP_DS datastore.

ABAP program name: Specify the name for the ABAP program that the Data Services job uploads to the SAP application. Adhere to the following name requirements (a quick validation sketch follows these steps):
○ Begins with the letter Y or Z
○ Cannot exceed 8 characters

Job name: Type SAP_CustDim. The name is for the job that runs in the SAP application.

4. Open the General tab and name the data flow DF_SAP_CustDim.
5. Click OK.

6. Open the Datastores tab in the Local Object Library and expand Target_DS > Tables.
7. Move the CUST_DIM table onto the workspace using drag and drop.

Place the table to the right of the DF_SAP_CustDim object.


8. Select Make Target.
9. Save your work.
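The ABAP program name requirements listed above can be checked mechanically before you type the name in. The following is a minimal sketch in Python; the two rules it enforces (the name begins with Y or Z and does not exceed 8 characters) come from the option descriptions above, while restricting the remaining characters to letters, digits, and underscores is an assumption rather than a documented rule.

  import re

  # First character Y or Z and a maximum length of 8, per the requirements above.
  # Limiting the remaining characters to letters, digits, and underscores is an
  # assumption, not a documented rule.
  ABAP_NAME = re.compile(r"^[YZ][A-Z0-9_]{0,7}$", re.IGNORECASE)

  def is_valid_abap_program_name(name: str) -> bool:
      return bool(ABAP_NAME.match(name))

  print(is_valid_abap_program_name("ZCUSTDIM"))    # True: starts with Z, 8 characters
  print(is_valid_abap_program_name("CUSTDIM"))     # False: does not start with Y or Z
  print(is_valid_abap_program_name("ZCUSTDIM01"))  # False: longer than 8 characters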

Task overview: Repopulate the customer dimension table [page 165]

Previous task: Adding the SAP_CustDim job, work flow, and data flow [page 166]

Next task: Defining the DF_SAP_CustDim ABAP data flow [page 167]

13.4.3 Defining the DF_SAP_CustDim ABAP data flow

Define the ABAP data flow so that it communicates the job tasks to the SAP application.

Perform the following group of tasks to define the ABAP data flow:

1. Designate a source table.


2. Define a query to specify the data to extract.
3. Define the data transport object into which the SAP application writes the resulting data set.
4. Set the order of execution in the data flow.

1. Adding objects to DF_SAP_CustDim ABAP data flow [page 168]


Add the necessary objects to complete the DF_SAP_CustDim ABAP data flow.

2. Defining the query [page 169]
Complete the output schema in the query to define the data to extract from the SAP application.
3. Defining the details of the data transport [page 170]
A data transport defines a staging file for the data that is extracted from the SAP application.
4. Setting the execution order [page 171]
Set the order of execution by joining the objects in the data flow.

Task overview: Repopulate the customer dimension table [page 165]

Previous task: Adding ABAP data flow to Customer Dimension job [page 166]

Next task: Executing the JOB_SAP_CustDim job [page 171]

13.4.3.1 Adding objects to DF_SAP_CustDim ABAP data flow

Add the necessary objects to complete the DF_SAP_CustDim ABAP data flow.

1. Open DF_SAP_CustDim data flow in the workspace.

2. Open the Datastores tab in the Local Object Library and expand SAP_DS > Tables.
3. Move the KNA1 table to the left side of the workspace using drag and drop.
4. Select Make Source.

5. Add a query from the tool palette to the right of the KNA1 table in the workspace.

6. Add a data transport from the tool palette to the right of the query in the workspace.
7. Connect the icons in the data flow to indicate the flow of data from the KNA1 source to the query to the data transport.

Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]

Next task: Defining the query [page 169]

13.4.3.2 Defining the query

Complete the output schema in the query to define the data to extract from the SAP application.

1. Open the query in the workspace to open the Query Editor dialog box.
2. Expand the KNA1 table in the Schema In pane to see the columns.
3. Click the column head (above the table name) to sort the list in alphabetical order.
4. Map the following seven source columns to the target schema. Use Ctrl + Click to select multiple
columns and drag them to the output schema.

KUKLA
KUNNR
NAME1
ORT01
PSTLZ
REGIO
STRAS

The icon next to the source column changes to an arrow to indicate that the column has been mapped. The
Mapping tab in the lower pane of the Query Editor shows the mapping relationships.
5. Rename the target columns and verify or change the data types and descriptions using the information in
the following table. To change these settings, right-click the column name and select Properties from the
dropdown list.

 Note

Microsoft SQL Server and Sybase ASE DBMSs require that you specify the columns in the order shown
in the following table and not alphabetically.

Original name   New name      Data type     Description
KUNNR           Cust_ID       varchar(10)   Customer number
KUKLA           Cust_classf   varchar(2)    Customer classification
NAME1           Name1         varchar(35)   Customer name
STRAS           Address       varchar(35)   Address
ORT01           City          varchar(35)   City
REGIO           Region_ID     varchar(3)    Region
PSTLZ           Zip           varchar(10)   Postal code

6. Click the Back arrow icon in the icon toolbar to return to the data flow and to close the Query Editor.
7. Save your work.

Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]

Previous task: Adding objects to DF_SAP_CustDim ABAP data flow [page 168]

Next task: Defining the details of the data transport [page 170]

13.4.3.3 Defining the details of the data transport

A data transport defines a staging file for the data that is extracted from the SAP application.

1. Open the DF_SAP_CustDim ABAP data flow in the workspace.


2. Double-click the data transport object to open the ABAP Data File Option Editor.
3. Type cust_dim.dat in File Name.
This file stores the data set produced by the ABAP data flow. The full path name for this file is the path of the SAP Data Services shared directory concatenated with the file name that you just entered (see the sketch after these steps).

4. Select Replace File.
Replace File truncates this file each time the data flow is executed.
5. Click the Back icon in the icon toolbar to return to the data flow.
6. Save your work.
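How the full path name is built from the shared directory and the file name can be illustrated with a short sketch. The shared directory value below is a placeholder; only the concatenation rule itself comes from the step above.

  import ntpath

  # Placeholder for the SAP Data Services shared directory configured for your
  # installation; substitute your own value.
  shared_directory = r"\\dsserver\DataServices\Share"
  file_name = "cust_dim.dat"   # the value typed in File Name above

  # The full path is the shared directory concatenated with the file name.
  full_path = ntpath.join(shared_directory, file_name)
  print(full_path)  # \\dsserver\DataServices\Share\cust_dim.dat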

Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]

Previous task: Defining the query [page 169]

Next task: Setting the execution order [page 171]

13.4.3.4 Setting the execution order

Set the order of execution by joining the objects in the data flow.

1. Open the DF_SAP_CustDim data flow in the workspace.

The data flow contains the ABAP data flow and the target table named Cust_Dim.
2. Connect the ABAP data flow to the target table.
3. Save your work.

Task overview: Defining the DF_SAP_CustDim ABAP data flow [page 167]

Previous task: Defining the details of the data transport [page 170]

Related Information

Executing the JOB_SAP_CustDim job [page 171]

13.4.4 Executing the JOB_SAP_CustDim job

Validate and then execute the JOB_SAP_CustDim job.

1. With the job selected in the Project Area, click the Validate All icon on the icon toolbar.

If your design contains errors, a message appears describing the error. The software requires that you
resolve the error before you can proceed.

If the job has warning messages, you can continue. Warnings do not prohibit job execution.

If your design does not have errors, the following message appears:

Validate: No errors found

2. Right-click the job name in the project area and click Execute.

If you have not saved your work, a save dialog box appears. Save your work and continue. The Execution
Properties dialog box opens.
3. Leave the default selections and click OK.

After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the cust_dim table in your DBMS.
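
   For example, with a SQL query tool you might run a check like the following. This is a minimal sketch: the table and column names (CUST_DIM, Cust_ID, Name1, City, Region_ID) are assumed to match the target table and the renamed output columns from this exercise, and your DBMS may require a schema or owner prefix.

   -- Confirm that the ABAP job loaded rows into the customer dimension table
   SELECT COUNT(*) AS row_count FROM CUST_DIM;

   -- Spot-check a few of the mapped columns
   SELECT Cust_ID, Name1, City, Region_ID
   FROM CUST_DIM
   ORDER BY Cust_ID;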

Task overview: Repopulate the customer dimension table [page 165]

Previous task: Defining the DF_SAP_CustDim ABAP data flow [page 167]

Next: ABAP job execution errors [page 172]

13.4.5 ABAP job execution errors

There are some common ABAP job execution errors that have solutions.

The following list describes a few common ABAP job execution errors, their probable causes, and how to fix them.

Error: Cannot open ABAP output file
Probable cause: Lack of permissions for the Job Server service account.
Solution:
1. Open the Services Control Panel.
2. Double-click the Data Services service and select a user account that has permissions to the working folder on the SAP server.

Error: Cannot create ABAP output file
Probable cause: Working directory on SAP server specified incorrectly.
Solution: Open the Datastores tab in the Local Object Library and follow these steps:
1. Right-click the SAP_DS datastore and click Edit.
2. Review the information in Working Directory on SAP Server and make changes if necessary.
3. Verify the new path by pasting it into the Start > Run dialog box and executing. If the path is valid, a window to the working directory on the SAP server opens.

If you have other ABAP errors, read about debugging and testing ABAP jobs in the Supplement for SAP.

Parent topic: Repopulate the customer dimension table [page 165]

Previous task: Executing the JOB_SAP_CustDim job [page 171]

13.5 Repopulating the material dimension table

Repopulate the material dimension table using the SAP_DS datastore.

For this exercise, you create a data flow that is similar to the data flow that you created to repopulate the
customer dimension table. However, in this process, the data for the material dimension table is the result of a
join between two SAP application tables.

1. Adding the material dimension job, work flow, and data flow [page 173]
Create the material dimension job and add a work flow and a data flow.
2. Adding ABAP data flow to Material Dimension job [page 174]
Add the ABAP data flow to JOB_SAP_MtrlDim and set options in the ABAP data flow.
3. Defining the DF_SAP_MtrlDim ABAP data flow [page 175]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_MtrlDim job [page 178]
Validate and then execute the JOB_SAP_MtrlDim job.

Parent topic: Extracting SAP application data [page 161]

Related Information

SAP applications [page 162]


Defining an SAP application datastore [page 163]
Importing metadata [page 163]
Repopulate the customer dimension table [page 165]
Repopulating the Sales Fact table [page 179]
Summary [page 188]
A data flow within a data flow [page 166]

13.5.1 Adding the material dimension job, work flow, and data
flow

Create the material dimension job and add a work flow and a data flow.

Log on to Designer and open the Class Exercises project in the Project Area.

1. Create a new batch job and name it JOB_SAP_MtrlDim.


2. Open the job and add a work flow. Name the work flow WF_SAP_MtrlDim.
3. Open the work flow and add a data flow. Name the data flow DF_SAP_MtrlDim.
4. Save your work.

Task overview: Repopulating the material dimension table [page 173]

Next task: Adding ABAP data flow to Material Dimension job [page 174]

13.5.2 Adding ABAP data flow to Material Dimension job

Add the ABAP data flow to JOB_SAP_MtrlDim and set options in the ABAP data flow.

1. Open DF_SAP_MtrlDim in the workspace.

2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.

The Properties window of the ABAP data flow opens.


3. Complete the fields in the Options tab as described in the following table:

Option Action

Datastore Select SAP_DS from the dropdown list

Generated ABAP file name Specify a file name for the generated ABAP code. The
software stores the file in the ABAP directory that you
specified in the SAP_DS datastore.

ABAP program name Specify the name for the ABAP program that the Data
Services job uploads to the SAP application. Adhere to the
following name requirements:
○ Begins with the letter Y or Z
○ Cannot exceed 8 characters

Job name Type SAP_MtrlDim. The name is for the job that runs in
the SAP application.

4. Open the General tab and name the data flow DF_SAP_MtrlDim.
5. Click OK.

6. Open the Datastores tab in the Local Object Library and expand Target_DS > Tables.
7. Move the MTRL_DIM table to the workspace using drag and drop.

Place the table to the right of the DF_SAP_MtrlDim object.


8. Select Make Target.
9. Save your work.

Task overview: Repopulating the material dimension table [page 173]

Previous task: Adding the material dimension job, work flow, and data flow [page 173]

Next task: Defining the DF_SAP_MtrlDim ABAP data flow [page 175]

Related Information

Defining the DF_SAP_MtrlDim ABAP data flow [page 175]

13.5.3 Defining the DF_SAP_MtrlDim ABAP data flow

Define the ABAP data flow so that it communicates the job tasks to the SAP application.

Perform the following group of tasks to define the ABAP data flow:

1. Designate the source tables.


2. Define a query to join the source tables and to specify the data to extract.
3. Define the data transport object into which the SAP application writes the resulting data set.
4. Set the order of execution in the data flow.

Adding objects to the DF_SAP_MtrlDim ABAP data flow [page 176]


Add the necessary objects to complete the DF_SAP_MtrlDim ABAP data flow.

Defining the query with a join between source tables [page 176]
Set up a join between the two source tables and complete the output schema to define the data to
extract from the SAP application.

Defining data details of the data transport [page 178]


A data transport defines a staging file for the data that is extracted from the SAP application.

Setting the execution order [page 178]


Set the order of execution by joining the objects in the data flow.

Task overview: Repopulating the material dimension table [page 173]

Previous task: Adding ABAP data flow to Material Dimension job [page 174]

Next task: Executing the JOB_SAP_MtrlDim job [page 178]

Related Information

Adding objects to the DF_SAP_MtrlDim ABAP data flow [page 176]

13.5.3.1 Adding objects to the DF_SAP_MtrlDim ABAP data
flow

Add the necessary objects to complete the DF_SAP_MtrlDim ABAP data flow.

1. Open the DF_SAP_MtrlDim data flow in the workspace.

2. Open the Datastores tab in the Local Object Library and expand SAP_DS > Tables.
3. Move the MARA table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Move the MAKT table to the workspace using drag and drop. Position it under the MARA table.
6. Select Make Source.

7. Add a query from the tool palette to the right of the tables in the workspace.

8. Add a data transport from the tool palette to the right of the query in the workspace.
9. Connect the icons in the data flow to indicate the flow of data from the MARA and MAKT source tables through the query to the data transport.

10. Save your work.

Related Information

Defining the query [page 169]

13.5.3.2 Defining the query with a join between source tables

Set up a join between the two source tables and complete the output schema to define the data to extract from
the SAP application.

1. Double-click the query in the workspace to open the Query Editor dialog box.

2. Open the FROM tab in the lower pane.
3. In the Join pairs group, select MARA from the Left dropdown list.
4. Select MAKT from the Right dropdown list.
The source rows must meet the requirements of the condition to be passed to the target, including the join
relationship between sources. The MARA and MAKT tables are related by a common column named
MATNR. The MATNR column contains the material number and is the primary key between the two tables.

The resulting relationship appears in the From clause text box:

(MARA.MATNR = MAKT.MATNR)

5. Click the Smart Editor icon.


6. Type the following command in the Smart Editor. Use all uppercase:

AND (SPRAS = 'E')

This command filters the material descriptions by language: only records with English material
descriptions are output to the target. The combined join and filter condition is shown after these steps.
7. Click OK to close the Smart Editor.
8. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop.

Table Column

MARA MATNR
MTART
MBRSH
MATKL

MAKT MAKTX

9. Rename the target columns, verify data types, and add descriptions based on the information in the
following table.

Column name Rename Data type Description

MATNR Mtrl_id varchar(18) Material number

MTART Mtrl_typ varchar(4) Material type

MBRSH Ind_sector varchar(1) Industry sector

MATKL Mtrl_grp varchar(9) Material group

MAKTX Descr varchar(60) Material description

10. Click the Back arrow in the icon toolbar to return to the data flow.
11. Save your work.
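
With the join from the FROM tab and the filter from the Smart Editor combined, the complete condition that is pushed to the SAP application should read approximately as follows (a reconstruction for reference; the exact formatting in the From clause text box may differ slightly):

(MARA.MATNR = MAKT.MATNR) AND (SPRAS = 'E')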

Related Information

Defining data details of the data transport [page 178]

13.5.3.3 Defining data details of the data transport

A data transport defines a staging file for the data that is extracted from the SAP application.

1. Open the DF_SAP_MtrlDim ABAP data flow in the workspace.


2. Double-click the data transport object to open the ABAP Data File Option Editor.
3. Type mtrl_dim.dat in File Name.
This file stores the data set produced by the ABAP data flow. The full path name for this file is the path of
the SAP Data Services shared directory concatenated with the file name that you just entered.
4. Select Replace File.
Replace File truncates this file each time the data flow is executed.
5. Click the Back icon in the icon toolbar to return to the data flow.
6. Save your work.

13.5.3.4 Setting the execution order

Set the order of execution by joining the objects in the data flow.

1. Open the DF_SAP_MtrlDim data flow in the workspace.

The data flow contains the ABAP data flow and the target table named Mtrl_Dim.
2. Connect the ABAP data flow to the target table.
3. Save your work.

Related Information

Executing the JOB_SAP_MtrlDim job [page 178]

13.5.4 Executing the JOB_SAP_MtrlDim job

Validate and then execute the JOB_SAP_MtrlDim job.

1. With JOB_SAP_MtrlDim selected in the Project Area, click the Validate All icon on the icon toolbar.

If your design contains errors, a message appears describing the error, which requires solving before you
can proceed.

If your design contains warnings, a warning message appears. Warnings do not prohibit job execution.

If your design does not have errors, the following message appears:

Validate: No errors found

2. Right-click the job name in the Project Area and click the Execute icon in the toolbar.

If you have not saved your work, a save dialog box appears. Save your work and continue.

The Execution Properties dialog box opens.


3. Leave the default selections and click OK.

After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the Mtrl_Dim table in your DBMS.
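
   As with the customer dimension, a short SQL check can confirm the load. This is a sketch: the table name MTRL_DIM and the renamed columns are taken from this exercise, and your DBMS may require a schema or owner prefix.

   -- Row count and a sample of the English-language material descriptions
   SELECT COUNT(*) AS row_count FROM MTRL_DIM;

   SELECT Mtrl_id, Mtrl_typ, Descr
   FROM MTRL_DIM
   ORDER BY Mtrl_id;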

Task overview: Repopulating the material dimension table [page 173]

Previous task: Defining the DF_SAP_MtrlDim ABAP data flow [page 175]

Related Information

ABAP job execution errors [page 172]

13.6 Repopulating the Sales Fact table

Repopulate the Sales Fact table from two SAP application sources.

This task extracts data from two source tables, and it extracts a single column from a third table using a lookup
function.

1. Adding the Sales Fact job, work flow, and data flow [page 180]
Create the Sales Fact job and add a work flow and a data flow.
2. Adding ABAP data flow to Sales Fact job [page 180]
Add the ABAP data flow to JOB_SAP_SalesFact and set options in the ABAP data flow.
3. Defining the DF_ABAP_SalesFact ABAP data flow [page 181]
Define the ABAP data flow so that it communicates the job tasks to the SAP application.
4. Executing the JOB_SAP_SalesFact job [page 187]
Validate and then execute the JOB_SAP_SalesFact job.

Parent topic: Extracting SAP application data [page 161]

Related Information

SAP applications [page 162]


Defining an SAP application datastore [page 163]
Importing metadata [page 163]
Repopulate the customer dimension table [page 165]

Repopulating the material dimension table [page 173]
Summary [page 188]
Adding the Sales Fact job, work flow, and data flow [page 180]

13.6.1 Adding the Sales Fact job, work flow, and data flow

Create the Sales Fact job and add a work flow and a data flow.

Log on to Designer and open the Class Exercises project in the Project Area.

1. Create a new batch job and name it JOB_SAP_SalesFact.


2. Open the job and add a work flow. Name the work flow WF_SAP_SalesFact.
3. Open the work flow and add a data flow. Name the data flow DF_SAP_SalesFact.
4. Save your work.

Task overview: Repopulating the Sales Fact table [page 179]

Next task: Adding ABAP data flow to Sales Fact job [page 180]

13.6.2 Adding ABAP data flow to Sales Fact job

Add the ABAP data flow to JOB_SAP_SalesFact and set options in the ABAP data flow.

1. Open the DF_SAP_SalesFact data flow in the workspace.

2. Click the ABAP data flow icon from the tool palette and click in the workspace to add it to the data flow.

The Properties window of the ABAP data flow opens.


3. Complete the fields in the Options tab as described in the following table:

Option Action

Datastore Select SAP_DS from the dropdown list

Generated ABAP file name Specify a file name for the generated ABAP code. The
software stores the file in the ABAP directory that you
specified in the SAP_DS datastore.

ABAP program name Specify a name for the ABAP program that the Data
Services job uploads to the SAP application. Adhere to the
following naming requirements:
○ Begins with the letter Y or Z
○ Cannot exceed 8 characters


Job name Type SAP_SalesFact. The name is for the job that runs
in the SAP application.

4. Open the General tab and name the ABAP data flow DF_ABAP_SalesFact.
5. Click OK.

6. Open the Datastores tab in the Local Object Library and expand Target_DS > Tables.
7. Move the SALES_FACT table to the workspace using drag and drop.

Place the table to the right of the DF_ABAP_SalesFact object.


8. Select Make Target.
9. Save your work.

Task overview: Repopulating the Sales Fact table [page 179]

Previous task: Adding the Sales Fact job, work flow, and data flow [page 180]

Next task: Defining the DF_ABAP_SalesFact ABAP data flow [page 181]

13.6.3 Defining the DF_ABAP_SalesFact ABAP data flow

Define the ABAP data flow so that it communicates the job tasks to the SAP application.

Perform the following group of tasks to define the ABAP data flow:

1. Designate the source tables.


2. Define the query to join two tables and to specify the data to extract.
3. Define a lookup function to extract column data from a different source table.
4. Define the data transport object.
5. Set the order of execution.

Adding objects to the DF_ABAP_SalesFact ABAP data flow [page 182]


Add the necessary objects to complete the DF_ABAP_SalesFact ABAP data flow.

Defining the query with a join between source tables [page 183]
Set up a join between the two source tables and complete the output schema to define the data to
extract from the SAP application.

Defining the lookup function to add output column with a value from another table [page 184]
Use a lookup function to extract data from a table that is not defined in the job.

Defining the details of the data transport [page 187]


A data transport defines a staging file for the data that is extracted from the SAP application.

Setting the execution order [page 187]


Set the order of execution by joining the objects in the data flow.

Task overview: Repopulating the Sales Fact table [page 179]

Previous task: Adding ABAP data flow to Sales Fact job [page 180]

Next task: Executing the JOB_SAP_SalesFact job [page 187]

Related Information

Adding objects to the DF_ABAP_SalesFact ABAP data flow [page 182]

13.6.3.1 Adding objects to the DF_ABAP_SalesFact ABAP data flow

Add the necessary objects to complete the DF_ABAP_SalesFact ABAP data flow.

1. Open the DF_ABAP_SalesFact ABAP data flow in the workspace.

2. Open the Datastores tab in the Local Object Library and expand SAP_DS > Tables.
3. Move the VBAP table to the left side of the workspace using drag and drop.
4. Select Make Source.
5. Move the VBAK table to the workspace using drag and drop. Place it under the VBAP table.
6. Select Make Source.

7. Add a query from the tool palette to the right of the tables in the workspace.

8. Add a data transport from the tool palette to the right of the query in the workspace.
9. Connect the icons in the data flow to indicate the flow of data from the VBAP and VBAK source tables through the query to the data transport.

10. Save your work.

Related Information

Defining the query with a join between source tables [page 183]

13.6.3.2 Defining the query with a join between source tables

Set up a join between the two source tables and complete the output schema to define the data to extract from
the SAP application.

Open the DF_ABAP_SalesFact ABAP data flow in the workspace.

1. Double-click the query to open the Query Editor.


2. Open the FROM tab in the lower pane.
3. In the Join pairs group, select VBAP from the Left dropdown list.
4. Select VBAK from the Right dropdown list.
The VBAP and VBAK tables are related by a common column named VBELN. The VBELN column contains
the sales document number and is the primary key between the two tables.
The Propose Join option specifies a relationship based on the primary keys.

The resulting relationship appears in the From clause text box:

VBAP.VBELN = VBAK.VBELN

5. Click the Smart Editor icon.


6. Type the following command using all uppercase, as shown.

AND ((AUDAT >= '19970101') AND (AUDAT <= '19971231'))

This statement filters the sales orders by date so that only sales orders from calendar year 1997 are brought
into the target table. The combined join and filter condition is shown after these steps.
7. Click OK.
8. In the Schema In and Schema Out panes, map the following source columns to output columns using drag
and drop:

Table Column

VBAP VBELN
POSNR
MATNR
NETWR


VBAK KVGR1
AUDAT

9. Rename the target columns, verify data types, and add descriptions as shown in the following table:

Original name New name Data type Description

VBELN SLS_doc_no varchar(10) Sales document

POSNR SLS_doc_line_no varchar(6) Sales document line item

MATNR Material_no varchar(18) Material number

NETWR Net_value varchar(15) Order item value

KVGR1 Cust_ID varchar(3) Customer ID

AUDAT SLS_doc_date date Document date
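
Once the Smart Editor entry is saved, the complete join-plus-filter condition pushed to the SAP application should read approximately as follows (shown here for reference; the exact formatting in the From clause text box may differ slightly):

VBAP.VBELN = VBAK.VBELN AND ((AUDAT >= '19970101') AND (AUDAT <= '19971231'))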

13.6.3.3 Defining the lookup function to add output column with a value from another table

Use a lookup function to extract data from a table that is not defined in the job.

Open the DF_ABAP_SalesFact data flow in the workspace.

1. Double-click the query to open the Query Editor.


2. In the Schema Out pane, right-click the target schema name and select New Output Column from the
dropdown menu.

The Column Properties dialog box opens.


3. Complete the options as described in the following table:

Option Value

Name ord_status

Data type varchar

Length 1

Description Order item status

 Note

Leave Content type blank.

4. Click OK.

The ord_status column appears in the output schema list.


5. Click the ord_status column in the Schema Out pane.
6. In the lower pane, click Functions on the Mapping tab.

The Select Function dialog box opens.


7. Click Lookup_Function from Function Categories.
8. Click lookup from the Function Name pane.
9. Click Next.

The Define Input Parameter(s) dialog box opens.


10. Complete the Lookup function using the values in the following table.

 Restriction

The LOOKUP function is case sensitive. Enter the values using the case as listed in the following table.
Type the entries in the text boxes instead of using the dropdown arrow or the Browse button.

Option Value Description

Lookup table SAP_DS..VBUP The table in which to look up values.

Result column GBSTA The column from the VBUP table that contains the value for
the target column ord_status.


Default value 'none' The value used if the lookup isn't successful. Use single quotes
as shown.

Cache spec 'NO_CACHE' Specifies whether to cache the table. Use single quotes as
shown.

Compare column VBELN The document number in the lookup table.

Expression VBAK.VBELN The document number in the input (source) schema.

 Note

The value for the ord_status column comes from the GBSTA column in the VBUP table. The value in
the GBSTA column indicates the status of a specific item in the sales document. The software needs
both an order number and an item number to determine the correct value to extract from the table.

The function editor provides fields for only one dependency, which you defined using the values from
the table.

11. Click Finish.


The Lookup expression displays in the Mapping text box as follows:

lookup(SAP_DS..VBUP, GBSTA, 'none', 'NO_CACHE', VBELN, VBAK.VBELN)

12. Add lookup values to the mapping expression.

The Lookup function can process any number of comparison value pairs. To include the dependency on the
item number to the Lookup expression, add the item number column from the translation table and the
item number column from the input (source) schema as follows:

POSNR, VBAP.POSNR

The final lookup function looks as follows:

lookup(SAP_DS..VBUP, GBSTA, 'none', 'NO_CACHE', VBELN, VBAK.VBELN, POSNR, VBAP.POSNR)

13. Click the Back arrow in the icon toolbar to close the Query Editor.
14. Save your work.

Related Information

Defining the details of the data transport [page 187]

13.6.3.4 Defining the details of the data transport

A data transport defines a staging file for the data that is extracted from the SAP application.

1. Open the DF_ABAP_SalesFact data flow in the workspace.


2. Double-click the data transport object to open the ABAP Data File Option Editor.
3. Type sales_fact.dat in File Name.
4. Select Replace File.
Replace File truncates this file each time the data flow is executed.
5. Click the Back icon in the icon toolbar to return to the data flow.
6. Save your work.

Related Information

Setting the execution order [page 187]

13.6.3.5 Setting the execution order

Set the order of execution by joining the objects in the data flow.

1. Open the DF_SAP_SalesFact data flow in the workspace.


2. Connect the ABAP data flow to the target table SALES_FACT.
3. Save your work.

Related Information

Executing the JOB_SAP_SalesFact job [page 187]

13.6.4 Executing the JOB_SAP_SalesFact job

Validate and then execute the JOB_SAP_SalesFact job.

1. With JOB_SAP_SalesFact selected in the Project Area, click the Validate All icon in the toolbar.

If your design contains errors, a message appears describing the error, which requires solving before you
can proceed.

If your design contains warnings, a warning message appears. Warnings do not prohibit job execution.

If your design does not have errors, the following message appears:

Validate: No errors found

2. Right-click JOB_SAP_SalesFact in the Project Area and click the Execute icon in the toolbar.

If you have not saved your work, a save dialog box appears. Save your work and continue.

The Execution Properties dialog box opens.


3. Leave the default selections and click OK.

After the job completes, check the Output window for any error or warning messages.
4. Use a query tool to check the contents of the Sales_Fact table in your DBMS.
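
   Because the ABAP data flow restricts AUDAT to calendar year 1997, checking the row count and the range of document dates is a quick way to verify the load. This is a sketch: the table and column names (SALES_FACT, SLS_doc_date) follow the target mapping in this exercise and may need a schema or owner prefix in your DBMS.

   -- Verify that rows were loaded and that the document dates fall within 1997
   SELECT COUNT(*) AS row_count,
          MIN(SLS_doc_date) AS earliest_doc_date,
          MAX(SLS_doc_date) AS latest_doc_date
   FROM SALES_FACT;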

Task overview: Repopulating the Sales Fact table [page 179]

Previous task: Defining the DF_ABAP_SalesFact ABAP data flow [page 181]

Related Information

ABAP job execution errors [page 172]

13.7 Summary

This section showed you how to use SAP applications as a source by creating a datastore that contains
connection information to a remote server.

In this section you used some advanced features in the software to work with SAP application data:

● Use ABAP code in an ABAP data flow to define the data to extract from SAP applications.
● Use a data transport object to carry data from the SAP application into Data Services.
● Use a lookup function and additional lookup values in the mapping expression to obtain data from a source
not included in the job.

For more information about using SAP application data in Data Services, see the Supplement for SAP.

Parent topic: Extracting SAP application data [page 161]

Related Information

SAP applications [page 162]


Defining an SAP application datastore [page 163]
Importing metadata [page 163]

Repopulate the customer dimension table [page 165]
Repopulating the material dimension table [page 173]
Repopulating the Sales Fact table [page 179]

14 Real-time jobs

In this section you execute a real-time job to see the basic functionality.

For real-time jobs, Data Services receives requests from ERP systems and Web applications and sends replies
immediately after receiving the requested data. Requested data comes from a data cache or a second
application. You define operations for processing on-demand messages by building real-time jobs in the
Designer.

Real-time jobs have the following characteristics.

● A single real-time data flow (RTDF) that runs until explicitly stopped
● Requests in XML message format and SAP applications using IDoc format

 Note

The tutorial exercise focuses on a simple XML-based example that you import.

● Requests in XML file format in test mode from a development environment.


● A listener that forwards XML requests to the appropriate real-time job or service

For more information about real-time jobs, see the Reference Guide.

1. Importing a real-time job [page 190]


Import a real time job into SAP Data Services.
2. Running a real time job in test mode [page 191]
Run a real time job that transforms an input string of Hello World to World Hello.

14.1 Importing a real-time job

Import a real time job into SAP Data Services.

1. Copy the following files from <LINK_DIR>\ConnectivityTest and paste them into your temporary
directory. For example, C:\temp:

○ TestOut.dtd
○ TestIn.dtd
○ TestIn.xml
○ ClientTest.txt
2. Copy the file ClientTest.exe from <LINK_DIR>\bin and paste it to your temporary directory.

 Note

ClientTest.exe uses DLLs in your <LINK_DIR>\bin directory. If you encounter problems, ensure
that you have included <LINK_DIR>\bin in the Windows environment variables path statement.

3. Log on to SAP Data Services Designer and the tutorial repository.

4. Right-click in a blank space in the Local Object Library and select Repository > Import From File.

The Open Import File dialog box opens.


5. Go to <LINK_DIR>\ConnectivityTest, select Testconnectivity.atl, and click Open.
6. Click Yes to the prompt that warns you that you will overwrite existing objects.

The Import Plan window opens.


7. Accept the defaults and click Import.
8. Enter ds for Passphrase and click Import.

Task overview: Real-time jobs [page 190]

Next: Running a real time job in test mode [page 191]

14.2 Running a real time job in test mode

Run a real time job that transforms an input string of Hello World to World Hello.

Import the necessary files to create a real-time job.

Use the files that you imported previously to create a real-time job.

1. Click the Project menu option and select New Project.


2. Type TestConnectivity for Project Name and click Create.

The new project opens in the Project Area.


3. Open the Jobs tab and expand Real-time jobs.
4. Move the job named Job_TestConnectivity to the TestConnectivity project in the Project Area using drag
and drop.
5. Expand Job_TestConnectivity and click RT_TestConnectivity to open it in the workspace.

The workspace contains one XML message source named TestIn (XML request) and one XML message
target named TestOut (XML reply).

6. Double-click TestIn to open it. Verify that the Test file option in the Source tab is C:\temp\TestIn.XML.
7. In Windows Explorer, open TestIn.xml in your temporary directory. For example, C:\temp\TestIn.xml.
Confirm that it contains the following message:

<test>
<Input_string>Hello World</Input_string>
</test>

8. Back in Designer, double-click TestOut in the workspace to open it. Verify that the Test file option in the
Target tab is C:\temp\TestOut.XML.
9. Execute the job Job_TestConnectivity.
10. Click Yes to save all changes if applicable.
11. Accept the default settings in the Execution Properties dialog box and click OK.
12. When the job completes, open Windows Explorer and open C:\temp\TestOut.xml. Verify that the file
contains the following text:

<test>
<output_string>World Hello</output_string>
</test>

Task overview: Real-time jobs [page 190]

Previous task: Importing a real-time job [page 190]

Important Disclaimers and Legal Information

Hyperlinks
Some links are classified by an icon and/or a mouseover text. These links provide additional information.
About the icons:

● Links with the icon : You are entering a Web site that is not hosted by SAP. By using such links, you agree (unless expressly stated otherwise in your
agreements with SAP) to this:

● The content of the linked-to site is not SAP documentation. You may not infer any product claims against SAP based on this information.
● SAP does not agree or disagree with the content on the linked-to site, nor does SAP warrant the availability and correctness. SAP shall not be liable for any
damages caused by the use of such content unless damages have been caused by SAP's gross negligence or willful misconduct.

● Links with the icon : You are leaving the documentation for that particular SAP product or service and are entering a SAP-hosted Web site. By using such
links, you agree that (unless expressly stated otherwise in your agreements with SAP) you may not infer any product claims against SAP based on this
information.

Beta and Other Experimental Features


Experimental features are not part of the officially delivered scope that SAP guarantees for future releases. This means that experimental features may be changed by
SAP at any time for any reason without notice. Experimental features are not for productive use. You may not demonstrate, test, examine, evaluate or otherwise use
the experimental features in a live operating environment or with data that has not been sufficiently backed up.
The purpose of experimental features is to get feedback early on, allowing customers and partners to influence the future product accordingly. By providing your
feedback (e.g. in the SAP Community), you accept that intellectual property rights of the contributions or derivative works shall remain the exclusive property of SAP.

Example Code
Any software coding and/or code snippets are examples. They are not for productive use. The example code is only intended to better explain and visualize the syntax
and phrasing rules. SAP does not warrant the correctness and completeness of the example code. SAP shall not be liable for errors or damages caused by the use of
example code unless damages have been caused by SAP's gross negligence or willful misconduct.

Gender-Related Language
We try not to use gender-specific word forms and formulations. As appropriate for context and readability, SAP may use masculine word forms to refer to all genders.

Videos Hosted on External Platforms


Some videos may point to third-party video hosting platforms. SAP cannot guarantee the future availability of videos stored on these platforms. Furthermore, any
advertisements or other content hosted on these platforms (for example, suggested videos or by navigating to other videos hosted on the same site), are not within
the control or responsibility of SAP.

www.sap.com/contactsap

© 2020 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.

Please see https://www.sap.com/about/legal/trademark.html for additional trademark information and notices.

THE BEST RUN
