This supplement contains information about the big data products that SAP Data Services supports.
Find basic information in the Reference Guide, Designer Guide, and some of the applicable supplement guides.
For example, to learn about datastores and creating datastores, see the Reference Guide. To learn about Google
BigQuery, refer to the Supplement for Google BigQuery.
This documentation uses specific terminology, location variables, and environment variables that describe
various features, processes, and locations in SAP Data Services.
Terminology
• The terms Data Services system and SAP Data Services mean the same thing.
• The term BI platform refers to SAP BusinessObjects Business Intelligence platform.
• The term IPS refers to SAP BusinessObjects Information platform services.
Note
Data Services requires BI platform components. However, if you don't use other SAP applications, IPS, a scaled-back version of the BI platform, also provides these components for Data Services.
• CMC refers to the Central Management Console provided by the BI or IPS platform.
• CMS refers to the Central Management Server provided by the BI or IPS platform.
Variables
The following table describes the location variables and environment variables that are necessary when you
install and configure Data Services and required components.
• BI platform installation directory. Default location:
C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0
• <LINK_DIR> (Data Services installation directory). Default location (all platforms):
<INSTALL_DIR>\Data Services
• <DS_COMMON_DIR> (Data Services common configuration directory). Default location:
C:\ProgramData\SAP BusinessObjects\Data Services
SAP Data Services supports many types of big data through various object types and file formats.
Apache Cassandra is an open-source data storage system that you can access with SAP Data Services.
Data Services natively supports Cassandra as an ODBC data source with a DSN connection. Cassandra uses
the generic ODBC driver. Use Cassandra on Windows or Linux operating systems.
Before you use Cassandra with Data Services, ensure that you perform the following setup tasks:
• For Data Services on Windows platforms, use the generic ODBC driver.
• For Data Services on UNIX and Linux, configure the ODBC driver with the Connection Manager. For more information about configuring database connectivity for UNIX and Linux, see the Administrator Guide.
Use the Connection Manager to configure the ODBC driver for Apache Cassandra on Linux.
Before you complete the following steps, read the topic and subtopics under “Configure database connectivity
for UNIX and Linux” in the Administrator Guide.
The Connection Manager is a command-line utility. To use it with a graphical user interface (UI), install the GTK+2 library. For more information about obtaining and
installing GTK+2, see https://www.gtk.org/ . The following steps are for the UI for the Connection Manager.
1. Open a command prompt and set $ODBCINI to a file in which the Connection Manager defines the DSN.
Ensure that the file is readable and writable.
Sample Code
$ export ODBCINI=<dir-path>/odbc.ini
touch $ODBCINI
The Connection Manager uses the $ODBCINI file and other information that you enter for data sources to
define the DSN for Cassandra.
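For reference, the following is a hypothetical sketch of the kind of DSN entry that the Connection Manager writes to the $ODBCINI file for Cassandra. The DSN name, driver library path, and host are placeholders that depend on your ODBC driver installation and cluster; port 9042 is the default Cassandra native protocol port.
Sample Code
[Cassandra_DSN]
Driver=<dir-path>/lib/libcassandraodbc.so
Description=Apache Cassandra data source
Host=<cassandra_host>
Port=9042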
2. Start the Connection Manager user interface by entering the following command:
Sample Code
$ cd <LINK_DIR>/bin/
$ ./DSConnectionManager.sh
3. In the Connection Manager, open the Data Sources tab, and click Add to display the list of database types.
4. Select Cassandra from the list of database types.
The Configuration for... dialog box opens. It contains the absolute location of the odbc.ini file that you set
in the first step.
5. Provide values for additional connection properties for the Cassandra database type as applicable.
6. Provide the following properties:
• User name
• Password
Note
Data Services does not save these properties for other users.
If Data Services is installed on the same machine and in the same folder as the IPS or BI platform, restart
the following services:
• EIM Adaptive Process Service
• Data Services Job Service
If Data Services is not installed on the same machine and in the same folder as the IPS or BI platform,
restart the following service:
• Data Services Job Service
Related Information
Complete data source properties in the Connection Manager when you configure the ODBC driver for SAP Data
Services on Linux.
The Connection Manager configures the $ODBCINI file based on the property values that you enter in the Data
Sources tab. The following table lists the properties that are relevant for Apache Cassandra.
Depending on the value you choose for the certificate mode, Data Services may
require you to define some or all of the following options:
Related Information
Use SAP Data Services to connect to Apache Hadoop frameworks, including the Hadoop Distributed File System
(HDFS) and Hive.
Data Services supports Hadoop on both the Linux and Windows platforms. For Windows support, Data Services
uses Hortonworks Data Platform (HDP) only. HDP supports data from many sources and in many formats. See the latest
Product Availability Matrix (PAM) on the SAP Support Portal for the supported versions of HDP.
For information about deploying Data Services on a Hadoop MapR cluster machine, see SAP Note 2404486 .
For information about accessing your Hadoop in the administered SAP Big Data Services, see the Supplement
for SAP Big Data Services.
For complete information about how Data Services supports Apache Hadoop, see the Supplement for Hadoop.
Connect to your HDFS data using an HDFS file format or an HDFS file location in SAP Data Services Designer.
Create an HDFS file format and file location with your HDFS connection information, such as the account
name, password, and security protocol. Data Services uses this information to access HDFS data during Data
Services processing.
For complete information about how Data Services supports your HDFS, see the Supplement for Hadoop.
Related Information
3.2.2 Hadoop Hive
Use a Hive adapter datastore or a Hive database datastore in SAP Data Services Designer to connect to the
Hive remote server.
Use the Hive adapter datastore when Data Services is installed within the Hadoop cluster. Use the Hive
adapter datastore for server-named (DSN-less) connections. You can also include SSL (or the newer Transport Layer
Security, TLS) for secure communication over the network.
Use a Hive database datastore when Data Services is installed on a machine either within or outside of the
Hadoop cluster. Use the Hive database datastore for either a DSN or a DSN-less connection. You can also include
SSL/TLS for secure communication over the network.
For complete information about how Data Services supports Hadoop Hive, see the Supplement for Hadoop.
Upload data processed with Data Services to your HDFS that is managed by SAP Big Data Services.
Big Data Services is a Hadoop distribution in the cloud. Big Data Services performs all Hadoop upgrades and
patches for you and provides Hadoop support. SAP Big Data Services was formerly known as Altiscale.
Upload your big data files directly from your computer to Big Data Services. Or, upload your big data files from
your computer to an established cloud account, and then to Big Data Services.
Example
Access data from S3 (Amazon Simple Storage Service) and use the data as a source in Data Services. Then
upload the data to your HDFS that resides in Big Data Services in the cloud.
How you choose to upload your data is based on your use case.
For complete information about accessing your Hadoop account in Big Data Services and uploading big data,
see the Supplement for SAP Big Data Services.
Related Information
To connect to an Apache Hadoop web interface running on Google Cloud Dataproc clusters, use a Hive
database datastore and a WebHDFS file location.
Use a Hive datastore to browse and view metadata from Hadoop and to import metadata for use in data flows.
To upload processed data, use a Hadoop file location and a Hive template table. Implement bulk loading in the
target editor in a data flow where you use the Hive template table as a target.
Related Information
3.3 HP Vertica
Access HP Vertica data by creating an HP Vertica database datastore in SAP Data Services Designer.
Use HP Vertica data as sources or targets in data flows. Implement SSL secure data transfer with MIT Kerberos
to access HP Vertica data securely. Additionally, configure options in the source or target table editors to
enhance HP Vertica performance.
SAP Data Services uses MIT Kerberos 5 authentication to securely access an HP Vertica database using SSL
protocol.
You must have Database Administrator permissions to install MIT Kerberos 5 on your Data Services client
machine. Additionally, the Database Administrator must establish a Kerberos Key Distribution Center (KDC)
server for authentication. The KDC server must support Kerberos 5 using the Generic Security Service (GSS)
API. The GSS API also supports non-MIT Kerberos implementations, such as Java and Windows clients.
Note
Specific Kerberos and HP Vertica database processes are required before you can enable SSL protocol in
Data Services. For complete explanations and processes for security and authentication, consult your HP
Vertica user documentation and the MIT Kerberos user documentation.
MIT Kerberos authorizes connections to the HP Vertica database using a ticket system. The ticket system
eliminates the need for users to enter a password.
Related Information
After you install MIT Kerberos, define the specific Kerberos properties in the Kerberos configuration or
initialization file.
After you define Kerberos properties, save the configuration or initialization file to your domain.
Example
See the MIT Kerberos documentation for information about completing the Unix krb5.conf property file or
the Windows krb5.ini property file.
The following table describes the log file names and locations for the Kerberos log files.
Property Example
default default = FILE:/var/log/krb5libs.log
kdc kdc = FILE:/var/log/krb5kdc.log
admin_server admin_server = FILE:/var/log/kadmind.log
The following table describes the Kerberos default properties.
Property Description
default_realm = <value> Sets the default Kerberos realm.
Example
default_realm = EXAMPLE.COM
ticket_lifetime = <value> Set the number of hours for the initial ticket request.
Example
ticket_lifetime = 24h
renew_lifetime = <value> Set the number of days a ticket can be renewed after the ticket lifetime expires. The default is 0.
Example
renew_lifetime = 7d
For the Kerberos realm, provide the following locations:
• KDC location
• Admin Server location
• Kerberos Password Server location
Example
EXAMPLE.COM = {
 kdc = <location>
 admin_server = <location>
 kpasswd_server = <location>
}
The following table describes the property for the Kerberos domain realm.
Property Description
<server_host_name>=<kerberos_realm> Maps the server host name to the Kerberos realm name. If
you use a domain name, prefix the name with a period (.).
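The following is a minimal sketch of how these properties fit together in a krb5.conf file, using the standard MIT Kerberos section layout. The realm name, host names, and log locations are placeholders; replace them with values for your environment.
Sample Code
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = EXAMPLE.COM
 ticket_lifetime = 24h
 renew_lifetime = 7d

[realms]
 EXAMPLE.COM = {
  kdc = kdc.example.com
  admin_server = kdc.example.com
  kpasswd_server = kdc.example.com
 }

[domain_realm]
 .example.com = EXAMPLE.COM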
Parent topic: Enable MIT Kerberos for HP Vertica SSL protocol [page 17]
Related Information
After you've updated the configuration or initialization file and saved it to the client domain, execute the kinit
command to generate a secure key.
Example
Enter the following command, using your own information for the variables:
kinit <user_name>@<realm_name>
The following table describes the keys that the command generates.
Key Description
For complete information about using the kinit command to obtain tickets, see the MIT Kerberos Ticket
Management documentation.
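As a quick verification, you can run the MIT Kerberos klist command after kinit to confirm that the ticket-granting ticket was issued. The following sketch uses a hypothetical user and realm; the cache location and output format vary by platform.
Sample Code
$ kinit dsuser@EXAMPLE.COM
Password for dsuser@EXAMPLE.COM:
$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: dsuser@EXAMPLE.COM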
Parent topic: Enable MIT Kerberos for HP Vertica SSL protocol [page 17]
Related Information
This procedure is for HP Vertica users who have database administrator permissions to perform these steps, or
who have been associated with an authentication method through a GRANT statement.
Note
DSN for HP Vertica is available in SAP Data Services version 4.2 SP7 Patch 1 (14.2.7.1) or later.
To create a DSN for HP Vertica with Kerberos SSL, perform the following steps:
1. Access the ODBC Data Source Administrator either from the Datastore Editor in Data Services Designer or
directly from your Start menu.
2. Open the System DSN tab and select Add.
3. Choose the applicable HP Vertica driver from the list and select Finish.
4. Open the Basic Settings tab and complete the options as described in the following table.
Option Value
Kerberos Host Name Enter the name of the host computer where Kerberos is
installed.
When the connection test is successful, select OK and close the ODBC Data Source Administrator.
Now the HP Vertica DSN that you just created is included in the DSN option in the datastore editor.
Create the HP Vertica database datastore in Data Services Designer and select the DSN that you created.
Related Information
To enable SSL encryption for HP Vertica datastores, you must create a Data Source Name (DSN) connection.
Before you perform the following steps, an administrator must install MIT Kerberos 5, and enable Kerberos for
HP Vertica SSL protocol.
Additionally, an administrator must create an SSL DSN using the ODBC Data Source Administrator. For more
information about configuring SSL DSN with ODBC drivers, see Configure drivers with data source name
(DSN) connections in the Administrator Guide.
SSL encryption for HP Vertica is available in SAP Data Services version 4.2 Support Package 7 Patch 1
(14.2.7.1) or later. Enabling SSL encryption slows down job performance.
Note
An HP Vertica database datastore requires that you choose DSN as a connection method. DSN-less
connections aren't allowed for HP Vertica datastore with SSL encryption.
To create an HP Vertica datastore with SSL encryption, perform the following steps in Data Services Designer:
Choose the HP Vertica client version from the Database version list, and enter your user name and
password.
3. Select Use Data Source Name (DSN).
4. Choose the HP Vertica SSL DSN that you created from the Data Source Name list.
5. Complete the applicable Advanced options and save your datastore.
Related Information
SAP Data Services doesn't support bulk loading for HP Vertica, but there are settings that you can configure to increase
loading speed.
For complete details about connecting to HP Vertica, see Connecting to Vertica in the Vertica
documentation. Make sure to select the correct version.
When you load data to an HP Vertica target in a data flow, the software automatically executes an HP Vertica
statement that contains a COPY Local statement. This statement makes the ODBC driver read and stream
the data file from the client to the server.
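For illustration only, the following sketch shows the general form of a Vertica COPY ... FROM LOCAL statement. The schema, table, file path, and delimiter are placeholders, and the statement that Data Services actually generates may differ.
Sample Code
COPY my_schema.my_target
FROM LOCAL '/tmp/ds_bulkload.dat'
DELIMITER ','
ABORT ON ERROR;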
1. When you configure the ODBC driver for HP Vertica, enable the option to use native connection load
balancing.
2. In Designer, open the applicable data flow.
3. In the workspace, double-click the HP Vertica datastore target object to open it.
4. Open the Options tab in the lower pane.
5. Increase the number of rows in the Rows per commit option.
Related Information
SAP Data Services converts incoming HP Vertica data types to native data types, and outgoing native data
types to HP Vertica data types.
The following table contains HP Vertica data types and the native data types to which Data Services converts
them.
HP Vertica data type Data Services data type
Boolean Int
FLOAT Double
Money Decimal
Numeric Decimal
Number Decimal
Decimal Decimal
Char Varchar
Varchar Varchar
DATE Date
TIMESTAMP Datetime
TIMESTAMPTZ Varchar
Time Time
TIMETZ Varchar
INTERVAL Varchar
The following table contains native data types and the HP Vertica data types to which Data Services outputs
them. Data Services outputs the converted data types to HP Vertica template tables or Data_Transfer
transform tables.
Data Services data type HP Vertica data type
Date Date
Datetime Timestamp
Decimal Decimal
Double Float
Int Int
Interval Float
Real Float
Time Time
Varchar Varchar
Timestamp Timestamp
Related Information
Configure options for an HP Vertica table as a source by opening the source editor in the data flow.
Option Description
Table name Specifies the table name for the source table.
Configure options for an HP Vertica table as a target by opening the target editor in the data flow.
Option Description
Column comparison Specifies how the software maps input columns to output
columns:
Number of loaders Specifies the number of loaders Data Services uses to load
data to the target.
Use overflow file Specifies whether Data Services uses a recovery file for rows
that it could not load.
File name Specifies the file name for the overflow file. Enter a file name or specify a variable.
Applicable only when you select Yes for Use overflow file.
File format Specifies the format of the overflow file. The overflow file can include the rejected data
and the operation being performed (write_data), or the SQL command used to produce the
rejected operation (write_sql).
Use input keys Specifies whether Data Services uses the primary keys from the input table when the
target table does not have a primary key.
• Yes: Uses the primary keys from the input table when the target table does not have
primary keys.
• No: Does not use primary keys from the input table when the target table does not
have primary keys. No is the default setting.
Update key columns Specifies whether Data Services updates key column values
when it loads data to the target table.
Auto correct load Specifies whether Data Services uses auto correct loading
when it loads data to the target table. Auto correct loading
ensures that Data Services does not duplicate the same row
in a target table. Auto correct load is useful for data recovery
operations.
Ignore columns with value Specifies a value that might appear in a source column and
that you do not want updated in the target table during auto
correct loading.
Include in transaction Specifies that this target table is included in the transaction
processed by a batch or real-time job.
Related Information
To read data from MongoDB sources and load data to other SAP Data Services targets, create a MongoDB
adapter.
MongoDB is an open-source document database, which has JSON-like documents called BSON. MongoDB has
dynamic schemas instead of traditional schema-based data.
Data Services needs metadata to gain access to MongoDB data for task design and execution. Use Data
Services processes to generate schemas by converting each row of the BSON file into XML and converting XML
to XSD.
Data Services uses the converted metadata in XSD files to access MongoDB data.
To learn more about Data Services adapters, see the Supplement for Adapters.
3.4.1 MongoDB metadata
Use metadata from a MongoDB adapter datastore to create sources, targets, and templates in a data flow.
MongoDB represents its embedded documents and arrays as nested data in BSON files. SAP Data Services
converts MongoDB BSON files to XML and then to XSD. Data Services saves the XSD file to the following
location: <LINK_DIR>\ext\mongo\mcache.
Data Services has the following restrictions and limitations for working with MongoDB:
• In the MongoDB collection, the tag name can't contain special characters that are invalid for the XSD file.
Example
The following special characters are invalid for XSD files: >, <, &, /, \, #, and so on.
For more information about formatting XML documents, see the Nested Data section in the Designer Guide. For
more information about source and target objects, see the Data flows section of the Designer Guide.
Related Information
3.4.2 MongoDB as a source
Use MongoDB as a source in Data Services and flatten the nested schema by using the XML_Map transform.
The following examples illustrate how to use various objects to process MongoDB sources in data flows.
Example 1: Change the schema of a MongoDB source using the Query transform, and load output to an XML
target.
Specify conditions in the Query transform. Some conditions can be pushed down and others are processed
by Data Services.
Example 2: Set up a data flow where Data Services reads the schema and then loads the schema directly into an
XML template file.
Example 3: Flatten a schema using the XML_Map transform and then load the data to a table or flat file.
Note
Specify conditions in the XML_Map transform. Some conditions can be pushed down and others are
processed by Data Services.
Related Information
Use query criteria as a parameter of the db.<collection>.find() method. Add MongoDB query conditions
to a MongoDB table as a source in a data flow.
To add MongoDB query criteria, enter a value next to the Query criteria parameter on the Adapter Source tab
of the source editor. Ensure that the query criteria is in MongoDB query format. For example:
{ type: { $in: ['food', 'snacks'] } }
Example
Given a value of {prize:100}, MongoDB returns only rows that have a field named “prize” with a value of
100. If you don’t specify the value 100, MongoDB returns all the rows.
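As another hypothetical illustration, the following query criteria combines two conditions so that MongoDB returns only documents whose status field equals "A" and whose qty field is greater than 25. The field names and values are placeholders.
Example
{ status: "A", qty: { $gt: 25 } }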
Configure a Where condition so that Data Services pushes down the condition to MongoDB. Specify a Where
condition in a Query or XML_Map transform, and place the Query or XML_Map transform after the MongoDB
source object in the data flow. MongoDB returns only the rows that you want.
For more information about the MongoDB query format, consult the MongoDB Web site.
Note
If you use the XML_Map transform, it may have a query condition in SQL format. Data Services
converts the SQL format to the MongoDB query format and uses the MongoDB specification to push down
operations to the source database. In addition, be aware that Data Services does not support pushdown of
queries on nested arrays.
Related Information
SAP Data Services processes push down operators with a MongoDB source in specific ways based on the
circumstance.
Data Services supports the following operators when you use MongoDB as a source:
• Comparison operators: =, !=, >, >=, <, <=, like, and in.
• Logical operators: and and or in SQL queries.
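As a hypothetical illustration of how these operators map, a Where condition such as STATUS = 'A' OR QTY >= 100 in a Query or XML_Map transform corresponds to MongoDB query criteria similar to the following; the column names are placeholders, and the exact criteria that Data Services generates may differ.
Example
{ $or: [ { STATUS: { $eq: "A" } }, { QTY: { $gte: 100 } } ] }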
Related Information
3.4.3 MongoDB as a target
Configure options for MongoDB as a target in your data flow using the target editor.
SAP Data Services considers the <_id> field in MongoDB data as the primary key. If you create a new
MongoDB document and include a field named <_id>, Data Services recognizes that field as the unique
BSON ObjectID. If a MongoDB document contains more than one <_id> field at different levels, Data Services
considers only the <_id> field at the first level as the BSON Object Id.
The following table contains descriptions for options in the Adapter Target tab of the target editor.
Option Description
Use auto correct Specifies the mode Data Services uses for MongoDB as a target datastore.
• True: Uses Upsert mode for the writing behavior. Updates the document that has the same
<_id> field, or inserts a new document.
• False: Uses Insert mode for writing behavior. If documents have the same <_id> field in
the MongoDB collection, then Data Services issues an error message.
Write concern level Specifies the MongoDB write concern level that Data Services uses for reporting the success
of a write operation. Enable or disable different levels of acknowledgement for write operations.
Use bulk Specifies whether Data Services executes writing operations in bulk. Bulk may provide better
performance.
• True: Runs write operations in bulk for a single collection to optimize CRUD efficiency.
If a bulk contains more than 1000 write operations, MongoDB automatically splits it into
multiple bulk groups.
• False: Does not run write operation in bulk.
For more information about bulk, ordered bulk, and bulk maximum rejects, see the MongoDB
documentation at http://docs.mongodb.org/manual/core/bulk-write-operations/.
Use ordered bulk Specifies the order in which Data Services executes write operations: Serial or Parallel.
Documents per commit Specifies the maximum number of documents that are loaded to a target before the software
saves the data.
• Blank: Uses the maximum of 1000 documents. Blank is the default setting.
• Enter any integer to specify a number other than 1000.
Bulk maximum rejects Specifies the maximum number of acceptable errors before Data Services fails the job.
Note
Data Services continues to load to the target MongoDB even when the job fails.
Enter an integer. Enter -1 so that Data Services ignores and does not log bulk loading errors.
If the number of actual errors is less than, or equal to the number you specify here, Data
Services allows the job to succeed and logs a summary of errors in the adapter instance trace
log.
Applicable only when you select True for Use ordered bulk.
Delete data before loading Deletes existing documents in the current collection before loading occurs. Retains all the
configuration, including indexes, validation rules, and so on.
Drop and re-create Specifies whether Data Services drops the existing MongoDB collection and creates a new one
with the same name before loading occurs.
• True: Drops the existing MongoDB collection and creates a new one with the same name
before loading. Ignores the value of Delete data before loading. True is the default setting.
• False: Does not drop the existing MongoDB collection and create a new one with the same
name before loading.
Use audit Specifies whether Data Services creates audit files that contain write operation information.
• True: Creates audit files that contain write operation information. Stores audit files
in the <DS_COMMON_DIR>/adapters/audits/ directory. The name of the file is
<MongoAdapter_instance_name>.txt.
• False: Does not create and store audit files.
Data Services behaves in the following way when a regular load fails:
• Use audit = False: Data Services logs loading errors in the job trace log.
• Use audit = True: Data Services logs loading errors in the job trace log and in the audit log.
Data Services behaves in the following way when a bulk load fails:
• Use audit = False: Data Services creates a job trace log that provides only a summary. It
does not contain details about each row of bad data. There is no way to obtain details
about bad data.
• Use audit = True: Data Services creates a job trace log that provides only a summary but
no details. However, the job trace log provides information about where to find details
about each row of bad data in the audit file.
Use template documents as a target in one data flow or as a source in multiple data flows.
Template documents are useful in early application development when you design and test a project. After you
import data for the MongoDB datastore, Data Services stores the template documents in the object library.
Find template documents in the Datastore tab of the object library.
When you import a template document, the software converts it to a regular document. You can use the regular
document as a target or source in your data flow.
Note
Template documents are available in Data Services 4.2.7 and later. If you upgrade from a previous version,
open an existing MongoDB datastore and then click OK to close it. Data Services updates the datastore so
that you see the Template Documents node and any other template document related options.
Template documents are similar to template tables. For information about template tables, see the Data
Services User Guide and the Reference Guide.
Related Information
Create MongoDB template documents as targets in data flows, then use the target as a source in a different
data flow.
To use a MongoDB template as the target or source in a data flow, first use the template as a target. To add a
MongoDB template as a target in a data flow, perform the following steps to create the target:
Note
Use the MongoDB collection namespace format: database.collection. Don’t exceed 120 bytes.
4. Select the related MongoDB datastore from the In datastore dropdown list.
5. Click OK.
6. To use the template document as a target in the data flow, connect the template document to the object
that comes before the template document.
Data Services automatically generates a schema based on the object directly before the template
document in the data flow.
Restriction
The field <_id> is the default primary key of the MongoDB collection. Therefore, make sure that you
correctly configure the <_id> field in the output schema of the object that comes directly before
the target template. If you don't include <_id> in the output schema, the following error appears
when you view the data: “An element named <_id> present in the XML data input does
not exist in the XML format used to set up this XML source in the data flow
<dataflow>. Validate your XML data.”
7. Click Save.
The template document icon in the data flow changes, and Data Services adds the template document to
the object library. Find the template document in the applicable database node under Templates.
Convert the template document into a regular document by selecting to import the template document in the
object library. Then you can use the template document as a source or a target document in other data flows.
SAP Data Services enables you to convert an imported template document into a regular document.
• Open a data flow and select one or more template target documents in the workspace. Right-click, and
choose Import Document.
• Select one or more template documents in the Local Object Library, right-click, and choose Import
Document.
The icon changes and the document appears under Documents instead of Template Documents in the object
library.
Note
The Drop and re-create target configuration option is available only for template target documents.
Therefore it is not available after you convert the template target into a regular document.
Related Information
Use the data preview feature in SAP Data Services Designer to view a sampling of data from a MongoDB
document.
• Expand an applicable MongoDB datastore in the object library. Right-click the MongoDB document and
select View Data from the dropdown menu.
• Right-click the MongoDB document in a data flow and select View Data from the dropdown menu.
• Click the magnifying glass icon in the lower corner of either a MongoDB source or target object in a data
flow.
By default, Data Services displays a maximum of 100 rows. Change this number by setting the Rows To
Scan option in the applicable MongoDB datastore editor. Entering -1 displays all rows.
For more information about viewing data, see the Designer Guide.
Related Information
3.4.6 Parallel Scan
SAP Data Services uses the MongoDB Parallel Scan process to improve performance while it generates
metadata for big data.
To generate metadata, Data Services first scans all documents in the MongoDB collection. This
scanning can be time consuming. However, when Data Services uses the Parallel Scan command
parallelCollectionScan, it uses multiple parallel cursors to read all the documents in a collection. Parallel
Scan can increase performance.
Note
Parallel Scan works with MongoDB server version 2.6.0 and above.
For more information about the parallelCollectionScan command, consult your MongoDB
documentation.
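For reference, the following sketch shows how the parallelCollectionScan command is issued in the MongoDB shell; the collection name and number of cursors are placeholders.
Example
db.runCommand( { parallelCollectionScan: "myCollection", numCursors: 4 } )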
For more information about Mongo adapter datastore configuration options, see the Supplement for Adapters.
3.4.7 Reimport schemas
When you reimport documents from your MongoDB datastore, SAP Data Services uses the current datastore
settings.
Reimport a single MongoDB document by right-clicking the document and selecting Reimport from the
dropdown menu.
To reimport all documents, right-click an applicable MongoDB datastore or right-click on the Documents node
and select Reimport All from the dropdown menu.
Note
When you enable Use Cache, Data Services uses the cached schema.
When you disable Use Cache, Data Services looks in the sample directory for a sample BSON file with the
same name. If there is a matching file, the software uses the schema from the BSON file. If there isn't a
matching BSON file in the sample directory, the software reimports the schema from the database.
Related Information
SAP Data Services enables you to search for MongoDB documents in your repository from the object library.
1. Right-click in any tab in the object library and choose Search from the dropdown menu.
The Search dialog box opens.
2. Select the applicable MongoDB datastore name from the Look in dropdown menu.
The datastore is the one that contains the document for which you are searching.
3. Select Local Repository to search the entire repository.
4. Select Documents from the Object Type dropdown menu.
5. Enter the criteria for the search.
6. Click Search.
Data Services lists matching documents in the lower pane of the Search dialog box. A status line at the
bottom of the Search dialog box shows statistics such as total number of items found, amount of time to
search, and so on.
For more information about searching for objects, see the Objects section of the Designer Guide.
Related Information
Before you create an Apache Impala datastore, download the Cloudera ODBC driver and create a data source
name (DSN). Use the datastore to connect to Hadoop and import Impala metadata. Use the metadata as a
source or target in a data flow.
Before you work with Apache Impala, be aware of the following limitations:
For Linux users: Before you create an Impala database datastore, connect to Apache Impala using the Cloudera
ODBC driver.
Perform the following high-level steps to download a Cloudera ODBC driver and create a data source name
(DSN). For more in-depth information, consult the Cloudera documentation.
Select the driver that is compatible with your platform. For information about the correct driver versions,
see the SAP Product Availability Matrix (PAM).
3. Start DSConnectionManager.sh.
$ cd $LINK_DIR/bin/
$ ./DSConnectionManager.sh
Example
The following shows prompts and values in DS Connection Manager that includes Kerberos and SSL:
The ODBC ini file is <path to the odbc.ini file>
There are available DSN names in the file:
[DSN name 1]
[DSN name 2]
Specify the DSN name from the list or add a new one:
<New DSN file name>
Specify the User Name:
<Hadoop user name>
Type database password:(no echo)
*Type the Hadoop password. Password does not appear after you type it for security.
Retype database password:(no echo)
Specify the Host Name:
<host name/IP address>
Specify the Port:'21050'
<port number>
Specify the Database:
default
Specify the Unix ODBC Lib Path:
*The Unix ODBC Lib Path is based on where you install the driver.
*For example, /build/unixODBC-2.3.2/lib.
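The Connection Manager writes the resulting data source to the odbc.ini file. The following is a hypothetical sketch of such an entry for the Cloudera Impala ODBC driver with Kerberos and SSL enabled; the driver path and key names can vary by driver version, so verify them against the Cloudera driver documentation.
Sample Code
[Impala_DSN]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=<host name/IP address>
PORT=21050
Database=default
AuthMech=1
KrbRealm=EXAMPLE.COM
KrbFQDN=<host FQDN>
KrbServiceName=impala
SSL=1
TrustedCerts=/opt/cloudera/impalaodbc/lib/64/cacerts.pem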
Related Information
To connect to your Hadoop files and access Impala data, create an ODBC datastore in SAP Data Services
Designer.
Before performing the following steps, enable Impala Services on your Hadoop server. Then download the
Cloudera driver for your platform.
Note
If you didn't create a DSN (data source name) in Windows ODBC Data Source application, you can create a
DSN in the following process.
To create an ODBC datastore for Apache Impala, perform the following steps in Designer:
A realm is a set of managed nodes that share the same Kerberos database.
c. Enter the fully qualified domain name (FQDN) of the Hive Server host in Host FQDN.
d. Enter the service principal name for the Hive server in Service Name.
e. Enable the Canonicalize Principal FQDN option.
This option canonicalizes the host FQDN in the server principal name.
12. Optional: To enable Secure Sockets Layer (SSL) protocol, perform the following substeps:
a. Choose No Authentication (SSL) from the Mechanism list.
b. Select Advanced Options.
c. Enter or browse to the Cloudera certificate file in Trusted Certificates.
The DSN appears in the dropdown list only after you've created it.
15. Select Advanced and complete the advanced options as necessary.
a. Optional: Set the Code page option to utf-8 in the Locale group to process multibyte data in Impala
tables.
b. Optional: In the ODBC Date Function Support group, set the Week option to No.
If you don’t set the Week option to No, the result of the Data Services built-in function
week_in_year() may be incorrect.
Related Information
To use your PostgreSQL tables as sources and targets in SAP Data Services, create a PostgreSQL datastore
and import your tables and other metadata.
Prerequisites
Before you configure the PostgreSQL datastore, perform the following prerequisites:
• Download and install the latest supported PostgreSQL Server version from the official PostgreSQL Web
site at https://www.postgresql.org/download/ . Check the Product Availability Matrix (PAM) on the
SAP Support Portal to ensure that you have the supported PostgreSQL version for your version of Data
Services.
• Obtain the ODBC driver that is compatible with your version of PostgreSQL. To avoid potential
processing problems, download the ODBC driver from the official PostgreSQL Web site at
https://www.postgresql.org/ftp/odbc/versions/ . Find option descriptions for configuring the ODBC driver at
https://odbc.postgresql.org/docs/config.html .
When you create a DSN-less connection, you can optionally protect your connection with SSL/TLS and X509
single sign-on authentication when you configure the datastore.
Bulk loading
Configure the PostgreSQL datastore for bulk loading to the target PostgreSQL database. Before you can
configure the bulk loading options, obtain the PSQL tool from the official PostgreSQL website. In addition
to setting the bulk loader directory and the location for the PSQL tool in the datastore, complete the Bulk
Loader Options tab in the target editor. For complete information about the bulk loading in PostgreSQL and the
options, see the Performance Optimization Guide.
Note
Data Services supports bulk loading for PostgreSQL DSN-less connections only.
Data Services supports the basic pushdown functions for PostgreSQL. For a list of pushdown functions that
Data Services supports for PostgreSQL, see SAP Note 2212730 .
UTF-8 encoding
To process PostgreSQL tables as sources in data flows, Data Services requires that all data in PostgreSQL
tables use UTF-8 encoding. Additionally, Data Services outputs data to PostgreSQL target tables using UTF-8
encoding.
Data Services converts PostgreSQL data types to data types that it can process. After processing, Data
Services outputs data and converts the data types back to the corresponding PostgreSQL data types.
Complete options in the datastore editor to set the datastore type, database version, database access
information, and DSN information if applicable.
The following table contains descriptions for the first set of options, which define the datastore type (database)
and the PostgreSQL version information.
The following table describes the specific options to complete when you create a DSN connection.
Data Source Name Specifies the name of the DSN you create in the ODBC Data
Source Administrator.
User Name Specifies the user name to access the data source defined in
the DSN.
To create a server-name (DSN-less) connection for the datastore, complete the database-specific options
described in the following table.
Database server name Specifies the database server address. Enter localhost or an IP address.
Database name Specifies the database name to which this datastore connects.
Port Specifies the port number that this datastore uses to access the database.
User name Specifies the name of the user authorized to access the database.
Enable Automatic Data Transfer Specifies that any data flow that uses the tables imported
with this datastore can use the Data_Transfer transform. Data_Transfer uses transfer tables
to push down certain operations to the database server for more efficient processing.
Use SSL encryption Specifies to use SSL/TLS encryption for the datastore connection to the
database. SSL/TLS encryption is applicable only for DSN-less connections.
Encryption Parameters Enabled only when you select Yes for Use SSL encryption.
Bulk loader directory Specifies the directory where Data Services stores the files related to
bulk loading, such as the log file, error file, and temporary files. Click the down arrow at the
end of the field and select <Browse>, or select an existing global variable that you created
for this location. If you leave this field blank, Data Services writes the files to
%DS_COMMON_DIR%/log/bulkloader.
PSQL full path Specifies the full path to the location of the PSQL tool. Click the down arrow
at the end of the field and select <Browse>, or select the global variable.
The following options are in the Encryption Parameters dialog box when you select Yes for Use SSL encryption.
Verify Client Certificate When selected, specifies to have the client verify the server
certificate. Your selection for SSL Mode determines the method.
Certificate Browse to and select the client certificate file. If you've created a substitution
parameter for this file, select the substitution parameter from the list.
Certificate Key Browse to and select the client private key file. If you've created a
substitution parameter for this file, select the substitution parameter from the list.
Certificate Key Password Specifies the password used to encrypt the client key file.
SSL CA File Browse to and select the SSL CA certificate file to verify the server. If you've
created a substitution parameter for this file, select the substitution parameter from the list.
For a list of properties required for each database type, see the Administrator Guide.
For information about bulk loading with PostgreSQL, see the Performance Optimization Guide.
Related Information
Configure the PostgreSQL ODBC driver for Windows or Linux to update the configuration file with the
applicable driver information.
Download and install the PostgreSQL client ODBC driver from the PostgreSQL Website at https://
www.postgresql.org/ftp/odbc/versions/ . For descriptions of the configuration options, see the PostgreSQL
Website at https://odbc.postgresql.org/docs/config.html .
For Windows, use the ODBC Drivers Selector to verify the ODBC driver is installed. For Linux, configure the
ODBC driver using the SAP Data Services Connection Manager.
Related Information
For Linux, configure the ODBC driver for SSL/TLS and X509 authentication for connecting to your PostgreSQL
databases.
Before you perform the following steps, download and install the PostgreSQL client ODBC driver from the
PostgreSQL Website at https://www.postgresql.org/ftp/odbc/versions/ . For descriptions of the configuration
options, see the PostgreSQL Website at https://odbc.postgresql.org/docs/config.html .
For complete instructions to use the Connection Manager utility, and to implement the graphical user interface
for the utility, see the Server Management section of the Administrator Guide.
$ cd $LINK_DIR/bin
$ source al_env.sh
$ DSConnectionManager.sh
Example
Values shown as <value> and in bold indicate where you enter your own values or values related to
your system.
********************************
Configuration for PostgreSQL
********************************
The ODBC inst file is <.../odbcinst.ini>
Specify the Driver Version: <version>
Specify the Driver Name:
<driver_name>
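After the configuration completes, the odbcinst.ini file contains an entry for the PostgreSQL driver. The following is a hypothetical example; the driver name and library path depend on where you installed the PostgreSQL ODBC driver.
Sample Code
[PostgreSQL Unicode]
Description=PostgreSQL ODBC driver (Unicode)
Driver=/usr/local/lib/psqlodbcw.so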
Related Information
Use the PostgreSQL database datastore to access the schemas and tables in the defined database.
Open the datastore and view the metadata that is available to import. For PostgreSQL, import schemas and the
related tables. Each table resides under a specific schema, and a table name appears as
<dbname>.<schema_name>.<table_name>.
For more information about viewing metadata, see the Datastore metadata section of the Designer Guide.
For more information about the imported metadata from database datastores, see the Datastores section of
the Designer Guide.
Related Information
Use PostgreSQL tables as sources and targets in data flows and use PostgreSQL table schemas for template
tables.
Drag the applicable PostgreSQL table onto your workspace and connect it to a data flow as a source or target.
Also, use a template table as a target in a data flow and save it to use as a future source in a different data flow.
See the Designer Guide to learn about using template tables. Additionally, see the Reference Guide for
descriptions of options to complete for source, target, and template tables.
Related Information
When you import metadata from a PostgreSQL table into the repository, SAP Data Services converts
PostgreSQL data types to Data Services native data types for processing.
After processing, Data Services converts data types back to PostgreSQL data types when it outputs the
generated data to the target.
The following table contains PostgreSQL data types and the corresponding Data Services data types.
PostgreSQL data type Data Services data type
Boolean/Integer/Smallint Int
Serial/Smallserial/Serial4/OID Int
Bigint/BigSerial/Serial8 Decimal(19,0)
Money Double
Numeric/Decimal Decimal(28,6)
Bytea Blob
Char(n) Fixedchar(n)
Text/varchar(n) Varchar(n)
DATE Date
TIMESTAMP Datetime
TIMESTAMPTZ Varchar(127)
TIMETZ Varchar(127)
INTERVAL Varchar(127)
If Data Services encounters a column that has an unsupported data type, it does not import the column.
However, you can configure Data Services to import unsupported data types by checking the Import
unsupported data types as VARCHAR of size option in the datastore editor dialog box.
Note
When you import tables that have specific PostgreSQL native data types, Data Services saves the data type
as varchar or integer, and includes an attribute setting for Native Type. The following table contains the
column data type in which Data Services saves the PostgreSQL native data type, and the corresponding
attribute.
Related Information
Process your SAP HANA data in SAP Data Services by creating an SAP HANA database datastore.
Import SAP HANA metadata using a database datastore. Use the table metadata as sources and targets in
data flows. Some of the benefits of using SAP HANA in Data Services include the following:
• Protect your SAP HANA data during network transmission using SSL/TLS protocol and X.509
authentication with Cryptographic libraries.
• Create stored procedures and enable bulk loading for faster reading and loading.
• Load spatial and complex spatial data from Oracle to SAP HANA.
Note
Beginning with SAP HANA 2.0 SP1, access databases only through a multitenant database container
(MDC). If you use a version of SAP HANA that is earlier than 2.0 SP1, access only a single database.
When you create an SAP HANA database datastore with SSL/TLS encryption or X.509 authentication,
configure both server side and client side for the applicable authentication.
On the server side, the process of configuring the ODBC driver, SSL/TLS, and/or X.509 authentication
automatically sets the applicable settings in the communications section of the global.ini file.
SAP HANA uses the SAP CommonCrypto library for both SSL/TLS encryption and X.509 authentication. The
SAP HANA server installer installs the CommonCryptoLib (libsapcrypto.sar) to $DIR_EXECUTABLE by default.
Note
Support for OpenSSL in SAP HANA is deprecated. If you are using OpenSSL, we recommend that you
migrate to CommonCryptoLib. For more information, see SAP Note 2093286 .
Obtaining the SAP CommonCryptoLib file in Windows and Unix [page 60]
Related Information
The SAP CommonCryptoLib files are required for using SSL/TLS encryption and X.509 authentication in your
SAP HANA database datastores.
If you use the SAP HANA ODBC driver version 2.9 or higher, you don't have to perform the following steps to
obtain the SAP CommonCrypto library, because it's bundled with the driver.
To obtain the SAP CommonCryptoLib file, perform the following steps based on your platform:
1. For Windows:
a. Create a local folder to store the CommonCryptoLib files.
b. Download and install the applicable version of SAPCAR from the SAP download center.
The process to create a system variable on Windows varies based on your Windows version.
Example
To create a system variable for Windows 10 Enterprise, access Control Panel as an administrator
and open Systems. Search for System Variables and select Edit System Variables.
export SECUDIR=/PATH/<LOCAL_FOLDER>
export PATH=$SECUDIR:$PATH
f. Restart the Job Server.
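As a general sketch, SAPCAR extracts a .sar archive such as libsapcrypto.sar with a command like the following; the archive name and target folder are placeholders, and you should check the SAPCAR documentation for the options that apply to your version.
Sample Code
SAPCAR -xvf libsapcrypto.sar -R <LOCAL_FOLDER>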
3.7.2 X.509 authentication
X.509 authentication is a more secure method of accessing SAP HANA than user name and password
authentication.
Include X.509 authentication when you configure an SAP HANA datastore. When you use X.509
authentication, consider the following information:
Note
Support for X.509 authentication begins in SAP Data Services version 4.2 SP14 (14.2.14.16) and is
applicable for SAP HANA Server on premise 2.0 SP05 revision 56 and above, and SAP HANA Cloud. For
SAP HANA client, X.509 is applicable for SAP HANA ODBC client 2.7 and above.
Create and configure the server and client certificate files before you include X.509 authentication for
connecting to your SAP HANA data. For details, see the SAP Knowledge Base article 3126555 .
3.7.3 JWT authentication
SAP Data Services supports the JSON web token (JWT) single sign-on user authentication mechanism for SAP
HANA on-premise and cloud servers.
To use JWT user authentication for an SAP HANA datastore and an MDS/VDS request, edit the HANA datastore,
leave the User Name field empty, and enter an encoded JWT in the Password field.
For information about obtaining a JWT for a HANA user on the HANA on-premise server, see Single Sign-On
Using JSON Web Tokens in SAP HANA Security Guide for SAP HANA Platform.
For information about obtaining a JWT for a HANA user on the HANA Cloud server, contact your Cloud server
administrator.
Related Information
When Data Services uses changed data capture (CDC) or auto correct load, it uses a temporary staging table
to load the target table. Data Services loads the data to the staging table and applies the operation codes
INSERT, UPDATE, and DELETE to update the target table. With the Bulk load option selected in the target table
editor, any one of the following conditions triggers the staging mechanism:
If none of these conditions are met, the input data contains only INSERT rows. Therefore Data Services
performs only a bulk insert operation, which does not require a staging table or the need to execute any
additional SQL.
By default, Data Services automatically detects the SAP HANA target table type. Then Data Services updates
the table based on the table type for optimal performance.
The bulk loader for SAP HANA is scalable and supports UPDATE and DELETE operations. Therefore, the
following options in the target table editor are also available for bulk loading:
• Use input keys: Uses the primary keys from the input table when the target table does not contain a
primary key.
• Auto correct load: If a matching row to the source table does not exist in the target table, Data Services
inserts the row in the target. If a matching row exists, Data Services updates the row based on other update
settings in the target editor.
For more information about SAP HANA bulk loading and option descriptions, see the Data Services
Supplement for Big Data.
Related Information
When you use SAP HANA tables as targets in a data flow, configure options in the target editor.
The following tables describe options in the target editor that are applicable to SAP HANA. For descriptions of
the common options, see the Reference Guide.
Options
Option Description
Table type Specifies the table type when you use an SAP HANA template table as a target.
• Column Store: Creates tables organized by column. Column Store is the default
setting.
Note
Data Services does not support blob, dbblob, and clob data types for column
store table types.
Bulk loading
Option Description
Bulk load Specifies whether Data Services uses bulk loading to load data to the target.
Mode Specifies the mode that Data Services uses for loading data to the target table:
• Append: Adds new records to the table. Append is the default setting.
• Truncate: Deletes all existing records in the table and then adds new records.
Commit size Specifies the maximum number of rows that Data Services loads to the staging and target
tables before it saves the data (commits).
• default: Uses a default commit size based on the target table type.
• Column Store: Default commit size is 10,000
• Row Store: Default commit size is 1,000
• Enter a value that is greater than 1.
Update method Specifies how Data Services applies the input rows to the target table.
Note
Do not use DELETE-INSERT if the update rows contain data for only some of the
columns in the target table. If you use DELETE-INSERT, Data Services replaces
missing data with NULLs.
SAP Data Services supports SAP HANA stored procedures with zero, one, or more output parameters.
Data Services supports scalar data types for input and output parameters. Data Services does not support
table data types. If you try to import a procedure with table data type, the software issues an error. Data
Services does not support data types such as binary, blob, clob, nclob, or varbinary for SAP HANA procedure
parameters.
Procedures can be called from a script or from a Query transform as a new function call.
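As a hedged illustration, an imported SAP HANA procedure might be called from a Data Services script in a form like the following, where the datastore, owner, procedure name, and parameter are hypothetical; see the Functions and Procedures section in the Designer Guide for the exact syntax.
Sample Code
$COUNT = HANA_DS.MYSCHEMA.GET_CUSTOMER_COUNT('US');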
Limitations
SAP HANA provides limited support of user-defined functions that can return one or several scalar values.
These user-defined functions are usually written in L. If you use user-defined functions, limit them to the
For more information about creating stored procedures in a database, see the Functions and Procedures
section in the Designer Guide.
Related Information
To access SAP HANA data for SAP Data Services processes, configure an SAP HANA database datastore with
either a data source name (DSN) or a server name (DSN-less) connection.
You can optionally include secure socket layer (SSL) or transport layer security (TLS) for secure transfer of
data over a network, and you can use X.509 authentication instead of user name and password authentication.
Note
Support for SSL/TLS with a DSN connection begins in SAP Data Services version 4.2 SP7 (14.2.7.0).
Support for SSL/TLS with a DSN-less connection begins in SAP Data Services version 4.2 SP12 (14.2.12.0).
Note
Support for X.509 authentication is applicable for SAP HANA Server on premise 2.0 SP05 revision 56 and
above and SAP HANA Cloud. For SAP HANA client, X.509 is applicable for SAP HANA ODBC client 2.7 and
above.
When you create an SAP HANA datastore, and use SAP HANA data in data flows, Data Services requires
the SAP HANA ODBC driver. The following table lists the additional requirements for including additional
authentications.
For more information about SAP HANA, SSL/TLS, SAP CommonCrypto library, and settings for secure
external connections in the global.ini file, see the “SAP HANA Network and Communication Security”
section of the SAP HANA Security Guide.
Related Information
Perform the following prerequisite tasks before you create the SAP HANA datastore:
Note
If you plan to include SSL/TLS or X.509 authentication in the datastore, consider using the SAP HANA
ODBC driver version 2.9 or higher. Version 2.9 or higher is bundled with the SAP CommonCrypto
library, which is required for SSL/TLS and X.509.
• If you use a data source name (DSN), create a DSN connection using the ODBC Data Source Administrator
(Windows) or the SAP Data Services Connection Manager (Unix).
• If you include SSL/TLS encryption:
• Download the SAP CommonCrypto library and set the PATH environment variable as instructed in
Obtaining the SAP CommonCryptoLib file in Windows and Unix [page 60].
Note
If you use the SAP HANA ODBC driver version 2.9 or higher, set the following environment
variables:
• Windows: Set the SECUDIR and PATH environment variables to HANA Client directory with
Commoncrypto library on Windows system.
• Unix: Set the SECUDIR and LD_LIBRARY_PATH environment variables to HANA Client
directory with Commoncrypto library on Unix system.
• Set SSL/TLS options in the datastore editor (Windows) or the Data Services Connection Manager
(Unix).
• If you use X.509 authentication:
• Download the SAP CommonCrypto library and set the PATH environment variable as instructed in
Obtaining the SAP CommonCryptoLib file in Windows and Unix [page 60].
Note
If you've downloaded the SAP CommonCrypto library and set the PATH environment variable for
SSL/TLS, you don't have to repeat it for X.509 authentication.
Note
If you use the SAP HANA ODBC driver version 2.9 or higher, set the following environment
variables:
• Windows: Set the SECUDIR and PATH environment variables to HANA Client directory with
Commoncrypto library on Windows system.
• Unix: Set the SECUDIR and LD_LIBRARY_PATH environment variables to HANA Client
directory with Commoncrypto library on Unix system.
Note
Support for X.509 authentication is applicable for SAP HANA Server on premise 2.0 SP05 revision 56
and above and SAP HANA Cloud. For SAP HANA client, X.509 is applicable for SAP HANA ODBC client
2.7 and above.
Related Information
In addition to the common database datastore options, SAP Data Services requires that you set options
specific to SAP HANA, SSL/TLS, and X.509 authentication.
For descriptions of common database datastore options, and for steps to create a database datastore, see the
Datastores section of the Designer Guide.
The following table contains the SAP HANA-specific options in the datastore editor, including DSN and DSN-less settings.
Option Value
Use Data Source Name (DSN) Specifies whether to use a DSN (data source name) connection.
The following options appear when you select to use a DSN connection:
Data Source Name Select the SAP HANA DSN that you created previously (see
Prerequisites).
User Name, Password Enter the user name and password connected to the DSN.
Database server name Specifies the name of the computer where the SAP HANA
server is located.
Note
Port Enter the port number to connect to the SAP HANA Server.
The default is 30015.
Note
Advanced options
The following table contains descriptions for the advanced options in the SAP HANA datastore editor, including
options for SSL/TLS and X.509.
Note
Support for X.509 authentication is applicable for SAP HANA Server on premise 2.0 SP05 revision 56 and
above, and SAP HANA Cloud. For SAP HANA client, X.509 is applicable for SAP HANA ODBC client 2.7 and
above.
Option Description
Database name Optional. Enter the specific tenant database name. Applicable for SAP HANA version 2.0 SPS 01 MDC and later.
Additional connection parameters Enter information for any additional parameters that the data source ODBC
driver and database support. Use the following format:
<parameter1=value1;parameter2=value2>
Use SSL encryption Specifies whether to use SSL/TLS encryption for the datastore connection to the database.
Enabled only when you select Yes for Use SSL encryption or
Yes for Use X.509 authentication.
The following options are in the Encryption Parameters dialog box when you select Yes for Use SSL encryption:
Validate Certificate Specifies whether the software validates the SAP HANA
server SSL certificate. If you do not select this option, none
of the other SSL options are available to complete.
Crypto Provider Specifies the crypto provider used for SSL/TLS and X.509
communication. Data Services populates Crypto Provider
automatically with commoncrypto. SAP CommonCryptoLib
is the only supported cryptographic library for SAP HANA.
Certificate host Specifies the host name used to verify the server identity.
Choose one of the following actions:
Key Store Specifies the location and file name for your key store file.
You can also use a substitution parameter.
Note
X.509 key store Specifies the x509 key store file, which includes the X.509
client certificate. If the input isn't an absolute file path, Data
Services assumes that the file is in $SECUDIR. You can also
use a substitution parameter for X.509 key store.
Proxy group: For DSN-less connections only. For connecting to SAP HANA data through the SAP Cloud connector.
SAP cloud connector account Specifies the SAP Cloud connector account. Complete when
the SAP HANA data is in SAP Big Data Services.
Related Information
To use a DSN connection for an SAP HANA datastore, configure a DSN connection for Windows using the
ODBC Data Source Administrator.
Optionally include SSL/TLS encryption and the X.509 authentication settings when you configure the DSN.
Perform the prerequisites listed in SAP HANA datastore prerequisites [page 68].
To configure the DSN for SAP HANA on Windows, perform the following steps:
1. Access the ODBC Data Source Administrator either from the datastore editor in Data Services Designer or
directly from your Start menu.
2. In the ODBC Data Source Administrator, open the System DSN tab and select Add.
3. Choose the SAP HANA ODBC driver and select Finish.
The driver is listed only after you select it in the ODBC Drivers Selector utility.
If you leave Certificate host blank, Data Services uses the value in Database server name. If you don't
want the value from the Database server name, enter one of the following values:
• A string that contains the SAP HANA server hostname.
• The wildcard character “*”, so that Data Services doesn't validate the certificate host.
c. Specify the location and file name for your key store in Key Store.
d. Select OK.
11. Optional: Complete the following options for X.509 authentication:
a. Optional: In the TLS/SSL group, select Connect to the database using TLS/SSL.
b. Open Advanced ODBC Connection Property Setup and enter a string in Additional connection
properties:
To use X.509 authentication, enter the following string:
authenticationX509:<Location_of_PSE>\x509.pse
To also bypass certificate validation, enter the following string:
authenticationX509:<Location_of_PSE>\x509.pse
sslValidateCertificate:FALSE
c. Select OK.
Configure a DSN connection for an SAP HANA database datastore for Unix using the SAP Data Services
Connection Manager.
Optionally include SSL/TLS encryption and the X.509 authentication settings when you configure the DSN.
Perform the prerequisites listed in SAP HANA datastore prerequisites [page 68].
Use the GTK+2 library to make a graphical user interface for the Connection Manager. Connection Manager is
a command-line utility. To use it with a UI, install the GTK+2 library. For more information about obtaining and
installing GTK+2, see https://www.gtk.org/ .
The following instructions assume that you have the user interface for Connection Manager.
1. Export $ODBCINI to a file on the same computer as the SAP HANA data source. For example:
export ODBCINI=<dir_path>/odbc.ini
2. Start SAP Data Services Connection Manager by entering the following command:
$LINK_DIR/bin/DSConnectionManager.sh
3. Open the Data Sources tab and select Add to display the list of database types.
4. In the Select Database Type dialog box, select the SAP HANA database type and select OK.
The configuration page opens with some of the connection information automatically completed:
• Absolute location of the odbc.ini file
• Driver for SAP HANA
• Driver version
5. Complete the following options:
Note
If you enable X.509 authentication, the key store file must contain both the SSL server certificate
and the X.509 client certificate.
For descriptions of the DSN and SSL/TLS options, see SAP HANA datastore prerequisites [page 68].
7. Optional: To include X.509 authentication, complete the following X.509 option: for Specify the HANA User
Authentication Method, select 1: x.509.
8. Press Enter.
The system tests the connection. The message “Successfully edited database source.” appears when
you've successfully configured the DSN.
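For reference, the DSN entry that the Connection Manager writes to the $ODBCINI file resembles the
following sketch. The property names and default paths are assumptions based on the standard SAP HANA
ODBC client; the Connection Manager writes the actual entry from the values you enter, so treat this only as
an illustration of the result.
Sample Code
[HANA_DSN_EXAMPLE]
Driver=/usr/sap/hdbclient/libodbcHDB.so
ServerNode=<hana_host>:<port>
Encrypt=Yes
sslCryptoProvider=commoncrypto
sslValidateCertificate=Yes
sslHostNameInCertificate=<hana_host>
sslKeyStore=<path_to_keystore>.pse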
Related Information
SAP Data Services performs data type conversions when it imports metadata from SAP HANA sources or
targets into the repository and when it loads data into an external SAP HANA table or file.
Data Services uses its own conversion functions instead of conversion functions that are specific to the
database or application that is the source of the data.
Additionally, if you use a template table or Data_Transfer table as a target, Data Services converts from internal
data types to the data types of the respective DBMS.
Related Information
SAP Data Services converts SAP HANA data types when you import metadata from an SAP HANA source or
target into the repository.
Data Services converts data types back to SAP HANA data types when you load data into SAP HANA after
processing.
SAP HANA data type Data Services data type
integer int
tinyint int
smallint int
bigint decimal
char varchar
nchar varchar
varchar varchar
nvarchar varchar
float double
real real
double double
date date
time time
timestamp datetime
clob long
nclob long
blob blob
binary blob
varbinary blob
The following table shows the conversion from internal data types to SAP HANA data types in template tables.
Data Services data type SAP HANA data type
blob blob
date date
datetime timestamp
decimal decimal
double double
int integer
interval real
long clob/nclob
real decimal
time time
timestamp timestamp
varchar varchar/nvarchar
SAP Data Services supports spatial data, such as point, line, polygon, and collection, for specific databases.
When you import a table with spatial data columns, Data Services imports the spatial type columns as
character-based large objects (clob). The column attribute Native Type holds the value of the actual data
type in the database. For example, the native type for Oracle is SDO_GEOMETRY, for Microsoft SQL Server it is
geometry/geography, and for SAP HANA it is ST_GEOMETRY.
When reading a spatial column from SAP HANA, Data Services reads it as EWKT (Extended Well-Known Text).
When loading a spatial column to SAP HANA, the input must have the correct SRID (spatial reference identifier).
Note that creating template tables with a spatial column and SRID is not supported.
Limitations
• You cannot create template tables with spatial types because spatial columns are imported into Data
Services as clob.
• You cannot manipulate spatial data inside a data flow because the spatial utility functions are not
supported.
Loading complex spatial data from Oracle to SAP HANA [page 79]
Complex spatial data is data such as circular arcs and LRS geometries.
Related Information
Load spatial data from Oracle or Microsoft SQL Server to SAP HANA.
Learn more about spatial data by reading the SAP HANA documentation.
1. Import a source table from Oracle or Microsoft SQL Server to SAP Data Services.
2. Create a target table in SAP HANA with the appropriate spatial columns.
3. Import the SAP HANA target table into Data Services.
4. Create a data flow with an Oracle or Microsoft SQL Server source as reader.
Include any necessary transformations.
5. Add the SAP HANA target table as a loader.
Make sure not to change the data type of spatial columns inside the transformations.
6. Build a job that includes the data flow and run it to load the data into the target table.
Task overview: Using spatial data with SAP HANA [page 78]
Related Information
Loading complex spatial data from Oracle to SAP HANA [page 79]
Complex spatial data is data such as circular arcs and LRS geometries.
For example, in the SQL below, the table name is "POINTS" and the "geom" column contains the geospatial
data:
SELECT
SDO_UTIL.TO_WKTGEOMETRY(
SDO_GEOM.SDO_ARC_DENSIFY(
geom,
(MDSYS.SDO_DIM_ARRAY(
MDSYS.SDO_DIM_ELEMENT('X',-83000,275000,0.0001),
MDSYS.SDO_DIM_ELEMENT('Y',366000,670000,0.0001)
)),
'arc_tolerance=0.001'
)
)
from "SYSTEM"."POINTS"
For more information about how to use these functions, see the Oracle Spatial Developer's Guide on the
Oracle Web page at SDO_GEOM Package (Geometry) .
7. Build a job in Data Services that includes the data flow and run it to load the data into the target table.
Task overview: Using spatial data with SAP HANA [page 78]
Related Information
Once connected to Amazon Athena, you can browse metadata, import tables, read data from tables, and load
data into tables. Note that template table creation is not supported.
Note
Before you begin, make sure you have the necessary privileges for Athena and for the underlying S3
storage for the table data. For help with privileges, see the Amazon documentation or contact Amazon
Support.
Note
DELETE is not allowed for normal external tables. DELETE is transactional and is supported only for
Apache Iceberg tables.
Related Information
When you import metadata from an Athena table into the repository, SAP Data Services converts Athena data
types to Data Services native data types for processing.
Athena data type Data Services data type
Boolean Int
Smallint/Tinyint Int
Int Int
Bigint Decimal(19,0)
Double Double
Decimal(prec,scale) Decimal(prec,scale)
Char(n) Varchar(n)
Varchar(n) Varchar(n)
String Varchar(N)
Note
Date Date
Timestamp Datetime
Array Varchar(N)
Note
Map Varchar(N)
Note
Struct Varchar(N)
Note
Binary Varchar(255)
Note
ODBC capabilities
Options Supported
AutoCommit Yes
Absolute Yes
Ceiling Yes
Floor Yes
Round Yes
Truncate Yes
Sqrt Yes
Log Yes
Ln Yes
Power Yes
Mod Yes
Lower Yes
Upper Yes
Trim Yes
Substring Yes
Year/Month/Week/DayOfMonth/DayOfYear Yes
Sysdate/Systime Yes
Avg/Count/Sum Yes
Max/Min Yes
Ifthenelse Yes
For more information, see Value descriptions for capability and function options and ODBC Capability and
function options in the Designer Guide.
SAP Data Services provides access to various cloud databases and storages to use for reading or loading big
data.
Access various cloud databases through file location objects and file format objects.
SAP Data Services supports many cloud database types to use as readers and loaders in a data flow.
Related Information
In SAP Data Services, create a database datastore to access your data from Amazon Redshift. Additionally,
load Amazon S3 data files into Redshift using the built-in function load_from_s3_to_redshift.
Authentication
Select from two authentication methods when you configure your connection to Amazon Redshift:
• Standard: Use your user name and password to connect to Amazon Redshift.
• AWS IAM (Identity and Access Management): There are two types of IAM to select from:
• AWS IAM credentials: Use Cluster ID, Region, Access Key ID, and Secret Access Key for
authentication.
• AWS IAM profile: Use an AWS profile and enter your Cluster ID, Region, and profile name for
authentication.
Note
If you enter the server name in the datastore editor as a full server endpoint, Cluster ID and Region
aren't required. However, if you enter a server name as a gateway, Cluster ID and Region are required.
Encryption
You control whether you require SSL for your Amazon Redshift cluster database by enabling or disabling SSL.
In SAP Data Services, enable SSL in the datastore editor and then set the mode in the Encryption Parameters
dialog box. To learn how the SAP Data Services SSL mode settings affect the SSL mode setting in Amazon
Redshift cluster database, see the table in Using SSL and Trust CA certificates in ODBC in your Amazon
Redshift documentation.
Note
Beginning in SAP Data Services 4.3.02.01, use the options Use SSL encryption and Encryption parameters
to configure SSL encryption instead of entering options in the Additional connection parameters option.
Note
The SSL modes, verify-ca and verify-full, require the application to check the trusted root CA certificate.
The latest Redshift ODBC driver ships with bundled trusted root CA certificates. Check to ensure that your
installed ODBC driver includes trusted root CA certificates. For more information about certificates, see
Connect using SSL in the AWS Redshift documentation. If the root.crt file isn't in your ODBC driver
library directory (C:\Program Files\Amazon Redshift ODBC Driver\lib), you must download the
server certificate from Amazon. Save the certificate as root.crt in the ODBC library directory, or save the
server certificate in a special directory on your system.
Connection information
The Amazon Redshift datastore supports either a DSN or server-based connection. The following list contains
information about these connections:
• For a server-based connection on Windows, select the Redshift ODBC driver by using the
ODBCDriverSelector.exe utility located under %link_dir%\bin. Then, configure the Amazon Redshift
datastore and make sure that Use data source name (DSN) isn't selected. For information about
configuring DSN on Linux, see Configuring a Redshift DSN connection on Linux using ODBC [page 90].
• To configure a DSN connection, select the option Use data source name (DSN) in the datastore editor to
configure the DSN.
Related Information
Use an Amazon Redshift datastore to access data from your Redshift cluster datastore for processing in SAP
Data Services.
The following table describes the options specific for Redshift when you create or edit a datastore. For
descriptions of basic datastore options, see Common datastore options in the Designer Guide.
Option Description
Database server name Specifies the server name for the Redshift cluster database. Enter as a gateway
hostname, gateway IP address, or as an endpoint.
If you enter the server name as an endpoint, you don't have to complete the
Cluster ID and Region options.
User Name Specifies the Redshift cluster database user name. User name is required
regardless of the authentication method you choose.
Note
The cluster database user name isn't the same as your AWS Identity and
Access Management (IAM) user name.
Password Specifies the password that corresponds to the entered User Name. Only applicable when you select Standard for Authentication Method.
Additional connection parameters Specifies additional connection information, such as to use integer for boolean
data type.
Example
The Amazon Redshift ODBC driver uses the string (char) data type. To
insert the boolean data type as an integer, enter BoolsAsChar=0. After
you re-import the table, the Bool data type column shows int.
Use SSL encryption Specifies to use SSL encryption when connecting to your Redshift cluster
database. When you set to Yes, also open the Encryption parameters dialog and
select the SSL mode.
Encryption parameters Specifies the level of SSL encryption on the client side.
Select the text box to open the Encryption Parameters dialog. Choose from one
of the following modes:
Note
SSL settings for Amazon Redshift affect some of the SSL modes. For more
information, see
Authentication Method Specifies how to authenticate the connection to the Amazon Redshift cluster database:
• Standard: Uses your user name and password to authenticate. You must
complete the options User Name and Password.
• AWS IAM Credentials: Uses your AWS Cluster ID, Region, AccessKeyId, and SecretAccessKey to
authenticate.
• AWS Profile: Uses your AWS Cluster ID, Region, and Profile Name to authenticate.
Cluster ID Specifies the ID assigned to your Amazon Redshift cluster. Obtain from the
AWS Management Console.
Not applicable when the Database server name is an endpoint, which includes
the cluster ID.
Region Specifies the region assigned to your Amazon Redshift cluster. Obtain from the
AWS Management Console.
Not applicable when the Database server name is an endpoint, which includes
the region.
For more information about AWS regions, see your Amazon Redshift documentation.
AccessKeyId Specifies your AWS IAM access key ID. Obtain from the AWS Management
Console.
Note
The datastore editor encrypts the access key ID when you enter it so no
one else can see or use it.
Not applicable when you select AWS Profile for Authentication Method.
SecretAccessKey Specifies your AWS IAM secret access key. Obtain from the AWS Management
Console.
Note
The secret access key appears the first time you create an access key ID in
AWS. After that, you must remember it because AWS won't show it again.
The datastore editor encrypts the secret access key when you enter it so
no one else can see or use it.
Not applicable when you select AWS Profile for Authentication Method.
Profile Name Specifies your AWS profile name when you use AWS IAM Profile for
Authentication Method. For more information about using an authentication
profile, see your AWS Redshift documentation.
Note
Not applicable when you select AWS IAM Credentials for Authentication
Method.
Related Information
For a data source name (DSN) connection, use the DS Connection Manager to configure Amazon Redshift as a
source.
Before performing the following task, download and install the Amazon Redshift ODBC driver for Linux.
For details and the latest ODBC driver download, see Configuring an ODBC Connection on the Amazon
website. For complete information about installing and configuring the ODBC driver, download the PDF
Amazon Redshift ODBC Data Connector Installation and Configuration Guide; a link to the guide is in the topic
“Configuring an ODBC Connection”.
To configure a DSN connection on Linux for an Amazon Redshift cluster database, install the downloaded driver
and then perform the following steps:
1. Configure the following files following directions in the Amazon Redshift ODBC Data Connector Installation
and Configuration Guide:
• amazon.redshiftodbc.ini
• odbc.ini
• odbcinst.ini
If you installed the ODBC driver to the default location, the files are in /opt/amazon/redshiftodbc/lib/64.
2. Open the amazon.redshiftodbc.ini and add a line at the end of the file to point to the
libodbcinst.so file.
Example
/home/ec2-user/unixODBC/lib/libodbcinst.so
3. Find the [Driver] section in the amazon.redshiftodbc.ini file, and set DriverManagerEncoding to
UTF-16.
Example
[Driver]
DriverManagerEncoding=UTF-16
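Putting steps 2 and 3 together, the end of the amazon.redshiftodbc.ini file might look like the following
sketch. The ODBCInstLib key name is an assumption based on the driver's standard configuration file;
confirm the exact key in the Amazon Redshift ODBC Data Connector Installation and Configuration Guide.
Example
[Driver]
DriverManagerEncoding=UTF-16
ODBCInstLib=/home/ec2-user/unixODBC/lib/libodbcinst.so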
The SAP Data Services installer places the DS Connection Manager in $LINK_DIR/bin/DSConnectionManager.sh by default. Use the following example as a guide.
Example
Specify the DSN name from the list or add a new one:
<DSN_Name>
Specify the Unix ODBC Lib Path:
/build/unixODBC-232/lib
Specify the Driver:
Related Information
Option descriptions for using an Amazon Redshift database table as a source in a data flow.
When you use an Amazon Redshift table as a source, the software supports the following features:
The following list contains behavior differences in Data Services when you use certain functions with
Amazon Redshift:
• When using add_month(datetime, int), pushdown doesn't occur if the second parameter is not an
integer data type.
• When using cast(input as ‘datatype’), pushdown doesn't occur if you use the real data type.
For more about push down functions, see SAP Note 2212730 , “SAP Data Services push-down operators,
functions, and transforms”. Also read about maximizing push-down operations in the Performance
Optimization Guide.
The following table lists source options when you use an Amazon Redshift table as a source:
Option Description
Table name Name of the table that you added as a source to the data flow.
Table owner Owner that you entered when you created the Redshift table.
Database type Database type that you chose when you created the datastore. You cannot change
this option.
The Redshift source table also uses common table source options.
For more information about pushdown operations and viewing optimized SQL, see the Performance
Optimization Guide.
Related Information
Descriptions of options for using an Amazon Redshift table as a target in a data flow.
• input keys
• auto correct
• data deletion from a table before loading
Note
The Amazon Redshift primary key is informational only and the software does not enforce key constraints
for the primary key. Be aware that using SELECT DISTINCT may return duplicate rows if the primary key is
not unique.
Note
The Amazon Redshift ODBC driver does not support parallel loading via ODBC into a single table. Therefore,
the Number of Loaders option in the Options tab is not applicable for a regular loader.
Bulk load Select to use bulk loading options to write the data.
Mode Select the mode for loading data in the target table:
Note
• Truncate: Deletes all existing records in the table, and then adds new records.
S3 file location Enter or select the path to the Amazon S3 configuration file. You can enter a variable for
this option.
Maximum rejects Enter the maximum number of acceptable errors. After the maximum is reached, the
software stops Bulk loading. Set this option when you expect some errors. If you enter 0,
or if you do not specify a value, the software stops the bulk loading when the first error
occurs.
If you enter a Text delimiter other than a single quote (‘) together with a comma (,) for the
Column delimiter, Data Services treats the data file as a .csv file.
Generate files only Enable to generate data files that you can use for bulk loading.
When enabled, the software loads data into data files instead of the target in the data flow.
The software writes the data files into the bulk loader directory specified in the datastore
definition.
If you do not specify a bulk loader directory, the software writes the files to
<%DS_COMMON_DIR%>\log\bulkloader\<tablename><PID>. Then you manually copy the files to the
Amazon S3 remote system.
Clean up bulk loader directory after load Enable to delete all bulk load-oriented files from the bulk load directory and the Amazon S3 remote system after the load is complete.
Parameters Allows you to enter Amazon Redshift COPY command data conversion parameters, such as
escape, emptyasnull, blanksasnull, ignoreblanklines, and so on (see the sample after this table). These
parameters define how to insert data into a Redshift table. For more information about the parameters, see
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html#r_COPY-syntax-overview-optional-parameters .
General settings
Option Description
Number of loaders Sets the number of threads to generate multiple data files
for a parallel load job. Enter a positive integer for the number
of loaders (threads).
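For illustration only, and assuming that the Parameters option accepts the conversion parameters just as they
would appear in the Redshift COPY statement, an entry might look like the following. Verify the exact syntax
against the AWS COPY documentation linked above.
Example
ESCAPE EMPTYASNULL BLANKSASNULL IGNOREBLANKLINES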
Related Information
SAP Data Services converts Redshift data types to the internal data types when it imports metadata from a
Redshift source or target into the repository.
The following table lists the internal data type that Data Services uses in place of the Redshift data type.
Redshift data type Data Services data type
smallint int
integer int
bigint decimal(19,0)
decimal decimal
real real
float double
boolean varchar(5)
char char
Note
The char data type doesn't support multibyte characters. The maximum size is 4096 bytes.
nchar char
varchar varchar
nvarchar varchar
Note
The varchar and nvarchar data types support UTF-8 multibyte characters. The size is the
number of bytes, and the maximum size is 65535 bytes.
Caution
If you try to load multibyte characters into a char or nchar data type column, Redshift
issues an error. Redshift internally converts nchar and nvarchar data types to char and
varchar. The char data type in Redshift doesn't support multibyte characters. Use overflow
to catch the unsupported data or, to avoid this problem, create a varchar column instead of
using the char data type.
date date
timestamp datetime
text varchar(256)
bpchar char(256)
The following data type conversions apply when you use a Redshift template table as the target.
Data Services data type Redshift data type
blob varchar(max)
date date
datetime datetime
decimal decimal
int integer
interval float
long varchar(8190)
real float
time varchar(25)
timestamp datetime
varchar varchar/nvarchar
char char/nchar
Related Information
Developers and administrators who use Microsoft SQL Server can store on-premise SQL Server workloads on
an Azure virtual machine in the cloud.
The Azure virtual machine supports both Unix and Windows platforms.
Related Information
To move blobs to or from a Microsoft Azure container, use built-in functions in a script object and a file location
object.
Note
Before you perform the following steps, create a file location object for the Azure container. Also create an
unstructured binary text file format that describes the blob in the Azure container. Use the file format in a data
flow to perform extra operations on the blob.
Use an existing Azure container or create a new one. Because SAP Data Services doesn't internally manipulate
the blobs in an Azure container, the blobs can be of any type. Currently, Data Services supports the block blob
in the container storage type.
To use built-in functions to upload files to a container storage blob in Microsoft Azure, perform the following
high-level steps:
1. Create a storage account in Azure and take note of the primary shared key.
To move files between remote and local directories, use the following scripts:
• copy_to_remote_system
• copy_from_remote_system
Example
To access a subfolder in your Azure container, specify the subfolder in the following script:
The script copies all of the files from the local directory specified in the file location object to the
container specified in the same object. When you include the remote directory and subfolder in the
script, the function copies all of the files from the local directory to the subfolder specified in the script.
4.1.3 Google BigQuery
The Google BigQuery datastore contains access information and passwords so that the software can open your
Google BigQuery account on your behalf.
After accessing your account, SAP Data Services can load data to or extract data from your Google BigQuery
projects:
• Extract data from a Google BigQuery table to use as a source for Data Services processes.
• Load generated data from Data Services to Google BigQuery for analysis.
• Automatically create and populate a table in your Google BigQuery dataset by using a Google BigQuery
template table.
You can also reference the Google BigQuery datastore in the built-in function load_from_gcs_to_gbq, to
load data from your GCS to a Google BigQuery table. For details about the function, see the Reference Guide.
Note
We recommend that you create a Google BigQuery ODBC datastore instead of the Google BigQuery
datastore. The Google BigQuery ODBC datastore uses the Magnitude Simba ODBC driver for BigQuery,
which supports standard SQL and more data types than the Google BigQuery datastore. For more
information, see SAP Note 3241713 .
For complete information about how Data Services supports Google BigQuery, see the Supplement for Google
BigQuery.
Related Information
With a Google BigQuery ODBC datastore, make ODBC calls to your Google BigQuery data sets to download,
process, and upload data in SAP Data Services.
To access the data in your Google BigQuery account, the datastore uses the Simba ODBC driver for Google
BigQuery, which supports the OAuth 2.0 protocol for authentication and authorization. Configure the driver to
provide your credentials and authenticate the connection to the data using either a Google user account or a
Google service account.
Note
Beginning with Data Services 4.2.13, we recommend that you create a Google BigQuery ODBC datastore
instead of the Google BigQuery datastore. The Google BigQuery ODBC datastore uses the Magnitude
Simba ODBC driver for BigQuery, which supports standard SQL and more data types than the Google
BigQuery datastore.
Note
Data Services supports Google BigQuery ODBC datastore on Windows and Linux platforms only.
For information about how Data Services supports Google BigQuery ODBC, see the Supplement for Google
BigQuery.
Related Information
Access your data lake database in an SAP HANA Cloud by creating a database datastore.
SAP HANA Cloud, data lake consists of a data lake Relational Engine and data lake Files:
• The data lake Relational Engine provides high-performance analysis for petabyte volumes of relational
data.
• Data lake Files provide managed access to structured, semistructured, and unstructured data that is
stored as files in the data lake.
For more information about SAP HANA Cloud, data lake, see the documentation on the SAP Customer Portal.
SAP HANA Cloud, data lake supports bulk loading and pushdown to the database level. Further, you don't need
to use VPN or VPC to connect to your data lake database.
Prerequisites
Download the SAP HANA client server package from the SAP Software center at https://
launchpad.support.sap.com/#/softwarecenter . Enter the search term HANADLCLIENT. Select to download
the SAP HANA data lake client package based on your operating system and the applicable version. The
package includes the native ODBC driver, which you must configure using either the ODBC Driver Selector
(Windows) or the DS Connection Manager (Linux).
For more information about installing the SAP HANA Cloud data lake client, see the SAP HANA Cloud, Data
Lake Client Interfaces guide.
Datastore options for SAP HANA Cloud, data lake [page 100]
The datastore for SAP HANA Cloud, data lake contains the connection information that SAP Data
Services uses to connect to your database.
Related Information
The datastore for SAP HANA Cloud, data lake contains the connection information that SAP Data Services
uses to connect to your database.
Create and configure a data lake datastore following the basic instructions in Creating a database datastore in
the Designer Guide.
The following table contains the options to complete that are specific for data lake.
Use SSL encryption SSL/TLS encryption is required, therefore you can't change
this value from Yes.
TLS options Enter the SSL/TLS options. Enter the following string:
tls_type=rsa;direct=yes
• TLS_Type: RSA is the only supported encryption type
for the connection.
• DIRECT: Yes is the only acceptable value because DIRECT through a proxy is the only supported
connection type.
Parent topic: SAP HANA Cloud, data lake database [page 99]
Related Information
Configure the ODBC driver for connecting to your SAP HANA Cloud, data lake database for Linux using the DS
Connection Manager. Also configure the DSN in the DS Connection Manager, when applicable.
The following example shows the steps to configure the ODBC driver for SAP HANA Cloud, data lake using
the SAP Data Services Connection Manager. For details about using the Connection Manager, see Using the
Connection Manager in the Administrator Guide.
Example
The following sample code shows the options in the DS Connection Manager to configure the ODBC driver
for a DSN-less connection. Options for which you supply your own information are shown in <> in bold.
When you use a data source name (DSN) connection, configure the DSN in the DS Connection Manager as
shown in the following example:
Example
Parent topic: SAP HANA Cloud, data lake database [page 99]
Related Information
Datastore options for SAP HANA Cloud, data lake [page 100]
• Import tables
• Read or load Snowflake tables in a data flow
• Create and load data into template tables
• Browse and import the tables located under different schemas (for example, Netezza)
• Preview data
• Push down base SQL functions and Snowflake-specific SQL functions (see SAP Note 2212730 )
• Bulkload data (possible through AWS S3 File Location or Azure Cloud Storage File Location)
For instructions to create a target template table, see the Template tables section of the Designer Guide.
Related Information
Use the ODBC Drivers Selector utility bundled with SAP Data Services to configure Snowflake as a source for
SAP Data Services.
For more information about the ODBC Drivers Selector utility, see Using the ODBC Drivers Selector for
Windows in the Administrator Guide.
Before you start to configure the connection to Snowflake, make sure you meet the following prerequisites:
• You have correctly installed the Snowflake ODBC Driver from the Snowflake official website.
See the SAP Data Services Product Availability Matrix (PAM) for the latest client version that we support.
For more information about the Snowflake ODBC driver, see the Snowflake User Guide on the Snowflake
website.
• You have database account credentials for the ODBC connection to Snowflake.
1. Open the Data Services Designer and perform the steps in Creating a database datastore.
2. Select the correct values for the datastore options.
• Datastore Type: Database
• Database Type: Snowflake
• Database Version: Select the latest compatible database version.
• Use data source name (DSN): Leave this option unchecked.
For descriptions of the remaining options for the Snowflake datastore, see Common datastore options in
the Designer Guide and Snowflake datastore [page 111].
3. Click Apply or OK.
Use the DS Connection Manager to configure Snowflake as a source for SAP Data Services.
Before you perform the steps in this topic, read Configure drivers with data source name (DSN) connections in
the Administrator Guide.
For details about using the Connection Manager, see Using the Connection Manager in the Administrator
Guide.
Preparation
Before using an ODBC driver to connect to Snowflake, make sure you perform the following required steps:
• Download and install the Snowflake ODBC driver from the Snowflake website.
See the SAP Data Services Product Availability Matrix (PAM) for the latest supported client version.
For more information about the Snowflake ODBC driver, see the Snowflake User Guide on the Snowflake
website.
• Obtain database account credentials for the ODBC connection to Snowflake.
• Install the unixODBC tool (version 2.3 or higher) on your Linux system with the software package tool
‘yum’ for Red Hat or ‘zypper’ for SUSE Linux.
• Set the following system environment variables:
• ODBCSYSINI=<Path where you place the $ODBCINI and $ODBCINST files>
• ODBCINI=<DSN configuration file>
• ODBCINST=<Driver configuration file>
For example:
export ODBCSYSINI=/home/dbcfg/
export ODBCINI=/home/dbcfg/odbc.ini
export ODBCINST=/home/dbcfg/odbcinst.ini
DSN sample in $ODBCINI:
DATABASE=DB_EXAMPLE
SCHEMA=
WAREHOUSE=
ROLE=
AUTHENTICATOR=SNOWFLAKE_JWT
PRIV_KEY_FILE=/home/dbcfg/snowflake/rsa_key.p8
PRIV_KEY_FILE_PWD=******
Driver sample in $ODBCINST:
[SF_EXAMPLE]
driver=/usr/lib64/snowflake/odbc/lib/libSnowflake.so
unixODBC=/usr/lib64
Note
• The job engine internally uses the bundled file $LINK_DIR/bin/ds_odbc.ini to get to the ODBC
Driver Manager, so don't point the ODBCINI environment variable at that file.
• The two files that are referenced by $ODBCINI and $ODBCINST must be accessible, readable, and
writable by SAP Data Services.
***************************************************
SAP Data Services Connection Manager
***************************************************
------------------Start Menu------------------
Connection Manager is used to configure Data Sources or Drivers.
1: Configure Data Sources
2: Configure Drivers
q: Quit Program
Select one command:'1'
2
5. Add a new driver (enter a) or edit an existing driver (enter e).
1) MySQL Driver for version [8.x, 5.7, 5.6, 5.5, 5.1, 5.0]
2) HANA Driver for version [2, 1]
3) Teradata Driver for version [17.10, 16.20, 16.00, 15.10, 15.00,
14.10, 14.00, 13.10, 13.0, 12]
4) Netezza Driver for version [11, 7, 6, 5, 4]
5) Sybase IQ Driver for version [15, 16]
6) Informix Driver for version [14, 12, 11]
7) DB2 UDB Driver for version [11, 10, 9]
8) Oracle Driver for version [19, 18, 12, 11, 10, 9]
9) SQL Anywhere Driver for version [17, 16, 12]
10) Snowflake Driver for version [3.x, 5.x]
11) Hive Server Driver for version [2.x.y, 2.1.2, 2.1.1]
12) PostgreSQL Driver for version [10.x, 12.x, 13.x, 14.x]
13) Google BigQuery Driver
14) Amazon Redshift Driver for version [8.x]
15) SAP Hana Data Lake ODBC Driver
7. Complete the datastore options.
For more information about the Snowflake security-related parameters, see the Snowflake User Guide on
the Snowflake website.
The following examples show the steps necessary to configure the ODBC driver for Snowflake using the SAP
Data Services Connection Manager.
********************************
Configuration for Snowflake
********************************
The ODBC inst file is <***/***>/odbcinst.ini.
Specify the Driver Version:'3.x'
5.x
Specify the Driver Name:
SF_EXAMPLE
Specify the Driver:
/usr/lib64/snowflake/odbc/lib/libSnowflake.so
Specify the Host Name:
******.snowflakecomputing.com
Specify the Port:'443'
***************************************************
SAP Data Services Connection Manager
***************************************************
------------------Start Menu------------------
Connection Manager is used to configure Data Sources or Drivers.
1: Configure Data Sources
2: Configure Drivers
q: Quit Program
Select one command:'1'
1
5. Add a new database source (enter a) or edit an existing database source (enter e).
The following example shows the steps to configure a data source for Snowflake using the SAP Data Services
Connection Manager.
********************************
Configuration for Snowflake
********************************
Specify the DSN name from the list or add a new one:
SF_DSN_EXAMPLE
Specify the User Name:
USER_EXAMPLE
Type database password:(no echo)
Retype database password:(no echo)
Specify the Unix ODBC Lib Path:
/usr/lib64
Specify the Driver Version:'3.x'
5.x
Specify the Driver:
/usr/lib64/snowflake/odbc/lib/libSnowflake.so
Specify the Host Name:
******.snowflakecomputing.com
Specify the Port:'443'
Specify the Database:
DB_EXAMPLE
Testing connection...
Successfully added driver.
Press Enter to go back to the Main Menu.
Note
After configuration, the information will be reflected in the local file $ODBCINI and $LINK_DIR/bin/
ds_odbc.ini.
For steps to define a database datastore, see Creating a database datastore in the Designer Guide.
For information about Snowflake database datastore options, see Snowflake datastore [page 111].
Related Information
For information about creating a database datastore and descriptions for common datastore options, see the
Designer Guide.
Option Description
Note
Private Key File Password The password used to protect the private key file. A password is required only for
encrypted private keys.
JWT Timeout (in seconds) JSON Web Token (JWT) timeout value. The default is 30 seconds.
For more detailed information about key pair authentication, see the Snowflake documentation.
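As a sketch of the key pair setup, the following OpenSSL commands create an encrypted private key and the
matching public key, using the rsa_key.p8 file name that appears in the DSN sample earlier in this section.
The commands mirror the general procedure in the Snowflake documentation; register the public key with
your Snowflake user as described there.
Sample Code
# Generate an encrypted private key; the password you set here is the value
# for Private Key File Password.
$ openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 aes256 -inform PEM -out rsa_key.p8
# Derive the public key to register with your Snowflake user.
$ openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub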
Related Information
When you use a Snowflake table as a source, the software supports the following features:
For more about push down functions, see SAP Note 2212730 , “SAP Data Services push-down operators,
functions, and transforms”. Also read about maximizing push-down operations in the Performance
Optimization Guide.
The following table lists source options when you use a Snowflake table as a source:
Option Description
Table name Name of the table that you added as a source to the data flow.
Table owner Owner that you entered when you created the Snowflake table.
Database type Database type that you chose when you created the datastore. You cannot change
this option.
Performance settings
Option Description
Join Rank Specifies the rank of the data file relative to other tables and
files joined in a data flow.
The software joins sources with higher join ranks before
joining sources with lower join ranks.
For new jobs, specify the join rank only in the Query transform editor.
Cache Specifies whether the software reads the data from the
source and loads it into memory or pageable cache.
For new jobs, specify the cache only in the Query transform
editor.
Array fetch size Indicates the number of rows retrieved in a single request
to a source database. The default value is 1000. Higher numbers reduce requests, lowering network traffic
and possibly improving performance. The maximum value is 5000.
Related Information
SAP Data Services converts Snowflake data types to the internal data types when it imports metadata from a
Snowflake source or target into the repository.
The following table lists the internal data type that Data Services uses in place of the Snowflake data type.
Snowflake data type Data Services data type
byteint/tinyint decimal(38,0)
smallint decimal(38,0)
int/integer decimal(38,0)
bigint decimal(38,0)
float double
double double
real double
varchar varchar
boolean int
binary blob
varbinary blob
datetime/timestamp datetime
date date
time time
If Data Services encounters a column that has an unsupported data type, it does not import the column.
However, you can configure Data Services to import unsupported data types by checking the Import
unsupported data types as VARCHAR of size option in the datastore editor dialog box.
Related Information
When you use a Snowflake table as a target in SAP Data Services, complete the target options.
After you add the Snowflake table as a target in a data flow, open the Bulk loader tab and complete the options
as described in the following table.
Bulk load Select to use bulk loading options to write the data.
Mode Select the mode for loading data in the target table:
Note
Generate files only Enable to generate data files that you can use for bulk loading.
When enabled, the software loads data into data files instead
of the target in the data flow. The software writes the data
files into the bulk loader directory specified in the datastore
definition.
Clean up bulk loader directory after load Enable to delete all bulk load-oriented files from the bulk
load directory after the load is complete.
Option Description
Column comparison Specifies how the software maps the input columns to persistent cache table columns.
Number of loaders Sets the number of threads to generate multiple data files
for a parallel load job. Enter a positive integer for the number
of loaders (threads).
Also complete error handling and transaction control options as described in the following topics in the
Reference Guide:
Access various cloud storages through file location objects and gateways.
File location objects specify the file transfer protocol so that SAP Data Services safely transfers data from
server to server.
For information about the SAP Big Data Services to access Hadoop in the cloud, see Supplement for Big Data
Services.
Related Information
Amazon Simple Storage Service (S3) is a product of Amazon Web Services that provides scalable storage in
the cloud.
Store large volumes of data in an Amazon S3 cloud storage account. Then use SAP Data Services to securely
download your data to a local directory. Configure a file location object to specify both your local directory and
your Amazon S3 directory.
The following table describes the Data Services built-in functions specific for Amazon S3.
Built-In Functions
Function Description
Related Information
When you configure a file location object for Amazon S3, complete all applicable options, especially the options
specific to Amazon S3.
Use a file location object to access data or upload data stored in your Amazon S3 account. To view options
common to all file location objects, see the Reference Guide. The following table describes the file location
options that are specific to the Amazon S3 protocol.
You must have "s3:ListBucket" rights in order to view a list of buckets or a special bucket.
Option Description
• None
• Amazon S3-Managed Keys
• AWS KMS-Managed Keys
• Customer-Provided Keys
AWS KMS Key ID Specifies the encryption key that you create and manage via
the Encryption Keys section in the AWS IAM console.
AWS KMS Encryption Context Specifies the encryption context of the data.
Example
Enter
eyJmdWxsTmFtZSI6ICJKb2huIENvbm5vciIgf
SANCg== in the encryption context option.
Communication Protocol/Endpoint URL Specifies the communication protocol you use with S3.
• http
• https
• Enter the endpoint URL.
Example
Request Timeout (ms) Specifies the time to wait before the system times out while
trying to complete the request to the specified Amazon S3
account in milliseconds (ms).
The system logs the timeout value in the trace file when you
configure the job to log trace messages:
Note
Connection Retry Count Specifies the number of times the software tries to upload or
download data before stopping the upload or download.
Batch size for uploading data, MB Specifies the size of the data transfer to use for uploading
data to S3.
Batch size for downloading data, MB Specifies the size of the data transfer Data Services uses to
download data from S3.
Storage Class Specifies the S3 cloud storage class to use to restore files.
Note
For more information about the storage classes, see the Amazon AWS documentation.
Remote directory Optional. Specifies the name of the directory for Amazon S3
to transfer files to and from.
Local directory Optional. Specifies the name of the local directory to use to
create the files. If you leave this field empty, Data Services
uses the default Data Services workspace.
Proxy host, port, user name, password Specifies proxy information when you use a proxy server.
For information about load_from_s3_to_redshift and other built-in functions, see the Reference Guide.
For more information about file location common options, see the Reference Guide.
In SAP Data Services Designer, we refer to Azure Blob Storage as Azure Cloud Storage.
To access your Azure Blob Storage from Data Services, create an Azure Cloud Storage file location object.
The following table describes the Data Services built-in functions specific for Azure Cloud Storage.
Built-In Functions
Function Description
load_from_azure_cloudstorage_to_synapse Loads data from your Azure Cloud Storage to your Azure
Synapse Analytics workspace.
For details about the built-in function, see the “Data Services Functions and Procedures” section in the
Reference Guide.
Related Information
Use a file location object to download data from or upload data to your Azure Blob Storage account.
The following table lists the file location object descriptions for the Azure Cloud Storage protocol. To view
options common to all file location objects, see the Reference Guide.
Note
To work with Azure Blob data in your Data Lake Storage account, see Azure Data Lake Storage [page 130].
Option Description
Protocol Choose Azure Cloud Storage for the type of file transfer protocol.
Account Name Enter the name of your Azure storage account.
Data Services supports only one type for Azure Blob Storage: Container.
Authorization Type Choose the type of shared access signature (SAS) authorization for Azure storage.
Note
Shared Access Signature URL Enter the access URL that enables access to a specific file
(blob) or blobs in a container. Azure recommends that you
use HTTPS instead of HTTP.
Note
Account Shared Key Specify the Account Shared Key. Obtain a copy from the
Azure portal in the storage account information.
Note
Connection Retry Count Specify the number of times the computer tries to create a
connection with the remote server after a connection fails.
Request Timeout (ms) Specifies the time to wait before the system times out while
trying to complete the request to the specified Azure Cloud
Storage in milliseconds (ms).
The system logs the timeout value in the trace file when you
configure the job to log trace messages:
Batch size for uploading data, MB Specify the maximum size of a data block per request when
transferring data files. The limit is 100 MB.
Caution
Batch size for downloading data, MB Specify the maximum size of a data range to be downloaded
per request when transferring data files. The limit is 4 MB.
Caution
Number of threads Specify the number of upload threads and download threads
for transferring data to Azure Cloud Storage. The default
value is 1.
Remote Path Prefix Optional. Specify the file path for the remote server, excluding the server name. You
must have permissions for this directory.
If you leave this option blank, the software assumes that the remote path prefix is the
user home directory used for FTP.
Example
Local Directory Specify the path of your local server directory for the file upload or download. The
directory must exist, must be located where the Job Server resides, and must be a
directory for which you have appropriate permissions.
Proxy Host, Port, User Name, Password Optional. Specify the proxy information when you use a
proxy server.
Related Information
The number of threads is the number of parallel uploaders or downloaders to be run simultaneously when you
upload or download blobs.
The Number of threads setting affects the efficiency of downloading and uploading blobs to or from Azure
Cloud Storage.
To determine the number of threads to set for the Azure file location object, base the number of threads on the
number of logical cores in the processor that you use:
Number of logical cores Number of threads
8 8
16 16
The software automatically readjusts the number of threads based on the blob size you are uploading or
downloading. For example, when you upload or download a small file, the software may use fewer
threads and use the block or range size you specified in the Batch size for uploading data, MB or
Batch size for downloading data, MB options.
When you upload a large file to an Azure container, the software may divide the file into the same number
of lists of blocks as the setting you have for Number of threads in the file location object. For example, when
the Number of threads is set to 16 for a large file upload, the software divides the file into 16 lists of blocks.
Additionally, each thread reads the blocks simultaneously from the local file and also uploads the blocks
simultaneously to the Azure container.
When all the blocks are successfully uploaded, the software sends a list of commit blocks to the Azure Blob
Service to commit the new blob.
If there is an upload failure, the software issues an error message. If they already existed before the upload
failure, the blobs in the Azure container stay intact.
When you set the number of threads correctly, you may see a decrease in upload time for large files.
When you download a large file from the Azure container to your local storage, the software may divide the
file into the number of ranges set in Number of threads in the file location object. For example, when
Number of threads is set to 16, the software divides the file into 16 ranges.
When your software downloads a blob from an Azure container, it creates a temporary file to hold all of the
threads. When all of the ranges are successfully downloaded, the software deletes the existing file from your
local storage if it existed, and renames the temporary file using the name of the file that was deleted from local
storage.
If there is a download failure, the software issues an error message. The existing data in local storage stays
intact if it existed before the download failure.
When you set the number of threads correctly, you may see a decrease in download time.
Related Information
Use an Azure Data Lake Storage file location object to download data from and upload data to your Azure Data
Lake Storage.
For information about using variables and parameters to increase the flexibility of jobs, work flows, and data
flows, see the Designer Guide.
Data Services provides a built-in function that loads data from your Azure Data Lake Storage to Azure Synapse
Analytics. The function is load_from_azure_datalakestore_to_synapse. For details about the built-in
function, see the Reference Guide.
Use Azure Data Lake Storage Gen1 for big data analytic processing. To create a file location object for Gen1,
make sure that you select Gen1 for the Version option.
Azure Data Lake Storage Gen2 has all of the capabilities of Gen1 plus it’s built on Azure Blob storage. To create
a file location object for Gen2, make sure that you select Gen2 for the Version option.
The following table describes the file location options for the Azure Data Lake Storage protocol. The table
combines options for Azure Data Lake Storage Gen1 and Gen2. The table contains a Version column that
indicates the applicable version for the option.
Note
For descriptions of the common file location options, see the Reference Guide.
Account Shared Key When Authorization Type is set to Shared Key, enter the account shared key that you obtain from your administrator. (Gen2)
Request Timeout (ms) Specifies the time to wait before the system times out while trying to complete the
request to the specified Azure Data Lake Storage in milliseconds (ms). (Gen2)
The system logs the timeout value in the trace file when you configure the job to log
trace messages:
Data Lake Store name Name of the Azure Data Lake Storage to access. (Both)
Service Principal ID Obtain from your Azure Data Lake Storage administrator. (Gen1)
Tenant ID Obtain from your Azure Data Lake Storage administrator. (Gen1)
Password Obtain from your Azure Data Lake Storage administrator. (Gen1)
Batch size for uploading data (MB) Maximum size of a data block to upload per request when transferring data files. (Both)
Caution
Batch size for downloading data (MB) Maximum size of a data range to download per request when transferring data files. (Both)
Caution
Tip
Remote path prefix Directory path for your files in the Azure Data Lake Storage. (Both)
Example
adl://<yourdatastoreName>.azuredatalakestore.net/<FolderName>/<subFolderName>
<FolderName>/<subFolderName>
Local directory Path to the local directory for your local Data Lake Storage data. (Both)
Container Container name. Must be lowercase and have a length of more than 3 characters. (Gen2)
Related Information
A Google Cloud Storage (GCS) file location contains file transfer protocol information for moving large data
files, 10 MB and larger, between GCS and SAP Data Services.
To work with your data from the Google Cloud Platform, create a file location object that contains your account
connection information and Google file transfer protocol. Use the file location in the following ways:
• As a source, select the GCS file location name in the Cloud Storage File Location object option in a Google
BigQuery datastore configuration.
• As a target, enter the location and file name for the GCS file location object in the Location or File location
option in a target editor.
You can also use the information from the GCS file location in the built-in function load_from_gcs_to_gbq,
to load data from your GCS to a Google BigQuery table. The function includes the Google BigQuery datastore,
which names the GCS file location. For details about the function, see the Reference Guide.
Uploading to GCS can use a large amount of local disk space. When you upload data to GCS in flat files, XML
files, or JSON files, consider setting the number of rows in the target editor option Batch size (rows). The option
can reduce the amount of local disk space that Data Services uses to upload your data.
Note
If you set Location or File location to a location and file name for a non-GCS file location object, Data
Services ignores the setting in Batch size (rows).
To learn about file location objects, and to understand how SAP Data Services uses the file transfer protocol in
the file location configuration, see the Reference Guide.
Complete the options in a Google Cloud Storage (GCS) file location to be able to extract data from and load
data to your GCS account.
The GCS file location contains connection and access information for your GCS account. The following list describes the options that are specific to the Google Cloud Storage file location. For descriptions of common options, and for more information about file location objects, see the Reference Guide.
• Upload URL: Specifies the URL for uploading data to GCS. Accept the default, which is https://www.googleapis.com/upload/storage/v1.
• Download URL: Specifies the URL for extracting data from GCS. Accept the default, which is https://www.googleapis.com/storage/v1.
• Authentication Server URL: Specifies the Google server URL plus the name of the Web access service provider, which is OAuth 2.0.
• Authentication Access Scope: Specifies the specific type of data access permission.
• Service Account Email Address: Specifies the email address from your Google project. This email is the same as the service account email address that you enter into the applicable Google BigQuery datastore.
• Service Account Private Key: Specifies the P12 or JSON file that you generated from your Google project and stored locally. Click Browse, open the location where you saved the file, select the .p12 or .JSON file, and click Open.
• Service Account Signature Algorithm: Specifies the algorithm type that Data Services uses to sign JSON Web tokens. Data Services uses this value, along with your service account private key, to obtain an access token from the Authentication Server.
• Substitute Access Email Address: Optional, for Google BigQuery. Enter the substitute email address from your Google BigQuery datastore.
• Client ID: Specifies the OAuth client ID for Data Services. To get the client ID, go to the Google API Console.
• Client Secret: Specifies the OAuth client secret for Data Services. To get the client secret, go to the Google API Console.
• Request Timeout (ms): Specifies the time, in milliseconds (ms), to wait before the system times out while trying to complete the request to the specified Google Cloud Storage. The system logs the timeout value in the trace file when you configure the job to log trace messages.
• Web Service URL: Specifies the Data Services Web Services server URL that the data flow uses to access the Web server.
• Connection Retry Count: Specifies the number of times Data Services tries to create a connection with the remote server after the initial connection attempt fails.
• Batch size for uploading data, MB: Specifies the maximum size for a block of data to upload per request.
• Batch size for downloading data, MB: Specifies the maximum size for a block of data to download per request.
• Bucket: Specifies the GCS bucket name, which is the name of the basic container that holds your data in GCS.
• Region (for new bucket only): Specifies a region for the specified bucket. Applicable only when you enter a new bucket name for Bucket. If you select an existing bucket, Data Services ignores the region selection.
• Remote Path Prefix: Specifies the location of the Google Cloud Storage bucket.
• Local Directory: Specifies the file path of the local server that you use for this file location object.
• Proxy Host, Port, User Name, Password: Optional. Specifies the proxy server information when you use a proxy server.
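Outside of Data Services, you can verify a JSON service-account key before you reference it in Service Account Private Key. The following sketch assumes a standard Google service-account key file; the field names type, client_email, and project_id belong to Google's key format, not to Data Services options, and the file path is a placeholder.
Sample Code
import json

# Placeholder path to the JSON key downloaded from your Google project
KEY_FILE = "/secure/keys/my-project-key.json"

with open(KEY_FILE, encoding="utf-8") as f:
    key = json.load(f)

# The client_email value must match the address that you enter for
# Service Account Email Address in the file location and datastore.
assert key.get("type") == "service_account"
print("Service account email:", key["client_email"])
print("Project:", key.get("project_id"))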
Decrease the local disk space used when SAP Data Services uploads a large file of generated data to your
Google Cloud Storage (GCS) account.
Data Services uses the file transfer protocol in a GCS file location to load large amounts of data to your GCS
account. Large size is 10 MB or more. During the load process, Data Services first loads all records to a local
temporary file before it uploads data to your GCS account. The first load process uses local disk space equal to
the size of the file.
Decrease local disk space usage by using the option Batch size (rows). The option limits the load to a set
number of rows. Data Services loads the set number of rows in batches to the local file, which decreases the
local disk space used.
Find the Batch size (rows) option in the target editor for flat files, XML files, and JSON files.
When you set Batch size (rows) to 1 or more, Data Services uploads the batch rows in a single thread.
Therefore, Data Services ignores the setting in Parallel process threads. Additionally, Data Services does not
use compression, so it ignores a setting of gzip for Compression type in the GCS file location.
Caution
If you set Location or File location to a location and file name for a non-GCS file location object, Data
Services ignores the setting in Batch size (rows).
Disable Batch size (rows) by leaving the default setting of zero (0).
The following example uses simple row and file sizes for illustration. Sizes may not realistically reflect actual
data.
Example
Your generated file contains 10,000 rows and is about 10 MB. You set Batch size (rows) to 2000, which is 2
MB. Then you execute the job.
Data Services loads the first 2000 rows to the local temporary file, using 2 MB of local disk space. After all
2000 rows finish loading, it uploads the 2000 rows from the local file to your GCS account. When it finishes
loading the 2000 rows to GCS, Data Services deletes the rows in the local file, which frees up 2 MB of
local disk storage. Data Services performs this process five times for a total of 10,000 rows. The maximum
amount of local disk storage the process uses is 2 MB, compared to 10 MB without using the Batch size
(rows) option.
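The arithmetic behind this example can be sketched in a few lines of Python. The values mirror the example above (10,000 rows, about 10 MB, Batch size (rows) set to 2000); the sketch is illustrative only and does not call Data Services.
Sample Code
import math

total_rows = 10000        # rows in the generated file
total_size_mb = 10        # approximate size of all generated data, in MB
batch_rows = 2000         # Batch size (rows) in the target editor

row_size_mb = total_size_mb / total_rows        # about 0.001 MB per row
batches = math.ceil(total_rows / batch_rows)    # number of upload iterations
peak_local_mb = batch_rows * row_size_mb        # local disk held at any one time

print("Upload iterations:", batches)                  # 5
print("Peak local disk usage (MB):", peak_local_mb)   # 2.0, compared to 10 without batching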
Consider your environment and your data before you set the Batch size (rows) option.
Before you set the option Batch size (rows) in the target file format editor, consider your goals. Also keep in
mind the size of one row of data. When each row contains many fields, the record size may determine the
number of rows to set.
If you want the best performance regardless of the local disk space that the process uses, set Batch size (rows) to zero (0) to disable it, and set the performance optimization settings in the GCS file location. You achieve the best performance when you use the settings in the file location object. However, the upload uses local disk space equal to the size of the entire file.
If you want to conserve local disk space but are also interested in optimizing performance, use Batch size (rows). Run test jobs to determine the setting that best meets your goal.
• A higher setting uses more local disk space, but still less space than without using the option.
• A lower setting uses less local disk space than without the option.
If you don't want to use more than a specific amount of local disk space, work backwards to determine the setting for Batch size (rows), beginning with the maximum amount of disk space to use for the upload transaction, as shown in the sketch that follows.
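Working backwards from a disk-space budget is simple arithmetic, as the following sketch shows. The budget of 2 MB and the average row size of 1 KB are hypothetical values; substitute measurements from your own generated data.
Sample Code
# Hypothetical inputs: measure these for your own generated data.
max_local_disk_mb = 2.0      # the most local disk space the upload may use
avg_row_size_kb = 1.0        # average size of one generated row, in KB

# Largest Batch size (rows) that keeps the local temporary file within budget
batch_rows = int((max_local_disk_mb * 1024) // avg_row_size_kb)
print("Set Batch size (rows) to at most", batch_rows)   # 2048 for these inputs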
Examples
Example
Best performance
You plan to execute the job that uploads data to your GCS account overnight, when there isn't a large demand for local disk space. Therefore, you leave the Batch size (rows) option set to the default of zero (0).
Then you configure the GCS file location and set the following options for optimized performance:
• Compression Type
• Batch size for uploading data, MB
• Number of threads
Note
The settings in the GCS file location do not affect loading the data to the local temporary file, but they
affect loading from the temporary file to GCS.
Example
Compare settings
The following comparison shows option settings for Batch size (rows) and how they affect local disk storage and performance. For this example, the generated data contains 10,000 rows and the total size is 10 MB.
• Number of connections: 2 (Example 1) versus 10 (Example 2)
Example
You design the Data Services data flow to add columns to each row of data, making each row longer and larger than before processing. You base the number of rows that you set for Batch size (rows) on the maximum amount of disk space that you want the upload to use.
If the generated data contains 10,000 rows, Data Services connects to GCS a total of seven times, with the last batch containing only 500 rows.