This supplement contains information about the big data products that SAP Data Services supports.
Find basic information in the Reference Guide, Designer Guide, and some of the applicable supplement guides.
For example, to learn about datastores and creating datastores, see the Reference Guide. To learn about Google
BigQuery, refer to the Supplement for Google BigQuery.
This documentation uses specific terminology, location variables, and environment variables that describe
various features, processes, and locations in SAP Data Services.
Terminology
• The terms Data Services system and SAP Data Services mean the same thing.
• The term BI platform refers to SAP BusinessObjects Business Intelligence platform.
• The term IPS refers to SAP BusinessObjects Information platform services.
Note
Data Services requires BI platform components. However, if you don't use other SAP applications, IPS, a scaled-back version of the BI platform, also provides these components for Data Services.
• CMC refers to the Central Management Console provided by the BI or IPS platform.
• CMS refers to the Central Management Server provided by the BI or IPS platform.
Variables
The following table describes the location variables and environment variables that are necessary when you
install and configure Data Services and required components.
• BI platform installation directory. Default location:
C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0
• <LINK_DIR> (Data Services installation directory). Default location (all platforms):
<INSTALL_DIR>\Data Services
• <DS_COMMON_DIR> (Data Services common configuration directory). Default location:
C:\ProgramData\SAP BusinessObjects\Data Services
SAP Data Services supports many types of big data through various object types and file formats.
Apache Cassandra is an open-source data storage system that you can access with SAP Data Services.
Data Services natively supports Cassandra as an ODBC data source with a DSN connection. Cassandra uses
the generic ODBC driver. Use Cassandra on Windows or Linux operating systems.
Before you use Cassandra with Data Services, ensure that you perform the following setup tasks:
• For Data Services on Windows platforms, use the generic ODBC driver.
• For Data Services on UNIX and Linux, configure the ODBC driver with the Connection Manager. For more information about configuring database connectivity for UNIX and Linux, see the Administrator Guide.
Use the Connection Manager to configure the ODBC driver for Apache Cassandra on Linux.
Before you complete the following steps, read the topic and subtopics under “Configure database connectivity
for UNIX and Linux” in the Administrator Guide.
The Connection Manager is a command-line utility. To use it with a graphical user interface (UI), install the GTK+2 library. For more information about obtaining and
installing GTK+2, see https://www.gtk.org/ . The following steps are for the UI for the Connection Manager.
1. Open a command prompt and set $ODBCINI to a file in which the Connection Manager defines the DSN.
Ensure that the file is readable and writable.
Sample Code
$ export ODBCINI=<dir-path>/odbc.ini
touch $ODBCINI
The Connection Manager uses the $ODBCINI file and other information that you enter for data sources to
define the DSN for Cassandra.
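For reference, the following is a hypothetical sketch of the kind of DSN entry that the Connection Manager writes to the $ODBCINI file for Cassandra. The DSN name, driver library path, and host are placeholders that depend on your ODBC driver installation and cluster; port 9042 is the default Cassandra native protocol port.
Sample Code
[Cassandra_DSN]
Driver=<dir-path>/lib/libcassandraodbc.so
Description=Apache Cassandra data source
Host=<cassandra_host>
Port=9042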
2. Start the Connection Manager user interface by entering the following command:
Sample Code
$ cd <LINK_DIR>/bin/
$ ./DSConnectionManager.sh
3. In the Connection Manager, open the Data Sources tab, and click Add to display the list of database types.
4. Select Cassandra from the list of database types.
The Configuration for... dialog box opens. It contains the absolute location of the odbc.ini file that you set
in the first step.
5. Provide values for additional connection properties for the Cassandra database type as applicable.
6. Provide the following properties:
• User name
• Password
Note
Data Services does not save these properties for other users.
If Data Services is installed on the same machine and in the same folder as the IPS or BI platform, restart
the following services:
• EIM Adaptive Process Service
• Data Services Job Service
If Data Services is not installed on the same machine and in the same folder as the IPS or BI platform,
restart the following service:
• Data Services Job Service
Related Information
Complete data source properties in the Connection Manager when you configure the ODBC driver for SAP Data
Services on Linux.
The Connection Manager configures the $ODBCINI file based on the property values that you enter in the Data
Sources tab. The following table lists the properties that are relevant for Apache Cassandra.
Depending on the value you choose for the certificate mode, Data Services may
require you to define some or all of the following options:
Related Information
Use SAP Data Services to connect to Apache Hadoop frameworks, including the Hadoop Distributed File System
(HDFS) and Hive.
Data Services supports Hadoop on both the Linux and Windows platforms. For Windows support, Data Services
uses Hortonworks Data Platform (HDP) only. HDP supports data from many sources and in many formats. See the latest
Product Availability Matrix (PAM) on the SAP Support Portal for the supported versions of HDP.
For information about deploying Data Services on a Hadoop MapR cluster machine, see SAP Note 2404486 .
For information about accessing your Hadoop in the administered SAP Big Data Services, see the Supplement
for SAP Big Data Services.
For complete information about how Data Services supports Apache Hadoop, see the Supplement for Hadoop.
Connect to your HDFS data using an HDFS file format or an HDFS file location in SAP Data Services Designer.
Create an HDFS file format and file location with your HDFS connection information, such as the account
name, password, and security protocol. Data Services uses this information to access HDFS data during Data
Services processing.
For complete information about how Data Services supports your HDFS, see the Supplement for Hadoop.
Related Information
3.2.2 Hadoop Hive
Use a Hive adapter datastore or a Hive database datastore in SAP Data Services Designer to connect to the
Hive remote server.
Use the Hive adapter datastore when Data Services is installed within the Hadoop cluster. Use the Hive
adapter datastore for server-named (DSN-less) connections. You can also include SSL (or the newer Transport Layer
Security, TLS) for secure communication over the network.
Use a Hive database datastore when Data Services is installed on a machine either within or outside of the
Hadoop cluster. Use the Hive database datastore for either a DSN or a DSN-less connection. You can also include
SSL/TLS for secure communication over the network.
For complete information about how Data Services supports Hadoop Hive, see the Supplement for Hadoop.
Upload data processed with Data Services to your HDFS that is managed by SAP Big Data Services.
Big Data Services is a Hadoop distribution in the cloud. Big Data Services performs all Hadoop upgrades and
patches for you and provides Hadoop support. SAP Big Data Services was formerly known as Altiscale.
Upload your big data files directly from your computer to Big Data Services. Or, upload your big data files from
your computer to an established cloud account, and then to Big Data Services.
Example
Access data from S3 (Amazon Simple Storage Service) and use the data as a source in Data Services. Then
upload the data to your HDFS that resides in Big Data Services in the cloud.
How you choose to upload your data is based on your use case.
For complete information about accessing your Hadoop account in Big Data Services and uploading big data,
see the Supplement for SAP Big Data Services.
Related Information
To connect to an Apache Hadoop web interface running on Google Cloud Dataproc clusters, use a Hive
database datastore and a WebHDFS file location.
Use a Hive datastore to browse and view metadata from Hadoop and to import metadata for use in data flows.
To upload processed data, use a Hadoop file location and a Hive template table. Implement bulk loading in the
target editor in a data flow where you use the Hive template table as a target.
Related Information
3.3 HP Vertica
Access HP Vertica data by creating an HP Vertica database datastore in SAP Data Services Designer.
Use HP Vertica data as sources or targets in data flows. Implement SSL secure data transfer with MIT Kerberos
to access HP Vertica data securely. Additionally, configure options in the source or target table editors to
enhance HP Vertica performance.
SAP Data Services uses MIT Kerberos 5 authentication to securely access an HP Vertica database using SSL
protocol.
You must have Database Administrator permissions to install MIT Kerberos 5 on your Data Services client
machine. Additionally, the Database Administrator must establish a Kerberos Key Distribution Center (KDC)
server for authentication. The KDC server must support Kerberos 5 using the Generic Security Service (GSS)
API. The GSS API also supports non-MIT Kerberos implementations, such as Java and Windows clients.
Note
Specific Kerberos and HP Vertica database processes are required before you can enable SSL protocol in
Data Services. For complete explanations and processes for security and authentication, consult your HP
Vertica user documentation and the MIT Kerberos user documentation.
MIT Kerberos authorizes connections to the HP Vertica database using a ticket system. The ticket system
eliminates the need for users to enter a password.
Related Information
After you install MIT Kerberos, define the specific Kerberos properties in the Kerberos configuration or
initialization file.
After you define Kerberos properties, save the configuration or initialization file to your domain.
Example
See the MIT Kerberos documentation for information about completing the Unix krb5.conf property file or
the Windows krb5.ini property file.
The following table describes the log file names and locations for the Kerberos log files.
Property Example
default default = FILE:/var/log/krb5libs.log
kdc kdc = FILE:/var/log/krb5kdc.log
admin_server admin_server = FILE:/var/log/kadmind.log
The following table describes the Kerberos default properties.
Property Description
default_realm = <value> Sets the default Kerberos realm.
Example
default_realm = EXAMPLE.COM
ticket_lifetime = <value> Set the number of hours for the initial ticket request.
Example
ticket_lifetime = 24h
renew_lifetime = <value> Set the number of days a ticket can be renewed after the ticket lifetime expires. The default is 0.
Example
renew_lifetime = 7d
For the Kerberos realm, provide the following locations:
• KDC location
• Admin Server location
• Kerberos Password Server location
Example
EXAMPLE.COM = {
 kdc = <location>
 admin_server = <location>
 kpasswd_server = <location>
}
The following table describes the property for the Kerberos domain realm.
Property Description
<server_host_name>=<kerberos_realm> Maps the server host name to the Kerberos realm name. If
you use a domain name, prefix the name with a period (.).
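The following is a minimal sketch of how these properties fit together in a krb5.conf file, using the standard MIT Kerberos section layout. The realm name, host names, and log locations are placeholders; replace them with values for your environment.
Sample Code
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = EXAMPLE.COM
 ticket_lifetime = 24h
 renew_lifetime = 7d

[realms]
 EXAMPLE.COM = {
  kdc = kdc.example.com
  admin_server = kdc.example.com
  kpasswd_server = kdc.example.com
 }

[domain_realm]
 .example.com = EXAMPLE.COM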
Parent topic: Enable MIT Kerberos for HP Vertica SSL protocol [page 17]
Related Information
After you've updated the configuration or initialization file and saved it to the client domain, execute the kinit
command to generate a secure key.
Example
Enter the following command, using your own information for the variables:
kinit <user_name>@<realm_name>
The following table describes the keys that the command generates.
Key Description
For complete information about using the kinit command to obtain tickets, see the MIT Kerberos Ticket
Management documentation.
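As a quick verification, you can run the MIT Kerberos klist command after kinit to confirm that the ticket-granting ticket was issued. The following sketch uses a hypothetical user and realm; the cache location and output format vary by platform.
Sample Code
$ kinit dsuser@EXAMPLE.COM
Password for dsuser@EXAMPLE.COM:
$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: dsuser@EXAMPLE.COM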
Parent topic: Enable MIT Kerberos for HP Vertica SSL protocol [page 17]
Related Information
This procedure is for HP Vertica users who have database administrator permissions to perform these steps, or
who have been associated with an authentication method through a GRANT statement.
Note
DSN for HP Vertica is available in SAP Data Services version 4.2 SP7 Patch 1 (14.2.7.1) or later.
To create a DSN for HP Vertica with Kerberos SSL, perform the following steps:
1. Access the ODBC Data Source Administrator either from the Datastore Editor in Data Services Designer or
directly from your Start menu.
2. Open the System DSN tab and select Add.
3. Choose the applicable HP Vertica driver from the list and select Finish.
4. Open the Basic Settings tab and complete the options as described in the following table.
Option Value
Kerberos Host Name Enter the name of the host computer where Kerberos is
installed.
When the connection test is successful, select OK and close the ODBC Data Source Administrator.
Now the HP Vertica DSN that you just created is included in the DSN option in the datastore editor.
Create the HP Vertica database datastore in Data Services Designer and select the DSN that you created.
Related Information
To enable SSL encryption for HP Vertica datastores, you must create a Data Source Name (DSN) connection.
Before you perform the following steps, an administrator must install MIT Kerberos 5, and enable Kerberos for
HP Vertica SSL protocol.
Additionally, an administrator must create an SSL DSN using the ODBC Data Source Administrator. For more
information about configuring SSL DSN with ODBC drivers, see Configure drivers with data source name
(DSN) connections in the Administrator Guide.
SSL encryption for HP Vertica is available in SAP Data Services version 4.2 Support Package 7 Patch 1
(14.2.7.1) or later. Enabling SSL encryption slows down job performance.
Note
An HP Vertica database datastore requires that you choose DSN as a connection method. DSN-less
connections aren't allowed for HP Vertica datastore with SSL encryption.
To create an HP Vertica datastore with SSL encryption, perform the following steps in Data Services Designer:
Choose the HP Vertica client version from the Database version list, and enter your user name and
password.
3. Select Use Data Source Name (DSN).
4. Choose the HP Vertica SSL DSN that you created from the Data Source Name list.
5. Complete the applicable Advanced options and save your datastore.
Related Information
SAP Data Services doesn't support bulk loading for HP Vertica, but there are settings that you can configure to increase
loading speed.
For complete details about connecting to HP Vertica, see Connecting to Vertica in the Vertica
documentation. Make sure to select the correct version.
When you load data to an HP Vertica target in a data flow, the software automatically executes an HP Vertica
statement that contains a COPY Local statement. This statement makes the ODBC driver read and stream
the data file from the client to the server.
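For illustration only, the following sketch shows the general form of a Vertica COPY ... FROM LOCAL statement. The schema, table, file path, and delimiter are placeholders, and the statement that Data Services actually generates may differ.
Sample Code
COPY my_schema.my_target
FROM LOCAL '/tmp/ds_bulkload.dat'
DELIMITER ','
ABORT ON ERROR;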
1. When you configure the ODBC driver for HP Vertica, enable the option to use native connection load
balancing.
2. In Designer, open the applicable data flow.
3. In the workspace, double-click the HP Vertica datastore target object to open it.
4. Open the Options tab in the lower pane.
5. Increase the number of rows in the Rows per commit option.
Related Information
SAP Data Services converts incoming HP Vertica data types to native data types, and outgoing native data
types to HP Vertica data types.
The following table contains HP Vertica data types and the native data types to which Data Services converts
them.
HP Vertica data type Data Services data type
Boolean Int
FLOAT Double
Money Decimal
Numeric Decimal
Number Decimal
Decimal Decimal
Char Varchar
Varchar Varchar
DATE Date
TIMESTAMP Datetime
TIMESTAMPTZ Varchar
Time Time
TIMETZ Varchar
INTERVAL Varchar
The following table contains native data types and the HP Vertica data types to which Data Services outputs
them. Data Services outputs the converted data types to HP Vertica template tables or Data_Transfer
transform tables.
Data Services data type HP Vertica data type
Date Date
Datetime Timestamp
Decimal Decimal
Double Float
Int Int
Interval Float
Real Float
Time Time
Varchar Varchar
Timestamp Timestamp
Related Information
Configure options for an HP Vertica table as a source by opening the source editor in the data flow.
Option Description
Table name Specifies the table name for the source table.
Configure options for an HP Vertica table as a target by opening the target editor in the data flow.
Option Description
Column comparison Specifies how the software maps input columns to output
columns:
Number of loaders Specifies the number of loaders Data Services uses to load
data to the target.
Use overflow file Specifies whether Data Services uses a recovery file for rows
that it could not load.
File name Specifies the file name for the overflow file. Enter a file name or specify a variable.
Applicable only when you select Yes for Use overflow file.
File format Specifies the format of the overflow file. The overflow file can include the rejected data
and the operation being performed (write_data), or the SQL command used to produce the
rejected operation (write_sql).
Use input keys Specifies whether Data Services uses the primary keys from the input table when the
target table does not have a primary key.
• Yes: Uses the primary keys from the input table when the target table does not have
primary keys.
• No: Does not use primary keys from the input table when the target table does not
have primary keys. No is the default setting.
Update key columns Specifies whether Data Services updates key column values
when it loads data to the target table.
Auto correct load Specifies whether Data Services uses auto correct loading
when it loads data to the target table. Auto correct loading
ensures that Data Services does not duplicate the same row
in a target table. Auto correct load is useful for data recovery
operations.
Ignore columns with value Specifies a value that might appear in a source column and
that you do not want updated in the target table during auto
correct loading.
Include in transaction Specifies that this target table is included in the transaction
processed by a batch or real-time job.
Related Information
To read data from MongoDB sources and load data to other SAP Data Services targets, create a MongoDB
adapter.
MongoDB is an open-source document database, which has JSON-like documents called BSON. MongoDB has
dynamic schemas instead of traditional schema-based data.
Data Services needs metadata to gain access to MongoDB data for task design and execution. Use Data
Services processes to generate schemas by converting each row of the BSON file into XML and converting XML
to XSD.
Data Services uses the converted metadata in XSD files to access MongoDB data.
To learn more about Data Services adapters, see the Supplement for Adapters.
3.4.1 MongoDB metadata
Use metadata from a MongoDB adapter datastore to create sources, targets, and templates in a data flow.
MongoDB represents its embedded documents and arrays as nested data in BSON files. SAP Data Services
converts MongoDB BSON files to XML and then to XSD. Data Services saves the XSD file to the following
location: <LINK_DIR>\ext\mongo\mcache.
Data Services has the following restrictions and limitations for working with MongoDB:
• In the MongoDB collection, the tag name can't contain special characters that are invalid for the XSD file.
Example
The following special characters are invalid for XSD files: >, <, &, /, \, #, and so on.
For more information about formatting XML documents, see the Nested Data section in the Designer Guide. For
more information about source and target objects, see the Data flows section of the Designer Guide.
Related Information
3.4.2 MongoDB as a source
Use MongoDB as a source in Data Services and flatten the nested schema by using the XML_Map transform.
The following examples illustrate how to use various objects to process MongoDB sources in data flows.
Example 1: Change the schema of a MongoDB source using the Query transform, and load output to an XML
target.
Specify conditions in the Query transform. Some conditions can be pushed down and others are processed
by Data Services.
Example 2: Set up a data flow where Data Services reads the schema and then loads the schema directly into an
XML template file.
Example 3: Flatten a schema using the XML_Map transform and then load the data to a table or flat file.
Note
Specify conditions in the XML_Map transform. Some conditions can be pushed down and others are
processed by Data Services.
Related Information
Use query criteria as a parameter of the db.<collection>.find() method. Add MongoDB query conditions
to a MongoDB table as a source in a data flow.
To add MongoDB query criteria, enter a value next to the Query criteria parameter on the Adapter Source tab
of the source editor. Ensure that the query criteria is in MongoDB query format. For example:
{ type: { $in: ['food', 'snacks'] } }
Example
Given a value of {prize:100}, MongoDB returns only rows that have a field named “prize” with a value of
100. If you don’t specify the value 100, MongoDB returns all the rows.
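As another hypothetical illustration, the following query criteria combines two conditions so that MongoDB returns only documents whose status field equals "A" and whose qty field is greater than 25. The field names and values are placeholders.
Example
{ status: "A", qty: { $gt: 25 } }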
Configure a Where condition so that Data Services pushes down the condition to MongoDB. Specify a Where
condition in a Query or XML_Map transform, and place the Query or XML_Map transform after the MongoDB
source object in the data flow. MongoDB returns only the rows that you want.
For more information about the MongoDB query format, consult the MongoDB Web site.
Note
If you use the XML_Map transform, it may have a query condition in SQL format. Data Services
converts the SQL format to the MongoDB query format and uses the MongoDB specification to push down
operations to the source database. In addition, be aware that Data Services does not support pushdown of
queries on nested arrays.
Related Information
SAP Data Services processes push down operators with a MongoDB source in specific ways based on the
circumstance.
Data Services supports the following operators when you use MongoDB as a source:
• Comparison operators: =, !=, >, >=, <, <=, like, and in.
• Logical operators: and and or in SQL queries.
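As a hypothetical illustration of how these operators map, a Where condition such as STATUS = 'A' OR QTY >= 100 in a Query or XML_Map transform corresponds to MongoDB query criteria similar to the following; the column names are placeholders, and the exact criteria that Data Services generates may differ.
Example
{ $or: [ { STATUS: { $eq: "A" } }, { QTY: { $gte: 100 } } ] }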
Related Information
3.4.3 MongoDB as a target
Configure options for MongoDB as a target in your data flow using the target editor.
SAP Data Services considers the <_id> field in MongoDB data as the primary key. If you create a new
MongoDB document and include a field named <_id>, Data Services recognizes that field as the unique
BSON ObjectID. If a MongoDB document contains more than one <_id> field at different levels, Data Services
considers only the <_id> field at the first level as the BSON Object Id.
The following table contains descriptions for options in the Adapter Target tab of the target editor.
Option Description
Use auto correct Specifies the mode Data Services uses for MongoDB as a target datastore.
• True: Uses Upsert mode for the writing behavior. Updates the document that has the same
<_id> field, or inserts a new document.
• False: Uses Insert mode for writing behavior. If documents have the same <_id> field in
the MongoDB collection, then Data Services issues an error message.
Write concern level Specifies the MongoDB write concern level that Data Services uses for reporting the success
of a write operation. Enable or disable different levels of acknowledgement for write operations.
Use bulk Specifies whether Data Services executes writing operations in bulk. Bulk may provide better
performance.
• True: Runs write operations in bulk for a single collection to optimize CRUD efficiency.
If a bulk contains more than 1000 write operations, MongoDB automatically splits it into
multiple bulk groups.
• False: Does not run write operation in bulk.
For more information about bulk, ordered bulk, and bulk maximum rejects, see the MongoDB
documentation at http://docs.mongodb.org/manual/core/bulk-write-operations/.
Use ordered bulk Specifies the order in which Data Services executes write operations: Serial or Parallel.
Documents per commit Specifies the maximum number of documents that are loaded to a target before the software
saves the data.
• Blank: Uses the maximum of 1000 documents. Blank is the default setting.
• Enter any integer to specify a number other than 1000.
Bulk maximum rejects Specifies the maximum number of acceptable errors before Data Services fails the job.
Note
Data Services continues to load to the target MongoDB even when the job fails.
Enter an integer. Enter -1 so that Data Services ignores and does not log bulk loading errors.
If the number of actual errors is less than, or equal to the number you specify here, Data
Services allows the job to succeed and logs a summary of errors in the adapter instance trace
log.
Applicable only when you select True for Use ordered bulk.
Delete data before loading Deletes existing documents in the current collection before loading occurs. Retains all the
configuration, including indexes, validation rules, and so on.
Drop and re-create Specifies whether Data Services drops the existing MongoDB collection and creates a new one
with the same name before loading occurs.
• True: Drops the existing MongoDB collection and creates a new one with the same name
before loading. Ignores the value of Delete data before loading. True is the default setting.
• False: Does not drop the existing MongoDB collection and create a new one with the same
name before loading.
Use audit Specifies whether Data Services creates audit files that contain write operation information.
• True: Creates audit files that contain write operation information. Stores audit files
in the <DS_COMMON_DIR>/adapters/audits/ directory. The name of the file is
<MongoAdapter_instance_name>.txt.
• False: Does not create and store audit files.
Data Services behaves in the following way when a regular load fails:
• Use audit = False: Data Services logs loading errors in the job trace log.
• Use audit = True: Data Services logs loading errors in the job trace log and in the audit log.
Data Services behaves in the following way when a bulk load fails:
• Use audit = False: Data Services creates a job trace log that provides only a summary. It
does not contain details about each row of bad data. There is no way to obtain details
about bad data.
• Use audit = True: Data Services creates a job trace log that provides only a summary but
no details. However, the job trace log provides information about where to find details
about each row of bad data in the audit file.
Use template documents as a target in one data flow or as a source in multiple data flows.
Template documents are useful in early application development when you design and test a project. After you
import data for the MongoDB datastore, Data Services stores the template documents in the object library.
Find template documents in the Datastore tab of the object library.
When you import a template document, the software converts it to a regular document. You can use the regular
document as a target or source in your data flow.
Note
Template documents are available in Data Services 4.2.7 and later. If you upgrade from a previous version,
open an existing MongoDB datastore and then click OK to close it. Data Services updates the datastore so
that you see the Template Documents node and any other template document related options.
Template documents are similar to template tables. For information about template tables, see the Data
Services User Guide and the Reference Guide.
Related Information
Create MongoDB template documents as targets in data flows, then use the target as a source in a different
data flow.
To use a MongoDB template as the target or source in a data flow, first use the template as a target. To add a
MongoDB template as a target in a data flow, perform the following steps to create the target:
Note
Use the MongoDB collection namespace format: database.collection. Don’t exceed 120 bytes.
4. Select the related MongoDB datastore from the In datastore dropdown list.
5. Click OK.
6. To use the template document as a target in the data flow, connect the template document to the object
that comes before the template document.
Data Services automatically generates a schema based on the object directly before the template
document in the data flow.
Restriction
The field <_id> is the default primary key of the MongoDB collection. Therefore, make sure that you
correctly configure the <_id> field in the output schema of the object that comes directly before
the target template. If you don't include <_id> in the output schema, the following error appears
when you view the data: “An element named <_id> present in the XML data input does
not exist in the XML format used to set up this XML source in the data flow
<dataflow>. Validate your XML data.”
7. Click Save.
The template document icon in the data flow changes, and Data Services adds the template document to
the object library. Find the template document in the applicable database node under Templates.
Convert the template document into a regular document by selecting to import the template document in the
object library. Then you can use the template document as a source or a target document in other data flows.
SAP Data Services enables you to convert an imported template document into a regular document.
• Open a data flow and select one or more template target documents in the workspace. Right-click, and
choose Import Document.
• Select one or more template documents in the Local Object Library, right-click, and choose Import
Document.
The icon changes and the document appears under Documents instead of Template Documents in the object
library.
Note
The Drop and re-create target configuration option is available only for template target documents.
Therefore it is not available after you convert the template target into a regular document.
Related Information
Use the data preview feature in SAP Data Services Designer to view a sampling of data from a MongoDB
document.
• Expand an applicable MongoDB datastore in the object library. Right-click the MongoDB document and
select View Data from the dropdown menu.
• Right-click the MongoDB document in a data flow and select View Data from the dropdown menu.
• Click the magnifying glass icon in the lower corner of either a MongoDB source or target object in a data
flow.
By default, Data Services displays a maximum of 100 rows. Change this number by setting the Rows To
Scan option in the applicable MongoDB datastore editor. Entering -1 displays all rows.
For more information about viewing data, see the Designer Guide.
Related Information
3.4.6 Parallel Scan
SAP Data Services uses the MongoDB Parallel Scan process to improve performance while it generates
metadata for big data.
To generate metadata, Data Services first scans all documents in the MongoDB collection. This
scanning can be time consuming. However, when Data Services uses the Parallel Scan command
parallelCollectionScan, it uses multiple parallel cursors to read all the documents in a collection. Parallel
Scan can increase performance.
Note
Parallel Scan works with MongoDB server version 2.6.0 and above.
For more information about the parallelCollectionScan command, consult your MongoDB
documentation.
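For reference, the following sketch shows how the parallelCollectionScan command is issued in the MongoDB shell; the collection name and number of cursors are placeholders.
Example
db.runCommand( { parallelCollectionScan: "myCollection", numCursors: 4 } )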
For more information about Mongo adapter datastore configuration options, see the Supplement for Adapters.
3.4.7 Reimport schemas
When you reimport documents from your MongoDB datastore, SAP Data Services uses the current datastore
settings.
Reimport a single MongoDB document by right-clicking the document and selecting Reimport from the
dropdown menu.
To reimport all documents, right-click an applicable MongoDB datastore or right-click on the Documents node
and select Reimport All from the dropdown menu.
Note
When you enable Use Cache, Data Services uses the cached schema.
When you disable Use Cache, Data Services looks in the sample directory for a sample BSON file with the
same name. If there is a matching file, the software uses the schema from the BSON file. If there isn't a
matching BSON file in the sample directory, the software reimports the schema from the database.
Related Information
SAP Data Services enables you to search for MongoDB documents in your repository from the object library.
1. Right-click in any tab in the object library and choose Search from the dropdown menu.
The Search dialog box opens.
2. Select the applicable MongoDB datastore name from the Look in dropdown menu.
The datastore is the one that contains the document for which you are searching.
3. Select Local Repository to search the entire repository.
4. Select Documents from the Object Type dropdown menu.
5. Enter the criteria for the search.
6. Click Search.
Data Services lists matching documents in the lower pane of the Search dialog box. A status line at the
bottom of the Search dialog box shows statistics such as total number of items found, amount of time to
search, and so on.
For more information about searching for objects, see the Objects section of the Designer Guide.
Related Information
Before you create an Apache Impala datastore, download the Cloudera ODBC driver and create a data source
name (DSN). Use the datastore to connect to Hadoop and import Impala metadata. Use the metadata as a
source or target in a data flow.
Before you work with Apache Impala, be aware of the following limitations:
For Linux users: Before you create an Impala database datastore, connect to Apache Impala using the Cloudera
ODBC driver.
Perform the following high-level steps to download a Cloudera ODBC driver and create a data source name
(DSN). For more in-depth information, consult the Cloudera documentation.
Select the driver that is compatible with your platform. For information about the correct driver versions,
see the SAP Product Availability Matrix (PAM).
3. Start DSConnectionManager.sh.
$ cd $LINK_DIR/bin/
$ ./DSConnectionManager.sh
Example
The following shows prompts and values in DS Connection Manager that includes Kerberos and SSL:
The ODBC ini file is <path to the odbc.ini file>
There are available DSN names in the file:
[DSN name 1]
[DSN name 2]
Specify the DSN name from the list or add a new one:
<New DSN file name>
Specify the User Name:
<Hadoop user name>
Type database password:(no echo)
*Type the Hadoop password. Password does not appear after you type it for security.
Retype database password:(no echo)
Specify the Host Name:
<host name/IP address>
Specify the Port:'21050'
<port number>
Specify the Database:
default
Specify the Unix ODBC Lib Path:
*The Unix ODBC Lib Path is based on where you install the driver.
*For example, /build/unixODBC-2.3.2/lib.
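The Connection Manager writes the resulting data source to the odbc.ini file. The following is a hypothetical sketch of such an entry for the Cloudera Impala ODBC driver with Kerberos and SSL enabled; the driver path and key names can vary by driver version, so verify them against the Cloudera driver documentation.
Sample Code
[Impala_DSN]
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=<host name/IP address>
PORT=21050
Database=default
AuthMech=1
KrbRealm=EXAMPLE.COM
KrbFQDN=<host FQDN>
KrbServiceName=impala
SSL=1
TrustedCerts=/opt/cloudera/impalaodbc/lib/64/cacerts.pem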
Related Information
To connect to your Hadoop files and access Impala data, create an ODBC datastore in SAP Data Services
Designer.
Before performing the following steps, enable Impala Services on your Hadoop server. Then download the
Cloudera driver for your platform.
Note
If you didn't create a DSN (data source name) in Windows ODBC Data Source application, you can create a
DSN in the following process.
To create an ODBC datastore for Apache Impala, perform the following steps in Designer:
A realm is a set of managed nodes that share the same Kerberos database.
c. Enter the fully qualified domain name (FQDN) of the Hive Server host in Host FQDN.
d. Enter the service principal name for the Hive server in Service Name.
e. Enable the Canonicalize Principal FQDN option.
This option canonicalizes the host FQDN in the server principal name.
12. Optional: To enable Secure Sockets Layer (SSL) protocol, perform the following substeps:
a. Choose No Authentication (SSL) from the Mechanism list.
b. Select Advanced Options.
c. Enter or browse to the Cloudera certificate file in Trusted Certificates.
The DSN appears in the dropdown list only after you've created it.
15. Select Advanced and complete the advanced options as necessary.
a. Optional: Set the Code page option to utf-8 in the Locale group to process multibyte data in Impala
tables.
b. Optional: In the ODBC Date Function Support group, set the Week option to No.
If you don’t set the Week option to No, the result of the Data Services built-in function
week_in_year() may be incorrect.
Related Information
To use your PostgreSQL tables as sources and targets in SAP Data Services, create a PostgreSQL datastore
and import your tables and other metadata.
Prerequisites
Before you configure the PostgreSQL datastore, perform the following prerequisites:
• Download and install the latest supported PostgreSQL Server version from the official PostgreSQL Web
site at https://www.postgresql.org/download/ . Check the Product Availability Matrix (PAM) on the
SAP Support Portal to ensure that you have the supported PostgreSQL version for your version of Data
Services.
• Obtain the ODBC driver that is compatible with your version of PostgreSQL. To avoid potential
processing problems, download the ODBC driver from the official PostgreSQL Web site at
https://www.postgresql.org/ftp/odbc/versions/ . Find option descriptions for configuring the ODBC driver at
https://odbc.postgresql.org/docs/config.html .
When you create a DSN-less connection, you can optionally protect your connection with SSL/TLS and X509
single sign-on authentication when you configure the datastore.
Bulk loading
Configure the PostgreSQL datastore for bulk loading to the target PostgreSQL database. Before you can
configure the bulk loading options, obtain the PSQL tool from the official PostgreSQL website. In addition
to setting the bulk loader directory and the location for the PSQL tool in the datastore, complete the Bulk
Loader Options tab in the target editor. For complete information about the bulk loading in PostgreSQL and the
options, see the Performance Optimization Guide.
Note
Data Services supports bulk loading for PostgreSQL DSN-less connections only.
Data Services supports the basic pushdown functions for PostgreSQL. For a list of pushdown functions that
Data Services supports for PostgreSQL, see SAP Note 2212730 .
UTF-8 encoding
To process PostgreSQL tables as sources in data flows, Data Services requires that all data in PostgreSQL
tables use UTF-8 encoding. Additionally, Data Services outputs data to PostgreSQL target tables using UTF-8
encoding.
Data Services converts PostgreSQL data types to data types that it can process. After processing, Data
Services outputs data and converts the data types back to the corresponding PostgreSQL data types.
Complete options in the datastore editor to set the datastore type, database version, database access
information, and DSN information if applicable.
The following table contains descriptions for the first set of options, which define the datastore type (database)
and the PostgreSQL version information.
The following table describes the specific options to complete when you create a DSN connection.
Data Source Name Specifies the name of the DSN you create in the ODBC Data
Source Administrator.
User Name Specifies the user name to access the data source defined in
the DSN.
To create a server-name (DSN-less) connection for the datastore, complete the database-specific options
described in the following table.
Database server name Specifies the database server address. Enter localhost or an IP address.
Database name Specifies the database name to which this datastore connects.
Port Specifies the port number that this datastore uses to access the database.
User name Specifies the name of the user authorized to access the database.
Enable Automatic Data Transfer Specifies that any data flow that uses the tables imported
with this datastore can use the Data_Transfer transform. Data_Transfer uses transfer tables
to push down certain operations to the database server for more efficient processing.
Use SSL encryption Specifies to use SSL/TLS encryption for the datastore connection to the
database. SSL/TLS encryption is applicable only for DSN-less connections.
Encryption Parameters Enabled only when you select Yes for Use SSL encryption.
Bulk loader directory Specifies the directory where Data Services stores the files related to
bulk loading, such as the log file, error file, and temporary files. Click the down arrow at the
end of the field and select <Browse>, or select an existing global variable that you created
for this location. If you leave this field blank, Data Services writes the files to
%DS_COMMON_DIR%/log/bulkloader.
PSQL full path Specifies the full path to the location of the PSQL tool. Click the down arrow
at the end of the field and select <Browse>, or select the global variable.
The following options are in the Encryption Parameters dialog box when you select Yes for Use SSL encryption.
Verify Client Certificate When selected, specifies to have the client verify the server
certificate. Your selection for SSL Mode determines the method.
Certificate Browse to and select the client certificate file. If you've created a substitution
parameter for this file, select the substitution parameter from the list.
Certificate Key Browse to and select the client private key file. If you've created a
substitution parameter for this file, select the substitution parameter from the list.
Certificate Key Password Specifies the password used to encrypt the client key file.
SSL CA File Browse to and select the SSL CA certificate file to verify the server. If you've
created a substitution parameter for this file, select the substitution parameter from the list.
For a list of properties required for each database type, see the Administrator Guide.
For information about bulk loading with PostgreSQL, see the Performance Optimization Guide.
Related Information
Configure the PostgreSQL ODBC driver for Windows or Linux to update the configuration file with the
applicable driver information.
Download and install the PostgreSQL client ODBC driver from the PostgreSQL Website at https://
www.postgresql.org/ftp/odbc/versions/ . For descriptions of the configuration options, see the PostgreSQL
Website at https://odbc.postgresql.org/docs/config.html .
For Windows, use the ODBC Drivers Selector to verify the ODBC driver is installed. For Linux, configure the
ODBC driver using the SAP Data Services Connection Manager.
Related Information
For Linux, configure the ODBC driver for SSL/TLS and X509 authentication for connecting to your PostgreSQL
databases.
Before you perform the following steps, download and install the PostgreSQL client ODBC driver from the
PostgreSQL Website at https://www.postgresql.org/ftp/odbc/versions/ . For descriptions of the configuration
options, see the PostgreSQL Website at https://odbc.postgresql.org/docs/config.html .
For complete instructions to use the Connection Manager utility, and to implement the graphical user interface
for the utility, see the Server Management section of the Administrator Guide.
$ cd $LINK_DIR/bin
$ source al_env.sh
$ DSConnectionManager.sh
Example
Values shown as <value> and in bold indicate where you enter your own values or values related to
your system.
********************************
Configuration for PostgreSQL
********************************
The ODBC inst file is <.../odbcinst.ini>
Specify the Driver Version: <version>
Specify the Driver Name:
<driver_name>
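After the configuration completes, the odbcinst.ini file contains an entry for the PostgreSQL driver. The following is a hypothetical example; the driver name and library path depend on where you installed the PostgreSQL ODBC driver.
Sample Code
[PostgreSQL Unicode]
Description=PostgreSQL ODBC driver (Unicode)
Driver=/usr/local/lib/psqlodbcw.so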
Related Information
Use the PostgreSQL database datastore to access the schemas and tables in the defined database.
Open the datastore and view the metadata that is available to import. For PostgreSQL, import schemas and the
related tables. Each table resides under a specific schema, and a table name appears as
<dbname>.<schema_name>.<table_name>.
For more information about viewing metadata, see the Datastore metadata section of the Designer Guide.
For more information about the imported metadata from database datastores, see the Datastores section of
the Designer Guide.
Related Information
Use PostgreSQL tables as sources and targets in data flows and use PostgreSQL table schemas for template
tables.
Drag the applicable PostgreSQL table onto your workspace and connect it to a data flow as a source or target.
Also, use a template table as a target in a data flow and save it to use as a future source in a different data flow.
See the Designer Guide to learn about using template tables. Additionally, see the Reference Guide for
descriptions of options to complete for source, target, and template tables.
Related Information
When you import metadata from a PostgreSQL table into the repository, SAP Data Services converts
PostgreSQL data types to Data Services native data types for processing.
After processing, Data Services converts data types back to PostgreSQL data types when it outputs the
generated data to the target.
The following table contains PostgreSQL data types and the corresponding Data Services data types.
PostgreSQL data type Data Services data type
Boolean/Integer/Smallint Int
Serial/Smallserial/Serial4/OID Int
Bigint/BigSerial/Serial8 Decimal(19,0)
Money Double
Numeric/Decimal Decimal(28,6)
Bytea Blob
Char(n) Fixedchar(n)
Text/varchar(n) Varchar(n)
DATE Date
TIMESTAMP Datetime
TIMESTAMPTZ Varchar(127)
TIMETZ Varchar(127)
INTERVAL Varchar(127)
If Data Services encounters a column that has an unsupported data type, it does not import the column.
However, you can configure Data Services to import unsupported data types by checking the Import
unsupported data types as VARCHAR of size option in the datastore editor dialog box.
Note
When you import tables that have specific PostgreSQL native data types, Data Services saves the data type
as varchar or integer, and includes an attribute setting for Native Type. The following table contains the
column data type in which Data Services saves the PostgreSQL native data type, and the corresponding
attribute.
Related Information
Process your SAP HANA data in SAP Data Services by creating an SAP HANA database datastore.
Import SAP HANA metadata using a database datastore. Use the table metadata as sources and targets in
data flows. Some of the benefits of using SAP HANA in Data Services include the following:
• Protect your SAP HANA data during network transmission using SSL/TLS protocol and X.509
authentication with Cryptographic libraries.
• Create stored procedures and enable bulk loading for faster reading and loading.
• Load spatial and complex spatial data from Oracle to SAP HANA.
Note
Beginning with SAP HANA 2.0 SP1, access databases only through a multitenant database container
(MDC). If you use a version of SAP HANA that is earlier than 2.0 SP1, access only a single database.
When you create an SAP HANA database datastore with SSL/TLS encryption or X.509 authentication,
configure both server side and client side for the applicable authentication.
On the server side, the process of configuring the ODBC driver, SSL/TLS, and/or X.509 authentication
automatically sets the applicable settings in the communications section of the global.ini file.
SAP HANA uses the SAP CommonCrypto library for both SSL/TLS encryption and X.509 authentication. The
SAP HANA server installer installs the CommonCryptoLib (libsapcrypto.sar) to $DIR_EXECUTABLE by default.
Note
Support for OpenSSL in SAP HANA is deprecated. If you are using OpenSSL, we recommend that you
migrate to CommonCryptoLib. For more information, see SAP Note 2093286 .
Obtaining the SAP CommonCryptoLib file in Windows and Unix [page 60]
Related Information
The SAP CommonCryptoLib files are required for using SSL/TLS encryption and X.509 authentication in your
SAP HANA database datastores.
If you use the SAP HANA ODBC driver version 2.9 or higher, you don't have to perform the following steps to
obtain the SAP CommonCrypto library, because it's bundled with the driver.
To obtain the SAP CommonCryptoLib file, perform the following steps based on your platform:
1. For Windows:
a. Create a local folder to store the CommonCryptoLib files.
b. Download and install the applicable version of SAPCAR from the SAP download center.
The process to create a system variable on Windows varies based on your Windows version.
Example
To create a system variable for Windows 10 Enterprise, access Control Panel as an administrator
and open Systems. Search for System Variables and select Edit System Variables.
export SECUDIR=/PATH/<LOCAL_FOLDER>
export PATH=$SECUDIR:$PATH
f. Restart the Job Server.
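As a general sketch, SAPCAR extracts a .sar archive such as libsapcrypto.sar with a command like the following; the archive name and target folder are placeholders, and you should check the SAPCAR documentation for the options that apply to your version.
Sample Code
SAPCAR -xvf libsapcrypto.sar -R <LOCAL_FOLDER>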
3.7.2 X.509 authentication
X.509 authentication is a more secure method of accessing SAP HANA than user name and password
authentication.
Include X.509 authentication when you configure an SAP HANA datastore. When you use X.509
authentication, consider the following information:
Note
Support for X.509 authentication begins in SAP Data Services version 4.2 SP14 (14.2.14.16) and is
applicable for SAP HANA Server on premise 2.0 SP05 revision 56 and above, and SAP HANA Cloud. For
SAP HANA client, X.509 is applicable for SAP HANA ODBC client 2.7 and above.
Create and configure the server and client certificate files before you include X.509 authentication for
connecting to your SAP HANA data. For details, see the SAP Knowledge Base article 3126555 .
3.7.3 JWT authentication
SAP Data Services supports the JSON web token (JWT) single sign-on user authentication mechanism for SAP
HANA on-premise and cloud servers.
To use JWT user authentication for an SAP HANA datastore and an MDS/VDS request, edit the HANA datastore,
leave the User Name field empty, and enter an encoded JWT in the Password field.
For information about obtaining a JWT for a HANA user on the HANA on-premise server, see Single Sign-On
Using JSON Web Tokens in SAP HANA Security Guide for SAP HANA Platform.
For information about obtaining a JWT for a HANA user on the HANA Cloud server, contact your Cloud server
administrator.
Related Information
When Data Services uses changed data capture (CDC) or auto correct load, it uses a temporary staging table
to load the target table. Data Services loads the data to the staging table and applies the operation codes
INSERT, UPDATE, and DELETE to update the target table. With the Bulk load option selected in the target table
editor, any one of the following conditions triggers the staging mechanism:
If none of these conditions are met, the input data contains only INSERT rows. Therefore Data Services
performs only a bulk insert operation, which does not require a staging table or the need to execute any
additional SQL.
By default, Data Services automatically detects the SAP HANA target table type. Then Data Services updates
the table based on the table type for optimal performance.
The bulk loader for SAP HANA is scalable and supports UPDATE and DELETE operations. Therefore, the
following options in the target table editor are also available for bulk loading:
• Use input keys: Uses the primary keys from the input table when the target table does not contain a
primary key.
• Auto correct load: If a matching row to the source table does not exist in the target table, Data Services
inserts the row in the target. If a matching row exists, Data Services updates the row based on other update
settings in the target editor.
For more information about SAP HANA bulk loading and option descriptions, see the Data Services
Supplement for Big Data.
Related Information
When you use SAP HANA tables as targets in a data flow, configure options in the target editor.
The following tables describe options in the target editor that are applicable to SAP HANA. For descriptions of
the common options, see the Reference Guide.
Options
Option Description
Table type Specifies the table type when you use an SAP HANA template table as a target.
• Column Store: Creates tables organized by column. Column Store is the default
setting.
Note
Data Services does not support blob, dbblob, and clob data types for column
store table types.
Bulk loading
Option Description
Bulk load Specifies whether Data Services uses bulk loading to load data to the target.
Mode Specifies the mode that Data Services uses for loading data to the target table:
• Append: Adds new records to the table. Append is the default setting.
• Truncate: Deletes all existing records in the table and then adds new records.
Commit size Specifies the maximum number of rows that Data Services loads to the staging and target
tables before it saves the data (commits).
• default: Uses a default commit size based on the target table type.
• Column Store: Default commit size is 10,000
• Row Store: Default commit size is 1,000
• Enter a value that is greater than 1.
Update method Specifies how Data Services applies the input rows to the target table.
Note
Do not use DELETE-INSERT if the update rows contain data for only some of the
columns in the target table. If you use DELETE-INSERT, Data Services replaces
missing data with NULLs.
SAP Data Services supports SAP HANA stored procedures with zero, one, or more output parameters.
Data Services supports scalar data types for input and output parameters. Data Services does not support
table data types. If you try to import a procedure with table data type, the software issues an error. Data
Services does not support data types such as binary, blob, clob, nclob, or varbinary for SAP HANA procedure
parameters.
Procedures can be called from a script or from a Query transform as a new function call.
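As a hedged illustration, an imported SAP HANA procedure might be called from a Data Services script in a form like the following, where the datastore, owner, procedure name, and parameter are hypothetical; see the Functions and Procedures section in the Designer Guide for the exact syntax.
Sample Code
$COUNT = HANA_DS.MYSCHEMA.GET_CUSTOMER_COUNT('US');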
Limitations
SAP HANA provides limited support of user-defined functions that can return one or several scalar values.
These user-defined functions are usually written in L. If you use user-defined functions, limit them to the
For more information about creating stored procedures in a database, see the Functions and Procedures
section in the Designer Guide.
Related Information
To access SAP HANA data for SAP Data Services processes, configure an SAP HANA database datastore with
either a data source name (DSN) or a server name (DSN-less) connection.
You can optionally include secure socket layer (SSL) or transport layer security (TLS) for secure transfer of
data over a network, and you can use X.509 authentication instead of user name and password authentication.
Note
Support for SSL/TLS with a DSN connection begins in SAP Data Services version 4.2 SP7 (14.2.7.0).
Support for SSL/TLS with a DSN-less connection begins in SAP Data Services version 4.2 SP12 (14.2.12.0).
Note
Support for X.509 authentication is applicable for SAP HANA Server on premise 2.0 SP05 revision 56 and
above and SAP HANA Cloud. For SAP HANA client, X.509 is applicable for SAP HANA ODBC client 2.7 and
above.
When you create an SAP HANA datastore, and use SAP HANA data in data flows, Data Services requires
the SAP HANA ODBC driver. The following table lists the additional requirements for including additional
authentications.
For more information about SAP HANA, SSL/TLS, SAP CommonCrypto library, and settings for secure
external connections in the global.ini file, see the “SAP HANA Network and Communication Security”
section of the SAP HANA Security Guide.
Related Information
Perform the following prerequisite tasks before you create the SAP HANA datastore:
Note
If you plan to include SSL/TLS or X.509 authentication in the datastore, consider using the SAP HANA
ODBC driver version 2.9 or higher. Version 2.9 or higher is bundled with the SAP CommonCrypto
library, which is required for SSL/TLS and X.509.
• If you use a data source name (DSN), create a DSN connection using the ODBC Data Source Administrator
(Windows) or the SAP Data Services Connection Manager (Unix).
• If you include SSL/TLS encryption:
• Download the SAP CommonCrypto library and set the PATH environment variable as instructed in
Obtaining the SAP CommonCryptoLib file in Windows and Unix [page 60].
Note
If you use the SAP HANA ODBC driver version 2.9 or higher, set the following environment
variables:
• Windows: Set the SECUDIR and PATH environment variables to HANA Client directory with
Commoncrypto library on Windows system.
• Unix: Set the SECUDIR and LD_LIBRARY_PATH environment variables to HANA Client
directory with Commoncrypto library on Unix system.
• Set SSL/TLS options in the datastore editor (Windows) or the Data Services Connection Manager
(Unix).
• If you use X.509 authentication:
• Download the SAP CommonCrypto library and set the PATH environment variable as instructed in
Obtaining the SAP CommonCryptoLib file in Windows and Unix [page 60].
Note
If you've downloaded the SAP CommonCrypto library and set the PATH environment variable for
SSL/TLS, you don't have to repeat it for X.509 authentication.
Note
If you use the SAP HANA ODBC driver version 2.9 or higher, set the following environment
variables:
• Windows: Set the SECUDIR and PATH environment variables to HANA Client directory with
Commoncrypto library on Windows system.
• Unix: Set the SECUDIR and LD_LIBRARY_PATH environment variables to HANA Client
directory with Commoncrypto library on Unix system.
Note
Support for X.509 authentication is applicable for SAP HANA Server on premise 2.0 SP05 revision 56
and above and SAP HANA Cloud. For SAP HANA client, X.509 is applicable for SAP HANA ODBC client
2.7 and above.
Related Information
In addition to the common database datastore options, SAP Data Services requires that you set options
specific to SAP HANA, SSL/TLS, and X.509 authentication.
For descriptions of common database datastore options, and for steps to create a database datastore, see the
Datastores section of the Designer Guide.
The following table contains the SAP HANA-specific options in the datastore editor, including DSN and DSN-less settings.
Option Value
Use Data Source Name (DSN) Specifies whether to use a DSN (data source name) connection.
The following options appear when you select to use a DSN connection:
Data Source Name Select the SAP HANA DSN that you created previously (see
Prerequisites).
User Name, Password Enter the user name and password connected to the DSN.
Database server name Specifies the name of the computer where the SAP HANA
server is located.
Note
Port Enter the port number to connect to the SAP HANA Server.
The default is 30015.
Note
Advanced options
The following table contains descriptions for the advanced options in the SAP HANA datastore editor, including
options for SSL/TLS and X.509.
Note
Support for X.509 authentication is applicable for SAP HANA Server on premise 2.0 SP05 revision 56 and
above, and SAP HANA Cloud. For SAP HANA client, X.509 is applicable for SAP HANA ODBC client 2.7 and
above.
Option Description
Database name Optional. Enter the specific tenant database name. Applicable for SAP HANA version 2.0 SPS 01 MDC and later.
Additional connection parameters Enter information for any additional parameters that the data source ODBC
driver and database support. Use the following format:
<parameter1=value1;parameter2=value2>
Use SSL encryption Specifies whether to use SSL/TLS encryption for the datastore connection to the database.
Enabled only when you select Yes for Use SSL encryption or
Yes for Use X.509 authentication.
The following options are in the Encryption Parameters dialog box when you select Yes for Use SSL encryption:
Validate Certificate Specifies whether the software validates the SAP HANA
server SSL certificate. If you do not select this option, none
of the other SSL options are available to complete.
Crypto Provider Specifies the crypto provider used for SSL/TLS and X.509
communication. Data Services populates Crypto Provider
automatically with commoncrypto. SAP CommonCryptoLib
is the only supported cryptographic library for SAP HANA.
Certificate host Specifies the host name used to verify the server identity.
Choose one of the following actions:
Key Store Specifies the location and file name for your key store file.
You can also use a substitution parameter.
Note
X.509 key store Specifies the x509 key store file, which includes the X.509
client certificate. If the input isn't an absolute file path, Data
Services assumes that the file is in $SECUDIR. You can also
use a substitution parameter for X.509 key store.
Proxy group: For DSN-less connections only. For connecting to SAP HANA data through the SAP Cloud connector.
SAP cloud connector account Specifies the SAP Cloud connector account. Complete when
the SAP HANA data is in SAP Big Data Services.
Related Information
To use a DSN connection for an SAP HANA datastore, configure a DSN connection for Windows using the
ODBC Data Source Administrator.
Optionally include SSL/TLS encryption and the X.509 authentication settings when you configure the DSN.
Perform the prerequisites listed in SAP HANA datastore prerequisites [page 68].
To configure the DSN for SAP HANA on Windows, perform the following steps:
1. Access the ODBC Data Source Administrator either from the datastore editor in Data Services Designer or
directly from your Start menu.
2. In the ODBC Data Source Administrator, open the System DSN tab and select Add.
3. Choose the SAP HANA ODBC driver and select Finish.
The driver is listed only after you select it in the ODBC Drivers Selector utility.
If you leave Certificate host blank, Data Services uses the value in Database server name. If you don't
want the value from the Database server name, enter one of the following values:
• A string that contains the SAP HANA server hostname.
• The wildcard character “*”, so that Data Services doesn't validate the certificate host.
c. Specify the location and file name for your key store in Key Store.
d. Select OK.
11. Optional: Complete the following options for X.509 authentication:
a. Optional: In the TLS/SSL group, select Connect to the database using TLS/SSL.
b. Open Advanced ODBC Connection Property Setup and enter a string in Additional connection
properties:
To use X.509 authentication, enter the following string:
authenticationX509:<Location_of_PSE>\x509.pse
To also bypass certificate validation, enter the following string:
authenticationX509:<Location_of_PSE>\x509.pse
sslValidateCertificate:FALSE
c. Select OK.
Configure a DSN connection for an SAP HANA database datastore for Unix using the SAP Data Services
Connection Manager.
Optionally include SSL/TLS encryption and the X.509 authentication settings when you configure the DSN.
Perform the prerequisites listed in SAP HANA datastore prerequisites [page 68].
Use the GTK+2 library to make a graphical user interface for the Connection Manager. Connection Manager is
a command-line utility. To use it with a UI, install the GTK+2 library. For more information about obtaining and
installing GTK+2, see https://www.gtk.org/ .
The following instructions assume that you have the user interface for Connection Manager.
1. Export $ODBCINI to a file on the same computer as the SAP HANA data source. For example:
export ODBCINI=<dir_path>/odbc.ini
2. Start SAP Data Services Connection Manager by entering the following command:
$LINK_DIR/bin/DSConnectionManager.sh
3. Open the Data Sources tab and select Add to display the list of database types.
4. In the Select Database Type dialog box, select the SAP HANA database type and select OK.
The configuration page opens with some of the connection information automatically completed:
• Absolute location of the odbc.ini file
• Driver for SAP HANA
• Driver version
5. Complete the following options:
Note
If you enable X.509 authentication, the key store file must contain both the SSL server certificate
and the X.509 client certificate.
For descriptions of the DSN and SSL/TLS options, see SAP HANA datastore prerequisites [page 68].
7. Optional: To include X.509 authentication, complete the following X.509 option: for Specify the HANA User
Authentication Method, select 1: x.509.
8. Press Enter.
The system tests the connection. The message “Successfully edited database source.” appears when
you've successfully configured the DSN.
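For reference, the DSN entry that the Connection Manager writes to the $ODBCINI file resembles the
following sketch. The property names and default paths are assumptions based on the standard SAP HANA
ODBC client; the Connection Manager writes the actual entry from the values you enter, so treat this only as
an illustration of the result.
Sample Code
[HANA_DSN_EXAMPLE]
Driver=/usr/sap/hdbclient/libodbcHDB.so
ServerNode=<hana_host>:<port>
Encrypt=Yes
sslCryptoProvider=commoncrypto
sslValidateCertificate=Yes
sslHostNameInCertificate=<hana_host>
sslKeyStore=<path_to_keystore>.pse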
Related Information
SAP Data Services performs data type conversions when it imports metadata from SAP HANA sources or
targets into the repository and when it loads data into an external SAP HANA table or file.
Data Services uses its own conversion functions instead of conversion functions that are specific to the
database or application that is the source of the data.
Additionally, if you use a template table or Data_Transfer table as a target, Data Services converts from internal
data types to the data types of the respective DBMS.
Related Information
SAP Data Services converts SAP HANA data types when you import metadata from an SAP HANA source or
target into the repository.
Data Services converts data types back to SAP HANA data types when you load data into SAP HANA after
processing.
SAP HANA data type Data Services data type
integer int
tinyint int
smallint int
bigint decimal
char varchar
nchar varchar
varchar varchar
nvarchar varchar
float double
real real
double double
date date
time time
timestamp datetime
clob long
nclob long
blob blob
binary blob
varbinary blob
The following table shows the conversion from internal data types to SAP HANA data types in template tables.
Data Services data type SAP HANA data type
blob blob
date date
datetime timestamp
decimal decimal
double double
int integer
interval real
long clob/nclob
real decimal
time time
timestamp timestamp
varchar varchar/nvarchar
SAP Data Services supports spatial data, such as point, line, polygon, and collection, for specific databases.
When you import a table with spatial data columns, Data Services imports the spatial type columns as
character-based large objects (clob). The column attribute Native Type holds the value of the actual data
type in the database. For example, the native type for Oracle is SDO_GEOMETRY, for Microsoft SQL Server it is
geometry/geography, and for SAP HANA it is ST_GEOMETRY.
When reading a spatial column from SAP HANA, Data Services reads it as EWKT (Extended Well-Known Text).
When loading a spatial column to SAP HANA, the input must have the correct SRID (spatial reference identifier).
Note that creating template tables with a spatial column and SRID is not supported.
Limitations
• You cannot create template tables with spatial types because spatial columns are imported into Data
Services as clob.
• You cannot manipulate spatial data inside a data flow because the spatial utility functions are not
supported.
Loading complex spatial data from Oracle to SAP HANA [page 79]
Complex spatial data is data such as circular arcs and LRS geometries.
Related Information
Load spatial data from Oracle or Microsoft SQL Server to SAP HANA.
Learn more about spatial data by reading the SAP HANA documentation.
1. Import a source table from Oracle or Microsoft SQL Server to SAP Data Services.
2. Create a target table in SAP HANA with the appropriate spatial columns.
3. Import the SAP HANA target table into Data Services.
4. Create a data flow with an Oracle or Microsoft SQL Server source as reader.
Include any necessary transformations.
5. Add the SAP HANA target table as a loader.
Make sure not to change the data type of spatial columns inside the transformations.
6. Build a job that includes the data flow and run it to load the data into the target table.
Task overview: Using spatial data with SAP HANA [page 78]
Related Information
Loading complex spatial data from Oracle to SAP HANA [page 79]
Complex spatial data is data such as circular arcs and LRS geometries.
For example, in the SQL below, the table name is "POINTS" and the "geom" column contains the geospatial
data:
SELECT
SDO_UTIL.TO_WKTGEOMETRY(
SDO_GEOM.SDO_ARC_DENSIFY(
geom,
(MDSYS.SDO_DIM_ARRAY(
MDSYS.SDO_DIM_ELEMENT('X',-83000,275000,0.0001),
MDSYS.SDO_DIM_ELEMENT('Y',366000,670000,0.0001)
)),
'arc_tolerance=0.001'
)
)
from "SYSTEM"."POINTS"
For more information about how to use these functions, see the Oracle Spatial Developer's Guide on the
Oracle Web page at SDO_GEOM Package (Geometry) .
7. Build a job in Data Services that includes the data flow and run it to load the data into the target table.
Task overview: Using spatial data with SAP HANA [page 78]
Related Information
Once connected to Amazon Athena, you can browse metadata, import tables, read data from tables, and load
data into tables. Note that template table creation is not supported.
Note
Before you begin, make sure you have the necessary privileges for Athena and for the underlying S3
storage for the table data. For help with privileges, see the Amazon documentation or contact Amazon
Support.
Note
DELETE is not allowed for normal external tables. DELETE is transactional and is supported only for
Apache Iceberg tables.
Related Information
When you import metadata from an Athena table into the repository, SAP Data Services converts Athena data
types to Data Services native data types for processing.
Athena data type Data Services data type
Boolean Int
Smallint/Tinyint Int
Int Int
Bigint Decimal(19,0)
Double Double
Decimal(prec,scale) Decimal(prec,scale)
Char(n) Varchar(n)
Varchar(n) Varchar(n)
String Varchar(N)
Note
Date Date
Timestamp Datetime
Array Varchar(N)
Note
Map Varchar(N)
Note
Struct Varchar(N)
Note
Binary Varchar(255)
Note
ODBC capabilities
Options Supported
AutoCommit Yes
Absolute Yes
Ceiling Yes
Floor Yes
Round Yes
Truncate Yes
Sqrt Yes
Log Yes
Ln Yes
Power Yes
Mod Yes
Lower Yes
Upper Yes
Trim Yes
Substring Yes
Year/Month/Week/DayOfMonth/DayOfYear Yes
Sysdate/Systime Yes
Avg/Count/Sum Yes
Max/Min Yes
Ifthenelse Yes
For more information, see Value descriptions for capability and function options and ODBC Capability and
function options in the Designer Guide.
SAP Data Services provides access to various cloud databases and storages to use for reading or loading big
data.
Access various cloud databases through file location objects and file format objects.
SAP Data Services supports many cloud database types to use as readers and loaders in a data flow.
Related Information
In SAP Data Services, create a database datastore to access your data from Amazon Redshift. Additionally,
load Amazon S3 data files into Redshift using the built-in function load_from_s3_to_redshift.
Authentication
Select from two authentication methods when you configure your connection to Amazon Redshift:
• Standard: Use your user name and password to connect to Amazon Redshift.
• AWS IAM (Identity and Access Management): There are two types of IAM to select from:
• AWS IAM credentials: Use Cluster ID, Region, Access Key ID, and Secret Access Key for
authentication.
• AWS IAM profile: Use an AWS profile and enter your Cluster ID, Region, and profile name for
authentication.
Note
If you enter the server name in the datastore editor as a full server endpoint, Cluster ID and Region
aren't required. However, if you enter a server name as a gateway, Cluster ID and Region are required.
Encryption
You control whether you require SSL for your Amazon Redshift cluster database by enabling or disabling SSL.
In SAP Data Services, enable SSL in the datastore editor and then set the mode in the Encryption Parameters
dialog box. To learn how the SAP Data Services SSL mode settings affect the SSL mode setting in Amazon
Redshift cluster database, see the table in Using SSL and Trust CA certificates in ODBC in your Amazon
Redshift documentation.
Note
Beginning in SAP Data Services 4.3.02.01, use the options Use SSL encryption and Encryption parameters
to configure SSL encryption instead of entering options in the Additional connection parameters option.
Note
The SSL modes, verify-ca and verify-full, require the application to check the trusted root CA certificate.
The latest Redshift ODBC driver ships with bundled trusted root CA certificates. Check to ensure that your
installed ODBC driver includes trusted root CA certificates. For more information about certificates, see
Connect using SSL in the AWS Redshift documentation. If the root.crt file isn't in your ODBC driver
library directory (C:\Program Files\Amazon Redshift ODBC Driver\lib), you must download the
server certificate from Amazon. Save the certificate as root.crt in the ODBC library directory, or save the
server certificate in a special directory on your system.
Connection information
The Amazon Redshift datastore supports either a DSN or server-based connection. The following list contains
information about these connections:
• For a server-based connection on Windows, select the Redshift ODBC driver by using the
ODBCDriverSelector.exe utility located under %link_dir%\bin. Then, configure the Amazon Redshift
datastore and make sure that Use data source name (DSN) isn't selected. For information about
configuring DSN on Linux, see Configuring a Redshift DSN connection on Linux using ODBC [page 90].
• To configure a DSN connection, select the option Use data source name (DSN) in the datastore editor to
configure the DSN.
Related Information
Use an Amazon Redshift datastore to access data from your Redshift cluster datastore for processing in SAP
Data Services.
The following table describes the options specific for Redshift when you create or edit a datastore. For
descriptions of basic datastore options, see Common datastore options in the Designer Guide.
Option Description
Database server name Specifies the server name for the Redshift cluster database. Enter as a gateway
hostname, gateway IP address, or as an endpoint.
If you enter the server name as an endpoint, you don't have to complete the
Cluster ID and Region options.
User Name Specifies the Redshift cluster database user name. User name is required
regardless of the authentication method you choose.
Note
The cluster database user name isn't the same as your AWS Identity and
Access Management (IAM) user name.
Password Specifies the password that corresponds to the entered User Name. Only applicable when you select Standard for Authentication Method.
Additional connection parameters Specifies additional connection information, such as to use integer for boolean
data type.
Example
The Amazon Redshift ODBC driver uses the string (char) data type. To
insert the boolean data type as an integer, enter BoolsAsChar=0. After
you re-import the table, the Bool data type column shows int.
Use SSL encryption Specifies to use SSL encryption when connecting to your Redshift cluster
database. When you set to Yes, also open the Encryption parameters dialog and
select the SSL mode.
Encryption parameters Specifies the level of SSL encryption on the client side.
Select the text box to open the Encryption Parameters dialog. Choose from one
of the following modes:
Note
SSL settings for Amazon Redshift affect some of the SSL modes. For more
information, see
Authentication Method Specifies how to authenticate the connection to the Amazon Redshift cluster database:
• Standard: Uses your user name and password to authenticate. You must
complete the options User Name and Password.
• AWS IAM Credentials: Uses your AWS Cluster ID, Region, AccessKeyId, and SecretAccessKey to
authenticate.
• AWS Profile: Uses your AWS Cluster ID, Region, and Profile Name to authenticate.
Cluster ID Specifies the ID assigned to your Amazon Redshift cluster. Obtain from the
AWS Management Console.
Not applicable when the Database server name is an endpoint, which includes
the cluster ID.
Region Specifies the region assigned to your Amazon Redshift cluster. Obtain from the
AWS Management Console.
Not applicable when the Database server name is an endpoint, which includes
the region.
For more information about AWS regions, see your Amazon Redshift documentation.
AccessKeyId Specifies your AWS IAM access key ID. Obtain from the AWS Management
Console.
Note
The datastore editor encrypts the access key ID when you enter it so no
one else can see or use it.
Not applicable when you select AWS Profile for Authentication Method.
SecretAccessKey Specifies your AWS IAM secret access key. Obtain from the AWS Management
Console.
Note
The secret access key appears the first time you create an access key ID in
AWS. After that, you must remember it because AWS won't show it again.
The datastore editor encrypts the secret access key when you enter it so
no one else can see or use it.
Not applicable when you select AWS Profile for Authentication Method.
Profile Name Specifies your AWS profile name when you use AWS IAM Profile for
Authentication Method. For more information about using an authentication
profile, see your AWS Redshift documentation.
Note
Not applicable when you select AWS IAM Credentials for Authentication
Method.
Related Information
For a data source name (DSN) connection, use the DS Connection Manager to configure Amazon Redshift as a
source.
Before performing the following task, download and install the Amazon Redshift ODBC driver for Linux.
For details and the latest ODBC driver download, see Configuring an ODBC Connection on the Amazon
website. For complete information about installing and configuring the ODBC driver, download the PDF
Amazon Redshift ODBC Data Connector Installation and Configuration Guide; a link to the guide is in the topic
“Configuring an ODBC Connection”.
To configure a DSN connection on Linux for an Amazon Redshift cluster database, install the downloaded driver
and then perform the following steps:
1. Configure the following files following directions in the Amazon Redshift ODBC Data Connector Installation
and Configuration Guide:
• amazon.redshiftodbc.ini
• odbc.ini
• odbcinst.ini
If you installed the ODBC driver to the default location, the files are in /opt/amazon/redshiftodbc/lib/64.
2. Open the amazon.redshiftodbc.ini and add a line at the end of the file to point to the
libodbcinst.so file.
Example
/home/ec2-user/unixODBC/lib/libodbcinst.so
3. Find the [Driver] section in the amazon.redshiftodbc.ini file, and set DriverManagerEncoding to
UTF-16.
Example
[Driver]
DriverManagerEncoding=UTF-16
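Putting steps 2 and 3 together, the end of the amazon.redshiftodbc.ini file might look like the following
sketch. The ODBCInstLib key name is an assumption based on the driver's standard configuration file;
confirm the exact key in the Amazon Redshift ODBC Data Connector Installation and Configuration Guide.
Example
[Driver]
DriverManagerEncoding=UTF-16
ODBCInstLib=/home/ec2-user/unixODBC/lib/libodbcinst.so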
The SAP Data Services installer places the DS Connection Manager in $LINK_DIR/bin/DSConnectionManager.sh by default. Use the following example as a guide.
Example
Specify the DSN name from the list or add a new one:
<DSN_Name>
Specify the Unix ODBC Lib Path:
/build/unixODBC-232/lib
Specify the Driver:
Related Information
Option descriptions for using an Amazon Redshift database table as a source in a data flow.
When you use an Amazon Redshift table as a source, the software supports the following features:
The following list contains behavior differences in Data Services when you use certain functions with
Amazon Redshift:
• When using add_month(datetime, int), pushdown doesn't occur if the second parameter is not an
integer data type.
• When using cast(input as ‘datatype’), pushdown doesn't occur if you use the real data type.
For more about push down functions, see SAP Note 2212730 , “SAP Data Services push-down operators,
functions, and transforms”. Also read about maximizing push-down operations in the Performance
Optimization Guide.
The following table lists source options when you use an Amazon Redshift table as a source:
Option Description
Table name Name of the table that you added as a source to the data flow.
Table owner Owner that you entered when you created the Redshift table.
Database type Database type that you chose when you created the datastore. You cannot change
this option.
The Redshift source table also uses common table source options.
For more information about pushdown operations and viewing optimized SQL, see the Performance
Optimization Guide.
Related Information
Descriptions of options for using an Amazon Redshift table as a target in a data flow.
• input keys
• auto correct
• data deletion from a table before loading
Note
The Amazon Redshift primary key is informational only and the software does not enforce key constraints
for the primary key. Be aware that using SELECT DISTINCT may return duplicate rows if the primary key is
not unique.
Note
The Amazon Redshift ODBC driver does not support parallel loading via ODBC into a single table. Therefore,
the Number of Loaders option in the Options tab is not applicable for a regular loader.
Bulk load Select to use bulk loading options to write the data.
Mode Select the mode for loading data in the target table:
Note
• Truncate: Deletes all existing records in the table, and then adds new records.
S3 file location Enter or select the path to the Amazon S3 configuration file. You can enter a variable for
this option.
Maximum rejects Enter the maximum number of acceptable errors. After the maximum is reached, the
software stops Bulk loading. Set this option when you expect some errors. If you enter 0,
or if you do not specify a value, the software stops the bulk loading when the first error
occurs.
If you enter a Text delimiter other than a single quote (‘) together with a comma (,) for the
Column delimiter, Data Services treats the data file as a .csv file.
Generate files only Enable to generate data files that you can use for bulk loading.
When enabled, the software loads data into data files instead of the target in the data flow.
The software writes the data files into the bulk loader directory specified in the datastore
definition.
If you do not specify a bulk loader directory, the software writes the files to
<%DS_COMMON_DIR%>\log\bulkloader\<tablename><PID>. Then you manually copy the files to the
Amazon S3 remote system.
Clean up bulk loader directory after load Enable to delete all bulk load-oriented files from the bulk load directory and the Amazon S3 remote system after the load is complete.
Parameters Allows you to enter Amazon Redshift COPY command data conversion parameters, such as
escape, emptyasnull, blanksasnull, ignoreblanklines, and so on (see the sample after this table). These
parameters define how to insert data into a Redshift table. For more information about the parameters, see
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html#r_COPY-syntax-overview-optional-parameters .
General settings
Option Description
Number of loaders Sets the number of threads to generate multiple data files
for a parallel load job. Enter a positive integer for the number
of loaders (threads).
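For illustration only, and assuming that the Parameters option accepts the conversion parameters just as they
would appear in the Redshift COPY statement, an entry might look like the following. Verify the exact syntax
against the AWS COPY documentation linked above.
Example
ESCAPE EMPTYASNULL BLANKSASNULL IGNOREBLANKLINES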
Related Information
SAP Data Services converts Redshift data types to the internal data types when it imports metadata from a
Redshift source or target into the repository.
The following table lists the internal data type that Data Services uses in place of the Redshift data type.
Redshift data type Data Services data type
smallint int
integer int
bigint decimal(19,0)
decimal decimal
real real
float double
boolean varchar(5)
char char
Note
The char data type doesn't support multibyte characters. The maximum size is 4096 bytes.
nchar char
varchar varchar
nvarchar varchar
Note
The varchar and nvarchar data types support UTF-8 multibyte characters. The size is the
number of bytes, and the maximum size is 65535 bytes.
Caution
If you try to load multibyte characters into a char or nchar data type column, Redshift
issues an error. Redshift internally converts nchar and nvarchar data types to char and
varchar. The char data type in Redshift doesn't support multibyte characters. Use overflow
to catch the unsupported data or, to avoid this problem, create a varchar column instead of
using the char data type.
date date
timestamp datetime
text varchar(256)
bpchar char(256)
The following data type conversions apply when you use a Redshift template table as the target.
Data Services data type Redshift data type
blob varchar(max)
date date
datetime datetime
decimal decimal
int integer
interval float
long varchar(8190)
real float
time varchar(25)
timestamp datetime
varchar varchar/nvarchar
char char/nchar
Related Information
Developers and administrators who use Microsoft SQL Server can store on-premise SQL Server workloads on
an Azure virtual machine in the cloud.
The Azure virtual machine supports both Unix and Windows platforms.
Related Information
To move blobs to or from a Microsoft Azure container, use built-in functions in a script object and a file location
object.
Note
Before you perform the following steps, create a file location object for the Azure container. Also create an
unstructured binary text file format that describes the blob in the Azure container. Use the file format in a data
flow to perform extra operations on the blob.
Use an existing Azure container or create a new one. Because SAP Data Services doesn't internally manipulate
the blobs in an Azure container, the blobs can be of any type. Currently, Data Services supports the block blob
in the container storage type.
To use built-in functions to upload files to a container storage blob in Microsoft Azure, perform the following
high-level steps:
1. Create a storage account in Azure and take note of the primary shared key.
To move files between remote and local directories, use the following scripts:
• copy_to_remote_system
• copy_from_remote_system
Example
To access a subfolder in your Azure container, specify the subfolder in the following script:
The script copies all of the files from the local directory specified in the file location object to the
container specified in the same object. When you include the remote directory and subfolder in the
script, the function copies all of the files from the local directory to the subfolder specified in the script.
4.1.3 Google BigQuery
The Google BigQuery datastore contains access information and passwords so that the software can open your
Google BigQuery account on your behalf.
After accessing your account, SAP Data Services can load data to or extract data from your Google BigQuery
projects:
• Extract data from a Google BigQuery table to use as a source for Data Services processes.
• Load generated data from Data Services to Google BigQuery for analysis.
• Automatically create and populate a table in your Google BigQuery dataset by using a Google BigQuery
template table.
You can also reference the Google BigQuery datastore in the built-in function load_from_gcs_to_gbq, to
load data from your GCS to a Google BigQuery table. For details about the function, see the Reference Guide.
Note
We recommend that you create a Google BigQuery ODBC datastore instead of the Google BigQuery
datastore. The Google BigQuery ODBC datastore uses the Magnitude Simba ODBC driver for BigQuery,
which supports standard SQL and more data types than the Google BigQuery datastore. For more
information, see SAP Note 3241713 .
For complete information about how Data Services supports Google BigQuery, see the Supplement for Google
BigQuery.
Related Information
With a Google BigQuery ODBC datastore, make ODBC calls to your Google BigQuery data sets to download,
process, and upload data in SAP Data Services.
To access the data in your Google BigQuery account, the datastore uses the Simba ODBC driver for Google
BigQuery, which supports the OAuth 2.0 protocol for authentication and authorization. Configure the driver to
provide your credentials and authenticate the connection to the data using either a Google user account or a
Google service account.
Note
Beginning with Data Services 4.2.13, we recommend that you create a Google BigQuery ODBC datastore
instead of the Google BigQuery datastore. The Google BigQuery ODBC datastore uses the Magnitude
Simba ODBC driver for BigQuery, which supports standard SQL and more data types than the Google
BigQuery datastore.
Note
Data Services supports Google BigQuery ODBC datastore on Windows and Linux platforms only.
For information about how Data Services supports Google BigQuery ODBC, see the Supplement for Google
BigQuery.
Related Information
Access your data lake database in an SAP HANA Cloud by creating a database datastore.
SAP HANA Cloud, data lake consists of a data lake Relational Engine and data lake Files:
• The data lake Relational Engine provides high-performance analysis for petabyte volumes of relational
data.
• Data lake Files provide managed access to structured, semistructured, and unstructured data that is
stored as files in the data lake.
For more information about SAP HANA Cloud, data lake, see the documentation on the SAP Customer Portal.
SAP HANA Cloud, data lake supports bulk loading and pushdown to the database level. Further, you don't need
to use VPN or VPC to connect to your data lake database.
Prerequisites
Download the SAP HANA client server package from the SAP Software center at https://
launchpad.support.sap.com/#/softwarecenter . Enter the search term HANADLCLIENT. Select to download
the SAP HANA data lake client package based on your operating system and the applicable version. The
package includes the native ODBC driver, which you must configure using either the ODBC Driver Selector
(Windows) or the DS Connection Manager (Linux).
For more information about installing the SAP HANA Cloud data lake client, see the SAP HANA Cloud, Data
Lake Client Interfaces guide.
Datastore options for SAP HANA Cloud, data lake [page 100]
The datastore for SAP HANA Cloud, data lake contains the connection information that SAP Data
Services uses to connect to your database.
Related Information
The datastore for SAP HANA Cloud, data lake contains the connection information that SAP Data Services
uses to connect to your database.
Create and configure a data lake datastore following the basic instructions in Creating a database datastore in
the Designer Guide.
The following table contains the options to complete that are specific for data lake.
Use SSL encryption SSL/TLS encryption is required, therefore you can't change
this value from Yes.
TLS options Enter the SSL/TLS options. Enter the following string:
tls_type=rsa;direct=yes
• TLS_Type: RSA is the only supported encryption type
for the connection.
• DIRECT: Yes is the only acceptable value because DIRECT through a proxy is the only supported
connection type.
Parent topic: SAP HANA Cloud, data lake database [page 99]
Related Information
Configure the ODBC driver for connecting to your SAP HANA Cloud, data lake database for Linux using the DS
Connection Manager. Also configure the DSN in the DS Connection Manager, when applicable.
The following example shows the steps to configure the ODBC driver for SAP HANA Cloud, data lake using
the SAP Data Services Connection Manager. For details about using the Connection Manager, see Using the
Connection Manager in the Administrator Guide.
Example
The following sample code shows the options in the DS Connection Manager to configure the ODBC driver
for a DSN-less connection. Options for which you supply your own information are shown in <> in bold.
When you use a data source name (DSN) connection, configure the DSN in the DS Connection Manager as
shown in the following example:
Example
Parent topic: SAP HANA Cloud, data lake database [page 99]
Related Information
Datastore options for SAP HANA Cloud, data lake [page 100]
• Import tables
• Read or load Snowflake tables in a data flow
• Create and load data into template tables
• Browse and import the tables located under different schemas (for example, Netezza)
• Preview data
• Push down base SQL functions and Snowflake-specific SQL functions (see SAP Note 2212730 )
• Bulkload data (possible through AWS S3 File Location or Azure Cloud Storage File Location)
For instructions to create a target template table, see the Template tables section of the Designer Guide.
Related Information
Use the ODBC Drivers Selector utility bundled with SAP Data Services to configure Snowflake as a source for
SAP Data Services.
For more information about the ODBC Drivers Selector utility, see Using the ODBC Drivers Selector for
Windows in the Administrator Guide.
Before you start to configure the connection to Snowflake, make sure you meet the following prerequisites:
• You have correctly installed the Snowflake ODBC Driver from the Snowflake official website.
See the SAP Data Services Product Availability Matrix (PAM) for the latest client version that we support.
For more information about the Snowflake ODBC driver, see the Snowflake User Guide on the Snowflake
website.
• You have database account credentials for the ODBC connection to Snowflake.
1. Open the Data Services Designer and perform the steps in Creating a database datastore.
2. Select the correct values for the datastore options.
• Datastore Type: Database
• Database Type: Snowflake
• Database Version: Select the latest compatible database version.
• Use data source name (DSN): Leave this option unchecked.
For descriptions of the remaining options for the Snowflake datastore, see Common datastore options in
the Designer Guide and Snowflake datastore [page 111].
3. Click Apply or OK.
Use the DS Connection Manager to configure Snowflake as a source for SAP Data Services.
Before you perform the steps in this topic, read Configure drivers with data source name (DSN) connections in
the Administrator Guide.
For details about using the Connection Manager, see Using the Connection Manager in the Administrator
Guide.
Preparation
Before using an ODBC driver to connect to Snowflake, make sure you perform the following required steps:
• Download and install the Snowflake ODBC driver from the Snowflake website.
See the SAP Data Services Product Availability Matrix (PAM) for the latest supported client version.
For more information about the Snowflake ODBC driver, see the Snowflake User Guide on the Snowflake
website.
• Obtain database account credentials for the ODBC connection to Snowflake.
• Install the unixODBC tool (version 2.3 or higher) on your Linux system with the software package tool
‘yum’ for Red Hat or ‘zypper’ for SUSE Linux.
• Set the following system environment variables:
• ODBCSYSINI=<Path where you place the $ODBCINI and $ODBCINST files>
• ODBCINI=<DSN configuration file>
• ODBCINST=<Driver configuration file>
For example:
export ODBCSYSINI=/home/dbcfg/
export ODBCINI=/home/dbcfg/odbc.ini
export ODBCINST=/home/dbcfg/odbcinst.ini
DSN sample in $ODBCINI:
DATABASE=DB_EXAMPLE
SCHEMA=
WAREHOUSE=
ROLE=
AUTHENTICATOR=SNOWFLAKE_JWT
PRIV_KEY_FILE=/home/dbcfg/snowflake/rsa_key.p8
PRIV_KEY_FILE_PWD=******
Driver sample in $ODBCINST:
[SF_EXAMPLE]
driver=/usr/lib64/snowflake/odbc/lib/libSnowflake.so
unixODBC=/usr/lib64
Note
• The job engine internally uses the bundled file $LINK_DIR/bin/ds_odbc.ini to get to the ODBC
Driver Manager, so don't point the ODBCINI environment variable at that file.
• The two files that are referenced by $ODBCINI and $ODBCINST must be accessible, readable, and
writable by SAP Data Services.
***************************************************
SAP Data Services Connection Manager
***************************************************
------------------Start Menu------------------
Connection Manager is used to configure Data Sources or Drivers.
1: Configure Data Sources
2: Configure Drivers
q: Quit Program
Select one command:'1'
2
5. Add a new driver (enter a) or edit an existing driver (enter e).
1) MySQL Driver for version [8.x, 5.7, 5.6, 5.5, 5.1, 5.0]
2) HANA Driver for version [2, 1]
3) Teradata Driver for version [17.10, 16.20, 16.00, 15.10, 15.00,
14.10, 14.00, 13.10, 13.0, 12]
4) Netezza Driver for version [11, 7, 6, 5, 4]
5) Sybase IQ Driver for version [15, 16]
6) Informix Driver for version [14, 12, 11]
7) DB2 UDB Driver for version [11, 10, 9]
8) Oracle Driver for version [19, 18, 12, 11, 10, 9]
9) SQL Anywhere Driver for version [17, 16, 12]
10) Snowflake Driver for version [3.x, 5.x]
11) Hive Server Driver for version [2.x.y, 2.1.2, 2.1.1]
12) PostgreSQL Driver for version [10.x, 12.x, 13.x, 14.x]
13) Google BigQuery Driver
14) Amazon Redshift Driver for version [8.x]
15) SAP Hana Data Lake ODBC Driver
7. Complete the datastore options.
For more information about the Snowflake security-related parameters, see the Snowflake User Guide on
the Snowflake website.
The following examples show the steps necessary to configure the ODBC driver for Snowflake using the SAP
Data Services Connection Manager.
********************************
Configuration for Snowflake
********************************
The ODBC inst file is <***/***>/odbcinst.ini.
Specify the Driver Version:'3.x'
5.x
Specify the Driver Name:
SF_EXAMPLE
Specify the Driver:
/usr/lib64/snowflake/odbc/lib/libSnowflake.so
Specify the Host Name:
******.snowflakecomputing.com
Specify the Port:'443'
***************************************************
SAP Data Services Connection Manager
***************************************************
------------------Start Menu------------------
Connection Manager is used to configure Data Sources or Drivers.
1: Configure Data Sources
2: Configure Drivers
q: Quit Program
Select one command:'1'
1
5. Add a new database source (enter a) or edit an existing database source (enter e).
The following example shows the steps to configure a data source for Snowflake using the SAP Data Services
Connection Manager.
********************************
Configuration for Snowflake
********************************
Specify the DSN name from the list or add a new one:
SF_DSN_EXAMPLE
Specify the User Name:
USER_EXAMPLE
Type database password:(no echo)
Retype database password:(no echo)
Specify the Unix ODBC Lib Path:
/usr/lib64
Specify the Driver Version:'3.x'
5.x
Specify the Driver:
/usr/lib64/snowflake/odbc/lib/libSnowflake.so
Specify the Host Name:
******.snowflakecomputing.com
Specify the Port:'443'
Specify the Database:
DB_EXAMPLE
Testing connection...
Successfully added driver.
Press Enter to go back to the Main Menu.
Note
After configuration, the information will be reflected in the local file $ODBCINI and $LINK_DIR/bin/
ds_odbc.ini.
For steps to define a database datastore, see Creating a database datastore in the Designer Guide.
For information about Snowflake database datastore options, see Snowflake datastore [page 111].
Related Information
For information about creating a database datastore and descriptions for common datastore options, see the
Designer Guide.
Option Description
Note
Private Key File Password The password used to protect the private key file. A password is required only for
encrypted private keys.
JWT Timeout (in seconds) JSON Web Token (JWT) timeout value. The default is 30 seconds.
For more detailed information about key pair authentication, see the Snowflake documentation.
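As a sketch of the key pair setup, the following OpenSSL commands create an encrypted private key and the
matching public key, using the rsa_key.p8 file name that appears in the DSN sample earlier in this section.
The commands mirror the general procedure in the Snowflake documentation; register the public key with
your Snowflake user as described there.
Sample Code
# Generate an encrypted private key; the password you set here is the value
# for Private Key File Password.
$ openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 aes256 -inform PEM -out rsa_key.p8
# Derive the public key to register with your Snowflake user.
$ openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub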
Related Information
When you use a Snowflake table as a source, the software supports the following features:
For more about push down functions, see SAP Note 2212730 , “SAP Data Services push-down operators,
functions, and transforms”. Also read about maximizing push-down operations in the Performance
Optimization Guide.
The following table lists source options when you use a Snowflake table as a source:
Option Description
Table name Name of the table that you added as a source to the data flow.
Table owner Owner that you entered when you created the Snowflake table.
Database type Database type that you chose when you created the datastore. You cannot change
this option.
Performance settings
Option Description
Join Rank Specifies the rank of the data file relative to other tables and
files joined in a data flow.
The software joins sources with higher join ranks before
joining sources with lower join ranks.
For new jobs, specify the join rank only in the Query transform editor.
Cache Specifies whether the software reads the data from the
source and loads it into memory or pageable cache.
For new jobs, specify the cache only in the Query transform
editor.
Array fetch size Indicates the number of rows retrieved in a single request
to a source database. The default value is 1000. Higher numbers reduce requests, lowering network traffic
and possibly improving performance. The maximum value is 5000.
Related Information
SAP Data Services converts Snowflake data types to the internal data types when it imports metadata from a
Snowflake source or target into the repository.
The following table lists the internal data type that Data Services uses in place of the Snowflake data type.
Snowflake data type Data Services data type
byteint/tinyint decimal(38,0)
smallint decimal(38,0)
int/integer decimal(38,0)
bigint decimal(38,0)
float double
double double
real double
varchar varchar
boolean int
binary blob
varbinary blob
datetime/timestamp datetime
date date
time time
If Data Services encounters a column that has an unsupported data type, it does not import the column.
However, you can configure Data Services to import unsupported data types by checking the Import
unsupported data types as VARCHAR of size option in the datastore editor dialog box.
Related Information
When you use a Snowflake table as a target in SAP Data Services, complete the target options.
After you add the Snowflake table as a target in a data flow, open the Bulk loader tab and complete the options
as described in the following table.
Bulk load Select to use bulk loading options to write the data.
Mode Select the mode for loading data in the target table:
Note
Generate files only Enable to generate data files that you can use for bulk loading.
When enabled, the software loads data into data files instead
of the target in the data flow. The software writes the data
files into the bulk loader directory specified in the datastore
definition.
Clean up bulk loader directory after load Enable to delete all bulk load-oriented files from the bulk
load directory after the load is complete.
Option Description
Column comparison Specifies how the software maps the input columns to persistent cache table columns.
Number of loaders Sets the number of threads to generate multiple data files
for a parallel load job. Enter a positive integer for the number
of loaders (threads).
Also complete error handling and transaction control options as described in the following topics in the
Reference Guide:
Access various cloud storages through file location objects and gateways.
File location objects specify the file transfer protocol so that SAP Data Services safely transfers data from
server to server.
For information about the SAP Big Data Services to access Hadoop in the cloud, see Supplement for Big Data
Services.
Related Information
Amazon Simple Storage Service (S3) is a product of Amazon Web Services that provides scalable storage in
the cloud.
Store large volumes of data in an Amazon S3 cloud storage account. Then use SAP Data Services to securely
download your data to a local directory. Configure a file location object to specify both your local directory and
your Amazon S3 directory.
The following table describes the Data Services built-in functions specific for Amazon S3.
Built-In Functions
Function Description
Related Information
When you configure a file location object for Amazon S3, complete all applicable options, especially the options
specific to Amazon S3.
Use a file location object to access data or upload data stored in your Amazon S3 account. To view options
common to all file location objects, see the Reference Guide. The following table describes the file location
options that are specific to the Amazon S3 protocol.
You must have "s3:ListBucket" rights in order to view a list of buckets or a special bucket.
Option Description
• None
• Amazon S3-Managed Keys
• AWS KMS-Managed Keys
• Customer-Provided Keys
AWS KMS Key ID Specifies the encryption key that you create and manage via
the Encryption Keys section in the AWS IAM console.
AWS KMS Encryption Context Specifies the encryption context of the data.
Example
Enter
eyJmdWxsTmFtZSI6ICJKb2huIENvbm5vciIgf
SANCg== in the encryption context option.
Communication Protocol/Endpoint URL Specifies the communication protocol you use with S3.
• http
• https
• Enter the endpoint URL.
Example
Request Timeout (ms) Specifies the time to wait before the system times out while
trying to complete the request to the specified Amazon S3
account in milliseconds (ms).
The system logs the timeout value in the trace file when you
configure the job to log trace messages:
Note
Connection Retry Count Specifies the number of times the software tries to upload or
download data before stopping the upload or download.
Batch size for uploading data, MB Specifies the size of the data transfer to use for uploading
data to S3.
Batch size for downloading data, MB Specifies the size of the data transfer Data Services uses to
download data from S3.
Storage Class Specifies the S3 cloud storage class to use to restore files.
Note
For more information about the storage classes, see the Amazon AWS documentation.
Remote directory Optional. Specifies the name of the directory for Amazon S3
to transfer files to and from.
Local directory Optional. Specifies the name of the local directory to use to
create the files. If you leave this field empty, Data Services
uses the default Data Services workspace.
Proxy host, port, user name, password Specifies proxy information when you use a proxy server.
For information about load_from_s3_to_redshift and other built-in functions, see the Reference Guide.
For more information about file location common options, see the Reference Guide.
In SAP Data Services Designer, we refer to Azure Blob Storage as Azure Cloud Storage.
To access your Azure Blob Storage from Data Services, create an Azure Cloud Storage file location object.
The following table describes the Data Services built-in functions specific for Azure Cloud Storage.
Built-In Functions
Function Description
load_from_azure_cloudstorage_to_synapse Loads data from your Azure Cloud Storage to your Azure
Synapse Analytics workspace.
For details about the built-in function, see the “Data Services Functions and Procedures” section in the
Reference Guide.
Related Information
Use a file location object to download data from or upload data to your Azure Blob Storage account.
The following table lists the file location object descriptions for the Azure Cloud Storage protocol. To view
options common to all file location objects, see the Reference Guide.
Note
To work with Azure Blob data in your Data Lake Storage account, see Azure Data Lake Storage [page 130].
Option Description
Protocol Choose Azure Cloud Storage for the type of file transfer protocol.
Account Name Enter the name of your Azure storage account.
Data Services supports only one type for Azure Blob Storage: Container.
Authorization Type Choose the type of shared access signature (SAS) authorization for Azure storage.
Note
Shared Access Signature URL Enter the access URL that enables access to a specific file
(blob) or blobs in a container. Azure recommends that you
use HTTPS instead of HTTP.
Note
Account Shared Key Specify the Account Shared Key. Obtain a copy from the
Azure portal in the storage account information.
Note
Connection Retry Count Specify the number of times the computer tries to create a
connection with the remote server after a connection fails.
Request Timeout (ms) Specifies the time to wait before the system times out while
trying to complete the request to the specified Azure Cloud
Storage in milliseconds (ms).
The system logs the timeout value in the trace file when you
configure the job to log trace messages:
Batch size for uploading data, MB Specify the maximum size of a data block per request when
transferring data files. The limit is 100 MB.
Caution
Batch size for downloading data, MB Specify the maximum size of a data range to be downloaded
per request when transferring data files. The limit is 4 MB.
Caution
Number of threads Specify the number of upload threads and download threads
for transferring data to Azure Cloud Storage. The default
value is 1.
Remote Path Prefix Optional. Specify the file path for the remote server, excluding the server name. You
must have permissions for this directory.
If you leave this option blank, the software assumes that the remote path prefix is the
user home directory used for FTP.
Example
Local Directory Specify the path of your local server directory for the file upload or download. The
directory must exist, must be located where the Job Server resides, and must be a
directory for which you have appropriate permissions.
Proxy Host, Port, User Name, Password Optional. Specify the proxy information when you use a
proxy server.
Related Information
The number of threads is the number of parallel uploaders or downloaders to be run simultaneously when you
upload or download blobs.
The Number of threads setting affects the efficiency of downloading and uploading blobs to or from Azure
Cloud Storage.
To determine the number of threads to set for the Azure file location object, base the number of threads on the
number of logical cores in the processor that you use:
Number of logical cores Number of threads
8 8
16 16
The software automatically readjusts the number of threads based on the blob size you are uploading or
downloading. For example, when you upload or download a small file, the software may use fewer
threads and use the block or range size you specified in the Batch size for uploading data, MB or
Batch size for downloading data, MB options.
When you upload a large file to an Azure container, the software may divide the file into the same number
of lists of blocks as the setting you have for Number of threads in the file location object. For example, when
the Number of threads is set to 16 for a large file upload, the software divides the file into 16 lists of blocks.
Additionally, each thread reads the blocks simultaneously from the local file and also uploads the blocks
simultaneously to the Azure container.
When all the blocks are successfully uploaded, the software sends a list of commit blocks to the Azure Blob
Service to commit the new blob.
If there is an upload failure, the software issues an error message. If they already existed before the upload
failure, the blobs in the Azure container stay intact.
When you set the number of threads correctly, you may see a decrease in upload time for large files.
When you download a large file from the Azure container to your local storage, the software may divide the
file into the number of ranges set in Number of threads in the file location object. For example, when
Number of threads is set to 16, the software divides the file into 16 ranges.
When your software downloads a blob from an Azure container, it creates a temporary file to hold all of the
threads. When all of the ranges are successfully downloaded, the software deletes the existing file from your
local storage if it existed, and renames the temporary file using the name of the file that was deleted from local
storage.
If there is a download failure, the software issues an error message. The existing data in local storage stays
intact if it existed before the download failure.
When you set the number of threads correctly, you may see a decrease in download time.
Related Information
Use an Azure Data Lake Storage file location object to download data from and upload data to your Azure Data
Lake Storage.
For information about using variables and parameters to increase the flexibility of jobs, work flows, and data
flows, see the Designer Guide.
Data Services provides a built-in function that loads data from your Azure Data Lake Storage to Azure Synapse
Analytics. The function is load_from_azure_datalakestore_to_synapse. For details about the built-in
function, see the Reference Guide.
Use Azure Data Lake Storage Gen1 for big data analytic processing. To create a file location object for Gen1,
make sure that you select Gen1 for the Version option.
Azure Data Lake Storage Gen2 has all of the capabilities of Gen1 plus it’s built on Azure Blob storage. To create
a file location object for Gen2, make sure that you select Gen2 for the Version option.
The following table describes the file location options for the Azure Data Lake Storage protocol. The table
combines options for Azure Data Lake Storage Gen1 and Gen2. The table contains a Version column that
indicates the applicable version for the option.
Note
For descriptions of the common file location options, see the Reference Guide.
Account Shared Key When Authorization Type is set to Shared Key, enter the account shared key that you obtain from your administrator. (Gen2)
Request Timeout (ms) Specifies the time to wait before the system times out while trying to complete the
request to the specified Azure Data Lake Storage in milliseconds (ms). (Gen2)
The system logs the timeout value in the trace file when you configure the job to log
trace messages:
Data Lake Store name Name of the Azure Data Lake Storage to access. (Both)
Service Principal ID Obtain from your Azure Data Lake Storage administrator. (Gen1)
Tenant ID Obtain from your Azure Data Lake Storage administrator. (Gen1)
Password Obtain from your Azure Data Lake Storage administrator. (Gen1)
Batch size for uploading data (MB) Maximum size of a data block to upload per request when transferring data files. (Both)
Caution
Batch size for downloading data (MB) Maximum size of a data range to download per request when transferring data files. (Both)
Caution
Tip
Remote path prefix Directory path for your files in the Azure Data Lake Storage. (Both)
Example
adl://<yourdatastoreName>.azuredatalakestore.net/<FolderName>/<subFolderName>
<FolderName>/<subFolderName>
Local directory Path to the local directory for your local Data Lake Storage data. (Both)
Container Container name. Must be lowercase and have a length of more than 3 characters. (Gen2)
Related Information
A Google Cloud Storage (GCS) file location contains file transfer protocol information for moving large data
files, 10 MB and larger, between GCS and SAP Data Services.
To work with your data from the Google Cloud Platform, create a file location object that contains your account
connection information and Google file transfer protocol. Use the file location in the following ways:
• As a source, select the GCS file location name in the Cloud Storage File Location object option in a Google
BigQuery datastore configuration.
• As a target, enter the location and file name for the GCS file location object in the Location or File location
option in a target editor.
You can also use the information from the GCS file location in the built-in function load_from_gcs_to_gbq,
to load data from your GCS to a Google BigQuery table. The function includes the Google BigQuery datastore,
which names the GCS file location. For details about the function, see the Reference Guide.
Uploading to GCS can use a large amount of local disk space. When you upload data to GCS in flat files, XML
files, or JSON files, consider setting the number of rows in the target editor option Batch size (rows). The option
can reduce the amount of local disk space that Data Services uses to upload your data.
Note
If you set Location or File location to a location and file name for a non-GCS file location object, Data
Services ignores the setting in Batch size (rows).
To learn about file location objects, and to understand how SAP Data Services uses the file transfer protocol in
the file location configuration, see the Reference Guide.
Complete the options in a Google Cloud Storage (GCS) file location to be able to extract data from and load
data to your GCS account.
The GCS file location contains connection and access information for your GCS account. The following list describes the options that are specific to the Google Cloud Storage file location. For descriptions of common options, and for more information about file location objects, see the Reference Guide.
• Upload URL: Specifies the URL for uploading data to GCS. Accept the default, which is https://www.googleapis.com/upload/storage/v1.
• Download URL: Specifies the URL for extracting data from GCS. Accept the default, which is https://www.googleapis.com/storage/v1.
• Authentication Server URL: Specifies the Google server URL plus the name of the Web access service provider, which is OAuth 2.0.
• Authentication Access Scope: Specifies the specific type of data access permission.
• Service Account Email Address: Specifies the email address from your Google project. This email is the same as the service account email address that you enter into the applicable Google BigQuery datastore.
• Service Account Private Key: Specifies the P12 or JSON file that you generated from your Google project and stored locally. Click Browse, open the location where you saved the file, select the .p12 or .JSON file, and click Open.
• Service Account Signature Algorithm: Specifies the algorithm type that Data Services uses to sign JSON Web tokens. Data Services uses this value, along with your service account private key, to obtain an access token from the Authentication Server.
• Substitute Access Email Address: Optional, for Google BigQuery. Enter the substitute email address from your Google BigQuery datastore.
• Client ID: Specifies the OAuth client ID for Data Services. To get the client ID, go to the Google API Console.
• Client Secret: Specifies the OAuth client secret for Data Services. To get the client secret, go to the Google API Console.
• Request Timeout (ms): Specifies the time, in milliseconds (ms), to wait before the system times out while trying to complete the request to the specified Google Cloud Storage. The system logs the timeout value in the trace file when you configure the job to log trace messages.
• Web Service URL: Specifies the Data Services Web Services server URL that the data flow uses to access the Web server.
• Connection Retry Count: Specifies the number of times Data Services tries to create a connection with the remote server after the initial connection attempt fails.
• Batch size for uploading data, MB: Specifies the maximum size for a block of data to upload per request.
• Batch size for downloading data, MB: Specifies the maximum size for a block of data to download per request.
• Bucket: Specifies the GCS bucket name, which is the name of the basic container that holds your data in GCS.
• Region (for new bucket only): Specifies a region for the specified bucket. Applicable only when you enter a new bucket name for Bucket. If you select an existing bucket, Data Services ignores the region selection.
• Remote Path Prefix: Specifies the location of the Google Cloud Storage bucket.
• Local Directory: Specifies the file path of the local server that you use for this file location object.
• Proxy Host, Port, User Name, Password: Optional. Specifies the proxy server information when you use a proxy server.
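Outside of Data Services, you can verify a JSON service-account key before you reference it in Service Account Private Key. The following sketch assumes a standard Google service-account key file; the field names type, client_email, and project_id belong to Google's key format, not to Data Services options, and the file path is a placeholder.
Sample Code
import json

# Placeholder path to the JSON key downloaded from your Google project
KEY_FILE = "/secure/keys/my-project-key.json"

with open(KEY_FILE, encoding="utf-8") as f:
    key = json.load(f)

# The client_email value must match the address that you enter for
# Service Account Email Address in the file location and datastore.
assert key.get("type") == "service_account"
print("Service account email:", key["client_email"])
print("Project:", key.get("project_id"))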
Decrease the local disk space used when SAP Data Services uploads a large file of generated data to your
Google Cloud Storage (GCS) account.
Data Services uses the file transfer protocol in a GCS file location to load large amounts of data to your GCS
account. Large size is 10 MB or more. During the load process, Data Services first loads all records to a local
temporary file before it uploads data to your GCS account. The first load process uses local disk space equal to
the size of the file.
Decrease local disk space usage by using the option Batch size (rows). The option limits the load to a set
number of rows. Data Services loads the set number of rows in batches to the local file, which decreases the
local disk space used.
Find the Batch size (rows) option in the target editor for flat files, XML files, and JSON files.
When you set Batch size (rows) to 1 or more, Data Services uploads the batch rows in a single thread.
Therefore, Data Services ignores the setting in Parallel process threads. Additionally, Data Services does not
use compression, so it ignores a setting of gzip for Compression type in the GCS file location.
Caution
If you set Location or File location to a location and file name for a non-GCS file location object, Data
Services ignores the setting in Batch size (rows).
Disable Batch size (rows) by leaving the default setting of zero (0).
The following example uses simple row and file sizes for illustration. Sizes may not realistically reflect actual
data.
Example
Your generated file contains 10,000 rows and is about 10 MB. You set Batch size (rows) to 2000, which is 2
MB. Then you execute the job.
Data Services loads the first 2000 rows to the local temporary file, using 2 MB of local disk space. After all
2000 rows finish loading, it uploads the 2000 rows from the local file to your GCS account. When it finishes
loading the 2000 rows to GCS, Data Services deletes the rows in the local file, which frees up 2 MB of
local disk storage. Data Services performs this process five times for a total of 10,000 rows. The maximum
amount of local disk storage the process uses is 2 MB, compared to 10 MB without using the Batch size
(rows) option.
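The arithmetic behind this example can be sketched in a few lines of Python. The values mirror the example above (10,000 rows, about 10 MB, Batch size (rows) set to 2000); the sketch is illustrative only and does not call Data Services.
Sample Code
import math

total_rows = 10000        # rows in the generated file
total_size_mb = 10        # approximate size of all generated data, in MB
batch_rows = 2000         # Batch size (rows) in the target editor

row_size_mb = total_size_mb / total_rows        # about 0.001 MB per row
batches = math.ceil(total_rows / batch_rows)    # number of upload iterations
peak_local_mb = batch_rows * row_size_mb        # local disk held at any one time

print("Upload iterations:", batches)                  # 5
print("Peak local disk usage (MB):", peak_local_mb)   # 2.0, compared to 10 without batching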
Consider your environment and your data before you set the Batch size (rows) option.
Before you set the option Batch size (rows) in the target file format editor, consider your goals. Also keep in
mind the size of one row of data. When each row contains many fields, the record size may determine the
number of rows to set.
If you want the best performance regardless of the local disk space that the process uses, set Batch size (rows) to zero (0) to disable it, and set the performance optimization settings in the GCS file location. You achieve the best performance when you use the settings in the file location object. However, the upload uses local disk space equal to the size of the entire file.
If you want to conserve local disk space but are also interested in optimizing performance, use Batch size (rows). Run test jobs to determine the setting that best meets your goal.
• A higher setting uses more local disk space, but still less space than without using the option.
• A lower setting uses less local disk space than without the option.
If you don't want to use more than a specific amount of local disk space, work backwards to determine the setting for Batch size (rows), beginning with the maximum amount of disk space to use for the upload transaction, as shown in the sketch that follows.
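Working backwards from a disk-space budget is simple arithmetic, as the following sketch shows. The budget of 2 MB and the average row size of 1 KB are hypothetical values; substitute measurements from your own generated data.
Sample Code
# Hypothetical inputs: measure these for your own generated data.
max_local_disk_mb = 2.0      # the most local disk space the upload may use
avg_row_size_kb = 1.0        # average size of one generated row, in KB

# Largest Batch size (rows) that keeps the local temporary file within budget
batch_rows = int((max_local_disk_mb * 1024) // avg_row_size_kb)
print("Set Batch size (rows) to at most", batch_rows)   # 2048 for these inputs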
Examples
Example
Best performance
You plan to execute the job that uploads data to your GCS account overnight, when there isn't a large demand for local disk space. Therefore, you leave the Batch size (rows) option set to the default of zero (0).
Then you configure the GCS file location and set the following options for optimized performance:
• Compression Type
• Batch size for uploading data, MB
• Number of threads
Note
The settings in the GCS file location do not affect loading the data to the local temporary file, but they
affect loading from the temporary file to GCS.
Example
Compare settings
The following comparison shows option settings for Batch size (rows) and how they affect local disk storage and performance. For this example, the generated data contains 10,000 rows and the total size is 10 MB.
• Number of connections: 2 (Example 1) versus 10 (Example 2)
Example
You design the Data Services data flow to add columns to each row of data, making each row longer and larger than before processing. You base the number of rows that you set for Batch size (rows) on the maximum amount of disk space that you want the upload to use.
If the generated data contains 10,000 rows, Data Services connects to GCS a total of seven times, with the last batch containing only 500 rows.