Netezza Data Loading Guide PDF
20525 Rev. 3
Note: Before using this information and the product that it supports, read the information in “Notices and Trademarks” on
page D-1.
Preface
1 Overview
Data Loading Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Data Loading Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
New Decimal Delimiter Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
2 External Tables
About External Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Privileges Required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Displaying External Table Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Log Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Usage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Backing Up and Restoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Command Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Transient External Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Explicit Schema Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Implicit Schema Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Exporting Data Using Transient External Tables . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Remote Transient External Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Supported Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
Integer Data Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
Fixed-Point Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
Floating-Point Data Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
Character Strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Time Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Best Practices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
Transient External Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
Fixed-Length Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
Standard Unloading and Reloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
Back up and Restore a User Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
Option Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
Counting Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
Handling Bad Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Delineating Input Rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Matching Input Fields to Table Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Using String and Non-string Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Handling the Absence of a Value. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Enabling Load Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Handling Legal Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Session Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
4 Using nzload
How the nzload Command Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Protection and Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Concurrency and Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Program Invocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Using the nzload Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
Additional Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Using a Control File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
Configuration File Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
5 Unloading Data
Unloading Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
Unloading Data to a Remote Client System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
Building the Fixed-Length Format Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
End-of-Record. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
Record Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
Skipping Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
Temporal Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
Numeric Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
Logical Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
Null Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
Appendix B: Troubleshooting
Tips for Successful Loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Create Your Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Determine Your Data Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Consider the Load Source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
Run the Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
Troubleshoot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Handle Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Validate the Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Generate Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Test Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
nzload Error Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
Reporting Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
Understanding nzload Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
Notices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
Trademarks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-3
Electronic Emission Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-4
Regulatory and Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-7
Index
Preface
The IBM Netezza Data Loading Guide describes the functionality for data loading.
External table options to use, and how the system processes them: see Chapter 3,
“External Table Options”.
How to enter external table options on the command line, in a control file, or in a SQL
command: see Appendix C, “Option Names”.
If You Need Help
If you are having trouble using the IBM Netezza appliance, you should:
1. Retry the action, carefully following the instructions given for that task in the
documentation.
2. Go to the IBM Support Portal at: http://www.ibm.com/support. Log in using your IBM
ID and password. You can search the Support Portal for solutions. To submit a support
request, click the Service Requests & PMRs tab.
3. If you have an active service contract maintenance agreement with IBM, you can
contact customer support teams via telephone. For individual countries, visit the Technical
Support section of the IBM Directory of worldwide contacts
(http://www14.software.ibm.com/webapp/set2/sas/f/handbook/contacts.html#phone).
CHAPTER 1
Overview
What’s in this chapter
Data Loading Components
Data Loading Formats
New Decimal Delimiter Option
This chapter provides general information about the data loading methods now available.
Note that loading data takes a significant allocation of system resources, which may affect
performance.
CHAPTER 2
External Tables
This chapter describes external tables, as well as best practices and restrictions for using
them. For the options that you can use with external tables, see Chapter 3, “External Table
Options.” For examples of how to use external tables, see Appendix A, “Examples and Grammar.”
In the Netezza environment, there are the following types of tables:
System tables – Stored on the host
User tables – Stored on the SPUs
External tables – Stored as flat files on the host or client systems
Privileges Required
To create an external table, you must have LIST privilege on the database and CREATE
EXTERNAL TABLE administration privilege. The database user who issues the CREATE
EXTERNAL TABLE command owns the resultant table. The operating system user must
have proper permission on the data object (READ permission for loading, WRITE permis-
sion for unloading).
Log Files
By default, loading errors go into the following log files:
nzbad – <tablename>.<dbname>.nzbad
nzlog – <tablename>.<dbname>.nzlog
To override the defaults, specify a file name with one of the following options:
bf <filename> for nzbad
lf <filename> for nzlog
Usage
Use external tables to do the following:
Load data into the Netezza appliance from an external table and structure the loading
operation to manipulate the data by using casts, joins, dropping columns, and so on.
Store data outside the Netezza appliance, either to transfer to another application, or
as a table backup. See “Backing Up and Restoring” on page 2-4.
Create an external table and use data from an external table as part of a SQL query.
The power of external tables is that the entire Extraction-Transformation-Loading (ETL) pro-
cess is mapped to plain SQL. Since a SQL-based ETL process can be initiated/executed
from any SQL client that can talk to the Netezza appliance, it reduces or avoids the require-
ment of specialized ETL tools.
To load an external data file into the Netezza appliance, reference the external table in
either of the following ways:
In the FROM clause of a SELECT statement, like any normal table.
In the WHERE clause of an UPDATE or DELETE statement.
To unload an external table into an external data file, use the table as the target table in
any of the following SQL statements:
INSERT SQL
SELECT INTO SQL
CREATE TABLE AS SELECT SQL
All references to columns in the external table can be complex SQL expressions used for
the transformation of external data during a load/unload process. For more information, see
“Restrictions” on page 2-13.
Parsing
For loads, rows are parsed one by one from the external data file and converted into
internal records of the external table. Errors can occur while parsing each row or each
column. For example, the system might fail to identify the column value itself, as in the
case of a missing delimiter. Errors can also occur during the conversion from external
format to internal records, such as alphabetic characters supplied for an integer column
in Text-Delimited format.
Each error is logged in detail in an nzlog file, and bad rows are logged in an nzbad file.
These files help you identify bad rows in the external data file and correct them for
reloading. Depending on the load options of the external table in use, a bad row causes
either that row to be skipped or the entire load to be aborted. Similarly, a bad column
in a row causes either the rest of the row to be ignored or, if recovery is possible, the
load to continue parsing subsequent columns of the same row.
Note that if there is an error in the project-expression on the external table columns, then
the entire load is aborted and the transaction rolled back. Errors of this nature are not
logged in nzbad or nzlog files, as they are outside of the scope of the external table load
mechanism. Once the processing reaches the normal SQL engine, the external table is
treated as if it is a normal table.
Unlike an external table that has external rows in an ordered sequence, normal user tables
have no implicit row order other than hidden rowid columns. So there is no way for a user
not using rowids to identify the bad row in a SQL engine. In this case, the Netezza system
just returns an error that a particular column caused an error, without identifying the bad
row. It is as if the query was selecting from a normal table and inserting into another nor-
mal table, with some row that caused the error during insertion.
Command Syntax
The CREATE EXTERNAL TABLE command has the following syntax.
To create an external table based on another table:
CREATE EXTERNAL TABLE table_name
SAMEAS table_name
USING external_table_options
To create an external table by defining columns:
CREATE EXTERNAL TABLE table_name
({ column_name type
[ column_constraint [ ... ] ]} [, ... ]
)
[USING external_table_options]
Note: The system permits and maintains PRIMARY KEY, DEFAULT, UNIQUE, and REFERENCES.
For external tables, UNIQUE, PRIMARY KEY, and REFERENCES are ignored: the system does
not perform constraint checks or enforce referential integrity, so you must ensure them
yourself. For more information, see “Column Constraint Rules for Empty Strings” on
page 2-10.
Syntax
The following is the syntax for a transient external table (TET):
INSERT INTO <table> SELECT <column_list | *>
FROM EXTERNAL 'filename' [(schema_definition)]
[USING (external_table_options)];
Remote external table loads work by sending the contents of a file from the client system to
the Netezza server where the data is then parsed. This method minimizes CPU usage on
the client system during a remote external table load.
time with time zone 01:15:33 -05 See “Time with time zone” on
page 2-12.
Syntax [ '+' | '-' ] <digit>…
Syntax [ '+' | '-' ] <digit>… [ '.' [ <digit>… ] ]
[ '+' | '-' ] '.' <digit>…
[ '+' | '-' ] <digit>… [ ',' [ <digit>… ] ]
[ '+' | '-' ] ',' <digit>…
The syntax of fixed-point values is the same as the syntax of integer values with the
addition of an optional decimal point that can occur anywhere, from before the first
decimal digit to after the last decimal digit.
The optional decimal point can be followed by zero or more decimal digits, if there is at
least one decimal digit before the decimal point; followed by one or more decimal digits if
there are no decimal digits before the decimal point.
If there is no explicit decimal point, the system assumes a decimal point immediately fol-
lowing the last decimal digit.
You can also specify a comma as the decimal separator, using it like the decimal point.
For examples of how to do this, see “Decimal Delimiter Examples” on page A-4.
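The fixed-point grammar above can be checked mechanically. The following Python sketch is an illustration only, not the loader's implementation. The function name is_fixed_point and the decimal_delim parameter are hypothetical, and the sketch assumes that a given load accepts only one delimiter (period or comma) at a time.

```python
import re


def is_fixed_point(text: str, decimal_delim: str = ".") -> bool:
    """Return True if text matches the fixed-point grammar (sketch only)."""
    if decimal_delim not in (".", ","):
        raise ValueError("decimal_delim must be '.' or ','")
    d = re.escape(decimal_delim)
    # Either digits with an optional delimiter and optional trailing digits,
    # or a leading delimiter followed by at least one digit.
    pattern = rf"[+-]?(?:[0-9]+(?:{d}[0-9]*)?|{d}[0-9]+)"
    return re.fullmatch(pattern, text) is not None
```

Under these assumptions, "12." and ".5" are accepted, while a lone "." is rejected because it has no digit on either side of the decimal point.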
Table 2-5 describes the fixed-point precision and representation:
Syntax [ '+' | '-' ] <digit>… [ '.' [ <digit>… ] ] [( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
[ '+' | '-' ] '.' <digit>… [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
[ '+' | '-' ] <digit>… [ ',' [ <digit>… ] ] [( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
[ '+' | '-' ] ',' <digit>… [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>… ]
The syntax of floating-point values is the same as the syntax of fixed-point values,
augmented by an optional trailing exponent specification.
The optional decimal point can be followed by zero or more decimal digits, if there is at
least one decimal digit before the decimal point; followed by one or more decimal digits if
there are no decimal digits before the decimal point.
If there is no explicit decimal point, the system assumes a decimal point immediately fol-
lowing the last decimal digit.
You can also specify a comma as the decimal separator, using it like the decimal point.
For examples of how to do this, see “Decimal Delimiter Examples” on page A-4.
The optional power-of-ten exponent consists of ‘e’ (lowercase or uppercase), followed by
an optional sign and a non-empty sequence of decimal digits.
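As a rough illustration of the floating-point grammar, the Python sketch below builds a mantissa pattern and appends the optional exponent. It is not the loader's own code. The name is_floating_point and the decimal_delim parameter are hypothetical, and the sketch assumes a single decimal delimiter per load.

```python
import re


def is_floating_point(text: str, decimal_delim: str = ".") -> bool:
    """Return True if text matches the floating-point grammar (sketch only)."""
    d = re.escape(decimal_delim)
    # Mantissa: same shape as a fixed-point value.
    mantissa = rf"[+-]?(?:[0-9]+(?:{d}[0-9]*)?|{d}[0-9]+)"
    # Exponent: 'e' or 'E', an optional sign, then at least one digit.
    exponent = r"(?:[eE][+-]?[0-9]+)?"
    return re.fullmatch(mantissa + exponent, text) is not None
```

Values such as "1.5E-3" and ".5e+2" match, while "1e" is rejected because the exponent has no digits.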
Table 2-7 describes the floating-point precision and representation:
Character Strings
Char(n)/nchar(n) are character strings of length n. Varchar(n)/nvarchar(n) are
variable-length character strings of maximum length n. Valid characters have ASCII
values from 32 through 255.
If the record contains fewer data values than the actual columns defined in the table’s
schema, the system writes an error to the nzlog file and discards the record. To override this
behavior, use the -fillRecord option, which applies to the entire load operation.
The -fillRecord option tells the system to use a null value in place of any missing fields.
You can use this option as long as the columns whose values are missing allow nulls. If
these columns are defined as not null, the system writes an error to the nzlog file and dis-
cards the record. You must resolve this conflict by changing the schema to allow null values
or modifying the data file to include a valid non-null value.
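The interaction between missing fields, the -fillRecord option, and NOT NULL columns can be summarized in a short Python sketch. This is a model of the behavior described above, not Netezza code. The function apply_fill_record and its parameters are hypothetical, and records with too many fields are outside the scope of the sketch.

```python
from typing import List, Optional, Sequence


def apply_fill_record(
    fields: Sequence[str],
    nullable: Sequence[bool],
    fill_record: bool,
) -> Optional[List[Optional[str]]]:
    """Return the row to load, or None if the record would be discarded.

    nullable[i] is True when column i of the target schema allows nulls.
    """
    missing = len(nullable) - len(fields)
    if missing <= 0:
        # Nothing is missing; surplus fields are not modeled here.
        return list(fields)
    if not fill_record:
        # Without -fillRecord a short record is logged and discarded.
        return None
    if not all(nullable[len(fields):]):
        # A missing value would land in a NOT NULL column: discard.
        return None
    # Pad the missing trailing fields with nulls.
    return list(fields) + [None] * missing
```

A None result here stands for "write an error to the nzlog file and discard the record".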
Time
The Netezza appliance time is an exact, eight-byte data type stored internally as a signed
integer representing the number of microseconds since midnight.
The system accepts both 24-hour and 12-hour AM/PM time values. You can specify the
format with the -timeStyle option. The default is the 24-hour format.
The time format consists of five components: hour, minute, second, fraction of a second,
and AM/PM token. You must have hour and minute; second and fraction of second are
optional. The AM or PM token is required for 12-hour format and not allowed for 24-hour
format.
The time options have the following formats. Note that the delimited examples use the
default time delimiter, which is a colon (:).
12-hour delimited HH:MM:SS.FFF [AM | PM] (such as 10:12 PM, or 1:02:46.12345
AM)
12-hour undelimited HHMMSS.FFF [AM | PM] (such as 1012 PM or 010246.12345
PM)
24-hour delimited HH:MM:SS.FFF (such as 19:15 or 1:15:00.1234)
24-hour undelimited HHMMSS.FFF (such as 1915 or 010246.12345)
In these formats, note the following:
HH is a one- or two-digit hour value from 1–12 in the 12-hour notation or 0–23 in the
24-hour notation. In undelimited format, you must specify two digits such as 01, 02,
and so on.
MM is a one- or two-digit minute value from 0–59. In undelimited format, you must
specify two digits such as 01, 02, and so on.
SS is a one- or two-digit seconds value from 0–59. In undelimited format, you must
specify two digits such as 01, 02, and so on.
FFF specifies a fraction of a second. If you specify a fractional value, you must precede
it with a decimal point. If the value can be stored without loss of precision, it is
accepted. If the value cannot be stored without loss of precision, it is rejected. You can
use the -timeRoundNanos option to allow rounding when the full precision of any frac-
tional digits cannot be preserved, as described in “Using the -timeRoundNanos
Option” on page 10-22.
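The four time formats can be modeled with a small Python parser that returns microseconds since midnight. This is a sketch under stated assumptions, not the loader's implementation. The function parse_time and its parameters are hypothetical. The sketch assumes the default colon delimiter, hours 0 to 23 in 24-hour style, minutes and seconds 0 to 59, the usual convention that 12 AM is midnight, and microsecond resolution. The -timeRoundNanos option is not modeled.

```python
import re
from typing import Optional

_AMPM = r"(?:\s*(AM|PM))?"
_DELIMITED = re.compile(
    r"([0-9]{1,2}):([0-9]{1,2})(?::([0-9]{1,2})(?:\.([0-9]+))?)?" + _AMPM,
    re.IGNORECASE,
)
_UNDELIMITED = re.compile(
    r"([0-9]{2})([0-9]{2})(?:([0-9]{2})(?:\.([0-9]+))?)?" + _AMPM,
    re.IGNORECASE,
)


def parse_time(text: str, twelve_hour: bool = False,
               delimited: bool = True) -> Optional[int]:
    """Return microseconds since midnight, or None if the value is rejected."""
    pattern = _DELIMITED if delimited else _UNDELIMITED
    m = pattern.fullmatch(text.strip())
    if m is None:
        return None
    hh, mm, ss, frac, ampm = m.groups()
    hour, minute, second = int(hh), int(mm), int(ss or 0)
    # AM/PM is required in 12-hour style and not allowed in 24-hour style.
    if (ampm is not None) != twelve_hour:
        return None
    if twelve_hour:
        if not 1 <= hour <= 12:
            return None
        hour = hour % 12 + (12 if ampm.upper() == "PM" else 0)
    elif hour > 23:
        return None
    if minute > 59 or second > 59:
        return None
    # Reject fractions that cannot be stored without loss of precision.
    frac = (frac or "").rstrip("0")
    if len(frac) > 6:
        return None
    micros = int(frac.ljust(6, "0")) if frac else 0
    return ((hour * 60 + minute) * 60 + second) * 1_000_000 + micros
```

In this model, "19:15" and the undelimited "1915" yield the same value, while "10246.12345" is rejected in undelimited style because the hour is not written with two digits.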
Syntax The input format of a time with time zone value is a simple time value followed
by a signed offset from Coordinated Universal Time (UTC, formerly Greenwich Mean Time,
GMT). The time section must conform to the -timeStyle and -timeDelim in effect during
the nzload job.
You must specify a signed, time-zone hour, whereas the time-zone minute is optional. If
you use the minute, separate it with a colon (the default timeDelim character).
Note: You cannot use named time zones, such as EST.
time with time zone <time> ( '+' | '-' ) <digit> [ <digit> [ ':' <digit> [ <digit> ] ] ]
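The grammar line above implies that a minute component is only allowed after a two-digit hour. The Python sketch below splits a value into its time text and a signed offset in minutes. It is an illustration only. The name split_time_zone is hypothetical, the colon is assumed as the delimiter, and no range check is applied to the offset because the text does not state one.

```python
import re
from typing import Optional, Tuple

# <time> ( '+' | '-' ) <digit> [ <digit> [ ':' <digit> [ <digit> ] ] ]
_TZ = re.compile(
    r"(?P<time>.+?)\s*(?P<sign>[+-])"
    r"(?:(?P<h2>[0-9]{2}):(?P<m>[0-9]{1,2})|(?P<h>[0-9]{1,2}))"
)


def split_time_zone(text: str) -> Optional[Tuple[str, int]]:
    """Return (time_text, offset_minutes), or None if there is no valid offset."""
    match = _TZ.fullmatch(text.strip())
    if match is None:
        return None
    hours = int(match["h2"] or match["h"])
    minutes = int(match["m"] or 0)
    offset = hours * 60 + minutes
    if match["sign"] == "-":
        offset = -offset
    return match["time"], offset
```

The returned time text would then be validated separately against the time style in effect. A named zone such as EST produces None here, in line with the note above.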
Timestamp
The Netezza appliance timestamp is an exact data type stored as eight bytes. The stored
offset has the same 1 μs resolution as the time data type.
Syntax The input format of a timestamp value is a date value followed by a time value.
You can have optional spaces between the date and the time. The date section must con-
form to the -dateStyle and -dateDelim in effect during the load job.
Restrictions
The following are restrictions and considerations for use with external tables:
Always consider your source and target systems, and whether the data is properly for-
matted for loading.
To insert rows into or drop an external table, use the INSERT and DROP commands.
You cannot delete, truncate, or update an external table. After creating an external
table, you can alter as well as drop the table definition. (Dropping an external table
deletes the table definition, but it does not delete the data file that is associated with
the table.) You can select the rows in the table, as well as insert rows into the table
(following a table truncation).
While you cannot select from more than one external table at a time in a query or sub-
query, you can move data from one external table to another, such as using SELECT
and INSERT. The system displays an error if you incorrectly specify multiple external
tables in a SQL query, or if you reference the same external table more than once in a
query:
*ERROR: Multiple external table references in a query not
allowed*
To use more than one external table, load the data into a non-external table and
specify this table in the query.
You cannot perform a union operation involving two or more external tables.
Using the nzbackup command to back up external tables backs up the schema but not
the data.
Host-side operations, such as selects and rowsetlimit user and group property interac-
tions, are not supported for compressed external tables.
The DecimalDelim option is not supported for compressed external tables.
A maximum of 300 loads can run concurrently.
Best Practices
When specifying external tables, note the following:
An external table reference can appear as the source table of a SELECT FROM state-
ment. Note that a transient external table reference in a SELECT FROM clause infers
its shape from the preceding INSERT INTO clause.
In Netezza Release 4.6 and later, the system catalog datatypes TEXT and NAME are
treated as NVARCHAR. If these types are used in the table that is referenced in the
select_clause, include the encoding option in the CREATE EXTERNAL TABLE com-
mand to specify internal encoding. Otherwise you could receive the error “LATIN9
encoding cannot be specified with NCHAR/NVARCHAR column definitions.” For
example:
create external table '/tmp/ext1' using (encoding 'internal') as
select username from _t_user;
The CREATE EXTERNAL TABLE AS statement supports an optional table name. If you
do not provide a table name, the table is transient, which means the external table def-
inition does not persist in the system catalog. If you supply a table name, the external
table becomes a named object in the system catalog.
The USING clause in the inline external statement is optional. If you omit it, the result-
ing external table has the default settings. Note that you must specify the USING
clause in the CREATE EXTERNAL TABLE SAMEAS statement, because the SAMEAS
table might be another external table.
When you insert data into an external table that references an existing data file, the
system truncates the file before inserting the external table’s data.
You cannot use external tables in complex SQL statements. If the statement is not sup-
ported, the system displays an error.
Before you reload an external table, verify that the destination table in the database is
empty or that it does not already contain the rows in the external table that you are about to
reload. If the destination table already contains the rows contained in the external table,
unintended problems may occur. These problems could also occur if you accidentally
reload the external table more than once.
For example, loading a text-format external table into a destination table that already con-
tains the same data creates duplicate data in the database. The rows will have unique row
IDs, but the data will be duplicated. To fix this problem, you would have to delete the
duplicate rows or truncate the database table and reload the external table again (but only
once).
If you load a compressed binary format external table into a destination table that already
has the same rows, you will create duplicate rows with duplicate row IDs in the database
table. The system restores the rows using the same row IDs saved in the compressed binary
format file.
Duplicate row IDs can cause incorrect query results and could lead to problems in the data-
base. You can check for duplicate rowIDs using the rowid keyword as follows:
SELECT rowid FROM employee_table GROUP BY rowid HAVING count(rowid) > 1;
If the query returns multiple rows that share the same row ID, truncate the database table
and reload the external table (but only once).
After you load data from an external table into a user table, you should run GENERATE
STATISTICS to update the statistics for the user table. This improves the performance of
queries that run against that table.
Examples
The following examples show how to use the CREATE EXTERNAL TABLE command.
To create an external table, enter:
CREATE EXTERNAL TABLE ext_orders (ord_num INT, ord_dt TIMESTAMP)
USING (dataobject('/tmp/order.tbl') DELIMITER '|');
To create an external table that uses column definitions from an existing table, enter:
CREATE EXTERNAL TABLE demo_ext SAMEAS emp USING (dataobject
('/tmp/demo.out') DELIMITER '|');
To create an external table and specify the escape character (‘\’), enter:
CREATE EXTERNAL TABLE extemp SAMEAS emp USING( dataobject
('/tmp/extemp.dat') DELIMITER '|' escapechar '\');
To unload data from your database into a file by using an insert statement, enter:
INSERT INTO demo_ext SELECT * FROM emp;
To drop an external table, enter:
DROP TABLE extemp;
The system removes only the external table’s schema information from the system cat-
alog. The file defined in the dataobject option remains unaffected in the filesystem.
To back up by creating an external table, enter:
CREATE EXTERNAL TABLE '/path/extfile' USING (FORMAT 'internal'
COMPRESS true) AS SELECT * FROM source_table;
To restore from an external table, enter:
INSERT INTO t_desttbl SELECT * FROM EXTERNAL '/path/extfile'
USING (FORMAT 'internal' COMPRESS true);
To make the source file FIXED format with the schema as defined:
select * from external '<file>' (schema) USING (FORMAT 'FIXED'
LAYOUT (...))
To make the source file FIXED format and the table takes on the schema of the target
table:
insert into <table> select * from external '<file>' USING (FORMAT
'FIXED' LAYOUT (...))
The following example will not work, because you cannot unload data into a FIXED for-
mat external table:
create external table '<file>' [(schema)] USING (FORMAT 'FIXED'
LAYOUT ... )
Fixed-Length Format
The following examples show how to use Fixed-Length format with external tables:
To load data in fixed format, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT (BYTES 20, REF BYTES 3, BYTES @2) )
To load data with different date/time delimiters for different zones, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT ( YMD '-' BYTES 15, DMY '/' BYTES 15 ) )
To load spatial data (binary data into VARCHAR), enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' CTRLCHARS true LAYOUT ( BYTES 100, REF BYTES 4, BYTES @2) )
To load fixed format data with record-length and no record-delimiter, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' RECORDDELIM '' RECORDLENGTH @1 LAYOUT( REF BYTES 2, BYTES
120, REF BYTES 2, BYTES @3) )
To load data with different NULLIF clauses for different zones, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT ( BYTES 15 NULLIF '2000-10-10', BYTES 2 & = '12') )
To load data with NULLIF clauses referring other zones, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT ( REF BYTES 2, BYTES @1 NULLIF @1 = -1, REF BYTES 4,
BYTES 100 NULLIF &&3 = 'null' ) )
To load data into user table EMP from external table EXTEMP, enter:
TRUNCATE TABLE emp;
INSERT INTO emp SELECT * FROM extemp;
CHAPTER 3
External Table Options
This chapter describes the options used with external tables. For examples of how to use
external tables, see Appendix A, “Examples and Grammar.”
Options
When you create an external table definition, you can specify options. There are different
types of options: some are for records/rows, some are for fields, and some are for loads. Use
these options when loading from an external table or when using the external table directly
in a SQL query.
Note: The best method to verify that the load processing has been successful is to ensure
the system records any errors to the nzlog and nzbad files. Check these files occasionally.
Table 3-1 lists the external table options, and a description of each option follows. In the
Valid Formats column, Text refers to Text-Delimited format and Fixed refers to Fixed-Length
format. In the Data type column, enumeration refers to the system accepting a specified
set of quoted or unquoted string values.
Table 3-1: External Table Options
Columns: Option, Values, Default, Data Type, Valid Formats, Unload (Y/N)
Option Details
The following sections describe each option in detail.
BoolStyle
Specifies the boolean style. During a load, the loader can handle only a specific style of
boolean values.
Table 3-2 lists the styles and their values.
1_0 1 or 0
T_F T or F
Y_N Y or N
YES_NO YES or NO
The default style is 1_0. The values can be expressed in mixed case, so ‘true’ can be ‘True’
or ‘TRUE’ or ‘tRuE’.
If you specify the YES_NO option on the command line, the system assumes that the data
in the Boolean field is in the form yes or no. If the data is any of the other values: true,
false, 1, 0, t, f, y, or n, the system discards the record to the nzbad file and logs an error
with the record number in the nzlog file.
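The single-style rule above can be illustrated with a short Python sketch. This is only a model of the documented behavior, not the loader's implementation; the function and table names are hypothetical, and only the four styles listed in Table 3-2 above are included.

```python
# Hypothetical model of the boolStyle check; names are illustrative only.
BOOL_STYLES = {
    "1_0": ("1", "0"),
    "T_F": ("T", "F"),
    "Y_N": ("Y", "N"),
    "YES_NO": ("YES", "NO"),
}


def parse_boolean(field: str, style: str = "1_0") -> bool:
    """Return the boolean for `field`, or raise ValueError when the text does
    not match the one style in effect (the loader would reject the record)."""
    true_token, false_token = BOOL_STYLES[style.upper()]
    token = field.strip().upper()  # values can be written in mixed case
    if token == true_token:
        return True
    if token == false_token:
        return False
    raise ValueError(f"{field!r} is not valid for boolStyle {style}")
```

With YES_NO in effect, parse_boolean("Yes", "YES_NO") is accepted, while parse_boolean("t", "YES_NO") raises an error, which corresponds to the record being written to the nzbad file.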
Compress
Specifies whether the source datafile data is compressed or not. The valid values are true
or on, false or off. The default is false. This can only be true if the format is set to ‘internal’.
CRinString
Specifies whether to allow unescaped carriage returns in char/varchar and nchar/nvarchar
fields. Acceptable values are true or false, on or off. Do not put quotes around the value.
False – Default, treats all cr or crlf as end-of-record.
True – Accepts unescaped CR in char/varchar fields (LF becomes only end of row).
Note: This option is different for Fixed-Length format. For more information, see “Changed
Options” on page 6-3.
CtrlChars
Specifies whether to allow an ASCII value 1-31 in char/varchar and nchar/nvarchar fields.
You must escape NULL, CR, and LF characters. Acceptable values are: true or false, on or
off. The default is false. Do not insert quotes around the value.
Note: This option is different for Fixed-Length format. For more information, see “Changed
Options” on page 6-3.
DataObject
Specifies the OS path to the source datafile (or any media that can be treated as a file).
There is no default, and this must be specified. When the remotesource option is not set (or
set to empty string), this path has to be an absolute path and not a relative path. The
filename must be a valid UTF-8 string.
For loads, this file has to be an existing file with READ permission for the OS user
initiating the load.
For unloads, the parent directory of this file has to have READ-WRITE permissions for
the OS user initiating the unload, and the data file is overwritten if it already exists.
DateDelim
Specifies the delimiter character that separates the date components. It is used with the
dateStyle option. The default is '-' for all dateStyles except MONDY[2], where the default is
' ' (space). This is a single-byte string.
If you specify the option as an empty string, which means that there is no delimiter
between the date components, you must specify days and months as two-digit numbers.
Single-digit months and days are not supported.
With MonDY or MonDY2, the default dateDelim option is space.
With days and months less than 10, use either one or two digits, or a space followed by
a single digit.
With the dateDelim option as a space, the system allows a comma after the day.
With any component (day, month, year) as zero, or any day/month inconsistency, such
as August 32 or February 30, the system returns an error.
Table 3-3 lists dateDelim option examples.
Note: If not using delimiters, the date will be determined as in the following example for
June 12, 2009:
06122009
DateStyle
Specifies how to interpret the date format. The date style settings are 'YMD', 'MDY', 'DMY',
'DMONY', and 'MONDY', plus the two-digit year variants 'Y2MD', 'MDY2', 'DMY2', 'DMONY2',
and 'MONDY2'. The default is YMD.
Note: The two-digit year formats (Y2MD, MDY2, DMY2, DMONY2 and MONDY2) are not
supported for unloads.
The dateStyle options are shown in Table 3-4, with an example date of March 21, 2012.
The examples shown have no date delimiter between the values.
For example, if your datafile has a date of 03-21-12, with a DateDelim of '-', use the
format MDY2.
The default dateStyle is YMD, and the SQL standard stipulates that the legal years are
0001 to 9999. There is no provision in SQL for years prior to 0001 AD or later than 9999
AD.
Date example: In the data file jan-01.data, the data is specified as follows (the date is
the second field):
14255932|30/06/2002|20238|20127|40662|157|
Because the date value is using the DD/MM/YYYY format, specify the following dateStyle
and dateDelim options:
nzload -t agg_month -df jan-01.data -delim '|' -dateStyle DMY -datedelim '/'
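To make the interaction between dateStyle and dateDelim concrete, the following Python sketch parses a date under the rules described above. It is a simplified model, not the loader's code: it assumes only the numeric four-digit-year styles (YMD, MDY, DMY), and the function name is hypothetical.

```python
from datetime import date

# Component order for the numeric, four-digit-year styles only (assumption:
# the MON and two-digit-year styles are left out of this sketch).
STYLE_ORDER = {"YMD": "ymd", "MDY": "mdy", "DMY": "dmy"}
WIDTHS = {"y": 4, "m": 2, "d": 2}


def parse_date(text: str, date_style: str = "YMD", date_delim: str = "-") -> date:
    """Parse `text` according to a dateStyle/dateDelim pair."""
    order = STYLE_ORDER[date_style.upper()]
    if date_delim == "":
        # Without a delimiter, days and months must be two-digit numbers.
        if len(text) != 8 or not (text.isascii() and text.isdigit()):
            raise ValueError(f"{text!r} needs fixed-width components")
        parts, pos = {}, 0
        for key in order:
            parts[key] = text[pos:pos + WIDTHS[key]]
            pos += WIDTHS[key]
    else:
        pieces = text.split(date_delim)
        if len(pieces) != 3:
            raise ValueError(f"{text!r} does not have three components")
        parts = dict(zip(order, pieces))
    # date() raises ValueError for a zero component or an impossible day/month.
    return date(int(parts["y"]), int(parts["m"]), int(parts["d"]))
```

For the sample row above, parse_date("30/06/2002", "DMY", "/") yields June 30, 2002, and the undelimited value 06122009 parses as June 12, 2009 under MDY.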
DecimalDelim
Specifies the decimal delimiter for the following data types, for both text-delimited and
fixed-length formats: float, double, numeric, time, timetz, and timestamp. Default is ‘.’.
For examples of usage, see “Decimal Delimiter Examples” on page A-4.
Delimiter
Specifies the field delimiter. The default is the pipe character '|'. You can specify
characters in the 7-bit ASCII range using either a quoted value (for example: delimiter '|') or
by its unquoted decimal number (delimiter 124). To specify a byte value above 127,
use the decimal number. This is a single-byte string.
Note: For nzload, the default is '\t' (tab).
The system processes an input row by identifying the successive fields within that row. A
single character field delimiter separates adjacent fields. The lack of a field delimiter
between fields is an error. You can use a trailing field delimiter following the last field in a
row (but it is not required).
You can specify the following delimiters:
Numeric – 0xNN or NN where NN is a number for either hexadecimal or decimal.
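The accepted spellings of a delimiter can be summarized in a small Python sketch that maps each form to a single byte value. This is an illustrative helper with a hypothetical name; it assumes a quoted value is exactly one character between single quotes.

```python
def delimiter_byte(spec: str) -> int:
    """Map a delimiter specification to its single-byte value."""
    spec = spec.strip()
    if len(spec) == 3 and spec[0] == spec[-1] == "'":
        value = ord(spec[1])  # quoted form, for example '|'
        if value > 127:
            raise ValueError("use the numeric form for bytes above 127")
    elif spec.lower().startswith("0x"):
        value = int(spec, 16)  # hexadecimal form, 0xNN
    else:
        value = int(spec, 10)  # decimal form, NN
    if not 0 <= value <= 255:
        raise ValueError("a delimiter must fit in one byte")
    return value
```

Under these assumptions, '|', 124, and 0x7C all denote the same delimiter byte.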
Encoding
Specifies the encoding of the datafile for the character set. The default is ‘internal’.
You can also specify ‘utf8’ if the whole file is in UTF-8 encoding and has only nchar/nvar-
char data and no char/varchar data. Use ‘internal’ if the file could have both Latin-9 and
UTF-8 data– or either type– using char, varchar, nchar, or nvarchar data.
The system supports single-byte characters in Latin9 encoding, and Unicode data in the
multi-byte UTF-8 encoding. Use the encoding option to specify the type of data in the file.
The encoding option has three values:
A value of ‘latin9’ indicates that the whole file is in Latin-9 char/varchar data and has
no nchar/nvarchar data. (If the file contains any nchar/nvarchar data, it will be rejected
by the load operation.)
A value of ‘utf8’ indicates the whole file is in UTF-8 encoding and has only nchar/nvar-
char data and no char/varchar data. (If the file contains any char/varchar data, it will be
rejected by the load operation.)
The value ‘internal’ indicates that the file could have either or both Latin-9 and UTF-8
data using any or all of the char, varchar, nchar, or nvarchar data types. As a best prac-
tice, use ‘internal’ if you are not certain of the data encoding.
For more information, see the “Using International Character Sets” chapter in the IBM
Netezza Database User’s Guide.
Use the nzconvert command to convert character encoding before loading with external
tables. For the command options and examples, refer to “Converting Legacy Formats” in
the IBM Netezza Database User’s Guide.
Note: This option is not supported for Fixed-Length format.
EscapeChar
Specifies the use of an escape character. The character immediately following the ‘\’ is
escaped. The only supported value is ‘\’, and the default is no escaping.
By default, the system expects fields to be delimited by a field-delimiter character or by an
end-of-row sequence. The system assumes all other characters are part of the field’s value.
Although efficient, this representation has the drawback that string fields may not contain
instances of the field delimiters. In addition, one value typically becomes inexpressible
because you have used it to convey the absence of any value (that is, that column is null).
One solution is to use an escape character for the delimiter. For example, the following
command line demonstrates using the escapeChar option.
nzload -escapeChar '\' -nullValue 'NULL' -delim '|'
|NULL| – A null input field
|\NULL| – A non-null input field containing the text NULL
|\|| – A non-null input field containing the single character |
|\\| – A non-null input field containing the single character \
Note: This option is not supported for Fixed-Length format.
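The four sample fields above can be reproduced with a short Python sketch of escape-aware field splitting. This is a simplified model of the described behavior, not the loader's parser: the function names are hypothetical, and it assumes that a field is null only when it matches the nullValue token and contains no escaped character. It does not handle an optional trailing delimiter.

```python
def _finish(chars, had_escape, null_value):
    text = "".join(chars)
    # An unescaped field equal to the null token is null; any escape makes it literal.
    return None if text == null_value and not had_escape else text


def split_escaped_row(row: str, delim: str = "|", escape: str = "\\",
                      null_value: str = "NULL") -> list:
    """Split one input row into fields, honoring the escape character."""
    fields, chars, had_escape = [], [], False
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == escape and i + 1 < len(row):
            chars.append(row[i + 1])  # take the next character literally
            had_escape = True
            i += 2
        elif ch == delim:
            fields.append(_finish(chars, had_escape, null_value))
            chars, had_escape = [], False
            i += 1
        else:
            chars.append(ch)
            i += 1
    fields.append(_finish(chars, had_escape, null_value))
    return fields
```

Given a row whose four fields are NULL, \NULL, \| and \\, the sketch returns a null, the text NULL, the single character |, and the single character \, matching the list above.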
FillRecord
Specifies whether to allow an input line with fewer columns than the table definition. When
this option is set, missing trailing input fields are treated as nulls, provided that the
corresponding columns are nullable. The default is false.
The system expects one input field for every column in the target table’s schema, and
rejects a row with fewer fields. If you specify the fillRecord option, the system allows
omitting one or more trailing (rightmost) fields, as long as all corresponding columns can be
null.
Note: This option is not supported for Fixed-Length format.
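As an illustration of the fillRecord rule, the following Python sketch pads a short row with nulls only when the option is set and every omitted column is nullable. The function name and the representation of the schema as a list of booleans are assumptions made for this example.

```python
def apply_fill_record(fields: list, nullable: list, fill_record: bool = False) -> list:
    """Return the row padded with None for omitted trailing fields.

    `nullable` holds one bool per target column (True if the column can be null).
    """
    if len(fields) > len(nullable):
        raise ValueError("row has more fields than the table has columns")
    missing = len(nullable) - len(fields)
    if missing == 0:
        return list(fields)
    if not fill_record:
        raise ValueError("row has too few fields and fillRecord is not set")
    if not all(nullable[len(fields):]):
        raise ValueError("an omitted trailing column is declared not null")
    return list(fields) + [None] * missing
```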
Format
Specifies the data format of the source file to load and unload. The valid values are as
follows:
‘text’ (default) – Data in Text-Delimited format
‘fixed’ – Data in new Fixed-Length format
‘internal’ – Data in compressed binary format (to use this, the compress option must be
set to true)
IgnoreZero
Specifies discarding byte value zero in char() and varchar() fields. The default is false. If
true, the command accepts binary value zeroes in input fields and discards them.
Note: This option is not supported for Fixed-Length format.
IncludeZeroSeconds
Specifies that “00” seconds values will be unloaded to the external table. For example, a
time value such as 12:34:00 or 12:34 will be unloaded to the external table in the format
12:34:00. The default is false.
Note: This option is not supported for Fixed-Length format, and is only for unloading.
Layout
Specifies the zone definitions.
Note: This option is used only with the Fixed-Length format. For more information, see
“New Options” on page 6-2.
LogDir
Specifies the directory in which the nzlog and nzbad files are generated for loads. This is
not used for unloads. The default value is '/tmp'. Note that when doing remote loads from
Windows clients (through ODBC/JDBC), the default output directory is mapped to "C:\". The
directory name must be a valid UTF-8 string.
MaxErrors
Specifies the number of errors at which the system stops processing rows. If the count of
rejected rows reaches this threshold, the system immediately aborts and rolls back the
load.
The default value is 1. This default has the effect of committing a load only if it contains
no errors. A maxErrors value n (where n is greater than 1) allows the first n-1 row rejections
to be recoverable errors, not including the number of rows processed in the skipped row
range.
Use this option to specify a different value, from 0 (unlimited errors) up to 2,147,483,647
(the largest signed 32-bit integer).
Note: This option is different for Fixed-Length format. For more information, see “Changed
Options” on page 6-3.
MaxRows
Specifies the number of initial rows after which the system stops processing. To limit the
data loaded in a SQL query instead, use a limit clause with the select statement. The default
is 0 (load all rows).
After processing a row (whether inserted, skipped or rejected), the system decides whether
to look for another input row:
If you did not specify the maxRows option, the system attempts to locate the next input
row.
If you specified the maxRows option and the input row counter is equal to the
maxRows count, the system ends the load and commits all inserted records, not
including the rows processed in the skipped row range. Otherwise, the system attempts
to locate the next input row.
NullValue
Specifies the string to use for the null value, which can be at most a 4-byte UTF-8 string.
The default is ‘NULL’. You can specify a value such as a space (' ') or any string up to four
characters. Conceptually, a field contains either a value or an indication that there is no value.
The system provides some flexibility in how you indicate that a field contains no value. For
more information about how the system handles nulls, see “Column Constraint Rules for
Empty Strings” on page 2-10.
The system determines a field’s type and whether it is null by inspecting the corresponding
column declaration:
If there is no value, the system sets the corresponding value in the candidate binary
record to null.
If you declared the target column “not null,” then an absence of a value is an error.
If a field does not indicate null, the system assumes it contains a value. The system
analyzes the contents of that field, converts its textual input representation to binary,
and sets the corresponding value in the candidate binary record to that value.
QuotedValue
Specifies whether data values are quoted or not. The default is false. Specify SINGLE or
YES to require single quotes or DOUBLE to require double quotation marks. You can
precede the opening quote or follow the closing quote with spaces. You can use the actual
quote characters if you enclose them in double quotes. The system recognizes the end of
the field by a field-delimiter character or an end-of-row sequence.
The system recognizes a quoted value when the first non-space character is the quote
character specified in the quotedValue option. If the first non-space character is not the
specified quote character, then the system handles it according to the normal rules. In
particular, leading or trailing spaces in string fields are considered part of the string’s value.
For example, the following command line demonstrates using the quotedValue option.
nzload -quotedValue SINGLE -nullValue 'NULL' -delim '|'
|NULL| – A null input field
|'NULL'| – A null input field
| I'm | – A non-null input field containing the text "I'm "
| 'I''m' | – A non-null input field containing the text "I'm"
| '|' | – A non-null input field containing the single character "|"
|' '| – A non-null input field containing a single space
| | – A non-null input field containing a single space
| '' | – A non-null input field containing a zero-length string
|| – A non-null input field containing a zero-length string
Note that unlike the escapeChar option, the quotedValue option is not able to force the
system to accept the nullValue token as a valid non-null input value. The system overhead for
processing quoted value syntax is much greater than the default unquoted syntax. In
addition, except for strings containing three or more field delimiters that need to be escaped
and no embedded quotes, using the quotedValue option results in more bytes of input data
than the escapeChar option. When you have a choice, use unquoted syntax.
If you expect all values in all input fields (string or otherwise) to be uniformly enclosed in
quotes, then use the requireQuotes option to cause the system to enforce this usage. Using
the requireQuotes option reduces the parsing overhead and provides extra robustness.
Note: This option is not supported for Fixed-Length format.
RecordDelim
Specifies the string literal to use as the row/record delimiter. The value can be at most
an 8-byte UTF-8 string.
Note: This option is used only with the Fixed-Length format. For more information, see
“New Options” on page 6-2.
RecordLength
Specifies the length of the entire record. Includes the length itself, but does not include
the RecordDelimiter.
Note: This option is used only with the Fixed-Length format. For more information, see
“New Options” on page 6-2.
RemoteSource
Specifies the source datafile is remote, and takes the following values: ODBC, JDBC,
OLE-DB, or empty string. External tables created with the remote source set to ODBC,
JDBC, or OLE-DB are usable only through those, respectively. External tables created with
the remote source not set (or set to empty string) are usable from any client (the source
datafile path is assumed to be on the Netezza host, even if the load/unload is initiated
remotely from a different host).
Note that nzsql does not support remote loads/unloads using external tables (you can only
create external tables remotely), though it does support loads/unloads locally on the host.
This option is automatically set to ODBC if the hostname option is set to anything but local-
host or the reserved IP address (127.0.0.1).
RequireQuotes
Specifies whether quotes are mandatory. The default is false. If set to true, the quotedValue
option must be set to YES, SINGLE, or DOUBLE. See “QuotedValue” on page 3-10.
Note: This option is not supported for Fixed-Length format.
SkipRows
Specifies the number of initial rows to skip before loading the data. The default is 0 (none).
After the system has a candidate binary record from an input row, it determines whether to
insert that record into the target table:
If you did not specify this option, the system inserts every record.
If you specified this option and the input row counter is less than or equal to the
skipRows count, the system discards the candidate binary record (skipped). Otherwise,
the system inserts the record.
Note: If you use the skipRows option, the system skips that number of rows, and then
begins the count for the maxErrors and/or maxRows options (if you have specified them).
Note that this cannot be used for 'header' row processing in a datafile, as even the skipped
rows are processed first, so the data in the header rows should be valid with respect to the
external table definition.
This option can be used for doing a dry-run to validate the datafile is correct, before loading
into a user table, by setting a maximum value.
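The way skipRows, maxRows, and maxErrors interact can be sketched as a small Python simulation. This is a deliberately simplified model with hypothetical names: it treats skipped rows as ignored entirely (whereas the text above notes that skipped rows are still processed and must be valid), and it starts the maxRows and maxErrors counts after the skipped range.

```python
def simulate_load(rows, is_valid, skip_rows=0, max_rows=0, max_errors=1):
    """Model which rows a load would commit under the three options."""
    inserted, rejected, counted = [], [], 0
    for number, row in enumerate(rows, start=1):
        if number <= skip_rows:
            continue  # skipped rows do not count toward maxRows or maxErrors
        counted += 1
        if is_valid(row):
            inserted.append(row)
        else:
            rejected.append(number)  # record number, as logged in nzlog
            # maxErrors 0 means unlimited; otherwise abort and roll back.
            if max_errors and len(rejected) >= max_errors:
                return {"committed": [], "rejected": rejected, "aborted": True}
        if max_rows and counted >= max_rows:
            break  # stop after the requested number of rows and commit
    return {"committed": inserted, "rejected": rejected, "aborted": False}
```

With the default maxErrors of 1, a single bad row rolls back the whole load; raising the value to 2 lets the same input commit its good rows and report one rejection.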
SocketBufSize
Specifies the chunk size at which to read the data from the source file, expressed in bytes.
Valid values range from 64KB to 800MB, with a default value of 8MB. Values outside this
range result in a system notice that the value will be reset to the appropriate minimum or
maximum level. This is used to fine-tune the performance of loads, depending on the
speed at which the source data is available for loads.
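The range handling for SocketBufSize amounts to clamping the requested value, as in the following Python sketch. The helper name is hypothetical, and the sketch assumes that KB and MB mean 1024 and 1024 × 1024 bytes.

```python
KB = 1024            # assumption: binary units
MB = 1024 * KB
MIN_SOCKET_BUF = 64 * KB
MAX_SOCKET_BUF = 800 * MB
DEFAULT_SOCKET_BUF = 8 * MB


def effective_socket_buf_size(requested=None):
    """Return (size_in_bytes, notice); notice is None when no reset occurred."""
    if requested is None:
        return DEFAULT_SOCKET_BUF, None
    if requested < MIN_SOCKET_BUF:
        return MIN_SOCKET_BUF, "value reset to the minimum"
    if requested > MAX_SOCKET_BUF:
        return MAX_SOCKET_BUF, "value reset to the maximum"
    return requested, None
```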
TimeDelim
Specifies the single-byte character that separates the time components. The default is ':'.
If you specify the timeDelim option as an empty string, you must specify the hour, min-
utes, and optional seconds as two-digit numbers.
If you specify the 12-hour format, you can precede the AM or PM token with a single
space. Note that the AM and PM tokens are case-insensitive.
The system checks for syntax and range errors. If an error occurs, the system discards the
record to the nzbad file and logs an error with the record number in the nzlog file.
TimeRoundNanos
Rounds the time value to six fractional-seconds digits. Use the timeRoundNanos option to
accept values that have non-zero digits beyond microsecond precision and round them to
the nearest microsecond.
If you do not use the timeRoundNanos option, a value is accepted, as long as it can be
stored without loss of precision.
If you specify this option, the value is accepted, even when full precision of any frac-
tional seconds cannot be preserved. In this case, the value is rounded.
For example, consider the following timestamps:
1999/12/31 23:59:59.9999994
1999/12/31 23:59:59.9999995
Both of these timestamps specify finer than microsecond resolution. Without the
option, each would be rejected. Using the option, the first sample timestamp would
round to:
1999/12/31 23:59:59.999999
The second sample would round to:
2000/01/01 00:00:00.0
Note: This option is not supported for Fixed-Length format, and is also referred to as the
TimeExtraZeros option.
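The two sample timestamps can be checked with a short Python sketch of the rounding rule. This is a model of the described behavior rather than the loader's code; the function name is hypothetical, and it assumes round-half-up rounding, which is consistent with the second example carrying over into the next year.

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP


def load_timestamp(text: str, time_round_nanos: bool = False) -> datetime:
    """Parse 'YYYY/MM/DD HH:MM:SS[.fraction]' at microsecond precision."""
    date_part, time_part = text.split(" ")
    whole, _, fraction = time_part.partition(".")
    base = datetime.strptime(f"{date_part} {whole}", "%Y/%m/%d %H:%M:%S")
    micros = Decimal("0." + fraction) * 1_000_000 if fraction else Decimal(0)
    if micros != micros.to_integral_value():
        # Digits finer than a microsecond are present.
        if not time_round_nanos:
            raise ValueError("value cannot be stored without loss of precision")
        micros = micros.to_integral_value(rounding=ROUND_HALF_UP)
    return base + timedelta(microseconds=int(micros))
```

Without the option, both samples raise an error; with it, the first rounds down to 23:59:59.999999 and the second rolls over to midnight on January 1, 2000.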
TimeStyle
Specifies the time format (‘24HOUR’, ‘12HOUR’) used in the data file. The default is
‘24HOUR’.
TruncString
Specifies truncating a string and inserting it into the declared storage.
False – Default, the system reports an error when a string exceeds its declared storage.
True – Truncate any string value that exceeds its declared char/varchar storage.
Note: This option is not supported for Fixed-Length format.
Y2Base
If you specify the Y2-style date, use the -y2Base option to specify the start of the 100-year
range. Table 3-5 provides some examples of date ranges and their corresponding input
values.
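A common way to read "the start of the 100-year range" is as a sliding window: each two-digit year maps to the one year inside the window that begins at the base year. The following Python sketch shows that interpretation. It is an assumption made for illustration, not a statement of how the system computes the result, and the function name is hypothetical.

```python
def expand_two_digit_year(yy: int, y2_base: int) -> int:
    """Map a two-digit year into the window [y2_base, y2_base + 99]."""
    if not 0 <= yy <= 99:
        raise ValueError("a Y2-style year must be between 0 and 99")
    return y2_base + (yy - y2_base) % 100
```

Under this reading, a base of 1950 maps 50 to 1950 and 49 to 2049, while a base of 2000 maps 12 to 2012.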
Option Processing
This section contains additional information on how the system processes the options.
Counting Rows
The system uses a line-oriented input format – one line of text is an input row. It operates
by isolating successive rows in the input stream. Every time it finds a new row, it incre-
ments a row counter (starting with number 1) and analyzes the contents of the row.
During analysis two sorts of errors can occur:
The input text may not match the expected format.
A field value might fail to meet a requirement imposed by the target table schema.
If a row contains no errors, the system converts the row into a candidate binary record.
The system uses the following rules based on whether the field is a string field:
If the field is a string field – All characters from the beginning of the field to the termi-
nating delimiter or end of row sequence contribute to the field’s value.
If the field is a non-string field – The system skips any leading spaces, interprets or
converts the field’s contents, and skips any trailing spaces.
The string/non-string distinction also affects the details of how a field indicates that it is
null. For more information, see “Handling the Absence of a Value” on page 3-14.
token cannot have preceding or trailing adjacent spaces. The explicit null token
method makes it impossible to express a string consisting of exactly the text of the null
token.
The implicit method interprets an empty field as null. This method is always available
to non-string fields independent of any nullValue option setting and works even if the
non-string field contains spaces. You can use the implicit method on string fields only
if you have set the nullValue option to the empty string ('').
The system considers a string field empty (potentially null) only if it contains truly zero
characters (no spaces). Setting nullValue to the empty string makes it impossible to set
any character varying (alias varchar(n)) column to an empty, zero-length string. In other
words, if the system encounters an empty string and the nullValue is set to '', then the
system treats the empty string as a null value.
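The explicit and implicit null rules can be condensed into a small Python sketch. This is a simplified reading of the text above with a hypothetical function name; in particular, it assumes that the explicit token must match the field exactly, with no adjacent spaces, for both string and non-string fields.

```python
def field_is_null(text: str, is_string: bool, null_value: str = "NULL") -> bool:
    """Decide whether a raw field indicates the absence of a value."""
    if is_string:
        # Explicit token only. With nullValue set to '', this also covers the
        # implicit method: a truly empty (zero-character) field is null.
        return text == null_value
    # Non-string fields: an empty field, even one that holds only spaces, is
    # implicitly null regardless of the nullValue setting.
    return text.strip() == "" or text == null_value
```

With the default nullValue, an empty string field is a zero-length string rather than a null, whereas an empty integer field is null.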
Load continuation cannot operate on any table that has one or more materialized views in
an active state. Before enabling load continuation, suspend the associated materialized
views. You can suspend active materialized views either through the NzAdmin tool or by
issuing the ALTER VIEWS command. Sample syntax for ALTER VIEWS follows.
ALTER VIEWS ON <table> MATERIALIZE SUSPEND
Once loading has completed, you can update and activate the materialized views for the
table. Sample syntax follows.
ALTER VIEWS ON <table> MATERIALIZE REFRESH
For more information, see the IBM Netezza System Administrator’s Guide.
Specify the crInString option to permit unescaped carriage returns (CR) in char/varchar
fields. If you specify the crInString option, line feed (LF) becomes the default
end-of-row indicator.
Specify the escapeChar option to permit any character preceded with a backslash (\) to
be interpreted as an escape character. In this way, you could use the zero (byte 0), line
feed (byte 10), carriage return (byte 13), or the closing delimiter.
Specify the ignoreZero option to cause the system to check every character for zero.
This causes the system to skip over each zero it finds and to consider the next charac-
ter. If you specify this option, you cannot include a zero byte in a string.
For example, assume <nul> is a null byte, the field delimiter is '|' and you have speci-
fied ignoreZero:
..|<nul>AB<nul>CDEF<nul>|..
fills a char(6) column with 'ABCDEF'.
..|<nul>127<nul>|..
fills a byteint column with binary 01111111 (= 0x7F).
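The two ignoreZero examples can be reproduced in Python by dropping every zero byte before the field is interpreted. This is a minimal sketch with a hypothetical function name.

```python
def strip_zero_bytes(field: bytes) -> bytes:
    """Discard every byte with value zero, as ignoreZero is described to do."""
    return field.replace(b"\x00", b"")


# The char(6) example: <nul>AB<nul>CDEF<nul>
text_value = strip_zero_bytes(b"\x00AB\x00CDEF\x00").decode("ascii")

# The byteint example: <nul>127<nul>
int_value = int(strip_zero_bytes(b"\x00127\x00").decode("ascii"))
```

Here text_value is the six-character string ABCDEF, and int_value is 127, whose 8-bit binary form is 01111111.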
Table 3-6 lists the end-of-row and control characters that are permitted with the different
nzload system options. The mark indicates that the option is specified or allowed.
Note: In Fixed-Length format, control characters are treated differently. For more informa-
tion, see Chapter 6, “Using Fixed-Length Format.”
Session Variables
The following session variables work as nzload options.
LOAD_REPLAY_REGION – See “Enabling Load Continuation” on page 3-15.
MAX_QUERY_RESTARTS – The number of restarts allowed for load continuation. See
“Enabling Load Continuation” on page 3-15.
LOAD_LOG_MAX_FILESIZE – The maximum allowed size in MB for the log file.
This chapter describes the nzload command. Netezza SQL is the Netezza Structured Query
Language (SQL), which runs on the Netezza data warehouse appliance. Throughout this
document, the term SQL refers to Netezza’s SQL implementation. For nzload usage exam-
ples, see Appendix A, “Examples and Grammar.”
Ensure that the GROUP nz has READ permissions for the data file to load.
Use the -host option with the nzload command (such as nzload -host <hostname>).
Program Invocation
The nzload command is a command-line program that accepts input values from multiple
sources. The precedence order is the following:
Command line
Control file. Without a control file, you can only do one load at a time, and using a con-
trol file allows multiple loads. See “Using a Control File” on page 4-5.
Environmental variables (only used for user, password, database, and host)
Built-in defaults
Option names are case insensitive. Every option has a standard name for use in either the
command line or the control file. For more information about the input values, see
Table 4-1 on page 4-3.
Many options include a token argument, which you can enclose in either single or double
quotes. The nzload command treats alphabetic characters in option token arguments as
case-insensitive (for example -boolStyle YES_NO is equivalent to -boolStyle yes_no).
Note: You must quote options that require a punctuation character as a token, and use an
escape character if quotes are part of the argument.
Syntax
The nzload command uses the following syntax:
nzload [-h|-rev] [options]
Inputs
The nzload command uses many of the options for external tables, as detailed in
Chapter 3, “External Table Options.” Particular options for nzload are shown in Table 4-1.
Option                 Description
-cf filename           Specifies the control file. For more information, see
                       “Using a Control File” on page 4-5.
-df filename           Specifies the datafile to load. If you do not specify a path,
                       the system uses the special token <stdin> to store the filepath
                       string. Corresponds to the DataObject external table option.
-lf filename           Specifies the log file name. If the file exists, this appends to it.
-bf filename           Specifies the bad/rejected rows filename (overwrite if the file
                       exists).
-outputDir dir         Specifies the output directory for the log and bad/rejected rows
                       files. Corresponds to the LogDir external table option.
-fileBufSize           Specifies the chunk size (MB for fileBufSize or bytes for
-fileBufByteSize       fileBufByteSize) at which to read the data from the source file.
                       Corresponds to the SocketBufSize external table option.
Additional Options
The nzload command takes the following additional options:
Option                 Description
-caCertFile path       Specifies the pathname of the root CA certificate file on the
                       client system. This argument is used by Netezza clients who use
                       peer authentication to verify the Netezza host system. The default
                       value is NULL, which skips the peer authentication process.
-securityLevel level   Specifies the security level that you want to use for the session.
                       The argument has four values:
                       • 0 – preferredUnsecured – This is the default value. Specify
                         this option when you would prefer an unsecured connection,
                         but you will accept a secured connection if the Netezza
                         system requires one.
                       • 1 – onlyUnsecured – Specify this option when you want an
                         unsecured connection to the Netezza system. If the Netezza
                         system requires a secured connection, the connection will be
                         rejected.
                       • 2 – preferredSecured – Specify this option when you want a
                         secured connection to the Netezza system, but you will accept
                         an unsecured connection if the Netezza system is configured
                         to use only unsecured connections.
                       • 3 – onlySecured – Specify this option when you want a
                         secured connection to the Netezza system. If the Netezza
                         system accepts only unsecured connections, or if you are
                         attempting to connect to a Netezza system that is running a
                         release prior to 4.5, the connection will be rejected.
                       Note: If you specify an invalid value for the -securityLevel
                       argument of the nzload command, the command defaults to the
                       preferredUnsecured (0) level.
-t table               Specifies the table name. You can specify a fully qualified name
                       for this value.
-port                  Specifies the port to use, allowing you to override the default.
Outputs
The nzload command exits with the following codes:
0 – Successful, all input records were inserted.
1 – Failed, no records were inserted due to an error or errors found during the load.
2 – Successful, but errors found during the input did not exceed the error threshold
(-maxErrors), good records were inserted.
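A calling script usually needs to distinguish the two successful exit codes from the failure code. The following Python sketch captures the three documented codes; the helper names are hypothetical.

```python
# Exit codes as listed above; the descriptions are paraphrased.
EXIT_CODES = {
    0: "successful: all input records were inserted",
    1: "failed: no records were inserted",
    2: "successful with rejects: errors stayed below -maxErrors, good records were inserted",
}


def describe_exit(code: int) -> str:
    """Return a readable description of an nzload exit code."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def load_committed(code: int) -> bool:
    """True when the load inserted records (codes 0 and 2)."""
    return code in (0, 2)
```

In a wrapper script, the code could come from the returncode attribute of the result of subprocess.run. For exit code 2, check the nzbad and nzlog files to see which records were rejected.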
Options
Within a control file, you can specify the following options:
Any of the valid options for an external table. For more information, see Appendix C,
“Option Names.” You can specify the long format name of the option or the short for-
mat name.
Database – Specifies the name of the database to load.
Table – Specifies the name of the table to load the data.
Badfile (bf) – Specifies the name of the nzbad file, which contains any records which
could not be loaded. The default is table.database.nzbad.
Logfile (lf) – Specifies the name of the nzload log file, which contains messages and
errors that occurred during the load processing. The default is table.database.nzlog.
Datafile – Specifies the pathname of the file that you want to load into the specified
table and database. The datafile option must be the first line of the control file, fol-
lowed by list of control file options in curly braces {}. You can specify more than one
datafile, each with its own set of options, in the control file.
Decimal delimiter – Specifies to use a comma instead of a period as a decimal delim-
iter. The default delimiter is a period.
The options in a control file are case-insensitive. For example, you could specify the option
in letter formats such as database, DataBase, Database, or DATABASE.
Note that command line options take precedence over any equivalent options specified in a
control file. This allows you to override any control file options as necessary without chang-
ing the control file. If you specify a control file for the nzload command, you cannot specify
a data file argument (-df) on the command line.
Syntax
The syntax for using a control file is as follows, where each sequence can be another load:
DATAFILE <filename>
{
[<option name> <option value>]*
}
For example, the following control file options load the data from customer.dat into the
customer table:
DATAFILE /home/operation/data/customer.dat
{
Database dev
TableName customer
}
If you save the control file contents as a text file (named cust_control.txt in this example)
you can specify it using the nzload command as follows:
nzload -cf /home/nz/sample/cust_control.txt
Load session of table 'CUSTOMER' completed successfully
When you use the nzload command, note that you cannot specify both the -cf and -df
options in the same command. You can load from a specified data file, or load from a
control file, but not both in one command.
The following control file options define two data sets to load. Note that the options can
vary for each data set.
DATAFILE /home/operation/data/customer.dat
{
Database dev
TableName customer
Delimiter '|'
Logfile operation.log
Badfile customer.bad
}
DATAFILE /home/imports/data/inventory.dat
{
Database dev
TableName inventory
Delimiter '#'
Logfile importload.log
Badfile inventory.bad
}
If you save these control file contents as a text file (named import_def.txt in this example)
you can specify it using the nzload command as follows:
nzload -cf /home/nz/sample/import_def.txt
Load session of table 'CUSTOMER' completed successfully
Load session of table 'INVENTORY' completed successfully
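When many data sets share the same layout, it can be convenient to generate the control file rather than write it by hand. The following Python sketch renders the DATAFILE blocks shown above from a list of (datafile, options) pairs. The function name is hypothetical, and the sketch does not validate option names or quote values for you.

```python
def render_control_file(loads) -> str:
    """Render control file text from (datafile, options) pairs.

    `options` is a dict of option name to option value; values that need
    quoting (such as a delimiter) must already include their quotes.
    """
    blocks = []
    for datafile, options in loads:
        lines = [f"DATAFILE {datafile}", "{"]
        lines += [f"{name} {value}" for name, value in options.items()]
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n"


control_text = render_control_file([
    ("/home/operation/data/customer.dat",
     {"Database": "dev", "TableName": "customer", "Delimiter": "'|'"}),
    ("/home/imports/data/inventory.dat",
     {"Database": "dev", "TableName": "inventory", "Delimiter": "'#'"}),
])
```

The resulting text can be saved to a file and passed to nzload with the -cf option.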
This chapter describes the options for unloading data. For usage examples, see
Appendix A, “Examples and Grammar.”
Unloading Options
The following external table options are not supported for unloads. For a complete list of
external table options, see Chapter 3, “External Table Options.”
CtrlChars
FillRecord
IgnoreZero
Layout
LogDir
MaxErrors
MaxRows
QuotedValue
RecordDelim
RecordLength
RequireQuotes
SkipRows
TimeRoundNanos/TimeExtraZeros
TruncString
Y2Base
The IncludeZeroSeconds external table option is used only for unloads. The 2-digit format
of the DateStyle external table option is not supported for unloads.
This chapter describes the fixed-length format for loading data into external tables.
Formatting Background
All data is a series of byte-sequences and has an associated data type, used here as a con-
ceptual or abstract attribute of the data. Without an associated data type, a byte-sequence
can be interpreted in too many ways.
A single data type can be represented in different forms. For example, an integer data type
can be represented or stored in various types of binary format, or in human-readable
text/character format (typically ASCII). Similarly, dates, times and other data types have
multiple representations used by different programs, languages, and environments. At
some point, though, these data types must be represented in readable form, so users can
do something with the data. Data for loading into the data warehouse typically is presented
in either delimited format or fixed-length format, using either ASCII or UTF-8.
Fixed-Length Format
Fixed-length format files use ordinal positions, which are offsets that identify where fields are
within the record. There are no field delimiters, and there may be no end-of-record delimiter.
Data in fixed-length format files seldom has decimal or time delimiters, because these are
not necessary and take up space. Because the fields are fixed in size, the locations of any
delimiters are fixed and are specified in the layout definition, which accompanies the
fixed-length format data file.
Loading fixed format data into the database requires that you define the target data type for
the field, as well as the location within the record.
Not all fields in a fixed-length format file need to be loaded; fields can be skipped using the
‘filler’ specification. The order of fields in the data file must match the order of the target
table columns, or you must create an external table definition that specifies the order of the
fields as database columns. Using an external table definition in combination with an
insert-select statement allows the field order to be changed.
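To make the idea of ordinal positions concrete, the following Python sketch slices one record by running offset and skips a filler zone. It only illustrates the concept and is not how the Netezza parser is implemented; the zone names, widths, and the split_record helper are all hypothetical.

```python
# Hypothetical layout: (name, width in bytes, is_filler), in file order.
ZONES = [
    ("order_date", 8, False),
    ("pad", 1, True),      # filler: skipped, not loaded
    ("status", 2, False),
]


def split_record(record, zones):
    """Slice one fixed-length record into named fields by ordinal position."""
    fields = {}
    offset = 0
    for name, width, is_filler in zones:
        if not is_filler:
            fields[name] = record[offset:offset + width]
        offset += width  # the offset advances even for skipped zones
    return fields


row = split_record(b"20240131 OK", ZONES)
print(row)
```

Because each zone's position depends on the widths of all zones before it, a filler zone must still declare its width even though its data is discarded.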
Unknown or null values are typically represented by known data patterns, which are classified
as representing null. The Netezza system identifies and acts on these values.
Data Attributes
The typical data attributes in fixed-length format files are as follows:
Data Type – The data at a given offset in a record is always of the same type.
Representation – The representation is constant, and each field has a fixed width. Data
within a field is always presented in the same way. Certain items such as radix points,
time separators, and date delimiters are always at the same place and are typically
implied, rather than being actually present in the data file.
Value – The value can be an actual value or a null indicator. Data representations
that indicate a null value are specified by the layout definition, assuming null is
allowed.
Length – There is no length specification within the data file, as length in the file is
fixed, and the length attribute is specified by the layout definition.
Null-ness – Null-ness is identified in the layout definition as either a specific data pat-
tern, such as “all spaces” or as being “flagged” by a value in another column.
Format Options
For the fixed-length format, new options have been added, and some have been changed.
New Options
The following added external table options are valid only for the fixed-length format.
RecordLength – The length of the entire record, including null-indicator bytes (if any)
and excluding record-delimiter (if any).
No default value
Constant integer
RecordDelim – The row/record delimiter.
Default is ‘\n’ (new-line). Note that the field is literally interpreted, so ‘\n’ looks for
those characters, and not ‘new-line’
The end-of-record delimiter is entered between single quotes. The end-of-record
indicator can be up to 8 bytes long
The omission of a record delimiter is defined by side-by-side single quotes
Layout – Mandatory for fixed-length format. Used to define the location of fields of the
input record.
No default value
Comma separated zone definitions within braces
Changed Options
The following external table options have a different meaning for the fixed-length format:
Table 6-1: Changed Option Meanings
Option Meaning
MaxErrors Sets the maximum number of allowed (non-fatal) errors before aborting
the load. Since the parser now reports errors for each field or zone rather
than just one error for the row, multiple errors can be reported for the
same row, so this limit must be set accordingly. When the parser sees an
error in a field/zone, it recovers (using the field/zone length) and contin-
ues from the next field/zone, until the End-of-Record, a fatal error, or this
maxerrors limit is reached.
Fatal errors include the following:
• RecordLength mis-match
• RecordDelimiter not found
• RecordLength invalid (negative values or zero)
• Zone length invalid (negative values)
• UTF-8 initial byte is invalid
• UTF-8 continuation bytes are invalid
Unsupported Options
The following external table options are not supported for fixed-length format, and if set,
result in an error:
Encoding
FillRecord
IgnoreZero
TimeExtraZeros
TruncString
AdjustDistZeroInt
IncludeZeroSeconds
Delimiter
EscapeChar
QuotedValue
RequireQuotes
Default Values
The following existing external table options work as default values for zone definitions:
NullValue – Default for the ‘NULLIF’ clause of all zones.
DateStyle, DateDelim, TimeStyle, TimeDelim, BoolStyle – Default for zone style for cor-
responding date, time and bool zones.
Layout Definitions
Layout is an ordered collection of zone (field) definitions, and is a required option for
fixed-length format. Each zone (field) definition is made up of mutually exclusive
(non-overlapping) clauses. These clauses must be in the following order, although some are
optional and can be empty:
Use-type – Indicates whether a zone is a normal (data) zone or a filler zone. For data
zones, this value is omitted. Filler zones can only be specified in bytes. Other use-types
exist, but are not used for fixed-length format data.
Name – The name of the zone. Duplicate zone names are not allowed. This definition is
not currently used, but is typically provided to identify the field.
Type – Defines the zone type. When not specified, type is defaulted to the correspond-
ing table column’s type. Filler-zones must have a zone type of INT. Valid values are as
follows:
CHAR
VARCHAR
NCHAR
NVARCHAR
INT1
INT2
INT4
INT8
INT
UINT1
UINT2
UINT4
UINT8
UINT
FLOATING
DOUBLE
NUMERIC
BOOL
DATE
TIME
TIMESTAMP
TIMETZ
Style – Defines the zone representation, and is optional. This is defaulted based on the
zone-type and ‘Format’ option. All other styles are only valid for their corresponding
non-textual zone-types. Valid values are the following:
INTERNAL – Valid only for textual zones (CHAR/VARCHAR/NCHAR/NVARCHAR)
DECIMAL – Valid for integer/numeric zone types
DECIMALDELIM – Valid for numeric, float, double, and time-styles (time, timetz,
and timestamp) zone type
FLOATING – Valid for float or double zone type
SCIENTIFIC – Valid for float or double zone type
YMD <'date-delim'> (and other date-styles currently supported in external table
options DateStyle and DateDelim; valid for date zones)
12Hour <'time-delim'> (and other time-styles currently supported in external table
options TimeStyle and TimeDelim; valid for time zones)
24Hour <'time-delim'> (and other time-styles currently supported in external table
options TimeStyle and TimeDelim; valid for time zones)
YMD <'date-delim'> 24Hour <'time-delim'> (and other combinations of date and
time styles currently supported for external table options DateStyle, DateDelim,
TimeStyle and TimeDelim; valid for timestamp and timetz zones)
TRUE_FALSE, Y_N, 1_0 (and other boolean styles currently supported for external
table option BoolStyle; valid for boolean zones). Style has to be in accordance with
format
Length – Specified in bytes.
Nullif – Defines the zone null-ness attribute. For fixed format files, this clause speci-
fies a known data pattern within the field which when present signifies the field is null.
Length should be equal to or less than the column width, and maximum length is 39
bytes.
Nulls are detailed in Table 6-2:
Table 6-2: Layout Example
End-of-Record
When fixed-format records end in a new-line character, no action is required, because new-line
is the default end-of-record delimiter. When there is no record separator, use single quotes side
by side, as in the following example:
RecordDelim ''
RecordDelim is a literal sequence of up to 8 bytes, which does not translate common
escape representations or support functions like CHAR(8).
Record Length
Record Length is optional, but can provide feedback that the format definition has the cor-
rect length. This excludes the end-of-record delimiter. The following is an example:
Recordlength NNN
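The relationship between RecordLength and RecordDelim can be illustrated with a short Python sketch that cuts a byte stream into records. This is a conceptual model only: the split_records helper, its error messages, and the assumption that every record (including the last) ends with the delimiter are illustrative and do not describe the actual parser.

```python
def split_records(data, record_length, record_delim=b"\n"):
    """Split a byte string into fixed-length records.

    record_length excludes the delimiter, as described above. An empty
    record_delim models RecordDelim ''. Raises ValueError on a length
    mismatch or a missing delimiter (illustrative error handling only).
    """
    if record_length <= 0:
        raise ValueError("record length must be positive")
    step = record_length + len(record_delim)
    if len(data) % step != 0:
        raise ValueError("data size is not a multiple of the record size")
    records = []
    for start in range(0, len(data), step):
        chunk = data[start:start + step]
        if chunk[record_length:] != record_delim:
            raise ValueError(f"record delimiter not found at offset {start}")
        records.append(chunk[:record_length])
    return records


with_newlines = split_records(b"AAAA\nBBBB\n", 4)
no_delimiter = split_records(b"AAAABBBB", 4, record_delim=b"")
print(with_newlines, no_delimiter)
```

Running a check like this over a sample of the data file before loading can confirm that the record length you intend to declare matches the file.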
Skipping Fields
The following clause skips four bytes:
“filler char(4) bytes 4”
However, the preferred method is to indicate the field being skipped, as in the following
example:
“filler fld_name char(4) bytes 4”
Temporal Values
Temporal values in fixed-length format files often omit delimiters. Table 6-3 shows clauses
that load dates, times, and timestamps without delimiters.
Table 6-3: Temporal Values
Numeric Values
Table 6-4 shows numeric values.
Table 6-4: Numeric Values
Logical Values
Table 6-5 shows logical values.
Table 6-5: Logical Values
Null Values
Fixed-length format files typically use ‘magic’ values to represent nulls. Adding a nullif
clause to any specification allows the column to be checked for null. A nullif clause has the
following parts:
The keyword “nullif”
The column reference
The test expression
As an example, a file specification where field1 is a date and is considered null if it has the
value ’99991231’ would have the following characteristics:
The nullif specification would be as follows:
“nullif &=’99991231’”
The entire specification would be as follows:
“fld1 date YMD'' bytes 8 nullif &=’99991231’”
All format specifications support the nullif clause.
In addition to &=, which evaluates to ‘string must exactly match,’ the nullif clause also
supports &&=, which allows substring matching. This is useful in cases where the string
may occur anywhere in a field with space padding. For example, nullif &&='N' matches each of
the expressions “ N ”, “N ”, and “ N”.
Table 6-6 shows null values:
Table 6-6: Null Values
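The difference between the two nullif tests can be modeled in a few lines of Python. This sketch is an interpretation of the description above, not the parser's implementation: the function names are hypothetical, and treating &&= as "ignore surrounding spaces, then compare" is an assumption based on the space-padding example.

```python
def is_null_exact(field, pattern):
    """Model of '&=': the whole field must exactly match the pattern."""
    return field == pattern


def is_null_isolated(field, pattern):
    """Model of '&&=': the pattern may sit anywhere inside space padding.

    Assumption: surrounding spaces are ignored and the remainder must
    equal the pattern; the actual parser may behave differently.
    """
    return field.strip(" ") == pattern


print(is_null_exact("99991231", "99991231"))
print([is_null_isolated(f, "N") for f in (" N ", "N ", " N")])
print(is_null_exact(" N ", "N"))
```

Under this model, the exact test suits fields that are always fully populated, such as an 8-byte date, while the padded test suits short flags stored in wider fields.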
This appendix includes examples for using external tables, the nzload command, SQL
grammar, and references.
nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp
-escapeChar '\\'
To specify an input line with fewer columns than the table definition, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-fillRecord
To specify discarding the byte value zero in the char() and varchar() fields, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-ignoreZero no
To specify the log file name, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -lf /tmp/
daily/import.log
To specify the maximum number of errors, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-maxErrors 100
To specify stopping processing when the specified number of records are in the data-
base, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -maxRows
100
To specify the string to use for the null value, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-nullValue 'none'
To specify the output directory for the log files, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-outputDir /tmp/daily
To specify that quotes are mandatory, except for null values, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-requireQuotes quoted value YES
To specify the delimiter to use for time formats, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-timeDelim '.'
To specify allowing but rounding non-zero digits with smaller than microsecond resolu-
tion, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-timeRoundNanos
To specify the time style value in the data file, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-timeStyle 12hour
To specify truncating a string so that it fits the declared char() or varchar() storage, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-truncString
To specify the first year in the YY format, enter:
nzload -u admin -pw password -host nzhost -y2Base 2000
Reference Examples
Examples for references are as follows:
Table A-1: Reference Examples
Reference Meaning
BYTES &2 Error only internal @ reference allowed for length-clause (in any
format/zone-type).
NULLIF && = ‘123’ Matches (nullif evaluates to ‘true’) ‘123’, ‘ 123 ‘ ‘ 123 ‘, if
SPACE is skipped.
Length has to be at least BYTES 3 (text-styles) or BYTES 4.
SQL Grammar
This section provides an explanation of the SQL grammar used for CREATE EXTERNAL
TABLE.
[INSERT INTO <normal-table>] SELECT <col-list> FROM EXTERNAL [name] ‘<data-file>’
[USING ‘(‘ <Load-options>’)’]
Load-options: Load-option
| Load-option Load-options // space separated list of USING clause options
Load-option: FORMAT TEXT | INTERNAL | FIXED
| RECORDLENGTH <n>| Length-ref-expr
| RECORDDELIM <string-literal-max-8-bytes >
| LAYOUT ( Zone-definitions )
…..
Zone-definitions: Zone-def
| Zone-def ‘,’ Zone-definitions // comma-separated lists of zone definitions
Zone-def: [Zone-use-type] [Zone-name] [Zone-type] [Zone-style] [Zone-len] [Nullness]
Zone-use-type: REF | FILLER
Zone-name: Identifier
Zone-type: CHAR| VARCHAR
| NCHAR| NVARCHAR
| BOOL
| INT1 | INT2 | INT4 | INT8 | INT
| UINT1 | UINT2 | UINT4 | UINT8 | UINT
| NUMERIC
| FLOATING| DOUBLE
| DATE | TIME | TIMESTAMP | TIMETZ
Zone-style: INTERNAL
| DECIMAL [‘decimal-delim’]
| FLOATING | SCIENTIFIC [‘decimal-delim’]
| Date-format
| Time-format
| Date-format Time-format
Date-format:
| DateStyle [‘date-delim’]
| DATE DELIM ‘date-delim’
Time-format:
Zone-ref: External-ref
| Isolated-ref
| Internal-ref
External-ref: &[n] // 1 based absolute position of zones, 0, negative values for relative posi-
tions backwards
Isolated-ref: &&[n] // 1 based absolute position of zones 0, negative values for relative posi-
tions backwards
Internal-ref: @[n]// 1 based absolute position of zones, 0, negative values for relative posi-
tions backwards
Length-ref-expr: Internal-ref [ Operator <n> ]
Operator: + | -
The following is an example of the Netezza External Table definition for this data:
CREATE EXTERNAL TABLE sample_ext (
Col01 DATE ,
Col09 BOOL ,
/* Skipped col10 */
Col11 TIMESTAMP,
Col26 Char(12),
Col38 Char(10),
Col48 Char(2),
Col50 Int4,
Col56 CHAR(10),
Col67 CHAR(3) /* Numeric(3,2) cannot be loaded directly */
)
USING (
dataobject('/home/test/sample.fixed')
logdir '/home/test'
format 'fixed'
layout (
Col01 DATE YMD '' bytes 8 nullif &='99991231',
Col09 BOOL Y_N bytes 1 nullif &=' ',
FILLER Char(1) Bytes 1, /* was col10 space */
Col11 TIMESTAMP YMD '' 24HOUR '' bytes 14 nullif &='99991231000000',
Col26 CHAR(15) bytes 15 nullif &=' ', /* 15 spaces */
Col38 CHAR(13) bytes 13 nullif &='****NULL*****' ,
Col48 CHAR(2) bytes 2 nullif &='##' ,
Col50 INT4 bytes 5 nullif &='00000' ,
Col56 CHAR(10) bytes 10 nullif &='0000000000',
Col67 CHAR(3) bytes 3 /* We cannot load this directly, so we use an insert-select */
) /* end layout */
); /* end external table definition. */
function CreateDb()
{
nzsql -c "create database test"
}
function CleanUp()
{
$NZSQL "drop table textDelim_tbl"
$NZSQL "drop table textFixed_tbl"
}
function CreateTable()
{
$NZSQL "create table textDelim_tbl(col1 int, col2 char(10), col3
date)"
$NZSQL "create table textFixed_tbl(col1 int, col2 char(10), col3
date)"
}
function CreateDataFile()
{
    # NOTE: the body of this function is missing from the source text.
    # It must create $DIR/delimData and $DIR/fixedData before LoadData runs.
    :
}
function LoadData()
{
    # nzload using text format
    nzload -t textDelim_tbl -df $DIR/delimData -db test -outputDir $LOGDIR \
        -delim '|' -dateStyle MDY -dateDelim '/'
    # nzload using fixed format
    nzload -t textFixed_tbl -df $DIR/fixedData -db test -outputDir $LOGDIR \
        -format fixed \
        -layout "col1 int bytes 1, col2 char(10) bytes 10, col3 date YMD '-' bytes 10"
}
function UnloadData()
{
    $NZSQL "insert into textDelim_tbl select * from external '$DIR/delimData' using (Delimiter '|' DateStyle 'MDY' DateDelim '/');"
}
CreateDb
CleanUp
CreateTable
CreateDataFile
LoadData
UnloadData
Check the field delimiter. It should be a character used to separate one field value from
another. This field delimiter should be unique and should not appear in a field value,
especially in a char or varchar string. Use the -delim option to specify the field
delimiter.
Check whether there are any NULL values in the data source. How is the null value
expressed in the data file? The RDBMS industry convention is to use the string “null”
to represent a null value. If the data file uses a different representation, use the
-nullValue option to override the default null value. The new value can be an empty
string or a value in the range of a-z or A-Z and no longer than four characters.
Check whether there are any date, time, time with time zone, or timestamp data types
in the table schema. If there are, what style is the date value? The style of these data
type values must be consistent throughout the nzload job.
Check the handling of string fields for char() or varchar() data types. Does the longest
or largest value fit into the storage of the char() or varchar() declaration? If not, is it
possible to alter the schema to accommodate the longest string?
If the schema cannot be altered, is truncating a string an acceptable solution?
If truncation is acceptable, specify the -truncString option.
If neither is acceptable, the nzload command treats the record with the long string
as an error record. The nzload command discards the record to the nzbad file and
logs an error with the record and column numbers in the nzlog file.
See whether there are any special characters used in the string fields, for example
CR, CRLF, or a character in a string that is the same as the field delimiter. Such a
character violates the unique character rule.
If there are special characters, can you regenerate the data file to have an escape
character added to these special characters? If so, then use the -escapeChar '\\'
option to process the strings.
If you cannot regenerate the data file, then the load will contain incomplete and
invalid records.
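If you control the extract process, the escape character can be added when the data file is written. The following Python sketch shows one way to do that. It is illustrative only: the escape_field and format_row helpers are hypothetical, and the choice to escape the delimiter, CR, LF, and the escape character itself is an assumption about what the load with -escapeChar expects.

```python
def escape_field(value, delimiter="|", escape_char="\\"):
    """Prefix the escape character, delimiter, CR, and LF with escape_char."""
    special = {escape_char, delimiter, "\r", "\n"}
    return "".join(escape_char + ch if ch in special else ch for ch in value)


def format_row(values, delimiter="|"):
    """Join escaped field values into one delimited record."""
    return delimiter.join(escape_field(v, delimiter) for v in values)


line = format_row(["42", "Smith|Jones", "a\\b"])
print(line)
```

Escaping at extract time keeps each record on the expected number of fields, so the delimiter remains unambiguous during the load.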
Run the Linux top command on the host to monitor CPU resources. Consider running more
loads concurrently if resources are available.
Troubleshoot
If you see the error message, “Too many data fields for table,” use the Linux command
head -1 on the data file to get the first row, which may contain the extracted column
names. Compare these to your create table DDL and see if their physical positions
match.
If you see the error message, “Data type mismatch on column 5,” use the Linux command
cut -d^ -f 5 inputfile | more to look at the individual data values in that column of the
source file, and then compare them to the data type of the corresponding column in your
create table DDL.
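When the same checks need to run inside a script, the head and cut steps can be approximated in Python. This is a rough sketch under stated assumptions: the helper names and the sample rows are hypothetical, and the '^' delimiter is taken from the cut example above.

```python
def first_row(lines, delimiter="^"):
    """Return the fields of the first record, similar to `head -1`."""
    return lines[0].rstrip("\n").split(delimiter)


def column_values(lines, column, delimiter="^"):
    """Return the values of a 1-based column, similar to `cut -d^ -f N`."""
    values = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) >= column:
            values.append(fields[column - 1])
    return values


# Hypothetical sample data with a header row and a bad value in column 5.
sample = [
    "id^name^city^zip^amount\n",
    "1^Ann^Oslo^0150^12.50\n",
    "2^Bob^Rome^00100^n/a\n",
]
print(first_row(sample))
print(column_values(sample, 5))
```

In this sample, the header row and the value n/a in column 5 are the kinds of entries that would not convert to a numeric column.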
Handle Exceptions
Repeat the load on the -bf file. If there are many exceptions, fix them and re-extract from
the source system. If they are few, use a text editor to change data. To make large substitu-
tions, use the Linux sed or awk commands.
Generate Statistics
Remember to run the generate statistics command on your tables and/or database after you
have loaded new data.
Test Performance
If your data is evenly distributed, you should see peak loading performance of at least 75
percent CPU utilization on the host. You can monitor utilization by running the Linux top
command during the load. If you see lower CPU utilization, either the data is
skewed so that not all SPUs are sharing the workload, or the parser is waiting for data.
If your input data is skewed, that is, all records are being sent to a small number of SPUs,
those SPUs become the performance bottleneck.
If your CPU utilization is less than 75 percent and the data is evenly distributed, you might
have a streaming problem:
If the load is running from the local host, determine the source of the data.
Look for other concurrent database activities — such as activities that are SPU-to-SPU
broadcast intensive or SPU disk I/O intensive.
If the data is not locally staged or is on a SAN / NFS mount, determine if the bottle-
neck is the remote source of the data or the network.
The performance of the Netezza appliance system depends on the number of SPUs. If,
however, data is being streamed across an external network, then the performance is
limited by the speed of the network.
Test the network by using the FTP command to send a file between the source and the
local host, and measure the transfer rate. Under optimal conditions, a Gig-E network
transfers at a rate of ~1000Mb/second, or ~125MB/second or ~450GB/hour.
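The figures above follow from simple unit conversion, which the following Python sketch reproduces using decimal units (1 MB = 1,000,000 bytes). The function names and the sample file size and duration are hypothetical; substitute the values you measure in your own transfer test.

```python
def link_rates(megabits_per_second):
    """Convert a link speed in Mb/s to (MB/s, GB/hour), using decimal units."""
    megabytes_per_second = megabits_per_second / 8
    gigabytes_per_hour = megabytes_per_second * 3600 / 1000
    return megabytes_per_second, gigabytes_per_hour


def observed_rate(bytes_sent, seconds):
    """Return the measured transfer rate in MB/s for a timed file copy."""
    return bytes_sent / seconds / 1_000_000


mb_s, gb_h = link_rates(1000)
print(mb_s, gb_h)
print(observed_rate(2_500_000_000, 25))
```

Comparing the observed rate with the theoretical rate for the link indicates whether the network is likely to be the bottleneck.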
Reporting Errors
The nzload command returns standard error status when it completes.
0 – The load was successful; all input records were inserted.
1 – The load failed; no records were inserted because of errors found during the load.
2 – The load was successful, but the system found errors in the input that did not exceed
the error threshold (-maxErrors), so the good records were inserted.
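A calling script can branch on these status values. The following Python sketch maps each status to a description; the dictionary and helper names are hypothetical, and the wording of the descriptions paraphrases the list above.

```python
NZLOAD_STATUS = {
    0: "success: all input records were inserted",
    1: "failure: no records were inserted",
    2: "partial success: good records inserted, some input errors",
}


def describe_status(code):
    """Map an nzload exit status to a short description."""
    return NZLOAD_STATUS.get(code, f"unexpected exit status {code}")


def rows_were_inserted(code):
    """True when the load inserted records (status 0 or 2)."""
    return code in (0, 2)


for status in (0, 1, 2):
    print(status, describe_status(status), rows_were_inserted(status))
```

In practice the status would come from the process that ran nzload, for example the returncode attribute of the object returned by Python's subprocess.run.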
The nzload command writes high-level errors to the terminal (stderr), nzlog file, and nzbad
file. You can specify the nzlog and nzbad filenames on the command line or through the
use of a control file. For more information, see “Using a Control File” on page 4-5.
Note: Periodically delete log files to free up disk space.
Specifying Options
Table C-1 shows how to enter the external table options when using the command line
method (used for nzload), in a control file, or as part of a SQL command.
Table C-1: Specifying External Table Options
IncludeZeroSeconds NA NA INCLUDEZEROSECONDS
SuspendMviews -suspendMviews NA NA
Tablename -t tablename NA
This section describes some important notices, trademarks, and compliance information.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and ser-
vices currently available in your area. Any reference to an IBM product, program, or service
is not intended to state or imply that only that IBM product, program, or service may be
used. Any functionally equivalent product, program, or service that does not infringe any
IBM intellectual property right may be used instead. However, it is the user's responsibility
to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellec-
tual Property Department in your country or send inquiries, in writing, to:
IBM World Trade Asia Corporation
Licensing 2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and distrib-
ute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the appli-
cation programming interface for the operating platform for which the sample programs are
written. These examples have not been thoroughly tested under all conditions. IBM, there-
fore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Each copy or any portion of these sample programs or any derivative work, must include a
copyright notice as follows:
© (your company name) (year). Portions of this code are derived from IBM Corp. Sample
Programs.
© Copyright IBM Corp. _enter the year or years_.
If you are viewing this information softcopy, the photographs and color illustrations may not
appear.
Trademarks
IBM, the IBM logo, ibm.com and Netezza are trademarks or registered trademarks of Inter-
national Business Machines Corporation in the United States, other countries, or both. If
these and other IBM trademarked terms are marked on their first occurrence in this infor-
mation with a trademark symbol (® or ™), these symbols indicate U.S. registered or
common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current
list of IBM trademarks is available on the Web at “Copyright and trademark information” at
ibm.com/legal/copytrade.shtml.
Adobe is a registered trademark of Adobe Systems Incorporated in the United States, and/
or other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corpo-
ration in the United States, other countries, or both.
NEC is a registered trademark of NEC Corporation.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United
States, other countries, or both.
Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United States and/or
other countries.
D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River, and the
Wind River logo are trademarks, registered trademarks, or service marks of Wind River Sys-
tems, Inc. Tornado patent pending.
APC and the APC logo are trademarks or registered trademarks of American Power Conver-
sion Corporation.
Other company, product or service names may be trademarks or service marks of others.
Germany: Compliance with the Act on the Electromagnetic Compatibility of Devices
This product complies with the “Act on the Electromagnetic Compatibility of
Devices (EMVG)”. This is the implementation of EU Directive 2004/108/EC in the Federal
Republic of Germany.
Certificate of conformity in accordance with the German Act on the Electromagnetic Compatibility of Devices
(EMVG) (or the EMC EC Directive 2004/108/EC) for Class A devices
This device is authorized to bear the EC conformity mark (CE) in accordance with the
German EMVG.
The manufacturer is responsible for compliance with the EMC regulations:
International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
914-499-1900
The manufacturer's responsible contact in the EU is:
IBM Deutschland
Technical Regulations, Department M456
IBM-Allee 1, 71137 Ehningen, Germany
Telephone: +49 7032 15-2937
Email: tjahn@de.ibm.com
General information:
The device meets the protection requirements of EN 55024 and EN 55022 Class A.
This is a Class A product based on the standard of the Voluntary Control Council for Inter-
ference (VCCI). If this equipment is used in a domestic environment, radio interference
may occur, in which case the user may be required to take corrective actions.
This is electromagnetic wave compatibility equipment for business (Type A). Sellers and
users need to pay attention to it. This is for any areas other than home.