Informatica Cache Overview
idwbitraining@gmail.com 1
Cache Memory
• The Integration Service creates each memory cache based on the configured cache
size.
• The Integration Service might increase the configured cache size for one of the
following reasons:
– The configured cache size is less than the minimum cache size required to process the
operation
– The configured cache size is not a multiple of the cache page size.
• For optimal performance, set the cache size to the total memory required to
process the transformation.
• If there is not enough cache memory to process the transformation, the
Integration Service processes some of the transformation in memory and pages
information to disk to process the rest.
• Remember that 64-bit operating systems can use more system memory (RAM),
which allows larger caches.
• Understand the server hardware configuration thoroughly before you configure
cache sizes.
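The two adjustment rules above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name, the rounding-up behavior, and the page size value are hypothetical and are not taken from the Integration Service itself.

```python
def effective_cache_size(configured: int, minimum: int, page_size: int) -> int:
    """Return a cache size in bytes after applying both adjustments.

    Assumption: the size is raised to the minimum, then rounded UP to the
    next multiple of the page size. The slides only say it "might increase".
    """
    # Rule 1: never go below the minimum needed to process the operation.
    size = max(configured, minimum)
    # Rule 2: make the size a multiple of the cache page size.
    remainder = size % page_size
    if remainder:
        size += page_size - remainder
    return size


# Hypothetical values: 1,000,000 bytes configured, 2,000,000 required, 8 KB pages.
print(effective_cache_size(1_000_000, 2_000_000, 8192))
```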
Transformations using cache
• The following table describes the type of information that the Integration Service
stores in each cache:
XML Target
- Index cache: Stores primary and foreign key information in separate caches.
- Data cache: Stores XML row data while it generates the XML target.
Cache files
• When you run a session, the Integration Service creates at least one cache file for
each transformation.
• If the Integration Service cannot process a transformation in memory, it writes the
overflow values to the cache files.
• When you run a session, the Integration Service writes a message in the session
log indicating the cache file name and the transformation name.
• When a session completes, the Integration Service releases cache memory and
usually deletes the cache files.
• You may find index and data cache files in the cache directory under the following
circumstances:
– The session performs incremental aggregation.
– You configure the Lookup transformation to use a persistent cache.
– The session does not complete successfully. The next time you run the session, the
Integration Service deletes the existing cache files and creates new ones.
• Note: Because writing to cache files can slow session performance, configure the cache sizes to
process the transformation in memory.
Naming convention for cache files
• The Integration Service uses different naming conventions for index, data, and
sorter cache files.
For example, the name of the data file for the index cache is PMLKUP748_2_5S32.idx1.
PMLKUP identifies the transformation type as Lookup, 748 is the session ID, 2 is the
transformation ID, 5 is the partition index, S (Solaris) is the operating system, and 32 is
the bit platform.
Prefix: Identifies the type of transformation:
- Aggregator transformation is PMAGG.
- Joiner transformation is PMJNR.
- Lookup transformation is PMLKUP.
- Rank transformation is PMAGG.
- Sorter transformation is PMSORT.
- XML target is PMXML.
OS: Identifies the operating system of the machine running the Integration Service process.
Used in Lookup transformation cache file names:
- W is Windows.
- H is HP-UX.
- S is Solaris.
- A is AIX.
- L is Linux.
- M is Mainframe.
BIT: Identifies the bit platform of the machine running the Integration Service
process: 32-bit or 64-bit. Used in Lookup transformation cache file names.
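As a rough illustration of the naming convention, the sketch below splits a Lookup cache file name into its parts. The regular expression is inferred only from the single example PMLKUP748_2_5S32.idx1, so treat the pattern, the function name, and the dictionary keys as assumptions rather than a documented format.

```python
import re

# Prefix and operating system codes as listed in the naming convention.
PREFIXES = {
    "PMAGG": "Aggregator or Rank",
    "PMJNR": "Joiner",
    "PMLKUP": "Lookup",
    "PMSORT": "Sorter",
    "PMXML": "XML target",
}
OS_CODES = {"W": "Windows", "H": "HP-UX", "S": "Solaris",
            "A": "AIX", "L": "Linux", "M": "Mainframe"}

# Assumed layout: <prefix><session>_<transformation>_<partition><OS><bit>.<suffix>
CACHE_NAME = re.compile(
    r"(?P<prefix>PM[A-Z]+)(?P<session>\d+)_(?P<transformation>\d+)_"
    r"(?P<partition>\d+)(?P<os>[WHSALM])(?P<bit>32|64)\.(?P<suffix>\w+)"
)


def parse_lookup_cache_name(name: str) -> dict:
    """Split a Lookup cache file name into its labeled components."""
    match = CACHE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized cache file name: {name!r}")
    return {
        "transformation_type": PREFIXES.get(match["prefix"], "unknown"),
        "session_id": int(match["session"]),
        "transformation_id": int(match["transformation"]),
        "partition_index": int(match["partition"]),
        "os": OS_CODES[match["os"]],
        "bit_platform": int(match["bit"]),
        "suffix": match["suffix"],
    }


print(parse_lookup_cache_name("PMLKUP748_2_5S32.idx1"))
```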
Cache file directory
• The Integration Service creates the cache files by default in the $PMCacheDir directory. If the
Integration Service process does not find the directory, it fails the session and writes a
message to the session log indicating that it could not create or open the cache file.
• The Integration Service may create multiple cache files. The number of cache files is limited
by the amount of disk space available in the cache directory.
• You can override $PMCacheDir for a specific transformation by specifying a different
directory on the server.
Configuring cache size
• Configure the amount of memory for a cache in the session properties. The cache size specified in
the session properties overrides the value set in the transformation properties.
• If the session is reusable, all instances of the session use the cache size configured in the reusable
session properties. You cannot override the cache size in the session instance.
• Numeric value: Configure a specific value for the cache size when you want to tune the cache
size.
• To configure the memory requirements for a transformation with cache partitioning, calculate the
total requirements for the transformation and divide by the number of partitions.
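The cache partitioning rule is simple arithmetic. A minimal sketch, assuming the result is rounded up to a whole byte (the rounding and the function name are illustrative choices, not part of the product):

```python
import math


def per_partition_cache_size(total_bytes: int, partitions: int) -> int:
    """Divide the total memory requirement evenly across partitions."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # Round up so the partitions together cover the full requirement.
    return math.ceil(total_bytes / partitions)


# Hypothetical example: 600 MB total requirement split across 4 partitions.
print(per_partition_cache_size(600 * 1024**2, 4))
```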
Calculating cache size
• Use the cache calculator to estimate the total amount of memory required to process the
transformation. You must provide inputs to calculate the cache size. The inputs depend on
the type of transformation. For example, to calculate the cache size for an Aggregator
transformation, you supply the number of groups.
• You can select one of the following modes in the cache calculator:
– Auto. Choose auto mode if you want the Integration Service to determine the cache size at run time
based on the maximum memory configured on the Config Object tab.
– Calculate. Select to calculate the total requirements for a transformation based on inputs. The cache
calculator requires different inputs for each transformation. You must select the applicable cache
type to apply the calculated cache size. For example, to apply the calculated cache size for the data
cache and not the index cache, select only the Data Cache Size option.
Using auto memory size
• If you use auto cache memory, you configure the Integration Service to determine
the cache size for a transformation at run time.
• The Integration Service allocates memory cache based on the maximum memory
size specified in the auto memory attributes in the session properties.
• When you configure a numeric value and a percentage for the auto cache memory,
the Integration Service compares the values and uses the lesser of the two for the
maximum memory limit.
• By default, transformations use auto cache memory.
• If a session has multiple transformations that require caching, you can configure
some transformations with auto memory cache and other transformations with
numeric cache sizes.
• The Integration Service allocates the maximum memory specified for auto caching
in addition to the configured numeric cache sizes.
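The two auto memory rules above (use the lesser of the numeric and percentage limits, then add any numeric cache sizes on top) can be sketched as follows. The function names are hypothetical, and the sketch assumes the percentage is taken from a total memory figure that you supply.

```python
def auto_memory_limit(max_bytes: int, max_percent: int, total_memory_bytes: int) -> int:
    """Return the maximum memory for auto caching: the lesser of the two limits.

    Assumption: max_percent is a percentage of total_memory_bytes.
    """
    percent_bytes = total_memory_bytes * max_percent // 100
    return min(max_bytes, percent_bytes)


def session_cache_memory(auto_limit: int, numeric_sizes: list) -> int:
    """Auto caching memory is allocated in addition to the numeric cache sizes."""
    return auto_limit + sum(numeric_sizes)


MB = 1024**2
limit = auto_memory_limit(512 * MB, 5, 8192 * MB)  # 5% of 8 GB vs. 512 MB
print(limit, session_cache_memory(limit, [200 * MB, 100 * MB]))
```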
Steps to configure cache size
1. In the Workflow Manager, open the session.
2. Click the Mapping tab.
3. Select the mapping object in the left pane. The right pane of the Mapping tab
shows the object properties where you can configure the cache size.
4. Use one of the following methods to set the cache size:
– Enter a value for the cache size, click OK, and then skip to step 8. Values are in
bytes by default. However, you can specify one of the following units: KB, MB, or GB.
Do not enter a space between the value and the unit. For example, enter 350000KB,
200MB, or 1GB.
– Enter ‘Auto’ for the cache size, click OK, and then skip to step 8.
– Click the Open button to open the cache calculator.
5. Select a mode. Select the Auto mode to limit the amount of cache allocated to the
transformation. Skip to step 8.
or
Select the Calculate mode to calculate the total memory requirement for the
transformation.
6. Provide the input based on the transformation type, and click Calculate.
Note: If the input value is too large and you cannot enter the value in the cache
calculator, use auto memory cache.
The cache calculator calculates the cache sizes in kilobytes.
7. If the transformation has a data cache and index cache, select Data Cache Size,
Index Cache Size, or both.
8. Click OK to apply the calculated values to the cache sizes you selected in step 7.
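The value format accepted in step 4 can be illustrated with a small parser. This is a sketch, not the Workflow Manager's own validation: the function name is hypothetical, and it assumes that KB, MB, and GB are binary multiples (1024-based).

```python
import re

# Assumption: units are binary multiples; a bare number means bytes.
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_cache_size(text: str):
    """Return the size in bytes, or None when the value is 'Auto'."""
    value = text.strip()
    if value.lower() == "auto":
        return None
    # No space is allowed between the number and the unit.
    match = re.fullmatch(r"(\d+)(KB|MB|GB)?", value)
    if match is None:
        raise ValueError(f"invalid cache size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2) or ""]


for sample in ("350000KB", "200MB", "1GB", "4096", "Auto"):
    print(sample, parse_cache_size(sample))
```

A value such as "200 MB" raises ValueError here, which mirrors the rule that no space may separate the value from the unit.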
[Screenshot: sample cache size calculation for a lookup. Callouts: select the Calculate mode, then click Calculate.]
Aggregator cache
• A cache is created for Aggregator transformations with unsorted inputs.
• When using multiple partitions, one disk file is created for all partitions and
a separate memory cache is created for each partition.
• The first time you run an incremental aggregation session, the Integration Service
processes the source.
• At the end of the session, the Integration Service stores the aggregated data in two
cache files, the index and data cache files.
• The Integration Service saves the cache files in the cache file directory.
• The next time you run the session, the Integration Service aggregates the new
rows with the cached aggregated values in the cache files.
• To calculate the Aggregator cache size, the following inputs are needed:
– Number of groups – Calculate using the group by ports.
– Data movement mode – Each ASCII character uses one byte, while a Unicode character
uses two bytes.
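The incremental aggregation behavior described above (aggregate once, then merge new rows into the cached aggregates on later runs) can be sketched with a dictionary standing in for the cache files. The function name and the choice of SUM as the aggregate are illustrative assumptions.

```python
from collections import defaultdict


def aggregate(rows, cached=None):
    """Sum amounts per group, starting from previously cached totals if given."""
    # Copy the cached totals so the earlier result is not modified in place.
    totals = defaultdict(float, cached or {})
    for group_key, amount in rows:
        totals[group_key] += amount
    return dict(totals)


# First run: process the full source and keep the result as the "cache".
first_run = aggregate([("east", 10.0), ("west", 5.0), ("east", 2.5)])
# Next run: only new rows are read and merged with the cached aggregates.
second_run = aggregate([("west", 1.0), ("north", 4.0)], cached=first_run)
print(first_run)
print(second_run)
```

The number of distinct keys in the dictionary corresponds to the "number of groups" input of the cache calculator.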
[Screenshot callouts: specify the cache directory; specify the size of the index cache.]
Joiner cache
• The Integration Service reads rows from the master and detail sources
concurrently and builds index and data caches based on the master rows.
• It performs the join based on the detail source data and the cached master data.
• The following information is stored in the Joiner cache.
[Screenshot callouts: specify the cache directory; specify the size of the index cache.]
Rank cache
• When the Integration Service runs a session with a Rank transformation, it compares an
input row with rows in the data cache. If the input row out-ranks a stored row, the
Integration Service replaces the stored row with the input row.
• To calculate the Rank cache size, the following inputs are needed:
– Number of ranks - For example, if you want to rank the top 10 sales, you have 10 ranks. The
cache calculator populates this value based on the value set in the Rank transformation.
– Data movement mode - Each ASCII character uses one byte, while a Unicode character uses
two bytes.
• Enter the input and then click Calculate to calculate the data and index cache sizes. The
calculated values appear in the Data Cache Size and Index Cache Size fields.
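The compare-and-replace behavior of the Rank cache can be illustrated with a bounded min-heap. This is a sketch of the general top-N technique, not the Integration Service's internal implementation; the function name is hypothetical.

```python
import heapq


def top_n(values, n: int) -> list:
    """Keep only the n highest values, replacing the lowest one when out-ranked."""
    if n <= 0:
        return []
    cache = []  # min-heap holding at most n values; cache[0] is the lowest rank
    for value in values:
        if len(cache) < n:
            heapq.heappush(cache, value)
        elif value > cache[0]:
            # The input row out-ranks a stored row, so it replaces that row.
            heapq.heapreplace(cache, value)
    return sorted(cache, reverse=True)


print(top_n([5, 1, 9, 3, 7], 3))
```

The cache never holds more than n entries, which is why the number of ranks is the main sizing input.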
[Screenshot callouts: specify the cache directory; specify the size of the index cache.]
Sorter cache
• All rows are passed to the cache before the sort is applied.
• The Integration Service makes multiple passes on the data when it has to page
information to disk to complete the sort.
• To increase session performance, configure the cache size so that the Integration
Service makes one pass on the data.
• To calculate the Sorter cache size, the following inputs are needed:
– Number of rows
– Data movement mode - Each ASCII character uses one byte, while a Unicode character
uses two bytes.
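Why a cache that is too small costs extra work can be seen in a simplified external sort. In this sketch, sorted in-memory chunks stand in for data paged to disk, and the merge step stands in for the additional passes. The function name and the row-count cache limit are assumptions made for illustration only.

```python
import heapq


def sort_with_cache(rows: list, cache_rows: int):
    """Sort rows with a cache that holds cache_rows rows at a time.

    Returns the sorted rows and the number of sorted runs that were needed.
    One run means everything fit in the cache.
    """
    if cache_rows < 1:
        raise ValueError("cache_rows must be at least 1")
    if len(rows) <= cache_rows:
        return sorted(rows), 1
    # Each chunk is sorted on its own (simulating a run paged to disk).
    runs = [sorted(rows[i:i + cache_rows]) for i in range(0, len(rows), cache_rows)]
    # The runs are then merged, which is the extra work a larger cache avoids.
    return list(heapq.merge(*runs)), len(runs)


print(sort_with_cache([5, 3, 1, 4, 2], cache_rows=10))
print(sort_with_cache([5, 3, 1, 4, 2], cache_rows=2))
```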
[Screenshot callouts: specify the sorter cache size (a Sorter has no index and data caches); specify the cache directory.]
XML Target cache
• The Integration Service uses cache memory to create an XML target. The
Integration Service stores the data and XML hierarchies in cache memory while it
generates the XML target.
• The following steps illustrate how to determine the cache size:
– Estimate the number of rows in each group
– Use the following formula to calculate the cache size for each group:
• Group cache size = Data cache size + Primary key index cache size + Foreign key index cache size
– Use the following formula to calculate the total cache size:
• Total cache size = Σ(Cache size of all groups)
• You cannot use the cache calculator to configure the cache size for an XML target.
Cache Calculation
Data cache = (Number of rows in a group) × (Row size of the group)
Primary Key Tree Size = (Number of rows in a group) × (Primary key index cache size)
Foreign Key Tree Size = Σ ((Number of rows in parent group) × (Foreign key index cache size))
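Because the cache calculator is not available for XML targets, the formulas above have to be applied by hand. The sketch below applies them as written; the class, field names, and sample numbers are hypothetical, and the per-row index sizes must come from your own estimates.

```python
from dataclasses import dataclass


@dataclass
class XmlGroup:
    rows: int                 # estimated number of rows in the group
    row_size: int             # row size of the group, in bytes
    pk_index_size: int        # primary key index cache size per row, in bytes
    # One (parent_rows, fk_index_size) pair per parent group.
    parents: tuple = ()


def group_cache_size(group: XmlGroup) -> int:
    """Data cache + primary key tree + foreign key tree for one group."""
    data_cache = group.rows * group.row_size
    pk_tree = group.rows * group.pk_index_size
    fk_tree = sum(parent_rows * fk_size for parent_rows, fk_size in group.parents)
    return data_cache + pk_tree + fk_tree


def total_cache_size(groups) -> int:
    """Total cache size is the sum of the cache sizes of all groups."""
    return sum(group_cache_size(g) for g in groups)


root = XmlGroup(rows=10, row_size=40, pk_index_size=8)
child = XmlGroup(rows=100, row_size=50, pk_index_size=8, parents=((10, 8),))
print(group_cache_size(root), group_cache_size(child), total_cache_size([root, child]))
```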
Lookup cache - overview
• The Integration Service builds the cache in memory when the first row is
processed. If the memory is inadequate, the data is paged into a cache file.
• If you use a flat file lookup, the Integration Service always caches the lookup rows.
• Cache the lookup if the number (and size) of records in the lookup table is small relative to
the number of mapping rows requiring the lookup.
Lookup cache - Types
• There are two types of lookup caches – Static and Dynamic
Lookup cache – for connected
• The Integration Service can build cache for connected lookups in two ways
• Sequential cache: The Integration Service builds the cache in memory when it processes the
first row of the data in a cached lookup transformation. It waits for upstream transformations
to complete before building a cache.
• Concurrent cache: The Integration Service does not wait for upstream active transformations
to complete. It starts building the cache as soon as the session starts. This may improve
performance if you are sure that the cache is needed each time the mapping is run.
• For example: if the transformation logic in a mapping is configured to route data to different
pipelines, the downstream lookup might not be hit each time. In this case, it is advisable to
go for sequential cache.
• Unconnected lookup caches cannot be processed concurrently.
Lookup cache: Static
• For each row that passes through the transformation, the cache is queried using the
lookup condition.
• If no match is found, either the default value (for connected lookups only) or
NULL is returned.
• If multiple matches are found, rows are returned based on the option specified in
“Lookup policy on multiple match” in the lookup properties.
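The static cache behavior above can be sketched with a dictionary that is built once and then only read. The function names and the "first" and "last" policy labels are hypothetical stand-ins for the choices offered by the multiple-match setting.

```python
def build_static_cache(lookup_rows, key: str) -> dict:
    """Group lookup rows by key; the cache is not modified after this point."""
    cache = {}
    for row in lookup_rows:
        cache.setdefault(row[key], []).append(row)
    return cache


def lookup(cache: dict, value, policy: str = "first", default=None):
    """Return the matching row, or the default (None stands in for NULL)."""
    matches = cache.get(value)
    if not matches:
        return default
    if policy == "first":
        return matches[0]
    if policy == "last":
        return matches[-1]
    raise ValueError(f"unknown policy: {policy!r}")


rows = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 2, "name": "Robert"},
]
cache = build_static_cache(rows, "cust_id")
print(lookup(cache, 2, policy="first"))
print(lookup(cache, 2, policy="last"))
print(lookup(cache, 99))
```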
Lookup cache: Dynamic
• Insert – Inserts the row into the cache if it is not present and you specified to insert
rows. You can configure the transformation to insert rows into the cache based on input
ports or generated sequence IDs.
• Update – Updates the row in the cache if the row is already present and an update is
specified in the properties.
• No change:
– Row exists in cache, but you have specified to insert new rows only
– Row does not exist in cache, but you have specified to update existing rows only
– Row exists in the cache, but based on the lookup conditions nothing changes
Lookup cache – dynamic – when to use
• Updating a master customer table with new and updated customer information.
– Use a Lookup transformation to perform a lookup on the customer table to determine if
a customer exists in the target. Use a dynamic lookup cache that inserts and updates
rows in the cache as it passes rows to the target.
• Loading data into a slowly changing dimension table and a fact table.
– Create two pipelines and configure a Lookup transformation that performs a lookup on
the dimension table. Use a dynamic lookup cache to load data to the dimension table.
Use a static lookup cache to load data to the fact table, and specify the name of the
dynamic cache from the first pipeline.
Lookup cache – dynamic – properties
• A dynamic lookup cache has the following properties:
– NewLookupRow: This port is added when the lookup is configured as dynamic.
0 = no change, 1 = insert, 2 = update.
– Associated port: The data in the associated port is used to determine whether to insert or
update rows in the cache. A sequence ID can also be used as the associated port, in which
case Informatica generates and uses a primary key.
– Ignore Null Inputs for Updates: Select this property when you do not want to update the
data in the cache when this column is NULL.
– Ignore in Comparison: The Integration Service compares the values in all lookup ports with
the values in their associated input ports by default. Select this property if you want the
Integration Service to ignore the port when it compares values before updating a row.
– Insert else Update: Affects only rows that enter the Lookup transformation flagged as
insert. Inserts a row into the cache if it is new. If the row exists in the index cache but
the data cache is different, it updates the cache. If this option is not selected,
Informatica inserts all new rows and ignores update rows.
– Update else Insert: Affects only rows that enter the Lookup transformation flagged as
update. If the row exists in the cache, Informatica updates the data cache. If the row does
not exist in the cache, it inserts a new row. If this option is not selected, Informatica
updates rows in the cache and ignores new rows.
Lookup cache – dynamic - behavior
• Dynamic lookup cache behavior for insert row type
Insert else update   Row found   Data cache     Lookup cache   NewLookupRow
option               in cache    is different   result         value
Not selected         Yes         n/a            No change      0
Not selected         No          n/a            Insert         1
Selected             Yes         Yes            Update         2 (0)
Selected             Yes         No             No change      0
Selected             No          n/a            Insert         1
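The decision table for rows flagged as insert can be expressed as a short function. This is a sketch of the table only: the function and constant names are hypothetical, a plain dictionary stands in for the index and data caches, and the update case is assumed to return 2.

```python
NO_CHANGE, INSERT, UPDATE = 0, 1, 2  # NewLookupRow values


def process_insert_row(cache: dict, key, data, insert_else_update: bool) -> int:
    """Apply one insert-flagged row to the cache and return NewLookupRow."""
    if key not in cache:
        # Row not found in cache: it is inserted regardless of the option.
        cache[key] = data
        return INSERT
    if insert_else_update and cache[key] != data:
        # Row found and the cached data differs: update only if the option is selected.
        cache[key] = data
        return UPDATE
    # Row found, and either the option is off or nothing differs.
    return NO_CHANGE


cache = {101: "Ann"}
print(process_insert_row(cache, 102, "Bob", insert_else_update=False))
print(process_insert_row(cache, 101, "Anne", insert_else_update=False))
print(process_insert_row(cache, 101, "Anne", insert_else_update=True))
print(process_insert_row(cache, 101, "Anne", insert_else_update=True))
```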
Lookup cache – dynamic - guidelines
• The Lookup transformation must be a connected transformation.
• You can only create an equality lookup condition. You cannot look up a range of data in
dynamic cache.
• Associate each lookup port that is not in the lookup condition with an input port or a
sequence ID.
• When you use a lookup SQL override, make sure you map the correct columns to the
appropriate targets for lookup.
• When you add a WHERE clause to the lookup SQL override, use a Filter transformation before
the Lookup transformation.
• Use Update Strategy transformations after the Lookup transformation to flag the rows for
insert or update for the target.
• Use an Update Strategy transformation before the Lookup transformation to define some or
all rows as update if you want to use the Update Else Insert property in the Lookup
transformation.
• Set the row type to Data Driven in the session properties.
• Select Insert and Update as Update for the target table options in the session properties.
Lookup cache – sharing unnamed cache
• Lookup transformations in the same mapping can share an unnamed cache. For example,
if you have two instances of the same reusable Lookup transformation in one mapping
and you use the same output ports for both instances, the Lookup transformations
share the lookup cache by default.
• Shared transformations must use the same ports in the lookup condition. The
conditions can use different operators, but the ports must be the same.
Lookup cache – sharing named cache
• You can also share the cache between multiple Lookup transformations by using a
persistent lookup cache and naming the cache files.
• If the Integration Service finds the cache files and you do not specify to recache
from source, the Integration Service uses the saved cache files.
• If the Integration Service does not find the cache files or if you specify to recache
from source, the Integration Service rebuilds the lookup cache from the lookup source.
• The Integration Service saves the cache files to disk after it processes each target
load order group.
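The reuse-or-rebuild decision for a named persistent cache can be sketched with an ordinary file. Everything here is illustrative: the function name, the use of pickle as the file format, and saving immediately after the build are assumptions, and the real cache files are managed by the Integration Service.

```python
import os
import pickle


def load_or_build_cache(path: str, build_from_source, recache: bool = False) -> dict:
    """Reuse the saved cache file unless it is missing or a recache is requested."""
    if not recache and os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    # Cache file not found, or recache from source was requested: rebuild it.
    cache = build_from_source()
    with open(path, "wb") as handle:
        pickle.dump(cache, handle)
    return cache
```

A call such as `load_or_build_cache("customers.cache", read_lookup_table)` would read the source only on the first run; `read_lookup_table` is a placeholder for whatever function loads the lookup rows.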
• The Integration Service fails the session if you configure subsequent Lookup transformations
to recache from source, but not the first one in the same target load order group.
• If the cache structures do not match, the Integration Service fails the session.
• The Integration Service processes multiple sessions simultaneously when the Lookup
transformations only need to read the cache files.
• The Integration Service fails the session if one session updates a cache file while another
session attempts to read or update the cache file.
– For example, Lookup transformations update the cache file if they are configured to use a dynamic
cache or recache from source.
Lookup cache - Tips
• Cache small lookup tables.
• Improve session performance by caching small lookup tables. The result of the
lookup query and processing is the same, whether or not you cache the lookup
table.
• Use a persistent lookup cache for static lookup tables.
• If the lookup table does not change between sessions, configure the Lookup
transformation to use a persistent lookup cache.
• The Integration Service then saves and reuses cache files from session to session,
eliminating the time required to read the lookup table.
• Care should be taken to ensure that data does not become stale while using
persistent cache.
– For example: in a daily load, always rebuild a persistent lookup cache first (using the
recache from source option) before it is used in other mappings. Re-caching a
persistent lookup picks up any changes in the lookup table.
[Screenshot: Lookup cache properties. Callouts: Enable caching; Cache directory; Dynamic lookup.]