Informatica Cache Overview

The document discusses cache memory used by the Integration Service. It describes how the Integration Service creates caches based on configured size and may increase size if needed. It's best to configure cache size to the total memory needed to process transformations. The document also discusses cache files, naming conventions, directories, and calculating and configuring cache sizes.

Uploaded by

santhosh

Cache in general

idwbitraining@gmail.com 1
Cache Memory
• The Integration Service creates each memory cache based on the configured cache
size.
• The Integration Service might increase the configured cache size for one of the
following reasons:
– The configured cache size is less than the minimum cache size required to process the
operation
– The configured cache size is not a multiple of the cache page size.
• For optimal performance, set the cache size to the total memory required to
process the transformation.
• If there is not enough cache memory to process the transformation, the
Integration Service processes some of the transformation in memory and pages
information to disk to process the rest.
• Remember that 64-bit operating systems allow the use of more system
memory (RAM).
• A thorough understanding of the server hardware configuration is needed
to configure the cache optimally.
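
The two reasons for increasing a configured cache size can be sketched as follows. This is a minimal illustration, not the Integration Service's actual logic: the function name, the parameters, and the choice to round up to the next whole page are assumptions made for the example.

```python
def effective_cache_size(configured: int, minimum: int, page_size: int) -> int:
    """Return the cache size (bytes) after applying the two adjustment rules.

    All three parameters are hypothetical inputs; the real minimum and
    page size are determined by the Integration Service.
    """
    # Rule 1: the size is never below the minimum required for the operation.
    size = max(configured, minimum)
    # Rule 2: the size must be a multiple of the cache page size
    # (assumed here to be rounded up, since the size is only ever increased).
    remainder = size % page_size
    if remainder:
        size += page_size - remainder
    return size


print(effective_cache_size(5000, 4096, 1024))
```

A configured size that already satisfies both rules is returned unchanged.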

Transformations using cache
• The following table describes the type of information that the Integration Service
stores in each cache:

Mapping object type | Cache | Description
--- | --- | ---
Aggregator | Index | Stores group values as configured in the group by ports.
Aggregator | Data | Stores calculations based on the group by ports.
Joiner | Index | Stores all master rows in the join condition that have unique keys.
Joiner | Data | Stores master source rows.
Lookup | Index | Stores lookup condition information.
Lookup | Data | Stores lookup data that is not stored in the index cache.
Rank | Index | Stores group values as configured in the group by ports.
Rank | Data | Stores ranking information based on the group by ports.
Sorter | Sorter | Stores sort keys and data.
XML Target | Index | Stores primary and foreign key information in separate caches.
XML Target | Data | Stores XML row data while it generates the XML target.

Cache files
• When you run a session, the Integration Service creates at least one cache file for
each transformation.
• If the Integration Service cannot process a transformation in memory, it writes the
overflow values to the cache files.
• When you run a session, the Integration Service writes a message in the session
log indicating the cache file name and the transformation name.
• When a session completes, the Integration Service releases cache memory and
usually deletes the cache files.
• You may find index and data cache files in the cache directory under the following
circumstances:
– The session performs incremental aggregation.
– You configure the Lookup transformation to use a persistent cache.
– The session does not complete successfully. The next time you run the session, the
Integration Service deletes the existing cache files and creates new ones.
• Note: Because writing to cache files can slow session performance, configure the cache sizes
so that the transformation is processed in memory.

Naming convention for cache files
• The Integration Service uses different naming conventions for index, data, and
sorter cache files.

Cache file name and naming convention:
- Data & Sorter: [<Name Prefix> | <prefix> <session ID>_<transformation ID>]_[partition index]_[OS][BIT].<suffix> [overflow index]
- Index: <prefix> <session id>_<transformation id>_<group id>_<key type>.<suffix> <overflow>

• For example, the name of the data file for the index cache is PMLKUP748_2_5S32.idx1.
PMLKUP identifies the transformation type as Lookup, 748 is the session ID, 2 is the
transformation ID, 5 is the partition index, S (Solaris) is the operating system, and 32 is
the bit platform.

Prefix : Describes the type of transformation:
- Aggregator transformation is PMAGG.
- Joiner transformation is PMJNR.
- Lookup transformation is PMLKUP.
- Rank transformation is PMAGG.
- Sorter transformation is PMSORT.
- XML target is PMXML.

OS : Identifies the operating system of the machine running the Integration Service process:
- W is Windows.
- H is HP-UX.
- S is Solaris.
- A is AIX.
- L is Linux.
- M is Mainframe.
The OS code applies to Lookup transformation cache file names.

BIT : Identifies the bit platform of the machine running the Integration Service
process: 32-bit or 64-bit. The BIT code applies to Lookup transformation cache file names.

Suffix : Identifies the type of cache file:
- Index cache file is .idx0 for the header file and .idxn for the data files.
- Data cache file is .dat0 for the header file and .datn for the data files.
- Sorter cache file is .PMSORT().

Overflow Index : If a cache file handles more than 2 GB of data, the
Integration Service creates more cache files. When creating these files, the
Integration Service appends an overflow index to the file name, such as
PMAGG*.idx2 and PMAGG*.idx3. The number of cache files is limited by the
amount of disk space available in the cache directory.
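
As an illustration of the convention, the sketch below splits a Lookup-style cache file name into its parts. It is a hedged example only: the regular expression is modeled on the single example name PMLKUP748_2_5S32.idx1, the function name and result keys are hypothetical, and named-prefix files and sorter files are not handled.

```python
import re

# Prefixes and OS codes as listed in the naming convention.
PREFIXES = {"PMAGG": "Aggregator or Rank", "PMJNR": "Joiner",
            "PMLKUP": "Lookup", "PMSORT": "Sorter", "PMXML": "XML target"}
OS_CODES = {"W": "Windows", "H": "HP-UX", "S": "Solaris",
            "A": "AIX", "L": "Linux", "M": "Mainframe"}

# Assumed layout: <prefix><session ID>_<transformation ID>_<partition index><OS><BIT>.<idx|dat><n>
PATTERN = re.compile(
    r"^(?P<prefix>PMAGG|PMJNR|PMLKUP|PMSORT|PMXML)"
    r"(?P<session_id>\d+)_(?P<transformation_id>\d+)_(?P<partition_index>\d+)"
    r"(?P<os>[WHSALM])(?P<bit>32|64)"
    r"\.(?P<kind>idx|dat)(?P<file_number>\d+)$"
)


def parse_cache_file_name(name):
    """Return a dict describing the file name, or None if it does not match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "transformation": PREFIXES[parts["prefix"]],
        "session_id": int(parts["session_id"]),
        "transformation_id": int(parts["transformation_id"]),
        "partition_index": int(parts["partition_index"]),
        "os": OS_CODES[parts["os"]],
        "bit": int(parts["bit"]),
        "cache": "index" if parts["kind"] == "idx" else "data",
        # File number 0 is the header file; higher numbers are data files.
        "is_header": int(parts["file_number"]) == 0,
    }


print(parse_cache_file_name("PMLKUP748_2_5S32.idx1"))
```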

Cache file directory

• The Integration Service creates the cache files by default in the $PMCacheDir directory. If the
Integration Service process does not find the directory, it fails the session and writes a
message to the session log indicating that it could not create or open the cache file.

• The Integration Service may create multiple cache files. The number of cache files is limited
by the amount of disk space available in the cache directory.

• You can override $PMCacheDir for a specific transformation by specifying a different
directory on the server.
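
Because a missing cache directory fails the session and the number of cache files is bounded by free disk space, a pre-flight check can be useful. The following is only a sketch of such a check run outside Informatica; the function name, its return labels, and the idea of passing an estimated byte requirement are assumptions for the example.

```python
import os
import shutil


def check_cache_dir(path: str, required_bytes: int) -> str:
    """Classify a cache directory as 'missing', 'low_space', or 'ok'."""
    if not os.path.isdir(path):
        # The Integration Service would fail the session in this case.
        return "missing"
    free = shutil.disk_usage(path).free
    return "ok" if free >= required_bytes else "low_space"


# Hypothetical usage: check the current directory for at least 1 MB free.
print(check_cache_dir(os.getcwd(), 1024 * 1024))
```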

Configuring cache size
• Configure the amount of memory for a cache in the session properties. The cache size specified in
the session properties overrides the value set in the transformation properties.

• If the session is reusable, all instances of the session use the cache size configured in the reusable
session properties. You cannot override the cache size in the session instance.

• Use one of the following methods to configure a cache size:
– Cache calculator. Use the calculator to estimate the total amount of memory required to process the
transformation.
– Auto cache memory. Use auto memory to specify a maximum limit on the cache size that is allocated for processing the
transformation. Use this method if the machine on which the Integration Service process runs has limited cache memory.
– Numeric value. Configure a specific value for the cache size. Configure a specific value when you want to tune the cache
size.

• To configure the memory requirements for a transformation with cache partitioning, calculate the
total requirements for the transformation and divide by the number of partitions.
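
The partitioning rule is simple arithmetic, sketched below. The function name is hypothetical, and rounding up to a whole byte is an assumption; the source only says to divide the total by the number of partitions.

```python
def per_partition_cache_size(total_bytes: int, partitions: int) -> int:
    """Split the total cache requirement evenly across partitions."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # Ceiling division, so the partitions together cover the full requirement.
    return -(-total_bytes // partitions)


print(per_partition_cache_size(1000, 4))
```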

Calculating cache size

• Use the cache calculator to estimate the total amount of memory required to process the
transformation. You must provide inputs to calculate the cache size. The inputs depend on
the type of transformation. For example, to calculate the cache size for an Aggregator
transformation, you supply the number of groups.

• You can select one of the following modes in the cache calculator:
– Auto. Choose auto mode if you want the Integration Service to determine the cache size at run time
based on the maximum memory configured on the Config Object tab.

– Calculate. Select to calculate the total requirements for a transformation based on inputs. The cache
calculator requires different inputs for each transformation. You must select the applicable cache
type to apply the calculated cache size. For example, to apply the calculated cache size for the data
cache and not the index cache, select only the Data Cache Size option.

Using auto memory size
• If you use auto cache memory, you configure the Integration Service to determine
the cache size for a transformation at run time.
• The Integration Service allocates memory cache based on the maximum memory
size specified in the auto memory attributes in the session properties.
• When you configure a numeric value and a percentage for the auto cache memory,
the Integration Service compares the values and uses the lesser of the two for the
maximum memory limit.
• By default, transformations use auto cache memory.
• If a session has multiple transformations that require caching, you can configure
some transformations with auto memory cache and other transformations with
numeric cache sizes.
• The Integration Service allocates the maximum memory specified for auto caching
in addition to the configured numeric cache sizes.
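
The two auto-memory rules above (use the lesser of the numeric and percentage limits, then add any numeric cache sizes on top) can be sketched as follows. The function names are hypothetical, and treating the percentage as a share of a given total memory figure is an assumption of this example.

```python
def auto_memory_limit(max_bytes: int, max_percent: float, total_memory_bytes: int) -> int:
    """Return the lesser of the numeric limit and the percentage-based limit."""
    percent_limit = int(total_memory_bytes * max_percent / 100)
    return min(max_bytes, percent_limit)


def session_cache_memory(auto_limit: int, numeric_sizes) -> int:
    """Auto-cache memory is allocated in addition to the numeric cache sizes."""
    return auto_limit + sum(numeric_sizes)


limit = auto_memory_limit(max_bytes=200, max_percent=10, total_memory_bytes=1000)
print(limit, session_cache_memory(limit, [20, 30]))
```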

Steps to configure cache size
1. In the Workflow Manager, open the session.
2. Click the Mapping tab.
3. Select the mapping object in the left pane. The right pane of the Mapping tab
shows the object properties where you can configure the cache size.
4. Use one of the following methods to set the cache size:
– Enter a value for the cache size, click OK, and then skip to step 8. If you enter a value,
all values are in bytes by default. However, you can enter a value and specify one of the
following units: KB, MB, or GB. If you enter the units, do not enter a space between the
value and unit. For example, enter 350000KB, 200MB, or 1GB.
– Enter ‘Auto’ for the cache size, click OK, and then skip to step 8.
– Click the Open button to open the cache calculator.

5. Select a mode. Select the Auto mode to limit the amount of cache allocated to the
transformation. Skip to step 8.
or
Select the Calculate mode to calculate the total memory requirement for the
transformation.
6. Provide the input based on the transformation type, and click Calculate.
Note: If the input value is too large and you cannot enter the value in the cache
calculator, use auto memory cache.
The cache calculator calculates the cache sizes in kilobytes.
7. If the transformation has a data cache and index cache, select Data Cache Size,
Index Cache Size, or both.
8. Click OK to apply the calculated values to the cache sizes you selected in step 7.
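
The value format described in step 4 (bytes by default, an optional KB, MB, or GB unit with no space, or Auto) can be sketched as a small parser. This is an illustration only: the function name is hypothetical, and the use of binary multiples (1 KB = 1024 bytes) is an assumption that the source does not confirm.

```python
import re

# Assumption: binary multiples. The source does not state the multiplier.
UNITS = {"": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_cache_size(text: str):
    """Return the size in bytes, or the string 'auto' for auto cache memory."""
    text = text.strip()
    if text.lower() == "auto":
        return "auto"
    # Digits followed directly by an optional unit; a space before the unit is rejected.
    match = re.fullmatch(r"(\d+)(KB|MB|GB)?", text)
    if match is None:
        raise ValueError(f"invalid cache size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2) or ""]


for value in ("350000KB", "200MB", "1GB", "4096", "Auto"):
    print(value, parse_cache_size(value))
```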

Configuring cache size
• Sample cache size calculation for a lookup, using the cache calculator:
– Select Calculate.
– Enter the number of rows in the lookup.
– Select the type of cache for which you need to calculate the size.
– Click Calculate.
– The calculated value is displayed in the calculator.

Aggregator cache
• A cache is created for Aggregator transformations with unsorted inputs.
• When using multiple partitions, one disk file is created for all partitions and a
separate memory cache is created for each partition.
• The first time you run an incremental aggregation session, the Integration Service
processes the source.
• At the end of the session, the Integration Service stores the aggregated data in two
cache files, the index and data cache files.
• The Integration Service saves the cache files in the cache file directory.
• The next time you run the session, the Integration Service aggregates the new
rows with the cached aggregated values in the cache files.
• To calculate the Aggregator cache size, the following inputs are needed:
– Number of groups – Calculate using the group by ports.
– Data movement mode – Each ASCII character uses one byte, while a Unicode character
uses two bytes.

[Figure: Aggregator cache settings, with callouts for the cache directory, the data cache size, and the index cache size.]

Joiner cache
• The Integration Service reads rows from the master and detail sources
concurrently and builds index and data caches based on the master rows.
• It performs the join based on the detail source data and the cached master data.
• The following information is stored in the Joiner cache.

Transformation type | Index cache | Data cache
--- | --- | ---
Unsorted inputs | Stores all master rows in the join condition with unique index keys. | Stores all master rows.
Sorted inputs with different sources | Stores 100 master rows in the join condition with unique index keys. | Stores master rows that correspond to the rows stored in the index cache. If the master data contains multiple rows with the same key, the Integration Service stores more than 100 rows in the data cache.
Sorted inputs with same source | Stores all master or detail rows in the join condition with unique keys. Stores detail rows if the Integration Service processes the detail pipeline faster than the master pipeline. | Stores data for the rows stored in the index cache.

[Figure: Joiner cache settings, with callouts for the cache directory, the data cache size, and the index cache size.]

Rank cache

• When the Integration Service runs a session with a Rank transformation, it compares an
input row with rows in the data cache. If the input row out-ranks a stored row, the
Integration Service replaces the stored row with the input row.

• To calculate the Rank cache size, the following inputs are needed:
– Number of groups – The number of groups using the group by ports.
– Number of ranks – For example, if you want to rank the top 10 sales, you have 10 ranks. The
cache calculator populates this value based on the value set in the Rank transformation.
– Data movement mode – Each ASCII character uses one byte, while a Unicode character uses
two bytes.

• Enter the input and then click Calculate to calculate the data and index cache sizes. The
calculated values appear in the Data Cache Size and Index Cache Size fields.
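
The replace-if-out-ranked behavior described at the start of this slide can be sketched with a bounded heap per group. This is an illustration of the general idea, not Informatica's implementation; the function name and the (group, value) row shape are assumptions, and "out-ranks" is taken to mean "is larger" (a top-N ranking).

```python
import heapq
from collections import defaultdict


def rank_top_n(rows, n):
    """Keep the n largest values per group, given (group, value) pairs."""
    cache = defaultdict(list)  # group -> min-heap of the best n values so far
    for group, value in rows:
        heap = cache[group]
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # The input row out-ranks the weakest stored row, so it replaces it.
            heapq.heapreplace(heap, value)
    return {group: sorted(heap, reverse=True) for group, heap in cache.items()}


rows = [("east", 5), ("east", 9), ("east", 7), ("west", 1), ("east", 8)]
print(rank_top_n(rows, 2))
```

The cache never holds more than (number of groups) × (number of ranks) rows, which is why those two values are inputs to the cache calculator.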

[Figure: Rank cache settings, with callouts for the cache directory, the data cache size, and the index cache size.]

Sorter cache
• All rows are passed to the cache before the sort is applied.
• The Integration Service makes multiple passes on the data when it has to page
information to disk to complete the sort.
• To increase session performance, configure the cache size so that the Integration
Service makes one pass on the data.
• To calculate the Sorter cache size, the following inputs are needed:
– Number of rows
– Data movement mode – Each ASCII character uses one byte, while a Unicode character
uses two bytes.

[Figure: Sorter cache settings, with callouts for the sorter cache size and the cache directory. A Sorter has no separate index and data caches.]

XML Target cache
• The Integration Service uses cache memory to create an XML target. The
Integration Service stores the data and XML hierarchies in cache memory while it
generates the XML target.
• The following steps illustrate the process to determine the cache size:
– Estimate the number of rows in each group.
– Use the following formula to calculate the cache size for each group:
• Group cache size = Data cache size + Primary key index cache size + Foreign key index cache size
– Use the following formula to calculate the total cache size:
• Total cache size = Σ (Cache size of all groups)

• You cannot use the cache calculator to configure the cache size for an XML target.

Cache Calculation
Data cache size = (Number of rows in a group) × (Row size of the group)
Primary key tree size = (Number of rows in a group) × (Primary key index cache size)
Foreign key tree size = Σ ((Number of rows in parent group) × (Foreign key index cache size))
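
The formulas can be worked through in a short sketch. Two interpretations here are assumptions rather than statements from the source: the primary and foreign key tree sizes are treated as the index cache terms of the group formula, and the key sizes on the right-hand side are treated as bytes per row. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class XmlGroup:
    rows: int            # estimated number of rows in the group
    row_size: int        # bytes per row of the group
    pk_key_size: int     # assumed: bytes per primary key index entry
    # (rows in parent group, assumed bytes per foreign key index entry) pairs
    fk_parents: list = field(default_factory=list)


def group_cache_size(group: XmlGroup) -> int:
    data = group.rows * group.row_size
    primary_key_tree = group.rows * group.pk_key_size
    foreign_key_tree = sum(parent_rows * key_size
                           for parent_rows, key_size in group.fk_parents)
    return data + primary_key_tree + foreign_key_tree


def total_cache_size(groups) -> int:
    return sum(group_cache_size(g) for g in groups)


root = XmlGroup(rows=100, row_size=50, pk_key_size=8)
child = XmlGroup(rows=1000, row_size=20, pk_key_size=8, fk_parents=[(100, 8)])
print(group_cache_size(root), group_cache_size(child), total_cache_size([root, child]))
```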

Lookup cache - overview

• Lookup transformations can be configured to use cache.

• The Integration Service builds the cache in memory when the first row is
processed. If the memory is inadequate, the data is paged into a cache file.

• If you use a flat file lookup, the Integration Service always caches the lookup rows.

• By default, the cache files are created under $PMCacheDir.

• Cache the lookup if the number (and size) of records in the lookup table is small relative to
the number of mapping rows requiring the lookup.

Lookup cache - Types
• There are two types of lookup caches: static and dynamic. The following table compares them with an un-cached lookup.

Un-cached | Static cache | Dynamic cache
--- | --- | ---
The lookup table is queried each time. | Cannot insert or update the cache once it is created. | Can insert or update rows in the cache for each row passed in from the source (the upstream transformation).
Cannot use a flat file as the lookup source. | Can use relational and flat file lookups. | Can use relational and flat file lookups.
When the condition matches, the lookup returns a row. | When the condition matches, the lookup returns a row. | When the condition matches, rows are updated in the cache or left unchanged, depending on the row type.
If the condition is false, the default value is returned for connected lookups and NULL is returned for unconnected lookups. | If the condition is false, the default value is returned for connected lookups and NULL is returned for unconnected lookups. | When the condition is false, rows are inserted into the cache or left unchanged, depending on the row type.

Lookup cache – for connected
• The Integration Service can build cache for connected lookups in two ways
• Sequential cache: The Integration Service builds the cache in memory when it processes the
first row of the data in a cached lookup transformation. It waits for upstream transformations
to complete before building a cache.
• Concurrent cache: The Integration Service does not wait for upstream active transformations
to complete. It starts building the cache as soon as the session starts. This may improve
performance if you are sure that the cache is needed each time the mapping is run.
• For example, if the transformation logic in a mapping routes data to different
pipelines, the downstream lookup might not be hit on every run. In this case, use a
sequential cache.
• Unconnected lookup caches cannot be processed concurrently.

Lookup cache: Static

• This is the default type of cache.

• Cache is built when the first lookup row is processed.

• For each row that passes the transformation, the cache is queried for specified
condition.

• If a match is available, the proper value is returned.

• If a match is not available, either the default value (for connected lookups only) or
NULL is returned.

• If multiple matches are found, rows are returned based on the option specified in
“Lookup policy on multiple match” in the lookup properties.
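
The static cache behavior above can be sketched as a dictionary that is built once and then only read. This is a simplified illustration, not the Integration Service's implementation; the function names and the policy labels "first", "last", and "error" are hypothetical stand-ins for the options available under the multiple-match policy.

```python
from collections import defaultdict


def build_static_cache(lookup_rows, key):
    """Build the cache once; it is never modified afterwards."""
    cache = defaultdict(list)
    for row in lookup_rows:
        cache[row[key]].append(row)
    return dict(cache)


def lookup(cache, key_value, policy="first", default=None):
    """Return the matching row, or the default (None stands in for NULL)."""
    matches = cache.get(key_value)
    if not matches:
        return default
    if policy == "first":
        return matches[0]
    if policy == "last":
        return matches[-1]
    if policy == "error":
        if len(matches) > 1:
            raise LookupError(f"multiple matches for {key_value!r}")
        return matches[0]
    raise ValueError(f"unknown policy: {policy!r}")


rows = [{"id": 1, "name": "A"}, {"id": 1, "name": "B"}, {"id": 2, "name": "C"}]
cache = build_static_cache(rows, "id")
print(lookup(cache, 1), lookup(cache, 1, policy="last"), lookup(cache, 9))
```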

Lookup cache: Dynamic

• The cache is constantly updated by the following actions:

• Insert – Inserts the row into the cache if it is not present and you specified to insert
rows. You can configure the lookup to insert rows into the cache based on input ports or
generated sequence IDs.

• Update – Updates the row in the cache if the row is already present and an update is
specified in the properties.

• No change – The cache is left unchanged in the following cases:
– The row exists in the cache, but you have specified to insert new rows only.

– The row does not exist in the cache, but you have specified to update existing rows only.

– The row exists in the cache, but based on the lookup condition nothing changes.

Lookup cache – dynamic – when to use

• Some situations where dynamic lookups can be used

• Updating a master customer table with new and updated customer information.
– Use a Lookup transformation to perform a lookup on the customer table to determine if
a customer exists in the target. Use a dynamic lookup cache that inserts and updates
rows in the cache as it passes rows to the target.

• Loading data into a slowly changing dimension table and a fact table.
– Create two pipelines and configure a Lookup transformation that performs a lookup on
the dimension table. Use a dynamic lookup cache to load data to the dimension table.
Use a static lookup cache to load data to the fact table, and specify the name of the
dynamic cache from the first pipeline.

Lookup cache – dynamic – properties
• A dynamic lookup cache has the following properties:

Property | Description
--- | ---
NewLookupRow | This port is added when the lookup is configured as dynamic. 0 = no change, 1 = insert, 2 = update.
Associated port | The data in the associated port is used to determine whether to insert or update rows in the cache. A sequence ID can also be used as the associated port, in which case Informatica generates and uses a primary key.
Ignore Null Inputs for Updates | Select this property when you do not want to update the data in the cache when this column is NULL.
Ignore in Comparison | The Integration Service compares the values in all lookup ports with the values in their associated input ports by default. Select this property if you want the Integration Service to ignore the port when it compares values before updating a row.
Insert else Update | This affects only rows that enter the Lookup transformation flagged as insert. Inserts a row into the cache if it is new. If the row exists in the index cache but the data cache is different, it updates the cache. If this option is not selected, Informatica inserts all new rows and ignores update rows.
Update else Insert | This affects only rows that enter the Lookup transformation flagged as update. If the row exists in the cache, Informatica updates the data cache. If a row does not exist in the cache, it inserts a new row. If this option is not selected, Informatica updates rows in the cache and ignores new rows.

Lookup cache – dynamic - behavior
• Dynamic lookup cache behavior for insert row type

Insert else update option | Row found in cache | Data cache is different | Lookup cache result | NewLookupRow value
--- | --- | --- | --- | ---
Not selected | Yes | n/a | No change | 0
Not selected | No | n/a | Insert | 1
Selected | Yes | Yes | Update | 2 (0)
Selected | Yes | No | No change | 0
Selected | No | n/a | Insert | 1

• Dynamic lookup cache behavior for update row type

Update else insert option | Row found in cache | Data cache is different | Lookup cache result | NewLookupRow value
--- | --- | --- | --- | ---
Not selected | Yes | Yes | Update | 2 (0)
Not selected | Yes | No | No change | 0
Not selected | No | n/a | No change | 0
Selected | Yes | Yes | Update | 2 (0)
Selected | Yes | No | No change | 0
Selected | No | n/a | Insert | 1
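
The two behavior tables can be restated as a single decision function. This is a sketch that mirrors the tables only; the function and parameter names are hypothetical, and the parenthesized "(0)" alternative shown next to some Update results is not modeled.

```python
NO_CHANGE, INSERT, UPDATE = 0, 1, 2  # NewLookupRow values


def new_lookup_row(row_type, found, data_differs,
                   insert_else_update=False, update_else_insert=False):
    """Return the NewLookupRow value for one incoming row."""
    if row_type == "insert":
        if not found:
            return INSERT
        # Row already cached: update only when Insert Else Update is selected
        # and the cached data differs.
        if insert_else_update and data_differs:
            return UPDATE
        return NO_CHANGE
    if row_type == "update":
        if found:
            return UPDATE if data_differs else NO_CHANGE
        # Row not cached: insert only when Update Else Insert is selected.
        return INSERT if update_else_insert else NO_CHANGE
    raise ValueError(f"unknown row type: {row_type!r}")


print(new_lookup_row("insert", found=True, data_differs=True, insert_else_update=True))
```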

Lookup cache – dynamic - guidelines
• The Lookup transformation must be a connected transformation.
• You can only create an equality lookup condition. You cannot look up a range of data in
dynamic cache.
• Associate each lookup port that is not in the lookup condition with an input port or a
sequence ID.
• When you use a lookup SQL override, make sure you map the correct columns to the
appropriate targets for lookup.
• When you add a WHERE clause to the lookup SQL override, use a Filter transformation before
the Lookup transformation.
• Use Update Strategy transformations after the Lookup transformation to flag the rows for
insert or update for the target.
• Use an Update Strategy transformation before the Lookup transformation to define some or
all rows as update if you want to use the Update Else Insert property in the Lookup
transformation.
• Set the row type to Data Driven in the session properties.
• Select Insert and Update as Update for the target table options in the session properties.

Lookup cache – sharing unnamed cache

• When two Lookup transformations share an unnamed cache, the Integration
Service saves the cache for a Lookup transformation and uses it for subsequent
Lookup transformations that have the same lookup cache structure.

• For example, if you have two instances of the same reusable Lookup
transformation in one mapping and you use the same output ports for both
instances, the Lookup transformations share the lookup cache by default.

• Shared transformations must use the same ports in the lookup condition. The
conditions can use different operators, but the ports must be the same.

Lookup cache – sharing named cache

• You can also share the cache between multiple Lookup transformations by using a
persistent lookup cache and naming the cache files.

• When the Integration Service processes the first Lookup transformation, it
searches the cache directory for cache files with the same file name prefix.

• If the Integration Service finds the cache files and you do not specify to recache
from source, the Integration Service uses the saved cache files.

• If the Integration Service does not find the cache files or if you specify to recache
from source, the Integration Service builds the lookup cache from the lookup source.

• The Integration Service saves the cache files to disk after it processes each target
load order group.


• The Integration Service fails the session if you configure subsequent Lookup transformations
to recache from source, but not the first one in the same target load order group.

• If the cache structures do not match, the Integration Service fails the session.

• The Integration Service processes multiple sessions simultaneously when the Lookup
transformations only need to read the cache files.

• The Integration Service fails the session if one session updates a cache file while another
session attempts to read or update the cache file.
– For example, Lookup transformations update the cache file if they are configured to use a dynamic
cache or recache from source.

Lookup cache - Tips
• Cache small lookup tables.
• Improve session performance by caching small lookup tables. The result of the
lookup query and processing is the same, whether or not you cache the lookup
table.
• Use a persistent lookup cache for static lookup tables.
• If the lookup table does not change between sessions, configure the Lookup
transformation to use a persistent lookup cache.
• The Integration Service then saves and reuses cache files from session to session,
eliminating the time required to read the lookup table.
• Take care that data does not become stale while using a persistent cache.
– For example, in a daily load, always rebuild a persistent lookup cache first (using the
re-cache from source option) before it is used in other mappings. Re-caching a
persistent lookup picks up any changes in the lookup table.

[Figure: Lookup cache settings, with callouts for enabling caching, the cache directory, using a persistent cache, the data cache size, the index cache size, dynamic lookup, naming a persistent cache, recaching a persistent cache, and dynamic lookup options.]
