
Performance

Topics in the Performance section describe the means by which to plan, implement, test, and revisit the optimization of HCL Commerce site performance.

HCL Commerce works closely with a number of different components and products, each with its own performance characteristics. Provisioning sufficient hardware and software resources, and paying careful attention to configuration settings, can help achieve performance objectives.

Performance objectives include performing the following tasks in a timely manner:

 Handling multiple customer requests
 Accessing data in the HCL Commerce database
 Formatting data as web pages
 Returning responses to the customer's browser

Too frequently, performance testing is equated with adding a test phase, in the belief that performance risks can be covered by running a performance test right before launch. A test phase might discover bottlenecks in the application, but the fix might require a solution that goes back to the fundamentals of application design and architecture. Considering performance only at the end of development means that issues cannot be addressed in time, forcing the team to patch and get by. In the long term, this "band-aid" mode leads to an increasingly complex application that becomes more difficult to manage and optimize. The point of the performance lifecycle is to consider performance in all aspects of the project, not just as a one-time event.

The premise of the performance lifecycle is that performance is considered throughout the entire project lifecycle. Adopting the performance lifecycle approach shifts performance from a point-in-time activity to a parallel and equally important track alongside a project's development activities. Additionally, achieving pre-defined performance criteria becomes part of the website or application launch approval process. The key performance metrics are continually addressed and tracked against targets.

To optimize HCL Commerce performance, review the performance methodology section for the recommended approach.

 Methodology
Like site security, a proper approach to site performance requires planning, execution,
testing, monitoring, and maintenance. The following collections of topics describe each
phase of this process, and contain the recommended considerations, and best practices
for getting the most out of your HCL Commerce deployment.
 Setting cache providers in HCL Commerce
Starting with HCL Commerce Version 9.1, all caches are automatically configured to use
the HCL Cache. However, the cache provider for each cache can be set to use the
DynaCache provider, or the WebSphere Extreme Scale DynaCache provider.
 HCL Cache
The HCL Cache integrates with DynaCache and improves performance and scalability
with features such as remote caching, real-time monitoring, and auto-tuning capabilities.
 Experiments considerations
When experiments that you have created in WebSphere Commerce Accelerator are
running in the store, the Scheduler launches a job that determines whether any of the
current experiments have expired. The job compares the expiration date specified in the
experiment to the current system date. When an experiment is identified as expired, its
status is updated in the database, and the experiment is prevented from displaying to
customers.
 Website performance tuning
There are four steps for evaluating the performance of an HCL Commerce website that is based on the Transaction server.
 Tuning the price rule object cache
You can improve performance by changing the size of the cache used to store price rule
business objects during storefront runtime.
 Data Load utility performance tuning
Scheduled Data Load utility jobs can affect HCL Commerce performance. You can
reduce the impact of this process by tuning the data load performance appropriately for
your implementation.
 Workspaces performance tuning
Workspaces use database views instead of tables to retrieve data. Retrieval of
underlying data might be more time-consuming because of the complexity of SQL
statements that are used in workspace view definitions.
 Configuring custom DynaCache objects in WebSphere Liberty
The WebSphere Liberty profile can be configured with a distributed map for custom cache objects. To configure custom cache objects for WebSphere Liberty in Runtime and Development environments, follow the steps outlined in the topic.
 Configuring custom DynaCache objects in HCL Commerce
Define the custom cache in WebSphere Application Server. You can then use the cache
name to store and retrieve the cache objects.
 Preparing HCL Commerce for the peak shopping periods
When preparing for peak shopping periods, such as the holiday season, use the following
best practices to effectively manage your HCL Commerce environment for peak
efficiency, and to ensure that your HCL Commerce site is ready to handle the increased
load.
 OneTest Performance sample script for Emerald B2C Store
HCL OneTest provides software testing tools to support a DevOps approach: API testing,
functional testing, UI testing, performance testing and service virtualization. This
approach aids in test automation, for earlier and more frequent discovery of errors.
 Docker performance tuning
The following performance tuning information can be used to help you configure your
containerized HCL Commerce environments for optimal performance.
 Thread Monitor - YAML configuration file
The Thread Monitor tool gathers thread dumps and Javacores at a configured interval, and during events such as high WebContainer/Default_Executor pool thread usage. A YAML configuration file must exist with the name /SETUP/support/thread_monitor.yaml, or in the location specified by the THREAD_MONITOR_CFG environment variable.
 Search health check
The search health check API is used to regularly check the status of Commerce
containers to make sure they are in a healthy state.
 Elasticsearch performance tuning
You have numerous options when tuning the performance of NiFi and Elasticsearch. The
following guide introduces tools for monitoring performance and validating key tuning
parameters, and provides a performance tuning strategy that you can use with the
component.
 Optimizing index build and overall flow
Optimizations are provided for full index builds and for tuning parameters. Potential improvements that can be implemented for Near Real Time (NRT) index updates are not described.
 Monitoring and understanding Elasticsearch and NiFi metrics
You can use Grafana and related tools to analyze the performance of the Ingest pipeline,
and Kibana to do the same with Elasticsearch.
 Tunable Parameters in the setup of NiFi and Elasticsearch
How you can modify the values of the tunable parameters, with some default values and how they can be improved in different circumstances.
 Elasticsearch scaling and hardware requirements
You can achieve additional capacity by implementing or extending pod clustering on
Elasticsearch, NiFi, or both. You can also consider the hardware footprint and key
resources that impact the processing and index creation speed.
 Elasticsearch with dedicated Auth and Live nodes
For production environments, Elasticsearch is recommended in a clustered configuration
for high availability and scalability considerations.
 Minimal and recommended configuration tunable parameter values
Two configurations of the NiFi tunable parameter values are presented: a minimal and an optimal set.
 Measurement
As each deployment of HCL Commerce is different, a measured approach to modifying
your defaults must be taken to successfully tune your site. The following collection of
topics provides details on how to measure and troubleshoot site performance.
 Emerald REST Caching On TS Server for Commerce 9.1
Emerald Store is powered by the REST framework, and the cache is implemented using
a REST servlet.

Methodology
Like site security, a proper approach to site performance requires planning, execution, testing,
monitoring, and maintenance. The following collections of topics describe each phase of this
process, and contain the recommended considerations, and best practices for getting the most
out of your HCL Commerce deployment.

 Design phase
During the design phase of your project, it is important to consider performance priorities.
It is important to design a solution that has a good balance of features and performance
consideration, build a performance test environment, and begin to deploy tools that are
required to support your performance strategy. It is also important to identify architecture
limitations that might change a non-functional requirement.
 Development phase
During the development phase, it is important to measure the response time of business
critical steps in the development environment.
 Testing phase
During the testing phase, it is important to start with simple, common test cases and
gradually go to more complex scenarios and environment configurations. Other priorities
include measuring the performance of the application as load scales up to projected peak
and identifying and resolving the root causes of resource constraints and application
bottlenecks.
 Maintenance phase
Post-launch, during the maintenance phase, instrument and monitor performance
indicators in the production environment to enable the team to proactively identify and
address potential issues before users are affected. Priorities also include fostering
communication between marketing, performance, and operation teams to be better
prepared for promotional events. And using the data that is captured from production to
optimize planning for future marketing and promotional activities.
 Web server considerations
Even though the web server is a mature technology, it can still be tuned to get better
performance, especially on large websites. This section covers the common performance
considerations for all the web servers, and is operating system independent.
 Application server considerations
HCL Commerce is ultimately another Java application that runs on top of WebSphere Application Server and WebSphere Liberty. As a result, WebSphere Application Server acts as the operating system for HCL Commerce. Optimizing the application servers improves the performance of HCL Commerce.
 JDBC_MONITOR_ENABLED parameter
JDBC Monitor (Java Database Connectivity) is a debugging tool that allows you to examine SQL statements that are executed in any application, for performance analysis and troubleshooting purposes. This document provides the installation steps for JDBC Monitor.
 Tuning
An HCL Commerce deployment is made up of many individual parts that work together to provide the e-commerce experience. By default, each part is configured to provide a base level of performance. However, individual changes can be made to potentially gain extra performance. The following collection of topics provides details on how to make these performance changes to the individual components of your site.

Design phase
During the design phase of your project, it is important to consider performance priorities. It is
important to design a solution that has a good balance of features and performance
consideration, build a performance test environment, and begin to deploy tools that are required
to support your performance strategy. It is also important to identify architecture limitations that
might change a non-functional requirement.

During the design phase, a number of key activities are carried out. Here are some examples:

 Review and validate non-functional requirements with IT and business stakeholders.


 Assist with application flow design.
 Educate developers on performance.
 Review component design that is focused on performance.
 Prepare system maintenance strategies.
 Identify key performance indicators (KPI) for monitoring.
It is important to be aware of potential risks that might result from not considering performance
during the design phase:

 Features might put a greater demand on resources than was projected in the capacity plan estimate.
 Application performance and user experience might be adversely affected by third-party
application interfaces.
 The lack of an appropriate test environment might hamper performance testing
capabilities and put test results into question.

 Caching strategy
When you plan a caching strategy for HCL Commerce, considerations such as which
pages will be cached, and where they will be cached are important. These decisions are
influenced by whether you are caching a local (Transaction) server, or a remote (Store)
server. To help with these decisions, consider the following approaches.
 Marketing overview: precision marketing features
Marketing Managers can use the extensive precision marketing features in the marketing
tool to deliver targeted marketing messages to customers.
 Facet Navigation widget
This widget automatically retrieves and displays a list of facets, such as the Brand, Price,
and Color facets. For each facet value, the number of matching catalog entries is
displayed in parentheses. Typically, this widget is placed in the left sidebar of the page.
 Marketing cache design considerations
When you are designing the marketing caching for your site and stores, there are many
options, enhancements, and best practices to consider for improving marketing
performance.
 Promotion evaluation considerations
When you are designing promotions for your site, consider how your promotions are
being evaluated. How you design your promotions and configure your promotion
evaluation process can affect your site performance during promotion evaluation. When
you are creating promotions, consider the promotion type, the promotion conditions, the
size of orders that are evaluated, and the agenda builder that is used for promotion
evaluation.

Caching strategy
When you plan a caching strategy for HCL Commerce, considerations such as which pages will
be cached, and where they will be cached are important. These decisions are influenced by
whether you are caching a local (Transaction) server, or a remote (Store) server. To help with
these decisions, consider the following approaches.

Caching local (Transaction) server pages


When you are caching locally, assess the following considerations:

 Which pages provide the best performance improvement if cached.


 Where caching is to occur.
 Whether to cache full pages or page fragments.
 How to invalidate the cached data.

Which pages to cache


Good candidates for caching are web pages that:

 Are frequently accessed.


 Are stable over time.
 Contain a majority of contents that can be reused by various users.

A good example would be catalog display pages.

Where caching is to occur

Ideally, caching takes place in the tier closest to the user. In reality, other factors such as security and user-specific data can influence the choice of the best place to cache the content. To maximize the benefit of dynamic caching, pages can be fragmented as finely as possible so that the elements can be cached independently in different cache entries.

For example, the non-user specific, non-security sensitive fragments are generally useful to
many users, and can be cached in a more public space and closer to the user. The security
sensitive data can be cached behind the enterprise firewall.

For stores that run on the Transaction server, caching outside of WebSphere Application Server
can be used with larger databases and sites to improve performance. Edge Server and the ESI
cache-plugin are provided with WebSphere Application Server to provide extra caching
capability. Session information (language ID, preferred currency ID, parent Organization, contract
ID, and member group) must be stored in session cookies. The cookies are required in order for
caching to be done on a server external to WebSphere Application Server.

Cache full pages or page fragments

All web pages consist of smaller and often simpler fragments. An example of a page fragment
might be a header, sidebar, footer, or an e-Marketing Spot. Breaking a web page into fragments
or components makes more caching possible for any page, even for personalized pages.
Fragments can be designed to maximize their reusability.

Caching a whole web page means that the entire page is cached as a large cache entry that
includes all the content from all fragments that have no includes or forwards. This approach can
save a significant amount of application server processing and is typically useful when the
external HTTP request contains all the information that is needed to find the entry.

If web pages are broken into different fragments and the fragments are cached individually, then
some fragments can be reused for a wider audience. When a web page is requested, then
different fragments are reassembled to produce the page. For more information, see Full page
and fragment caching.

If the page output has sections that are user-dependent, the page output is cached in a manner that is known as fragment caching. That is, the JSP pages are cached as separate entries and are reassembled when they are requested. If the page output always produces the same result based on the URL parameters and request attributes, this output can be cached with the cache-entry. Use the consume-subfragments (CSF) property, and specify the HCL Commerce store controller servlet (com.ibm.commerce.struts.ECActionServlet for HCL Commerce Version 9.0.0.x, or com.ibm.commerce.struts.v2.ECActionServlet for Version 9.0.x) as the servlet name.
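
As an illustrative sketch (not taken verbatim from the product documentation), a cachespec.xml entry that caches the full page output of the store controller servlet with CSF enabled might look like the following. The cache-id components and timeout value are hypothetical and would be adapted to the page being cached:

<cache>
  <cache-entry>
    <class>servlet</class>
    <!-- Store controller servlet named above (HCL Commerce Version 9.0.0.x) -->
    <name>com.ibm.commerce.struts.ECActionServlet.class</name>
    <!-- consume-subfragments: cache the parent entry with all child fragments folded in -->
    <property name="consume-subfragments">true</property>
    <cache-id>
      <!-- Hypothetical cache-id: vary the cached entry by URL parameters -->
      <component id="storeId" type="parameter">
        <required>true</required>
      </component>
      <component id="productId" type="parameter">
        <required>true</required>
      </component>
      <!-- Passive (TTL) invalidation after one hour -->
      <timeout>3600</timeout>
    </cache-id>
  </cache-entry>
</cache>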

Web pages can be cached by using full page caching or fragment caching, or a combination of
both methods.
Caching remote (Store) server pages

If you are using remote stores that run under the WebSphere Liberty Profile, your caching
strategy must change to reflect the containerization of the Transaction and Search servers. In
particular, you need to cache the results of REST calls differently.

Which pages are cached

Your selection of pages to be cached is largely similar for local and remote strategies. Aside from
the servlet cache, a remote store server cache contains not only the JSP/Servlet rendering
result, but also the remote REST access result. One thing to bear in mind is that calls that were
previously local in Local Store topologies are now remote calls. Therefore, you have two
considerations. You still need to provide rapid response times to calls from the customer
browser, but you must also minimize the number of calls that are passed to remote servers. You
can cache content that is frequently fetched from the Transaction or Search servers, such as
rendering results and REST results for common remote queries.

Where caching occurs

Consider caching REST tag results. If the REST result is user-, security-, or environment-neutral, then it is a candidate for caching in the REST result cache. You can use the wcf:rest tag attribute "cached" to declare that a call's result can be cached, as in the sketch that follows.
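
As a minimal sketch, a JSP fragment that marks a REST call as cacheable might look like this; the variable name and URL are hypothetical, and only the cached attribute is taken from the passage above:

<%-- Hypothetical: the result of this REST call can be stored in the REST result cache --%>
<wcf:rest var="productDetails" url="store/${storeId}/productview/byId/${productId}" cached="true"/>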

How to invalidate the cached data

You can trigger cache invalidation passively, or actively. Passive invalidation uses the Time To
Live (TTL) parameter of cache entries for webpages and fragments to trigger the action. After the
TTL time expires, the cache data container will trigger cache invalidation. This configuration
works best when you set the TTL at the level of whole web pages, and refresh the cache daily.
When you use the TTL parameter, any custom logic you create is overridden by the cache
expiry.

Active invalidation can be triggered by events that you define in the server's cachespec.xml configuration file. However, changes in one server's file are not automatically propagated to the other servers. If you are using the Solr search engine with HCL Commerce Version 9.0.x, Apache Kafka is used as the default messaging infrastructure to propagate the files. You can also create a de facto pipe by writing the invalidation command to the CACHEIVL table, which is accessed by all the servers, as in the sketch below. This approach is not as fast as a real command pipe.
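
As a hedged sketch, writing an invalidation to CACHEIVL could look like the following SQL; the column list and the dependency-ID format are assumptions that should be checked against your schema and your cachespec.xml invalidation rules:

-- Hypothetical: request invalidation of cache entries that depend on this dependency ID
INSERT INTO CACHEIVL (TEMPLATE, DATAID, INSERTTIME)
VALUES (NULL, 'ProductDisplay:productId:10001', CURRENT TIMESTAMP);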

If you are using Elasticsearch with HCL Commerce Version 9.1, your caching solution is Redis.
You can monitor and control Redis using the Cache Manager, as described in HCL Cache. For
information about caching and invalidation in Elasticsearch, see HCL Cache with Redis.

For more information about configuring Apache Kafka, see Cache invalidation using Kafka and
ZooKeeper.

Alternatively, if you use WebSphere eXtreme Scale as the centralized cache data container,
cache invalidation is also centralized. In this case, you do not need to use a separate messaging
system such as Kafka.

Monitoring HCL Cache

HCL Cache integrates with the Prometheus and Grafana monitoring framework. The use of this integration is critical for tuning, and for ensuring the correct function of the cache.
The HCL Cache-Prometheus integration provides real-time access to key monitoring metrics,
such as number of operations and response times for each operation; cache-hit ratios for remote
and local caches, and for each REST end-point; flow of invalidation messages; Redis memory
usage; maintenance statistics, and more.

The integration includes pre-defined dashboards that are relevant to cache tuning:

HCL Cache - Remote

Provides visibility into the remote cache, including availability, cache sizes, get/clear/put/invalidate operations per second, error counts, response times (averages, 95th and 99th percentiles), hit ratios, invalidations, maintenance, and more.
HCL Cache - local cache detail
Includes detailed information for each local cache, including sizes, operations, expiries and
evictions.

HCL Cache - local cache summary

Summary view showing caches for a pod, with sizes (number of entries and memory footprint in megabytes).

QueryApp and Transaction servers: REST caching statistics/hit ratios

The QueryApp and Transaction servers include cache statistics per REST service (as opposed to per cache).

Redis
If Redis is installed with the Prometheus exporter, the available dashboard can display
information about the connected clients, memory used, commands executed, and more.

Grafana also supports a Redis datasource. The Redis Dashboard retrieves information directly
from Redis by querying the Redis datasource. The output complements the information provided
by the Redis Prometheus exporter.

The redis-datasource plugin can be enabled during Prometheus Operator/Grafana installation as follows:

grafana:
  plugins:
    - redis-datasource
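
For example, if Grafana is installed through the community kube-prometheus-stack chart, the plugin entry above could be supplied in a values file; the release name, namespace, and file name here are assumptions:

helm upgrade --install kube-prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring -f grafana-redis-values.yaml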
 WebSphere cache monitor
Although the WebSphere cache monitor is supported, the HCL Cache provides newer
functions and tools that are more suitable for managing a distributed cache system in
production.
 logMetricsFrequency configuration in HCL Cache
The logMetricsFrequency configuration option can be used to specify, in seconds, the
frequency at which cache statistics are written to the logs. This can be especially useful
for environments where the Prometheus and Grafana integration is not available.
 HCL Cache configurable Prometheus metrics
The HCL Cache provides cache level configurations to customize the metrics created for
the Prometheus integration.

WebSphere cache monitor


Although the WebSphere cache monitor is supported, the HCL Cache provides newer functions
and tools that are more suitable for managing a distributed cache system in production.

Consider the following use cases:

1. Cache monitoring: The HCL Cache integrates with Prometheus and Grafana for real-time
monitoring and alerting.
2. Cache invalidations: The cache manager provides REST APIs for clears and
invalidations.
3. Troubleshooting: The cache manager has troubleshooting APIs, such as listing the cache ids for a dependency id, or details for a cache id. The Redis database can also be queried to inspect the cache or to monitor invalidations. See Troubleshooting for details.

The WebSphere Cache monitor can be used in development to inspect the contents of Servlet
and JSP fragment caching, and to assist with the development of cachespec.xml rules.

The WebSphere Cache monitor only displays the contents and statistics of local caches. Cache
clears and invalidations issued from the monitor are also executed on the remote cache and
propagated to other containers (as applicable).

For installation steps see: Enabling cache monitoring.

logMetricsFrequency configuration in HCL Cache
The logMetricsFrequency configuration option can be used to specify, in seconds, the
frequency at which cache statistics are written to the logs. This can be especially useful for
environments where the Prometheus and Grafana integration is not available.

Enabling logMetricsFrequency
The logMetricsFrequency setting is a top level configuration option. See cache configuration for
details.
apiVersion: v1
data:
  cache_cfg-ext.yaml: |-
    redis:
      enabled: true
      yamlConfig: "/SETUP/hcl-cache/redis_cfg.yaml" # Please leave this line untouched
    logMetricsFrequency: 60
    cacheConfigs:
      baseCache:
        remoteCache:
          shards: 5
  redis_cfg.yaml: |-
    ...

Cache metrics loggers

Cache metrics are printed to the logs at the frequency set by logMetricsFrequency, using the com.hcl.commerce.cache.MetricsLogger logger at INFO level:

[5/2/22 16:05:08:697 GMT] 000000ed CacheMetrics I baseCache {"[demoqaauth]:baseCache":{"remote":{"invalidates.duration.result.ok":"1/0.0075 secs- avg: 7.49 ms","puts.duration.result.ok":"1500/5.2514 secs- avg: 3.50 ms","clears.duration.result.ok":"1/0.0852 secs- avg: 85.17 ms"},"local":{"size.current":"1500","puts.source.local":1500,"clears":1,"size.current.max":"5000","size.max":"5000"}}}
Formatted JSON output:

{
  "[demoqaauth]:baseCache": {
    "remote": {
      "invalidates.duration.result.ok": "1/0.0075 secs- avg: 7.49 ms",
      "puts.duration.result.ok": "1500/5.2514 secs- avg: 3.50 ms",
      "clears.duration.result.ok": "1/0.0852 secs- avg: 85.17 ms"
    },
    "local": {
      "size.current": "1500",
      "puts.source.local": 1500,
      "clears": 1,
      "size.current.max": "5000",
      "size.max": "5000"
    }
  }
}

HCL Cache configurable Prometheus metrics

The HCL Cache provides cache level configurations to customize the metrics created for the Prometheus integration.

Although changes are not typically required, if you are integrating with a third-party monitoring system and there is a cost associated with the retrieval or storage of metrics, these configurations can be used to fine-tune the metrics that are used.

Cache configurations

Metrics are configurable at the cache level. Changes can be applied to a single cache, or to the
default configuration using defaultCacheConfig. See cache configuration for details.

Enabling or disabling metrics for a cache

Disable metrics for a cache by using the enabled attribute as follows:

defaultCacheConfig:
  metrics:
    enabled: false
Timer metrics histogram buckets

The Timer metrics used by the HCL Cache support histograms for the calculation of percentiles. The tracking of histogram values requires the definition of additional metrics, as in the following example. This support can be disabled to reduce the number of metrics created.
hclcache_cache_clears_total{cachespace="demoqaauth",name="baseCache",scope="local",} 100.0
hclcache_cache_clears_duration_seconds_sum{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 1.3296758
hclcache_cache_clears_duration_seconds_max{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 0.0897587
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="1.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="3.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="5.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="7.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.001",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.003",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.005",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.01",} 23.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.05",} 99.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.1",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.5",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="+Inf",} 100.0
hclcache_cache_clears_duration_seconds_count{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 100.0
The default histogram configuration is as follows:

defaultCacheConfig:
  metrics:
    timerNanoBuckets:
      - 100000     # 0.1 ms
      - 300000     # 0.3 ms
      - 500000     # 0.5 ms
      - 700000     # 0.7 ms
      - 1000000    # 1.0 ms
      - 3000000    # 3.0 ms
      - 5000000    # 5.0 ms
      - 10000000   # 10.0 ms
      - 50000000   # 50.0 ms
      - 100000000  # 100.0 ms
      - 500000000  # 500.0 ms

Values are in nanoseconds.

The histogram buckets can be disabled by specifying an empty list:

defaultCacheConfig:
  metrics:
    timerNanoBuckets: []

If disabled, percentile calculations will no longer be available in the HCL Cache - Remote Grafana dashboard.

Use of common metrics for all caches

The number of metrics can also be reduced by using a combined Timer for all caches. This change is incompatible with the HCL Cache dashboards and can be inaccurate when used with Redis cluster.

defaultCacheConfig:
  metrics:
    addCacheNameLabelToTimers: false
Redis servers for HCL Cache
Redis server requirements for HCL Commerce

Although the use of a remote cache in Redis is highly recommended, it is not required in all
configurations.

 Elasticsearch: Redis is required. The Redis servers must be shared by authoring and
live. This is required as NiFi must interact with both Authoring and Live environments.

 Solr Search: Redis is recommended but not required. Migrated environments that do not
implement Redis must continue using Kafka to replicate invalidations. Redis replaces
Kafka for invalidations and can also act as a remote cache. Authoring and Live can be
configured with separate Redis servers. This is recommended for production
environments.

Selecting a Redis topology


Redis can be installed in a variety of configurations. The selection depends on your performance
and high-availability requirements. The alternatives include:

 Using the Bitnami Charts to install Redis within the Kubernetes cluster

 Redis Enterprise by RedisLabs

 Redis as-a-service from a cloud provider

Standalone and cluster configurations

Redis standalone (single instance) will work appropriately for pre-production and many
production environments. Larger environments can benefit from a clustered Redis, which
allows for multiple masters and replicas.

A Redis Cluster is configured with multiple masters (3+). Caches and shards will be
distributed across the master servers. Each master can be configured with zero or more
replicas. Replicas can help scalability by handling read traffic (GET operations) and can
take over should the master become unavailable.
Required configurations
Use the following configurations in the Redis server. See key eviction for details.

maxmemory

Indicates the amount of memory available to hold cached data, and should be set to a
non-zero value.

maxmemory-policy
Must be set to volatile-lru
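
To confirm these values on a running server, you can query them with standard redis-cli commands; the host name is a placeholder:

redis-cli -h <redis-host> CONFIG GET maxmemory
redis-cli -h <redis-host> CONFIG GET maxmemory-policy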

Key tuning configurations

Besides the topology, consider the following key tuning configurations. Most apply to locally installed Redis, but can be relevant to Redis as-a-service as well.

To validate or compare the performance of the Redis server, you can use the Redis benchmark utility and the HCL Cache's hcl-cache-benchmark utility, as in the sketch below.
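
As an illustration, a basic redis-benchmark invocation might look like the following; the host name, client count, request count, and payload size are assumptions to adjust for your environment:

# 50 parallel clients, 100,000 requests, 1 KB payloads, GET/SET commands only
redis-benchmark -h <redis-host> -p 6379 -c 50 -n 100000 -d 1024 -t get,set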

Use fast CPUs and fast storage

Redis is mostly single-threaded, at least from the command execution point of view. It benefits from fast processors with a high frequency rate. If persistence is enabled, the containers will also benefit from fast storage. Use a premium storage class that is backed by SSDs.

Validate the Kubernetes limits

Ensure the limits set on the Redis pods are set appropriately:

Storage
If persistence is used, the Persistent Volumes need to be sized accordingly. For example, the Bitnami Charts set a limit of 8 GB by default. This might not be enough for production environments and might lead to a crash.

CPU
CPU throttling can freeze the Redis server. Kubernetes is very aggressive with CPU throttling. To avoid it, set a high limit, or remove the CPU limit for the Redis pods.

Memory
The memory required by the Redis container is a function of the maxmemory configuration. maxmemory should be less than 70% of the container limit.

Redis persistence
Redis includes persistence (AOF/RDB) options that save the memory contents to disk. This allows Redis to recover the memory contents (cache) in case of a crash. For use with HCL Cache, enabling RDB persistence and disabling AOF should be sufficient; see the sketch below.

Persistence is required when replicas are used. Otherwise, it is optional and Redis does not require a Persistent Volume. Without persistence, in the unlikely case that Redis crashes or becomes unresponsive, Kubernetes should be able to restart the service almost instantaneously, but with an empty cache.

If the Commerce site is not tuned to absorb a cache clear during peak traffic periods, persistence is recommended. When persistence is enabled, startup is delayed by a number of seconds while Redis reloads the memory cache from disk. It is also possible that if the Kubernetes node crashes, manual intervention may be required to release the Persistent Volume from the problematic node and to allow Kubernetes to reschedule the pod (due to ReadWriteOnce-RWO mode).
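
A hedged redis.conf sketch matching this recommendation (RDB on, AOF off) follows; the snapshot thresholds are illustrative defaults, not product-mandated values:

# RDB snapshots enabled, AOF disabled
save 900 1      # snapshot if at least 1 key changed in 15 minutes
save 300 100    # snapshot if at least 100 keys changed in 5 minutes
appendonly no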

Note: Disclaimer: Redis is a registered trademark of Redis Labs Ltd. Any rights therein are reserved to Redis Labs Ltd. Any use by HCL is for referential purposes only and does not indicate any sponsorship, endorsement, or affiliation between Redis Labs Ltd. and HCL.
 Memory management in Redis
Redis is the database of choice to back cache systems. It uses a memory management model that supports LRU (Least Recently Used) and other algorithms to evict keys in order to allow for new data when the memory is full.
 Bitnami Redis installation
Although Redis is installed automatically as a sub-chart of HCL Commerce, it can also be installed separately, or with different configurations. This document describes the recommended deployment options and the use of Bitnami charts.
 HCL Cache in Redis
Redis is the remote database that stores HCL Cache data, and it is also used to replicate invalidations to local caches. For troubleshooting and administrative purposes, you might sometimes need to use the Redis command line to query or configure the database.
 Use of Redis replicas
With Redis cluster, master nodes can be backed by replicas (one or many). Replicas are used for failover and scalability:

Bitnami Redis installation


Although Redis is installed automatically as a sub-chart of HCL Commerce, it can also be
installed separately, or with different configurations. This document describes the recommended
deployment options and the use of Bitnami charts.

Installation considerations:
Topology

Redis can be installed with different topologies, including standalone, master/subordinate, sentinel, or cluster. Most cloud providers also offer managed versions of Redis that hide some of the high-availability and replication complexities. The following are recommended configurations using the Bitnami charts:

1. standalone: A single master with no replicas can work well in many scenarios.
Because HCL Cache is a multi-tiered framework, the most frequently-accessed
content is served from local caches, reducing the load on Redis and therefore
decreasing its capacity requirements. (The amount of caching and hit ratios will
affect the load on each site). HCL Cache is also designed with high availability
features, and implements circuit breakers that block Redis access until the server
recovers. During that time, the local caches remain available. Kubernetes will
detect hang or crash conditions and rapidly re-spawn the master container based
on the probes defined in the Redis deployment.

Note: If replicas/subordinates were defined (without Sentinel), the replicas allow read-only access and are not promoted to master. The system still needs to wait for the master to be re-spawned. See topologies for more details.

2. cluster: Clustering can be used to scale Redis. Although each HCL Cache can
only exist on a single node (each cache is tied to a single slot), HCL
Commerce defines multiple caches (50 or greater) that can be distributed across
the Redis cluster nodes. With slot migration, it is possible to select what caches
land on each server. A Redis cluster requires a minimum of three master servers.
If replicas are used, six containers need to be deployed. See the Redis Cluster
tutorial for more details.

Persistence

Redis offers AOF (Append Only File) and RDB (Redis Database) persistence options that save the memory contents to disk. This allows Redis to recover the memory contents (cache) in case of a crash. For more information on AOF and RDB, see Redis Persistence.

With standalone Redis, the use of persistence is optional but with Redis cluster it is
recommended. The use of persistence can add a small overhead to runtime operations.
There can also be a delay during Redis startup as it loads the persisted cache into
memory. This delay varies depending on the size of the file. For use with HCL Cache,
use of RDB only (not AOF) can be sufficient.
When configuring Kubernetes persistent volumes for Redis, select a storageClass with fast SSD storage. By default, Redis requests only 8 GB of storage for a persistent volume. That may not be enough, especially if AOF persistence is enabled. Request a larger size (for example, 30 GB) and monitor usage to get a better understanding of how much storage is required.

Redis Bitnami Charts

Bitnami publishes the most popular Redis charts, and they can be used to install Redis within the Kubernetes cluster.

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Redis Standalone

Use this Redis chart to install Redis standalone, with no persistence. Review redis-standalone-values.yaml for details.

helm install hcl-commerce-redis bitnami/redis -n redis -f redis-standalone-values.yaml
Note: If Prometheus is not set up, disable the metrics section prior to install. For more information on Prometheus and Grafana integration, see HCL Commerce Monitoring - Prometheus and Grafana Integration.

Redis Cluster

These steps install a Redis Cluster with three masters. Review redis-cluster-values.yaml for details.

helm install hcl-commerce-redis bitnami/redis-cluster -n redis -f redis-cluster-values.yaml
Note: If Prometheus is not set up, disable the metrics section prior to install. For more information on Prometheus and Grafana integration, see HCL Commerce Monitoring - Prometheus and Grafana Integration.

Common configurations and settings

The following configurations are common to the standalone and cluster charts:

Redis configurations

The following section customizes Redis default configurations (see redis.conf):

configuration: |-
  appendonly no
  save ""
  maxmemory 10000mb
  maxmemory-policy volatile-lru
maxmemory

Determines the size of the memory available to store Redis objects. The amount of
cache will vary from site to site. 10 GB is a good starting configuration. The pod memory
limit must be higher.
maxmemory-policy
Using volatile-lru is required for the HCL Cache. This allows Redis to evict cache entries
but not dependency IDs. The options appendonly and save are for persistence, which is
disabled in the sample.

This section can also be used to enable debug settings, such as for SLOWLOG:

slowlog-log-slower-than 10000
slowlog-max-len 512
latency-monitor-threshold 100
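
With those settings in place, the recorded slow commands can be inspected with standard redis-cli commands, for example:

# Show the 10 most recent slow commands, then clear the log
redis-cli SLOWLOG GET 10
redis-cli SLOWLOG RESET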

Redis Cluster only:

cluster-require-full-coverage: no:

When not all of the slots are covered (for example, due to a master being down), the CLUSTERDOWN error is issued. Configuring cluster-require-full-coverage to no enables the subset of nodes that remain available to continue to serve requests.

If you plan to enable replicas, see Use of Redis Replicas for additional configurations.

Persistence

Kubernetes persistence (PVC) must be enabled if Redis persistence (AOF/RDB) is used, or with Redis clustering. If Redis persistence is used, the PVC must be large enough to accommodate the Redis memory dumps. With Redis Cluster, the cluster maintains a nodes.conf file that must persist, as otherwise nodes that restart are unable to re-join the cluster. This file requires minimal storage.
Resources
Redis is single-threaded (for the most part), so it benefits more from having faster processors, as opposed to having multiple processors. Two CPUs can work well in many scenarios. It is important to monitor for Kubernetes CPU resource throttling and ensure that it is not happening, as throttling can hang the Redis main thread. The memory assigned should be larger than the memory allocated for the Redis cache memory (see above).

resources:
  limits:
    cpu: 2000m
    memory: 12Gi
  requests:
    cpu: 2000m
    memory: 12Gi
Metrics

If Prometheus is set up, you can enable metrics and serviceMonitors (requires kube-prometheus-stack). Redis metrics can be consumed with the Redis Grafana dashboard. The HCL Cache - Remote dashboard also displays Redis metrics.

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    namespace: redis

Additional operating system configurations (sysctl)

Redis requires certain host-level configurations to perform well. These may or may not be required depending on the node configurations. For more information on configuring host kernel settings in Bitnami infrastructure stacks for Kubernetes, see Configure Host Kernel Settings.

Transparent huge pages

If enabled, you will see this warning in the Redis log:

WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').

The configuration can be checked with the following command:

cat /sys/kernel/mm/transparent_hugepage/enabled
Socket max connections (somaxconn)

If misconfigured, the following warning can be printed to the logs:

WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

The current value can be validated as follows:

cat /proc/sys/net/core/somaxconn

Add this section to the values.yaml file to configure THP and somaxconn as follows:

sysctl:
  enabled: true
  mountHostSys: true
  command:
    - /bin/sh
    - -c
    - |-
      sysctl -w net.core.somaxconn=10240
      echo madvise > /host-sys/kernel/mm/transparent_hugepage/enabled
Key eviction
Overview of Redis key eviction policies (LRU, LFU, etc.)

When Redis is used as a cache, it is often convenient to let it automatically evict old
data as you add new data. This behavior is well known in the developer community,
since it is the default behavior for the popular memcached system.

This page covers the more general topic of the Redis maxmemory directive used to limit
the memory usage to a fixed amount. It also extensively covers the LRU eviction
algorithm used by Redis, which is actually an approximation of the exact LRU.

Maxmemory configuration directive


The maxmemory configuration directive configures Redis to use a specified amount of
memory for the data set. You can set the configuration directive using the redis.conf file,
or later using the CONFIG SET command at runtime.

For example, to configure a memory limit of 100 megabytes, you can use the following
directive inside the redis.conf file:

maxmemory 100mb

Setting maxmemory to zero results in no memory limit. This is the default behavior for 64-bit systems, while 32-bit systems use an implicit memory limit of 3 GB.

When the specified amount of memory is reached, the behavior is determined by how the eviction policies are configured. Redis can return errors for commands that could result in more memory being used, or it can evict some old data to return back to the specified limit every time new data is added.

Eviction policies
The exact behavior Redis follows when the maxmemory limit is reached is configured
using the maxmemory-policy configuration directive.

The following policies are available:

 noeviction: New values are not saved when the memory limit is reached. When a database uses replication, this applies to the primary database.
 allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys.
 allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys.
 volatile-lru: Removes least recently used keys with the expire field set to true.
 volatile-lfu: Removes least frequently used keys with the expire field set to true.
 allkeys-random: Randomly removes keys to make space for the new data added.
 volatile-random: Randomly removes keys with the expire field set to true.
 volatile-ttl: Removes keys with the expire field set to true and the shortest remaining time-to-live (TTL) value.

The policies volatile-lru, volatile-lfu, volatile-random, and volatile-ttl behave


like noeviction if there are no keys to evict matching the prerequisites.

Picking the right eviction policy is important and depends on the access pattern of your application. However, you can reconfigure the policy at runtime while the application is running, and monitor the number of cache misses and hits using the Redis INFO output to tune your setup, as in the sketch below.
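
For example, the hit and miss counters can be read from the standard INFO output:

# keyspace_hits vs. keyspace_misses indicate how well the current policy performs
redis-cli INFO stats | grep -E 'keyspace_(hits|misses)'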

In general as a rule of thumb:

 Use the allkeys-lru policy when you expect a power-law distribution in the
popularity of your requests. That is, you expect a subset of elements will be
accessed far more often than the rest. This is a good pick if you are unsure.
 Use the allkeys-random policy if you have a cyclic access pattern where all the keys are scanned continuously, or when you expect the distribution to be uniform.
 Use the volatile-ttl policy if you want to be able to provide hints to Redis about what are good candidates for expiration by using different TTL values when you create your cache objects.

The volatile-lru and volatile-random policies are mainly useful when you want to use a single instance for both caching and for a set of persistent keys. However, it is usually a better idea to run two Redis instances to solve such a problem.

It is also worth noting that setting an expire value to a key costs memory, so using a
policy like allkeys-lru is more memory efficient since there is no need for
an expire configuration for the key to be evicted under memory pressure.

How the eviction process works


It is important to understand that the eviction process works like this:

 A client runs a new command, resulting in more data added.
 Redis checks the memory usage, and if it is greater than the maxmemory limit, it evicts keys according to the policy.
 A new command is executed, and so forth.

So we continuously cross the boundaries of the memory limit, by going over it, and then by evicting keys to return back under the limit.

If a command results in a lot of memory being used (like a big set intersection stored
into a new key) for some time, the memory limit can be surpassed by a noticeable
amount.

Approximated LRU algorithm


The Redis LRU algorithm is not an exact implementation. This means that Redis is not able to pick the best candidate for eviction, that is, the key that was accessed the furthest in the past. Instead, it will try to run an approximation of the LRU algorithm, by sampling a small number of keys and evicting the one that is the best (with the oldest access time) among the sampled keys.

However, since Redis 3.0, the algorithm was improved to also use a pool of good candidates for eviction. This improved the performance of the algorithm, making it able to approximate more closely the behavior of a real LRU algorithm.

What is important about the Redis LRU algorithm is that you are able to tune the
precision of the algorithm by changing the number of samples to check for every
eviction. This parameter is controlled by the following configuration directive:

maxmemory-samples 5

The reason Redis does not use a true LRU implementation is because it costs more memory. However, the approximation is virtually equivalent for an application using Redis. This figure compares the LRU approximation used by Redis with true LRU. The test to generate the graphs filled a Redis server with a given number of keys. The keys were accessed from the first to the last. The first keys are the best candidates for eviction using an LRU algorithm. Later, 50% more keys are added, in order to force half of the old keys to be evicted.

You can see three kinds of dots in the graphs, forming three distinct bands:

 The light gray band consists of objects that were evicted.
 The gray band consists of objects that were not evicted.
 The green band consists of objects that were added.

In a theoretical LRU implementation, we expect that, among the old keys, the first half will be expired. The Redis LRU algorithm instead only probabilistically expires the older keys.

As you can see, Redis 3.0 does a better job with 5 samples compared to Redis 2.8; however, most objects that are among the latest accessed are still retained by Redis 2.8. Using a sample size of 10, the Redis 3.0 approximation comes very close to theoretical LRU performance.

Note that LRU is just a model to predict how likely a given key will be accessed in the
future. Moreover, if your data access pattern closely resembles the power law, most of
the accesses will be in the set of keys the LRU approximated algorithm can handle well.

In simulations we found that using a power law access pattern, the difference between
true LRU and Redis approximation were minimal or non-existent.

However, you can raise the sample size to 10, at the cost of some additional CPU usage, to closely approximate true LRU, and check whether this makes a difference in your cache miss rate.

It is very simple to experiment in production with different values for the sample size by using the CONFIG SET maxmemory-samples <count> command, as in the sketch below.
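
For example:

# Runtime-only change; persist it in redis.conf to survive a restart
redis-cli CONFIG SET maxmemory-samples 10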

The new LFU mode


Starting with Redis 4.0, the Least Frequently Used eviction mode is available. This
mode may work better (provide a better hits/misses ratio) in certain cases. In LFU
mode, Redis will try to track the frequency of access of items, so the ones used rarely
are evicted. This means the keys used often have a higher chance of remaining in
memory.

To configure the LFU mode, the following policies are available:


 volatile-lfu Evict using approximated LFU among the keys with an expire set.
 allkeys-lfu Evict any key using approximated LFU.

LFU is approximated like LRU: it uses a probabilistic counter, called a Morris counter, to estimate the object access frequency using just a few bits per object, combined with a decay period so that the counter is reduced over time. At some point we no longer want to consider keys as frequently accessed, even if they were in the past, so that the algorithm can adapt to a shift in the access pattern.

That information is sampled similarly to what happens for LRU (as explained in the
previous section of this documentation) to select a candidate for eviction.

However, unlike LRU, LFU has certain tunable parameters: for example, how fast should a frequent item lose rank if it is no longer accessed? It is also possible to tune the Morris counter range to better adapt the algorithm to specific use cases.

By default Redis is configured to:

 Saturate the counter at around one million requests.
 Decay the counter every minute.

Those should be reasonable values and were tested experimentally, but the user may
want to play with these configuration settings to pick optimal values.

Instructions about how to tune these parameters can be found inside the
example redis.conf file in the source distribution. Briefly, they are:

lfu-log-factor 10
lfu-decay-time 1

The decay time is the obvious one: it is the number of minutes a counter should be decayed when it is sampled and found to be older than that value. A special value of 0 means: we will never decay the counter.

The counter logarithm factor changes how many hits are needed to saturate the
frequency counter, which is just in the range 0-255. The higher the factor, the more
accesses are needed to reach the maximum. The lower the factor, the better is the
resolution of the counter for low accesses, according to the following table:

+--------+------------+------------+------------+------------+------------+
| factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
+--------+------------+------------+------------+------------+------------+
| 0      | 104        | 255        | 255        | 255        | 255        |
| 1      | 18         | 49         | 255        | 255        | 255        |
| 10     | 10         | 18         | 142        | 255        | 255        |
| 100    | 8          | 11         | 49         | 143        | 255        |
+--------+------------+------------+------------+------------+------------+

So basically the factor is a trade-off between better distinguishing items with low access counts versus distinguishing items with high access counts. More information is available in the example redis.conf file. These parameters can also be changed at runtime, as in the sketch below.
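
As a sketch, the LFU policy and its tunables can be set at runtime with standard CONFIG SET commands; the values shown are the defaults quoted above:

redis-cli CONFIG SET maxmemory-policy allkeys-lfu
redis-cli CONFIG SET lfu-log-factor 10
redis-cli CONFIG SET lfu-decay-time 1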

Redis servers for HCL Cache


Redis server requirements for HCL Commerce

Although the use of a remote cache in Redis is highly recommended, it is not required in all
configurations.

 Elasticsearch: Redis is required. The Redis servers must be shared by authoring and
live. This is required as NiFi must interact with both Authoring and Live environments.

 Solr Search: Redis is recommended but not required. Migrated environments that do not
implement Redis must continue using Kafka to replicate invalidations. Redis replaces
Kafka for invalidations and can also act as a remote cache. Authoring and Live can be
configured with separate Redis servers. This is recommended for production
environments.
Selecting a Redis topology
Redis can be installed in a variety of configurations. The selection depends on your performance
and high-availability requirements. The alternatives include:

 Using the Bitnami Charts to install Redis within the Kubernetes cluster

 Redis Enterprise by RedisLabs

 Redis as-a-service from a cloud provider

Standalone and cluster configurations

Redis standalone (single instance) works well for pre-production and many production environments. Larger environments can benefit from a Redis cluster, which allows for multiple masters and replicas.

A Redis Cluster is configured with multiple masters (3+). Caches and shards are distributed across the master servers. Each master can be configured with zero or more replicas. Replicas can help scalability by handling read traffic (GET operations) and can take over should a master become unavailable.
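For example, a cluster with three masters and one replica per master can be installed with the Bitnami redis-cluster chart. This is a minimal sketch; the release name, namespace, and node counts are illustrative assumptions, and the chart parameters should be verified against your chart version:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis-cluster --namespace redis --create-namespace \
  --set cluster.nodes=6 --set cluster.replicas=1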

Required configurations
Use the following configurations in the Redis server. See key eviction for details.

maxmemory

Indicates the amount of memory available to hold cached data, and should be set to a
non-zero value.

maxmemory-policy
Must be set to volatile-lru
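For example, the corresponding redis.conf entries might look as follows; the 4gb value is only an illustrative assumption and must be sized for your environment:

maxmemory 4gb
maxmemory-policy volatile-lru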

Key tuning configurations

Besides the topology, consider the following key tuning configurations. Most apply to locally installed Redis, but they can be relevant to Redis as-a-service as well.
To validate or compare the performance of the Redis server, you can use the Redis benchmark utility and the HCL Cache hcl-cache-benchmark utility.

Use fast CPUs and fast storage

Redis is mostly single-threaded, at least from the command execution point of view. It benefits from fast processors with a high clock frequency. If persistence is enabled, the containers also benefit from fast storage. Use a premium storage class that is backed by SSDs.

Validate the Kubernetes limits

Ensure the limits set on the Redis pods are set appropriately:

Storage
If persistence is used, the Persistent Volumes need to be sized accordingly. For example, the Bitnami Charts set a limit of 8 GB by default. This might not be enough for production environments and might lead to a crash.

CPU
CPU throttling can freeze the Redis server. Kubernetes is very aggressive with CPU
throttling. To avoid it, set a high limit, or remove the CPU limit for the Redis pods.

Memory
The memory required by the Redis container is a function of
the maxmemory configuration. maxmemory should be less than 70% of the container
limit.
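As a sketch of these guidelines with the Bitnami chart, assuming purely hypothetical sizes (a 4gb maxmemory is roughly 67% of a 6Gi container limit, and no CPU limit is set); the parameter names follow the Bitnami redis chart conventions and should be verified for your chart and version:

master:
  resources:
    limits:
      memory: 6Gi
commonConfiguration: |-
  maxmemory 4gb
  maxmemory-policy volatile-lru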

Redis persistence
Redis includes persistence options (AOF/RDB) that save the memory contents to disk. This allows Redis to recover the memory contents (cache) in case of a crash. For use with the HCL Cache, enabling RDB persistence and disabling AOF should be sufficient.

Persistence is required when replicas are used. Otherwise, it is optional and Redis does not require a Persistent Volume. Without persistence, in the unlikely case that Redis crashes or becomes unresponsive, Kubernetes should be able to restart the service almost instantaneously, but with an empty cache.
If the Commerce site is not tuned to absorb a cache clear during a peak traffic period, persistence is recommended. When persistence is enabled, startup is delayed by a number of seconds while Redis reloads the memory contents from disk. If the Kubernetes node crashes, manual intervention may also be required to release the Persistent Volume from the problematic node so that Kubernetes can reschedule the pod (due to the ReadWriteOnce (RWO) access mode).
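For reference, RDB-only persistence corresponds to the following classic redis.conf defaults, shown here for illustration:

# Snapshot if at least 1 key changed in 900s, 10 keys in 300s, or 10000 keys in 60s
save 900 1
save 300 10
save 60 10000
# Keep AOF disabled; recovery is from the latest RDB snapshot
appendonly no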

Note: Disclaimer: Redis is a registered trademark of Redis Labs Ltd. Any rights therein are reserved to Redis Labs Ltd. Any use by HCL is for referential purposes only and does not indicate any sponsorship, endorsement, or affiliation between Redis Labs Ltd. and HCL.
 Memory management in Redis
Redis is the database of choice to back cache
systems. It uses a memory management model
that supports LRU (Least Recently Used) and
other algorithms to evict keys in order to allow for
new data when the memory is full.
 Bitnami Redis installation
Although Redis is installed automatically as a sub-
chart of HCL Commerce, it can also be installed
separately, or with different configurations. This
document describes the recommended
deployment options and the use of Bitnami charts.
 HCL Cache in Redis
Redis is the remote database that stores HCL
Cache data, and it is also used to replicate
invalidations to local caches. For troubleshooting
and administrative purposes, you might sometimes
need to use the Redis command line to query or
configure the database.
 Use of Redis replicas
With Redis cluster, master nodes can be backed
by replicas (one or many). Replicas are used for
failover and scalability.

Memory management in Redis


Redis is the database of choice to back cache systems. It uses a memory management model
that supports LRU (Least Recently Used) and other algorithms to evict keys in order to allow for
new data when the memory is full.

The HCL Cache has requirements that go beyond a simple key-value database. To support
these requirements, such as invalidation by dependency, the cache must maintain sets of
metadata for each key. Redis must not be allowed to evict metadata information, as this creates
inconsistencies in the cache, such as entries not getting invalidated. To maintain metadata,
the HCL Cache implements a set of maintenance processes.

Redis memory configurations


The following configurations must be in place for Redis to be used with the HCL Cache.

maxmemory
The amount of memory made available for keys (cached data).
maxmemory-policy
Must be set to volatile-lru, which removes least recently used keys with the expire field
set to true.

With Redis Enterprise, maxmemory is not used. Caches are maintained using the number of entries instead. See softMaxSize.

HCL Cache objects in Redis

Object           Use                                                           Expiry Set
{}-data-*        HASH key that contains the cached data and additional         YES
                 metadata, such as creation time, dependencies, and others.
{}-dep-*         SET key for each dependency ID. The set contains a list of    NO
                 all the cache IDs that are associated to this dependency ID.
{}-maintenance   ZSET by expiry time that contains cache keys and their        NO
                 dependencies.
{}-inactive      ZSET by creation time used for inactivity maintenance.        NO
(9.1.10+)

Cached data ({}-data-*) must always have an expiry set and may be evicted by Redis when available memory is exhausted. Metadata information ({}-dep-*, {}-maintenance, {}-inactive) has no expiry and thus cannot be evicted by Redis; it must instead be maintained by the HCL Cache maintenance processes.
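This split is easy to observe with the TTL command (illustrative output; the key names and values are borrowed from examples later in this document): a -data key reports a positive time-to-live, while a metadata key reports -1, meaning no expiry is set:

127.0.0.1:6379> TTL "{cache-demoqalive-baseCache}-data-<cache-id>"
(integer) 13609
127.0.0.1:6379> TTL "{cache-demoqalive-baseCache}-dep-storeId:categoryId:11:10506"
(integer) -1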

HCL Cache maintenance processes

In order to deal with metadata, the HCL Cache implements the following
maintenance processes. For more details see: Cache Maintenance.

Expired maintenance

When a key expires, Redis automatically removes it from memory. The Expired
Maintenance job is responsible for removing references from the metadata to the expired
key.
Low-memory maintenance
When used memory reaches 100% of maxmemory, Redis starts evicting keys. Testing has shown that full memory conditions can lead to errors such as "command not allowed when used memory > 'maxmemory'". To prevent this situation, the HCL Cache monitors memory usage and triggers jobs to reduce the size of each cache before the available memory is exhausted. The jobs remove both cache entries and their associated metadata. The cache entries selected for removal are those closest to expiry.
Inactive maintenance
This job is not required for memory maintenance, but helps reduce the memory
requirements by removing idle cache entries. Its design is very similar to that of expired
maintenance, but for cache entries that have not yet expired.

HCL Cache in Redis


Redis is the remote database that stores HCL Cache data, and it is also used to replicate
invalidations to local caches. For troubleshooting and administrative purposes, you might
sometimes need to use the Redis command line to query or configure the database.

This page includes a list of commands and concepts that you might find useful when learning or troubleshooting the cache system. For a complete list of commands, see the Redis site.

This document assumes you installed Redis on the Kubernetes cluster with the Bitnami charts,
but the commands should work on all distributions.

Note: Changing the cache contents directly on the Redis database can break the consistency of
the cache. The supported way to operate with the cache is by using the Cache or Cache
Manager APIs.

Accessing the Redis command line interface (redis-cli)


The Redis command line can be accessed with the redis-cli command within the container:
kubectl exec -it redis-master-0 -n redis -c redis -- bash
redis-cli
For example, if you want to run the DBSIZE command from outside the container, you would
issue the following command.
kubectl exec -it redis-master-0 -n redis -c redis -- redis-cli DBSIZE

Most Redis commands only apply to the local server. If you are running a cluster, you need to
first identify the server that contains the cache.

HCL Cache objects

The naming of the cache objects follows the convention {cache-<namespace>-<cachename>}-<type>-<id>.

The namespace allows you to share Redis across multiple environments, and to distinguish between Auth and Live. The prefix also contains the name of the cache and is enclosed in brackets ({ }). For cluster environments, this ensures that all the contents of a cache are created on the same node. This is a design decision made for performance purposes.

The two main object types are -data (cache contents) and -dep (dependencies). For example:
"{cache-demoqalive-baseCache}-data-/search/resources/
org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_
1=api:DC_PathToken_2=v2:DC_PathToken_3=categories:storeId=11:langId=-
1:contractId=-11005:depthAndLimit=11,11:UTF-8:requestType=GET"
"{cache-demoqalive-baseCache}-dep-storeId:categoryId:11:10506"
Querying the HCL Cache
The KEYS command can be used to inspect the contents of the Redis cache in a TEST environment. This command should not be used in a live/production environment because it can block the Redis thread. In production, use the SCAN command (and its variations) instead, because it retrieves data in chunks with a cursor.
The redis-cli interface provides a shortcut to run the SCAN command (--scan) that automatically follows the cursor.
I have no name!@redis-master-0:/$ redis-cli --scan --pattern "{cache-
demoqalive-baseCache}-*"
"{cache-demoqalive-baseCache}-data-/search/resources/
org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_Path
Token_1=api:DC_PathToken_2=v2:DC_PathToken_3=products:catalogId=11501:cont
ractId=-11005:langId=-1:partNumber=BD-BEDS-0002:storeId=11:UTF-
8:requestType=GET"
"{cache-demoqalive-baseCache}-dep-WCT+ESINDEX"
"{cache-demoqalive-baseCache}-dep-storeId:partNumber:11:LR-FNTR-0002"
"{cache-demoqalive-baseCache}-dep-storeId:categoryId:11:10506"
"{cache-demoqalive-baseCache}-dep-storeId:partNumber:11:BD-BEDS-0002"
"{cache-demoqalive-baseCache}-data-/search/resources/
org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_Path
Token_1=api:DC_PathToken_2=v2:DC_PathToken_3=products:catalogId=11501:lang
Id=-1:partNumber=LR-FNTR-0002:storeId=11:UTF-8:requestType=GET"
"{cache-demoqalive-baseCache}-data-/search/resources/
org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_Path
Token_1=api:DC_PathToken_2=v2:DC_PathToken_3=categories:storeId=11:langId=
-1:contractId=-11005:depthAndLimit=11,11:UTF-8:requestType=GET"
..

Cache keys (-data) are stored as HASH objects and contain the cached value along with
metadata.

The HGETALL command can be used to retrieve the contents of a cache entry. For example:
127.0.0.1:6379> HGETALL
"{cache-demoqalive-baseCache}-data-/search/resources/org.springframework.w
eb.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_1=api:DC_PathTo
ken_2=v2:DC_PathToken_3=categories:storeId=11:langId=-1:contractId=-
11005:id=10501,10516:UTF-8:requestType=GET"
1) "created-by"
2) "demoqalivequery-app-d7ff6c649-ch78j"
3) "created-at"
4) "1636061491787"
5) "dependencies"
6) "WCT+ESINDEX;;;WCT+FULL_ESINDEX;;;"
7) "value"
8) "\x04\x04\t>0com.ibm.ws.cache.servlet.FragmentComposerMemento\x00\x00\
x00\x00P\x00 \x02\x00\x00\x00\n>\x0eattributeBytes\x16\x00>\nattributes\
x16\x00>\x11characterEncoding\x16\x00>\x13consumeSubfragments \x00>\
x0bcontainsESI \x00>\x0bcontentType\x16\x00>\bcontents\x16\x00>\
x15externalCacheFragment\x16\x00>\x14externalCacheGroupId\x16\x00>\
x0boutputStyle#\x00\x16\x01B\x01\
t>4com.ibm.ws.cache.servlet.CacheProxyRequest$Attribute\xdb\x97\x83\x0cL\
xc2!\x1d\x00\x00\x00\x02>\x03key\x16\x00>\x05value\x16\x00\x16\x04;\xff>\
x15REST_REQUEST_RESOURCE>\x15/api/v2/categories?id\x01\x00\x00\x01B\x03\
x16\x04\t>0com.ibm.ws.cache.servlet.DefaultStatusSideEffect\xe2\xe3\xc1\
xc89\x19\x01y\x00\x00\x00\x01>\nstatusCode#\x00\x16\x00\x00\x00\xc8\x04\
t>)com.ibm.ws.cache.servlet.HeaderSideEffect\x8a\xc4#[9\xfb\xfc=\x00\x00\
x00\x03>\x04name\x16\x00>\x03set \x009\xf4\x16\x00\x16>\x0cContent-Type\
x00>\x10application/jsonC\a\xaf!{\"contents\":
[{\"name\":\"Kitchen\",\"identifier\":\"Kitchen\",\"shortDescription\":\"C
reate a kitchen that suits your needs and fits your
lifestyle\",\"resourceId\":\"https://www.demoqalive.andres.svt.hcl.com/
search/resources/api/v2/categories?
storeId=11&id=10516&id=10501&contractId=-11005&langId=-
1\",\"uniqueID\":\"10516\",\"parentCatalogGroupID\":\"/
10516\",\"thumbnail\":\"/hclstore/EmeraldCAS/images/catalog/kitchen/
category/dep_kitchen.jpg\",\"seo\":{\"href\":\"/
kitchen\"},\"storeID\":\"11\",\"sequence\":\"5.0\",\"fullImage\":\"/
hclstore/EmeraldCAS/images/catalog/kitchen/category/
dep_kitchen.jpg\",\"id\":\"10516\",\"links\":{\"parent\":{\"href\":\"/
search/resources/api/v2/categories?storeId=11&id=-1\"},\"children\":
[\"href: /search/resources/api/v2/categories?storeId=11&id=10518\",\"href:
/search/resources/api/v2/categories?storeId=11&id=10517\"],\"self\":
{\"href\":\"/search/resources/api/v2/categories?
storeId=11&id=10516\"}},\"description\":\"Create a kitchen that suits your
needs and fits your lifestyle\"},{\"name\":\"Living
Room\",\"identifier\":\"LivingRoom\",\"shortDescription\":\"Bring your
living space together with comfort and
style\",\"resourceId\":\"https://www.demoqalive.andres.svt.hcl.com/
search/resources/api/v2/categories?
storeId=11&id=10516&id=10501&contractId=-11005&langId=-
1\",\"uniqueID\":\"10501\",\"parentCatalogGroupID\":\"/
10501\",\"thumbnail\":\"/hclstore/EmeraldCAS/images/catalog/livingroom/
category/dep_livingroom.jpg\",\"seo\":{\"href\":\"/living-
room\"},\"storeID\":\"11\",\"sequence\":\"1.0\",\"fullImage\":\"/
hclstore/EmeraldCAS/images/catalog/livingroom/category/
dep_livingroom.jpg\",\"id\":\"10501\",\"links\":{\"parent\":{\"href\":\"/
search/resources/api/v2/categories?storeId=11&id=-1\"},\"children\":
[\"href: /search/resources/api/v2/categories?storeId=11&id=10503\",\"href:
/search/resources/api/v2/categories?storeId=11&id=10502\",\"href:
/search/resources/api/v2/categories?storeId=11&id=10504\"],\"self\":
{\"href\":\"/search/resources/api/v2/categories?
storeId=11&id=10501\"}},\"description\":\"Bring your living space together
with comfort and style\"}]}\x01\x01\x00\x00\x00\x01"
9) "expiry-at"
10) "1636075891787"
The TTL command shows the time-to-live remaining for an entry. When the time expires,
the entry is deleted by Redis.
127.0.0.1:6379> TTL
"{cache-demoqalive-baseCache}-data-/search/resources/org.springframework.w
eb.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_1=api:DC_PathTo
ken_2=v2:DC_PathToken_3=categories:storeId=11:langId=-1:contractId=-
11005:id=10501,10516:UTF-8:requestType=GET"
(integer) 13609
Dependency ID information

Dependency information is stored in sets that link to cache-ids. Redis has multiple
commands to operate on sets. SCARD shows the size of the set (number of cache IDs
linked to the dependency ID).
127.0.0.1:6379> SCARD "{cache-demoqalive-baseCache}-dep-WCT+ESINDEX"
(integer) 9

SMEMBERS lists all the cache IDs for a dependency. This command should only be used for small dependency IDs. For dependency IDs that can link to a large number of cache IDs, SSCAN should be used instead (see the sketch after the listing below).

127.0.0.1:6379> SMEMBERS "{cache-demoqalive-baseCache}-dep-WCT+ESINDEX"


1)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=categories
:storeId=11:langId=-1:contractId=-11005:depthAndLimit=11,11:UTF-
8:requestType=GET"
2)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=urls:store
Id=11:identifier=tables:langId=-1:UTF-8:requestType=GET"
3)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=urls:store
Id=11:identifier=home:langId=-1:UTF-8:requestType=GET"
4)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=urls:store
Id=11:identifier=sleepy-head-elegant-queen-bed-bd-beds-0002:langId=-1:UTF-
8:requestType=GET"
5)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=products:c
ontractId=-
11005:id=14033,14057,14082,14100,14111,14156,14178,14220:langId=-
1:storeId=11:UTF-8:requestType=GET"
6)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=store:DC_PathToken_2=11:DC_PathToken_3=sitecont
ent:DC_PathToken_4=suggestions:catalogId=11501:langId=-1:contractId=-
11005:suggestType=Category:UTF-8:requestType=GET"
7)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=categories
:storeId=11:langId=-1:contractId=-11005:id=10501,10516:UTF-
8:requestType=GET"
8)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=urls:store
Id=11:identifier=beds:langId=-1:UTF-8:requestType=GET"
9)
"/search/resources/org.springframework.web.servlet.DispatcherServlet.class
:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=urls:store
Id=11:identifier=luncheon-table-dr-tbls-0001:langId=-1:UTF-
8:requestType=GET"
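As noted above, SSCAN retrieves the members of a large set incrementally with a cursor. Illustrative usage (a returned cursor of "0" means the iteration is complete; the output below is abbreviated):

127.0.0.1:6379> SSCAN "{cache-demoqalive-baseCache}-dep-WCT+ESINDEX" 0 COUNT 100
1) "0"
2) 1) "/search/resources/org.springframework.web.servlet.DispatcherServlet.class:..."
   2) "..."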

Dependency IDs do not set an expiration, so under the volatile-lru memory management policy they cannot be evicted. This is by design: evicting dependency metadata would result in missed invalidations.

The HCL Cache also maintains other objects, such as the cache registry ({cache_registry-namespace}) and {cache-namespace-cachename}-maintenance, which contain information used for maintenance.

Deletions

Manually deleting cache objects is not recommended, as it can create inconsistencies. The Cache APIs ensure that the metadata and dependencies are correctly updated after an operation. The FLUSHALL ASYNC command is often used to clear a Redis database. This clears the Redis cache, but it does not issue the necessary PUBSUB messages for the local caches. Use the Cache Manager app to issue invalidations and clears.

Invalidations

The HCL Cache relies on Redis PUBSUB to distribute invalidation messages to the containers to clear local caches. Each cache defines its own topic for invalidations. The topic uses the following convention: {cache-<namespace>-<cachename>}-invalidation.

PUBSUB command

The PUBSUB CHANNELS command lists all the topics with an active subscriber.
I have no name!@redis-master-0:/$ redis-cli PUBSUB CHANNELS
1) "{cache-demoqaauth-services/cache/WCSearchFacetDistributedMapCache}-
invalidation"
5) "{cache-demoqalive-services/cache/WCSearchSTADistributedMapCache}-
invalidation"
6) "{cache-demoqalive-services/cache/SearchContractDistributedMapCache}-
invalidation"
7) "{cache-demoqalive-services/cache/WCCatalogGroupDistributedMapCache}-
invalidation"
8) "{cache-demoqaauth-services/cache/WCCatalogGroupDistributedMapCache}-
invalidation"
9) ...
SUBSCRIBE command

The SUBSCRIBE and PSUBSCRIBE commands start a topic listener and can be used to
monitor invalidations.

The following example uses PSUBSCRIBE to subscribe to all live caches. The --csv option makes the output more readable.
I have no name!@redis-master-0:/$ redis-cli --csv PSUBSCRIBE "{cache-
demoqalive-*}-invalidation"
Reading messages... (press Ctrl-C to quit)
"psubscribe","{cache-demoqalive-*}-invalidation",1
"pmessage","{cache-demoqalive-*}-invalidation","{cache-demoqalive-
baseCache}-invalidation","[p:demoqalivecache-app-8597fc98cc-dm2rz]> inv-
cache-dep:product:10001"
"pmessage","{cache-demoqalive-*}-invalidation","{cache-demoqalive-
baseCache}-invalidation","[p:demoqalivecache-app-8597fc98cc-dm2rz]> inv-
cache-dep:product:10002"
PUBLISH command

The PUBLISH command is the counterpart of the SUBSCRIBE command. Although the
command is available and could be used to publish invalidation messages, the supported
and recommended way to test invalidations is by using the Cache Manager REST
services. This ensures the message format is preserved. For testing or learning
purposes, you can publish invalidations as follows:
PUBLISH "{cache-demoqaauth-baseCache}-invalidation" product:10002
PUBLISH "{cache-demoqaauth-baseCache1}-invalidation" inv-cache-clear

The Cache APIs add optional metadata that helps identify the source of the invalidations, the time at which they were created, and the intended consumers.

HCL Cache with Redis clustering

The HCL Cache supports Redis clusters. Through the use of hash tags ({...}), all the objects that belong to the same cache are assigned the same slot. Slots are assigned to unique master servers.

The CLUSTER KEYSLOT command can be used to retrieve the slot for a cache. The brackets ({..}) contain both the namespace and the cache name:

127.0.0.1:6379> CLUSTER KEYSLOT {cache-demoqalive-baseCache}-data-key1
(integer) 2773
127.0.0.1:6379> CLUSTER KEYSLOT {cache-demoqalive-baseCache}-data-key2
(integer) 2773
127.0.0.1:6379> CLUSTER KEYSLOT {cache-demoqalive-customCache}-data-key1
(integer) 13467

The CLUSTER command has a number of options to retrieve information and configure the cluster. The CLUSTER NODES command, for example, lists node details and their assigned slots. From this list you can see that the baseCache (slot 2773) lands on node 10.0.0.3, while the customCache (slot 13467) lands on node 10.0.0.2.

127.0.0.1:6379> CLUSTER NODES
04c1fb6be4b726ab9e628a8839caa0dc641988ac 10.0.0.1:6379@16379 master - 0 1636119372578 2 connected 5461-10922
1af0ba43e61824f6c1c16cc0ee66cd3aa58f792b 10.0.0.2:6379@16379 master - 0 1636119373581 3 connected 10923-16383
54a3ebe113c84ea638d3df7706dbdb8ad2ca5836 10.0.0.3:6379@16379 myself,master - 0 1636119373000 1 connected 0-5460

You can also use the kubectl get pods -n redis -o wide command to map IPs to pods:

kubectl get pods -n redis -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP         NODE
hcl-commerce-redis-redis-cluster-0   2/2     Running   0          20m   10.0.0.3   gke-perf-cluster-...
hcl-commerce-redis-redis-cluster-1   2/2     Running   0          20m   10.0.0.1   gke-perf-cluster-...
hcl-commerce-redis-redis-cluster-2   2/2     Running   0          20m   10.0.0.2   gke-perf-cluster-...

The Cache Manager /cm/cache/redisNodeInfo API also displays cache slot and node details.

The HCL Cache can be scaled by adding additional nodes. Redis also allows for slot migration, which means you can select which caches go to which servers. See the Cluster Tutorial and Cluster Specification for more details.

Use of Redis replicas


With Redis cluster, master nodes can be backed by replicas (one or many). Replicas are used for failover and scalability.

Although other topologies support the use of replicas, this document is written with clusters in mind.

Failover scenarios

When the Redis cluster detects a master node is down, it initiates failover to one of the master's
replicas. Replicas use the replication process to mirror updates from the master node as they
happen.

In Kubernetes, when the previously crashed pod recovers and re-joins the cluster, it switches to a replica role for the master currently serving the slots (which used to be its replica).
If replicas are not used, Kubernetes still detects (using probes) and restarts unresponsive pods. The slots served by the impacted master will be temporarily unavailable. Depending on the duration of the outage, HCL Cache Circuit Breakers will activate. It might take a couple of minutes for the Redis node to be available again. This time is extended when persistence is used, as Redis needs to reload the cache upon start, and the service is unavailable until the cache is done loading.

Scalability scenarios

Besides their role for failover, replicas can increase scalability by handling GET/ read operations.
This frees resources on the master node and enables more efficient use of resources. The HCL
Cache Redis client can be configured to direct read operations to replicas using
the readMode configuration.

When replicas are used for read operations, the following consideration must be made:

 The replication process introduces a delay. If a read operation happens immediately after
a write, the read might return stale data, or no data. This could introduce functional
issues for certain caches and customizations scenarios. The HCL Cache includes a
number of configurations that control whether reads are directed to masters or replicas,
and wait times for replications to complete.

If replicas are used for reads, both master and replica servers must be available for optimal performance: an unavailable replica can lead to WAIT command timeouts during PUT operations (syncReplicas, see below) and failed read (GET) operations executed on the replicas. When the recovered master is restarted, it reconfigures itself as a replica and starts a new synchronization process. If a full synchronization is required, the replica server might be unavailable for some time while the database is replicated. The system might take longer to recover when read operations are offloaded to replicas.

Configurations

Replicas and the HCL Cache might require configuration changes in Redis, the Redis client or
the HCL Cache:

Redis configurations

 cluster-replica-validity-factor: xxxxxxx

 repl-diskless-sync

 client-output-buffer-limit

Redis client configurations

The HCL Cache can be configured to issue read (GET) operations to replica servers with
the readMode setting.
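In the Redisson YAML (redis_cfg.yaml), readMode is set under the topology section. A minimal sketch for a cluster, assuming hypothetical node addresses:

clusterServersConfig:
  nodeAddresses:
    - "redis://redis-cluster-0.redis:6379"
  readMode: "SLAVE"  # MASTER, SLAVE, or MASTER_SLAVE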
HCL Cache configurations
The HCL Cache includes a number of advanced cache-level configurations to control the behaviour of PUT operations when replicas are used. These settings are more relevant when readMode: SLAVE is used.

cacheConfigs:
  cacheName:
    remoteConfig:
      forceReadFromMaster: [TRUE|false]
      syncReplicas: [NULL| <number_of_replicas> OR all : timeout_ms]
      limitSyncReplicasToNumberAvailable: [TRUE|false]
forceReadFromMaster
When readMode is set to SLAVE or MASTER_SLAVE, the forceReadFromMaster configuration ensures that reads (GET) for this cache are always sent to the master server, which is useful for caches that cannot tolerate the replication delay.
syncReplicas
This configuration is disabled by default. If enabled, the HCL Cache invokes the WAIT command after a PUT operation. The WAIT command introduces a delay until the configured number of replicas has processed the change, or the timeout is reached. Instead of specifying a fixed number of replicas, it is possible to use all, which translates to the number of replicas currently known by the Redis client. For example, with the following configuration, the HCL Cache waits until the change is replicated to all the available replicas, waiting up to 250 milliseconds:
syncReplicas: all:250
limitSyncReplicasToNumberAvailable
When syncReplicas is enabled with a number of replicas, the limitSyncReplicasToNumberAvailable configuration can be used to restrict the configured number to the number of replicas currently known by the Redis client.
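For example, a hypothetical cache that waits up to 100 milliseconds for one replica, capped at the number of replicas actually available, might be configured as follows (the cache name and values are illustrative assumptions):

cacheConfigs:
  baseCache:
    remoteConfig:
      syncReplicas: 1:100
      limitSyncReplicasToNumberAvailable: true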

Redis benchmark
Using the redis-benchmark utility on a Redis server

Redis includes the redis-benchmark utility, which simulates N clients simultaneously sending a total of M queries. The utility provides a default set of tests, or you can supply a custom set of tests.

The following options are supported:

Usage: redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]

-h <hostname> Server hostname (default 127.0.0.1)

-p <port> Server port (default 6379)

-s <socket> Server socket (overrides host and port)

-a <password> Password for Redis Auth


-c <clients> Number of parallel connections (default 50)

-n <requests> Total number of requests (default 100000)

-d <size> Data size of SET/GET value in bytes (default 3)

--dbnum <db> SELECT the specified db number (default 0)

-k <boolean> 1=keep alive 0=reconnect (default 1)

-r <keyspacelen> Use random keys for SET/GET/INCR, random values for SADD

Using this option the benchmark will expand the string __rand_int__

inside an argument with a 12 digits number in the specified range

from 0 to keyspacelen-1. The substitution changes every time a command

is executed. Default tests use this to hit random keys in the

specified range.

-P <numreq> Pipeline <numreq> requests. Default 1 (no pipeline).

-q Quiet. Just show query/sec values

--csv Output in CSV format

-l Loop. Run the tests forever

-t <tests> Only run the comma separated list of tests. The test

names are the same as the ones produced as output.

-I Idle mode. Just open N idle connections and wait.

You need to have a running Redis instance before launching the benchmark. You can
run the benchmarking utility like so:

redis-benchmark -q -n 100000
Running only a subset of the tests
You don't need to run all the default tests every time you execute redis-benchmark. For
example, to select only a subset of tests, use the -t option as in the following example:

$ redis-benchmark -t set,lpush -n 100000 -q

SET: 74239.05 requests per second

LPUSH: 79239.30 requests per second

This example runs the tests for the SET and LPUSH commands and uses quiet mode
(see the -q switch).

You can even benchmark a specific command:

$ redis-benchmark -n 100000 -q script load "redis.call('set','foo','bar')"

script load redis.call('set','foo','bar'): 69881.20 requests per second

Selecting the size of the key space


By default, the benchmark runs against a single key. In Redis the difference between such a synthetic benchmark and a real one is not huge, since Redis is an in-memory system; however, it is possible to stress cache misses and, in general, to simulate a more real-world workload by using a large key space.

This is obtained by using the -r switch. For instance, to run one million SET operations using a random key for every operation out of 100k possible keys, use the following command line:

$ redis-cli flushall

OK

$ redis-benchmark -t set -r 100000 -n 1000000

====== SET ======

1000000 requests completed in 13.86 seconds

50 parallel clients
3 bytes payload

keep alive: 1

99.76% <= 1 milliseconds

99.98% <= 2 milliseconds

100.00% <= 3 milliseconds

72144.87 requests per second

$ redis-cli dbsize

(integer) 99993

Using pipelining
By default every client (the benchmark simulates 50 clients if not otherwise specified with -c) sends the next command only when the reply of the previous command is received. This means that the server will likely need a read call in order to read each command from every client, and the network round-trip time is paid for every command.

Redis supports pipelining, so it is possible to send multiple commands at once, a feature often exploited by real-world applications. Redis pipelining is able to dramatically improve the number of operations per second a server is able to deliver.

Consider this example of running the benchmark using a pipelining of 16 commands:

$ redis-benchmark -n 1000000 -t set,get -P 16 -q

SET: 403063.28 requests per second

GET: 508388.41 requests per second

Using pipelining results in a significant increase in performance.


Pitfalls and misconceptions
The first point is obvious: the golden rule of a useful benchmark is to only compare apples to apples. You can compare different versions of Redis on the same workload, or the same version of Redis with different options. If you plan to compare Redis to something else, then it is important to evaluate the functional and technical differences, and take them into account.

 Redis is a server: all commands involve network or IPC round trips. It is meaningless to
compare it to embedded data stores, because the cost of most operations is primarily
in network/protocol management.
 Redis commands return an acknowledgment for all usual commands. Some other data
stores do not. Comparing Redis to stores involving one-way queries is only mildly
useful.
 Naively iterating on synchronous Redis commands does not benchmark Redis itself, but
rather measures your network (or IPC) latency and the client library's intrinsic latency. To
really test Redis, you need multiple connections (like redis-benchmark) and/or to use
pipelining to aggregate several commands and/or multiple threads or processes.
 Redis is an in-memory data store with some optional persistence options. If you plan to
compare it to transactional servers (MySQL, PostgreSQL, etc ...), then you should
consider activating AOF and decide on a suitable fsync policy.
 Redis is, mostly, a single-threaded server from the POV of commands execution
(actually modern versions of Redis use threads for different things). It is not designed to
benefit from multiple CPU cores. People are supposed to launch several Redis instances
to scale out on several cores if needed. It is not really fair to compare one single Redis
instance to a multi-threaded data store.

The redis-benchmark program is a quick and useful way to get some figures and evaluate the performance of a Redis instance on given hardware. However, by default, it does not represent the maximum throughput a Redis instance can sustain. Actually, by using pipelining and a fast client (hiredis), it is fairly easy to write a program generating more throughput than redis-benchmark. The default behavior of redis-benchmark is to achieve throughput by exploiting concurrency only (i.e. it creates several connections to the server). It does not use pipelining or any parallelism at all (one pending query per connection at most, and no multi-threading), unless explicitly enabled via the -P parameter. So in some way, using redis-benchmark while, for example, triggering a BGSAVE operation in the background at the same time will provide the user with numbers closer to the worst case than to the best case.

To run a benchmark using pipelining mode (and achieve higher throughput), you need to explicitly use the -P option. Note that this is still realistic behavior, since a lot of Redis-based applications actively use pipelining to improve performance. However, you should use a pipeline size that is more or less the average pipeline length you'll be able to use in your application, in order to get realistic numbers.
The benchmark should apply the same operations, and work in the same way with the
multiple data stores you want to compare. It is absolutely pointless to compare the
result of redis-benchmark to the result of another benchmark program and
extrapolate.

For instance, Redis and memcached in single-threaded mode can be compared on GET/SET operations. Both are in-memory data stores, working mostly in the same way at the protocol level. Provided their respective benchmark applications aggregate queries in the same way (pipelining) and use a similar number of connections, the comparison is actually meaningful.

When you are benchmarking a high-performance, in-memory database like Redis, it may be difficult to saturate the server. Sometimes the performance bottleneck is on the client side, not the server side. In that case, the client (that is, the benchmarking program itself) must be fixed, or perhaps scaled out, to reach the maximum throughput.

Factors impacting Redis performance

There are multiple factors that have direct consequences on Redis performance. We mention them here, since they can alter the result of any benchmark. Note, however, that a typical Redis instance running on a low-end, untuned box usually provides good enough performance for most applications.

Network bandwidth and latency usually have a direct impact on performance. It is a good practice to use the ping program to quickly check that the latency between the client and server hosts is normal before launching the benchmark. Regarding bandwidth, it is generally useful to estimate the throughput in Gbit/s and compare it to the theoretical bandwidth of the network. For instance, a benchmark setting 4 KB strings in Redis at 100000 q/s would actually consume 3.2 Gbit/s of bandwidth (4 KB x 100000 q/s = 400 MB/s, or about 3.2 Gbit/s) and probably fit within a 10 Gbit/s link, but not a 1 Gbit/s one. In many real-world scenarios, Redis throughput is limited by the network well before being limited by the CPU. To consolidate several high-throughput Redis instances on a single server, it is worth considering a 10 Gbit/s NIC or multiple 1 Gbit/s NICs with TCP/IP bonding.
 CPU is another very important factor. Being single-threaded, Redis favors fast
CPUs with large caches and not many cores. At this game, Intel CPUs are
currently the winners. It is not uncommon to get only half the performance on
an AMD Opteron CPU compared to similar Nehalem EP/Westmere EP/Sandy
Bridge Intel CPUs with Redis. When client and server run on the same box, the
CPU is the limiting factor with redis-benchmark.
 Speed of RAM and memory bandwidth seem less critical for global
performance especially for small objects. For large objects (>10 KB), it may
become noticeable though. Usually, it is not really cost-effective to buy
expensive fast memory modules to optimize Redis.
Redis runs slower on a VM compared to running without virtualization on the same hardware. If you have the chance to run Redis on a physical machine, this is preferred. However, this does not mean that Redis is slow in virtualized environments; the delivered performance is still very good, and most of the serious performance issues you may incur in virtualized environments are due to over-provisioning, non-local disks with high latency, or old hypervisor software with slow fork syscall implementations.
 When the server and client benchmark programs run on the same box, both the
TCP/IP loopback and unix domain sockets can be used. Depending on the
platform, unix domain sockets can achieve around 50% more throughput than
the TCP/IP loopback (on Linux for instance). The default behavior of redis-
benchmark is to use the TCP/IP loopback.
 The performance benefit of unix domain sockets compared to TCP/IP loopback
tends to decrease when pipelining is heavily used (i.e. long pipelines).
 When an ethernet network is used to access Redis, aggregating commands
using pipelining is especially efficient when the size of the data is kept under
the ethernet packet size (about 1500 bytes). Actually, processing 10 bytes, 100
bytes, or 1000 bytes queries results in almost the same throughput.
 On multi CPU sockets servers, Redis performance becomes dependent on the
NUMA configuration and process location. The most visible effect is that redis-
benchmark results seem non-deterministic because client and server processes
are distributed randomly on the cores. To get deterministic results, it is required
to use process placement tools (on Linux: taskset or numactl). The most efficient
combination is always to put the client and server on two different cores of the
same CPU to benefit from the L3 cache. Results of a 4 KB SET benchmark for three server CPUs (AMD Istanbul, Intel Nehalem EX, and Intel Westmere) with different relative placements illustrate this effect; note that such a benchmark is not meant to compare CPU models between themselves (the CPUs' exact models and frequencies are therefore not disclosed).

With high-end configurations, the number of client connections is also an important factor. Being based on epoll/kqueue, the Redis event loop is quite scalable. Redis has already been benchmarked at more than 60000 connections, and was still able to sustain 50000 q/s in these conditions. As a rule of thumb, an instance with 30000 connections can only process half the throughput achievable with 100 connections.

With high-end configurations, it is possible to achieve higher throughput by tuning the NIC(s) configuration and associated interrupts. The best throughput is achieved by setting an affinity between Rx/Tx NIC queues and CPU cores, and by activating RPS (Receive Packet Steering) support. Jumbo frames may also provide a performance boost when large objects are used.
 Depending on the platform, Redis can be compiled against different memory
allocators (libc malloc, jemalloc, tcmalloc), which may have different behaviors
in terms of raw speed, internal and external fragmentation. If you did not
compile Redis yourself, you can use the INFO command to check
the mem_allocator field. Please note most benchmarks do not run long enough to
generate significant external fragmentation (contrary to production Redis
instances).

Other things to consider


One important goal of any benchmark is to get reproducible results, so that they can be compared to the results of other tests.

A good practice is to try to run tests on isolated hardware as much as possible. If that is not possible, then the system must be monitored to check that the benchmark is not impacted by external activity.
 Some configurations (desktops and laptops for sure, some servers as well) have a
variable CPU core frequency mechanism. The policy controlling this mechanism can be
set at the OS level. Some CPU models are more aggressive than others at adapting the
frequency of the CPU cores to the workload. To get reproducible results, it is better to
set the highest possible fixed frequency for all the CPU cores involved in the
benchmark.
An important point is to size the system according to the benchmark. The system
must have enough RAM and must not swap. On Linux, do not forget to set
the overcommit_memory parameter correctly. Please note 32 and 64 bit Redis instances
do not have the same memory footprint.
 If you plan to use RDB or AOF for your benchmark, please check there is no other I/O
activity in the system. Avoid putting RDB or AOF files on NAS or NFS shares, or on any
other devices impacting your network bandwidth and/or latency (for instance, EBS on
Amazon EC2).
 Set Redis logging level (loglevel parameter) to warning or notice. Avoid putting the
generated log file on a remote filesystem.
 Avoid using monitoring tools which can alter the result of the benchmark. For instance
using INFO at regular intervals to gather statistics is probably fine, but MONITOR will
impact the measured performance significantly.

Other Redis benchmarking tools


There are several third-party tools that can be used for benchmarking Redis. Refer to each tool's documentation for more information about its goals and capabilities.

memtier_benchmark from Redis Ltd. is a NoSQL Redis and Memcache traffic generation and benchmarking tool.
rpc-perf from Twitter is a tool for benchmarking RPC services that supports Redis and Memcache.
YCSB from Yahoo is a benchmarking framework with clients for many databases, including Redis.

Caching and Redis client configurations for HCL Cache

The HCL Cache is configured via a set of files that define the configurations for each cache, and the Redis connection information.

File                                  Location     Usage
/SETUP/hcl-cache/redis_cfg.yaml       Config Map   Redis client connection information.
/SETUP/hcl-cache/cache_cfg.yaml       Container    Provided by default and present on all containers that
                                                   use HCL Cache. It contains default and preset
                                                   configurations and must not be modified.
/SETUP/hcl-cache/cache_cfg-ext.yaml   Config Map   You can create this file to extend and override the
                                                   configuration in cache_cfg.yaml. This file has the same
                                                   format as cache_cfg.yaml.

Redis client configuration

The Redis client connection information is stored in redis_cfg.yaml. This file contains details about the topology (standalone, cluster, etc.), the connection information (hostname, TLS, authentication), pools, and timeouts. See Redis Client Configuration for details.

Cache configurations
Caches are configured in two main files: cache_cfg.yaml exists on all containers and defines default values and specific values for the default caches; cache_cfg-ext.yaml can be used to update defaults or set new configurations for custom caches. See: Cache Configuration.

Configurations in Kubernetes

For the customizable files cache_cfg-ext.yaml and redis_cfg.yaml, the HCL Cache uses a technique whereby the contents of the files are stored in Kubernetes Configuration Maps (configmaps). This allows updates to be made without having to create custom images for each environment, as, for example, the Redis hostname might differ from environment to environment. Pods are configured to load the config map during initialization and make its contents available as regular files.

Configurations in Helm

The configmaps are originally created during installation of the Helm chart. The original
values are defined in the HCL Cache section of values.yaml. These values can be
updated as required.
hclCache:
  configMap:
    cache_cfg_ext: |-
      redis:
        enabled: true
        yamlConfig: "/SETUP/hcl-cache/redis_cfg.yaml" # Please leave this line untouched
      cacheConfigs:
        ...
    redis_cfg: |-
      singleServerConfig:
        ...

To update the configuration, the recommended approach is to update the chart and perform a Helm upgrade. If the configmap is updated directly, pods must be manually restarted to load the updated values.

Note: Changes to one release (auth/share/live) might need to be replicated to the others.
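A hypothetical upgrade might look as follows; the release name, chart reference, and values file are assumptions, so use the names from your own installation:

helm upgrade demo-qa-auth <chart> -n commerce -f my-values.yaml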

Validating the current client configuration

After installation, you can inspect the contents of the configmap as follows:

kubectl get configmap -n commerce
NAME                                    DATA   AGE
demo-qa-auth-demoqa-hcl-cache-config    2      15d
demo-qa-live-demoqa-hcl-cache-config    2      15d
demo-qa-share-demoqa-hcl-cache-config   2      15d

kubectl describe configmap -n commerce demo-qa-auth-demoqa-hcl-cache-config
kubectl edit configmap -n commerce demo-qa-auth-demoqa-hcl-cache-config

 HCL Commerce - Redis client configurations
HCL Commerce uses the Redisson client to communicate with
the Redis server. Redisson is configured using a YAML
configuration file, which is defined in the HCL Commerce Helm
chart and stored in a Kubernetes configuration map.
 HCL Cache configurations
The HCL Cache provides a set of configuration files in YAML
format. The configuration can be updated for tuning purposes,
or to provide custom caches with non-default configurations.
See Configurations in Helm for details on how the configuration
is updated.

Custom caching
HCL Cache extends the capabilities of DynaCache and introduces remote caching. Therefore,
additional configuration options are available for custom caches. Custom caches can be
configured using the Cache Configuration Yaml file for extensions.

Custom caches are declared in the WebSphere configuration and accessed with
the DistributedMap interface. Migrated custom caching code does not require modification to use
the HCL Cache.

The size of a cache is used as the starting point for local caching. Disk offload is not available; remote caching is recommended instead. See Local and Remote Caching for details.

Registering Custom Caches in the WebSphere Configuration


Transaction server container

When custom caches are added with the Transaction server run-engine command, they are by default automatically mapped to the HCL Cache cache provider.
Liberty containers
Custom caches defined in the configDropins/overrides directory must explicitly specify
the HCL Cache cacheProviderName as in the example below:
<?xml version="1.0" encoding="UTF-8"?>
<server>
<distributedMap id="services/cache/CustomCache1"
memorySizeInEntries="2000" memorySizeInMB="100" cacheProviderName="hcl-
cache"/>
</server>

Configuring HCL Cache Options


Custom caches can be configured with non-default options as described in the Cache Configurations document. For example, a custom cache named services/cache/MyCustomCache can be configured to use remote-only caching by adding a new configuration in cache_cfg-ext.yaml under the cacheConfigs element:
cacheConfigs:
...
services/cache/MyCustomCache:
remoteCache:
enabled: true
localCache:
enabled: false
Using the HCL Cache for in-memory data storage

The HCL Cache is traditionally used for caching scenarios, where entries that are not found in the cache can be regenerated by the application. With the incorporation of the remote cache, which allows for large amounts of data storage, the HCL Cache can also be used as a temporary in-memory database. In its default configuration, the HCL Cache implements maintenance processes that remove cache entries when needed to avoid out-of-memory conditions. This could lead to the loss of cache entries.
If the objects stored in the cache cannot be regenerated, the Low Memory
Maintenance process for the specific cache must be disabled to avoid data loss:
services/cache/MyCustomCache:
remoteCache:
onlineLowMemoryMaintenance:
enabled: false

Low Memory Maintenance can continue to work on other caches. If the caches that disable Low Memory Maintenance require a significant amount of memory, the memory made available (maxmemory) might need to be retuned. The Redis persistence options might also need to be updated to a more durable configuration (e.g. enabling both AOF and RDB).

Accessing a Cache with the DistributedMap Interface


As in previous versions, custom caches can be accessed using the IBM WebSphere DynaCache DistributedMap interface, as in the following example:

import javax.naming.InitialContext;
import com.ibm.websphere.cache.DistributedMap;

// Obtain cache reference using JNDI name
InitialContext ctx = new InitialContext();
DistributedMap myCustomCache = (DistributedMap) ctx.lookup("services/cache/MyCustomCache");

// Insert into the cache (myCacheEntryObject is the object to cache, declared elsewhere)
myCustomCache.put("cacheId", myCacheEntryObject);

final int priority = 1;
// The time in seconds that the cache entry should remain in the cache.
// The default value is -1 and means the entry does not time out.
final int timeToLive = 1800;
// The time in seconds that the cache entry should remain in the local cache if not accessed.
final int inactivityTime = 900;
// Not supported
final int sharingPolicy = 0;
final String[] dependencyIds = new String[] {"dependencyId1", "dependencyId2"};

myCustomCache.put("cacheId", myCacheEntryObject, priority, timeToLive, inactivityTime, sharingPolicy, dependencyIds);

// Read an object from cache
Object cachedObject = myCustomCache.get("cacheKey");

// Invalidate by dependency id
myCustomCache.invalidate("dependencyId1");

HCL Cache Manager


The HCL Cache Manager provides a set of REST interfaces to interact with the cache, additional
monitoring metrics, and a set of utilities.
REST interfaces

In addition to APIs to clear and invalidate cached data, the Cache Manager includes APIs that
can be used to retrieve cache entry and dependency details for debugging information.

Issuing a cache clear on baseCache:

curl -X 'DELETE' 'https://cache.demoqalive.hcl.com/cm/cache/clear?

cache=baseCache'

Table 1. Redis information REST API:

Method   Path                   Description
GET      /cache/health-check    Service health check.
GET      /cache/redisNodeInfo   Redis topology information.

Table 2. Invalidate and clear REST API: APIs to clear and invalidate caches:

Method   Path                         Description
DELETE   /cache/invalidate            Invalidates by dependency ID.
DELETE   /cache/clear                 Clears the specified caches.
DELETE   /cache/clearall              Clears all registered caches.
DELETE   /cache/clearRegistry         Clears the cache registry.
DELETE   /cache/publishInvalidation   Issues an invalidation ID (PUBSUB) to local caches but does
                                      not clear the remote cache (for debugging purposes).

Table 3. Cache information REST API: The following APIs are used for monitoring or debugging:

Method   Path                     Description
GET      /cache                   Returns a list of all the registered caches and their current sizes.
GET      /cache/size              Returns the remote size for a cache.
GET      /cache/id/byDependency   Returns a list of cache IDs associated to a dependency ID (for debugging).
GET      /cache/id/byIds          Returns cache entry details for the specified ID (for debugging).
Installing Cache Manager

The Cache Manager pod must be enabled during installation by setting enabled: true in values.yaml.

cacheApp:
  name: cache-app
  enabled: true

For high availability, you might choose to run redundant cache manager pods.

Accessing Cache Manager

Cache Manager can be accessed with port-forwarding or by enabling Ingress. The Swagger
API is available under the path /openapi/ui/#/.

Port forwarding

1. Start port forwarding to the Cache Manager service.
   kubectl port-forward -n commerce service/demoqalivecache-app 40901:40901
2. Access the HCL Cache Manager Swagger API using localhost and the path /openapi/ui/#/.
   https://localhost:40901/openapi/ui/#/

Ingress

Ingress access can optionally be enabled in values.yaml for both authoring and live
environments. The cache manager endpoints do not implement authentication. Only
enable access through ingress definitions that are internal and restricted.

cache:
  auth:
    enabled: true
    domain: cache.{{ $.Values.common.tenant }}{{ $.Values.common.environmentName }}auth{{ $.Values.common.externalDomain }}
  live:
    enabled: true
    domain: cache.{{ $.Values.common.tenant }}{{ $.Values.common.environmentName }}live{{ $.Values.common.externalDomain }}

Monitoring

The HCL Cache Manager makes available additional remote-only metrics, which are used by the HCL Cache - Remote dashboard:

Metric                                        Use
hclcache_cache_size_current{scope="remote"}   Size of the remote cache, in entries.
hclcache_cache_size_maintenance               Number of expired keys pending maintenance.
hclcache_cache_remote_node_mapping            Mapping between HCL Cache caches and Redis nodes.

See Monitoring for details.

Utilities

The Cache Manager pod also makes available a number of cache utilities for benchmarking, debugging, and configuration. They are available under the /SETUP/hcl-cache/utilities/ directory. For more information, see HCL Cache utilities.

 HCL Cache utilities


The HCL Cache manager pod includes the following utilities:

Troubleshooting the HCL Cache


Tools and techniques that can be used for troubleshooting the cache.

Monitoring

Debugging a complex distributed system without the support of metrics and monitoring can be a
challenging task. The Prometheus and Grafana integration gives you visibility into the number
and performance of all cache operations and maintenance processes, which can enable you to
quickly narrow down the problem.

Cache manager
The Cache Manager includes a number of debug APIs to retrieve details about the caches and
cached data. See Cache Manager for details.

Redis database

Redis is a database, and it provides a command interface (redis-cli) and commands that can be used to query it and retrieve information about the existing cache keys and metadata. For details, see HCL Cache in Redis.
Tracing

The following trace string is used to trace the operation of the HCL Cache:

com.hcl.commerce.cache*=all

If enabled at the fine level instead, the HCL Cache creates less verbose output, with timing and invalidation details.
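For example, the less verbose variant described above would presumably be enabled with:

com.hcl.commerce.cache*=fine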

Troubleshooting scenarios

 Near-Real-Time (NRT) Build Index

Troubleshooting Near-Real-Time
(NRT) index building
Some operations, such as updating a product description, trigger a Near-Real-Time (NRT) delta index build with NiFi. HCL Commerce uses invalidation messages to notify the NiFi server of the event. This document will help you troubleshoot aspects of the NRT process.

Troubleshooting steps
Confirm the Transaction Server is sending messages

When the operation is performed, the Transaction Server writes the event to the WCNifiDistributedMapCache channel. Use the SUBSCRIBE command to confirm that the event is being written to the queue:
redis-cli subscribe
"{cache-demoqaauth-services/cache/WCNifiDistributedMapCache}-invalidation"

The namespace will vary from system to system.

If the SUBSCRIBE command does not capture any events, there might be a problem with
the Redis connection from the Transaction Server, or the specific event might not be
configured to do so.

Check if NiFi is listening on the Redis NRT queue

The NiFi server registers PUBSUB listeners with Redis to receive the events.

Use the PUBSUB CHANNELS command on all the master servers to confirm NiFi's
listeners are enabled:
redis-cli pubsub channels | grep -i nifi
{cache-demoqaauth-services/cache/WCNifiDistributedMapCache}-invalidation
{cache-demoqaauth-services/cache/WCNifiBatchDistributedMapCache}-
invalidation

If the listener (WCNifiDistributedMapCache) is not active, NiFi might not be running or


might not be operating correctly.

Ensure that WCNifiDistributedMapCache is configured as follows.

"[${TENANT:-}${ENVIRONMENT:-}auth]:services/cache/WCNifiDistributedMapCache":
  localCache:
    enabled: false
  remoteCache:
    enabled: false
  remoteInvalidations:
    publish: true
    publishFormat: plain
Use Tracing in NiFi to confirm listeners are active

The next step is to use tracing in NiFi to confirm that the NRT listeners are triggering
when events are posted to the channels
(WCNifiDistributedMapCache and WCNifiBatchDistributedMapCache).

Enable the following traces in logback.xml:


<logger name="com.hcl.commerce.cache" level="TRACE" />
<logger name="org.redisson" level="TRACE" />
Confirm the listeners trigger by looking for the onMessage method in the trace:
... onMessage: onMessage(CharSequence pattern, CharSequence channel,
String stringMessage) ..
Also, confirm that WCNifiBatchDistributedMapCache is configured as follows:

"[${TENANT:-}${ENVIRONMENT:-}auth]:services/cache/WCNifiBatchDistributedMapCache":
  localCache:
    enabled: false
  remoteCache:
    enabled: false
  remoteInvalidations:
    publish: true
    publishFormat: plain

HCL Cache Architecture


In previous releases of HCL Commerce, DynaCache in-memory caching was used for caching.
In HCL Commerce 9.1, the HCL Cache extends the functionality of DynaCache and brings
important improvements.

DynaCache offers in-memory caching, disk off-loading, cache replication, and Servlet and JSP
fragment caching. DynaCache also offers a pluggable architecture that enables the use of
different cache providers (such as HCL Cache, Redis and IBM WebSphere eXtreme Scale) while
maintaining access with a consistent set of interfaces such
as DistributedMap and cachespec.xml.

HCL Cache is installed as a DynaCache cache provider, which enables its use through
DynaCache interfaces without code changes. HCL Cache provides the following:

 Multi-tiered caching with local and remote caching configurations.
 Built-in support for replication of local cache invalidations.
 Integration with Prometheus and Grafana for monitoring and alerting.

When a cache (default or custom) that is configured with the HCL Cache provider is accessed,
DynaCache defers the processing to the custom provider. HCL Cache interacts with the local
and remote caches, and replicates invalidations as required according to the configuration.

 Local and remote caching in HCL Cache


HCL Cache extends the capabilities of DynaCache by enabling the use of remote
caching supported by Redis.
 Invalidation support
Local caches require a mechanism for replication of invalidation messages to ensure that
local cache entries associated with an invalidation ID are removed from all containers.
 HCL Cache - circuit breakers
HCL Cache implements circuit breakers for the remote cache to protect the application
from a failing or unavailable Redis server.
 Remote cache maintenance
Maintenance processes for remote caches are specific to HCL Commerce releases.
 Remote cache tuning configurations in HCL Cache
This document describes cache level tuning configurations for remote caches.

Local and remote caching in HCL Cache


HCL Cache extends the capabilities of DynaCache by enabling the use of remote caching
supported by Redis.

Local caching

The behaviour of local caching is similar to that of the traditional DynaCache caches, with some
important differences:

Replication of invalidation messages

The use of local caching requires the replication of cache clear and invalidation
messages to ensure stale or outdated content is removed from caches on other
containers. In HCL Commerce Version 9.0 this was achieved with the use of Kafka.
When HCL Cache is enabled with Redis, invalidations are handled automatically by the
framework. See Invalidations for details.

Automatic memory footprint tuning

The HCL Cache framework simplifies the tuning of local cache sizes with Automatic Memory
Footprint Tuning, which can automatically resize local caches according to the amount of
free memory.

Monitoring capabilities
HCL Cache implements a comprehensive set of metrics for local caches to support
monitoring, debugging, and tuning. The metrics enable tracking of cache sizes (by
number of entries and memory footprint in MB), hit ratios, cache operations, and internal
removals per second (expiry, inactivity, explicit removal, and LRU eviction). Local cache
metrics can be tracked with the "HCL Cache - Local Cache Details" and "HCL Cache -
Local Cache Summary" dashboards. See Monitoring for details.

Screen capture: HCL Cache - Local Cache Details dashboard.

Use of disk offload

HCL Cache caches do not support disk offload. Disk offload configurations in WebSphere
DynaCache are ignored. To scale beyond the local JVM memory limits, local caches are
designed to be used in conjunction with remote Redis caches.

WebSphere extended cache monitor

For more information on the WebSphere extended cache monitor, see WebSphere cache
monitor.

Remote caching

Extending the caching framework with the use of a remote cache in Redis can improve
performance and scalability:
 Remote caches are not bound by the local JVM limits: Remote caches can scale to
hundreds of gigabytes. Large caches can render higher hit ratios, improving performance
and reducing the need to regenerate content, thus reducing the overhead in other services
and databases.

 Scalability is improved: With the use of local caches, each server must maintain its own
cache. Cache entries might need to be generated once per pod. For example, each pod must
generate and maintain its own copy of the home page. When only local caching is used,
increasing the number of containers increases the overall number of cache misses, resulting
in increased load on backend services, such as the primary database for HCL Commerce or
the Search database (Solr or Elasticsearch). But when remote caching is used, each remote
cache entry only needs to be created from backend services once. Servers that use both
local and remote caching (QueryApp, TsApp, Search) can be scaled up and consume existing
cache from the remote cache instead of regenerating their own from backend services.
Newly started containers have immediate access to the existing cache, and the impact of
the initialization period is greatly reduced. This advantage also applies to cache clear
scenarios.

In order to use remote caching, cache entries must be serializable. By default, remote
caches that encounter serialization errors are disabled: "Remote Caching is disabled for
this cache due to previous serialization errors. Ensure all cache entries are serializable.
If entries cannot be made serializable, consider disabling remote caching for this cache."
The stopRemoteCachingOnSerializationErrors: false configuration can be used to continue
allowing remote caching after serialization errors.

Note: Use this configuration cautiously as it might result in memory leaks.
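
A minimal sketch of this setting in the cache configuration YAML follows; the placement under remoteCache mirrors the other remote cache settings in this document and is an assumption to verify for your release, and the cache name is hypothetical:

cacheConfigs:
  myCustomCache:                                      # hypothetical custom cache
    remoteCache:
      stopRemoteCachingOnSerializationErrors: false   # assumption: keeps remote caching enabled despite serialization errors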

Choosing a deployment option


Local and remote caching

This is the default configuration. Local caches act as a near-cache for remote caches,
keeping copies of the most recently accessed cache entries. These cache entries can be
served directly from the local container, without making remote calls to the Redis servers,
improving performance and reducing overhead on remote servers.

Local and remote caching flows:

When local and remote caches are used together, the caching flow is as follows:

Cache miss:
1. Local miss
2. Remote miss
3. Local PUT
4. Remote PUT

Existing in local:
1. Local hit

Existing in remote:
1. Local miss
2. Remote hit
3. Local PUT (internal)

Local-only caching

The primary reason for disabling remote caching is if the objects stored in the cache are
not serializable. If custom caches store objects that are not serializable, remote caching
should be disabled for the cache in the configuration. See Custom caching for details.
Local-only caches must still use Redis for replication of invalidations.

Remote-only caching
Remote-only caching might be desirable for caches that store frequently updated objects,
when changes must be immediately available to all containers. For example, changes to
user session data must be immediately available to all containers. Disabling local caches
eliminates the risk of reading stale data due to timing issues with invalidations. Examples
of default caches that are configured as remote-only include caches for Precision
Marketing and Punch-out integration.
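
As a minimal sketch, assuming hypothetical cache names, these deployment options map to the localCache and remoteCache enabled flags shown elsewhere in this document:

cacheConfigs:
  myNonSerializableCache:    # hypothetical: local-only, for objects that cannot be serialized
    localCache:
      enabled: true
    remoteCache:
      enabled: false
  mySessionCache:            # hypothetical: remote-only, for data that must be immediately visible to all containers
    localCache:
      enabled: false
    remoteCache:
      enabled: true

Local-only caches still connect to Redis to replicate invalidations, as noted above.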
 Automatic memory footprint tuning in local caches
Local caches reside in the local application server Java Virtual Machine (JVM). Each local
cache holds a number of cache entries, and each cache entry has a cache ID, a cache value,
and a list of dependency IDs. Controlling the memory footprint of local caches is important,
since larger caches can improve performance, but a cache that is too large can lead to low
or out of memory conditions.

Automatic memory footprint tuning in local caches
Local caches reside in the local application server Java Virtual Machine (JVM). Each local cache
holds a number of cache entries, and each cache entry has a cache ID, a cache value, and a list
of dependency IDs. Controlling the memory footprint of local caches is important, since larger
caches can improve performance, but a cache that is too large can lead to low or out of memory
conditions.

Each local cache has a configured maximum number of cache entries it can hold
(memorySizeInEntries) and an optional maximum memory footprint (memorySizeInMB). For
example, in a WebSphere Application Server V8.5.5 Liberty configuration server.xml file, the
following line configures the memory footprint of the HCL Cache with JNDI
name services/cache/SearchQueryDistributedMapCache:
<distributedMap id="services/cache/SearchQueryDistributedMapCache" cacheProviderName="hcl-cache" memorySizeInEntries="30012" memorySizeInMB="400" lowThreshold="95" highThreshold="98"/>

By default, HCL Cache local caches automatically increase or decrease their memory footprint
according to how much JVM heap is available. When JVM heap utilization is below 65%, HCL local
caches increase their maximum sizes up to 400% of their configured sizes; conversely, when the
JVM heap is more than 75% utilized, they decrease their maximum sizes down to 10% of their
configured sizes. In this way, HCL local caches take advantage of available free memory, while
helping to avoid low or out of memory conditions.
Screen capture from the HCL Cache - Local Cache Details dashboard. For more details,
see Monitoring.

Automatic memory footprint configurations

The automatic memory footprint feature provides configurations that can be used for advanced
troubleshooting or tuning scenarios. See Cache Configuration for details of updating the HCL
Cache configuration.

globalLocalCache is a top level element.

 Configuring used memory thresholds
 Configuring minimum and maximum scale factors
 Disabling automatic memory footprint tuning
 Reporting unsizeable cache values

Configuring used memory thresholds

By default, caches can increase their maximum sizes when used JVM memory is less
than 65% of the maximum heap size, and will decrease their maximum sizes when used
JVM memory is more than 75% of the maximum heap size.
globalLocalCache:
  localCacheTuning:
    tightMemoryPercentUsedThreshold: 75
    normalMemoryPercentUsedThreshold: 65
Configuring minimum and maximum scale factors

By default, caches will not increase their maximum sizes to more than 400% of their
configured maximum sizes, and will not decrease their maximum sizes to less than 10%
of their configured maximum sizes.
globalLocalCache:
  localCacheTuning:
    maxScaleFactor: 400
    minScaleFactor: 10
Disabling automatic memory footprint tuning

By default, automatic memory footprint tuning is enabled. You can disable it by setting
enabled to false:

globalLocalCache:
  localCacheTuning:
    enabled: false
Reporting unsizeable cache values

The HCL local cache can calculate the memory footprint of cache entries when they contain
values composed of typical Java objects. When other objects are encountered, the
calculated memory footprint may be inaccurate. Specify reportUnsizeable: true to log
an informational message when the HCL Cache is unable to calculate an accurate object
memory footprint. The default value of this configuration setting is false.

globalLocalCache:
  reportUnsizeable: true

Invalidation support
Local caches require a mechanism for replication of invalidation messages to ensure that local
cache entries associated with an invalidation ID are removed from all containers.

In IBM WebSphere Commerce Version 8, the IBM Data Replication Service (DRS) is integrated
with WebSphere DynaCache and performs the job of replicating invalidation messages. In HCL
Commerce Version 9.0, Kafka is used to send invalidation messages. With HCL Cache with
Redis in HCL Commerce Version 9.1, replication of invalidation messages is handled within
DynaCache by the HCL Cache provider. HCL Cache automatically issues invalidation messages
when clear and invalidate operations are issued for a cache that enables local caching. The
same cache on other containers implements a listener, and when the invalidation messages are
received, the indicated invalidate and clear operations are performed on the local cache.

HCL Cache relies on Redis PUBSUB to replicate invalidation messages (the Elasticsearch-based
search solution also uses Redis PUBSUB for Near-Real-Time (NRT) updates; for more information,
see The Elasticsearch index lifecycle). Each cache defines a topic, with the
format {cache-namespace-cacheName}-invalidation, where invalidation messages are issued and
received.

The Redis database provides commands that allow you to list listeners (PUBSUB CHANNELS),
publish messages (PUBLISH), and subscribe to messages (SUBSCRIBE). See Invalidation in Redis
for details.

Timing considerations

Sending and receiving invalidation messages using Redis is fast, but not instantaneous.
Consider an HCL Commerce request that executes in the Transaction server (ts-app) and makes
a change to data in the database. Immediately after the database transaction commits, the local
cache is invalidated and invalidation messages are sent to peer application servers. Meanwhile,
the application may execute a subsequent request that expects to use the updated data. When
the messages are received by the peer servers, they are immediately processed and the local
cache is invalidated according to the messages received. But between the time that the
messages are sent and the time that the local caches are invalidated, the data in the local
caches is "stale". If the subsequent request is received in a peer server before the cache
invalidation has completed, it will see the stale data, perhaps causing incorrect processing to
occur.

To help avoid accessing stale data due to this situation, the HCL Commerce data cache provides
optional configurations to introduce a short delay in the original Transaction server request, just
after the invalidation messages are sent, and before the request returns. If the delay is long
enough to allow the invalidation messages to be completely processed in peer application
servers, the timing problem can be avoided.

The delayAfterInvalidationMilliseconds data cache configuration can be used to specify how
long a delay should be introduced. But it may be difficult to estimate how much delay should be
introduced. The additional delayAfterInvalidationPercentBuffer configuration can be used to
add an additional delay that is based on how long it typically takes to send and receive an
invalidation message. This setting only has an effect when
delayAfterInvalidationMilliseconds is greater than zero. These settings can be specified on
a global level, or can be specified for only certain logical caches. For more information about
these settings, see Additional HCL Commerce data cache configuration.

For example, if occasionally accessing stale data is unacceptable,
specifying delayAfterInvalidationMilliseconds=1 and delayAfterInvalidationPercentBuffer=100
inserts a delay of 1 millisecond plus twice the length of time it typically takes to send and
receive a message. That should be more than enough time to avoid the possibility of accessing
stale data.
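
As a hypothetical sketch only (the authoritative element names and placement are in Additional HCL Commerce data cache configuration), such settings might be expressed globally or per logical cache in the cache configuration YAML:

cacheConfigs:
  defaultCacheConfig:                          # assumption: global level
    delayAfterInvalidationMilliseconds: 1      # base delay after invalidation messages are sent
    delayAfterInvalidationPercentBuffer: 100   # extra delay as a percentage of typical send/receive time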

HCL Cache - circuit breakers


HCL Cache implements circuit breakers for the remote cache to protect the application from a
failing or unavailable Redis server.

If a circuit breaker detects that a Redis server is failing, it prevents new requests to the
Redis server for a period of time. Circuit breakers are used in addition to the high
availability configurations provided by Kubernetes and Redis itself, such as replicas.

Local Caches

As invalidation messages can neither be sent nor received during a Redis outage, local
caches implement a shorter timeout for new and existing entries. By default, the timeout
during outages is configured to five minutes.
Remote Only Caches
Remote only caches become unavailable during a Redis outage.

The default configuration can be interpreted as follows: if there are at least 20
consecutive failures (as set in minimumConsecutiveFailures) over a period of at
least 10 seconds (as set in minimumFailureTimeMs), break the circuit (prevent new
connections) for one minute (the interval set in retryWaitTimeMs). After that time,
allow new queries to Redis, but if the first two requests
(minimumConsecutiveFailuresResumeOutage) continue to fail, break the circuit
again for another minute.

Redis request timeout

Slow Redis requests are not considered failures unless they time out. The Redis
client has configurations for timeouts and retry attempts, including:

timeout: 3000
retryAttempts: 3
retryInterval: 1500

Considering retries, with the configuration above a request needs 16.5 seconds
before returning a failure signal (3000 + 3*(3000 + 1500) = 16,500 ms). Timeouts can
be made more aggressive, but that could lead to sporadic errors in the logs.
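
For illustration, a sketch of a more aggressive client configuration; the assumption that these settings sit under the redis element of the cache YAML, alongside the circuit breaker settings, should be verified for your release:

redis:
  timeout: 2000        # per-attempt timeout, in milliseconds
  retryAttempts: 1     # fewer retries surface failures sooner
  retryInterval: 1000
  # worst case before a failure signal: timeout + retryAttempts * (timeout + retryInterval)
  # = 2000 + 1 * (2000 + 1000) = 5 seconds, down from 16.5 seconds with the defaults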

Circuit breaker configurations

Circuit breaker configurations can be adjusted using the Cache YAML configuration.

The configuration for the circuit breaker is available in the cache YAML file
under redis, circuitBreaker. The maximum timeout for local caches in outage
mode is configured using the maxTimeToLiveWithRemoteOutage element
under localCache, as in the following example:

redis:
  circuitBreaker:
    scope: auto
    retryWaitTimeMs: 60000
    minimumFailureTimeMs: 10000
    minimumConsecutiveFailures: 20
    minimumConsecutiveFailuresResumeOutage: 2
cacheConfigs:
  defaultCacheConfig:
    localCache:
      enabled: true
      maxTimeToLiveWithRemoteOutage: 300

scope (default: auto)
    Depending on the topology, circuit breakers must be configured at the client level (a
    single circuit breaker configuration for all) or at the cache/shard level. HCL Cache
    automatically selects the scope depending on the configuration used: cache level is used
    when the topology is cluster, and either HCL Commerce fails to connect to Redis during
    startup or the setting cluster-require-full-coverage is set to false. Otherwise, the
    scope is set to client.

minimumConsecutiveFailures (default: 20)
    The minimum number of consecutive connection attempt failures before a cache can be set
    in outage mode. This value and minimumFailureTimeMs must both be satisfied before the
    circuit breaker breaks the Redis connection. Any successful operation resets this counter.

minimumFailureTimeMs (default: 10000, 10 seconds)
    The time, in milliseconds, that must elapse before a cache can be put into outage mode.
    This amount of time and minimumConsecutiveFailures must both be satisfied before the
    circuit breaker breaks the Redis connection.

retryWaitTimeMs (default: 60000, 60 seconds)
    Once a cache is set in outage mode, retryWaitTimeMs is the time, in milliseconds, that
    must elapse before the Redis connection is retried.

minimumConsecutiveFailuresResumeOutage (default: 2)
    The minimum number of consecutive connection attempt failures before a cache is set back
    into outage mode. When a connection is in outage mode and reaches the retryWaitTimeMs
    value, the circuit breaker allows connection attempts to the Redis server. To allow quick
    testing of the connection without an undue excess of connection attempts, the
    minimumConsecutiveFailuresResumeOutage value is used: if it is reached, the connection is
    placed back into outage mode, without having to wait for the entire minimumFailureTimeMs
    and minimumConsecutiveFailures condition cycle to be satisfied once again.

Remote cache maintenance

The HCL Cache implements a number of required maintenance processes.

To support features such as invalidation by dependency ID, the HCL Cache maintains metadata
information for each cache entry. This metadata cannot be expired or evicted by Redis, because
this would lead to inconsistencies such as missed invalidations.

The HCL Cache implements a number of background processes to maintain the metadata information.

Expired maintenance
Removes metadata for objects that have expired.

Low memory maintenance
Triggers when Redis memory is close to full, and removes the soonest-to-expire cache entries
to free up memory.

Inactivity maintenance
Optional maintenance that can be used to remove inactive cache entries.

For more details, see Memory Management in Redis.

The maintenance jobs can add overhead to the Redis servers. It is important
that performance test environments accurately simulate production
environments, exercising the maintenance processes in a similar manner. For
example, if the production environment typically fills up the Redis memory,
the performance environment should do the same. Short tests (e.g. one hour
in duration) might not be long enough to simulate expired and inactivity
maintenance processing conditions.

Self-adjusting maintenance processes

All maintenance processes implement a similar technique to self-adjust the speed of
maintenance. Executing maintenance too quickly can impact performance, while if it is done
too slowly, new data can be added at a rate that is faster than it is removed, leading to out
of memory (OOM) situations. For example, expired maintenance adjusts the speed of maintenance
considering the time since expiry of the oldest expired cache entries. If the time since
expiry increases, it means expired maintenance is not running at a fast enough rate, and the
speed is increased.

The maintenance processes also have configurations that determine how many cache entries are
removed at once. This is required because Redis is single-threaded, and a large maintenance
operation can block Redis:

numCacheIdPerLUACall
The maximum number of cache entries that are inspected and processed by a LUA script.
Increasing the number speeds up maintenance but can also block the Redis thread for a longer
period.

numLUACallsInPipeline
The number of LUA scripts that are sent together as a batch. The Redis thread is only locked
during each individual script execution.

LUA is a scripting language supported by Redis for server-side operations. LUA scripts are
atomic and blocking.

Due to the self-adjusting nature of the maintenance processes, tuning should not typically be
required, but performance testing is critical to confirm that they run at optimal speeds.

Expired maintenance (onlineExpiredEntriesMaintenance)

While Redis automatically removes expired cached values from memory, the expired maintenance
process is responsible for removing expired cache entries from the metadata (dependency IDs).
This process runs from all the pods, and its speed is determined by the age of the oldest
expired entry pending maintenance.

Expired maintenance cleanup rates

The speed of maintenance adjusts depending on the age of the oldest expired entry. For
example, if the maintenance process finds cache entries that have been expired for seven
minutes, it uses the maintenance configuration for objects from 5-8 minutes, which cleans at
a rate of 20/second.
newerThan: 180 secs ( 3 mins)  inLUA: 1 pipeline: 1 delayMs: 60000 -- speed: 0/sec, 1/min
newerThan: 300 secs ( 5 mins)  inLUA: 2 pipeline: 5 delayMs: 500   -- speed: 20/sec, 1,200/min
newerThan: 420 secs ( 7 mins)  inLUA: 3 pipeline: 5 delayMs: 125   -- speed: 120/sec, 7,200/min
newerThan: 540 secs ( 9 mins)  inLUA: 5 pipeline: 5 delayMs: 100   -- speed: 250/sec, 15,000/min
newerThan: 720 secs ( 12 mins) inLUA: 5 pipeline: 5 delayMs: 50    -- speed: 500/sec, 30,000/min
newerThan: 960 secs ( 16 mins) inLUA: 5 pipeline: 5 delayMs: 25    -- speed: 1,000/sec, 60,000/min
newerThan: ~ ALL ~             inLUA: 5 pipeline: 5 delayMs: 12    -- speed: 2,083/sec, 125,000/min

For details on updating the configuration see Updating the default maintenance values.

Expired maintenance details are shown in the HCL Cache - Remote dashboard.

Low memory maintenance (onlineLowMemoryEntriesMaintenance)

The HCL Cache with Redis does not perform well when memory is full. Processes, including
maintenance processes, can fail with memory errors ("command not allowed when used
memory > 'maxmemory'"). To prevent this situation, the HCL Cache monitors the percentage of
memory used and triggers low memory maintenance processing to reduce the size of each cache.
The processing removes both cached values and their associated cache entry metadata. The keys
selected for removal are those soonest to expire. The low memory maintenance job is scheduled
from all the pods, but it can only be active from a single container at any one time.
Low memory maintenance in Redis Enterprise

Due to differences in architecture, Redis Enterprise does not make used memory statistics
available to the application, but this is the trigger that the low memory maintenance process
uses to determine when and how much maintenance is required. As a result, with Redis
Enterprise, the softMaxSize configuration must be manually configured for each cache to
define a maximum size in number of entries.
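
A minimal sketch for one cache follows, nesting softMaxSize under onlineLowMemoryMaintenance as shown in Updating the default maintenance values; the entry limit is a hypothetical value to tune against your data volume:

cacheConfigs:
  baseCache:
    remoteCache:
      onlineLowMemoryMaintenance:
        softMaxSize: 500000   # hypothetical maximum size in entries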
Low memory maintenance default configurations

The default configurations are as follows. For details on updating the configuration,
see Updating the default maintenance values.

intervalSecs (default: 120)
    The interval at which the low-memory maintenance job runs on each pod to check for memory
    conditions.

maxMemoryPercentage (default: 93)
    If the percentage of memory used is at or above this value, the maintenance process
    executes.

maintenancePercentageBuffer (default: 5)
    The percentage of the cache that is removed. For example, if maxMemoryPercentage is 93%
    and maintenancePercentageBuffer is 5%, the target memory used after maintenance is 88%.

putOperationPausePercentage (default: 5)
    This percentage is added to maxMemoryPercentage. For example, if maxMemoryPercentage
    is 93% and putOperationPausePercentage is 5%, when used memory reaches 98%, caches stop
    inserting into the remote cache to allow maintenance to catch up.

softMaxSize (default: -1)
    Used to set a maximum size in entries. It can be used in combination
    with maxMemoryPercentage.

softMaxSizeAsPercentFull (default: 93)
    Used to map the current cache size in entries for softMaxSize to a cleanupRate that is
    specified as a cache percentage. In this case, when the cache size is equal to
    softMaxSize, the used percentage is assumed to be 93%. This makes the calculations
    equivalent to maxMemoryPercentage.

Low memory maintenance cleanup rates

The speed of maintenance adjusts depending on the percentage of memory free. Maintenance
starts at 500/second (the rate at 93% used); if memory reaches 100%, it can run as fast as
10,000/second:
used: >= 100% inLUA: 5 pipeline: 10 delayMs: 5  -- speed: 10,000/sec, 600,000/min
used: >=  99% inLUA: 5 pipeline: 10 delayMs: 7  -- speed: 7,143/sec, 430,000/min
used: >=  98% inLUA: 5 pipeline: 10 delayMs: 10 -- speed: 5,000/sec, 300,000/min
used: >=  97% inLUA: 5 pipeline: 5  delayMs: 10 -- speed: 2,500/sec, 150,000/min
used: >=  96% inLUA: 5 pipeline: 5  delayMs: 20 -- speed: 1,250/sec, 75,000/min
used: >=  95% inLUA: 5 pipeline: 5  delayMs: 25 -- speed: 1,000/sec, 60,000/min
used: >=  94% inLUA: 4 pipeline: 5  delayMs: 25 -- speed: 800/sec, 48,000/min
used: >=  93% inLUA: 4 pipeline: 5  delayMs: 40 -- speed: 500/sec, 30,000/min
used: >=   0% inLUA: 3 pipeline: 5  delayMs: 40 -- speed: 375/sec, 22,500/min

Inactivity maintenance (onlineInactiveEntriesMaintenance)

The inactivity configuration enables the HCL Cache to track and evict entries that are not
seeing reuse. By removing idle entries before their expiry time, total cache memory use is
reduced, helping caches run more efficiently.

Inactivity maintenance is disabled by default. It must be enabled by specifying an inactivity
threshold in minutes (inactivityMins) for each remote cache. For details, see Remote Cache
Tuning Configurations.

Inactivity maintenance runs from all containers (concurrently). The default rate of cleanup
is defined as follows:

newerThan: 180 secs ( 3 mins)  inLUA: 1 pipeline: 1 delayMs: 60000 -- speed: 0/sec, 1/min
newerThan: 300 secs ( 5 mins)  inLUA: 2 pipeline: 5 delayMs: 500   -- speed: 20/sec, 1,200/min
newerThan: 420 secs ( 7 mins)  inLUA: 3 pipeline: 5 delayMs: 125   -- speed: 120/sec, 7,200/min
newerThan: 540 secs ( 9 mins)  inLUA: 5 pipeline: 5 delayMs: 100   -- speed: 250/sec, 15,000/min
newerThan: 720 secs ( 12 mins) inLUA: 5 pipeline: 5 delayMs: 50    -- speed: 500/sec, 30,000/min
newerThan: 960 secs ( 16 mins) inLUA: 5 pipeline: 5 delayMs: 25    -- speed: 1,000/sec, 60,000/min
newerThan: ~ ALL ~             inLUA: 5 pipeline: 5 delayMs: 12    -- speed: 2,083/sec, 125,000/min

Updating the default maintenance values

Although, due to the self-adjusting nature of the scripts, tuning may not be required,
configurations can be changed by updating the Cache YAML configuration files. Configurations
can be changed at the cache level, or for all caches by using defaultCacheConfig:

cacheConfigs:
  defaultCacheConfig:
    remoteCache:
      onlineExpiredEntriesMaintenance:
        ...
      onlineLowMemoryMaintenance:
        ...
      onlineInactiveEntriesMaintenance:
        ...

Use the default configuration in /SETUP/hcl-cache/cache_cfg.yaml as a starting point.

For list configurations such as cleanupRate, customizations must re-define the whole list
instead of individual elements.
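
For illustration, a sketch of a cleanupRate override; the field names mirror the printed rate dumps above (newerThan in seconds, inLUA, pipeline, delayMs) but are assumptions to verify against the default configuration in /SETUP/hcl-cache/cache_cfg.yaml, and the whole list must be redefined:

cacheConfigs:
  defaultCacheConfig:
    remoteCache:
      onlineExpiredEntriesMaintenance:
        cleanupRate:
          - newerThan: 180    # entries expired for more than 3 minutes
            inLUA: 1
            pipeline: 1
            delayMs: 60000
          - newerThan: 300    # entries expired for more than 5 minutes
            inLUA: 2
            pipeline: 5
            delayMs: 500
          # ...every remaining element of the list must be redefined here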

Remote cache tuning configurations in HCL Cache

This document describes cache-level tuning configurations for remote caches.

See Cache Configuration for details about how these settings can be applied to custom or default
caches.

Compression

HCL Cache provides the option to use a compression algorithm, LZ4, on the cache key values.
Caches with large keys, such as JSP caching in baseCache, can benefit from compression.
Compression reduces the size of the keys in Redis and reduces network traffic, but it can
increase CPU usage on the client containers. You might see no benefit from enabling
compression on caches with small keys.

Compression is disabled by default. To enable it, add codec: compressing under
the remoteCache element for the desired cache:

cacheConfigs:
  baseCache:
    remoteCache:
      codec: compressing

Sharding
Sharding is available from HCL Commerce Version 9.1.10. The number of shards defaults to
one. To enable sharding, add the shards configuration under the remoteCache element for a
particular cache, with a value higher than one.
cacheConfigs:
  baseCache:
    remoteCache:
      shards: 3

When sharding is enabled, the cache is internally partitioned by the number of shards
specified; for example, specifying three shards creates three cache segments. Regardless of
the number of shards, invalidation processing is still handled at the cache level; each shard
processes all invalidations for its cache.
{cache-demoqalive-baseCache}-invalidation
{cache-demoqalive-baseCache:s=0}-(dep|data)
{cache-demoqalive-baseCache:s=1}-(dep|data)
{cache-demoqalive-baseCache:s=2}-(dep|data)

Because each shard is assigned a unique hash slot, sharding is typically used with Redis
Clustering, since it allows each shard or cache segment to be handled by a different Redis node.
Sharding can be helpful for caches that might overwhelm a single Redis node, either due to their
memory footprint, or the amount of load/operations they generate. baseCache is an example of a
cache that might benefit from sharding.

In a Redis cluster environment, the slot assignment is done considering the namespace, cache
name and shard number. It is not guaranteed that the shards will be evenly distributed across the
Redis nodes, but this issue can be overcome by increasing the number of shards or by using slot
migration.

Impact of sharding on cache operations

Get Put Invalidate Clear

A hashing algorithm is A hashing algorithm is Invalidations A clear operation


applied over the cache- applied over the cache- must be executed must be executed
id to select the shard id to select the shard against all shards. against all shards. This
that the cache entry that the cache entry will This is done in is done in parallel.
should be retrieved be assigned to. Sharding parallel.
from. Sharding has has negligible impact on
negligible impact on "put" operations.
"get" operations.

Parallel processing of invalidate and clear operations

The HCL Cache uses a thread pool to perform invalidate and clear shard operations in
parallel. The thread pool size is configured with the numAsyncCacheOperationThreads setting,
which is configured at the top level of the YAML configuration:

numAsyncCacheOperationThreads: -1
cacheConfigs:
  ...

numAsyncCacheOperationThreads defaults to -1, which translates to the maximum number of
shards configured for any cache.

When an invalidate or clear operation is executed on a sharded cache, the HCL Cache attempts
to use the thread pool to execute in parallel. If the thread pool has no queued tasks, all
shards are processed concurrently. If the thread pool is in use and there are queued tasks,
the thread pool is not used and each shard is processed sequentially. This manages the
potential case where multiple invalidate operations are executed concurrently on a sharded
cache, which might otherwise require a large number of threads to be active.

Inactivity

Inactivity is available from HCL Commerce Version 9.1.10. It is disabled by default.

The inactivity configuration enables the HCL Cache to track and evict entries that are not
seeing reuse. By removing idle entries before their expiry time, the total cache memory used
is reduced, which makes the cache run more efficiently. Because the tracking of inactive
entries adds a processing cost, inactivity is not currently enabled by default. Instead, it
is enabled for selected default caches, and can be enabled for other default or custom caches.

To enable removal of inactive cache entries, specify enabled: true and specify a
number of minutes using the inactivityMins configuration.
cacheConfigs:
  baseCache:
    remoteCache:
      onlineInactiveEntriesMaintenance:
        enabled: true
        inactivityMins: 30

See Cache Maintenance for details on how inactivity maintenance is performed, and
can be monitored and tuned.

Use of inactivity with local caches

Local caches support inactivity at the cache entry level. Inactivity can be configured using
the cachespec.xml file, or programmatically with the DistributedMap interface. Inactivity set
by DynaCache is used for local caching only and does not impact the inactivity process of the
remote cache, which must be enabled independently.

Inactivity vs Low Memory Maintenance

Production environments generate large amounts of cache. Certain caches, such as
users or "searchTerm", can grow unbounded and eventually fill up the remote cache
memory (maxmemory). When Redis memory usage is near the limit, Low Memory Maintenance
triggers to keep the usage under 100%. The use of Inactivity Maintenance can eliminate or
reduce the processing done by Low Memory Maintenance. Inactivity Maintenance is more
efficient, as follows:

 Inactivity Maintenance runs continuously, while Low Memory Maintenance only triggers when
the memory is nearly full. Keeping a smaller memory footprint allows Redis to run more
efficiently, including high availability operations such as persistence and recovery.

 Inactivity Maintenance runs from all containers, while Low Memory Maintenance is active
from only a single container at any one time.

 Low Memory Maintenance must remove a percentage of the cache, and it does so by selecting
entries that are sooner to be expired, even if they may have high reuse. However, Inactivity
Maintenance only removes inactive entries, helping to retain other high-reuse entries.

Inactivity and skip-remote cache directive

The HCL Cache has special configurations called Cache directives that are used with
cache entries to skip local or remote caching. Caches that enable local and remote
caching, and deal with entries that might not see reuse, may benefit from these
configurations. Examples include REST calls that specify searchTerm, faceting and
pagination. Caches that create many cache entries that are not reused can be inefficient.
Disabling remote caching for those caches can help reduce remote cache memory
footprint. From HCL Commerce Version 9.1.10, you can choose to allow these entries in
the remote cache, while relying on Inactivity processing to remove inactive entries.

QueryApp REST Caching for searchTerm

The QueryApp container implements the skip-remote directive for certain "searchTerm"
caches in cachespec.xml. If you enable Inactivity for baseCache, consider allowing
these caches to use the remote cache by customizing the cachespec.xml file to
remove the following snippet:

<component id="DC_HclCacheSkipRemote" type="attribute">
  <required>true</required>
</component>

Optimizations for Cache Clear Operations

Remote cache clear operations work by scanning the Redis database and deleting all the keys
that match the cache prefix (e.g. {cache-demoqalive-baseCache}-*). This algorithm is most
efficient with large caches, or with caches that account for a large percentage of the total
keys.

When clearing a small cache, the algorithm must still scan the whole database for matching
keys. As the cache clear operation is non-blocking, the scanning might be slow and require
many calls to the Redis server.

For small caches, it is more efficient to use the invalidate logic. The invalidate logic
processes each cache entry and its dependency IDs, but avoids the complete database scan by
using the dependency ID set. To perform a clear, the invalidate logic is executed with
the &ALL& implicit dependency ID, which is associated with all the keys in the cache.

The HCL Cache is configured to use the invalidate algorithm for cache clears with the
setting smallCacheClearOptimizationMaximumSize, which is enabled by default with a value
of 50000. smallCacheClearOptimizationMaximumSize is available since HCL Commerce
Version 9.1.6.

cacheConfigs:
  baseCache:
    remoteCache:
      smallCacheClearOptimizationMaximumSize: 50000
HCL Cache configurable Prometheus metrics

The HCL Cache provides cache-level configurations to customize the metrics created for the
Prometheus integration.

Although changes are not typically required, if you are integrating with a third-party
monitoring system and there is a cost associated with the retrieval or storage of metrics,
these configurations can be used to fine-tune the metrics to be used.

Cache configurations

Metrics are configurable at the cache level. Changes can be applied to a single cache, or to the
default configuration using defaultCacheConfig. See cache configuration for details.
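
As a minimal sketch, assuming that a cache-level metrics setting overrides defaultCacheConfig, metrics could be disabled globally while remaining enabled for a single cache:

cacheConfigs:
  defaultCacheConfig:
    metrics:
      enabled: false   # disable metrics for all caches by default
  baseCache:
    metrics:
      enabled: true    # assumption: the per-cache setting overrides the default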

Enabling or disabling metrics for a cache

Disable metrics for a cache by using the enabled attribute, as follows:

defaultCacheConfig:
  metrics:
    enabled: false
Timer metrics histogram buckets

The Timer metrics used by the HCL Cache support histograms for the calculation of
percentiles. The tracking of histogram values requires the definition of additional metrics,
as in the following example for clear operations on baseCache. This support can be disabled
to reduce the number of metrics created.

hclcache_cache_clears_total{cachespace="demoqaauth",name="baseCache",scope="local",} 100.0
hclcache_cache_clears_duration_seconds_sum{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 1.3296758
hclcache_cache_clears_duration_seconds_max{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 0.0897587
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="1.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="3.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="5.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="7.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.001",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.003",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.005",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.01",} 23.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.05",} 99.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.1",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.5",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="+Inf",} 100.0
hclcache_cache_clears_duration_seconds_count{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 100.0
The default histogram configuration is as follows (values are in nanoseconds):

defaultCacheConfig:
  metrics:
    timerNanoBuckets:
      - 100000     # 0.1 ms
      - 300000     # 0.3 ms
      - 500000     # 0.5 ms
      - 700000     # 0.7 ms
      - 1000000    # 1.0 ms
      - 3000000    # 3.0 ms
      - 5000000    # 5.0 ms
      - 10000000   # 10.0 ms
      - 50000000   # 50.0 ms
      - 100000000  # 100.0 ms
      - 500000000  # 500.0 ms

The histogram buckets can be disabled by specifying an empty list:

defaultCacheConfig:
  metrics:
    timerNanoBuckets: []

If disabled, percentile calculations are no longer available in the HCL Cache - Remote
Grafana dashboard.

Use of common metrics for all caches

The number of metrics can also be reduced by using a combined Timer for all caches. This
change is incompatible with the HCL Cache dashboards, and can be inaccurate when used with
Redis cluster.

defaultCacheConfig:
  metrics:
    addCacheNameLabelToTimers: false
Website performance tuning
There are four steps for evaluating the performance of an HCL Commerce website that is based
on the Transaction server.

 Identifying the workload

 Data monitoring and analysis

 Top-down tuning approach

 Closed-loop cycle

Identifying the workload

The workload defines how the performance of a system is evaluated. A workload should have
the following characteristics:

 Measurable: A metric that can be quantified, such as throughput and response time.

 Reproducible: The same results can be reproduced when the same test is run multiple
times.

 Static: The same results can be achieved no matter how long you execute the run.

 Representative: The workload realistically represents the stress to the system under
normal operating conditions.

Data monitoring and analysis

Improving performance is always a matter of identifying where the bottleneck is, and changing
your system configuration to avoid it. Monitoring system performance and identifying problems
are the most essential skills to ensure good performance of the system. All tools have strengths
and weaknesses. Some tools might alter the flow and the timing of the applications, but they
provide information to the developer and the system administrator, such as Rational Application
Developer's Profiling function. Other tools have minimal impact on the overall system but provide
little information, or offer specific information that might not be helpful to identify the source of the
problem.

Top-down tuning approach

To save time, adopt a top-down approach by changing the system level first, followed by the
application level, and then to the programming level. By removing the inefficiencies from the top
level, the underlying problems at the lower level might be minimized.

System
    More processors with faster speed, more memory, faster network, faster disk.

Application
    Operating system level configuration, web server configuration, application server
    configuration, Commerce application configuration, database configuration.

Programming
    HCL Commerce programming guideline, Java performance tips.

The system level consists of components such as processors, memory subsystem, network
configuration, and disk subsystem. Bottlenecks at this level are easier to identify and address by
modifying the hardware configuration or operating system level optimization.

Closed-loop cycle
The closed-loop cycle is a method for implementing the top down tuning approach. This method
prescribes the way to gather and analyze data, come up with ideas for resolving issues,
implement enhancements, and test results. The process is driven by data, and the results from
one iteration of the loop drive the next iteration of the loop.

A closed-loop cycle must be done in a controlled environment where each change is documented
and can be undone later. In theory, after many iterations, since only the enhancements with
positive test results are being used, the overall HCL Commerce site's performance will
improve.

Data Load utility performance tuning

Scheduled Data Load utility jobs can affect HCL Commerce performance. You can reduce the
impact of this process by tuning the data load performance appropriately for your
implementation.

When you are considering how to tune your Data Load process, ensure that you review the Data
Load summary reports that are generated after you run the Data Load utility. These reports
can be used to identify which elements of the Data Load process require tuning to improve
performance.
Before you begin
Ensure that you are familiar with and understand the following concepts and tasks that are
related to the Data Load utility:

 Running the Data Load utility

 Configuring the Data Load utility

To tune your Data Load utility process, you can tune the following processes, parameters, and
caches:

 Data Load parallelization

 Data Load mode

 File difference preprocessing

 Validation options

 ID resolver cache

 Batch size and commit count

 Java virtual machine (JVM) heap size

 Network latency

 Database tuning

Data Load parallelization

When used with CSV formatted data, Data Load can be run with multiple writer threads. This can
drastically increase load performance for large sets of data that are formatted with considerations
made for parallelization.

For more information on Data Load parallelization, see Data Load parallelization.

Data Load mode

The Data Load mode parameter sets the type of load process that the Data Load utility
runs. You can set this mode to Insert, Replace, or Delete in the wc-dataload.xml file
for the data you are loading. Typically, Replace is used; however, Insert and Delete can run
faster. Running the utility in Insert or Delete mode does not require as many IDs to be
resolved with the ID resolver utility. When you are using Insert or Delete, ensure that these
actions are the only database operations that are required by your CSV file.

File difference preprocessing

You can run a file difference preprocess for routine data loads to improve the Data Load
utility performance for loading these files. By using this preprocessor, you can compare two
input files, such as a previously loaded file and a new version of this file. The
preprocessor generates a difference file that contains only the records in the new file that
are not within the old file, or that are changed from the records in the old file. The Data
Load utility can then load this difference file. If your routinely loaded files contain many
previously loaded records, then running this file difference can result in shorter load
times. Running a file difference can reduce the loading time that is required to load your
routine updates to your HCL Commerce database, reduce server usage time, and improve server
performance.

You can configure the Data Load utility file difference preprocessor to compare files by the
values in each column, instead of entire records, to identify the changed records. You can also
configure the file difference preprocessor to ignore specific columns when the process is
comparing files.

For more information about this preprocessor, see Data Load file difference preprocessing.

Validation options
Configuring the Data Load utility to validate the data that you are loading can affect your
Data Load performance. The validation of the data you are loading is performed against your
HCL Commerce database. If you are validating many records, or the validation process
encounters many invalid records, this validation process can affect performance. By default,
the following validation options are available as configurable properties for the Data Load
utility:

attributeValueValidation

Indicates whether to validate the attribute value. The attribute value is mandatory except
within a product and defining attribute relationship.
validateAttribute
Validates whether a SKU and a product have compatible defining attributes when the
SKU is moved under the product. The validation logic determines whether the attributes
or allowed values that are to be created, updated, or deleted belong to the current store.
validateCatalog
Validates whether more than one master catalog is being created for a store. If the store
supports sales catalogs, the validation checks whether a catalog entry belongs to more
than one master category. The validation also checks whether an attribute allowed value
can be set to default in the current store.
validateCatalogEntry
Indicates whether to check the types of the SKU and product when the Data Load utility adds a
SKU under a product. This check makes sure that the SKU is really a SKU and the product is
really a product.
validateCatalogGroup
Validates whether a catalog group belongs to a specified catalog.
validateUniqueDN
Validates the uniqueness of the distinguished name (DN) to identify a user in CSV file. By
default, to optimize data load performance, users in the CSV file are identified by the
logon ID instead of the distinguished name.
If the data that you are loading does not require any of the validation processes to occur,
ensure that the configurable property is set to false so that no validation is performed.
ID resolver cache
If the ID resolver cache size is large enough, the cache can store all of the required IDs
for a database table that data is being loaded into. If the size of this cache is not large
enough to store all of the required IDs, then none of the IDs are cached. When the IDs are
not cached, the data load process requires that the IDs are resolved directly against the
database. When you are configuring your cache setting, consider the following behavior:

 To cache all of the IDs for a large table, the ID resolver cache can require a significant
amount of time. If you are loading few records into a table with many individual records and
IDs to cache, directly resolving the IDs against the database can require less time than
caching all of the IDs for an entire table and resolving the IDs against the cache. You can
configure the cache size so that the cache is too small to store the IDs for large tables,
but still large enough to cache smaller tables and resolve their IDs against the cache. By
reducing the size of this cache, you can reduce the time that is spent caching IDs for an
entire table that you are loading only a few records into.

 The ID resolver cache is cleared after a load item is completed in the load order
configuration file. If you are running multiple CSV files, then the cache must be repopulated
after each item completes. If multiple CSV files load data into the same tables, consider
merging the files wherever possible to reduce the caching time for repopulating the same
table ID data.

To tune your ID resolver cache, set an initial value that ensures all of the tables that you
are loading data into can be cached. Then, set up a second Data Load scenario with the ID
resolver cache set to 0. In both instances, pass in the parameter
-Dcom.ibm.commerce.foundation.dataload.idresolve.level=FINE when you call the Data Load
utility. This parameter adds the resolving times to the trace, including the time that is
required to populate the cache (if it is not set to 0), and the time that is required to
resolve the IDs against the database or cache. With these times, you can identify whether you
can increase or decrease your caching of IDs and reduce the time that is required to resolve
the IDs for the data you are loading.

The size of the ID resolver cache is set in the Data Load environment configuration file. The
following is a sample declaration of a cache that can store 1 million records:

<_config:IDResolver className="com.ibm.commerce.foundation.dataload.idresolve.IDResolverImpl" cacheSize="1000000"/>
To help tune the ID resolver cache, review the data load summary report after a data load
completes. This summary includes the total time that is necessary for the ID resolver to
resolve and check the IDs for the objects that are loaded. The Data Load utility can also be
configured to output more information about the data load ID resolver process. This addition
to the summary report includes the following information:

 The time that it takes to fetch and load the IDs for a table into the ID resolver cache
 The number of entries per table that are stored in the cache
 The number of hits to the cache per table to resolve IDs
 The time that it takes to resolve IDs for a table directly from the database table
 The number of hits to the database to resolve IDs for a table

With this information, you can identify whether it would be more efficient to resolve and
check IDs for a table against the ID resolver cache or directly against the database. You can
then adjust the size of your ID resolver cache as needed, or exclude IDs from certain tables
from being included within the cache. For more information about the ID resolver information
that can be included in the data load summary report, see Verifying the results of the data
load.

Batch size and commit count

Change the Data Load utility batch size and commit count parameters to reduce the effect of
network latency and reduce the processing load on the Transaction server. The commit count
must be a multiple of the batch size. The commit count parameter specifies the number of rows
that are flushed in a single transaction before a commit is issued. Database transactions are
loaded in batches. These batches are kept in Java memory until there are enough rows stored
for a flush. Then, the batch contents are stored to the database as a single packet of data.
The batches are stored in the database until the next commit occurs and the database loads
the changes.

By increasing the batch size, you can reduce the effect that network latency can have on the
Data Load process. Increasing the batch size can reduce the number of batches that are
required to be sent to your database. The wait time for the database response can also be
reduced by increasing the batch size.

By increasing the commit count, you can reduce the processing load on the Transaction server.
Increasing the commit count increases the load on the database, and causes more records to be
committed to the database in a single transaction. This increase results in less uncommitted
data that remains stored on your Transaction server and fewer overall transactions that are
required to commit the data.

The values for the batch size and commit count parameters are set in the load order
configuration file and are typically in the 500-1000 range. The following is a sample
declaration of the batch size and commit count parameters:

<_config:LoadOrder commitCount="1000" batchSize="500" dataLoadMode="Replace">

Java virtual machine (JVM) heap size

When you are tuning the ID resolver cache, you can adjust the JVM heap allocation. The JVM heap size must be proportionate to the ID resolver cache: if the cache is large, specify a large JVM heap size. When the ID resolver cache is large, the Data Load utility does not resolve IDs from the database directly; the cache, however, might use much of the JVM heap memory. For a 1 GB JVM heap size, set the ID resolver cache size to be less than 2 million to prevent a Java out-of-memory error. If you encounter an out-of-memory exception during the data load process, the allocated JVM heap size might be too small. Ensure that the JVM heap size is sufficient to accommodate the ID resolver cache and the batches that are stored in memory. Set the value for the JVM heap size in the following parameters within the Data Load utility dataload.sh file: -Xms1024m -Xmx4096m
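
As a minimal sketch, the flags might be applied in dataload.sh through an options variable that is passed to the Java invocation (the variable name and invocation shown here are illustrative, not the actual script contents):

JVM_OPTIONS="-Xms1024m -Xmx4096m"
java $JVM_OPTIONS <data load main class and configuration file arguments>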

Network latency

If your environment is configured with a large physical distance between servers, the data load process can be affected. Network latency can increase the flush time when the Data Load utility runs. The flush time can be viewed in the Data Load utility summary report; it includes the time that is required to transport a batch and the time that the database takes to respond, so it also reflects network latency. If you are experiencing a large flush time, your system might be experiencing poor database performance, poor network performance, or both. If you tune your environment and the Data Load performance is still not within the required range, installing HCL Commerce on a system local to the database might be necessary to improve Data Load utility performance.

Database tuning
By tuning the database, you can improve the performance
of the Data Load utility by reducing the time that is
required to commit records. You can view the time that is
required to commit loaded data records in the Data Load
utility summary report. There are many performance
tuning tools available for improving database
performance. For more information about database
performance tuning, see:

 Database (DB2) performance considerations
 Database (Oracle) performance considerations

Database (DB2) performance considerations

The database is usually one of the potential bottleneck areas that can prevent HCL Commerce from scaling and performing well. It is therefore crucial that the database is tuned for your implementation.

Note: HCL Commerce ships with default DB2 optimization settings, such as optimization levels
and optimization profile registries. It is highly recommended that you thoroughly test any changes
that are made to the optimization settings in a production-like environment, before you use them
in a production system. Changing the optimization settings can affect the overall performance of
the application, either immediately or later, such as when the data volume increases or the data
distribution changes.

Physical environment considerations

Considerations for the physical environment are related to how the data is spread among the
disks and how the memory is managed for the databases.

Layout on disk

Reading from the database and writing back to the database (disk I/O) can become a bottleneck
for any application that is accessing a database. Proper database layout can help reduce the
potential for this bottleneck. It is a significant effort to change the physical layout of the database
once it is created, hence proper planning at the initial stages is important.

The first consideration is to ensure that the DB2 transaction logs reside on their own physical
disk. Every update that is issued against the database is written to the logs (in addition to being
updated in memory). Hence there is a lot of disk I/O in the location where the DB2 transaction
logs reside. It is a good practice to try to ensure that all read/write activity on the disk is related
only to the transaction logs, thus eliminating any I/O contention with other processes that might
access the disk.

To set the location for the DB2 transaction logs, issue the following command:

db2 update db cfg for dbalias using NEWLOGPATH path

Before the logs are stored in the location that is specified, disconnect all sessions, or deactivate
the database by issuing the db2 deactivate command.
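
For example, assuming a database alias of MALL and a dedicated disk that is mounted at /db2logs (both values are illustrative):

db2 update db cfg for MALL using NEWLOGPATH /db2logs
db2 deactivate db MALL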
The second consideration in terms of disk layout is to determine how to manage the table spaces
efficiently. One performance principle in the management of Relational Database Management
Systems (RDBMs) is to separate the database table data and database index data onto different
physical disks. This separation enables better query performance, since index scans can execute
in parallel with data fetch operations because they are on different physical disks.

In DB2, automatic storage table spaces are used by default. That is, the System Managed
Storage (SMS) and Database Managed Storage (DMS) table space types are deprecated for
user-defined permanent table spaces and might be removed in a future release.

Automatic storage table spaces are the easiest table spaces to set up and maintain, and are
recommended for most applications. They are particularly beneficial when:

 You have larger tables or tables that are likely to grow quickly

 You do not want to have to make regular decisions about how to manage container
growth.

 You want to be able to store different types of related objects (for example, tables, LOBs,
indexes) in different table spaces to enhance performance.

For more information, see Comparison of automatic storage, SMS, and DMS table spaces.

Memory

DB2 associates memory for the database through buffer pool objects. A buffer pool has a page
size that is associated with it and is linked to one or more table spaces. Thus, if table spaces of
different page sizes are created, then buffer pools corresponding to the different page sizes are
required.

While you can create multiple buffer pools having the same page size, it is recommended that
only one buffer pool per page size be created, for the most efficient usage of memory on the
database server.

The question is always, how much memory to assign to the buffer pools. For DB2 32-bit
implementations, there is a limit, based on the operating system, that can be available for buffer
pools.

Assuming a dedicated database server, allocate a large proportion of memory available on the
server, about 75% to 80%, but not exceeding the platform limits.

Note that for 64-bit implementations of DB2, the limits are increased. In this case, the buffer pool hit ratio would need to be monitored to determine the optimal setting for the buffer pools. You can also monitor the hit ratio for 32-bit implementations by taking database snapshots with the following command:

db2 get snapshot for database on <dbalias>

The output that is generated contains some statistics on buffer pool logical and physical reads:

Buffer pool data logical reads = DLR
Buffer pool data physical reads = DPR
...
Buffer pool index logical reads = ILR
Buffer pool index physical reads = IPR

In this output, DLR, DPR, ILR, and IPR have actual values. The hit ratio can be computed using
the following formula:

(1 - (( DPR + IPR) / (DLR + ILR))) * 100%
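
For example, with illustrative snapshot values of DLR = 95,000, DPR = 5,000, ILR = 45,000, and IPR = 2,000, the hit ratio is (1 - ((5,000 + 2,000) / (95,000 + 45,000))) * 100% = 95%. As a rough rule of thumb, ratios in the mid-90s or higher generally indicate that the buffer pools are sized adequately.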

The size of the buffer pool can be changed using the ALTER BUFFERPOOL command, or
the BUFFPAGE parameter if the size of the buffer pool is set to -1.
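
For example, to resize the default buffer pool to 250,000 pages (IBMDEFAULTBP is the DB2 default buffer pool name; the size is illustrative):

db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE 250000"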

Additional tuning and configuration database parameters

There are many parameters to consider for performance. This section describes a subset of
these that are considered important for HCL Commerce implementations. To set the values for
the parameters, the following command can be used:

db2 update db cfg for <dbalias> using <paramname> <paramvalue>

Parameters related to memory

The database heap (DBHEAP) contains control block information for database objects (tables,
indexes, and buffer pools), as well as the pool of memory from which the log buffer size
(LOGBUFSZ) and catalog cache size (CATALOGCACHE_SZ) are allocated. Its setting depends
on the number of objects in the database and the size of the two parameters mentioned.

In general, the following formula can be used to estimate the size of the database heap:

DBHEAP=LOGBUFSZ + CATALOGCACHE_SZ + (SUM(# PAGES in each bufferpool) * 3%)

The log buffer is allocated from the database heap, and is used to buffer writes to the transaction logs for more efficient I/O. The default size of this setting is 128 4-KB pages. A recommended starting point for the log buffer size (LOGBUFSZ) in HCL Commerce implementations is 256.
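
For example, with LOGBUFSZ set to 256, a hypothetical CATALOGCACHE_SZ of 300, and a single buffer pool of 250,000 pages, the formula gives DBHEAP = 256 + 300 + (250,000 * 3%) = 8,056 pages. The corresponding commands:

db2 update db cfg for <dbalias> using LOGBUFSZ 256
db2 update db cfg for <dbalias> using DBHEAP 8056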

Parameters related to transaction logs

When you are considering values for the transaction log file size (LOGFILSIZ) and the number of
primary (LOGPRIMARY) and secondary (LOGSECOND) logs, some generalizations for OLTP
applications can be applied. A high number of short transactions are typical in OLTP systems,
hence the size of the log file should be relatively large, otherwise more processing time is spent
managing log files, rather than writing to the transaction logs. A good starting point for the size of
the log file in HCL Commerce implementations is to set the value to 10000.

Primary log files are allocated when the database is activated, or on the first connect. If a long
running transaction fills up all the primary logs, then secondary logs are allocated as needed until
the LOGSECOND limit is reached. The allocation of a secondary log file is a significant
performance hit, and should be minimized if it cannot be avoided.

To determine the right settings for these parameters, you need to monitor the database and see whether secondary log files are being allocated. If they are, then you need to increase the number of primary log files. You can monitor by taking a database snapshot and looking for the following two lines:

Maximum secondary log space used (Bytes) = 0
Secondary logs allocated currently = 0

A good starting point for the number of primary log files (LOGPRIMARY) is anywhere from 6 - 10.
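
For example, the following commands apply the starting points suggested above (8 primary logs is an illustrative value within the 6-10 range):

db2 update db cfg for <dbalias> using LOGFILSIZ 10000
db2 update db cfg for <dbalias> using LOGPRIMARY 8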
Parameters related to disk I/O

In addition to physical disk layout, several tuning parameters can be manipulated to affect disk
I/O. Two key parameters are NUM_IOSERVERS and NUM_IOCLEANERS.

NUM_IOSERVERS specifies the number of processes that are launched to prefetch data from
disk to the buffer pool pages. To maximize read parallelism, this parameter should be set to the
number of physical disks that are being used by the database, to enable reading from each disk
in parallel.

NUM_IOCLEANERS specifies the number of processes that are launched to flush dirty buffer
pool pages to disk. To maximize usage of system resources, this parameter should be set to the
number of CPUs on the system.

The frequency of how often dirty buffer pool pages are flushed to disk can be influenced by the
CHNGPGS_THRESH parameter. Its value represents the limit, in the form of a percentage, that
a buffer pool page can be dirty before a flush to disk is forced. For OLTP applications, a lower
value is recommended. For HCL Commerce implementations, the value should be set to 40.

One final parameter to consider is MAXFILOP. It represents the maximum number of files DB2
can have open at any time. If this value is set too low, valuable processor resources are taken up
to open and close files. This parameter needs to be monitored to be set to the correct value, but
a good starting point is to set this value to 128. You can monitor by taking a database snapshot
and looking at the following line:

Database files closed = 0

If the value monitored is greater than zero, then the value for this parameter should be increased.
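
For example, on a database server with six physical disks and four CPUs (illustrative figures), the starting points described above translate to:

db2 update db cfg for <dbalias> using NUM_IOSERVERS 6
db2 update db cfg for <dbalias> using NUM_IOCLEANERS 4
db2 update db cfg for <dbalias> using CHNGPGS_THRESH 40
db2 update db cfg for <dbalias> using MAXFILOP 128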

Parameters related to locking

Reducing locking contention is key to performance. Several parameters exist to influence locking
behavior. The total amount of memory available to the database for locks is defined by the
LOCKLIST parameter. The MAXLOCKS parameter defines the maximum amount of memory
available for each connection to the database. It is represented as a percentage of the
LOCKLIST.

The values of both of these parameters need to be adjusted to avoid lock escalations. A lock
escalation occurs when all of the memory available to a connection is used, and multiple row
locks on a table are exchanged for a single table lock. The amount of memory that is used for the
first lock on an object is 72 bytes, and each additional lock on the same object is 36 bytes.

A good starting value for LOCKLIST can be approximated by assuming that a connection
requires about 512 locks at any time. The following formula can be used:

LOCKLIST = (512 locks/conn * 72 bytes/lock * # of database connections) / 4096 bytes/page
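
For example, assuming 100 concurrent database connections (an illustrative figure), LOCKLIST = (512 * 72 * 100) / 4096 = 900 pages.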

MAXLOCKS can be set to 10 - 20 to start. Further monitoring is necessary to adjust both of these
values. In the database snapshot output, look for the following lines:

Lock list memory in use (Bytes) = 432
Lock escalations = 0
Exclusive lock escalations = 0

If lock escalations occur (value higher than 0), increase the locklist to minimize the escalations
and increase the MAXLOCKS value to increase the limit of how much of the LOCKLIST a
connection can use.
Best practices

Here are some of the most common best practices for any IBM DB2 UDB implementation.

Reorganizing data in table spaces

When a high number of insert, update, or delete operations are performed against a table in the database, the physical placement of the rows and related indexes might not be optimal. DB2 provides a utility to reorganize data for a table:

db2 REORG TABLE <tabschema>.<tabname>;

DB2 also provides a utility to check whether table or index data needs to be reorganized. While connected to a database, the following command can be issued:

db2 REORGCHK

This command checks all tables in the database and produces a listing, first by table and second by index. In the listing, an asterisk ('*') in any of the last three columns implies that the table or index requires a REORG.

Collecting statistics

Each SQL statement that is submitted to the database is parsed, optimized, and a statement
access plan is created for execution. To create this access plan, the optimizer relies on table and
index statistics. In order for the optimizer to generate the best access plan, up-to-date statistics
are required. Collecting statistics frequently (or at least when a significant amount of data
changes) is a good practice.

To collect statistics for a table, the following command can be issued:

db2 RUNSTATS ON TABLE <tabschema>.<tabname> WITH DISTRIBUTION AND DETAILED INDEXES ALL;

Statistics on the catalog tables should also be collected.
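
For example, to collect statistics for the ORDERS table (the schema name WCSADMIN is illustrative):

db2 RUNSTATS ON TABLE WCSADMIN.ORDERS WITH DISTRIBUTION AND DETAILED INDEXES ALL;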

Database (Oracle) performance considerations

The database is typically one of the potential bottleneck areas that can prevent HCL Commerce from scaling and performing well. Therefore, it is crucial that the database is tuned for your implementation.

If you are using an Oracle database as your HCL Commerce database, consider the following performance tuning recommendations:

 Keep your database statistics up-to-date:

Gathering statistics on the HCL Commerce schema objects helps the database choose the best execution plan for SQL queries. When you run an SQL query, the database converts the query into an execution plan and chooses the best way to retrieve the data. To choose the best execution plan for an SQL query, the Oracle database relies on statistics about the tables and indexes in the query. Choosing the best execution plan for your SQL queries improves the performance of the database, which in turn improves HCL Commerce performance.
It is recommended that you use the DBMS_STATS package instead of the ANALYZE command to gather your database statistics. From the SQL*Plus prompt, run the following commands:

exec dbms_stats.gather_database_stats;
exec dbms_stats.gather_schema_stats( ownname=>'schema_name', granularity=>'ALL', DEGREE=>3, OPTIONS=>'GATHER', CASCADE=>TRUE);

1. The first command gathers statistics for the entire database.
2. The second command gathers statistics for a schema, where schema_name is the name of your HCL Commerce schema.

For more information about using the DBMS_STATS package, see the Oracle
documentation.

 Review and verify your need for indexes on order-processing related database tables
and other tables where block contention occurs during high-peak workloads.

For an Oracle database, high-peak workloads might adversely affect performance during
order processing. This effect can occur due to block contention on an index that is
defined for the ORDERS database table or other tables that are queried or updated
frequently during high-peak workloads. For instance, contention can occur on an index
that is defined for the STORE_ID column of the ORDERS table. If the index where block
contention occurs does not provide significant performance benefits for your site, such as
to improve queries against the associated table, consider dropping the index.

Before you drop the index, verify through performance reports, such as Automatic
Workload Repository (AWR) reports, that the benefits for your site from keeping the index
are insignificant. If you have verified that the benefits for keeping the index are not
significant, you can drop the index.

Overview of the Data Load utility


The Data Load utility is an enhanced business object based loading utility. This utility provides an
efficient solution for loading information into your HCL Commerce database. You can also
customize the Data Load utility to load other types of data. The Data Load utility is the
recommended loading utility.

The Data Load utility supports loading data into a workspace. When you load data into a
workspace you can make and preview changes to managed assets, without affecting what is
running on your site.

The following general user roles interact with the Data Load utility:

Business user
Responsible for managing the business data.

Developer
Responsible for defining the data source template, business object mappings, and customizing the Data Load utility.

Site administrator
Responsible for the day-to-day operation of the Data Load utility.

Data Load utility - user interaction diagram

The following diagram describes how the user roles interact with the data
load utility:

1. The business user provides the developer with the business data.

2. The developer creates a data source template, which defines how source data must be formatted before the data is loaded.

3. The developer also creates the business object configuration file. The business object configuration file defines how the Data Load utility maps the input data to the business object and how to transform the business object to physical data.

4. The site administrator uses the business object configuration file to define and create the load order configuration file.

5. The site administrator sets the store and database settings in the environment configuration file.

6. The business data is formatted according to the rules of the data source template before the data is loaded to the database.

7. The formatted source data is provided to the site administrator.

8. The site administrator runs the Data Load utility along with the three configuration files (environment, load order, and business object configuration files) to load the formatted source data into the HCL Commerce database. After the utility runs, the site administrator also verifies the results of the load.

9. The business data is available in HCL Commerce to be managed by the business user.

For more information about how the Data Load process is structured,
see Data Load utility architectural overview. For more information about how
the Data Load utility works and what components are included in the process,
see Data Load utility framework process and components.

Running the Data Load utility

To use the Data Load utility to load data into your HCL Commerce database, you must first configure or create the required files. To run the Data Load utility, a business object configuration file, a load order configuration file, an environment configuration file, and a data source file are all required. For more information, see Configuring and running the Data Load utility.
Best practices
When you are using the Data Load utility to load data, follow these best
practices:

 Data Load utility best practices

If the data that you are loading is catalog, price, inventory, member, or Commerce Composer data, also review:

 General data load best practices

 Data Load best practices for Catalog

 Data Load utility best practices for Inventory

 Data Load best practices for Price

 Data Load best practices for Member

 Data Load utility best practices for Commerce Composer

Data Load utility limitations

 You cannot use Spring Data JPA finders in Data Load components. Instead, the Data Load framework provides a class, DBPreparedStatement, with which you can issue SQL commands to fetch data directly from the database.
 If you run the Data Load utility in parallel (multi-threaded) mode, that is, by setting multipleThreadsEnabled="true", the input CSV files must have header information. In addition, ensure that the corresponding wc-loader-object.xml file contains the property firstLineIsHeader="true". If the header information is absent, the Data Load utility can give an error while reprocessing the CSV files that contain error records. For more information about this setting, see Data Load parallelization.
 Newly created business objects might not show up immediately in the
storefront due to caching. To resolve this problem, invalidate the
dynamic cache manually. For more information about cache
invalidation, see Removing cache entries through the Cache Monitor.
 When you update business objects, the changes might not show up in
the Management Center. This issue can occur due to caching. To
resolve this problem, invalidate the data cache.
 You can load data in only CSV and XML formatted files. By default, most sample configuration files that are provided with HCL Commerce are configured to load CSV files. To load XML files, you must configure the utility to use the XML data reader. Your XML input files must also use a CSV-like structure. For more information, see File format for Data Load input files. If you want to load data in other formats, or in an XML structure that is not supported by the default XML data reader, you must create your own custom data reader.

 You must configure the Data Load utility to use a business object
mediator to map your input file data with the appropriate HCL
Commerce business objects. By default, you are provided with
business object mediators for loading data for business objects in the
following components:
o Catalog

o Inventory

o Price and Catalog filter

o Member

o Location

o Commerce Composer

o Promotions

o Marketing
To load other data, you can use the com.ibm.commerce.foundation.dataload.businessobjectmediator.TableObjectMediator, or you can create your own custom business object mediators.

 The TableObjectMediator does not support workspace locking.
 If your site uses HCL Commerce search, the delta search index might
not rebuild correctly when you delete some catalog objects with the
Data Load utility in delete mode. When you delete a child object of a
catalog entry or category with the utility in delete mode, both the child
and parent object are removed from the delta search index rebuild.
This removal can cause the parent catalog entry or category to no
longer be indexed or display correctly in the storefront.

Use the utility in replace mode to delete catalog objects when your
site uses HCL Commerce search. To delete objects with the utility in
replace mode, include the value 1 for the Delete column of an object
in your input file. If you do decide to delete catalog objects with the
utility in delete mode, run a full index rebuild after the load operation
completes.

 The member component business object mediators do not support the following actions with the Data Load utility:
o Deleting users or organizations

o Loading user passwords

o Modifying the parent organization of a member

o Modifying the distinguished name (DN) of an organization

o Modifying the entity type of an organization

 When you are loading userData into custom extension database tables, you cannot load data into columns that have a data type that is not supported by the utility.
 Run a separate data load for each data type. If you perform a single data load for two different data types (for example, product and category), it impacts the search index by producing unpredictable outcomes when certain business tasks (such as an NRT update) are performed by using the Management Center. To resolve this, re-index the data after the data load into the database completes successfully.

Procedures and samples


The following topics are available to help you learn more about the Data Load
utility.

The following topics are grouped by role and main task.

Site Administrator

 File format for Data Load input files: A data load input file contains the actual information that the Data Load utility populates into your database. Learn how to construct such files to ensure that the loading process is successful.
 Creating data in XML format: Learn how to create a data load input file in the supported XML format.
 Creating data in CSV format: Learn how to create a data load input file in the supported CSV format.
 Configuring the Data Load utility to run a file difference preprocess: If you routinely load the same generated Data Load input file from an external system or source, you can choose to run a file difference preprocess as part of the Data Load process to ensure that you are loading only new changes when you load your newest input file.
 Configuring the CSV data reader: The CSV data reader is provided with the Data Load utility. Learn how to configure the provided CSV data reader to change the way data is read from your CSV input files.
 Configuring the XML data reader: An XML data reader is provided by default with the Data Load utility. Learn how to configure this provided data reader to change the way data is read from your XML formatted input files.
 Configuring the component business object builder: Configure the BaseBusinessObjectBuilder and TableObjectBuilder business object builders by defining sub-elements and attribute values in the business object configuration file.
 Configuring the business object configuration file: Learn how to configure the business object configuration file that defines how to load the data into the database. In this file, you specify the implementation classes for your Data Reader, Business Object Builder, and Business Object Mediator components.
 Configuring the data load order: The data load order configuration file controls the load order of the Data Load utility. Learn how to configure your data load order file.
 Configuring a column exclusion list: You can configure a column exclusion list that causes the Data Load utility to avoid loading data into the specified columns of a table.
 Configuring the data load environment settings: Learn how to configure the environment variables that are used by the Data Load utility in your environment settings file. To configure the Data Load utility to load data into a workspace, you must add the workspace attribute to the business context in the environment settings file. Note: When you load data into a workspace, the Data Load utility respects the locking policy that is set in the workspace. For more information about workspace locking policies, see Workspaces locking policies.
 Data Load utility command syntax: Learn how to run the utility command that runs the Data Load process.
 Verifying the results of the data load: Learn how to verify that a load operation with the Data Load utility completed successfully.
 Loading data into workspaces using the Data Load utility: Load data into workspaces with the Data Load utility.
 Loading values for single and multiple value attributes: Load data for attribute values for attributes with a single value and attributes with multiple values.
 Reuse attribute assigned values with the Data Load utility: Configure the utility to reuse assigned values for attributes when the same value is needed for multiple catalog entries.
 Loading promotions with the Data Load utility: Load data to create or change promotions for a store.
 Loading customer segment members by email address with the Data Load utility: Load a list of email addresses to create a customer segment and populate the segment with members that are associated with the email addresses.
 Loading member group members by email address with the Data Load utility: Load members into a member group based on the member email addresses.
 Configuring a Data Load utility scheduler job: Use the HCL Commerce Administration Console to configure the Data Load utility to run as a scheduled job.
 Loading marketing objects with the Data Load utility: Load data to create marketing activities, campaigns, customer segments, content, attachments, and e-Marketing Spots.

Samples

 Scenario: Initial load: The initial load scenario applies when you finish creating and configuring a new HCL Commerce instance and then load your initial data into the HCL Commerce database.
 Scenario: Delta load: The delta load scenario applies when your HCL Commerce server is up and running and you want to insert, update, or delete your catalog, inventory, or price data.
 Scenario: Catalog entry update load: The utility can run in an update mode to update catalog entry data. The update mode replaces or adds data for only the columns that are specified in the input file. All other columns remain unchanged.
 Scenarios: Workspace locking for the Data Load utility: These scenarios detail the workspace locking policies that are in effect when the Data Load utility loads data into a workspace. These locking policies affect the loading of data into the workspace database.
 Sample: Setting up the Data Load utility: This sample demonstrates how to set up the Data Load utility for its first use.
 Catalog samples: These samples use CSV or XML files to demonstrate how to run the Data Load utility to load catalog data.
 Inventory samples: These samples use CSV files to demonstrate how to run the Data Load utility to load inventory data.
 Member samples: These samples use CSV or XML files to demonstrate how to run the Data Load utility to load member data. These samples include a sample for loading customer segment member data.
 Price samples: These samples use CSV files to demonstrate how to run the Data Load utility to load price data.
 Location samples: These samples use CSV files to demonstrate how to run the Data Load utility to load location data.
 Store configuration samples: This sample uses a CSV or XML file to demonstrate how to load configuration name-value pair properties for a store.
 Commerce Composer samples: These samples use CSV files to demonstrate how to load Commerce Composer assets with the Data Load utility.
 Promotion samples: These samples use input files to demonstrate how to load promotions and promotion folders, and how to load promotions into the promotion folders.
 Sample: Loading marketing data: These samples use input files to demonstrate how to load marketing activities, campaigns, customer segments, content, attachments, and e-Marketing Spots.

Examples

 Examples: Mapping catalog data: These examples use a CSV file to demonstrate how to insert, replace, or delete catalog data.
 Examples: Mapping inventory data: These examples use a CSV file to demonstrate how to insert, replace, or delete inventory data.
 Examples: Mapping pricing data: These examples use a CSV file to demonstrate how to insert, replace, or delete price data.
 Examples: Mapping catalog filter data: These examples use a CSV file to demonstrate how to insert, replace, or delete catalog filter data.
 Examples: Mapping member data: These examples use a CSV file to demonstrate how to insert, replace, or delete member data.

Developer

 Data Load utility architectural overview: An understanding of how the Data Load utility works, and the components that make up the Data Load utility.
 Customizing the Data Load utility: Learn how to customize elements of the Data Load utility to create custom data readers, column handlers, and business object mediators, and to load extension tables. By customizing the Data Load utility you can complete the following tasks: load data from sources that are not in a CSV or specific XML format; resolve data for database columns based on input values that cannot be mapped to the column value through the default HCL Commerce configuration; create a custom mediator to load data for extended or custom business objects; and load UserData into custom extension tables.
 Configuring an SFTP transport to retrieve external files for the Data Load utility: Customize an SFTP transport for a Data Load utility scheduled job to use to retrieve input files from an external source to load into HCL Commerce.

 Data Load utility architectural overview
To work with the Data Load utility, you must first have an understanding of how it works.
 Running the Data Load utility in the HCL Commerce development
environment
Run the Data Load utility inside the HCL Commerce development
environment when testing customizations or when debugging issues
with the utility. This configuration is less time-consuming than first
deploying to a production environment, and then running the utility for
testing purposes.
 File format for Data Load input files
To run the Data Load utility, an input file is required. The input file
must be a CSV or XML file. Your input file must follow a particular
format for the Data Load to be able to load your file.
 Data Load utility business object mediators
The data load business object mediator converts the business objects
into physical objects. Two types of mediators that are provided with
the Data Load utility are the component-based mediator and the
table-based mediator. Several component-based mediators are
available for catalog, inventory, and price components.
 Data Load utility table-based mediator and builder
Use the table object builder and table object mediator to load data
directly into a table with the Data Load utility when a component-
based business object mediator does not exist. You can use
the TableObjectMediator and TableObjectBuilder to load your
custom data directly into your target database tables. This mediator
and builder can ensure that the data is loaded into the correct table
and columns. You do not have to create a custom business object
mediator and builder or extend an existing mediator or builder.
 Data Load file difference preprocessing
You can run a file difference preprocess for routine data loads to
improve the Data Load utility performance for loading these files.
Running a file difference can reduce the loading time that is required
to load your routine updates to your HCL Commerce database,
reduce server usage time, and improve server performance.
 Data Load parallelization
The data load utility is improved in HCL Commerce Version 9.1 to
allow for parallelization. Parallelization allows for certain data load
jobs to complete much faster, by increasing the number of threads
that are used to load data into the database.
 Data Load utility best practices
When you use the Data Load utility, there are some general
configuration recommendations to ensure that you take advantage of
the full capability of data load. If you are loading catalog, price,
member, or inventory-related data with the Data Load utility, review
the component-related best practices.
 Configuring and running the Data Load utility
To configure and run the Data Load utility, create a file to load, define
the data reader, load order, and environment configuration files.
Then, enter the data load shell script command with modifying
parameters from a command-line interface.
 Scenarios: Data Load utility
By reviewing different scenarios for using the Data Load utility, you
can learn more about how you can use the utility.
 Samples: Data Load utility
The samples that are provided with the Data Load utility demonstrate
best-practice methods for loading data using common loading
scenarios. Use the samples as a template for loading your own data
to your store.
 Initializing the attribute dictionary
If your store does not have an attribute dictionary, set
the initAttributeDictionary property to initialize the attribute
dictionary and then use the data load utility to create the attribute
dictionary in Management Center.
 Substituting data load attribute values with variables
You can substitute the values of most attributes in the wc-
dataload.xml data load order configuration file and wc-
dataload-env.xml data load environment configuration files. By
using variable substitutions, you can change the value of attributes
without editing the configuration file. For example, you can substitute
the value of the user ID attribute for a variable.
 Loading data into workspaces using the Data Load utility
The Data Load utility supports loading data into a workspace. By
loading data into a workspace you can make and preview changes to
managed assets, without affecting what is running on your site.
 Loading values for single and multiple value attributes
You can configure the loading process for Catalog Upload and the
Data Load utility to load data separately for both single and multiple
value descriptive attributes. By separating these load processes, you
can ensure that the values for your single value and multiple value
attributes are updated accurately.
 Reuse attribute assigned values with the Data Load utility
You can use the Data Load utility to reuse assigned values for
attributes when the same value is needed for multiple catalog entries.
By reusing attribute assigned values across catalog entries, you can
reduce the number of duplicate values that are created in the
database.
 Loading customer segment members by email address with the Data
Load utility
You can create a customer segment that business users can manage
in the Marketing tool by loading a list of email addresses.
 Loading member group members by email address with the Data
Load utility
You can load members into member groups by resolving the unique
ID for a member with only the member email address. By resolving
the unique ID with the email address, you do not need to include the
member logon ID or distinguished name in your input file.
 Loading promotions with the Data Load utility
You can configure the Data Load utility to load promotion data in an
XML input file to create or change promotions in a store. By using the
Data Load utility to load promotions, you can quickly create or change
multiple promotions in a single operation.
 Loading marketing objects with the Data Load utility
You can configure the Data Load utility to load marketing data to
create customer segments, e-Marketing Spots, activities, marketing
content, and any associated assets. By using the Data Load utility to
load this data, you can quickly create multiple marketing objects and
types of objects in a single operation.
 Configuring a Data Load utility scheduler job
You can use the HCL Commerce Administration Console to schedule
a Data Load utility job for your site. By using a scheduler job, you can
configure the Data Load utility to routinely load an input file, such as
for loading frequently updated data.
Related concepts

 Data Load utility performance tuning

Related tasks

 Loading data into workspaces using the Data Load utility

Related reference

 Commerce Composer object input file definitions

Data Load parallelization

The data load utility is improved in HCL Commerce Version 9.1 to allow for parallelization. Parallelization allows certain data load jobs to complete much faster by increasing the number of threads that are used to load data into the database.

Important:

 Performance tuning of the Data Load utility and your data is required to utilize this feature
effectively. This feature can reduce overall Data Load performance. You must carefully
consider the relationship between parallelization, its configuration, the type of data, and
its particular structure.

 Data Load parallelization is only compatible with CSV formatted data.

In previous versions of HCL Commerce, the data load utility was a single-threaded application that was constrained by singleton classes that were not designed for parallel usage. This design limited the use of the utility for some large data jobs and hampered the performance of the tool. In some instances, long-running jobs could block users of the tool from getting work done. With this upgrade to the data load utility, multiple users of the tool can load data concurrently. In addition, shorter jobs allow future jobs to run sooner.
Architecture

The architectural enhancements made to the data load utility include the addition of a queue
where the reader of the CSV file creates batches of data to be processed. The queue has a
maximum size, which when reached temporarily halts the reader from further production of
batches. The reader thread will continue to enter batches of data into the queue as the batches
are consumed. After all of the data is read from the input file, the reader thread will place an
empty batch into the queue, and then exit with a data load summary report.

Each writer thread will remove one batch from the queue and process the batch to load data into
the database. When a writer thread gets an empty batch from the queue, it will place the empty
batch back into the queue and the writer thread will exit with a data load summary report.

When all writer threads have finished and exited, the Data Load utility checks each writer thread for errors. If the writer threads created reprocessing-error CSV files, the Data Load utility merges all error reprocess CSV files into a single error reprocess CSV file, and then reloads this CSV file using a single writer thread. Once all writer threads finish, the Data Load utility produces a combined data load summary report.

Performance considerations and error handling

Due to the complex nature of loading hierarchical data with multiple threads, error handling must be carefully considered when you enable and configure parallelization. This is especially true when it comes to performance tuning for your particular environment and dataset.

Warning: If your data contains many line items that reference the same data, SQL deadlocks can occur. Ensure that the data you are loading is free of duplicate or contradictory entries, and structured in a way that prevents multiple writer threads from writing to the same database rows.

When implementing parallelization, consider the following as best practice:

 Use the existing data load parameters commitCount, batchSize, and maxError per LoadItem to tune your Data Load utility performance.

 Format your data to leverage parallelization appropriately. For example, if your data is hierarchical, place parent data together toward the beginning of the file. This reduces the chance of attempting to load child data for which the parent data is not yet present.

Configurable parameters and defaults

By default, the data load utility is set to run in single-thread mode. This ensures the same
expected job behavior and performance as users have come to expect. The following new
parameters have been added to control the parallelization of the data load utility.

A sample data load configuration that includes the full use of these parameters is available here.

numberOfThreads (Integer, default: 1)
The maximum number of individual writer threads that take batches of data from the queue, process them in order, and write the processed data into the database. By default, the numberOfThreads parameter is set to 1, meaning that the data load utility runs in single-threaded (legacy) mode. The maximum number of threads is 8; if a number greater than 8 is provided, the maximum number of threads is used. From internal performance testing, HCL recommends that 4 threads be used. Using more than four threads has been shown to reduce overall load performance, and can result in errors such as the following:
com.ibm.commerce.foundation.dataload.exception.DataLoadApplicationException: A problem occurred while initializing the property information during the business object builder initialization.

inputDataListSize (Integer, default: 20)
The maximum number of CSV line entries that are included in a batch of data to be added to the queue. Each writer thread handles a single batch of data from the queue; once the batch is loaded, the thread is freed to process another batch from the queue.

queueSize (Integer, default: numberOfThreads)
The maximum number of batches that can exist in the queue. Once the queue is filled with the maximum number of batches, the reader waits for batches from the queue to be consumed before continuing to produce and queue further batches. By default, this is set to the numberOfThreads property value.

multipleThreadsEnabled (Boolean, default: false)
Defines whether parallelization is enabled for the specific load item. This parameter is set manually per LoadItem; if it is not specified, its default value of false is assumed. By setting this parameter to false for a specific LoadItem, you override the set parallelization parameters and force the data load utility into single-threaded operation.
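
The following sketch shows how these parameters might be combined in a load order configuration file. The attribute placement on the _config:LoadOrder and _config:LoadItem elements and the file name are assumptions that are based on the parameter descriptions above, not a definitive reference:

<_config:LoadOrder commitCount="1000" batchSize="500" dataLoadMode="Replace"
numberOfThreads="4" inputDataListSize="20" queueSize="4">
<!-- multipleThreadsEnabled is set per LoadItem; the item name and file are illustrative -->
<_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml"
multipleThreadsEnabled="true">
...
</_config:LoadItem>
</_config:LoadOrder>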

Index modules

Index Modules are modules created per index and control all aspects related to an index.

Index settings

Index level settings can be set per-index. Settings may be:

static

They can only be set at index creation time or on a closed index, or by using the update-index-
settings API with the reopen query parameter set to true (which automatically closes and reopens
impacted indices).

dynamic

They can be changed on a live index using the update-index-settings API.

Changing static or dynamic index settings on a closed index could result in incorrect settings that are
impossible to rectify without deleting and recreating the index.

Static index settings

Below is a list of all static index settings that are not associated with any specific index module:

index.number_of_shards

The number of primary shards that an index should have. Defaults to 1. This setting can only be set at
index creation time. It cannot be changed on a closed index.

The number of shards are limited to 1024 per index. This limitation is a safety limit to prevent
accidental creation of indices that can destabilize a cluster due to resource allocation. The limit can
be modified by specifying export ES_JAVA_OPTS="-Des.index.max_number_of_shards=128" system
property on every node that is part of the cluster.

index.number_of_routing_shards

Integer value used with index.number_of_shards to route documents to a primary shard. See the _routing field.
Elasticsearch uses this value when splitting an index. For example, a 5 shard index
with number_of_routing_shards set to 30 (5 x 2 x 3) could be split by a factor of 2 or 3. In other
words, it could be split as follows:

 5 → 10 → 30 (split by 2, then by 3)

 5 → 15 → 30 (split by 3, then by 2)

 5 → 30 (split by 6)

This setting’s default value depends on the number of primary shards in the index. The default is
designed to allow you to split by factors of 2 up to a maximum of 1024 shards.

In Elasticsearch 7.0.0 and later versions, this setting affects how documents are distributed across
shards. When reindexing an older index with custom routing, you must explicitly
set index.number_of_routing_shards to maintain the same document distribution. See the related
breaking change.
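
For instance, the five-shard example above could be declared at index creation time as follows (the index name is illustrative):

PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_routing_shards": 30
  }
}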

index.codec

The default value compresses stored data with LZ4 compression, but this can be set
to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower
stored fields performance. If you are updating the compression type, the new one will be applied
after segments are merged. Segment merging can be forced using force merge.

index.routing_partition_size

The number of shards a custom routing value can go to. Defaults to 1 and can only be set at index
creation time. This value must be less than the index.number_of_shards unless
the index.number_of_shards value is also 1. See Routing to an index partition for more details about
how this setting is used.

index.soft_deletes.enabled

[7.6.0] Deprecated in 7.6.0. Creating indices with soft-deletes disabled is deprecated and will be removed in future Elasticsearch versions. Indicates whether soft deletes are enabled on the index. Soft deletes can only be configured at index creation and only on indices created on or after Elasticsearch 6.5.0. Defaults to true.

index.soft_deletes.retention_lease.period

The maximum period to retain a shard history retention lease before it is considered expired. Shard
history retention leases ensure that soft deletes are retained during merges on the Lucene index. If a
soft delete is merged away before it can be replicated to a follower the following process will fail due
to incomplete history on the leader. Defaults to 12h.

index.load_fixed_bitset_filters_eagerly

Indicates whether cached filters are pre-loaded for nested queries. Possible values are true (default)
and false.

index.shard.check_on_startup
Expert users only. This setting enables some very expensive processing at shard startup and is only
ever useful while diagnosing a problem in your cluster. If you do use it, you should do so only
temporarily and remove it once it is no longer needed.

Elasticsearch automatically performs integrity checks on the contents of shards at various points
during their lifecycle. For instance, it verifies the checksum of every file transferred when recovering
a replica or taking a snapshot. It also verifies the integrity of many important files when opening a
shard, which happens when starting up a node and when finishing a shard recovery or relocation.
You can therefore manually verify the integrity of a whole shard while it is running by taking a
snapshot of it into a fresh repository or by recovering it onto a fresh node.

This setting determines whether Elasticsearch performs additional integrity checks while opening a
shard. If these checks detect corruption then they will prevent the shard from being opened. It
accepts the following values:

false

Don’t perform additional checks for corruption when opening a shard. This is the default and
recommended behaviour.

checksum

Verify that the checksum of every file in the shard matches its contents. This will detect cases where
the data read from disk differ from the data that Elasticsearch originally wrote, for instance due to
undetected disk corruption or other hardware failures. These checks require reading the entire shard
from disk which takes substantial time and IO bandwidth and may affect cluster performance by
evicting important data from your filesystem cache.

true

Performs the same checks as checksum and also checks for logical inconsistencies in the shard, which
could for instance be caused by the data being corrupted while it was being written due to faulty
RAM or other hardware failures. These checks require reading the entire shard from disk which takes
substantial time and IO bandwidth, and then performing various checks on the contents of the shard
which take substantial time, CPU and memory.

Dynamic index settings

Below is a list of all dynamic index settings that are not associated with any specific index module:

index.number_of_replicas

The number of replicas each primary shard has. Defaults to 1.

WARNING: Configuring it to 0 may lead to temporary availability loss during node restarts or permanent data loss in case of data corruption.

index.auto_expand_replicas

Auto-expand the number of replicas based on the number of data nodes in the cluster. Set to a dash
delimited lower and upper bound (e.g. 0-5) or use all for the upper bound (e.g. 0-all). Defaults
to false (i.e. disabled). Note that the auto-expanded number of replicas only takes allocation
filtering rules into account, but ignores other allocation rules such as total shards per node, and this
can lead to the cluster health becoming YELLOW if the applicable rules prevent all the replicas from
being allocated.

If the upper bound is all then shard allocation awareness and cluster.routing.allocation.same_shard.host are ignored for this index.

index.search.idle.after

How long a shard can go without receiving a search or get request before it is considered search idle. Defaults to 30s.

index.refresh_interval

How often to perform a refresh operation, which makes recent changes to the index visible to search. Defaults to 1s. Can be set to -1 to disable refresh. If this setting is not explicitly set, shards that haven't seen search traffic for at least index.search.idle.after seconds will not receive background refreshes until they receive a search request. Searches that hit an idle shard where a refresh is pending will trigger a refresh as part of the search operation for that shard only. This behavior aims to automatically optimize bulk indexing in the default case when no searches are performed. To opt out of this behavior, an explicit value of 1s should be set as the refresh interval.
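
As an illustration, a common bulk-indexing pattern is to disable refresh for the duration of a large load and then restore it (the index name is illustrative; review the number_of_replicas warning above before also dropping replicas during the load):

PUT /my-index-000001/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}

After the bulk load completes, restore the interval:

PUT /my-index-000001/_settings
{
  "index": {
    "refresh_interval": "1s"
  }
}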

index.max_result_window

The maximum value of from + size for searches to this index. Defaults to 10000. Search requests take
heap memory and time proportional to from + size and this limits that memory. See Scroll or Search
After for a more efficient alternative to raising this.

index.max_inner_result_window

The maximum value of from + size for inner hits definition and top hits aggregations to this index.
Defaults to 100. Inner hits and top hits aggregation take heap memory and time proportional to from
+ size and this limits that memory.

index.max_rescore_window

The maximum value of window_size for rescore requests in searches of this index. Defaults
to index.max_result_window which defaults to 10000. Search requests take heap memory and time
proportional to max(window_size, from + size) and this limits that memory.

index.max_docvalue_fields_search

The maximum number of docvalue_fields that are allowed in a query. Defaults to 100. Doc-value
fields are costly since they might incur a per-field per-document seek.

index.max_script_fields

The maximum number of script_fields that are allowed in a query. Defaults to 32.

index.max_ngram_diff

The maximum allowed difference between min_gram and max_gram for NGramTokenizer and
NGramTokenFilter. Defaults to 1.

index.max_shingle_diff
The maximum allowed difference between max_shingle_size and min_shingle_size for
the shingle token filter. Defaults to 3.

index.max_refresh_listeners

Maximum number of refresh listeners available on each shard of the index. These listeners are used
to implement refresh=wait_for.

index.analyze.max_token_count

The maximum number of tokens that can be produced using _analyze API. Defaults to 10000.

index.highlight.max_analyzed_offset

The maximum number of characters that will be analyzed for a highlight request. This setting is only
applicable when highlighting is requested on a text that was indexed without offsets or term vectors.
Defaults to 1000000.

index.max_terms_count

The maximum number of terms that can be used in Terms Query. Defaults to 65536.

index.max_regex_length

The maximum length of regex that can be used in Regexp Query. Defaults to 1000.

index.query.default_field

(string or array of strings) Wildcard (*) patterns matching one or more fields. The following query
types search these matching fields by default:

 More like this

 Multi-match

 Query string

 Simple query string

Defaults to *, which matches all fields eligible for term-level queries, excluding metadata fields.

index.routing.allocation.enable

Controls shard allocation for this index. It can be set to:

 all (default) - Allows shard allocation for all shards.

 primaries - Allows shard allocation only for primary shards.

 new_primaries - Allows shard allocation only for newly-created primary shards.

 none - No shard allocation is allowed.

index.routing.rebalance.enable

Enables shard rebalancing for this index. It can be set to:

 all (default) - Allows shard rebalancing for all shards.

 primaries - Allows shard rebalancing only for primary shards.


 replicas - Allows shard rebalancing only for replica shards.

 none - No shard rebalancing is allowed.

index.gc_deletes

The length of time that a deleted document’s version number remains available for further versioned
operations. Defaults to 60s.

index.default_pipeline

Default ingest pipeline for the index. Index requests will fail if the default pipeline is set and the
pipeline does not exist. The default may be overridden using the pipeline parameter. The special
pipeline name _none indicates no default ingest pipeline will run.

index.final_pipeline

Final ingest pipeline for the index. Indexing requests will fail if the final pipeline is set and the
pipeline does not exist. The final pipeline always runs after the request pipeline (if specified) and the
default pipeline (if it exists). The special pipeline name _none indicates no final ingest pipeline will
run.

You can’t use a final pipeline to change the _index field. If the pipeline attempts to change
the _index field, the indexing request will fail.

index.hidden

Indicates whether the index should be hidden by default. Hidden indices are not returned by default
when using a wildcard expression. This behavior is controlled per request through the use of
the expand_wildcards parameter. Possible values are true and false (default).

Settings in other index modules

Other index settings are available in index modules:

Analysis

Settings to define analyzers, tokenizers, token filters and character filters.

Index shard allocation

Control over where, when, and how shards are allocated to nodes.

Mapping

Enable or disable dynamic mapping for an index.

Merging

Control over how shards are merged by the background merge process.

Similarities

Configure custom similarity settings to customize how search results are scored.

Slowlog

Control over how slow queries and fetch requests are logged.
Store

Configure the type of filesystem used to access shard data.

Translog

Control over the transaction log and background flush operations.

History retention

Control over the retention of a history of operations in the index.

Indexing pressure

Configure indexing back pressure limits.

X-Pack index settings

Index lifecycle management

Specify the lifecycle policy and rollover alias for an index.
