Performance
Topics in the Performance section describe the means by which to plan, implement, test, and revisit the optimization of HCL Commerce site performance.
Methodology
Like site security, a proper approach to site performance requires planning, execution,
testing, monitoring, and maintenance. The following collections of topics describe each
phase of this process, and contain the recommended considerations and best practices
for getting the most out of your HCL Commerce deployment.
Setting cache providers in HCL Commerce
Starting with HCL Commerce Version 9.1, all caches are automatically configured to use
the HCL Cache. However, the cache provider for each cache can be set to use the
DynaCache provider or the WebSphere eXtreme Scale DynaCache provider.
HCL Cache
The HCL Cache integrates with DynaCache and improves performance and scalability
with features such as remote caching, real-time monitoring, and auto-tuning capabilities.
Experiments considerations
When experiments that you have created in WebSphere Commerce Accelerator are
running in the store, the Scheduler launches a job that determines whether any of the
current experiments have expired. The job compares the expiration date specified in the
experiment to the current system date. When an experiment is identified as expired, its
status is updated in the database, and the experiment is prevented from displaying to
customers.
Website performance tuning
There are four steps for evaluating the performance of an HCL Commerce website that is
based on the Transaction server.
Tuning the price rule object cache
You can improve performance by changing the size of the cache used to store price rule
business objects during storefront runtime.
Data Load utility performance tuning
Scheduled Data Load utility jobs can affect HCL Commerce performance. You can
reduce the impact of this process by tuning the data load performance appropriately for
your implementation.
Workspaces performance tuning
Workspaces use database views instead of tables to retrieve data. Retrieval of
underlying data might be more time-consuming because of the complexity of SQL
statements that are used in workspace view definitions.
Configuring custom DynaCache objects in WebSphere Liberty
The WebSphere Liberty profile can be configured with a distributed map for the custom
cache objects. To configure the custom cache objects for WebSphere Liberty in Runtime
and Development, follow the steps outlined below:
Configuring custom DynaCache objects in HCL Commerce
Define the custom cache in WebSphere Application Server. You can then use the cache
name to store and retrieve the cache objects.
Preparing HCL Commerce for the peak shopping periods
When preparing for peak shopping periods, such as the holiday season, use the following
best practices to effectively manage your HCL Commerce environment for peak
efficiency, and to ensure that your HCL Commerce site is ready to handle the increased
load.
OneTest Performance sample script for Emerald B2C Store
HCL OneTest provides software testing tools to support a DevOps approach: API testing,
functional testing, UI testing, performance testing and service virtualization. This
approach aids in test automation, for earlier and more frequent discovery of errors.
Docker performance tuning
The following performance tuning information can be used to help you configure your
containerized HCL Commerce environments for optimal performance.
Thread Monitor - Yaml configuration file
The Thread Monitor tool gathers thread dumps and Javacores at a configured interval,
and during events such as high WebContainer/Default_Executor pool thread
usage. A YAML configuration file must exist with the
name /SETUP/support/thread_monitor.yaml, or under the location specified by
the THREAD_MONITOR_CFG environment variable.
Search health check
The search health check API is used to regularly check the status of Commerce
containers to make sure they are in a healthy state.
Elasticsearch performance tuning
You have numerous options when tuning the performance of NiFi and Elasticsearch. The
following guide introduces tools for monitoring performance and validating key tuning
parameters, and provides a performance tuning strategy that you can use with the
component.
Optimizing index build and overall flow
Optimizations are provided for full index builds and for tuning parameters. Potential
improvements that can be implemented for Near Real Time (NRT) index updates are not
described.
Monitoring and understanding Elasticsearch and NiFi metrics
You can use Grafana and related tools to analyze the performance of the Ingest pipeline,
and Kibana to do the same with Elasticsearch.
Tunable Parameters in the setup of NiFi and Elasticsearch
Learn how to modify the values of the tunable parameters, including their default values
and how they can be improved in different circumstances.
Elasticsearch scaling and hardware requirements
You can achieve additional capacity by implementing or extending pod clustering on
Elasticsearch, NiFi, or both. You can also consider the hardware footprint and key
resources that impact the processing and index creation speed.
Elasticsearch with dedicated Auth and Live nodes
For production environments, Elasticsearch is recommended in a clustered configuration
for high availability and scalability.
Minimal and recommended configuration tunable parameter values
Two configurations of the NiFi tunable parameter values are presented: a minimal and an
optimal set.
Measurement
As each deployment of HCL Commerce is different, a measured approach to modifying
your defaults must be taken to successfully tune your site. The following collection of
topics provides details on how to measure and troubleshoot site performance.
Emerald REST Caching On TS Server for Commerce 9.1
Emerald Store is powered by the REST framework, and the cache is implemented using
a REST servlet.
Methodology
Like site security, a proper approach to site performance requires planning, execution, testing,
monitoring, and maintenance. The following collections of topics describe each phase of this
process, and contain the recommended considerations and best practices for getting the most
out of your HCL Commerce deployment.
Design phase
During the design phase of your project, it is important to consider performance priorities.
It is important to design a solution that has a good balance of features and performance
consideration, build a performance test environment, and begin to deploy tools that are
required to support your performance strategy. It is also important to identify architecture
limitations that might change a non-functional requirement.
Development phase
During the development phase, it is important to measure the response time of business
critical steps in the development environment.
Testing phase
During the testing phase, it is important to start with simple, common test cases and
gradually go to more complex scenarios and environment configurations. Other priorities
include measuring the performance of the application as load scales up to projected peak
and identifying and resolving the root causes of resource constraints and application
bottlenecks.
Maintenance phase
Post-launch, during the maintenance phase, instrument and monitor performance
indicators in the production environment to enable the team to proactively identify and
address potential issues before users are affected. Priorities also include fostering
communication between marketing, performance, and operations teams to be better
prepared for promotional events, and using the data that is captured from production to
optimize planning for future marketing and promotional activities.
Web server considerations
Even though the web server is a mature technology, it can still be tuned to get better
performance, especially on large websites. This section covers the common performance
considerations for all the web servers, and is operating system independent.
Application server considerations
HCL Commerce is ultimately another Java application that runs on top of WebSphere
Application Server and WebSphere Liberty. As a result, WebSphere Application
Server acts as the operating system for HCL Commerce. Optimizing the application
servers improves the performance of HCL Commerce.
JDBC_MONITOR_ENABLED parameter
The JDBC Monitor (Java Database Connectivity) is a debugging tool that allows you to
examine SQL statements that are executed in any application, for performance and
troubleshooting purposes. This document provides the installation steps for the JDBC Monitor.
Tuning
An HCL Commerce deployment is made up of many individual parts that work together to
provide the e-commerce experience. By default, each part is configured to provide a
base level of performance. However, individual changes can be made to potentially gain
extra performance. The following collection of topics provides details on how to make
these performance changes to the individual components of your site.
Design phase
During the design phase of your project, it is important to consider performance priorities. It is
important to design a solution that has a good balance of features and performance
consideration, build a performance test environment, and begin to deploy tools that are required
to support your performance strategy. It is also important to identify architecture limitations that
might change a non-functional requirement.
During the design phase, a number of risks must be identified and assessed. Here are some examples:
The possibility that features might put a greater demand on resources than projected
over the capacity plan estimate.
Application performance and user experience might be adversely affected by third-party
application interfaces.
The lack of an appropriate test environment might hamper performance testing
capabilities and put test results into question.
Caching strategy
When you plan a caching strategy for HCL Commerce, considerations such as which
pages will be cached, and where they will be cached are important. These decisions are
influenced by whether you are caching a local (Transaction) server, or a remote (Store)
server. To help with these decisions, consider the following approaches.
Marketing overview: precision marketing features
Marketing Managers can use the extensive precision marketing features in the marketing
tool to deliver targeted marketing messages to customers.
Facet Navigation widget
This widget automatically retrieves and displays a list of facets, such as the Brand, Price,
and Color facets. For each facet value, the number of matching catalog entries is
displayed in parentheses. Typically, this widget is placed in the left sidebar of the page.
Marketing cache design considerations
When you are designing the marketing caching for your site and stores, there are many
options, enhancements, and best practices to consider for improving marketing
performance.
Promotion evaluation considerations
When you are designing promotions for your site, consider how your promotions are
being evaluated. How you design your promotions and configure your promotion
evaluation process can affect your site performance during promotion evaluation. When
you are creating promotions, consider the promotion type, the promotion conditions, the
size of orders that are evaluated, and the agenda builder that is used for promotion
evaluation.
Caching strategy
When you plan a caching strategy for HCL Commerce, considerations such as which pages will
be cached, and where they will be cached are important. These decisions are influenced by
whether you are caching a local (Transaction) server, or a remote (Store) server. To help with
these decisions, consider the following approaches.
Theoretically, caching takes place in the tier closest to the user. In reality, other factors such as
security and user-specific data can influence the choice of the best place to cache the content.
To maximize the benefit of dynamic caching, elements of a page can be fragmented as finely as
possible so that they can be cached independently in different cache entries.
For example, the non-user specific, non-security sensitive fragments are generally useful to
many users, and can be cached in a more public space and closer to the user. The security
sensitive data can be cached behind the enterprise firewall.
For stores that run on the Transaction server, caching outside of WebSphere Application Server
can be used with larger databases and sites to improve performance. Edge Server and the ESI
cache-plugin are provided with WebSphere Application Server to provide extra caching
capability. Session information (language ID, preferred currency ID, parent Organization, contract
ID, and member group) must be stored in session cookies. The cookies are required in order for
caching to be done on a server external to WebSphere Application Server.
All web pages consist of smaller and often simpler fragments. An example of a page fragment
might be a header, sidebar, footer, or an e-Marketing Spot. Breaking a web page into fragments
or components makes more caching possible for any page, even for personalized pages.
Fragments can be designed to maximize their reusability.
Caching a whole web page means that the entire page is cached as a large cache entry that
includes all the content from all fragments that have no includes or forwards. This approach can
save a significant amount of application server processing and is typically useful when the
external HTTP request contains all the information that is needed to find the entry.
If web pages are broken into different fragments and the fragments are cached individually, then
some fragments can be reused for a wider audience. When a web page is requested, then
different fragments are reassembled to produce the page. For more information, see Full page
and fragment caching.
If the page output has sections that are user-dependent, then the page output is cached in a
manner that is known as fragment caching. That is, the JSP pages are cached as separate
entries and are reassembled when they are requested. If the page output always produces the
same result based on the URL parameters and request attributes, then this output can be cached
with a cache-entry. Use the consume-subfragments (CSF) property, and HCL Commerce's
store controller servlet (com.ibm.commerce.struts.ECActionServlet for HCL Commerce Version
9.0.0.x, or com.ibm.commerce.struts.v2.ECActionServlet for Version 9.0.x) as the servlet name.
Web pages can be cached by using full page caching or fragment caching, or a combination of
both methods.
Caching remote (Store) server pages
If you are using remote stores that run under the WebSphere Liberty Profile, your caching
strategy must change to reflect the containerization of the Transaction and Search servers. In
particular, you need to cache the results of REST calls differently.
Your selection of pages to be cached is largely similar for local and remote strategies. Aside from
the servlet cache, a remote store server cache contains not only the JSP/Servlet rendering
result, but also the remote REST access result. One thing to bear in mind is that calls that were
previously local in Local Store topologies are now remote calls. Therefore, you have two
considerations. You still need to provide rapid response times to calls from the customer
browser, but you must also minimize the number of calls that are passed to remote servers. You
can cache content that is frequently fetched from the Transaction or Search servers, such as
rendering results and REST results for common remote queries.
Consider caching REST tag results. If the REST result is user-, security-, or environment-neutral,
then it is a candidate for caching in the REST result cache. You can use the wcf:rest tag
attribute "cached" to declare that a call's result can be cached.
You can trigger cache invalidation passively or actively. Passive invalidation uses the Time To
Live (TTL) parameter of cache entries for webpages and fragments to trigger the action. After the
TTL time expires, the cache data container will trigger cache invalidation. This configuration
works best when you set the TTL at the level of whole web pages, and refresh the cache daily.
When you use the TTL parameter, any custom logic you create is overridden by the cache
expiry.
If you are using Elasticsearch with HCL Commerce Version 9.1, your caching solution is Redis.
You can monitor and control Redis using the Cache Manager, as described in HCL Cache. For
information about caching and invalidation in Elasticsearch, see HCL Cache with Redis.
For more information about configuring Apache Kafka, see Cache invalidation using Kafka and
ZooKeeper.
Alternatively, if you use WebSphere eXtreme Scale as the centralized cache data container,
cache invalidation is also centralized. In this case, you do not need to use a separate messaging
system such as Kafka.
The Prometheus and Grafana integration includes pre-defined dashboards that are relevant to cache tuning:
Redis
If Redis is installed with the Prometheus exporter, the available dashboard can display
information about the connected clients, memory used, commands executed, and more.
Grafana also supports a Redis datasource. The Redis Dashboard retrieves information directly
from Redis by querying the Redis datasource. The output complements the information provided
by the Redis Prometheus exporter.
grafana:
  plugins:
    - redis-datasource
WebSphere cache monitor
Although the WebSphere cache monitor is supported, the HCL Cache provides newer
functions and tools that are more suitable for managing a distributed cache system in
production.
logMetricsFrequency configuration in HCL Cache
The logMetricsFrequency configuration option can be used to specify, in seconds, the
frequency at which cache statistics are written to the logs. This can be especially useful
for environments where the Prometheus and Grafana integration is not available.
HCL Cache configurable Prometheus metrics
The HCL Cache provides cache level configurations to customize the metrics created for
the Prometheus integration.
1. Cache monitoring: The HCL Cache integrates with Prometheus and Grafana for real-time
monitoring and alerting.
2. Cache invalidations: The cache manager provides REST APIs for clears and
invalidations.
3. Troubleshooting: The cache manager has troubleshooting APIs, such as listing the cache
ids for a dependency id, or details for a cache id. The Redis database can also be
queried to inspect the cache or to monitor invalidations. See Troubleshooting for details.
The WebSphere Cache monitor can be used in development to inspect the contents of Servlet
and JSP fragment caching, and to assist with the development of cachespec.xml rules.
The WebSphere Cache monitor only displays the contents and statistics of local caches. Cache
clears and invalidations issued from the monitor are also executed on the remote cache and
propagated to other containers (as applicable).
logMetricsFrequency configuration in HCL Cache
The logMetricsFrequency configuration option can be used to specify, in seconds, the
frequency at which cache statistics are written to the logs. This can be especially useful for
environments where the Prometheus and Grafana integration is not available.
Enabling logMetricsFrequency
The logMetricsFrequency setting is a top level configuration option. See cache configuration for
details.
apiVersion: v1
data:
  cache_cfg-ext.yaml: |-
    redis:
      enabled: true
      yamlConfig: "/SETUP/hcl-cache/redis_cfg.yaml" # Please leave this line untouched
    logMetricsFrequency: 60
    cacheConfigs:
      baseCache:
        remoteCache:
          shards: 5
  redis_cfg.yaml: |-
    ...
Although changes are not typically required, if you are integrating with a third-party monitoring
system and there is a cost associated with the retrieval or storage of metrics, these
configurations can be used to fine-tune the metrics that are created.
Cache configurations
Metrics are configurable at the cache level. Changes can be applied to a single cache, or to the
default configuration using defaultCacheConfig. See cache configuration for details.
The Timer metrics used by the HCL Cache support histograms for the calculation of
percentiles. The tracking of histogram values requires the definition of additional metrics.
This support can be disabled to reduce the number of metrics created. For example, a
single Timer for remote cache clears generates series such as the following:
hclcache_cache_clears_total{cachespace="demoqaauth",name="baseCache",scope="local",} 100.0
hclcache_cache_clears_duration_seconds_sum{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 1.3296758
hclcache_cache_clears_duration_seconds_max{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 0.0897587
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="1.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="3.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="5.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="7.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.001",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.003",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.005",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.01",} 23.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.05",} 99.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.1",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.5",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="+Inf",} 100.0
hclcache_cache_clears_duration_seconds_count{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 100.0
The default histogram configuration is as follows:
defaultCacheConfig:
  metrics:
    timerNanoBuckets:
      - 100000     # 0.1 ms
      - 300000     # 0.3 ms
      - 500000     # 0.5 ms
      - 700000     # 0.7 ms
      - 1000000    # 1.0 ms
      - 3000000    # 3.0 ms
      - 5000000    # 5.0 ms
      - 10000000   # 10.0 ms
      - 50000000   # 50.0 ms
      - 100000000  # 100.0 ms
      - 500000000  # 500.0 ms
Values are in nanoseconds.
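If the cost of storing these series is a concern, the same option can be overridden for an individual cache with fewer buckets. The following fragment is a sketch only; the bucket values are illustrative, not product defaults:
cacheConfigs:
  baseCache:
    metrics:
      timerNanoBuckets:
        - 1000000    # 1 ms
        - 10000000   # 10 ms
        - 100000000  # 100 ms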
Although the use of a remote cache in Redis is highly recommended, it is not required in all
configurations.
Elasticsearch: Redis is required. The Redis servers must be shared by authoring and
live. This is required as NiFi must interact with both Authoring and Live environments.
Solr Search: Redis is recommended but not required. Migrated environments that do not
implement Redis must continue using Kafka to replicate invalidations. Redis replaces
Kafka for invalidations and can also act as a remote cache. Authoring and Live can be
configured with separate Redis servers. This is recommended for production
environments.
Selecting a Redis topology
Redis can be installed in a variety of configurations. The selection depends on your performance
and high-availability requirements. The alternatives include:
Using the Bitnami Charts to install Redis within the Kubernetes cluster
Redis standalone (single instance) will work appropriately for pre-production and many
production environments. Larger environments can benefit from a clustered Redis, which
allows for multiple masters and replicas.
A Redis Cluster is configured with multiple masters (3+). Caches and shards will be
distributed across the master servers. Each master can be configured with zero or more
replicas. Replicas can help scalability by handling read traffic (GET operations) and can
take over should the master become unavailable.
Required configurations
Use the following configurations in the Redis server. See key eviction for details.
maxmemory
Indicates the amount of memory available to hold cached data, and should be set to a
non-zero value.
maxmemory-policy
Must be set to volatile-lru.
Besides the topology, consider the following key tuning configurations. Most apply to locally
installed Redis, but can be relevant to Redis as-a-service as well.
To validate or compare the performance of the Redis server, you can use the Redis
benchmark utility, and the HCL Cache's hcl-cache-benchmark utility.
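For example, a quick throughput baseline can be taken with the Redis benchmark utility. The host name and request counts below are illustrative only:
redis-benchmark -h redis-master.redis.svc.cluster.local -p 6379 -c 50 -n 100000 -d 1024 -t set,get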
Ensure that the limits on the Redis pods are set appropriately (see the sketch after this list):
Storage
If persistence is used, the Persistent Volumes need to be sized accordingly. For example,
the Bitnami Charts set a limit of 8 GB by default. This might not be enough for production
environments and might lead to a crash.
CPU
CPU throttling can freeze the Redis server. Kubernetes is very aggressive with CPU
throttling. To avoid it, set a high limit, or remove the CPU limit for the Redis pods.
Memory
The memory required by the Redis container is a function of
the maxmemory configuration. maxmemory should be less than 70% of the container
limit.
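As a sketch, these limits might be expressed in the Bitnami values file as follows. The values are examples only, and assume a maxmemory of 10gb, which is roughly 67% of the 15Gi memory limit:
master:
  resources:
    requests:
      cpu: "2"
      memory: 15Gi
    limits:
      # No CPU limit, to avoid Kubernetes CPU throttling.
      memory: 15Gi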
Redis persistence
Redis includes persistence options (AOF/RDB) that save the memory contents to disk.
This allows Redis to recover the memory contents (cache) in case of a crash. For use
with HCL Cache, enabling RDB persistence and disabling AOF should be sufficient.
Persistence is required when replicas are used. Otherwise, it is optional and Redis does
not require a Persistence Volume. Without persistence, in the unlikely case that Redis
crashes or becomes unresponsive, Kubernetes should be able to restart the service
almost instantaneously, but with an empty cache.
If the Commerce site is not tuned to absorb a cache clear during peak traffic periods,
persistence is recommended. When persistence is enabled, the startup will be delayed
by a number of seconds while Redis re-loads the memory cache from disk. It is also
possible that if the Kubernetes node crashes, manual intervention may be required to
release the Persistence Volume from the problematic node and to allow Kubernetes to
reschedule the pod (due to the ReadWriteOnce (RWO) access mode).
Note: Disclaimer: Redis is a registered trademark of Redis Labs Ltd. Any rights therein
are reserved to Redis Labs Ltd. Any use by HCL is for referential purposes only and does
not indicate any sponsorship, endorsement, or affiliation between Redis Labs Ltd. and HCL.
Memory management in Redis
Redis is the database of choice to back cache systems. It uses a memory management
model that supports LRU (Least Recently Used) and other algorithms to evict keys in
order to allow for new data when the memory is full.
Bitnami Redis installation
Although Redis is installed automatically as a sub-chart of HCL Commerce, it can also be
installed separately, or with different configurations. This document describes the
recommended deployment options and the use of Bitnami charts.
HCL Cache in Redis
Redis is the remote database that stores HCL Cache data, and it is also used to replicate
invalidations to local caches. For troubleshooting and administrative purposes, you might
sometimes need to use the Redis command line to query or configure the database.
Use of Redis replicas
With Redis cluster, master nodes can be backed by replicas (one or many). Replicas are
used for failover and scalability.
Installation considerations:
Topology
1. standalone: A single master with no replicas can work well in many scenarios.
Because HCL Cache is a multi-tiered framework, the most frequently-accessed
content is served from local caches, reducing the load on Redis and therefore
decreasing its capacity requirements. (The amount of caching and hit ratios will
affect the load on each site.) HCL Cache is also designed with high availability
features, and implements circuit breakers that block Redis access until the server
recovers. During that time, the local caches remain available. Kubernetes will
detect hang or crash conditions and rapidly re-spawn the master container based
on the probes defined in the Redis deployment.
Note: If replicas/subordinates were defined (without Sentinel), the replicas are
for read-only access and are not promoted to master. The system still needs to
wait for the master to be re-spawned. See topologies for more details.
2. cluster: Clustering can be used to scale Redis. Although each HCL Cache can
only exist on a single node (each cache is tied to a single slot), HCL
Commerce defines multiple caches (50 or greater) that can be distributed across
the Redis cluster nodes. With slot migration, it is possible to select what caches
land on each server. A Redis cluster requires a minimum of three master servers.
If replicas are used, six containers need to be deployed. See the Redis Cluster
tutorial for more details.
Persistence
Redis offers AOF (Append Only File) and RDB (Redis Database) persistence options
that save the memory contents to disk. This allows Redis to recover the memory contents
(cache) in case of a crash. For more information on AOF and RDB, see Redis
Persistence.
With standalone Redis, the use of persistence is optional but with Redis cluster it is
recommended. The use of persistence can add a small overhead to runtime operations.
There can also be a delay during Redis startup as it loads the persisted cache into
memory. This delay varies depending on the size of the file. For use with HCL Cache,
use of RDB only (not AOF) can be sufficient.
When configuring Kubernetes persistent volumes for Redis, select a storageClass with
fast SSD storage. By default, Redis requests only 8 GB of storage for a persistent
volume. That may not be enough, especially if AOF persistence is enabled. Request a
larger size (for example, 30 GB) and monitor usage to get a better understanding of
how much storage is required.
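For example, with the Bitnami chart, the persistent volume could be sized as in the following sketch. The storage class name is an assumption; use one available in your cluster:
master:
  persistence:
    enabled: true
    storageClass: fast-ssd  # assumed SSD-backed storage class
    size: 30Gi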
Redis standalone
Use this Redis chart to install Redis standalone, with no persistence. Review redis-standalone-values.yaml for details.
helm install hcl-commerce-redis bitnami/redis -n redis -f redis-standalone-values.yaml
Note: If Prometheus is not set up, disable the metrics section prior to install. For more
information on Prometheus and Grafana integration, see HCL Commerce Monitoring -
Prometheus and Grafana Integration.
Redis Cluster
These steps install a Redis Cluster with three masters. Review redis-cluster-values.yaml for details.
helm install hcl-commerce-redis bitnami/redis-cluster -n redis -f redis-cluster-values.yaml
Note: If Prometheus is not set up, disable the metrics section prior to install. For more
information on Prometheus and Grafana integration, see HCL Commerce Monitoring -
Prometheus and Grafana Integration.
Redis configurations
maxmemory
Determines the size of the memory available to store Redis objects. The amount of
cache will vary from site to site. 10 GB is a good starting configuration. The pod memory
limit must be higher.
maxmemory-policy
Using volatile-lru is required for the HCL Cache. This allows Redis to evict cache entries
but not dependency IDs. The appendonly and save options are for persistence, which is
disabled in the sample.
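Put together, these settings might appear in the chart's commonConfiguration block as in the following sketch:
commonConfiguration: |-
  maxmemory 10gb
  maxmemory-policy volatile-lru
  # Persistence directives, disabled in this sample:
  appendonly no
  save ""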
This section can also be used to enable debug settings, such as for SLOWLOG:
slowlog-log-slower-than 10000
slowlog-max-len 512
latency-monitor-threshold 100
cluster-require-full-coverage
When not all of the slots are covered (for example, because a master is down),
the CLUSTERDOWN error is issued. Configuring cluster-require-full-coverage
to no enables the subset of nodes that remain available to continue to serve requests.
If you plan to enable replicas, see Use of Redis Replicas for additional configurations.
Metrics
If Prometheus is set up, you can enable metrics and serviceMonitors (requires the
kube-prometheus-stack). Redis metrics can be consumed with the Redis Grafana dashboard.
The HCL Cache - Remote dashboard also displays Redis metrics.
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    namespace: redis
Add this section to the values.yaml file to configure THP and somaxconn as follows:
sysctl:
  enabled: true
  mountHostSys: true
  command:
    - /bin/sh
    - -c
    - |-
      sysctl -w net.core.somaxconn=10240
      echo madvise > /host-sys/kernel/mm/transparent_hugepage/enabled
Key eviction
Overview of Redis key eviction policies (LRU, LFU, etc.)
When Redis is used as a cache, it is often convenient to let it automatically evict old
data as you add new data. This behavior is well known in the developer community,
since it is the default behavior for the popular memcached system.
This page covers the more general topic of the Redis maxmemory directive used to limit
the memory usage to a fixed amount. It also extensively covers the LRU eviction
algorithm used by Redis, which is actually an approximation of the exact LRU.
For example, to configure a memory limit of 100 megabytes, you can use the following
directive inside the redis.conf file:
maxmemory 100mb
Setting maxmemory to zero results in no memory limit. This is the default behavior for
64-bit systems, while 32-bit systems use an implicit memory limit of 3 GB.
When the specified amount of memory is reached, the configured eviction policy
determines the behavior. Redis can return errors for commands that could result in
more memory being used, or it can evict some old data to return to the specified limit
every time new data is added.
Eviction policies
The exact behavior Redis follows when the maxmemory limit is reached is configured
using the maxmemory-policy configuration directive.
noeviction: New values aren't saved when memory limit is reached. When a database
uses replication, this applies to the primary database.
allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys.
allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys.
volatile-lru: Removes least recently used keys with the expire field set to true.
volatile-lfu: Removes least frequently used keys with the expire field set to true.
allkeys-random: Randomly removes keys to make space for the new data added.
volatile-random: Randomly removes keys with the expire field set to true.
volatile-ttl: Removes keys with the expire field set to true and the shortest remaining
time-to-live (TTL) value.
Picking the right eviction policy is important depending on the access pattern of your
application; however, you can reconfigure the policy at runtime while the application is
running, and monitor the number of cache misses and hits using the
Redis INFO output to tune your setup.
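For example, the policy can be changed at runtime and the hit ratio observed afterwards. The statistics values shown are illustrative:
127.0.0.1:6379> CONFIG SET maxmemory-policy allkeys-lru
OK
127.0.0.1:6379> INFO stats
# Stats
keyspace_hits:204890
keyspace_misses:18230
...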
Use the allkeys-lru policy when you expect a power-law distribution in the
popularity of your requests. That is, you expect a subset of elements will be
accessed far more often than the rest. This is a good pick if you are unsure.
Use allkeys-random if you have a cyclic access pattern where all the keys are
scanned continuously, or when you expect the distribution to be uniform.
Use volatile-ttl if you want to be able to provide hints to Redis about what
are good candidates for expiration by using different TTL values when you create
your cache objects.
The volatile-lru and volatile-random policies are mainly useful when you want to use
a single instance for both caching and to have a set of persistent keys. However it is
usually a better idea to run two Redis instances to solve such a problem.
It is also worth noting that setting an expire value to a key costs memory, so using a
policy like allkeys-lru is more memory efficient since there is no need for
an expire configuration for the key to be evicted under memory pressure.
If a command results in a lot of memory being used (like a big set intersection stored
into a new key) for some time, the memory limit can be surpassed by a noticeable
amount.
However, since Redis 3.0 the algorithm was improved to also take a pool of good
candidates for eviction. This improved the performance of the algorithm, making it
able to approximate more closely the behavior of a real LRU algorithm.
What is important about the Redis LRU algorithm is that you are able to tune the
precision of the algorithm by changing the number of samples to check for every
eviction. This parameter is controlled by the following configuration directive:
maxmemory-samples 5
The reason Redis does not use a true LRU implementation is because it costs more
memory. However, the approximation is virtually equivalent for an application using
Redis. The figure in the Redis documentation compares the LRU approximation used by Redis with true LRU.
The test to generate the graphs filled a Redis server with a given number of
keys. The keys were accessed from the first to the last. The first keys are the best
candidates for eviction using an LRU algorithm. Later, 50% more keys were added, in
order to force half of the old keys to be evicted.
You can see three kinds of dots in the graphs, forming three distinct bands.
In a theoretical LRU implementation we expect that, among the old keys, the first half
will be expired. The Redis LRU algorithm will instead only probabilistically expire the
older keys.
As you can see, Redis 3.0 does a better job with 5 samples compared to Redis 2.8;
however, most objects that are among the latest accessed are still retained by Redis
2.8. Using a sample size of 10 in Redis 3.0, the approximation is very close to the
theoretical performance of Redis 3.0.
Note that LRU is just a model to predict how likely a given key will be accessed in the
future. Moreover, if your data access pattern closely resembles the power law, most of
the accesses will be in the set of keys the LRU approximated algorithm can handle well.
In simulations we found that using a power law access pattern, the difference between
true LRU and Redis approximation were minimal or non-existent.
However, you can raise the sample size to 10 at the cost of some additional CPU usage
to closely approximate true LRU, and check if this makes a difference in your cache
miss rate.
It is very simple to experiment in production with different values for the sample size
by using the CONFIG SET maxmemory-samples <count> command.
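For example, to raise the sample size at runtime:
127.0.0.1:6379> CONFIG SET maxmemory-samples 10
OK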
LFU is approximated like LRU: it uses a probabilistic counter, called a Morris counter to
estimate the object access frequency using just a few bits per object, combined with a
decay period so that the counter is reduced over time. At some point we no longer
want to consider keys as frequently accessed, even if they were in the past, so that the
algorithm can adapt to a shift in the access pattern.
That information is sampled similarly to what happens for LRU (as explained in the
previous section of this documentation) to select a candidate for eviction.
However unlike LRU, LFU has certain tunable parameters: for example, how fast should
a frequent item lower in rank if it gets no longer accessed? It is also possible to tune
the Morris counters range to better adapt the algorithm to specific use cases.
Those should be reasonable values and were tested experimentally, but the user may
want to play with these configuration settings to pick optimal values.
Instructions about how to tune these parameters can be found inside the
example redis.conf file in the source distribution. Briefly, they are:
lfu-log-factor 10
lfu-decay-time 1
The decay time is the obvious one: it is the number of minutes a counter should be
decayed when sampled and found to be older than that value. A special value
of 0 means: we will never decay the counter.
The counter logarithm factor changes how many hits are needed to saturate the
frequency counter, which is just in the range 0-255. The higher the factor, the more
accesses are needed to reach the maximum. The lower the factor, the better is the
resolution of the counter for low accesses, according to the following table:
+--------+------------+------------+------------+------------+------------+
| factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
+--------+------------+------------+------------+------------+------------+
| 0      | 104        | 255        | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 1      | 18         | 49         | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 10     | 10         | 18         | 142        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 100    | 8          | 11         | 49         | 143        | 255        |
+--------+------------+------------+------------+------------+------------+
So basically the factor is a trade-off between better distinguishing items with low
accesses versus distinguishing items with high accesses. More information is available in
the example redis.conf file.
The HCL Cache has requirements that go beyond a simple key-value database. To support
these requirements, such as invalidation by dependency, the cache must maintain sets of
metadata for each key. Redis must not be allowed to evict metadata information, as this creates
inconsistencies in the cache, such as entries not getting invalidated. To maintain metadata,
the HCL Cache implements a set of maintenance processes.
maxmemory
The amount of memory made available for keys (cached data).
maxmemory-policy
Must be set to volatile-lru, which removes least recently used keys with the expire field
set to true.
With Redis Enterprise, maxmemory is not used. Caches are maintained using the number
of entries instead. See softMaxSize.
Object            Use                                                        Set Expiry
{}-data-*         HASH key that contains the cached data and additional     YES
                  metadata, such as creation time, dependencies and others
{}-dep-*          SET key for each dependency id. The set contains a list   NO
                  of all the cache ids that are associated to this
                  dependency id.
{}-maintenance    ZSET by expiry time that contains cache keys and their    NO
                  dependencies
Cached data ({}-data-*) must always have an expiry set and may be evicted by
Redis when available memory is exhausted. Metadata information ({}-dep-*,
{}-maintenance, {}-inactive) has no expiry and thus cannot be evicted by Redis. It
must be maintained by HCL Cache maintenance processes.
To deal with metadata, the HCL Cache implements the following
maintenance processes. For more details, see Cache Maintenance.
Expired maintenance
When a key expires, Redis automatically removes it from memory. The Expired
Maintenance job is responsible for removing references from the metadata to the expired
key.
Low-memory maintenance
When used memory reaches 100% of maxmemory, Redis starts evicting keys. Testing has
shown that full memory conditions can lead to errors such as "command not allowed
when used memory > 'maxmemory'". To prevent this situation, the HCL
Cache monitors memory usage and triggers jobs to reduce the size of each cache
before the available memory is exhausted. The jobs remove both cache entries and their
associated metadata. The cache entries selected for removal are those that are soonest
to expire.
Inactive maintenance
This job is not required for memory maintenance, but helps reduce the memory
requirements by removing idle cache entries. Its design is very similar to that of expired
maintenance, but for cache entries that have not yet expired.
This page includes a list of commands and concepts that you might find useful when learning or
troubleshooting the cache system. For a complete list of commands, check the Redis site.
This document assumes you installed Redis on the Kubernetes cluster with the Bitnami charts,
but the commands should work on all distributions.
Note: Changing the cache contents directly on the Redis database can break the consistency of
the cache. The supported way to operate with the cache is by using the Cache or Cache
Manager APIs.
Most Redis commands only apply to the local server. If you are running a cluster, you need to
first identify the server that contains the cache.
The namespace allows you to share Redis for multiple environments, and to distinguish between
Auth and Live. The prefix also contains the name of the cache, and it is enclosed in brackets
({ }). For cluster environments, this ensures that all the cache contents are created on the same
node. This is a design decision for performance purposes.
The two main object types are -data (cache contents) and -dep, for dependencies. For example:
"{cache-demoqalive-baseCache}-data-/search/resources/
org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_
1=api:DC_PathToken_2=v2:DC_PathToken_3=categories:storeId=11:langId=-
1:contractId=-11005:depthAndLimit=11,11:UTF-8:requestType=GET"
"{cache-demoqalive-baseCache}-dep-storeId:categoryId:11:10506"
Querying the HCL Cache
The KEYS command can be used to inspect the contents of the Redis cache in
a TEST environment. This command should not be used in a live/production
environment because it can lock the Redis thread. In production, use the SCAN command
(and its variations) instead, because it retrieves data in chunks with a cursor.
The redis-cli interface provides a shortcut to run the SCAN command, --scan, that
automatically follows the cursor.
I have no name!@redis-master-0:/$ redis-cli --scan --pattern "{cache-demoqalive-baseCache}-*"
"{cache-demoqalive-baseCache}-data-/search/resources/org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=products:catalogId=11501:contractId=-11005:langId=-1:partNumber=BD-BEDS-0002:storeId=11:UTF-8:requestType=GET"
"{cache-demoqalive-baseCache}-dep-WCT+ESINDEX"
"{cache-demoqalive-baseCache}-dep-storeId:partNumber:11:LR-FNTR-0002"
"{cache-demoqalive-baseCache}-dep-storeId:categoryId:11:10506"
"{cache-demoqalive-baseCache}-dep-storeId:partNumber:11:BD-BEDS-0002"
"{cache-demoqalive-baseCache}-data-/search/resources/org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=products:catalogId=11501:langId=-1:partNumber=LR-FNTR-0002:storeId=11:UTF-8:requestType=GET"
"{cache-demoqalive-baseCache}-data-/search/resources/org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=categories:storeId=11:langId=-1:contractId=-11005:depthAndLimit=11,11:UTF-8:requestType=GET"
..
Cache keys (-data) are stored as HASH objects and contain the cached value along with
metadata.
The HGETALL command can be used to retrieve the contents of a cache entry. For example:
127.0.0.1:6379> HGETALL "{cache-demoqalive-baseCache}-data-/search/resources/org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=categories:storeId=11:langId=-1:contractId=-11005:id=10501,10516:UTF-8:requestType=GET"
1) "created-by"
2) "demoqalivequery-app-d7ff6c649-ch78j"
3) "created-at"
4) "1636061491787"
5) "dependencies"
6) "WCT+ESINDEX;;;WCT+FULL_ESINDEX;;;"
7) "value"
8) "\x04\x04\t>0com.ibm.ws.cache.servlet.FragmentComposerMemento\x00\x00\
x00\x00P\x00 \x02\x00\x00\x00\n>\x0eattributeBytes\x16\x00>\nattributes\
x16\x00>\x11characterEncoding\x16\x00>\x13consumeSubfragments \x00>\
x0bcontainsESI \x00>\x0bcontentType\x16\x00>\bcontents\x16\x00>\
x15externalCacheFragment\x16\x00>\x14externalCacheGroupId\x16\x00>\
x0boutputStyle#\x00\x16\x01B\x01\
t>4com.ibm.ws.cache.servlet.CacheProxyRequest$Attribute\xdb\x97\x83\x0cL\
xc2!\x1d\x00\x00\x00\x02>\x03key\x16\x00>\x05value\x16\x00\x16\x04;\xff>\
x15REST_REQUEST_RESOURCE>\x15/api/v2/categories?id\x01\x00\x00\x01B\x03\
x16\x04\t>0com.ibm.ws.cache.servlet.DefaultStatusSideEffect\xe2\xe3\xc1\
xc89\x19\x01y\x00\x00\x00\x01>\nstatusCode#\x00\x16\x00\x00\x00\xc8\x04\
t>)com.ibm.ws.cache.servlet.HeaderSideEffect\x8a\xc4#[9\xfb\xfc=\x00\x00\
x00\x03>\x04name\x16\x00>\x03set \x009\xf4\x16\x00\x16>\x0cContent-Type\
x00>\x10application/jsonC\a\xaf!{\"contents\":
[{\"name\":\"Kitchen\",\"identifier\":\"Kitchen\",\"shortDescription\":\"C
reate a kitchen that suits your needs and fits your
lifestyle\",\"resourceId\":\"https://www.demoqalive.andres.svt.hcl.com/
search/resources/api/v2/categories?
storeId=11&id=10516&id=10501&contractId=-11005&langId=-
1\",\"uniqueID\":\"10516\",\"parentCatalogGroupID\":\"/
10516\",\"thumbnail\":\"/hclstore/EmeraldCAS/images/catalog/kitchen/
category/dep_kitchen.jpg\",\"seo\":{\"href\":\"/
kitchen\"},\"storeID\":\"11\",\"sequence\":\"5.0\",\"fullImage\":\"/
hclstore/EmeraldCAS/images/catalog/kitchen/category/
dep_kitchen.jpg\",\"id\":\"10516\",\"links\":{\"parent\":{\"href\":\"/
search/resources/api/v2/categories?storeId=11&id=-1\"},\"children\":
[\"href: /search/resources/api/v2/categories?storeId=11&id=10518\",\"href:
/search/resources/api/v2/categories?storeId=11&id=10517\"],\"self\":
{\"href\":\"/search/resources/api/v2/categories?
storeId=11&id=10516\"}},\"description\":\"Create a kitchen that suits your
needs and fits your lifestyle\"},{\"name\":\"Living
Room\",\"identifier\":\"LivingRoom\",\"shortDescription\":\"Bring your
living space together with comfort and
style\",\"resourceId\":\"https://www.demoqalive.andres.svt.hcl.com/
search/resources/api/v2/categories?
storeId=11&id=10516&id=10501&contractId=-11005&langId=-
1\",\"uniqueID\":\"10501\",\"parentCatalogGroupID\":\"/
10501\",\"thumbnail\":\"/hclstore/EmeraldCAS/images/catalog/livingroom/
category/dep_livingroom.jpg\",\"seo\":{\"href\":\"/living-
room\"},\"storeID\":\"11\",\"sequence\":\"1.0\",\"fullImage\":\"/
hclstore/EmeraldCAS/images/catalog/livingroom/category/
dep_livingroom.jpg\",\"id\":\"10501\",\"links\":{\"parent\":{\"href\":\"/
search/resources/api/v2/categories?storeId=11&id=-1\"},\"children\":
[\"href: /search/resources/api/v2/categories?storeId=11&id=10503\",\"href:
/search/resources/api/v2/categories?storeId=11&id=10502\",\"href:
/search/resources/api/v2/categories?storeId=11&id=10504\"],\"self\":
{\"href\":\"/search/resources/api/v2/categories?
storeId=11&id=10501\"}},\"description\":\"Bring your living space together
with comfort and style\"}]}\x01\x01\x00\x00\x00\x01"
9) "expiry-at"
10) "1636075891787"
The TTL command shows the time-to-live remaining for an entry. When the time expires,
the entry is deleted by Redis.
127.0.0.1:6379> TTL "{cache-demoqalive-baseCache}-data-/search/resources/org.springframework.web.servlet.DispatcherServlet.class:DC_Envtype:DC_PathToken_1=api:DC_PathToken_2=v2:DC_PathToken_3=categories:storeId=11:langId=-1:contractId=-11005:id=10501,10516:UTF-8:requestType=GET"
(integer) 13609
Dependency ID information
Dependency information is stored in sets that link to cache-ids. Redis has multiple
commands to operate on sets. SCARD shows the size of the set (number of cache IDs
linked to the dependency ID).
127.0.0.1:6379> SCARD "{cache-demoqalive-baseCache}-dep-WCT+ESINDEX"
(integer) 9
SMEMBERS lists all the cache-ids for a dependency. This command should only be
used for small dependency IDs. For dependency IDs that can link to a large number
of cache-ids, SSCAN should be used instead.
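For example, SSCAN retrieves the set in chunks by following a cursor. The key and output below are illustrative:
127.0.0.1:6379> SSCAN "{cache-demoqalive-baseCache}-dep-WCT+ESINDEX" 0 COUNT 100
1) "0"
2) 1) "{cache-demoqalive-baseCache}-data-/search/resources/..."
   2) "{cache-demoqalive-baseCache}-data-/search/resources/..."
A returned cursor of "0" indicates that the scan is complete.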
Dependency IDs do not set an expiration, and as per the volatile-lru memory
management rule, they cannot be evicted, because this would result in missing
invalidations.
The HCL Cache also maintains other objects, such as the cache
registry ({cache_registry-namespace}) and {cache-namespace-cachename}-maintenance,
which contain information used for maintenance.
Deletions
Invalidations
PUBSUB command
The PUBSUB CHANNELS command lists all the topics with an active subscriber.
I have no name!@redis-master-0:/$ redis-cli PUBSUB CHANNELS
1) "{cache-demoqaauth-services/cache/WCSearchFacetDistributedMapCache}-invalidation"
5) "{cache-demoqalive-services/cache/WCSearchSTADistributedMapCache}-invalidation"
6) "{cache-demoqalive-services/cache/SearchContractDistributedMapCache}-invalidation"
7) "{cache-demoqalive-services/cache/WCCatalogGroupDistributedMapCache}-invalidation"
8) "{cache-demoqaauth-services/cache/WCCatalogGroupDistributedMapCache}-invalidation"
9) ...
SUBSCRIBE command
The SUBSCRIBE and PSUBSCRIBE commands start a topic listener and can be used to
monitor invalidations.
The following example uses PSUBSCRIBE to subscribe to all live caches. The use of
the --csv option makes the output more readable.
I have no name!@redis-master-0:/$ redis-cli --csv PSUBSCRIBE "{cache-demoqalive-*}-invalidation"
Reading messages... (press Ctrl-C to quit)
"psubscribe","{cache-demoqalive-*}-invalidation",1
"pmessage","{cache-demoqalive-*}-invalidation","{cache-demoqalive-baseCache}-invalidation","[p:demoqalivecache-app-8597fc98cc-dm2rz]> inv-cache-dep:product:10001"
"pmessage","{cache-demoqalive-*}-invalidation","{cache-demoqalive-baseCache}-invalidation","[p:demoqalivecache-app-8597fc98cc-dm2rz]> inv-cache-dep:product:10002"
PUBLISH command
The PUBLISH command is the counterpart of the SUBSCRIBE command. Although the
command is available and could be used to publish invalidation messages, the supported
and recommended way to test invalidations is by using the Cache Manager REST
services. This ensures the message format is preserved. For testing or learning
purposes, you can publish invalidations as follows:
PUBLISH "{cache-demoqaauth-baseCache}-invalidation" product:10002
PUBLISH "{cache-demoqaauth-baseCache1}-invalidation" inv-cache-clear
The Cache APIs add optional metadata that helps identify the source of the
invalidations, the time at which they were created, and the intended consumers.
Although other topologies support the use of replicas, this document is written with clusters in
mind.
Failover scenarios
When the Redis cluster detects a master node is down, it initiates failover to one of the master's
replicas. Replicas use the replication process to mirror updates from the master node as they
happen.
In Kubernetes, when the previously crashed pod recovers and rejoins the cluster, it takes on
a replica role for the master currently serving the slots (which used to be its replica).
If replicas are not used, Kubernetes will still detect (using probes) and restart unresponsive pods.
The slots served by the impacted master will be temporarily unavailable. Depending on the
duration of the outage, HCL Cache Circuit Breakers will activate. It might take a couple of minutes
for the Redis node to be available again. This time is extended when persistence is used, as
Redis needs to reload the cache upon start, and the service is unavailable until the cache is
done loading.
Scalability scenarios
Besides their role in failover, replicas can increase scalability by handling GET (read) operations.
This frees resources on the master node and enables more efficient use of resources. The HCL
Cache Redis client can be configured to direct read operations to replicas using
the readMode configuration.
When replicas are used for read operations, the following considerations must be made:
The replication process introduces a delay. If a read operation happens immediately after
a write, the read might return stale data, or no data. This could introduce functional
issues for certain caches and customization scenarios. The HCL Cache includes a
number of configurations that control whether reads are directed to masters or replicas,
and wait times for replications to complete.
If replicas are used for reads, both master and replica servers must be available for
optimal performance: an unavailable replica can lead to WAIT command timeouts
during PUT operations (syncReplicas, see below), and failed read (GET) operations
executed on the replicas. When the recovered master is restarted, it reconfigures itself as
a replica and starts a new synchronization process. If a full synchronization is required, the
replica server might be unavailable for some time while the database is replicated. The
system might take longer to recover when read operations are offloaded to replicas.
Configurations
The use of replicas might require configuration changes in Redis, the Redis client, or
the HCL Cache:
Redis configurations
cluster-replica-validity-factor
repl-diskless-sync
client-output-buffer-limit
The HCL Cache can be configured to issue read (GET) operations to replica servers with
the readMode setting.
HCL Cache configurations
The HCL Cache includes a number of advanced cache level configurations to control the
behaviour of PUT operations when replicas are used. These settings are more relevant
when readMode: SLAVE is used.
cacheConfigs:
  cacheName:
    remoteConfig:
      forceReadFromMaster: [TRUE|false]
      syncReplicas: [NULL| <number_of_replicas> OR all : timeout_ms]
      limitSyncReplicasToNumberAvailable: [TRUE|false]
forceReadFromMaster
Redis benchmark
Using the redis-benchmark utility on a Redis server
Redis includes the redis-benchmark utility that simulates running commands done by N
clients while at the same time sending M total queries. The utility provides a default set
of tests, or you can supply a custom set of tests.
Usage: redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]
 -r <keyspacelen>  Use random keys for SET/GET/INCR, random values for SADD.
                   Using this option the benchmark will expand the string __rand_int__
                   inside an argument with a 12 digits number in the specified range
                   from 0 to keyspacelen-1.
 -t <tests>        Only run the comma separated list of tests. The test
                   names are the same as the ones produced as output.
You need to have a running Redis instance before launching the benchmark. You can
run the benchmarking utility like so:
redis-benchmark -q -n 100000
Running only a subset of the tests
You don't need to run all the default tests every time you execute redis-benchmark. To
select only a subset of tests, use the -t option as in the following example, which runs the
tests for the SET and LPUSH commands in quiet mode (see the -q switch):
redis-benchmark -t set,lpush -n 100000 -q
Random keys are obtained by using the -r switch. For instance, to run one million SET
operations, using a random key for every operation out of 100k possible keys, use
the following command line:
$ redis-cli flushall
OK
$ redis-benchmark -t set -r 100000 -n 1000000
...
50 parallel clients
3 bytes payload
keep alive: 1
...
$ redis-cli dbsize
(integer) 99993
Using pipelining
By default, every client (the benchmark simulates 50 clients if not otherwise specified
with -c) sends the next command only when the reply of the previous command is
received. This means that the server will likely need a read call in order to read each
command from every client, and the round-trip time (RTT) is paid for every command as well.
Redis is a server: all commands involve network or IPC round trips. It is meaningless to
compare it to embedded data stores, because the cost of most operations is primarily
in network/protocol management.
Redis commands return an acknowledgment for all usual commands. Some other data
stores do not. Comparing Redis to stores involving one-way queries is only mildly
useful.
Naively iterating on synchronous Redis commands does not benchmark Redis itself, but
rather measures your network (or IPC) latency and the client library's intrinsic latency. To
really test Redis, you need multiple connections (like redis-benchmark) and/or to use
pipelining to aggregate several commands and/or multiple threads or processes.
Redis is an in-memory data store with some optional persistence options. If you plan to
compare it to transactional servers (MySQL, PostgreSQL, etc ...), then you should
consider activating AOF and decide on a suitable fsync policy.
Redis is, mostly, a single-threaded server from the point of view of command execution
(modern versions of Redis use threads for different things). It is not designed to
benefit from multiple CPU cores. People are supposed to launch several Redis instances
to scale out on several cores if needed. It is not really fair to compare one single Redis
instance to a multi-threaded data store.
The redis-benchmark program is a quick and useful way to get some figures and evaluate
the performance of a Redis instance on a given hardware. However, by default, it does
not represent the maximum throughput a Redis instance can sustain. Actually, by using
pipelining and a fast client (hiredis), it is fairly easy to write a program generating more
throughput than redis-benchmark. The default behavior of redis-benchmark is to
achieve throughput by exploiting concurrency only (i.e. it creates several connections
to the server). It does not use pipelining or any parallelism at all (one pending query
per connection at most, and no multi-threading), if not explicitly enabled via the -
P parameter. So, in some ways, running redis-benchmark while also triggering, for example,
a BGSAVE operation in the background at the same time will provide the user with
numbers closer to the worst case than to the best case.
To run a benchmark using pipelining mode (and achieve higher throughput), you need
to explicitly use the -P option. Please note that it is still a realistic behavior since a lot
of Redis based applications actively use pipelining to improve performance. However
you should use a pipeline size that is more or less the average pipeline length you'll be
able to use in your application in order to get realistic numbers.
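For example, a run that pipelines 16 commands per request (all flags shown are standard redis-benchmark options):
redis-benchmark -P 16 -t get,set -q -n 100000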
The benchmark should apply the same operations, and work in the same way with the
multiple data stores you want to compare. It is absolutely pointless to compare the
result of redis-benchmark to the result of another benchmark program and
extrapolate.
/SETUP/hcl-cache/cache_cfg-ext.yaml (Config Map): You can create this file to extend and
override the configuration in cache_cfg.yaml. This file has the same format as
cache_cfg.yaml.
The Redis client connection information is stored in redis_cfg.yaml. This file contains
details about the topology (standalone, cluster, etc.), the connection information
(hostname, TLS, authentication), pools, and timeouts. See Redis Client Configuration for
details.
Cache configurations
Caches are configured in two main files: cache_cfg.yaml exists on all containers and
defines default values and specific values for default caches, while cache_cfg-ext.yaml can
be used to update defaults or set new configurations for custom caches. See: Cache
Configuration.
Configurations in Kubernetes
The HCL Cache uses a technique whereby the contents of the files are stored in Kubernetes
Configuration Maps (configmaps). This allows for updates to be made without
having to create custom images for each environment, as, for example, the Redis
hostname might differ from environment to environment. Pods are configured
to load the config map during initialization and make its contents available as
regular files.
Configurations in Helm
The configmaps are originally created during installation of the Helm chart. The original
values are defined in the HCL Cache section of values.yaml. These values can be
updated as required.
hclCache:
  configMap:
    cache_cfg_ext: |-
      redis:
        enabled: true
      cacheConfigs:
        ...
    redis_cfg: |-
      singleServerConfig:
        ...
To update the configuration, the recommended approach is to update the chart and
perform a Helm update. If the configmap is updated directly, pods must be manually
restarted to load the updated values.
Note: Changes to one release (auth/share/live) might need to be replicated to the others.
After installation, you can inspect the configmaps as follows (a standard kubectl listing; the
release names are examples):
kubectl get configmap | grep hcl-cache-config
demo-qa-auth-demoqa-hcl-cache-config 2 15d
demo-qa-live-demoqa-hcl-cache-config 2 15d
demo-qa-share-demoqa-hcl-cache-config 2 15d
Custom caching
HCL Cache extends the capabilities of DynaCache and introduces remote caching. Therefore,
additional configuration options are available for custom caches. Custom caches can be
configured using the Cache Configuration Yaml file for extensions.
Custom caches are declared in the WebSphere configuration and accessed with
the DistributedMap interface. Migrated custom caching code does not require modification to use
the HCL Cache.
The size of a cache is used as the starting point for local caching. Disk offload is not available;
remote caching is recommended instead. See Local and Remote Caching for details.
When custom caches are added with the Transaction server run-engine command, they
are by default automatically mapped to the HCL Cache cache provider.
Liberty containers
Custom caches defined in the configDropins/overrides directory must explicitly specify
the HCL Cache cacheProviderName as in the example below:
<?xml version="1.0" encoding="UTF-8"?>
<server>
    <distributedMap id="services/cache/CustomCache1" memorySizeInEntries="2000"
        memorySizeInMB="100" cacheProviderName="hcl-cache"/>
</server>
The HCL Cache is traditionally used for caching scenarios, where entries that are not found
in the cache can be regenerated by the application. With the incorporation of the
remote cache, which allows for large amounts of data storage, the HCL Cache can also be
used as a temporary in-memory database. In its default configuration, the HCL
Cache implements maintenance processes that remove cache entries when needed to
avoid out-of-memory conditions. This could lead to the loss of cache entries.
If the objects stored in the cache cannot be regenerated, the Low Memory
Maintenance process for the specific cache must be disabled to avoid data loss:
services/cache/MyCustomCache:
  remoteCache:
    onlineLowMemoryMaintenance:
      enabled: false
Low Memory Maintenance can continue to work on other caches. If the caches that
disable Low Memory Maintenance require a significant amount of memory, the memory
made available (maxmemory) might need to be retuned. The Redis persistence options
might also need to be updated to a more durable configuration (e.g. enable AOF and
RDB).
// Obtain the custom cache through the DistributedMap interface (JNDI lookup)
DistributedMap myCustomCache = (DistributedMap) new InitialContext().lookup("services/cache/MyCustomCache");
// Invalidate by dependency ID
myCustomCache.invalidate("dependencyId1");
In addition to APIs to clear and invalidate cached data, the Cache Manager includes APIs that
can be used to retrieve cache entry and dependency details for debugging information.
Table 2. Invalidate and clear REST API: APIs to clear and invalidate caches.

Method  Path             Description
GET     /cache           Returns a list of all the registered caches and current sizes.
GET     /cache/id/byIds  Returns cache entry details for the specified ID (for debugging).
The Cache Manager pod must be enabled during installation in values.yaml by setting
enabled: true under cacheApp:
cacheApp:
  name: cache-app
  enabled: true
For high availability, you might choose to run redundant cache manager pods.
Cache Manager can be accessed with port-forwarding or by enabling Ingress. The Swagger
API is available under the path /openapi/ui/#/.
Port forwarding
1. Forward the Cache Manager port with kubectl (the pod name is a placeholder):
kubectl port-forward <cache-app-pod> 40901:40901
2. Access the HCL Cache Manager Swagger API using localhost and
path /openapi/ui/#/:
https://localhost:40901/openapi/ui/#/
Ingress
Ingress access can optionally be enabled in values.yaml for both authoring and live
environments. The cache manager endpoints do not implement authentication. Only
enable access through ingress definitions that are internal and restricted.
cache:
  auth:
    enabled: true
  live:
    enabled: true
Monitoring
The HCL Cache Manager makes available additional remote-only APIs, which are
used from the HCL Cache - Remote dashboard.
Utilities
The Cache Manager pod also makes available a number of cache utilities for
benchmarking, debugging, and configuration. They are available under
the /SETUP/hcl-cache/utilities/ directory. For more information, see HCL
Cache utilities.
Monitoring
Debugging a complex distributed system without the support of metrics and monitoring can be a
challenging task. The Prometheus and Grafana integration gives you visibility into the number
and performance of all cache operations and maintenance processes, which can enable you to
quickly narrow down the problem.
Cache manager
The Cache Manager includes a number of debug APIs to retrieve details about the caches and
cached data. See Cache Manager for details.
Redis database
Redis is a database, and provides a command interface (redis-cli) and commands that can be
used to query it and retrieve information about the existing cache keys and metadata. For details,
see HCL Cache in Redis.
Tracing
The following string is used to trace the operation of the HCL Cache:
com.hcl.commerce.cache*=all
If enabled at the fine level instead, the HCL Cache will create a less verbose output, with timing
and invalidation details.
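For example, the equivalent trace specification at the fine level is the same string with the level changed:
com.hcl.commerce.cache*=fine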
Troubleshooting scenarios
Troubleshooting Near-Real-Time
(NRT) index building
Some operations, such as updating a product description, trigger a Near-Real-Time (NRT) delta
index build with NiFi. The Transaction server uses invalidation messages to notify the NiFi server
of the event. This document will help you troubleshoot aspects of the NRT process.
Troubleshooting steps
Confirm the Transaction Server is sending messages
When the operation is performed, the Transaction Server will write the event on
the WCNifiDistributedMapCache channel. Use the SUBSCRIBE command to confirm
the event is being written to the queue:
redis-cli subscribe
"{cache-demoqaauth-services/cache/WCNifiDistributedMapCache}-invalidation"
If the SUBSCRIBE command does not capture any events, there might be a problem with
the Redis connection from the Transaction Server, or the specific event might not be
configured to send an invalidation message.
The NiFi server registers PUBSUB listeners with Redis to receive the events.
Use the PUBSUB CHANNELS command on all the master servers to confirm NiFi's
listeners are enabled:
redis-cli pubsub channels | grep -i nifi
{cache-demoqaauth-services/cache/WCNifiDistributedMapCache}-invalidation
{cache-demoqaauth-services/cache/WCNifiBatchDistributedMapCache}-
invalidation
The next step is to use tracing in NiFi to confirm that the NRT listeners are triggering
when events are posted to the channels
(WCNifiDistributedMapCache and WCNifiBatchDistributedMapCache).
DynaCache offers in-memory caching, disk off-loading, cache replication, and Servlet and JSP
fragment caching. DynaCache also offers a pluggable architecture that enables the use of
different cache providers (such as HCL Cache, Redis and IBM WebSphere eXtreme Scale) while
maintaining access with a consistent set of interfaces such
as DistributedMap and cachespec.xml.
HCL Cache is installed as a DynaCache cache provider, which enables its use through
DynaCache interfaces without code changes. HCL Cache provides the following:
When a cache (default or custom) that is configured with the HCL Cache provider is accessed,
DynaCache defers the processing to the custom provider. HCL Cache interacts with the local
and remote caches, and replicates invalidations as required according to the configuration.
Local caching
The behaviour of local caching is similar to that of the traditional DynaCache caches, with some
important differences:
The use of local caching requires the replication of cache clear and invalidation
messages to ensure stale or outdated content is removed from caches on other
containers. In HCL Commerce Version 9.0 this was achieved with the use of Kafka.
When HCL Cache is enabled with Redis, invalidations are handled automatically by the
framework. See Invalidations for details.
Monitoring capabilities
HCL Cache implements a comprehensive set of metrics for local caches to support
monitoring, debugging and tuning. The metrics enable tracking of cache sizes (by
number of entries and memory footprint in MB), hit ratios, cache operations and internal
removals per second (expiry, inactivity, explicit removal and LRU eviction). Local cache
metrics can be tracked with the "HCL Cache - Local Cache Details", and "HCL Cache -
Local Cache Summary" dashboards. See Monitoring for details.
Screen capture: HCL Cache - Local Cache Details dashboard.
HCL Cache caches do not support disk offload. Disk offload configurations in WebSphere
DynaCache are ignored. To scale beyond the local JVM memory limits, local caches are
designed to be used in conjunction with remote Redis caches.
Remote caching
This is the default configuration. Local caches act as a near-cache for remote caches,
keeping copies of the most recently accessed cache entries. These cache entries can be
served directly from the local container, without making remote calls to the Redis servers,
improving performance and reducing overhead on remote servers.
When local and remote caches are used together, the caching flow is as follows:
(Diagram: caching flow showing the Cache Miss, Existing in Local Cache, and Existing in
Remote Cache cases, including the remote PUT step.)
Local-only caching
The primary reason for disabling remote caching is if the objects stored in the cache are
not serializable. If custom caches store objects that are not serializable, remote caching
should be disabled for the cache in the configuration. See Custom caching for details.
Local-only caches must still use Redis for replication of invalidations.
Remote-only caching
Remote-only caching might be desirable for caches that store frequently updated objects,
when changes must be immediately available to all containers. For example, changes to
user session data must be immediately available to all containers. Disabling local caches
eliminates the risk of reading stale data due to timing issues with invalidations. Examples
of default caches that are configured as remote-only include caches for Precision
Marketing and Punch-out integration.
Automatic memory footprint tuning in local caches
Local caches reside in the local application server Java Virtual Machine (JVM). Each local
cache holds a number of cache entries, and each cache entry has a cache ID, a cache value,
and a list of dependency IDs. Controlling the memory footprint of local caches is important,
since larger caches can improve performance, but a cache that is too large can lead to low
or out of memory conditions.
Each local cache has a configured maximum number of cache entries it can hold
(memorySizeInEntries) and an optional maximum memory footprint (memorySizeInMB). For
example, in a WebSphere Application Server V8.5.5 Liberty configuration server.xml file, the
following line configures the memory footprint of the HCL Cache with JNDI
name services/cache/SearchQueryDistributedMapCache:
<distributedMap id="services/cache/SearchQueryDistributedMapCache"
    cacheProviderName="hcl-cache" memorySizeInEntries="30012" memorySizeInMB="400"
    lowThreshold="95" highThreshold="98"/>
By default, HCL local caches automatically increase or decrease their memory footprint according
to how much JVM heap is available. When JVM heap utilization is below 65%, HCL local
caches will increase their maximum sizes up to 400% of their configured sizes; conversely,
when JVM heap utilization is above 75%, they will decrease their maximum sizes down to
10% of their configured sizes. In this way, HCL local caches take advantage of available free
memory, while helping to avoid low or out of memory conditions.
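As a worked example (using the SearchQueryDistributedMapCache configuration shown above): a cache configured with memorySizeInEntries="30012" can grow to roughly 120,048 entries (400%) while heap usage stays below 65%, and can shrink to roughly 3,001 entries (10%) under memory pressure.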
Screen capture from the HCL Cache - Local Cache Details dashboard. For more details,
see Monitoring.
The automatic memory footprint feature provides configurations that can be used for advanced
troubleshooting or tuning scenarios. See Cache Configuration for details of updating the HCL
Cache configuration.
By default, caches can increase their maximum sizes when used JVM memory is less
than 65% of the maximum heap size, and will decrease their maximum sizes when used
JVM memory is more than 75% of the maximum heap size.
globalLocalCache:
  localCacheTuning:
    tightMemoryPercentUsedThreshold: 75
    normalMemoryPercentUsedThreshold: 65
Configuring minimum and maximum scale factors
By default, caches will not increase their maximum sizes to more than 400% of their
configured maximum sizes, and will not decrease their maximum sizes to less than 10%
of their configured maximum sizes.
globalLocalCache:
  localCacheTuning:
    maxScaleFactor: 400
    minScaleFactor: 10
Disabling automatic memory footprint tuning
The HCL local cache can calculate the memory footprint of cache entries when they contain
values composed of typical Java objects. When other objects are encountered, the
calculated memory footprint may be inaccurate. Specify reportUnsizeable: true to log
an informational message when the HCL Cache is unable to calculate an accurate object
memory footprint. The default value of this configuration setting is false.
globalLocalCache:
  reportUnsizeable: false
Invalidation support
Local caches require a mechanism for replication of invalidation messages to ensure that local
cache entries associated with an invalidation ID are removed from all containers.
In IBM WebSphere Commerce Version 8, the IBM Data Replication Service (DRS) is integrated
with WebSphere DynaCache and performs the job of replicating invalidation messages. In HCL
Commerce Version 9.0, Kafka is used to send invalidation messages. With HCL Cache with
Redis in HCL Commerce Version 9.1, replication of invalidation messages is handled within
DynaCache by the HCL Cache provider. HCL Cache automatically issues invalidation messages
when clear and invalidate operations are issued for a cache that enables local caching. The
same cache on other containers implements a listener, and when the invalidation messages are
received, the indicated invalidate and clear operations are performed on the local cache.
HCL Cache relies on Redis PUBSUB to replicate invalidation messages (the Elasticsearch-based
search solution also uses Redis PUBSUB for Near Real-Time (NRT) updates; for more
information, see The Elasticsearch index lifecycle). Each cache defines a topic, with
format {cache-namespace-cacheName}-invalidation, where invalidation messages are
issued and received.
The Redis database provides commands that allow you to list subscribers (PUBSUB CHANNELS),
publish messages (PUBLISH), and subscribe to them (SUBSCRIBE). See Invalidation in Redis for
details.
Timing considerations
Sending and receiving invalidation messages using Redis is fast, but not instantaneous.
Consider an HCL Commerce request that executes in the Transaction server (ts-app) and makes
a change to data in the database. Immediately after the database transaction commits, the local
cache is invalidated and invalidation messages are sent to peer application servers. Meanwhile,
the application may execute a subsequent request that expects to use the updated data. When
the messages are received by the peer servers, they are immediately processed and the local
cache is invalidated according to the messages received. But between the time that the
messages are sent and the time that the local caches are invalidated, the data in the local
caches is "stale". If the subsequent request is received in a peer server before the cache
invalidation has completed, it will see the stale data, perhaps causing incorrect processing to
occur.
To help avoid accessing stale data due to this situation, the HCL Commerce data cache provides
optional configurations to introduce a short delay in the original Transaction server request, just
after the invalidation messages are sent, and before the request returns. If the delay is long
enough to allow the invalidation messages to be completely processed in peer application
servers, the timing problem can be avoided.
If a circuit breaker detects a Redis server is failing, it prevents new requests to the Redis server
for a period of time. Circuit breakers are used in addition to high availability configurations
provided by Kubernetes and Redis itself, such as replicas.
Local Caches
As invalidation messages can neither be sent nor received during a Redis outage, local
caches implement a shorter timeout for new and existing entries. By default, the timeout
during outages is configured to five minutes.
Remote Only Caches
Remote only caches become unavailable during a Redis outage.
Circuit breaker configurations can be adjusted using the Cache YAML configuration.
The configuration for the circuit breaker is available in the cache YAML file
under redis, circuitBreaker. The maximum timeout for local caches in outage
mode is configured using the maxTimeToLiveWithRemoteOutage element
under localCache, as in the following example:
redis:
  circuitBreaker:
    scope: auto
    retryWaitTimeMs: 60000
    minimumFailureTimeMs: 10000
    minimumConsecutiveFailures: 20
    minimumConsecutiveFailuresResumeOutage: 2
cacheConfigs:
  defaultCacheConfig:
    localCache:
      enabled: true
      maxTimeToLiveWithRemoteOutage: 300
Maintenance
The HCL Cache implements a number of required maintenance processes. To support features
such as invalidation by dependency ID, the HCL Cache maintains metadata information for each
cache entry. This metadata cannot be expired or evicted by Redis, because this would lead to
inconsistencies such as missed invalidations.
Expired maintenance
The maintenance jobs can add overhead to the Redis servers. It is important
that performance test environments accurately simulate production
environments, exercising the maintenance processes in a similar manner. For
example, if the production environment typically fills up the Redis memory,
the performance environment should do the same. Short tests (e.g. one hour
in duration) might not be long enough to simulate expired and inactivity
maintenance processing conditions.
numCacheIdPerLUACall
This is the maximum number of cache entries that will be inspected and processed by a
LUA script. Increasing the number speeds up maintenance but can also block the Redis
thread for a longer period.
numLUACallsInPipeline
The number of LUA scripts that are sent together as a batch. The Redis thread is only
locked during each individual script execution.
Expired maintenance (onlineExpiredEntriesMaintenance)
The speed of maintenance adjusts depending on the age of the oldest expired entry. For
example, if the maintenance process finds cache entries that have been expired for
seven minutes, it uses the maintenance configuration for objects in the 5-8 minute range,
which cleans at a rate of 20 per second.
newerThan: 180 secs ( 3 mins)   inLUA: 1   pipeline: 1   delayMs: 60000   -- speed: 0/sec, 1/min
newerThan: 300 secs ( 5 mins)   inLUA: 2   pipeline: 5   delayMs: 500     -- speed: 20/sec, 1,200/min
newerThan: 420 secs ( 7 mins)   inLUA: 3   pipeline: 5   delayMs: 125     -- speed: 120/sec, 7,200/min
newerThan: 540 secs ( 9 mins)   inLUA: 5   pipeline: 5   delayMs: 100     -- speed: 250/sec, 15,000/min
newerThan: 720 secs (12 mins)   inLUA: 5   pipeline: 5   delayMs: 50      -- speed: 500/sec, 30,000/min
newerThan: 960 secs (16 mins)   inLUA: 5   pipeline: 5   delayMs: 25      -- speed: 1,000/sec, 60,000/min
newerThan: ~ ALL ~              inLUA: 5   pipeline: 5   delayMs: 12      -- speed: 2,083/sec, 125,000/min
For details on updating the configuration see Updating the default maintenance values.
Due to differences in architecture, Redis Enterprise does not make used-memory
statistics available to the application, but this statistic is the trigger that the low memory
maintenance process uses to determine when, and how much, maintenance is required. As a
result, with Redis Enterprise, the softMaxSize configuration must be manually set for
each cache to define a maximum size in number of entries.
Low memory maintenance default configurations
The default configurations are as follows. For details on updating the configuration,
see Updating the default maintenance values.
Inactivity maintenance (onlineInactiveEntriesMaintenance)
cacheConfigs:
  defaultCacheConfig:
    remoteCache:
      onlineExpiredEntriesMaintenance:
        ...
      onlineLowMemoryMaintenance:
        ...
      onlineInactiveEntriesMaintenance:
        ...
See Cache Configuration for details about how these settings can be applied to custom or default
caches.
Compression
HCL Cache provides the option to use the LZ4 compression algorithm on cache values.
Caches with large values, such as JSP caching in baseCache, can benefit from compression.
Compression reduces the size of the entries in Redis, and reduces network traffic, but it can
increase CPU usage on the client containers. You might see no benefit from enabling
compression on caches with small values.
Sharding
Sharding is available from HCL Commerce Version 9.1.10. The number of shards defaults to
one. To enable sharding, add the shards configuration under the remoteCache element for a
particular cache, with a value higher than one.
cacheConfigs:
  baseCache:
    remoteCache:
      shards: 3
When sharding is enabled, the cache is internally partitioned by the number of shards specified;
for example, if three shards are specified, the cache data is split across three key spaces, as
shown below. Regardless of the number of shards, invalidation processing is still handled at the
cache level; each shard processes all invalidations for its cache.
{cache-demoqalive-baseCache}-invalidation
{cache-demoqalive-baseCache:s=0}-(dep|data)
{cache-demoqalive-baseCache:s=1}-(dep|data)
{cache-demoqalive-baseCache:s=2}-(dep|data)
Because each shard is assigned a unique hash slot, sharding is typically used with Redis
Clustering, since it allows each shard or cache segment to be handled by a different Redis node.
Sharding can be helpful for caches that might overwhelm a single Redis node, either due to their
memory footprint, or the amount of load/operations they generate. baseCache is an example of a
cache that might benefit from sharding.
In a Redis cluster environment, the slot assignment is done considering the namespace, cache
name and shard number. It is not guaranteed that the shards will be evenly distributed across the
Redis nodes, but this issue can be overcome by increasing the number of shards or by using slot
migration.
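To see which hash slot (and therefore which cluster node) each shard maps to, you can use the standard CLUSTER KEYSLOT command with the shard's hash tag (the cache name follows the earlier examples):
127.0.0.1:6379> CLUSTER KEYSLOT "{cache-demoqalive-baseCache:s=0}"
127.0.0.1:6379> CLUSTER KEYSLOT "{cache-demoqalive-baseCache:s=1}"
Comparing the returned slot numbers against the CLUSTER SLOTS output shows which Redis node serves each shard.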
Inactivity
To enable removal of inactive cache entries, specify enabled: true and specify a
number of minutes using the inactivityMins configuration.
cacheConfigs:
  baseCache:
    remoteCache:
      onlineInactiveEntriesMaintenance:
        enabled: true
        inactivityMins: 30
See Cache Maintenance for details on how inactivity maintenance is performed, and
can be monitored and tuned.
Local caches support inactivity at the cache entry level. Inactivity can be configured using
the cachespec.xml, or programmatically with the DistributedMap interface, as in the sketch
below. Inactivity set by DynaCache is used for local caching only and does not impact the
inactivity process of the remote cache, which must be enabled independently.
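A minimal sketch of setting inactivity programmatically (the cache JNDI name and values are illustrative; the put overload with an inactivity parameter is part of the com.ibm.websphere.cache API):

import javax.naming.InitialContext;
import com.ibm.websphere.cache.DistributedMap;
import com.ibm.websphere.cache.EntryInfo;

DistributedMap cache = (DistributedMap) new InitialContext().lookup("services/cache/MyCustomCache");
// key, value, priority=1, timeToLive=3600s, inactivity=1800s, sharing policy, no dependency IDs
cache.put("myKey", "myValue", 1, 3600, 1800, EntryInfo.NOT_SHARED, null);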
Inactivity Maintenance runs from all containers, while Low Memory Maintenance is
active from only a single container at any one time.
Low Memory Maintenance must remove a percentage of the cache, and it does
so by selecting entries that are closest to expiry, even if they may have high
reuse. Inactivity Maintenance, by contrast, only removes inactive entries, helping to
retain other high-reuse entries.
The HCL Cache has special configurations called Cache directives that are used with
cache entries to skip local or remote caching. Caches that enable local and remote
caching, and deal with entries that might not see reuse, may benefit from these
configurations. Examples include REST calls that specify searchTerm, faceting and
pagination. Caches that create many cache entries that are not reused can be inefficient.
Disabling remote caching for those caches can help reduce remote cache memory
footprint. From HCL Commerce Version 9.1.10, you can choose to allow these entries in
the remote cache, while relying on Inactivity processing to remove inactive entries.
The QueryApp container implements the skip-remote directive for certain "searchTerm"
caches in cachespec.xml. If you enable Inactivity for baseCache, consider allowing
these caches to use the remote cache, by customizing the cachespec.xml file to
remove the following snippets:
<component id="DC_HclCacheSkipRemote" type="attribute">
<required>true</required>
</component>
smallCacheClearOptimizationMaximumSize is available since HCL Commerce Version 9.1.6.
cacheConfigs:
  baseCache:
    remoteCache:
      smallCacheClearOptimizationMaximumSize: 50000
HCL Cache configurable Prometheus metrics
The HCL Cache provides cache level configurations to customize the metrics created for the
Prometheus integration.
Although changes are not typically required, if you are integrating with a 3rd-party monitoring
system and there is a cost associated with the retrieval or storage of metrics, these
configurations can be used to fine-tune the metrics to be used.
Cache configurations
Metrics are configurable at the cache level. Changes can be applied to a single cache, or to the
default configuration using defaultCacheConfig. See cache configuration for details.
The Timer metrics used by the HCL Cache support histograms for the calculation of
percentiles. The tracking of histogram values requires the definition of additional metrics.
This support can be disabled to reduce the amount of metrics created.
hclcache_cache_clears_total{cachespace="demoqaauth",name="baseCache",scope="local",} 100.0
hclcache_cache_clears_duration_seconds_sum{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 1.3296758
hclcache_cache_clears_duration_seconds_max{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 0.0897587
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="1.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="3.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="5.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="7.0E-4",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.001",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.003",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.005",} 0.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.01",} 23.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.05",} 99.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.1",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="0.5",} 100.0
hclcache_cache_clears_duration_seconds_bucket{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",le="+Inf",} 100.0
hclcache_cache_clears_duration_seconds_count{cachespace="demoqaauth",name="baseCache",result="ok",scope="remote",} 100.0
The default histogram configuration is as follows:
defaultCacheConfig:
metrics:
timerNanoBuckets:
- 100000 # 0.1 ms
- 300000 # 0.3 ms
- 500000 # 0.5 ms
- 700000 # 0.7 ms
- 1000000 # 1.0 ms
- 3000000 # 3.0 ms
- 5000000 # 5.0 ms
- 10000000 # 10.0 ms
- 50000000 # 50.0 ms
- 100000000 # 100.0 ms
- 500000000 # 500.0 ms
Values are in nanoseconds.
Closed-loop cycle
The workload defines how the performance of a system is evaluated. A workload should have
the following characteristics:
Measurable: A metric that can be quantified, such as throughput and response time.
Reproducible: The same results can be reproduced when the same test is run multiple
times.
Static: The same results can be achieved no matter how long you execute the run.
Representative: The workload realistically represents the stress to the system under
normal operating considerations.
Improving performance is always a matter of identifying where the bottleneck is, and changing
your system configuration to avoid it. Monitoring system performance and identifying problems
are the most essential skills to ensure good performance of the system. All tools have strengths
and weaknesses. Some tools might alter the flow and the timing of the applications, but they
provide information to the developer and the system administrator, such as Rational Application
Developer's Profiling function. Other tools have minimal impact on the overall system but provide
little information, or offer specific information that might not be helpful to identify the source of the
problem.
To save time, adopt a top-down approach: change the system level first, followed by the
application level, and then the programming level. By removing the inefficiencies from the top
level, the underlying problems at the lower levels might be minimized.
Tuning levels
The system level consists of components such as processors, memory subsystem, network
configuration, and disk subsystem. Bottlenecks at this level are easier to identify and address by
modifying the hardware configuration or operating system level optimization.
Closed-loop cycle
The closed-loop cycle is a method for implementing the top down tuning approach. This method
prescribes the way to gather and analyze data, come up with ideas for resolving issues,
implement enhancements, and test results. The process is driven by data, and the results from
one iteration of the loop drive the next iteration of the loop.
When you are considering how to tune your Data Load process, ensure that you review the Data
Load summary reports that are generated after you run the Data Load utility. These reports can
be used to identify which elements of the Data Load process require tuning to improve
performance.
Before you begin
Ensure that you are familiar with and understand the following concepts and tasks that are
related to the Data Load utility:
To tune your Data Load utility process, you can tune the following processes, parameters, and
caches:
Validation options
ID resolver cache
Network latency
Database tuning
When used with CSV formatted data, Data Load can be run with multiple writer threads. This can
drastically increase load performance for large sets of data that are formatted with considerations
made for parallelization.
For more information on Data Load parallelization, see Data Load parallelization.
The Data Load mode parameter is used to set the type of load process that the Data Load utility
is to run. You can set this mode to Insert, Replace, or Delete in the wc-dataload.xml file
for the data you are loading. Typically, Replace is used; however, Insert and Delete can run
faster. Running the utility in Insert or Delete mode does not require as many IDs to be resolved
with the ID resolver utility. When you are using Insert or Delete, ensure that these actions are the
only database operations that are required by your CSV file.
You can run a file difference preprocess for routine data loads to improve Data Load utility
performance for loading these files. By using this preprocessor, you can compare two input
files, such as a previously loaded file and a new version of that file. The preprocessor generates
a difference file that contains only the records in the new file that are not within the old file or that
are changed from the records in the old file. The Data Load utility can then load this difference
file. If your routinely loaded files contain many previously loaded records, then running this file
difference can result in shorter load times. Running a file difference can reduce the loading time
that is required to load your routine updates to your HCL Commerce database, reduce server
usage time, and improve server performance.
You can configure the Data Load utility file difference preprocessor to compare files by the
values in each column, instead of entire records, to identify the changed records. You can also
configure the file difference preprocessor to ignore specific columns when the process is
comparing files.
For more information about this preprocessor, see Data Load file difference preprocessing.
Validation options
Configuring the Data Load utility to validate the data that you are loading can affect your Data
Load performance. The validation of the data you are loading is performed against your HCL
Commerce database. If you are validating many records, or the validation process encounters
many invalid records, this validation process can affect performance. By default, the following
validation options are available as configurable properties for the Data Load utility:
attributeValueValidation
Indicates whether to validate the attribute value. The attribute value is mandatory except
within a product and defining attribute relationship.
validateAttribute
Validates whether a SKU and a product have compatible defining attributes when the
SKU is moved under the product. The validation logic determines whether the attributes
or allowed values that are to be created, updated, or deleted belong to the current store.
validateCatalog
Validates whether more than one master catalog is being created for a store. If the store
supports sales catalogs, the validation checks whether a catalog entry belongs to more
than one master category. The validation also checks whether an attribute allowed value
can be set to default in the current store.
validateCatalogEntry
Validates whether to check the types of the SKU and product when the Data Load adds a
SKU under a product. This check is to make sure that the SKU is really a SKU and the
product is really a product.
validateCatalogGroup
Validates whether a catalog group belongs to a specified catalog.
validateUniqueDN
Validates the uniqueness of the distinguished name (DN) used to identify a user in the
CSV file. By default, to optimize data load performance, users in the CSV file are
identified by the logon ID instead of the distinguished name.
If the data that you are loading does not require any of the
validation processes to occur, ensure that the corresponding
configurable property is set to false so that no validation is performed.
ID resolver cache
If the ID resolver cache size is large enough, the cache can store all of the required IDs for
a database table that data is being loaded into. If the size of this cache is not large enough
to store all of the required IDs, then none of the IDs are cached. When the IDs are not
cached, the data load process requires that the IDs are resolved directly against the
database. When you are configuring your cache setting, consider the following behavior:
The time that it takes to fetch and load the IDs for a table into the ID resolver cache
Network latency
Change the Data Load utility batch size and commit count parameters to reduce the effect
of network latency and reduce the processing load on the Transaction server. The commit
count must be a multiple of the batch size. The commit count parameter specifies the
number of rows that are flushed in a single transaction before a commit is issued.
Database transactions are loaded in batches. These batches are kept in the Java memory
until there are enough rows stored for a flush. Then, the batch contents are stored to the
database as a single packet of data. The batches are stored in the database until the next
commit occurs and the database loads the changes.
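For example (values are illustrative): with a batch size of 500 and a commit count of 1,000, rows are flushed to the database in batches of 500, and a commit is issued every 1,000 rows, so each transaction contains two batches.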
Database tuning
By tuning the database, you can improve the performance of the Data Load utility by
reducing the time that is required to commit records. You can view the time that is required
to commit loaded data records in the Data Load utility summary report. There are many
performance tuning tools available for improving database performance. For more
information about database performance tuning, see:
Note: HCL Commerce ships with default DB2 optimization settings, such as optimization levels
and optimization profile registries. It is highly recommended that you thoroughly test any changes
that are made to the optimization settings in a production-like environment, before you use them
in a production system. Changing the optimization settings can affect the overall performance of
the application, either immediately or later, such as when the data volume increases or the data
distribution changes.
Considerations for the physical environment are related to how the data is spread among the
disks and how the memory is managed for the databases.
Layout on disk
Reading from the database and writing back to the database (disk I/O) can become a bottleneck
for any application that is accessing a database. Proper database layout can help reduce the
potential for this bottleneck. It is a significant effort to change the physical layout of the database
once it is created, hence proper planning at the initial stages is important.
The first consideration is to ensure that the DB2 transaction logs reside on their own physical
disk. Every update that is issued against the database is written to the logs (in addition to being
updated in memory). Hence there is a lot of disk I/O in the location where the DB2 transaction
logs reside. It is a good practice to try to ensure that all read/write activity on the disk is related
only to the transaction logs, thus eliminating any I/O contention with other processes that might
access the disk.
To set the location for the DB2 transaction logs, issue the following command:
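For example, the DB2 NEWLOGPATH configuration parameter sets the log location (the database name and path are placeholders):
db2 UPDATE DATABASE CONFIGURATION FOR <database_name> USING NEWLOGPATH <path_to_dedicated_disk>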
Before the logs are stored in the location that is specified, disconnect all sessions, or deactivate
the database by issuing the db2 deactivate command.
The second consideration in terms of disk layout is to determine how to manage the table spaces
efficiently. One performance principle in the management of Relational Database Management
Systems (RDBMs) is to separate the database table data and database index data onto different
physical disks. This separation enables better query performance, since index scans can execute
in parallel with data fetch operations because they are on different physical disks.
In DB2, automatic storage table spaces are used by default. The System Managed
Storage (SMS) and Database Managed Storage (DMS) table space types are deprecated for
user-defined permanent table spaces and might be removed in a future release.
Automatic storage table spaces are the easiest table spaces to set up and maintain, and are
recommended for most applications. They are particularly beneficial when:
You have larger tables or tables that are likely to grow quickly
You do not want to have to make regular decisions about how to manage container
growth.
You want to be able to store different types of related objects (for example, tables, LOBs,
indexes) in different table spaces to enhance performance.
For more information, see Comparison of automatic storage, SMS, and DMS table spaces.
Memory
DB2 associates memory for the database through buffer pool objects. A buffer pool has a page
size that is associated with it and is linked to one or more table spaces. Thus, if table spaces of
different page sizes are created, then buffer pools corresponding to the different page sizes are
required.
While you can create multiple buffer pools having the same page size, it is recommended that
only one buffer pool per page size be created, for the most efficient usage of memory on the
database server.
The question is always, how much memory to assign to the buffer pools. For DB2 32-bit
implementations, there is a limit, based on the operating system, that can be available for buffer
pools.
Assuming a dedicated database server, allocate a large proportion of memory available on the
server, about 75% to 80%, but not exceeding the platform limits.
Note that for 64-bit implementations of DB2, the limits are increased. In this case, the buffer pool
hit ratio would need to be monitored to determine the optimal setting for the buffer pools. You can
also monitor the hit ratio for 32-bit implementation using database snapshots using the following
command:
The output that is generated contains some statistics on buffer pool logical and physical reads:
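The relevant snapshot monitor elements look like the following (DLR, DPR, ILR, and IPR stand in for the actual numbers):
Buffer pool data logical reads = DLR
Buffer pool data physical reads = DPR
Buffer pool index logical reads = ILR
Buffer pool index physical reads = IPR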
In this output, DLR, DPR, ILR, and IPR have actual values. The hit ratio can be computed using
the following formula:
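The standard buffer pool hit ratio calculation, using those values:
Hit ratio (%) = (1 - (DPR + IPR) / (DLR + ILR)) * 100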
The size of the buffer pool can be changed using the ALTER BUFFERPOOL command, or
the BUFFPAGE parameter if the size of the buffer pool is set to -1.
There are many parameters to consider for performance. This section describes a subset of
these that are considered important for HCL Commerce implementations. To set the values for
the parameters, the following command can be used:
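The standard DB2 syntax is as follows (database name, parameter, and value are placeholders):
db2 UPDATE DATABASE CONFIGURATION FOR <database_name> USING <parameter> <value>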
The database heap (DBHEAP) contains control block information for database objects (tables,
indexes, and buffer pools), as well as the pool of memory from which the log buffer size
(LOGBUFSZ) and catalog cache size (CATALOGCACHE_SZ) are allocated. Its setting depends
on the number of objects in the database and the size of the two parameters mentioned.
In general, the following formula can be used to estimate the size of the database heap:
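As a rough sketch, based on the components described above (an approximation, not an official sizing rule):
DBHEAP >= LOGBUFSZ + CATALOGCACHE_SZ + additional pages for control blocks (which grow with the number of tables, indexes, and buffer pools)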
The log buffer is allocated from the database heap, and is used to buffer writes to the transaction
logs for more efficient I/O. The default size of this setting is 128 4 KB pages. A recommended
starting point for the log buffer size (LOGBUFSZ) in HCL Commerce implementations is 256.
When you are considering values for the transaction log file size (LOGFILSIZ) and the number of
primary (LOGPRIMARY) and secondary (LOGSECOND) logs, some generalizations for OLTP
applications can be applied. A high number of short transactions are typical in OLTP systems,
hence the size of the log file should be relatively large, otherwise more processing time is spent
managing log files, rather than writing to the transaction logs. A good starting point for the size of
the log file in HCL Commerce implementations is to set the value to 10000.
Primary log files are allocated when the database is activated, or on the first connect. If a long
running transaction fills up all the primary logs, then secondary logs are allocated as needed until
the LOGSECOND limit is reached. The allocation of a secondary log file is a significant
performance hit, and should be minimized if it cannot be avoided.
To determine the right settings for these parameters, you need to monitor the database and see
whether secondary log files are being allocated. If they are, then you need to increase the
number of primary log files. You can monitor by taking a database snapshot and look for the
following two lines:
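The relevant snapshot lines look like the following (the values shown are illustrative):
Secondary logs allocated currently = 0
Maximum secondary log space used (Bytes) = 0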
A good starting point for the number of primary log files (LOGPRIMARY) is anywhere from 6 - 10.
Parameters related to disk I/O
In addition to physical disk layout, several tuning parameters can be manipulated to affect disk
I/O. Two key parameters are NUM_IOSERVERS and NUM_IOCLEANERS.
NUM_IOSERVERS specifies the number of processes that are launched to prefetch data from
disk to the buffer pool pages. To maximize read parallelism, this parameter should be set to the
number of physical disks that are being used by the database, to enable reading from each disk
in parallel.
NUM_IOCLEANERS specifies the number of processes that are launched to flush dirty buffer
pool pages to disk. To maximize usage of system resources, this parameter should be set to the
number of CPUs on the system.
The frequency of how often dirty buffer pool pages are flushed to disk can be influenced by the
CHNGPGS_THRESH parameter. Its value represents the limit, in the form of a percentage, that
a buffer pool page can be dirty before a flush to disk is forced. For OLTP applications, a lower
value is recommended. For HCL Commerce implementations, the value should be set to 40.
One final parameter to consider is MAXFILOP. It represents the maximum number of files DB2
can have open at any time. If this value is set too low, valuable processor resources are taken up
to open and close files. This parameter needs to be monitored to be set to the correct value, but
a good starting point is to set this value to 128. You can monitor by taking a database snapshot
and looking at the following line:
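The relevant snapshot line looks like the following (the value shown is illustrative):
Database files closed = 0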
If the value monitored is greater than zero, then the value for this parameter should be increased.
Reducing locking contention is key to performance. Several parameters exist to influence locking
behavior. The total amount of memory available to the database for locks is defined by the
LOCKLIST parameter. The MAXLOCKS parameter defines the maximum amount of memory
available for each connection to the database. It is represented as a percentage of the
LOCKLIST.
The size for both of these parameters need to be adjusted to avoid lock escalations. A lock
escalation occurs when all of the memory available to a connection is used, and multiple row
locks on a table are exchanged for a single table lock. The amount of memory that is used for the
first lock on an object is 72 bytes, and each additional lock on the same object is 36 bytes.
A good starting value for LOCKLIST can be approximated by assuming that a connection
requires about 512 locks at any time. The following formula can be used:
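A sketch of the calculation, assuming 512 locks per connection at the worst-case 72 bytes each, rounded up to 4 KB pages (the number of connections is a placeholder):
LOCKLIST = (512 * 72 * <max_number_of_connections>) / 4096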
MAXLOCKS can be set to 10 - 20 to start. Further monitoring is necessary to adjust both of these
values. In the database snapshot output, look for the following lines:
Lock list memory in use (Bytes) = 432
Lock escalations = 0
Exclusive lock escalations = 0
If lock escalations occur (value higher than 0), increase the locklist to minimize the escalations
and increase the MAXLOCKS value to increase the limit of how much of the LOCKLIST a
connection can use.
Best practices
Here are some of the most common best practices for any IBM DB2 UDB implementation.
When a high number of insert, update, or delete operations are performed against a table in
the database, the physical placement of the rows and related indexes might not be optimal. DB2
provides the REORG utility to reorganize the data in a table.
DB2 also provides a utility to check whether table or index data needs to be reorganized. While
connected to a database, issue the following command:
db2 REORGCHK
This command checks all tables in the database and produces a listing, first by table and then
by index. In the listing, an asterisk ('*') in any of the last three columns indicates that the table or
index requires a REORG.
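A minimal sketch of the check-and-reorganize cycle; the database and table names are
placeholders:
db2 connect to MALL
db2 REORGCHK CURRENT STATISTICS ON TABLE ALL
# Reorganize a table that was flagged with an asterisk:
db2 REORG TABLE WCS.ORDERITEMS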
Collecting statistics
Each SQL statement that is submitted to the database is parsed and optimized, and an access
plan is created for its execution. To create this access plan, the optimizer relies on table and
index statistics. For the optimizer to generate the best access plan, up-to-date statistics are
required. Collecting statistics frequently, or at least when a significant amount of data changes,
is a good practice.
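On DB2, statistics are gathered with the RUNSTATS utility. A minimal sketch, where the schema
and table names are placeholders:
db2 RUNSTATS ON TABLE WCS.CATENTRY WITH DISTRIBUTION AND DETAILED INDEXES ALL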
Gathering statistics on the HCL Commerce schema objects helps the database choose
the best execution plan for SQL queries. When you run an SQL query, the database
converts the query into an execution plan and chooses the best way to retrieve data. For
the Oracle database to choose the best execution plan for an SQL query, it relies on
statistics about the tables and indexes in the query. Choosing the best execution plan for
your SQL queries improves the performance of the database, and thereby the
performance of HCL Commerce.
It is recommended that you use the DBMS_STATS package instead of the ANALYZE
command to gather your database statistics. From the SQL*Plus prompt, run the following
commands:
exec dbms_stats.gather_database_stats;
exec dbms_stats.gather_schema_stats(ownname=>'schema_name', granularity=>'ALL', DEGREE=>3, OPTIONS=>'GATHER', CASCADE=>TRUE);
1. The first command gathers statistics for the entire database.
2. The second command gathers statistics for a schema, where schema_name is
the name of your HCL Commerce schema.
For more information about using the DBMS_STATS package, see the Oracle
documentation.
Review and verify your need for indexes on order-processing related database tables
and other tables where block contention occurs during high-peak workloads.
For an Oracle database, high-peak workloads might adversely affect performance during
order processing. This effect can occur due to block contention on an index that is
defined for the ORDERS database table or other tables that are queried or updated
frequently during high-peak workloads. For instance, contention can occur on an index
that is defined for the STORE_ID column of the ORDERS table. If the index where block
contention occurs does not provide significant performance benefits for your site, such as
to improve queries against the associated table, consider dropping the index.
Before you drop the index, verify through performance reports, such as Automatic
Workload Repository (AWR) reports, that keeping the index provides no significant benefit
for your site. Once verified, you can drop the index.
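The following SQL*Plus sketch illustrates one way to investigate such contention before
dropping an index; the index name is hypothetical, and you should rely on AWR reports for
confirmation:
-- Segments with the most buffer busy waits:
SELECT owner, object_name, value
  FROM v$segment_statistics
 WHERE statistic_name = 'buffer busy waits'
 ORDER BY value DESC;
-- Drop the index only after verifying it provides no significant benefit:
DROP INDEX WCS.I_ORDERS_STORE_ID;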
The Data Load utility supports loading data into a workspace. When you load data into a
workspace, you can make and preview changes to managed assets without affecting what is
running on your site.
The following general user roles interact with the Data Load utility:
Business user
Site administrator: Responsible for the day-to-day operation of the Data Load utility.
The following diagram describes how the user roles interact with the data
load utility:
1. The business user provides the developer with the business data.
5. The site administrator sets the store and database settings in the
environment configuration file.
8. The site administrator runs the Data Load utility along with the three
configuration files (environment, load order, and business object
configuration files) to load the formatted source data into the HCL
Commerce database. After the utility runs, the site administrator also
verifies the results of the load.
For more information about how the Data Load process is structured,
see Data Load utility architectural overview. For more information about how
the Data Load utility works and what components are included in the process,
see Data Load utility framework process and components.
To use the Data Load utility to load data into your HCL Commerce database,
you must first configure or create the required files. To run the Data Load
utility, you need a business object configuration file, a load order
configuration file, an environment configuration file, and a data source file.
For more information, see Configuring and running the Data Load utility.
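As a hedged sketch of an invocation, where wc-dataload.xml is your load order
configuration file:
# The load order configuration file references the environment and
# business object configuration files.
./dataload.sh wc-dataload.xml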
Best practices
When you are using the Data Load utility to load data, follow these best
practices:
If the data that you are loading is catalog, price, inventory, member, or
Commerce Composer data, you must configure the Data Load utility to use a
business object mediator that maps your input file data to the appropriate
HCL Commerce business objects. By default, business object mediators are
provided for loading data for business objects in the following components:
o Catalog
o Inventory
o Member
o Location
o Commerce Composer
o Promotions
o Marketing
To load other data, you can use
the com.ibm.commerce.foundation.dataload.businessobjectmediator.TableObjectMediator
mediator, or you can create your own custom business object mediators.
Use the utility in replace mode to delete catalog objects when your
site uses HCL Commerce search. To delete objects with the utility in
replace mode, include the value 1 in the Delete column for an object
in your input file. If you do decide to delete catalog objects with the
utility in delete mode, run a full index rebuild after the load operation
completes.
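For illustration only, the column names below are hypothetical and depend on your
business object configuration; the second data row deletes its object during a
replace-mode load:
PartNumber,ParentGroupIdentifier,Delete
SHIRT-001,Apparel,0
SHIRT-002,Apparel,1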
The following table lists topics by role and main task.
Site Administrator
o File format for Data Load input files: A data load input file contains the actual information that the Data Load utility populates into your database. Learn how to construct such files to ensure that the loading process is successful.
o Configuring the Data Load utility to run a file difference preprocess: If you routinely load the same generated Data Load input file from an external system or source, you can run a file difference preprocess as part of the Data Load process to ensure that you load only new changes from your newest input file.
o Configuring the CSV data reader: The CSV data reader is provided with the Data Load utility. Learn how to configure it to change the way data is read from your CSV input files.
o Configuring the data load order: The data load order configuration file controls the load order of the Data Load utility. Learn how to configure your data load order file.
o Configuring a column exclusion list: You can configure a column exclusion list that causes the Data Load utility to avoid loading data into the specified columns of a table.
o Data Load utility command syntax: Learn how to run the utility command that runs the Data Load process.
o Verifying …: Learn how to verify that a load operation with the Data Load utility …
o Loading data into workspaces using the Data Load utility: Load data into workspaces with the Data Load utility.
o Loading values for single and multiple value attributes: Load data for attribute values for attributes with a single value and attributes with multiple values.
o … members by email address with the Data Load utility
Samples
o Scenario: Initial load: The initial load is the scenario when you finish creating and configuring a new HCL Commerce instance; you then load your initial data into the HCL Commerce database.
o … Load utility data …
Developer
o Data Load utility architectural overview: An understanding of how the Data Load utility works, and the components that make up the Data Load utility.
o … the Data Load utility
Important:
Performance tuning of the Data Load utility and your data is required to utilize this feature
effectively. This feature can reduce overall Data Load performance. You must carefully
consider the relationship between parallelization, its configuration, the type of data, and
its particular structure.
In previous versions of HCL Commerce, the Data Load utility was a single-threaded application
that was constrained by singleton classes that were not designed for parallel usage. This design
limited the use of the utility for some large data jobs and hampered the performance of the tool;
long-running jobs could block other work. With this upgrade to the Data Load utility, multiple
users of the tool can load data concurrently, and shorter jobs allow future jobs to run sooner.
Architecture
The architectural enhancements to the Data Load utility include the addition of a queue in which
the reader of the CSV file places batches of data to be processed. The queue has a maximum
size; when it is reached, the reader temporarily stops producing batches.
batches. The reader thread will continue to enter batches of data into the queue as the batches
are consumed. After all of the data is read from the input file, the reader thread will place an
empty batch into the queue, and then exit with a data load summary report.
Each writer thread will remove one batch from the queue and process the batch to load data into
the database. When a writer thread gets an empty batch from the queue, it will place the empty
batch back into the queue and the writer thread will exit with a data load summary report.
After all writer threads finish and exit, the Data Load utility checks whether any writer thread
reported errors. If the writer threads created reprocessing error CSV files, the Data Load utility
merges them into a single error reprocess CSV file, and then reloads this file using a single
writer thread. The Data Load utility then produces a combined data load summary report.
Due to the complex nature of loading hierarchical data with multiple threads, error handling must
be carefully considered when enabling and configuring parallelization. This is especially true
when it comes to performance tuning for your particular environment and dataset.
Warning: If your data contains many line items that reference the same data, SQL deadlocks
can occur. Ensure that the data you are loading is free of duplicate or contradictory entries, and
structured in a way that prevents multiple writer threads from writing to the same rows.
Use the existing data load parameters commitCount, batchSize, and maxError per
LoadItem to tune your Data Load utility performance.
Format your data to leverage parallelization appropriately. For example, if your data is
hierarchical, place parent data together towards the beginning of the file. This reduces
the chance of attempting to load child data before its parent data is present.
By default, the Data Load utility runs in single-threaded mode, which preserves the job behavior
and performance of previous releases. The following new parameters control the parallelization
of the Data Load utility.
A sample data load configuration that includes the full use of these parameters is available here.
Parameter: numberOfThreads
Value type: Integer
Default value: 1
Description: The maximum number of individual writer threads that take batches of data from
the queue, process them in order, and write the processed data into the database. By default,
the numberOfThreads parameter is set to 1, meaning the Data Load utility runs in single-threaded
(legacy) mode.
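The following snippet is a hypothetical sketch of how numberOfThreads might appear in a load
order configuration file; the exact attribute placement is an assumption, so verify it against the
sample configuration referenced below:
<!-- Hypothetical placement of numberOfThreads; verify against the sample. -->
<LoadOrder commitCount="1000" batchSize="500" numberOfThreads="4">
   <LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml">
      <DataSourceLocation location="catalog-entries.csv" />
   </LoadItem>
</LoadOrder>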
Index modules
Index Modules are modules created per index and control all aspects related to an index.
Index settings
Index settings can be either static or dynamic:
static
They can only be set at index creation time or on a closed index, or by using the update-index-
settings API with the reopen query parameter set to true (which automatically closes and reopens
impacted indices).
dynamic
They can be changed on a live index using the update-index-settings API.
Warning: Changing static or dynamic index settings on a closed index could result in incorrect
settings that are impossible to rectify without deleting and recreating the index.
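For example, a static setting such as index.codec can be changed through the update-index-
settings API by passing reopen=true; a sketch using curl against a local node, where my-index is
a placeholder:
curl -X PUT "localhost:9200/my-index/_settings?reopen=true" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "codec": "best_compression" } }'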
Below is a list of all static index settings that are not associated with any specific index module:
index.number_of_shards
The number of primary shards that an index should have. Defaults to 1. This setting can only be set at
index creation time. It cannot be changed on a closed index.
The number of shards is limited to 1024 per index. This limitation is a safety limit to prevent the
accidental creation of indices that can destabilize a cluster due to resource allocation. The limit
can be modified by specifying the export ES_JAVA_OPTS="-Des.index.max_number_of_shards=128" system
property on every node that is part of the cluster.
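For example, the shard count is set when the index is created; a sketch using curl, where
my-index is a placeholder:
curl -X PUT "localhost:9200/my-index" \
  -H 'Content-Type: application/json' \
  -d '{ "settings": { "index": { "number_of_shards": 3 } } }'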
index.number_of_routing_shards
Integer value used with index.number_of_shards to route documents to a primary shard. For
example, a 5-shard index with index.number_of_routing_shards set to 30 could be split as follows:
5 → 10 → 30 (split by 2, then by 3)
5 → 15 → 30 (split by 3, then by 2)
5 → 30 (split by 6)
This setting's default value depends on the number of primary shards in the index. The default is
designed to allow you to split by factors of 2 up to a maximum of 1024 shards.
In Elasticsearch 7.0.0 and later versions, this setting affects how documents are distributed across
shards. When reindexing an older index with custom routing, you must explicitly
set index.number_of_routing_shards to maintain the same document distribution. See the related
breaking change.
index.codec
The default value compresses stored data with LZ4 compression, but this can be set
to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower
stored fields performance. If you are updating the compression type, the new one will be applied
after segments are merged. Segment merging can be forced using force merge.
index.routing_partition_size
The number of shards a custom routing value can go to. Defaults to 1 and can only be set at index
creation time. This value must be less than the index.number_of_shards unless
the index.number_of_shards value is also 1. See Routing to an index partition for more details about
how this setting is used.
index.soft_deletes.enabled
[7.6.0] Deprecated in 7.6.0. Creating indices with soft-deletes disabled is deprecated and will be
removed in future Elasticsearch versions. Indicates whether soft deletes are enabled on the
index. Soft deletes can only be configured at index creation and only on indices created on or
after Elasticsearch 6.5.0. Defaults to true.
index.soft_deletes.retention_lease.period
The maximum period to retain a shard history retention lease before it is considered expired. Shard
history retention leases ensure that soft deletes are retained during merges on the Lucene index.
If a soft delete is merged away before it can be replicated to a follower, the following process will
fail due to incomplete history on the leader. Defaults to 12h.
index.load_fixed_bitset_filters_eagerly
Indicates whether cached filters are pre-loaded for nested queries. Possible values are true (default)
and false.
index.shard.check_on_startup
Expert users only. This setting enables some very expensive processing at shard startup and is only
ever useful while diagnosing a problem in your cluster. If you do use it, you should do so only
temporarily and remove it once it is no longer needed.
Elasticsearch automatically performs integrity checks on the contents of shards at various points
during their lifecycle. For instance, it verifies the checksum of every file transferred when recovering
a replica or taking a snapshot. It also verifies the integrity of many important files when opening a
shard, which happens when starting up a node and when finishing a shard recovery or relocation.
You can therefore manually verify the integrity of a whole shard while it is running by taking a
snapshot of it into a fresh repository or by recovering it onto a fresh node.
This setting determines whether Elasticsearch performs additional integrity checks while opening a
shard. If these checks detect corruption then they will prevent the shard from being opened. It
accepts the following values:
false
Don’t perform additional checks for corruption when opening a shard. This is the default and
recommended behaviour.
checksum
Verify that the checksum of every file in the shard matches its contents. This will detect cases where
the data read from disk differ from the data that Elasticsearch originally wrote, for instance due to
undetected disk corruption or other hardware failures. These checks require reading the entire shard
from disk which takes substantial time and IO bandwidth and may affect cluster performance by
evicting important data from your filesystem cache.
true
Performs the same checks as checksum and also checks for logical inconsistencies in the shard, which
could for instance be caused by the data being corrupted while it was being written due to faulty
RAM or other hardware failures. These checks require reading the entire shard from disk which takes
substantial time and IO bandwidth, and then performing various checks on the contents of the shard
which take substantial time, CPU and memory.
Below is a list of all dynamic index settings that are not associated with any specific index module:
index.number_of_replicas
The number of replicas each primary shard has. Defaults to 1.
index.auto_expand_replicas
Auto-expand the number of replicas based on the number of data nodes in the cluster. Set to a dash
delimited lower and upper bound (e.g. 0-5) or use all for the upper bound (e.g. 0-all). Defaults
to false (i.e. disabled). Note that the auto-expanded number of replicas only takes allocation
filtering rules into account, but ignores other allocation rules such as total shards per node, and this
can lead to the cluster health becoming YELLOW if the applicable rules prevent all the replicas from
being allocated.
index.search.idle.after
How long a shard must go without receiving a search or get request before it is considered
search idle. Defaults to 30s.
index.refresh_interval
How often to perform a refresh operation, which makes recent changes to the index visible to search.
Defaults to 1s. Can be set to -1 to disable refresh. If this setting is not explicitly set, shards that
haven’t seen search traffic for at least index.search.idle.after seconds will not receive background
refreshes until they receive a search request. Searches that hit an idle shard where a refresh is
pending will trigger a refresh as part of the search operation for that shard only. This behavior aims
to automatically optimize bulk indexing in the default case when no searches are performed. To
opt out of this behavior, set an explicit value of 1s as the refresh interval.
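Because this is a dynamic setting, it is common to disable refresh during a bulk load and restore
it afterward; a sketch, where my-index is a placeholder:
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' -d '{ "index": { "refresh_interval": "-1" } }'
# ... run the bulk load, then restore the default:
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' -d '{ "index": { "refresh_interval": "1s" } }'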
index.max_result_window
The maximum value of from + size for searches to this index. Defaults to 10000. Search requests take
heap memory and time proportional to from + size and this limits that memory. See Scroll or Search
After for a more efficient alternative to raising this.
index.max_inner_result_window
The maximum value of from + size for inner hits definition and top hits aggregations to this index.
Defaults to 100. Inner hits and top hits aggregation take heap memory and time proportional to from
+ size and this limits that memory.
index.max_rescore_window
The maximum value of window_size for rescore requests in searches of this index. Defaults
to index.max_result_window which defaults to 10000. Search requests take heap memory and time
proportional to max(window_size, from + size) and this limits that memory.
index.max_docvalue_fields_search
The maximum number of docvalue_fields that are allowed in a query. Defaults to 100. Doc-value
fields are costly since they might incur a per-field per-document seek.
index.max_script_fields
The maximum number of script_fields that are allowed in a query. Defaults to 32.
index.max_ngram_diff
The maximum allowed difference between min_gram and max_gram for NGramTokenizer and
NGramTokenFilter. Defaults to 1.
index.max_shingle_diff
The maximum allowed difference between max_shingle_size and min_shingle_size for
the shingle token filter. Defaults to 3.
index.max_refresh_listeners
Maximum number of refresh listeners available on each shard of the index. These listeners are used
to implement refresh=wait_for.
index.analyze.max_token_count
The maximum number of tokens that can be produced using _analyze API. Defaults to 10000.
index.highlight.max_analyzed_offset
The maximum number of characters that will be analyzed for a highlight request. This setting is only
applicable when highlighting is requested on a text that was indexed without offsets or term vectors.
Defaults to 1000000.
index.max_terms_count
The maximum number of terms that can be used in Terms Query. Defaults to 65536.
index.max_regex_length
The maximum length of regex that can be used in Regexp Query. Defaults to 1000.
index.query.default_field
(string or array of strings) Wildcard (*) patterns matching one or more fields. The following query
types search these matching fields by default:
Multi-match
Query string
Defaults to *, which matches all fields eligible for term-level queries, excluding metadata fields.
index.routing.allocation.enable
Controls shard allocation for this index.
index.routing.rebalance.enable
Enables shard rebalancing for this index.
index.gc_deletes
The length of time that a deleted document’s version number remains available for further versioned
operations. Defaults to 60s.
index.default_pipeline
Default ingest pipeline for the index. Index requests will fail if the default pipeline is set and the
pipeline does not exist. The default may be overridden using the pipeline parameter. The special
pipeline name _none indicates no default ingest pipeline will run.
index.final_pipeline
Final ingest pipeline for the index. Indexing requests will fail if the final pipeline is set and the
pipeline does not exist. The final pipeline always runs after the request pipeline (if specified) and the
default pipeline (if it exists). The special pipeline name _none indicates no final ingest pipeline will
run.
You can’t use a final pipeline to change the _index field. If the pipeline attempts to change
the _index field, the indexing request will fail.
index.hidden
Indicates whether the index should be hidden by default. Hidden indices are not returned by default
when using a wildcard expression. This behavior is controlled per request through the use of
the expand_wildcards parameter. Possible values are true and false (default).
Analysis
Settings to define analyzers, tokenizers, token filters, and character filters.
Index shard allocation
Control over where, when, and how shards are allocated to nodes.
Mapping
Enable or disable dynamic mapping for an index.
Merging
Control over how shards are merged by the background merge process.
Similarities
Configure custom similarity settings to customize how search results are scored.
Slowlog
Control over how slow queries and fetch requests are logged.
Store
Configure the type of filesystem used to access shard data.
Translog
Control over the transaction log and background flush operations.
History retention
Control over the retention of a history of operations in the index.
Indexing pressure
Configure indexing back pressure limits.