Informix Tuning
Introduction
Tuning a database server, not just Informix, is a major undertaking. It requires
familiarity with:
Tuning Checkpoints
In IDS version 10 and older, checkpoints can cause visible system performance
issues. The most common manifestation of a checkpoint problem is long wait
times for write operations to complete.
Some examples of this type of transaction in the RRT application are:
BUFFERPOOL
size=2k,buffers=10000,lrus=10,lru_min_dirty=1.0,lru_max_dirty=10.00
The values in this parameter control how and when the database engine uses
the memory area allocated for the LRU buffers.
size=2k - represents the page size that this buffer will be used for. IDS allows
multiple page sizes to be configured in the system, so in those cases the
BUFFERPOOLS will have to be defined for each page size. For the purposes
of this document we will assume a system with one page size, namely 2K.
buffers=10000 - represents the number of pages that will be kept in RAM. The
hardware needs to have enough physical RAM to accommodate this amount
of memory. In this case, 20MB will be allocated (10000 * 2K page)
lrus=10 - the number of groups into which the database will divide the 10000
buffers. Each group will be handled by a separate worker that will read/write
pages to/from disk. This number can be tweaked, but it is usually set to values
related to the number of disks the database is stored on. With RAID and SAN
configurations, this physical disk information is obscured, so the value in this
parameter is set between 10-50 depending on the size of the hardware
configuration. Start with 20 and tune from there.
lru_max_dirty=10.00 - This parameter represents a percent value based on
the number of pages specified in the buffers parameter. When the number of
dirty pages (pages that contain modified data ready to be written to disk)
reaches this percent value, the database will start writing those dirty pages to
disk. For example: In a 10000 buffer configuration, 10.00% equals 1000
pages. So when the application updates/deletes/inserts enough data to use up
1000 pages, the database engine will start writing those pages to the physical
disk. If the number of dirty pages stays below 10% of the buffers value, those
pages simply remain as dirty pages in the LRU memory.
lru_min_dirty=1.00 - Like the lru_max_dirty parameter, this is a percentage
value based on the buffers number. This percentage value controls when the
database engine will stop writing the dirty pages from the LRU memory onto
disk. Continuing the lru_max_dirty example: 1% of 10000 pages is 100 pages.
So when the LRU reaches 1000 dirty pages, the database will start writing
those to disk. Once it has written 900 pages and the LRU dirty page count is
down to 100, the database will stop writing pages to disk and leave the
remainder in the LRU memory.
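As a quick sanity check, the sizes implied by the example BUFFERPOOL line can be computed directly. This is a standalone sketch, not part of any Informix tooling:

```shell
#!/bin/sh
# Derive the memory footprint and flush thresholds from the example line:
# size=2k,buffers=10000,lrus=10,lru_min_dirty=1.0,lru_max_dirty=10.00
page_kb=2          # size=2k
buffers=10000      # buffers=10000
max_dirty_pct=10   # lru_max_dirty=10.00
min_dirty_pct=1    # lru_min_dirty=1.0

echo "buffer pool memory: $((buffers * page_kb)) KB"
echo "start flushing at:  $((buffers * max_dirty_pct / 100)) dirty pages"
echo "stop flushing at:   $((buffers * min_dirty_pct / 100)) dirty pages"
```

These values match the figures worked through above: 20MB of buffer memory, flushing triggered at 1000 dirty pages and stopped at 100.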
The Brute: Checkpoints
Checkpoints essentially do the same thing as LRUs, however they do it on a
regular interval set by the onconfig parameter CKPTINTVL. The parameter
specifies the number of seconds at which the database will do an automatic
cleanup of the dirty pages in the LRU memory area. The default value of this
parameter is 300, or 5 minutes.
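In the onconfig file the interval is a single line; for example, stating the default five-minute interval explicitly:

```
CKPTINTVL 300   # checkpoint interval in seconds (default: 300)
```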
This means that if the application doesn't modify enough pages to trigger the
LRU writing of dirty pages to disk (lru_max_dirty), the checkpoint is left
responsible for writing those pages. This is a problem because the checkpoint
operation suspends all other operations on the database while it cleans up
the dirty pages in the LRU write buffers.
The LRU dirty page cleanup does not cause this suspend condition, so it's
best to make sure the checkpoint is not left dealing with the dirty LRU
pages.
NOTE: There are other conditions that trigger checkpoints, please look at the info
provided in the IBM online manuals for IDS. This document is addressing the
problems caused by LRU activity so only that aspect of the checkpoints will be
discussed.
$ onstat -R | tail -4
9916 dirty, 2000000 queued, 2000000 total, 2097152 hash buckets, 2048 buffer size
start clean at 30.000% (of pair total) dirty, or 12500 buffs dirty, stop at 20.000%
Monitor this command for a few minutes while watching the online.log file in
another terminal session. That way you will see how many dirty pages there are
when the checkpoint operation starts working. You will recognize the checkpoint
operation working when the online.log output shows something like this:
$ tail -f $INFORMIXDIR/*/online.log
20:11:09 Checkpoint Completed: duration was 3 seconds.
20:11:09 Checkpoint loguniq 32069, logpos 0x13a0018, timestamp: 0xbd88b117
Note the number of dirty pages reported by onstat -R when the checkpoint
entry shows up in the online.log file. That number can be used to determine
the maximum value for the lru_max_dirty parameter.
Here we see that there are 9916 dirty pages waiting to be written to disk, out
of a total of 2,000,000 pages in the LRU buffer memory. That is 0.4958%.
So to trigger the LRU to clean this up before the checkpoint gets to it, the
lru_max_dirty setting would have to be at most 0.48.
To make sure we don't get too close to the checkpoint, 1/2 of that value is
probably a safer setting to begin with, so something like 0.25 is a better first try.
There is an important point to be noted here. The % used to determine when the
engine starts writing the pages from an LRU is applied to the individual LRU
queues, not to the overall buffer pool.
So for the above example, with 48 LRU queues and 2 million pages, each queue
will get about 41,700 pages. The 0.25% threshold applied to an individual queue
is therefore 41,700 * 0.25% ≈ 104 pages. So any queue that manages to stay
under roughly 100 dirty pages will not be flushed, and will be left for the
checkpoint to take care of.
If the 48 queues are all at 96 dirty pages, that can still leave 4600 dirty pages for
the checkpoint to deal with. So keep that in mind when dealing with the
optimization of these values.
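The per-queue arithmetic can be checked with a short awk sketch (the buffer and queue counts are the ones from the example; the ~100-page figure in the text is this value rounded):

```shell
#!/bin/sh
# Per-LRU-queue flush threshold for buffers=2000000, lrus=48, lru_max_dirty=0.25.
buffers=2000000
lrus=48
awk -v b="$buffers" -v l="$lrus" -v d=0.25 'BEGIN {
    per_queue = b / l              # pages managed by each LRU queue (~41,667)
    threshold = per_queue * d / 100
    printf "pages per queue: %d\n", per_queue
    printf "per-queue flush threshold: %d dirty pages\n", threshold
}'
```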
onstat -F
The second command shows how many times the system used the automatic LRU
processing of dirty pages (good) and how many times it had to let the
checkpoint do the cleanup (not good):
$ onstat -F
FG Writes (see the section on Foreground Writes) are even worse than LRU
Writes, so it's good there are 0 of those. However, in this example the LRU
writes are also 0, and the Chunk Writes (checkpoint writes) are the largest
number, meaning the automatic LRU cleanup is not kicking in at all.
The BUFFERPOOL setting on the system the above examples are taken from is
as follows:
BUFFERPOOL
size=2k,buffers=2000000,lrus=48,lru_min_dirty=20,lru_max_dirty=30
Meaning that the LRU cleanup will only kick in at 30% dirty pages which is about
600,000 pages, or 1.2GB of changed data in 5 minutes. Bottom line: the
checkpoint is left with all the work.
To make sure the system doesn't leave the checkpoint with all the dirty pages, the
BUFFERPOOL setting would be better at:
BUFFERPOOL
size=2k,buffers=2000000,lrus=48,lru_min_dirty=0.0,lru_max_dirty=0.25
so that the cleanup is done outside of the checkpoint processing, and the system
is not suspended.
The lru_min_dirty is set at 0, so that once the LRU cleaners start they handle all
of the outstanding dirty pages in the buffers. This setting can be changed if the
system is spending too much time flushing the LRUs, however it is a good starting
value.
onstat -p
This command will report several values about the operations of the database
engine. The ones important for this document are the numckpts and the
ckpwaits.
$ onstat -p
Profile
dskreads pagreads  bufreads    %cached dskwrits pagwrits bufwrits  %cached
53792905 143808046 68046147018 99.92   8099807  30614827 548449520 98.65
Hashing Overview
An example of hashing would be to put all tables in a memory bucket based on
the first letter of the table name. So all tables starting with 'a' would be in the
a_bucket, starting with 'b' in the b_bucket, and so on.
So when an application query needs to use the column names of a table named
'airports', it will know which bucket to go into and look for the cached table
schema.
The hashing method of lookup has the advantage of quickly determining the
bucket containing the data. Once the bucket is determined, finding the
actual table within the bucket is a simple comparison against the names of
the tables in that bucket.
Even with this simple hashing example, the problem becomes obvious when most
of the tables in the system have names starting with the letter 'a':
In this case, the a_bucket will end up containing many tables in it.
To find if a table is actually in the a_bucket, a name comparison will have to be
done many times.
This is a very expensive operation when queries are expected to execute in
tens of milliseconds.
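The skew problem can be illustrated with a toy version of the scheme in plain shell (nothing Informix-specific; the table names are made up):

```shell
#!/bin/sh
# Toy first-letter hashing: each table lands in a bucket keyed by its first
# letter. With a skewed naming scheme, one bucket collects most of the tables,
# and a lookup there degenerates into repeated name comparisons.
a_bucket=""; b_bucket=""; c_bucket=""
for t in airports aircraft arrivals baggage crew; do
    case $t in
        a*) a_bucket="$a_bucket $t" ;;
        b*) b_bucket="$b_bucket $t" ;;
        c*) c_bucket="$c_bucket $t" ;;
    esac
done
echo "a_bucket:$a_bucket"    # three entries -> up to three name comparisons
echo "b_bucket:$b_bucket"
echo "c_bucket:$c_bucket"
```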
So making sure the buckets stay small is very important. This is where the DD
onconfig parameters become useful:
DD_HASHSIZE 1051
DD_HASHMAX 4
The important part is to make sure that the DD is not filling up beyond its
configured capacity. If the DD is using up all the slots (DD_HASHMAX) in all
the buckets (DD_HASHSIZE), it will have to remove tables from the cache in
order to introduce the new ones being requested.
In the example that follows, the DD was set up to be able to handle up to
310 entries; however, because of the type of use the system is under, there
are 375 entries in the DD cache.
This is a very inefficient scenario and should be avoided as much as possible.
When increasing the DD parameters in the onconfig, work on increasing the
DD_HASHSIZE parameter. Making the DD_HASHMAX larger will make the
buckets contain more tables in them, which will cause the lookup of individual
tables by name to slow down.
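One way to reason about the two parameters: the cache capacity is roughly DD_HASHSIZE buckets times DD_HASHMAX entries per bucket. This relationship is an assumption based on the parameter descriptions above, and the 310/375 figures quoted in the text evidently come from a smaller configuration than the example shown:

```shell
#!/bin/sh
# Approximate DD cache capacity implied by the example onconfig values.
# (Assumption: capacity ~= buckets * max entries per bucket.)
DD_HASHSIZE=1051
DD_HASHMAX=4
echo "approximate DD capacity: $((DD_HASHSIZE * DD_HASHMAX)) entries"
```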
References
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.perf.doc/perf112.htm
NOTE: This setting was introduced in IDS 10 with a minor bug which causes the
VP to allocate unlimited memory, making the database engine grow its memory
use without stop. If you set this parameter to a specific value, the problem
disappears. More info on this problem is available at:
http://www-03.ibm.com/developerworks/blogs/page/gbowerman?entry=memory_leak_in_ids_10
From our experience, the 64-bit version of the engine does not have this
issue; however, this is not confirmed by IBM, so it's best to set this
parameter to some value.
Recommended Configuration
Start with a value of 20MB, and then monitor the system to make sure the
memory is used properly.
You can set this parameter on an already running engine using the command:
And make sure you set the parameter in the onconfig file for permanent effect:
VP_MEMORY_CACHE_KB 20000
References
The IBM online documentation on this feature -
http://publib.boulder.ibm.com/infocenter/idshelp/v111/index.jsp?topic=/com.ibm.perf.doc/perf76.htm
The discussion on the minor bug -
http://www-03.ibm.com/developerworks/blogs/page/gbowerman?entry=memory_leak_in_ids_10
The DS_NONPDQ_QUERY_MEM online documentation -
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.adref.doc/adref82.htm
The DS_NONPDQ_QUERY_MEM setting discussion -
http://www.webservertalk.com/archive221-2006-4-1458075.html
Tuning Update Statistics & Effects of Parallel Data Query (PDQ)
When the database engine executes queries from the application, it has to
determine the best way to gather the data that the query is requesting.
In order to determine the fastest and least resource intensive execution for a
query, the database engine keeps information (statistics) about the table and
index data in the database.
This information describes how the data is stored in the tables, and how the
indexes are laid out in regards to being helpful in the execution of queries.
As the data in the tables changes (new rows inserted, rows deleted and updated),
the statistics about the data layout change, so the engine needs to refresh the
information it keeps on the data in all the tables.
The task responsible for this refreshing of statistics is the "UPDATE STATISTICS"
operation. Making sure the statistics are properly kept will make a major
difference in the way the application functions. Make sure you review this section
carefully, and that you review the References section for in-depth information
about this important aspect of database engine performance.
echo "update statistics low for table drop distributions;" | dbaccess rrtdb
echo "set PDQPRIORITY 0; update statistics for procedure;" | dbaccess rrtdb
(see the following sections on PDQ for the explanation of the PDQPRIORITY
parameter)
If you would like to use other methods for updating the statistics, please contact
RRT to discuss the impact to the application performance and operation.
Data mining
In depth reports
Long running database transactions (minutes to hours)
Very few active clients.
The RRT application will in general not benefit from heavy use of PDQ. If you
are going to change configuration parameters related to PDQ and its direct
effects on the RRT application functionality, please contact us before making
the changes in a production system.
ONCONFIG Parameters
MAX_PDQPRIORITY
Set in the onconfig file, ranges from 0 to 100, representing percent of resources
used for parallel queries.
Having this parameter set to 50 will give you the ability to use the parallel query
feature for administrative tasks, as the PDQPRIORITY parameter is the
controlling value for how each database client session uses the PDQ feature.
If you don't have any need for the PDQ functionality, setting this parameter to a
low value, or even turning it off (set to 0) should be considered.
DS_MAX_QUERIES
This parameter limits the number of simultaneous queries which will be processed
as parallel queries using the PDQ optimization. A query is counted against this
parameter value if it has its PDQPRIORITY set to a non-zero value, via the
environment variable PDQPRIORITY, or the session parameter with the same name.
As the RRT application does not benefit from PDQ optimizations in the majority
of its operation, this parameter should be set to a relatively small value. A
good starting value is 20; adjust as needed from there.
DS_TOTAL_MEMORY
This parameter controls the maximum amount of memory that the database
engine will dedicate to processing queries whose PDQPRIORITY parameter is
set to a non-zero value.
This parameter becomes relevant for the RRT application only in relation to the
DS_NONPDQ_QUERY_MEM value. In order to allow the non-PDQ queries to
benefit from the available memory on your server, set the DS_TOTAL_MEMORY
value to 1/2 of the total memory you allocate to the database with the
SHMVIRTSIZE parameter, and then monitor the database operation to adjust the
value accordingly.
DS_MAX_SCANS
This parameter specifies the maximum number of PDQ scan threads the
database engine will allow. Start with a value of 100 as the parameter is not
relevant to the general operation of the RRT application.
DS_NONPDQ_QUERY_MEM (IDS 10)
See Informix_Tuning#The_DS_NONPDQ_QUERY_MEM_Parameter_.28IDS_10.29
Example PDQ Configuration
SHMVIRTSIZE 4096000   # 4 GB of RAM allocated to the database
...
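Pulling together the starting values suggested in the subsections above, a plausible set of PDQ-related onconfig lines might look like the following (these are the suggested starting points, not mandates; tune on your own hardware):

```
MAX_PDQPRIORITY  50        # cap per-session PDQ resource use at 50%
DS_MAX_QUERIES   20        # limit simultaneous PDQ queries
DS_TOTAL_MEMORY  2048000   # 1/2 of the 4 GB SHMVIRTSIZE above (in KB)
DS_MAX_SCANS     100       # maximum PDQ scan threads
```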
You can change the above settings on a live configuration with the following
commands. NOTE: As with any live changes, it is best to avoid them unless
absolutely necessary.
export PDQPRIORITY=50
dbaccess rrtdb test.sql
The other is by setting it explicitly in the session to the database before executing
the SQL command:
The default value is 0, which turns off the PDQ feature. This is not the case for
stored procedures that are compiled with an explicitly defined PDQPRIORITY
setting, for details see the following note.
NOTE: If you use this parameter with UPDATE STATISTICS keep in mind the
following:
Update statistics will not execute in parallel; instead, it will only use the
memory resources specified for PDQ. Before considering this option, review
the DBUPSPACE parameter section later in this document.
Updating the statistics for stored procedures will record the value of the
PDQPRIORITY parameter with the stored procedure. When executed, the
stored procedure will use the value of PDQPRIORITY from the update
statistics session, and not that of the session invoking the stored procedure.
Optimizing The Update Statistics Process
In cases where the update statistics processing takes an extraordinary amount of
time, and production operation is affected, there are a few tuning options to help
improve the duration of the update statistics operation.
The configuration changes are recommendations only, and should be monitored
on the actual hardware in order to make sure they provide the expected
improvements.
To make sure that the statistics processing has enough memory to perform the
sorting and analysis of the table data, we can allow it to run as a PDQ
process, thus gaining access to the PDQ memory.
So the basic update statistics would look something like this:
echo "set PDQPRIORITY 100; update statistics low for table drop distributions;" | dbaccess rrtdb
echo "set PDQPRIORITY 0; update statistics for procedure;" | dbaccess rrtdb
The next change would be to run the update statistics in parallel, so that multiple
tables can be updated at the same time. NOTE: This is only helpful if the system
has enough CPUs dedicated to the database engine to handle the parallel update
statistics.
The rrt_update_statistics.sh Script
With input from administrators from existing installations, we have put together a
script that can automate this parallel update statistics operation. You can retrieve
the script at:
rrt_update_statistics.sh
The basic usage of the script is:
$ rrt_update_statistics.sh
USAGE: rrt_update_statistics.sh <database name> <number of sessions> <notification email>
The script will basically split the tables in the system into <number of sessions>
groups and execute the update statistics for each group simultaneously.
The value of the <number of sessions> parameter will depend on the number of
CPUs you have dedicated to the database. It is best to keep that number at 1/3rd
of the number of CPUs you have allocated to the database. You can experiment
with larger values, as the benefit of parallel I/O operations may still gain
improvements even on systems with fewer CPU resources.
With the PDQ settings properly set, this approach will provide for a better use of
the system resources when the update of the statistics is executing.
NOTE: Monitor the execution of the above script for the first few days and make
sure that the system is not being overloaded.
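The splitting behavior described above can be sketched in plain shell. This is an assumed round-robin split; the real script may group tables differently (for example, by table size):

```shell
#!/bin/sh
# Round-robin split of a table list into N groups; each group would then be
# fed to its own "update statistics" dbaccess session running in parallel.
# The table names here are made up for illustration.
N=3
i=0
g0=""; g1=""; g2=""
for t in customers orders flights crew baggage routes schedules; do
    case $((i % N)) in
        0) g0="$g0 $t" ;;
        1) g1="$g1 $t" ;;
        2) g2="$g2 $t" ;;
    esac
    i=$((i + 1))
done
echo "group 0:$g0"
echo "group 1:$g1"
echo "group 2:$g2"
```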
DBUPSPACE Parameter
The update statistics operation uses the DBUPSPACE parameter to determine
the amount of disk and memory it will use. The parameter is set as an
environment variable for the session that will execute the update statistics
statement.
The parameter value is composed of 2 numbers:
If this parameter is not set, the default values (as of IDS 10) are:
References
http://www.ibm.com/developerworks/db2/zones/informix/library/techarticle/miller/0203miller.html
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.sqls.doc/sqls887.htm#sii-02upstat-17587
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.docnotes.doc/uc6/ids_sqlr_docnotes_10.0.html#wq15
http://www.iiug.org/waiug/archive/iugnew2000Fall/How_to_PDQ.htm
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.perf.doc/perf333.htm#perf121003254
http://docs.rinet.ru/InforSmes/ch13/ch13.htm#Heading5
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.perf.doc/perf331.htm
BTREE Cleaner Optimization
During any database operation, when the table data is altered (deleted or
updated), all indexes created against the modified fields/columns also have to be
updated. In order to keep the wait time for the delete/update operations to a
minimum, database engines usually leave the old index data cleanup for a
separate cleaner module, that runs in parallel with the regular database operation.
This cleaner module in Informix is called the BTREE cleaner. The BTREE cleaner
runs as an internal database session, with a default priority set to 'low'. The low
priority means that it will not take resources away from other sessions in the
database.
In a normal operation, the BTREE cleaner activates every once in a while,
performs a few brief cleanup operations and falls back into idle mode. However,
in high load environments, an improper configuration can cause the cleaner to
significantly impact the database performance.
The primary cause for the performance impact is the fact that the cleaner session
has to load the index pages into memory in order to analyze and process them.
With configurations where the thresholds for the BTREE cleaner are set to low
values, and the database engine is not given much RAM (under 2GB), this
process can cause heavy I/O load and very slow response times.
With low memory resources, the database engine is forced to push out
cached/read-ahead data from memory, so it can make room for the BTREE
cleaner to load index pages, causing the application sessions to work with table
data directly from disk, and with very little in-memory cache.
There are several approaches in dealing with this condition, some of which are
immediate, others require planning and downtime to implement.
onmode -C disable
onmode -C stop 2   # (if there are 2 threads running; this is visible with onstat -C all)
onmode -C start 1  # (so there is only one thread from now on)
onmode -C low
onmode -C threshold 50000
onmode -C rangesize 10000
onmode -C enable
BTSCANNER num=1,priority=low,threshold=50000,rangesize=10000
As this option is only available in IDS version 10 and onward, IDS 9.4x
installations will have to use the method described in the
Informix_Tuning#Adjusting_a_Live_Database section, by putting those
commands in the startup script used to boot the database engine.
References
http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.adref.doc/adref50.htm
http://publib.boulder.ibm.com/infocenter/idshelp/v10/topic/com.ibm.adref.doc/adref320.htm#sii03a889486
http://www.informix-support.co.uk/btscanner.htm
http://publib.boulder.ibm.com/infocenter/idshelp/v111/index.jsp?topic=/com.ibm.perf.doc/perf381.htm
http://www.ibm.com/developerworks/data/library/techarticle/dm-0810duvuru/index.html
onmode -p +1 CPU
onmode -p -1 CPU
$ onstat -g iov
You can use the following command to see the AIO statistics for each database
chunk file:
$ onstat -g iof
IBM Informix Dynamic Server Version 10.00.UC9 -- On-Line (Prim) -- Up 123 days 11:05:58 -- 1631196 Kbytes
onmode -p +1 AIO
To decrease the number of AIO VPs on a running instance use the following
command:
onmode -p -1 AIO