Tune and Troubleshoot Oracle Data Guard (Part 4 of 8)
Alireza Kamrani
04/15/2025
Note: Each instance of the primary database generates its own redo and ships it to the standby database in a single network stream. Therefore, maximizing single-process network throughput for each node is critical for redo transport.
Historically, the following areas can reduce network and redo transport throughput, resulting in potential transport lags:
. Network firewalls or network encryption
Network firewalls and network (not Oracle Net) encryption can reduce overall throughput significantly. Verify throughput with the oratcptest tool (described below), with and without encryption, and tune accordingly.
At times, reducing the encryption level can increase throughput significantly. A balance is required between security needs and your performance and data loss requirements.
. Redo transport compression
When the LOG_ARCHIVE_DEST_n initialization parameter is set with the COMPRESSION=ENABLE attribute, Oracle background processes must compress the redo before sending each network message and uncompress the redo before processing it. This reduces overall redo and network throughput. Compression is recommended only when network bandwidth between the primary and standby destinations is insufficient.
. Oracle Net encryption
Depending on the Oracle Net encryption level, the impact on redo throughput varies, because Oracle Net messages containing redo must be encrypted before sending and decrypted before redo processing.
Note that if database encryption is already enabled with Transparent Data Encryption (TDE),
redo is already encrypted, although Oracle Net encryption can also encrypt the message
headers.
. Untuned network for redo transport
○ Increasing maximum operating system socket buffer size can increase single process
throughput by 2-8 times. Test with different socket buffer sizes to see what value
yields positive results, and ensure throughput is greater than the peak redo
throughput.
○ Compare performance with various MTU settings.
If the average redo write size is less than 1500 bytes, then try various MTU settings, including MTU=9000 (for example, Jumbo Frames), for the network interfaces that send or receive redo on your system. This may eliminate some unnecessary network round trips, which increases overall throughput.
Also note that for SYNC transport, Oracle's average redo write size (that is, the Oracle message send size) increases significantly, as determined from V$SYSSTAT or AWR statistics using "redo size / redo writes" (see the query sketched after this list).
When sending redo across geographical regions, experiments have shown that MTU=9000 can also be beneficial in some network topologies. Conduct performance tests with oratcptest and compare the results with the default MTU and MTU=9000 settings.
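As a rough sketch of that calculation (the statistic names are as they appear in V$SYSSTAT; the ratio approximates the average redo write size since instance startup):
SELECT ROUND((SELECT value FROM v$sysstat WHERE name = 'redo size') /
             (SELECT value FROM v$sysstat WHERE name = 'redo writes')) AS avg_redo_write_bytes
  FROM dual;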
Understanding Throughput Requirements and Average Redo Write Size for Redo
Transport
The required network bandwidth of a given Data Guard configuration is determined by the redo generation rate of the primary database.
Note: If the primary database already exists, a baseline for the required network bandwidth can be established. If there is no existing primary database, skip this step and any later references to this data.
While the Automatic Workload Repository (AWR) tool can be used to determine the redo generation rate, the snapshots are often 30 or 60 minutes apart, which can dilute the peak rate. Since peak rates often occur for shorter periods of time, it is more accurate to use a query like the following, which calculates the redo generation rate for each log when run on an existing database (change the timestamps as appropriate).
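A sketch of such a query against V$ARCHIVED_LOG follows (the FIRST_TIME window and DEST_ID are placeholders to adjust for your environment):
SELECT thread#, sequence#,
       blocks*block_size/1024/1024 AS mb,
       (next_time - first_time)*86400 AS sec,
       (blocks*block_size/1024/1024) / ((next_time - first_time)*86400) AS "MB/s"
  FROM v$archived_log
 WHERE (next_time - first_time)*86400 <> 0
   AND first_time BETWEEN TO_DATE('2025/04/01 08:00','YYYY/MM/DD HH24:MI')
                      AND TO_DATE('2025/04/01 11:00','YYYY/MM/DD HH24:MI')
   AND dest_id = 1
 ORDER BY first_time;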
Example output:
Note: To find the peak redo rate, choose times during the highest level of processing, such as peak OLTP periods, end-of-quarter batch processing, or end-of-year batch processing.
In this short example the highest rate was about 52 MB/s. Ideally the network will support the maximum rate plus 30%, or 68 MB/s, for this application.
Note: This tool, like any Oracle network streaming transport, can simulate efficient network packet
transfers from the source host to target host similar to Data Guard transport. Throughput can
saturate the available network bandwidth between source and target servers. Therefore, Oracle
recommends that short duration tests are performed and that consideration is given for any other
critical applications sharing the same network.
Measure the Existing Throughput of One and Many Processes
Do the following tasks to measure the existing throughput.
Task 1: Install oratcptest
. Download the oratcptest.jar file from MOS note 2064368.1.
. Copy the JAR file onto both the client (primary) and the server (standby) hosts.
. On all primary and standby hosts, verify that the JVM can run the JAR file by displaying the help, as sketched below.
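For example (the exact usage text printed varies by tool version):
$ java -jar oratcptest.jar -help
With the JAR verified, start the test server process on the standby (receiving) host: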
java -jar oratcptest.jar -server [IP of standby host or VIP in RAC configurations] -port=<any
available port number>
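Then, from the primary host, run the client against that server. An invocation along the following lines (host, port, duration, and interval are placeholders here; see MOS note 2064368.1 for the full option list) produces a request summary like the one below:
$ java -jar oratcptest.jar <standby IP or VIP> -port=<port number> -mode=async -duration=120s -interval=20s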
[Requesting a test]
Message payload = 1 Mbyte
Payload content type = RANDOM
Delay between messages = NO
Number of connections = 1
Socket send buffer = (system default)
Transport mode = ASYNC
Disk write = NO
Statistics interval = 20 seconds
Test duration = 2 minutes
Test frequency = NO
Network Timeout = NO
(1 Mbyte = 1024x1024 bytes)
In this example the average throughput between these two nodes was about 13 MB/s, which does not meet the requirement of 68 MB/s derived from the query above.
Note: This process can be scheduled to run at a given frequency using the -freq option to determine whether the bandwidth varies at different times of the day. For instance, setting -freq=1h/24h repeats the test every hour for 24 hours.
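For example (host and port are placeholders):
$ java -jar oratcptest.jar <standby IP or VIP> -port=<port number> -mode=async -duration=120s -interval=20s -freq=1h/24h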
. Repeat the previous test with two (2) connections (using the -num_conn parameter), as sketched below.
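A sketch of that invocation (placeholders as before):
$ java -jar oratcptest.jar <standby IP or VIP> -port=<port number> -mode=async -duration=120s -interval=20s -num_conn=2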
[Requesting a test]
Message payload = 1 Mbyte
Payload content type = RANDOM
Delay between messages = NO
Number of connections = 2
Socket send buffer = (system default)
Transport mode = ASYNC
Disk write = NO
Statistics interval = 20 seconds
Test duration = 2 minutes
Test frequency = NO
Network Timeout = NO
(1 Mbyte = 1024x1024 bytes)
. Re-run the test iteratively, increasing the value of num_conn by two each time, until the aggregate throughput does not increase for three consecutive values. For example, if the aggregate throughput is approximately the same for 10, 12, and 14 connections, stop.
Note: RMAN can utilize all nodes in the cluster for instantiation when the 'create standby from service' feature is used, so the total aggregate throughput across all nodes is what matters.
. Run the same test with all nodes in all clusters to find the current total aggregate throughput.
Node 1 of primary to node 1 of standby, node 2 to node 2, etc. Sum the throughput found for
all nodes.
. Reverse the roles and repeat the tests.
. Note the number of connections which achieved the best aggregate throughput.
Use the total size of the database and total aggregate throughput to estimate the amount of time it
will take to complete the copy of the database. A full instantiation also needs to apply the redo
generated during the copy. Some additional percentage (0%-50%) should be added to this
estimated time based on how active the database is.
If the estimated time meets the goal, no additional tuning is required for instantiation.
If it does not, the operating system socket buffer sizes may need to be increased. First find the current size of the kernel parameters net.ipv4.tcp_rmem and net.ipv4.tcp_wmem. The values returned are the minimum, default, and maximum sizes for the socket buffers which TCP dynamically allocates. If a process requires more than the default given when a socket is created, more buffers are dynamically allocated up to the maximum value.
# cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 6291456
# cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304
Note: Increasing these values can increase system memory usage of any network socket on the system.
Note: Changes made with sysctl are not permanent. Update the /etc/sysctl.conf file to persist these changes through machine restarts. There will be a step to change the configuration file at the end of this process, once the proper setting is determined.
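As a sketch (the 16 MB maximum shown is a placeholder value to test with, not a recommendation), the maximum socket buffer sizes can be raised temporarily with sysctl and persisted once the proper value is found:
# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
# sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
# (after testing, add the same two settings to /etc/sysctl.conf and run sysctl -p to reload)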
Server (standby):
$ java -jar oratcptest.jar -server [IP of standby host or VIP in RAC configurations]
-port=<port number>
Client (primary):
Note: Do not use the oratcptest sockbuf parameter, because the kernel parameters which govern explicit requests for socket buffer size are different from those set for this test.
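With the larger kernel defaults in place, the client is run as before; a sketch of the invocation (per the note above, no sockbuf option is passed; host, port, and timings are placeholders):
$ java -jar oratcptest.jar <standby IP or VIP> -port=<port number> -mode=async -duration=120s -interval=20s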
After the test completes the results from the client and server show the value for socket buffers
during that test. At the time of this writing, that value is half of the actual socket buffer size and
should be doubled to find the actual size used.
[Requesting a test]
Message payload = 1 Mbyte
Payload content type = RANDOM
Delay between messages = NO
Number of connections = 1
Socket send buffer = 2 Mbytes
Transport mode = ASYNC
Disk write = NO
Statistics interval = 20 seconds
Test duration = 2 minutes
Test frequency = NO
Network Timeout = NO
(1 Mbyte = 1024x1024 bytes)
(11:39:16) The server is ready.
Throughput
(11:39:36) 71.322 Mbytes/s
(11:39:56) 71.376 Mbytes/s
(11:40:16) 72.104 Mbytes/s
(11:40:36) 79.332 Mbytes/s
(11:40:56) 76.426 Mbytes/s
(11:41:16) 68.713 Mbytes/s
(11:41:16) Test finished.
Server:
Note: oratcptest is reporting half of the buffers allocated to the socket. Double the number reported for the actual socket buffer size used during the test.
Client:
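A run like the following sketch (placeholders as before) corresponds to the 10-connection output below:
$ java -jar oratcptest.jar <standby IP or VIP> -port=<port number> -mode=async -duration=120s -interval=20s -num_conn=10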
[Requesting a test]
Message payload = 1 Mbyte
Payload content type = RANDOM
Delay between messages = NO
Number of connections = 10
Socket send buffer = (system default)
Transport mode = ASYNC
Disk write = NO
Statistics interval = 20 seconds
Test duration = 2 minutes
Test frequency = NO
Network Timeout = NO
(1 Mbyte = 1024x1024 bytes)
Note: oratcptest is reporting half of the buffers allocated to the socket. Double the number reported for the actual socket buffer size used during the test.
Server (each connection's receive buffer is printed; double the socket buffer size in each instance)
If a larger MTU (for example, MTU=9000) is being considered, repeat the same oratcptest performance methodology as described above with the higher MTU size to see if greater throughput is achieved.
If performance gains are noticed, work with the system and network engineers to change the MTU size used for Data Guard transport for both the primary and standby databases.
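As a sketch (the interface name eth0 is a placeholder; the MTU must be changed consistently on both hosts and on every network device in the path):
# ip link show eth0                # check the current MTU of the interface that carries redo
# ip link set eth0 mtu 9000        # temporary change; persist it in the interface configuration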
Redo Transport
If the single-process network throughput does not exceed the redo generation rate of a single primary database instance, the standby will not stay current with the primary during those periods. Further evaluation and network tuning by the network engineering team may be required in these cases.
Instantiation
Once the maximum aggregate throughput of all nodes is understood, a rough estimate for
instantiation can be developed. As an example, if there is a 100 TB database on a 2-node RAC to be
instantiated and each node can achieve 300 MB/s it should take about 50 hours to copy the data
files. Additional work to instantiate will add some percentage to that number (~30%).
300 MB/s * 60 seconds/minute * 60 minutes/hour * 2 nodes = ~2 TB/hr aggregate for both nodes
100TB / 2TB/hr = ~50 hours
Maximum Availability mode guarantees that no data loss will occur in cases where the primary
database experiences the first failure to impact the configuration. Unlike the Maximum Protection
mode, Maximum Availability will wait a maximum of NET_TIMEOUT seconds for an acknowledgment
from any of the standby databases, after which it will signal commit success to the application and
move to the next transaction. Primary database availability (thus the name of the mode) is not
impacted by an inability to communicate with the standby (for example, due to standby or network
outages). Data Guard will continue to ping the standby and automatically re-establish connection
and resynchronize the standby database when possible, but during the period when primary and
standby have diverged there will be data loss should a second failure impact the primary database.
For this reason, it is a best practice to monitor protection level, which is simplest using Enterprise
Manager Grid Control, and quickly resolve any disruption in communication between the primary
and standby before a second failure can occur. This is the most common zero data loss database
protection mode.
Choose this protection mode if zero data loss is very important but you want the primary database to continue to be available even in the unlikely case that all standby databases are unreachable.
You can complement this solution by integrating multiple standby databases or using Far Sync
instances to implement a zero data loss standby solution across a WAN. Workload impact analysis is
recommended to measure whether any overhead is acceptable when enabling SYNC transport.
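As a sketch of the corresponding primary configuration (the service name, destination number, and DB_UNIQUE_NAME are placeholders; in a Data Guard broker configuration the equivalent properties are set through DGMGRL instead):
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=<standby_service> SYNC AFFIRM NET_TIMEOUT=30 DB_UNIQUE_NAME=<standby_unique_name> VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;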
Maximum Performance mode is the default Data Guard mode, and it provides the highest level of
data protection that is possible without affecting the performance or the availability of the primary
database. This is accomplished by allowing a transaction to commit as soon as the redo data
needed to recover that transaction is written to the local online redo log at the primary database
(the same behavior as if there were no standby database). Data Guard transmits redo to the standby database directly from the primary log buffer, asynchronously with respect to the local online redo log write, enabling very low potential data loss if the primary site is lost. There is never any wait for standby acknowledgment, but the potential data loss for this data protection mode can still be near zero.
Similar to Maximum Availability mode, it is a best practice to monitor the protection level using
Enterprise Manager Grid Control, and quickly resolve any disruption in communication between
primary and standby before a second failure can occur.
Choose this mode if minimal data loss is acceptable and zero performance impact on the primary is required.
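As a sketch of the corresponding ASYNC configuration (same placeholders as above):
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=<standby_service> ASYNC NOAFFIRM DB_UNIQUE_NAME=<standby_unique_name> VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;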