11g Mediator - Diagnosing Resequencer Issues PDF


12/30/2014

11g Mediator Diagnosing Resequencer Issues

11g Mediator Diagnosing Resequencer Issues
July 25, 2014 by Shreenidhi Raghuram


In a previous blog post, we saw a few useful tips for quickly monitoring the health of resequencer
components in a SOA system at runtime. In this blog post, let us explore some tips for diagnosing Mediator
resequencer issues. Along the way, we will also learn some key points to consider for Integration
systems that run Mediator Resequencer composites.
Please refer to the Resequencer White paper for a review of the basic concepts of resequencing and
the interplay of various subsystems involved in the execution of Resequencer Mediator composites.

Context
In this blog post we will refer to the AIA Communications O2C Pre-Built Integration pack (aka O2C PIPs) as
an example for understanding some issues that can arise at runtime with resequencer systems and how
we can diagnose the cause of such issues. The O2C PIP uses resequencing-enabled flows. One such flow is
the UpdateSalesOrder flow between OSM and Siebel. It is used to process the OSM status of Sales
Orders in proper time sequence within the Siebel system.

Isolate the server within the soa cluster


Many times the resequencer health check queries point us to an issue occurring on only one server
within the SOA cluster. While the database queries mentioned here give us the containerId of the specific
server, they do not give the server name. This is because Mediator uses a GUID to track a runtime
server.
Trace log messages generated by the Mediator can help us correlate this GUID to an individual server
running in the cluster at runtime. The oracle.soa.mediator.dispatch runtime logger can be set to the
TRACE:32 level from the FMW EM console. Figure below shows the screenshot.


Enabling this logger for just a few minutes will suffice. Messages such as the one below then appear in
the diagnostic logs of the SOA servers once every lease refresh cycle, which by default is 60 seconds.
[APP: soa-infra] [SRC_METHOD: renewContainerIdLease] Renew container id [34DB0F60899911E39F24117FE503A156] at database time :2014-01-31 06:11:18.913

This implies that the server which logged the above message is running with a containerId of
34DB0F60899911E39F24117FE503A156.
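
The correlation described above can be scripted once the diagnostic logs of each server have been collected. The following is a minimal sketch in Python, assuming the trace line has the shape quoted above; the function names and the way logs are passed in (a dict of server name to log text) are illustrative assumptions, not part of any Oracle tooling.

```python
import re

# Matches the lease-renewal trace line quoted above. Only the quoted fragment
# is taken from the post; the rest of the log line format is an assumption.
LEASE_RE = re.compile(
    r"SRC_METHOD: renewContainerIdLease\]\s*Renew container id\s*"
    r"\[([0-9A-Fa-f]{32})\]"
)


def container_ids_in_log(text):
    """Return the distinct containerIds found in one server's log text,
    in order of first appearance."""
    seen = []
    for match in LEASE_RE.finditer(text):
        container_id = match.group(1).upper()
        if container_id not in seen:
            seen.append(container_id)
    return seen


def map_container_ids(logs_by_server):
    """Build a containerId -> server name map.

    logs_by_server is a dict of server name -> diagnostic log text
    (hypothetical input shape chosen for this sketch).
    """
    mapping = {}
    for server, text in logs_by_server.items():
        for container_id in container_ids_in_log(text):
            mapping[container_id] = server
    return mapping
```

The containerId reported by the health check queries can then be looked up in the returned map to find the server name.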

Locker Thread Analysis


When one observes excessive messages piling up with a status of GRP_STATUS=READY and
MSG_STATUS=READY, it usually indicates that the locker thread is not locking the groups fast enough
for processing the incoming messages. This could be due to the Resequencer Locker thread being stuck or
performing poorly. For instance, the locker thread could be stuck executing updates against the
MEDIATOR_GROUP_STATUS table.
It is generally useful to isolate the server which is creating the backlog using the health check queries
and then identify the server name using the logger trace statements as described in the previous section.
A few thread dumps of this server could then throw more light on the actual issue affecting the locker thread.
Usually thread dumps show a stack such as the one below for a resequencer Locker thread.
"Workmanager: , Version: 0, Scheduled=false, Started=false, Wait time: 0 ms
" id=330 idx=0x1b0 tid=28794 prio=10 alive, sleeping, native_waiting, daemon
at java/lang/Thread.sleep(J)V(Native Method)
at oracle/tip/mediator/common/listener/DBLocker.enqueueLockedMessages
at oracle/tip/mediator/common/listener/DBLocker.run(DBLocker.java:84)
at oracle/integration/platform/blocks/executor/WorkManagerExecutor$1.run(WorkManagerEx
at weblogic/work/j2ee/J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:184)
at weblogic/work/DaemonWorkThread.run(DaemonWorkThread.java:30)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)

In the above thread the Locker is enqueuing messages from locked groups into the in-memory queue for
processing by the worker threads.
During times of any issue, the Locker thread could be seen stuck doing database updates. If this is
seen across thread dumps with no progress made by the thread, then it could point to a database issue
which needs to be attended.
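
Comparing a thread's stack across several dumps can be automated. The sketch below, in Python, assumes JRockit-style dumps like the ones quoted in this post (a header line containing id=, idx= and tid=, followed by "at ..." frames). The function names and the must_contain filter are illustrative assumptions. Note that identical stacks are only a hint: an idle thread that is sleeping or parked also looks the same in every dump, so the filter should name a frame that indicates real work, such as a database or socket call.

```python
import re

# Thread header as seen in the JRockit dumps quoted in this post.
HEADER_RE = re.compile(r"\bid=\d+ idx=\S+ tid=(\d+)\b")


def parse_dump(text):
    """Map tid -> tuple of stack frames for one thread dump."""
    stacks = {}
    tid = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.search(line)
        if header:
            tid = header.group(1)
            stacks[tid] = []
        elif line.startswith("at ") and tid is not None:
            stacks[tid].append(line[3:])
        elif line.startswith("-- end of trace"):
            tid = None
    return {t: tuple(frames) for t, frames in stacks.items()}


def unchanged_threads(dumps, must_contain):
    """Return tids whose stack is identical in every dump and has at least
    one frame containing must_contain. Needs two or more dumps."""
    parsed = [parse_dump(d) for d in dumps]
    if len(parsed) < 2:
        return []
    result = []
    for tid, frames in parsed[0].items():
        if not any(must_contain in frame for frame in frames):
            continue
        if all(p.get(tid) == frames for p in parsed[1:]):
            result.append(tid)
    return result
```

For example, passing three dumps taken a minute apart with must_contain set to "DBLocker" lists the Locker thread only if its stack did not move between dumps.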
Poor performance of the locker query on the database side will adversely impact Resequencer
performance and hence decrease the throughput of the integration flows that use Resequencers.
Recall that the Locker thread runs an update query continuously, attempting to lock eligible groups.
Shown below is a sample FIFO Resequencer Locker query as seen in database AWR reports.

update mediator_group_status a set a.status=7 where id in ( select id from (select distinc
mediator_group_status b, mediator_resequencer_message c where b.id=c.owner_id and b.RESEQU
b.status=0 and b.CONTAINER_ID=:1 and c.status=0 and b.component_status!=:2 ORDER BY b.lock

The Database AWR reports can also be very useful to check the average Elapsed Time and other
performance indicators for the locker query.
Huge data volume caused by the lack of a proper purging strategy for Mediator tables is a common reason
for deteriorated Locker query performance. Regular data purging, partitioning, statistics gathering and
creation of required indexes on MEDIATOR_GROUP_STATUS will usually ensure good performance of
the locker query.
Note that there is only one Resequencer Locker thread running per server at runtime. Any database issue
that impacts the locker thread will impair all the Mediator Composites that use the same resequencing
strategy. The Mediator resequencer uses the database for storage and retrieval of messages to implement
the reordering and sequencing logic. Hence, proper and timely maintenance of the SOA database goes a
long way in ensuring good performance.

Worker Thread Analysis


Recall that Worker threads are responsible for processing messages in order. There are multiple
worker threads per server to process multiple groups in parallel, while ensuring that each group is
processed exclusively by only one worker thread to preserve the desired sequence. Hence, the number of
worker threads configured in Mediator properties (from the FMW EM console) is a key parameter for optimum
performance.
The sample snippets below, taken from server thread dumps, show Resequencer Worker threads. The first
stack shows a worker thread which is waiting for messages to arrive on the internal queue. As and when the
Locker thread locks new eligible groups, such available worker threads will process the messages belonging
to the locked groups.
Idle Worker Thread:
"Workmanager: , Version: 0, Scheduled=false, Started=false, Wait time: 0 ms
" id=208 idx=0x32c tid=26068 prio=10 alive, parked, native_blocked, daemon
at jrockit/vm/Locks.park0(J)V(Native Method)
at jrockit/vm/Locks.park(Locks.java:2230)
at jrockit/proxy/sun/misc/Unsafe.park(Unsafe.java:616)[inlined]
at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:196)[inlined]
at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Ab
at java/util/concurrent/LinkedBlockingQueue.poll(LinkedBlockingQueue.
at oracle/tip/mediator/common/listener/AbstractWorker.run(AbstractWor
at oracle/integration/platform/blocks/executor/WorkManagerExecutor$1.run(WorkManagerEx
at weblogic/work/j2ee/J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:184)
at weblogic/work/DaemonWorkThread.run(DaemonWorkThread.java:30)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace

The next partial stack shows a worker thread which is processing a message from a group that has been
locked by the Locker.
Busy Worker Thread:
.
at oracle/tip/mediator/service/BaseActionHandler.requestProcess(BaseActionHandler.java
at oracle/tip/mediator/service/OneWayActionHandler.process(OneWayActionHandler.java:47
at oracle/tip/mediator/service/ActionProcessor.onMessage(ActionProcessor.java:64)[opti
at oracle/tip/mediator/dispatch/MessageDispatcher.executeCase(MessageDispatcher.java:1
at oracle/tip/mediator/dispatch/InitialMessageDispatcher.processCase(InitialMessageDis
at oracle/tip/mediator/dispatch/InitialMessageDispatcher.processCases(InitialMessageDi
at oracle/tip/mediator/dispatch/InitialMessageDispatcher.processNormalCases(InitialMes
at oracle/tip/mediator/dispatch/resequencer/ResequencerMessageDispatcher.processCases(
at oracle/tip/mediator/dispatch/InitialMessageDispatcher.dispatch(InitialMessageDispat
at oracle/tip/mediator/dispatch/resequencer/ResequencerMessageHandler.handleMessage(Re
at oracle/tip/mediator/resequencer/ResequencerDBWorker.handleMessage(
at oracle/tip/mediator/resequencer/ResequencerDBWorker.process(ResequencerDBWorker.jav
at oracle/tip/mediator/common/listener/AbstractWorker.run(AbstractWorker.java:81)
at oracle/integration/platform/blocks/executor/WorkManagerExecutor$1.run(WorkManagerEx
at weblogic/work/j2ee/J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:184)
at weblogic/work/DaemonWorkThread.run(DaemonWorkThread.java:30)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace

It should be noted that all further processing of the message until the next transaction boundary
happens in the context of this worker thread. For example, the diagram below shows the O2C
UpdateSalesOrder Integration flow, from a threads perspective. Here, the BPEL ABCS processing, the calls
to AIA SessionPoolManager, as well as the Synchronous invoke to the Siebel Webservice, all happen in
the resequencer worker thread.

Now consider the example thread stack shown below, as seen in a server thread dump. It shows a worker
thread engaged in HTTP communication with an external system.
Stuck Worker Thread:
"Workmanager: , Version: 0, Scheduled=false, Started=false, Wait time: 0 ms
" id=299 idx=0x174 tid=72518 prio=10 alive, in native, daemon
at jrockit/net/SocketNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BIII)I(Native Met
at jrockit/net/SocketNativeIO.socketRead(SocketNativeIO.java:32)[inlined]
at java/net/SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I(SocketInputStr
at java/net/SocketInputStream.read(SocketInputStream.java:129)[optimize
at HTTPClient/BufferedInputStream.fillBuff(BufferedInputStream.java:206)
at HTTPClient/BufferedInputStream.read(BufferedInputStream.java:126)[optimized]
at HTTPClient/StreamDemultiplexor.read(StreamDemultiplexor.java:356)[optimized]
^-- Holding lock: HTTPClient/StreamDemultiplexor@0x1758a7ae0[recursive]
at HTTPClient/RespInputStream.read(RespInputStream.java:151)[optimized]
.
.
at oracle/tip/mediator/dispatch/resequencer/ResequencerMessageDispatcher.processCases(
at oracle/tip/mediator/dispatch/InitialMessageDispatcher.dispatch(InitialMessageDispatch
at oracle/tip/mediator/dispatch/resequencer/ResequencerMessageHandler.handleMessage(Rese
at oracle/tip/mediator/resequencer/ResequencerDBWorker.handleMessage(Re
at oracle/tip/mediator/resequencer/ResequencerDBWorker.process(ResequencerDBWorker.java:
at oracle/tip/mediator/common/listener/AbstractWorker.run(AbstractWorker.java:81)
at oracle/integration/platform/blocks/executor/WorkManagerExecutor$1.run(WorkManagerExec
at weblogic/work/j2ee/J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:184)
at weblogic/work/DaemonWorkThread.run(DaemonWorkThread.java:30)
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace

If this thread remains at the same position across thread dumps spanning a few minutes, it would indicate
that the worker thread is blocked on an external Webservice application. If such external system
issues block a significant number of worker threads from the pool of available worker threads, it will
impact the overall throughput of the system. There will be fewer workers available to process all the
groups that are being locked by the Locker thread, across all composites that use resequencers. When the
rate of incoming messages during such time is high, this issue will show up as a huge backlog of
messages with status GRP_STATUS=LOCKED and MSG_STATUS=READY in the resequencer health
check query.
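
The two backlog signatures discussed in this post (READY/READY pointing at the Locker thread, LOCKED/READY pointing at the worker threads) can be turned into a simple triage step over the output of the health check query. The following Python sketch assumes the query result has already been reduced to (container_id, grp_status, msg_status, count) tuples with the status labels used in this post; that row shape, the function name and the threshold value are hypothetical and should be adapted to the actual query.

```python
from collections import defaultdict

# Status combinations and their likely cause, as described in this post.
HINTS = {
    ("READY", "READY"): "locker thread not locking groups fast enough",
    ("LOCKED", "READY"): "worker threads blocked or too few",
}


def triage(rows, threshold=1000):
    """Return container_id -> list of hints for backlogs at or above threshold.

    rows: iterable of (container_id, grp_status, msg_status, count) tuples.
    threshold: backlog size considered excessive (an arbitrary default).
    """
    findings = defaultdict(list)
    for container_id, grp_status, msg_status, count in rows:
        hint = HINTS.get((grp_status, msg_status))
        if hint is not None and count >= threshold:
            findings[container_id].append(hint)
    return dict(findings)
```

A container flagged with the first hint is a candidate for the Locker thread analysis, and one flagged with the second for the worker thread analysis.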

Note that the JTA timeout will not abort these busy threads. Such threads may eventually return after the
JTA transaction has rolled back or, depending on how sockets are handled by the external system, may
not return at all.
For such integration flows, it is advisable to configure HTTP Connect and Read timeouts for Webservice
calls in the composite's Reference properties. Figure below shows a screenshot of the properties. This will
ensure that worker threads are not held up by external issues, which would otherwise affect the processing
of other components that rely on worker threads.
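
The effect of a read timeout on a blocked thread can be illustrated outside of SOA with a plain socket. The Python sketch below is only an analogy and not the composite configuration itself; the function name is hypothetical. It shows that a read with a timeout returns control to the caller when the peer stays silent, whereas a read without one would block indefinitely, as in the stuck worker stack above.

```python
import socket


def recv_with_timeout(conn, read_timeout, bufsize=4096):
    """Read once from a connected socket, giving up after read_timeout seconds.

    Returns the bytes received, or None if the peer sent nothing in time.
    """
    conn.settimeout(read_timeout)
    try:
        return conn.recv(bufsize)
    except socket.timeout:
        # The caller regains control instead of waiting on the peer forever.
        return None
```

A connect timeout works the same way at connection setup, for example through the timeout argument of socket.create_connection.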


Few more Loggers


The loggers below can be enabled for trace logging to gather diagnostic information on specific parts of
the Mediator/resequencer.
- Logger oracle.soa.mediator.dispatch for Initial message storage, Group Creation, Lease Renew, Node
failover
- Loggers oracle.soa.mediator.resequencer and oracle.soa.mediator.common.listener for Resequencer
Locker, Resequencer Worker, Load Balancer

Conclusion
We have explored how problems at various layers can manifest at the resequencer in an
Integration system and how the cause of these issues can be diagnosed.
We have seen:
- Useful pointers in diagnosing resequencer issues and where to look for relevant information
- How a good SOA database maintenance strategy is important for resequencer health
- How timeout considerations play a role in resequencer performance
