Problem:
Behavior of osagent when a server process is unreachable
Resolution:
Product Name: VisiBroker for Java and C++
Product Version: All
Platform/OS Version: All
Problem Description
What is the behavior of osagent when a server process is unreachable?
In a normal scenario using osagent, a server process registers its POA or object with osagent at startup. POA registration is done with BY_POA and object registration with BY_INSTANCE, both selected through BIND_SUPPORT_POLICY_TYPE. When the server is stopped cleanly, it unregisters the POA or object from osagent.
However, when the server is terminated unexpectedly (e.g. by SIGSEGV or SIGABRT) or the network is disconnected, the CORBA server cannot unregister itself, and osagent keeps its registration.
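The registration lifecycle described above can be illustrated with a small in-memory registry. This is a conceptual sketch only and not osagent's implementation. The `AgentRegistry` class, `run_server` helper, and their method names are hypothetical. The service name and PIDs are taken from the sample logs in this article. The sketch shows why a crashed server leaves a stale entry behind: nothing ever calls `unregister()` for it.

```python
"""Conceptual model of osagent registrations (hypothetical, not the real osagent)."""


class AgentRegistry:
    """Maps a service name (e.g. a POA name) to the PID of the registering process."""

    def __init__(self):
        self.services = {}

    def register(self, service, pid):
        # Done by the server at startup (BY_POA or BY_INSTANCE registration).
        self.services[service] = pid

    def unregister(self, service, pid):
        # Done by the server on a clean shutdown; ignored if another PID owns the entry.
        if self.services.get(service) == pid:
            del self.services[service]

    def lookup(self, service):
        # Returns the registered PID or None; the registry cannot tell if it is alive.
        return self.services.get(service)


def run_server(registry, service, pid, crash=False):
    """Simulate a server lifetime: register, then either exit cleanly or crash."""
    registry.register(service, pid)
    if crash:
        # SIGSEGV/SIGABRT or a network cut: the unregister step never runs.
        return
    registry.unregister(service, pid)


registry = AgentRegistry()
run_server(registry, "*_/bank_agent_poa", 17754)             # clean stop
print(registry.lookup("*_/bank_agent_poa"))                   # prints None
run_server(registry, "*_/bank_agent_poa", 17789, crash=True)  # abnormal termination
print(registry.lookup("*_/bank_agent_poa"))                   # prints 17789 (stale entry)
```

The two abnormal scenarios below describe how the real osagent eventually detects and removes such a stale entry.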
This article is based on the following assumptions:
- osagent (the Smart Agent) is used, and full osagent debug logging is enabled.
- osagent load balancing and fail-over are not used.
Resolution:
The behavior of osagent when a server process is unreachable can be broken down into the two abnormal scenarios below.
In the normal scenario, the osagent debug log shows the following entry just before the server exits, confirming that the unregistration succeeded:
==>> Thu Jul 9 05:43:14 2009, dsaclnt.C, 0, Inf
unregisterSvc() Received from client at
Host : abc.net
User : techsup
PID : 17754
CAddr : aaa.bbb.ccc.ddd
VPort : 40876
CPort : 40877
Unregistering the following service:
*_/bank_agent_poa
at location address port
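To find such entries in a long osagent debug log, a small parser can split the log into entries at each `==>>` header. This sketch assumes the layout shown in this article. Each entry starts with a header line carrying a timestamp, source file, a number, and a level. The next line starts with the function name, and `Key : value` fields follow. This layout is inferred from the samples, not from a documented format, so adjust the patterns to your own logs. The function `parse_osagent_log` is hypothetical.

```python
import re

# Header such as: "==>> Thu Jul 9 05:43:14 2009, dsaclnt.C, 0, Inf" (assumed layout)
HEADER = re.compile(r"^==>> (?P<time>.+?), (?P<source>\S+), (?P<num>\d+), (?P<level>\w+)$")
# Field such as: "PID : 17754"
FIELD = re.compile(r"^(?P<key>\w+)\s*:\s*(?P<value>.*)$")


def parse_osagent_log(text):
    """Split osagent debug output into a list of entry dicts."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = {**header.groupdict(), "function": None, "fields": {}, "other": []}
            entries.append(current)
            continue
        if current is None:
            continue  # ignore text before the first header
        if current["function"] is None:
            # First line after the header starts with e.g. "unregisterSvc()".
            current["function"] = line.split("(", 1)[0]
            current["other"].append(line)
            continue
        field = FIELD.match(line)
        if field:
            current["fields"][field.group("key")] = field.group("value")
        else:
            # Free text, e.g. service names such as "*_/bank_agent_poa".
            current["other"].append(line)
    return entries


sample = """==>> Thu Jul 9 05:43:14 2009, dsaclnt.C, 0, Inf
unregisterSvc() Received from client at
Host : abc.net
PID : 17754
Unregistering the following service:
*_/bank_agent_poa
"""
entry = parse_osagent_log(sample)[0]
print(entry["function"], entry["fields"]["PID"], entry["other"][-1])
# prints: unregisterSvc 17754 *_/bank_agent_poa
```

Filtering the parsed entries on `function` (for example `unregisterSvc`, `markSuspect`, or `isDown`) then shows which of the scenarios below occurred.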
Abnormal scenario #1: The heartbeat threshold for the server is reached
When the server is terminated unexpectedly (e.g. by SIGSEGV or SIGABRT) or the network is disconnected, no unregistration entry appears in the osagent debug log. osagent continues to send heartbeats to the server. Once the threshold* is reached without a response, the osagent debug log shows the following:
==>> Thu Jul 9 06:51:09 2009, dsverify.C, 0, Dbg
timeoutOccured() Peer is not responding within time period
==>> Thu Jul 9 06:51:09 2009, dsaclnt.C, 0, Inf
isDown() The following client is no longer available
Host : abc.net
User : techsup
PID : 17789
CAddr : aaa.bbb.ccc.ddd
VPort : 41743
CPort : 41744
:
==>> Thu Jul 9 06:51:09 2009, dsaclnt.C, 0, Inf
DSAClient() Deleting the client object for
Host : abc.net
User : techsup
PID : 17789
CAddr : aaa.bbb.ccc.ddd
VPort : 41743
CPort : 41744
* Refer to the vbroker.agent.keepAliveTimer and vbroker.agent.keepAliveThreshold properties for details on how the heartbeat threshold is configured.
Note that osagent treats every process that registers with it as a "client", as the debug logs above show. This usage should not be confused with the client and server processes from the ORB's perspective.
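The threshold idea can be illustrated with a simple counter of consecutive unanswered heartbeats. This is a hypothetical model, not osagent's algorithm. Treating `interval` and `threshold` as stand-ins for vbroker.agent.keepAliveTimer and vbroker.agent.keepAliveThreshold is an assumption, so check the VisiBroker documentation for the exact semantics and defaults. The numbers below are example values only.

```python
"""Hypothetical model of heartbeat-based failure detection (not osagent source)."""


def detect_down(replies, threshold):
    """Return the index of the heartbeat at which the peer is declared down, or None.

    replies   -- sequence of booleans, True if the peer answered that heartbeat
    threshold -- consecutive unanswered heartbeats that mark the peer as down
    """
    missed = 0
    for i, answered in enumerate(replies):
        # Any answer resets the count; only consecutive misses matter.
        missed = 0 if answered else missed + 1
        if missed >= threshold:
            return i
    return None


def worst_case_detection_seconds(interval, threshold):
    """Approximate upper bound on the delay from a crash to the isDown() decision.

    A crash just after a successful heartbeat goes unnoticed for `threshold`
    further intervals (ignoring any per-heartbeat reply timeout).
    """
    return interval * threshold


# Server answers twice, then crashes: with threshold 3 it is declared down at index 4.
print(detect_down([True, True, False, False, False], threshold=3))  # prints 4
print(worst_case_detection_seconds(interval=30, threshold=3))       # prints 90
```

Counting only consecutive misses means a single lost packet (an intermittent network problem) does not remove a healthy server.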
Abnormal scenario #2: Client process tries to connect to the unreachable server process
When a client starts up, it asks osagent for the IOR of the CORBA object. Because osagent is not yet aware that the server is no longer reachable, it still provides the generated IOR to the requesting client:
==>> Thu Jul 9 07:15:03 2009, dsaclnt.C, 0, Inf
getProvider() Received from client at
Host : someclient.net
User : somedomain\\someone
PID : 5004
CAddr : aaa.bbb.xxx.yyy
VPort : 2365
CPort : 2366
Requesting for the following service:
*_/bank_agent_poa
==>> Thu Jul 9 07:15:03 2009, dsaclnt.C, 0, Dbg
getProvider() Service type , Argument type .
==>> Thu Jul 9 07:15:03 2009, dsaclnt.C, 0, Inf
getProvider() Replying that service is located at,
Address : aaa.bbb.ccc.ddd
Port : 33803
When the client connects to the server using the provided IOR, it receives a CORBA::OBJECT_NOT_EXIST exception and informs osagent that the service provider is unreachable, so that osagent marks it as suspect:
==>> Thu Jul 9 07:15:05 2009, dsahdlr.C, 0, Inf
inputReady() Got a from Client .
==>> Thu Jul 9 07:15:05 2009, dsahdlr.C, 0, Inf
markSuspect() Suspecting service provider is down; registered at agent with the following service:
*_/bank_agent_poa
==>> Thu Jul 9 07:15:05 2009, dsaclnt.C, 0, Inf
verifyExistence() Checking the client object for
Host : abc.net
User : techsup
PID : 17799
CAddr : aaa.bbb.ccc.ddd
VPort : 42141
CPort : 42142
==>> Thu Jul 9 07:15:05 2009, dsverify.C, 0, Inf
verify() Sent a to Client at host port .
markSuspect() is invoked when a previously registered server appears to have gone down but this is not yet certain, for example because of an intermittent connection or network congestion. osagent then verifies the server's existence by sending it an "AreYouAlive" packet. If no reply arrives before the timeout, osagent removes the registration.
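The markSuspect() / verify() / isDown() sequence can be modeled as a small state machine. This is an illustrative model only, not osagent's code. The `State` names, the `SuspectTracker` class, and the single-probe timeout are hypothetical. The real agent's timeout and retry behavior come from its own configuration.

```python
"""Hypothetical model of osagent's suspect-and-verify handling (not the real agent)."""
from enum import Enum


class State(Enum):
    REGISTERED = "registered"
    SUSPECT = "suspect"
    REMOVED = "removed"


class SuspectTracker:
    def __init__(self, timeout):
        self.timeout = timeout  # seconds to wait for an "AreYouAlive" reply (assumed)
        self.state = State.REGISTERED
        self.probe_sent_at = None

    def mark_suspect(self, now):
        # A client reported a failure: do not remove yet, send a probe instead.
        if self.state is State.REGISTERED:
            self.state = State.SUSPECT
            self.probe_sent_at = now  # corresponds to verify() sending the probe

    def on_reply(self):
        # The server answered the probe, so the failure was transient.
        if self.state is State.SUSPECT:
            self.state = State.REGISTERED
            self.probe_sent_at = None

    def tick(self, now):
        # Called periodically; removes the registration once the probe has timed out.
        if self.state is State.SUSPECT and now - self.probe_sent_at >= self.timeout:
            self.state = State.REMOVED
        return self.state


tracker = SuspectTracker(timeout=30)
tracker.mark_suspect(now=0)
print(tracker.tick(now=10))  # prints State.SUSPECT
print(tracker.tick(now=30))  # prints State.REMOVED
```

The intermediate SUSPECT state is what keeps a single client's report from removing a server that is merely slow or briefly unreachable.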
==>> Thu Jul 9 07:15:57 2009, dsverify.C, 0, Dbg
timeoutOccured() Peer is not responding within time period
==>> Thu Jul 9 07:15:57 2009, dsaclnt.C, 0, Inf
isDown() The following client is no longer available
Host : abc.net
User : techsup
PID : 17799
CAddr : aaa.bbb.ccc.ddd
VPort : 42141
CPort : 42142
:
==>> Thu Jul 9 07:15:57 2009, dsaclnt.C, 0, Inf
DSAClient() Deleting the client object for
Host : abc.net
User : techsup
PID : 17799
CAddr : aaa.bbb.ccc.ddd
VPort : 42141
CPort : 42142
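Comparing header timestamps shows how long the stale registration survived after the first client reported the failure. In the sample above, markSuspect() is logged at 07:15:05 and isDown() at 07:15:57, a gap of 52 seconds. The helper below is a hypothetical convenience function that computes this gap. It assumes the `Thu Jul 9 07:15:05 2009` timestamp layout shown in this article and an English (C) locale for day and month names.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # assumed from the sample headers above


def seconds_between(start_header, end_header):
    """Seconds between two osagent '==>>' header lines (assumed format)."""
    def parse(header):
        # "==>> Thu Jul 9 07:15:05 2009, dsahdlr.C, 0, Inf" -> "Thu Jul 9 07:15:05 2009"
        stamp = header.split("==>>", 1)[-1].split(",", 1)[0].strip()
        return datetime.strptime(stamp, LOG_TIME_FORMAT)

    return (parse(end_header) - parse(start_header)).total_seconds()


suspect = "==>> Thu Jul 9 07:15:05 2009, dsahdlr.C, 0, Inf"
down = "==>> Thu Jul 9 07:15:57 2009, dsaclnt.C, 0, Inf"
print(seconds_between(suspect, down))  # prints 52.0
```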