Problem
- Product Name: BES VisiBroker Edition
- Product Version: 5.2.1
- Product Component: Gatekeeper
- Platform/OS Version: All
During a client's ongoing CORBA call, the Gatekeeper in the middle is restarted. This causes the client to see a CORBA.TRANSIENT exception at the application level. Is this the correct behavior? Shouldn't the CORBA.TRANSIENT exception be caught and handled in the ORB layer instead of being allowed to reach the application level?
Resolution
The CORBA definition of the TRANSIENT exception is as follows:
"TRANSIENT indicates that the ORB attempted to reach an object and failed. It is not an indication that an object does not exist. Instead, it simply means that no further determination of an object's status was possible because it could not be reached. This exception is raised if an attempt to establish a connection fails, for example, because the server or the implementation repository is down."
By contrast, the CORBA.COMM_FAILURE exception is defined as:
"COMM_FAILURE exception is raised if communication is lost while an operation is in progress, after the request was sent by the client, but before the reply from the server has been returned to the client."
On the surface, it would seem that COMM_FAILURE is the correct exception to throw at the application level. However, a closer look at the internal workings of the VisiBroker ORB shows otherwise:
1. The existing connection to the Gatekeeper is severed, and COMM_FAILURE is thrown.
2. The VisiBroker ORB catches COMM_FAILURE and tries to reconnect.
3. The reconnect fails, and COMM_FAILURE is thrown again.
4. The VisiBroker ORB catches COMM_FAILURE and tries to rebind.
5. The rebind fails, and TRANSIENT is thrown. Nothing in the ORB catches it, so it surfaces at the application level.
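The sequence above can be modeled in a few lines of Java. This is only an illustrative sketch, not VisiBroker's implementation: the class RebindSequenceModel, the Link transport, and the CommFailure and Transient exception types are hypothetical stand-ins (for org.omg.CORBA.COMM_FAILURE and org.omg.CORBA.TRANSIENT) so that the example runs without an ORB on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of steps 1-5. Nothing here is VisiBroker code.
final class RebindSequenceModel {

    // Stand-in for org.omg.CORBA.COMM_FAILURE (assumption for illustration).
    static class CommFailure extends RuntimeException {
        CommFailure(String msg) { super(msg); }
    }

    // Stand-in for org.omg.CORBA.TRANSIENT (assumption for illustration).
    static class Transient extends RuntimeException {
        Transient(String msg) { super(msg); }
    }

    // Hypothetical transport to the Gatekeeper.
    static final class Link {
        boolean connectionSevered;          // existing connection was dropped
        final boolean gatekeeperReachable;  // false while the Gatekeeper is down
        final List<String> trace = new ArrayList<>();

        Link(boolean connectionSevered, boolean gatekeeperReachable) {
            this.connectionSevered = connectionSevered;
            this.gatekeeperReachable = gatekeeperReachable;
        }

        String send(String request) {
            trace.add("send");
            if (connectionSevered) throw new CommFailure("connection severed");
            return "reply:" + request;
        }

        void reconnect() {
            trace.add("reconnect");
            if (!gatekeeperReachable) throw new CommFailure("reconnect failed");
            connectionSevered = false;
        }

        void rebind() {
            trace.add("rebind");
            if (!gatekeeperReachable) throw new Transient("rebind failed");
            connectionSevered = false;
        }
    }

    // Models an invocation with transparent rebind enabled.
    static String invoke(Link link, String request) {
        try {
            return link.send(request);      // step 1: may throw CommFailure
        } catch (CommFailure first) {       // step 2: caught inside the "ORB"
            try {
                link.reconnect();           // step 3: may throw CommFailure
            } catch (CommFailure second) {  // step 4: caught inside the "ORB"
                link.rebind();              // step 5: Transient is not caught here
            }
            return link.send(request);      // reached only if recovery succeeded
        }
    }

    public static void main(String[] args) {
        Link link = new Link(true, false);  // severed, Gatekeeper still down
        try {
            invoke(link, "ping");
        } catch (Transient e) {
            System.out.println("application sees: " + e.getMessage()
                    + " after " + link.trace);
        }
    }
}
```

In this model the two CommFailure exceptions never leave invoke; only the final Transient does, which mirrors why the application observes TRANSIENT rather than COMM_FAILURE.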
There are a couple of points to consider here:
- Right after the Gatekeeper is restarted in the middle of the client's request, the ORB throws a series of COMM_FAILURE exceptions. This is consistent with the spec.
- However, the ORB catches these COMM_FAILURE exceptions and retries, because the default Quality of Service (QoS) policy is to rebind transparently. (If a different QoS policy, such as no_rebind or no_reconnect, is used, the COMM_FAILURE exception is thrown at the application level.)
- Since the ORB retries by calling bind again (in the ORB layer), the retry is no longer considered part of the previous call, but a new attempt to reach the server object. If that attempt fails and there is no authorized entity (such as the osagent) to confirm the non-existence of the server object (in this case, the Gatekeeper), the client ORB must throw CORBA.TRANSIENT per the CORBA spec. If such an entity is available and confirms that the Gatekeeper does not exist, the client ORB throws OBJECT_NOT_EXIST instead.
- TRANSIENT is thrown so that the client is made aware of the problem in reaching the remote object. The problem could be due to a firewall, a network glitch, or something else besides the remote object's absence. Since the ORB cannot determine the exact cause of the problem, CORBA specifies that TRANSIENT be thrown.
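Because TRANSIENT reaches the application by design, one common way to handle it is to retry the call a bounded number of times. The following is a minimal sketch of that idea, not a VisiBroker API: TransientRetry, callWithRetry, and SimulatedTransient are hypothetical names. In a real client the predicate would typically be e -> e instanceof org.omg.CORBA.TRANSIENT. Retrying is generally safe only for operations that can be repeated without side effects.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical application-level helper. Not part of VisiBroker or CORBA.
final class TransientRetry {

    // Stand-in for org.omg.CORBA.TRANSIENT so the sketch runs without an ORB.
    static class SimulatedTransient extends RuntimeException {
    }

    /**
     * Runs the call, retrying with a doubling delay while the failure is
     * classified as transient and attempts remain. Any other exception, or
     * the last transient one, is rethrown to the caller.
     */
    static <T> T callWithRetry(Callable<T> call,
                               Predicate<Exception> isTransient,
                               int maxAttempts,
                               long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isTransient.test(e) || attempt >= maxAttempts) {
                    throw e;  // not retryable, or out of attempts
                }
                Thread.sleep(delay);  // give the Gatekeeper time to come back
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated remote call: fails twice, then succeeds.
        String reply = callWithRetry(
                () -> {
                    if (++calls[0] < 3) throw new SimulatedTransient();
                    return "reply";
                },
                e -> e instanceof SimulatedTransient,
                5,
                10L);
        System.out.println(reply + " after " + calls[0] + " attempts");
    }
}
```

The attempt limit and delay values here are arbitrary; suitable values depend on how long the Gatekeeper normally takes to restart.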
Support Case: 557853
Old KB# 26462
#TRANSIENT
#Security
#COMM_FAILURE
#gatekeeper
#VisiBroker




