
Solution to the naming service removing implicitly clustered objects erroneously.

  • February 16, 2013

Problem:

  • Product Name: Visibroker
  • Product Version: 6.0 and later
  • Product Component: Naming Service
  • Platform/OS Version: NA

During network instability, a client can fail to verify a server and erroneously signal to the Naming Service that a CORBA server is stale (that is, no longer running). If another client then asks the Naming Service for a reference to the same server, whose object reference is now marked stale, it gets an "object does not exist" error, because the Naming Service does not return object references that are marked stale.

Resolution:

The Naming Service in implicit clustering mode can erroneously mark an active object reference as stale when a transient network problem makes the object unreachable for some time. This article describes a way to address that issue.

VBE6.5 and later versions (such as VBE7.0) provide a setting that makes the Naming Service (in implicit cluster mode) return object references that are marked stale, along with the active references, to the client.

To enable this feature, set "vbroker.naming.smrr.pruneStaleRef=2". With this value, stale object reference bindings under the cluster are not eliminated, and the Naming Service also returns a stale object reference upon a resolve() or select() call if such an object binding exists. This feature is not available in VBE6.0.

For VBE6.0, first set vbroker.naming.smrr.pruneStaleRef=0 on the Naming Service side. This prevents the Naming Service from unbinding an object from the cluster even if the object's reference has been marked stale by some client. Then, as a workaround, run a separate program at regular intervals (for example, from a cron job) that retrieves both the stale and the active object references in the naming cluster and verifies that each server is still running. If a stale reference's server is found to be running, the reference is marked active again. If an active reference's server is found not to be running, the reference is marked stale.
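The reconciliation step of the workaround above can be sketched as follows. This is only a minimal illustration of the logic, not VisiBroker code: `Binding`, `reconcile`, and the `is_running` callback are hypothetical names, and the actual calls that fetch the cluster bindings, ping a server, and re-mark a reference are left out because the article does not name them. The `attempts` parameter is an addition of this sketch, on the assumption that the checker itself may hit the same transient network failures.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Binding:
    """Hypothetical stand-in for one object binding in the naming cluster."""
    name: str    # binding name inside the cluster
    stale: bool  # True if the reference is currently marked stale


def reconcile(
    bindings: Iterable[Binding],
    is_running: Callable[[str], bool],
    attempts: int = 1,
) -> List[Tuple[str, str, str]]:
    """Re-check every binding and flip its state where it disagrees with reality.

    `is_running` is a caller-supplied check (an assumption of this sketch) that
    returns True when the server behind the named binding answers.
    Returns a list of (name, old_state, new_state) for each change made.
    """
    changes = []
    for b in bindings:
        # Treat the server as alive if any one of the attempts succeeds.
        alive = any(is_running(b.name) for _ in range(attempts))
        if b.stale and alive:
            b.stale = False  # stale reference whose server is running
            changes.append((b.name, "stale", "active"))
        elif not b.stale and not alive:
            b.stale = True   # active reference whose server is gone
            changes.append((b.name, "active", "stale"))
    return changes
```

A scheduled job would build the `Binding` list from the cluster, call `reconcile`, and then apply each returned change to the Naming Service.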

Related information:

If vbroker.naming.propBindOn is 1, the implicit clustering feature is turned on and another property, vbroker.naming.smrr.pruneStaleRef, comes into effect. Its default value is 1, which removes stale references from the Naming Service.

vbroker.naming.propBindOn 
  - If 1, the implicit clustering feature is turned on.
vbroker.naming.smrr.pruneStaleRef 
  - This property is relevant when the name service cluster uses the Smart Round Robin criterion. When this property is set to 1, a stale object reference that was previously bound to a cluster with the Smart Round Robin criterion will be removed from the bindings when the name service discovers it. If this property is set to 0, stale object reference bindings under the cluster are not eliminated. However, a cluster with Smart Round Robin criterion will always return an active object reference upon a resolve() or select() call if such an object binding exists, regardless of the value of the vbroker.naming.smrr.pruneStaleRef property. By default, the implicit clustering in the name service uses the Smart Round Robin criterion with the property value set to 1.
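The effect of the three pruneStaleRef values described in this article can be summarized with a small model. This is a sketch of the behavior as the article states it, not the Naming Service implementation: the function names are hypothetical, round-robin ordering is not modeled, and the case of value 0 with no active binding (nothing is returned) is inferred from the problem description rather than from the property reference.

```python
from typing import Dict, List


def surviving_bindings(bindings: Dict[str, bool], mode: int) -> Dict[str, bool]:
    """Return the bindings left in the cluster once stale ones are discovered.

    `bindings` maps a binding name to True if it is marked stale.
    Mode 1 removes stale bindings; modes 0 and 2 keep them.
    """
    if mode == 1:
        return {name: stale for name, stale in bindings.items() if not stale}
    return dict(bindings)


def resolvable(bindings: Dict[str, bool], mode: int) -> List[str]:
    """Return the names that resolve() or select() may hand to a client."""
    kept = surviving_bindings(bindings, mode)
    if mode == 2:
        # VBE6.5 and later: stale references are returned along with active ones.
        return sorted(kept)
    # Modes 0 and 1: only active references are ever returned.
    return sorted(name for name, stale in kept.items() if not stale)


cluster = {"srv1": True, "srv2": False}  # srv1 is marked stale
for mode in (0, 1, 2):
    print(mode, sorted(surviving_bindings(cluster, mode)), resolvable(cluster, mode))
```

In this model, a single-server cluster whose only binding is wrongly marked stale yields an empty result for values 0 and 1, and yields that one binding for value 2.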

So during network instability, if a client fails to verify a server, a server that is running fine can be removed by the Naming Service as a result. Examples of network instability include:
 1. Network packets are getting dropped during heavy load.
 2. Network packets are getting lost due to network equipment problems.
 3. Problems with OS TCP layer, etc.

The advantage of using vbroker.naming.smrr.pruneStaleRef=0 is that fewer bad (stale) references are returned to VBE clients. However, when there is only one VBE server and it has not died, using vbroker.naming.smrr.pruneStaleRef=2 with v6.5 or later is the better solution, because the VBE client at least gets a reference to try.


#stale
#cluster
#VisiBroker
#Security