Problem:
Product Name: VisiBroker
Product Version: 7.0 / 8.0
Platform/OS Version: All
The Location Service API queries the OSAgent to retrieve information such as registered object instances or available Agents. The call stack of the crashing thread shows a single call that sends a request from a Location Service API function, followed by a large number of nested repeatConnect / reconnect calls.
Following is an example of the observed call stack with VisiBroker 8.0:
// here the thread will crash due to stack overflow
[adr] __1cGDSUserJreconnect6M_v_ (…..) 370
[adr] __1cGDSUserNrepeatConnect6Mii_v_ (…..) 388
// here several hundred repetitions of reconnect / repeatConnect will follow
[adr] __1cGDSUserNrepeatConnect6Mii_v_ (…..) 388
[adr] __1cGDSUserJreconnect6M_v_ (…..) 370
[adr] __1cGDSUserNrepeatConnect6Mii_v_ (…..) 388
[adr] __1cGDSUserJreconnect6M_v_ (…..) 370
[adr] __1cGDSUserTsendAndWaitForReply6MpnJDSRequest_rnFNCrtt_l_i_ (…..) 2f0
With VisiBroker 7.0, the recursion on the call stack consists of four functions because of different compiler optimizations:
[adr] __1cGDSUserTsendAndWaitForReply6MpnJDSRequest_rnFNCrtt_l_i_ (.....) 2ec
[adr] __1cGDSUserFlogin6M_i_ (.....) 68
[adr] __1cGDSUserNrepeatConnect6Mii_v_ (.....) 98
[adr] __1cGDSUserJreconnect6M_v_ (.....) 360
Note that the function offsets on the call stacks depend on the build and can differ between service packs.
Resolution:
The application ORB must be able to contact the OSAgent and log in successfully. Once logged in, the connection between the ORB and the OSAgent remains established. If a request sent by the Location Service API is not answered, reconnect() / repeatConnect() tries to re-establish the connection to the OSAgent. If the ORB again receives no reply, another reconnect() / repeatConnect() follows. The mutual recursion of these two functions causes the process to block and eventually crash with a stack overflow.
A possible scenario is that a firewall blocking outgoing traffic on OSAGENT_CLIENT_HANDLER_PORT is active on the host where the OSAgent is running. It is then still possible to connect to the OSAgent via OSAGENT_PORT, but the login request sent on OSAGENT_CLIENT_HANDLER_PORT remains unanswered.
Workaround:
Before calling a function of the Location Service API, the current system time is stored in a timestamp variable. A bool variable indicates whether the timestamp needs to be monitored: it is set to true before the Location Service API call and to false after the call returns. A second thread periodically compares the timestamp against the current system time while the monitor flag is true. If the limit is exceeded, countermeasures can be taken to prevent the crash, for example shutting down the ORB and exiting the application.
A full reconnect cycle includes 5 retries and times out after 52 seconds. The recommended limit for detecting this exceptional situation is two times 52 seconds plus the cycle time of the monitor thread (for example, 105 seconds with a 1-second monitor cycle).
A pseudo-code implementation of the workaround is attached to the case.
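The monitoring scheme described above could be sketched in standard C++ roughly as follows. This is only an illustration under stated assumptions and not the pseudo-code attached to the case: the names CallWatchdog, begin(), end(), detectionLimit() and kReconnectCycle are hypothetical, the sketch makes no VisiBroker calls, and the countermeasure is passed in as a callback supplied by the application.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

using Clock = std::chrono::steady_clock;

// One full reconnect cycle as stated in the article (5 retries, 52 seconds).
const std::chrono::seconds kReconnectCycle(52);

// Recommended limit: two reconnect cycles plus one monitor cycle.
inline std::chrono::milliseconds detectionLimit(std::chrono::milliseconds monitorCycle) {
    return 2 * std::chrono::milliseconds(kReconnectCycle) + monitorCycle;
}

// Hypothetical helper: a second thread that watches one guarded call at a time.
class CallWatchdog {
public:
    CallWatchdog(std::chrono::milliseconds limit,
                 std::chrono::milliseconds cycle,
                 std::function<void()> onTimeout)
        : limit_(limit), cycle_(cycle), onTimeout_(std::move(onTimeout)),
          worker_(&CallWatchdog::run, this) {
        assert(limit_.count() > 0 && cycle_.count() > 0);
    }

    ~CallWatchdog() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_.notify_all();
        worker_.join();
    }

    // Call immediately before the Location Service API call.
    void begin() {
        std::lock_guard<std::mutex> lock(mutex_);
        started_ = Clock::now();
        monitoring_ = true;
    }

    // Call immediately after the Location Service API call has returned.
    void end() {
        std::lock_guard<std::mutex> lock(mutex_);
        monitoring_ = false;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            // Sleep for one monitor cycle; wake early only when stopping.
            wake_.wait_for(lock, cycle_, [this] { return stop_; });
            if (stop_) break;
            if (monitoring_ && Clock::now() - started_ >= limit_) {
                monitoring_ = false;   // report each guarded call at most once
                lock.unlock();
                onTimeout_();          // application-defined countermeasure
                lock.lock();
            }
        }
    }

    std::chrono::milliseconds limit_;
    std::chrono::milliseconds cycle_;
    std::function<void()> onTimeout_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool monitoring_ = false;
    bool stop_ = false;
    Clock::time_point started_{};
    std::thread worker_;   // declared last so all other members exist before it starts
};
```

In an application, one CallWatchdog would be created with detectionLimit(cycle) as the limit, and each Location Service API call would be wrapped in begin() / end(). The callback runs on the monitor thread, so it is the place for countermeasures such as shutting down the ORB and exiting the application. A steady clock is used here instead of the system time so that wall-clock adjustments do not trigger false alarms; that is a design choice of this sketch, not something taken from the original workaround.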
Fix:
This has been fixed in VisiBroker 8.5 SP1 through RPI#1076948. An internal timeout was introduced to prevent the recursive loop.




