Rocket iCluster


How to prevent iCluster from trying to replicate a short-lived IFS file?

Satid Singkorapoom posted 09-17-2023 22:19

Hello

Over the past week I have noticed many repeated instances of this message in the iCluster event log (it always appears exactly 8 times for each affected new IFS file):

18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.
18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.
18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.
18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.
18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.
18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.
18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.
18/09/23 08:03:21   COREPRO1 HAB1012 30 A BSF object was suspended with reason code RTV.

From time to time I see more than 100 occurrences of this message. I guess this happens when many temporary IFS files are created at the same time.

The details of this repeated message are as follows:

            iCluster Event Log Additional Message Information             
                                                                          
 Message ID . . . . . :   HAB1012       Severity . . . . . . :   30       
 Date sent  . . . . . :   18/09/23      Time sent  . . . . . :   08:03:21 
 Machine  . . . . . . :   COREPRO1                                        
                                                                          
 Message  . . . :   A BSF object was suspended with reason code RTV.   
 iCluster suspended a BSF object for group CBS_IFS with backup node COREDRP1. Technical details: . . . .  The reason for suspension is RTV. The process type is OGRP. The job name is  (218554/DMCLUSTER/CBS_IFS). The full path of the BSF object is: /x/y/z/XCEL/LNR3038_TL65163_20230816_80319_00015.xlsTEMP234257. 
                                                                          

Reason code RTV means iCluster could not retrieve the object description. I guess the Excel file is exclusively locked by the application job while it is being populated with data.

I have not been able to find the application developer who owns this IFS folder (there are more than a hundred developers for the entire suite of this large application). My guess is that the file named in the message is a temporary file created by the application for a report, and that it exists for only a few seconds before it is renamed to end in ".xls" without the "TEMP....." suffix.

My question: can I specify a delay (15 seconds or so?) for new IFS objects so that iCluster does not try to replicate them immediately? I'm also open to any alternative solution.

Another question: why does the message ALWAYS appear exactly 8 times, with the same timestamp, for each affected new IFS file? (I guess the occurrences are not actually in the same millisecond.)

Thanks in advance for your response. 

ROCKETEER Mark Watts (Best Answer)

Hi Satid,

You are correct that the RTV exception on the PRIMARY node points to a possible exclusive lock taken by the application when the object is created. There is both a global value and a group-level override for "Delay for Object Processing" that can be used to make iCluster wait before it processes a new object. It is typically used to prevent iCluster from locking an object for a brief replication that could conflict with an application's attempt to allocate it. The feature gives priority to the application so that it can function without interruption. Using it for the group that is experiencing the conflict may also reduce iCluster suspensions.

When an object is suspended, iCluster's automatic reactivation feature (default value: 10 minutes) will attempt the replication again and clear the suspension if it succeeds. So you can safely add a 10-15 second delay to see whether that is enough to avoid the suspension cycle. The maximum value for the delay is 1800 seconds. I am unsure why you get 8 messages in the log.
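The delay logic described above can be modeled roughly as follows. This is a minimal conceptual sketch, not iCluster's implementation: the function names, the assumption that a group-level override simply replaces the global value, the clamp to the 0-1800 second range, and the assumption that the application renames its TEMP file after a fixed lifetime are all illustrative.

```python
from typing import Optional

# Maximum "Delay for Object Processing" value mentioned in the answer (seconds).
MAX_DELAY_SECONDS = 1800


def effective_delay(group_override: Optional[int], global_value: int) -> int:
    """Hypothetical helper: a group-level override, if set, replaces the global value.

    The result is clamped to 0..MAX_DELAY_SECONDS (an assumption about how
    out-of-range values would be treated).
    """
    delay = global_value if group_override is None else group_override
    return max(0, min(delay, MAX_DELAY_SECONDS))


def ready_for_replication(created_at: float, now: float, delay: int) -> bool:
    """A new object becomes eligible for processing only after `delay` seconds."""
    return now - created_at >= delay


def sees_temp_name(temp_lifetime: float, delay: int) -> bool:
    """Assumes the application renames its TEMP file after `temp_lifetime` seconds.

    If the delay is shorter than that lifetime, the replicator still reaches
    the object while it has its TEMP name (and is likely still locked).
    """
    return delay < temp_lifetime


if __name__ == "__main__":
    delay = effective_delay(group_override=15, global_value=0)
    print(delay)                                                # 15
    print(ready_for_replication(0.0, 10.0, delay))              # False
    print(sees_temp_name(5.0, 0), sees_temp_name(5.0, delay))   # True False
```

Under these assumptions, a 15-second group-level delay against a file that lives about 5 seconds means the TEMP name is never picked up while locked; if the real lifetime is longer, the delay would need to grow accordingly.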

Stop the replication group with a controlled end, make the group-level change, start the group normally, and observe whether the change brings any benefit.

Additionally, if the objects in this case are indeed temporary and can be identified by a file specification or folder location, you could consider adding an exclusion so that iCluster does not attempt to replicate them at all. However, developers rarely consider how their use of temporary objects might affect replication when they design their applications, so adding an exclusion in iCluster may not always be possible.
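If the temporary files can be recognized by folder and name, the matching an exclusion would need to perform looks roughly like the sketch below. The folder and the `*.xlsTEMP*` pattern are taken from the path in the HAB1012 message above and are assumptions about how the application names its files; `EXCLUSIONS` and `is_excluded` are hypothetical and do not represent iCluster's exclusion syntax. Matching here is case-sensitive for simplicity; real rules may differ.

```python
import fnmatch
import posixpath

# Hypothetical exclusion rules as (folder, file-name pattern) pairs.
# This is NOT iCluster syntax; it only illustrates the matching idea.
EXCLUSIONS = [
    ("/x/y/z/XCEL", "*.xlsTEMP*"),
]


def is_excluded(path: str) -> bool:
    """Return True if the IFS path falls in an excluded folder and matches its pattern."""
    folder, name = posixpath.split(path)
    return any(
        folder == excl_folder and fnmatch.fnmatchcase(name, pattern)
        for excl_folder, pattern in EXCLUSIONS
    )
```

With this rule, the TEMP file from the message would be skipped, while the final `.xls` report in the same folder would still be replicated.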

Lastly, be aware that these temporary suspensions should not affect the quality of the backup environment. As announced in the log, iCluster suspends the unavailable object and attempts to resolve the replication exception automatically. Only exceptions that fail multiple reactivation attempts should need any intervention. That said, your effort to review the log and minimize exceptions is the correct way to tune each replication group for efficiency.
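The triage rule above (leave alone the suspensions that automatic reactivation clears, and look only at objects that keep failing) can be sketched like this. The input format, the function name, and the threshold of 3 consecutive failed attempts are hypothetical; in practice the attempt history would have to be gathered from iCluster's own event log.

```python
from typing import Dict, List


def objects_needing_attention(history: Dict[str, List[bool]], threshold: int = 3) -> List[str]:
    """Return paths whose latest `threshold` or more reactivation attempts all failed.

    `history` maps an object path to its reactivation attempts in order
    (True = the attempt succeeded and cleared the suspension).
    """
    flagged = []
    for path, attempts in history.items():
        # Count consecutive failures at the end of the history; a success stops the count.
        consecutive_failures = 0
        for ok in reversed(attempts):
            if ok:
                break
            consecutive_failures += 1
        if consecutive_failures >= threshold:
            flagged.append(path)
    return sorted(flagged)
```

An object that was suspended once and then cleared by reactivation is never flagged, which matches the advice that only repeated failures need manual intervention.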

Satid Singkorapoom

Dear Mark

Thanks again for being an inexhaustible source of iCluster knowledge for me and for helping satisfy my curiosity.