Rocket iCluster

iCluster Performance - How is TCPIP communications optimized for replication?

    Posted 01-04-2021 12:37
    Edited by Mark Watts 04-06-2021 13:09


    TCPIP is so flexible that it is difficult to give definitive guidelines for configuring a network for efficient iCluster replication.  I am not a network engineer, but the following points are general guidelines that most IBM i administrators can use, or share with their network engineers, to set up HA/DR replication over TCPIP that performs well.

    Here are the primary decisions most replication engineers will make to begin with.  This is not a comprehensive list but should help you get on the right track for success.

    • Use a separate physical interface on the IBM Power System dedicated to replication away from application systems usage.
      • TCPIP is very flexible, so this is not a hard requirement, and the network demands of logical replication are much lower than most people anticipate when they consider what is needed to replicate production application activity to a secondary node.
      • A dedicated physical interface for replication on both the production and the HA/DR node yields the most flexibility for routing and activation if there is a switch to the secondary system.  iCluster could easily share a single physical interface (especially considering the GBPS speeds available), and a separate TCPIP interface on a shared device could also meet the requirement, but either choice limits your flexibility for maintenance activities, network connectivity and routing, and potentially network usage measurement and redirection.
      • This configuration also prevents production systems users from thinking that replication traffic is reducing application response time due to conflicting network demands. 
    • Consider whether the secondary system's standby interface can be on the same TCPIP subnet as the production node.
      • The benefit is that a switch event can be fully contained within the responsibility of the IBM i system administrator.  With this configuration, the secondary system can have the same IP address as the production environment, but in a 'varied off' state.  If the production server goes down unexpectedly, a switchover can be accomplished and the secondary system established in the production role without a router or DNS change (assuming a limited outage that only affects the production server, of course).
      • If you are in the same situation as most distributed HA/DR environments, you will probably also need some router and/or DNS activity to get users pointed to the new production server at switch time.  There is nothing wrong with this configuration decision, but it typically requires more than one professional to complete the switch, as system administration and network configuration are typically managed by separate individuals.
      • And of course, consider also how dependent systems will access the HA/DR system in the event it is activated at a remote location.
      • There are network appliances that can extend the local subnet to the remote location.  Some additional research with your network engineer will help determine if this is the right configuration decision with the right cost/benefits for your site.
    • The bandwidth required for normal daily replication of IBM i is typically less than 100 MBPS.  It can even be less than 10 MBPS for smaller systems such as an IBM Processor Group P10.  The real bandwidth challenge lies in your capacity to do error correction and resync activities over your available network.
      • iCluster includes journal analysis reports that can be used to estimate more accurately the bandwidth required for a specified workload and period of time.  The report is reached from the iCluster main menu under option 9 (iCluster reports menu), then option 10 (Journal analysis reports).  To direct the analysis to the desired subsystem (SBS) and activity level, use an SBMJOB command with the iCluster request DMANZJRN.
      • To start replication, the data and programs are first placed in a synchronized state.  This means they are equal on both systems at a known point in time that is recorded in the IBM logs or journals.  With this information, we can start replicating and applying changes so that the primary and secondary systems remain synchronized.  If an object's position or state is lost, it is necessary to resync it, typically by marking a known state and time stamp of the object and sending it to the target system for restore, so that future changes can be applied to it and the object stays synchronized.
      • As part of normal replication, the day to day requirement for bandwidth is low.  Journal entries are generated as data is changed and to make that same change occur on the secondary node, it is necessary to send that journal entry to the system and apply the change.  When a query or record retrieval request is performed on production, no change is yet occurring, so no journal entries are required to keep the data in sync and no additional bandwidth usage is typically anticipated for query and reporting.  Pretty simple concept for us to understand.
      • The activity required to resync data can push the communications bandwidth requirement to 'whatever you have available'.  The trade-off is volume of data against bandwidth, time, and cost.  Since the additional bandwidth would be mostly idle during normal replication, the ability to resync quickly over the network may come at too high a cost.  Using external media (such as tape) to save the data and ship the prepared media to a remote location for restore is the default method for large volumes of data, and it avoids paying for more bandwidth than is otherwise needed.
    • If you reorganize physical file members (RGZPFM) with the 'Allow cancel = *YES' option and 'Lock state = *SHRUPD', be aware of some special considerations.
      • Records are moved throughout the file that is being reorganized while the normal updates are occurring from the application activity. 
      • Each of these 'moves' generates a journal entry that must be duplicated on the backup node.
      • The bandwidth requirement to duplicate these changes in real-time can be significant and the activity required to reorg a large file with many deleted records can cause significant latency if there is not sufficient bandwidth available.
      • Even with sufficient bandwidth, if this is a highly active production file, latency should be expected.
      • Be aware that if you specify 'Rebuild Access Paths = *YES' and any access path is found to be invalid, it will be rebuilt from scratch.  This is definitely an undesirable activity on production files when it is unexpected.  Do your research before using this command and option for the first time.
      • To get a better idea of the impact of this activity to bandwidth requirements, running a journal analysis that includes this activity can provide insights.
      • While we are talking about reorgs: if the reorg is invoked with Allow Cancel = *NO, the reorg is performed on the production file as requested, and iCluster passes the request on to the backup node, where a reorg is executed at the same time.  Network volume does not increase with reorg activity in this case.
    • Use of fewer network router hops to the destination can translate into a more reliable and easier to maintain network connection between the Primary and Secondary systems.
      • You can see the hops with the 'TRACEROUTE' command.  An invocation similar to 'TRACEROUTE RMTSYS('') LCLINTNETA('')' can reveal some details about your current network configuration.
    • Persistent and reliable communications are important.  Get a network provider and connection that is not error prone or subject to frequent drops.
      • You can observe the number of retransmissions reported by your TCP connections.  This number is reset to zero when the connection is established so unfortunately, if you get so many retransmissions that the connection is lost, this number will not reflect the number that occurred at the time of the connection drop.
      • Use command 'NETSTAT *CNN'.  From the resulting display, press F13 to 'sort by' and select 'Bytes In' or 'Bytes Out' to see the most active connections at the bottom of the result screen.  Press F11 to show the byte counts and page to the bottom.  Enter a 5 next to the most active connections from DMCLUSTER.  Then page down twice to see the retransmission counts.  Lower numbers are best.  If they are higher than zero, monitor them occasionally and see if you can determine the max counts per connection.  If they are higher than expected and resulting in disconnections, further research is warranted.  
    • Send and receive buffer size settings in the TCP attributes are paramount for bandwidth utilization in replication.  The default values are too low to fully utilize the expensive bandwidth you purchased.
      • Use command 'CHGTCPA' and prompt the values with 'F4'.
      • The TCP keep alive value is the number of minutes between 'keep alive' packets when a connection is inactive.  We have seen this value set too high, with the result that a replication group that was active yet idle was automatically disconnected by a firewall for inactivity.  It is recommended that this value be set to around 3 minutes instead of the default of 120.
      • For send and receive buffer settings, every tcp connection will benefit from a higher than default setting when bulk data is being sent.
      • IBM guides recommend 'not' just setting this value to the max allowable but instead making changes gradually to improve the performance and bandwidth usage.
      • It is recommended to use a value that is a multiple of 512.  We have changed our system's settings to '262144' for both the send and receive buffers on Rocket's production HA/DR connection.
      • Your network engineer may have already tuned this value and you may not need to do anything here but if it is still at the default value of '65535' or lower, it is recommended you consider increasing this value for higher communication throughput.
      • Both the Primary and Secondary buffer settings for send and receive buffers should match to get the best throughput performance.
      • You can change these values while production is active, and no IPL is required to activate them.  Every previously established connection continues to use the buffer settings in effect when it was established.  All newly established connections use the new settings.  If you are trying to get better iCluster bandwidth utilization, shut down all iCluster communications and end any established iCluster connections after a TCP attribute change such as buffer settings, then restart iCluster.
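
    For the send and receive buffer bullets above, a common rule of thumb (not something taken from the IBM guides mentioned here) is that one TCP connection cannot move more than one buffer of data per network round trip, so a starting value can be estimated from the bandwidth-delay product.  The following Python sketch is only an illustration: the 100 Mbps link and 20 ms round trip time are made-up figures, the function names are hypothetical, and it assumes bandwidth is expressed in megabits per second.

```python
import math


def suggested_buffer_bytes(bandwidth_mbps: float, rtt_ms: float, multiple: int = 512) -> int:
    """Bandwidth-delay product in bytes, rounded up to a multiple of `multiple`."""
    if bandwidth_mbps <= 0 or rtt_ms <= 0 or multiple <= 0:
        raise ValueError("bandwidth, round trip time and multiple must be positive")
    # bytes per second on the link, times the round trip time in seconds
    bdp_bytes = bandwidth_mbps * 1_000_000 / 8 * (rtt_ms / 1000)
    return math.ceil(bdp_bytes / multiple) * multiple


def max_throughput_mbps(buffer_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one connection: one buffer of data per round trip."""
    if buffer_bytes <= 0 or rtt_ms <= 0:
        raise ValueError("buffer size and round trip time must be positive")
    return buffer_bytes * 8 / (rtt_ms / 1000) / 1_000_000


if __name__ == "__main__":
    # Hypothetical link: 100 Mbps with a 20 ms round trip between nodes.
    print("suggested buffer:", suggested_buffer_bytes(100, 20))
    print("ceiling at 65535 bytes (Mbps):", round(max_throughput_mbps(65535, 20), 1))
    print("ceiling at 262144 bytes (Mbps):", round(max_throughput_mbps(262144, 20), 1))
```

    With those example figures, a 65535 byte buffer caps a single connection at roughly 26 Mbps, and the estimate lands close to the 262144 value mentioned above.  Treat the result as a starting point for the gradual tuning described above, not as a final setting.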

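    The resync trade-off described earlier (volume of data against bandwidth, time, and cost) can be put into rough numbers before deciding between the network and external media.  This Python sketch is a back-of-the-envelope estimate only: the 500 GB volume, the link speeds, and the 80% usable-bandwidth factor are assumptions for illustration, the function name is hypothetical, and bandwidth is assumed to be in megabits per second.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Estimated hours to send `data_gb` (decimal gigabytes) over the link.

    `efficiency` is an assumed fraction of the nominal bandwidth that the
    resync can actually use; it is not an iCluster figure.
    """
    if data_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("check data volume, bandwidth and efficiency values")
    bits_to_send = data_gb * 8_000_000_000
    usable_bits_per_second = bandwidth_mbps * 1_000_000 * efficiency
    return bits_to_send / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical 500 GB resync over three candidate link speeds.
    for mbps in (10, 100, 1000):
        print(f"{mbps} Mbps: {transfer_hours(500, mbps):.1f} hours")
```

    If the estimate for your available link runs to days rather than hours, that is a hint that saving to media and shipping it may be the more practical route for that volume.
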
    What about iCluster TCPIP settings?

    • iCluster over a VPN is common, and some network administrators want to limit iCluster port usage so they have full control of IBM i communication usage over public networks.  The recommended path to narrow TCP port usage is to consider the following.
      • iCluster uses some services provided by IBM i that will also require TCP ports open between nodes.  Monitoring usage with all ports open briefly may reveal usage that is required that is not mentioned in the bullets below.
      • There are controls at the Node Definition settings that control what ports iCluster will use to communicate between nodes.  Adjust these by adjusting the NODE definition with the CHANGE NODE or ADD NODE functions. (DMCHGNODE or DMADDNODE commands)
      • You change the port on which iCluster listens for connections with the PORT parameter, which defaults to '4545'.  It is also important to adjust the TCP service table entry for 'dmcluster' to match the requested port number.  Be sure the value selected does not conflict with another service on IBM i.  If you have a service already using '4545', I have had customers simply change to '45450' as the listening port with easy and reliable success.
      • There are also controls for the port range that iCluster will use.  The default is *DYNAMIC, which means any available port in the user range can be used for established iCluster communications.  Two controls (Minimum port number and Maximum port number) establish the boundaries of the iCluster allocated ports.  For a replication environment with 10 replication groups, I typically reserve about 200 ports.  If the administrator wants to limit iCluster port usage further, it is recommended that they monitor actual port usage and base the port limits on usage reports that include some history, so that periodic activity is not locked out.
      • Other IBM services that use a connection between servers are outside the control of iCluster, such as FTP for the DMSYNCCMD command, remote journaling if that service is selected, and DDM / DRDA for monitoring services.  Noting the TCP port requirements of these services and adding firewall allowances for them is also required.
    • iCluster will also automatically compress save files that it generates and replicates to the Secondary node.  In this section we will also include a value for comms auto-recovery services as it is also in the same menu. 
      • The compression setting is reached from the iCluster main menu by selecting option 6 (iCluster System Values).  The value, 'Object Compression', is on the first page and defaults to *NO.  A value of *MEDIUM is a good balance between the resources needed to compress and the resulting file size.  It can reduce the amount of data required to synchronize a large file.
      • To best support this value, page down to the Physical file values and change the Physical file refresh method to *SAVRST.
      • On the fourth page from the top, find the last entry on the page for 'Enable comms auto-recovery'.  The default value is *YES, which is the desired value, but if you upgraded from an older release, it is worth double checking to make sure it is turned on.
      • The first values on the fifth page are Performance / comms channel and stage store block management.  These two values perform best when set to *PERGROUP.  Although only the first one is comms related, let's set both of these while we are here.
      • Press Enter to save your changes.
    • A network appliance for compression and conditioning can be used to compress the replication stream, as long as the stream is decompressed and reconstituted before it is delivered to iCluster.  Journal records of native file updates are typically very compressible.  BSF/IFS journal records of PDF or JPG image updates are typically less compressible because those files are already compressed before they are written to disk.
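
    To make the port range discussion above concrete, here is a small Python sketch for working out Minimum and Maximum port numbers to hand to a firewall administrator.  It is an illustration under stated assumptions: the figure of 20 ports per replication group is simply the 200 ports for 10 groups mentioned above divided out, the starting port 45500 is made up, and the function name is hypothetical.  Actual limits should still be based on monitored usage.

```python
def plan_port_range(min_port: int, groups: int, ports_per_group: int = 20) -> tuple[int, int]:
    """Return (min_port, max_port) reserving `ports_per_group` ports per group.

    `ports_per_group` is an assumed planning figure, not an iCluster setting.
    """
    if groups < 1 or ports_per_group < 1:
        raise ValueError("groups and ports_per_group must be at least 1")
    max_port = min_port + groups * ports_per_group - 1
    # Stay above the well-known ports and within the valid TCP port range.
    if min_port < 1024 or max_port > 65535:
        raise ValueError("range must fall between 1024 and 65535")
    return min_port, max_port


if __name__ == "__main__":
    low, high = plan_port_range(45500, groups=10)
    print(f"Minimum port number: {low}, Maximum port number: {high}")
    print(f"Ports reserved: {high - low + 1}")
```

    Remember that the listening port and the ports used by the other IBM services noted above need their own firewall allowances in addition to this range.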

    I hope this Rocket Community Forum entry is helpful to you and helped you think about how you might be able to improve your communications configuration to make your replication environment more efficient and perform its best.  Please feel free to send me any corrections or areas where this entry could be expanded to make it more valuable to the community.


    Mark Watts
    Rocket Software