Dear all,
I would like to share a useful piece of experience from trying to increase iCluster's data replication transport rate by changing one IBM i TCP/IP control parameter on both IBM i machines.
The environment: iCluster 9.1.0 (later upgraded to 9.1.1) running on IBM i 7.4 with recent PTFs, on Power S914 servers with a 2-core CPU allocation and 64 GB of main memory for both the PRD and DR machines. The two machines are linked over an Internet WAN with a 6 GB data link. The IBM i LAN is 10 Gbps fiber.
A few months ago, I observed iCluster's data rate from my customer's IBM i PRD machine to the DR machine during several nightly batch runs, from midnight to around 01:00 AM, using the IBM i PDI chart named Ethernet Protocol Overview. The peak data rate shown in the chart for this batch period reached about 18,000 KB/sec, and the entire replication took about 3 hours to finish. I keep data rate charts from past customers who used either iCluster or MIMIX, and I noticed that a few of them achieved a peak data rate of about 40,000 KB/sec, so I set out to see whether I could make an improvement for my current customer.
Over several days, I did a number of Google searches and found many pieces of information, a few of them academic papers, pointing to the important role that a TCP/IP parameter called Minimum RTO (Retransmission Time-out) plays in determining the achievable data rate. Most of these sources imply that if the RTT (Round Trip Time, obtained roughly by using PING between the two parties) is low, MinRTO can be set low as well, which can increase the peak data rate.
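As a rough illustration of why MinRTO can matter on a low-RTT link, here is a minimal Python sketch. It assumes that every timeout-driven retransmission leaves the connection idle for at least MinRTO. The count of 10,000 timeouts per batch window is purely hypothetical and was not measured on these machines.

```python
def stall_seconds(timeouts: int, min_rto_ms: float) -> float:
    """Lower bound on the time a connection sits idle waiting for
    retransmission timers, given a number of timeout events."""
    return timeouts * min_rto_ms / 1000.0


# Hypothetical number of timeout-driven retransmissions in one batch window.
timeouts = 10_000

for min_rto_ms in (250, 150, 100):
    idle = stall_seconds(timeouts, min_rto_ms)
    print(f"MinRTO {min_rto_ms} ms: at least {idle:.0f} s idle")
```

The model is deliberately simple: it ignores exponential backoff and fast retransmit, so it only shows the direction of the effect, not its real size.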
So I PINGed from my IBM i PRD machine to the DR machine many times and saw an RTT of about 2-3 msec, which was quite impressive. Then I noticed that the IBM i TCP/IP attribute named TCPIP Minimum Retransmit Time (which, after some checking, I was quite sure was the same as MinRTO) was set at 250 msec, the IBM i default value. I reduced it on both machines to 150 msec and saw an encouraging improvement in the peak data rate. I have since changed it to 100 msec, and the peak data rate is now about 31,000 KB/sec (up from 18,000), while the entire replication transport of the batch run's data changes has dropped to about 2 hours. My target is to reduce it further to 50 msec, but it will take me some more days to do this.
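To put the before and after figures in proportion, the arithmetic can be sketched in a few lines of Python. The conversion to Mbit/s assumes that 1 KB in the chart means 1,000 bytes, which is an assumption and not something confirmed for the PDI chart.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


def kb_per_sec_to_mbps(kb_per_sec: float, bytes_per_kb: int = 1000) -> float:
    """Convert KB/sec to Mbit/s (assumes 1 KB = 1,000 bytes by default)."""
    return kb_per_sec * bytes_per_kb * 8 / 1_000_000


peak_before, peak_after = 18_000, 31_000   # KB/sec, from the post
hours_before, hours_after = 3, 2           # replication elapsed time

print(f"Peak rate change:   {percent_change(peak_before, peak_after):+.0f}%")
print(f"Elapsed time change: {percent_change(hours_before, hours_after):+.0f}%")
print(f"Peak after tuning:  {kb_per_sec_to_mbps(peak_after):.0f} Mbit/s")
```

In other words, the peak rate rose by roughly 72% while the elapsed time fell by about a third.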
So, that's it. Anyone interested in achieving the same goal is welcome to try this method. Please be sure to restart the replication jobs after the parameter change, because CHGTCPA takes effect only for new socket connections made after the attribute change, not for existing ones. I ran CHGTCPA on both machines. I also believe this experience applies to large data transfers of any kind, but I am willing to be proven wrong if anyone has experience to the contrary.
I would also be interested in hearing from any of you who have similar data rate tuning experience. I also have a question: how low can I go? An article I read indicated that MinRTO can be reduced to about 3 times the RTT, but I would like someone to share their experience on whether this is practical without unintended consequences. Hope to hear from you soon.
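For reference, the "3 times RTT" rule of thumb works out as follows for the RTT measured above. This is only a sketch of the arithmetic. The function name is made up for illustration, and whether IBM i accepts values this low is not confirmed here and should be checked against the CHGTCPA documentation.

```python
def min_rto_floor_ms(rtt_ms: float, multiplier: float = 3.0) -> float:
    """Suggested MinRTO floor as a multiple of the measured RTT."""
    return rtt_ms * multiplier


# RTT range observed with PING between the PRD and DR machines.
for rtt_ms in (2, 3):
    print(f"RTT {rtt_ms} ms: MinRTO floor about {min_rto_floor_ms(rtt_ms):.0f} ms")
```

By this rule the floor would be in the single-digit millisecond range, well below the 50 msec target mentioned earlier.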
------------------------------
Satid Singkorapoom
IBM i SME
Rocket Forum Shared Account
------------------------------

