Rocket iCluster


 Performance Tuning Rocket iCluster V9.1 for Applying Journal Receiver on Backup Server

krittamate srikhammul posted 08-31-2023 06:07

Hi All,

I am interested in finding ways to performance-tune Rocket iCluster V9.1 for applying journal receivers on a backup server. After the PRD batch process completes, the journal receivers are sent to the DR site for apply, but applying them on the DR site takes a long time. How can I tune or adjust the AS/400 server to make the apply process faster?

Thanks in advance for your response. 

---------------------------------

Krittamate Srikhammul

Satid Singkorapoom

Dear Krittamate,

Which IBM i release are you using? If it is 7.1 or later, you should use the IBM i Performance Data Investigator (PDI) tool and look at Wait Overview, Wait by Generic Job, and Wait by Subsystem to see first whether your iCluster jobs encounter any serious wait component. If so, you need to take action to reduce those wait components.

Also check your disk response time during the period iCluster runs the apply task, using Disk Throughput Overview for Disk Pool (and Disk Overview for Disk Pool if you have more than one ASP in your system), to see whether the general disk response time is higher than 5 millisec. or not. Are you using HDD or SSD/Flash disk? If the former, this chart is important, as most performance problems come from bad HDD response time under high disk I/O workload. My system uses a FlashSystem SAN and it is fast.

Please also read my articles on using the PDI tool here to get an idea of how to analyze the important PDI charts I mentioned above: https://www.itjungle.com/author/satid-singkorapoom/

If you can post PDI charts here, I may be able to provide analysis. 

Satid Singkorapoom

Let me give you a sample wait-time chart - Wait by Generic Job.

You can see the topmost bar SMEU3 and the bar named DMS000*, which are iCluster jobs (check them with WRKACTJOB SBS(XDMCLUSTER)). Each of these two bars has only a small amount of wait time compared to its Dispatched CPU time, which means good performance.

The next chart - Wait by Subsystem - also shows that iCluster's subsystem XDMCLUSTER has only a small cumulative amount of wait time compared to its Dispatched CPU Time, which indicates good overall performance. The subsystem CNVP1BSBS (the topmost bar) has a lot of cumulative wait time, and that is NOT a good performance indication.

The next chart shows the timeline of disk response time - Disk Throughput Overview for Disk Pool (I have only one disk pool in my system).

From the chart, you can see that most of the time the SSD response time is at 0.2 msec., with a few sporadic worst-case response times of 0.8 msec., which can be considered consistently good. If you see the disk response time in this chart going to 5 msec. or higher during your period of interest, that is considered bad disk performance, and you should also see some disk-related wait in the Wait charts.
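If you want a quick green-screen cross-check of the disk numbers outside PDI, the classic command below shows per-unit statistics in real time (just a sketch of one way to look; the column layout depends on the view you select):

```
/* Show real-time statistics for each disk unit: I/O requests,  */
/* read/write data rates, and % busy.  Press F5 to refresh the  */
/* measurement interval and F11 to switch between column views. */
WRKDSKSTS
```

This will not replace the PDI history charts, but it is handy for spotting a unit that is much busier than its peers while the apply task is running.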

This is just a sample to help you see what to look for as the first clue to your performance problem.

Satid Singkorapoom

An additional factor in improving replication performance is to work out in some detail which objects should be EXCLUDED from replication. This avoids replicating unnecessary objects, such as work files and other object types that hold run-time data (as opposed to permanent business data), such as data queues, DB indexes (keyed logical files), etc.

I know this part takes effort and time to implement, so you can do it in parallel with checking the performance data for anomalies that can be readily identified and dealt with.

krittamate srikhammul

Dear P' Satid krab, @Satid Singkorapoom

My AS400 OS version is V7R4M0, and I have SYSBAS and iASP storage. Application data is stored in the iASP. For storage, we use the FS9200.

KKCBSD is the iCluster Application Data Group used for replication.

The data collection date is 25/08/2023.

Thanks in advance for your response. 

---------------------------------

Krittamate Srikhammul

krittamate srikhammul

Dear P' Satid krab, @Satid Singkorapoom

Regarding the DR site performance and the slow journal apply on that system: the CPU, memory, and disk resources are the same as on the DC site.

Thanks in advance for your response. 

---------------------------------

Krittamate Srikhammul

Satid Singkorapoom

Khun Krittamate

Let's focus on the DR machine, as that is where you asked about the change-apply task's performance.

Please increase the text size in the chart by pressing Ctrl and = (equal sign) before you post it, as I have bad eyesight.

 

Since you have multiple ASPs, the Disk Throughput Overview for Disk Pool chart shows only ASP 1. Please post the chart named Disk Overview for Disk Pool from the DR machine for me to see.

In the wait charts, it is clear that there is a dominant wait component, Disk Page Fault Time, in your DR's iCluster jobs. I also need to see the DR's Wait Overview chart to make sure this Disk Page Fault Time appears mostly during the apply duration (I believe it's after midnight, right?).

Disk Page Fault wait appears when the DR's disk response time is bad (which is NOT the case for you) or when memory faulting is too high because too little memory is allocated to the memory pool in which the XDMCLUSTER subsystem runs. Please let me know which pool number your DR's XDMCLUSTER runs in, and post the DR's Memory Pool Sizes and Fault Rates (All Pools) chart.
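While you gather the PDI charts, the pool numbers and fault rates can also be checked quickly from a 5250 session (a sketch; ASTLVL(*INTERMED) is just one way to get the pool-level view):

```
/* Show storage pools with DB and non-DB fault rates per pool   */
WRKSYSSTS ASTLVL(*INTERMED)

/* Show shared pool definitions, sizes, and activity levels     */
WRKSHRPOOL

/* Confirm which pool the iCluster subsystem jobs run in        */
WRKACTJOB SBS(XDMCLUSTER)
```

High non-DB fault rates in the pool that hosts XDMCLUSTER during the apply window would line up with the Disk Page Fault Time you see in the wait charts.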

The last thing I want to check is the data transfer rate of your LAN interfaces. Please post both the DC and DR charts under Communication --> Ethernet Protocol Overview.

Make sure text size in all charts is large enough for me to see clearly. 

krittamate srikhammul

Dear P' Satid krab, @Satid Singkorapoom

Information for the DR site is below:

The Disk Throughput Overview covers only the iASP (ASP 33), and all charts are for 25/08/2023 after midnight (the EOD process starts at 00:00).

For Wait Overview at DR Site

The XDMCLUSTER subsystem uses pool 2.

For Memory Pool Sizes and Fault Rates (All Pools)

DR site : date 25/08/2023 after midnight

DC site : date 25/08/2023 after midnight

Thanks in advance for your response. 

---------------------------------

Krittamate Srikhammul

Satid Singkorapoom

Khun Krittamate

Basically, everything looks fine - a high data rate on the LAN port, no CPU queuing, decent disk response time. The only thing to address is Disk Page Fault wait time, which I notice is substantial (not very high, but not too low either).

Since you use the core banking solution that I have known for a long time: are you aware that during the nightly batch run, the iCluster apply job is stopped and is started only after the batch run finishes successfully? If so, it is natural for the apply task to take time to finish (as opposed to real-time apply).

By the way, in the chart you sent below, I notice the rightmost vertical axis shows the numbers 1, 2, 3, 4... Please make sure it is not 100, 200, 300, ... or 1,000, 2,000, 3,000, ...

Satid Singkorapoom

One last point I would like to make: select option 8 from the iCluster Main Menu to go to the Real Time Overall Latency screen, then press F11 to see the Real Time Object Latency screen (as shown below). Record the values of three columns - Curr Trans Sent, Curr Trans Applied, and Trans to Process - on the line for your replication group name (SMEU3 in this example). Note these values at the start and end of each nightly batch run; from them you can tell whether the iCluster apply task is stopped during the batch run, and if so, how many transactions the DR system must apply to catch up.

ROCKETEER Mark Watts

Hi Krittamate,

There is a performance tuning post in the Forum you may find helpful. The post is titled: "iCluster performance discussion. Topics to consider and tune." Below is an excerpt that is particularly important. Monitoring the latest updates from the Rocket Lab can also provide performance gains, so do move to the most current version of iCluster and sign up for notifications when announcements are made.

  • Add a shared memory pool and set the XDMCLUSTER SBS to use it. The backup node will benefit the most from this tip; however, I like to set this up in advance so that a switchover performs well and is less complex regardless of which node is in the BACKUP role. Even if nothing is running on the backup node besides iCluster, and nothing should be competing with iCluster for memory in SYSBAS, iCluster apply performance is enhanced by making memory directly available to the XDMCLUSTER SBS. How much should you allocate? Allocate as much as is available to the shared pool. We are running IBM i 7.4 with QPFRADJ activated on the backup node, and the OS is responsive to memory demands when needed, making administration easy.
  • Allow the IBM i system to determine the paging characteristics for the pool in which iCluster is running, using CHGSHRPOOL POOL(<pool id>) PAGING(*CALC). This aligns with the previous tip and is the recommended paging setting for the shared pool.
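The two tips above can be sketched in CL as follows (a hedged example, not a prescription: the shared pool number, size, activity level, and the library holding the XDMCLUSTER subsystem description are assumptions - check yours with WRKSHRPOOL and WRKSBSD first):

```
/* Give the shared pool memory and let the OS calculate paging.  */
/* *SHRPOOL2, the 8 GB size (SIZE is in KB), and the activity    */
/* level of 50 are example values only.                          */
CHGSHRPOOL POOL(*SHRPOOL2) SIZE(8388608) ACTLVL(50) PAGING(*CALC)

/* Point the iCluster subsystem's first pool at that shared pool */
/* (the XDMCLUSTER library for the SBSD is an assumption)        */
CHGSBSD SBSD(XDMCLUSTER/XDMCLUSTER) POOLS((1 *SHRPOOL2))

/* Restart the subsystem so the pool change takes effect         */
ENDSBS SBS(XDMCLUSTER) OPTION(*CNTRLD)
STRSBS SBSD(XDMCLUSTER/XDMCLUSTER)
```

With QPFRADJ active, as Mark notes, the system can then rebalance memory into that pool as apply demand rises.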

Use this link to manage your announcement delivery preferences:  https://www.rocketsoftware.com/manage-your-email-preferences