Rocket iCluster


iCluster Performance - The need for speed

  • 1.  iCluster Performance - The need for speed

    ROCKETEER
    Posted 01-27-2021 12:02
    Edited by Mark Watts 04-06-2021 13:07


    When discussing Rocket iCluster performance, there are many dependencies, as you are likely aware. Performance depends not only on the efficiency of the application code and logic, but also on the environment that is created and configured for the solution to run in. When two machines communicate and must handshake to exchange data, any unreliable component can affect efficiency. To continue the discussion of performance, let's take on the next 'component'. In this Forum post we will talk about I/O efficiency and about matching the capability of the primary and backup nodes in an iCluster relationship.

    It is fairly common for the discussion about replication to move to "How much processing capability does our HA/DR hardware need to run iCluster and provide a mirror of the production applications?". Most of the time, if that is the only intent, the answer is that about 25%-30% of production processing capability can maintain a copy of production activity on the secondary system with little or no latency. However, there are two paths to arriving at the correct recommendation.
    1. If you only intend to create an IBM i instance for backups (to avoid stopping the production application to capture a complete backup), or to share the mirrored data with interactive or reporting users for analysis, then the minimal calculation applies. If you are building a virtual LPAR with this in mind, you can easily adjust your resource allocation later to better match the requirements.
    2. If you are instead creating an actual HA/DR target that will be used in the event of an unexpected outage of production IBM i resources, or you plan to regularly switch production activity to the HA instance to allow service or maintenance on the other instance, you would typically size the target system to resume the business at full capability, with the user community unaware of the hardware platform change. In that case, replication apply performance alone is not the right benchmark for hardware sizing.
    3. Let's also point out that the Power Systems servers in a replication cluster do not need to be identical models. Although this post is focused on I/O capability, a secondary system with similar CPW capability can serve as a disaster recovery target even if the systems differ in age or generation. The point is, once you have sized your DR target, if you ever have to switch to it, the performance that system delivers to a production workload is all you have. You may be thinking, "In that situation I'll call my IBM rep and have additional standby processors activated." That is true, but recall that on IBM i, CPU, memory and disk units all contribute to (or detract from) the overall performance of the applications running on the system. More cores alone may not be enough to match the production system's capability.
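
The two sizing paths above come down to simple arithmetic. The sketch below applies the 25%-30% rule of thumb from this post to a production CPW rating for a backup/reporting replica, and 100% for a true HA/DR target. The helper `target_cpw_range`, the `SizingRange` type and the 120,000 CPW figure are hypothetical, for illustration only; this is not a Rocket or IBM sizing tool.

```python
from dataclasses import dataclass

# Rule-of-thumb fractions quoted in this post for a backup/reporting-only
# replica. They are starting points, not guarantees; real workloads vary.
REPLICA_LOW = 0.25
REPLICA_HIGH = 0.30


@dataclass
class SizingRange:
    low_cpw: float
    high_cpw: float


def target_cpw_range(production_cpw: float, full_dr_target: bool) -> SizingRange:
    """Hypothetical helper: starting CPW range for a replication target.

    full_dr_target=False: path 1, a backup/reporting replica (about 25-30%).
    full_dr_target=True:  path 2, a true HA/DR target sized to run the full
                          production workload after a switch (100%).
    """
    if production_cpw <= 0:
        raise ValueError("production_cpw must be positive")
    if full_dr_target:
        return SizingRange(production_cpw, production_cpw)
    return SizingRange(production_cpw * REPLICA_LOW, production_cpw * REPLICA_HIGH)


if __name__ == "__main__":
    prod_cpw = 120_000  # hypothetical production CPW rating
    print("Backup/reporting replica:", target_cpw_range(prod_cpw, full_dr_target=False))
    print("Full HA/DR target:       ", target_cpw_range(prod_cpw, full_dr_target=True))
```

As point 3 notes, CPW only covers processing capability. Memory and disk still have to be sized separately, which is what the rest of this post is about.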

    But wait, there's more. Understanding the requirement also means evaluating the I/O capability of the IBM i instance. On IBM i, with Single Level Storage, performance is affected by the way we configure our disks. What are the simplest key factors?
    • Disk controllers with large cache perform better than those without.
    • The standard guidelines for available free space on production and backup disk subsystems of course apply.
    • Are there other IBM i tuning techniques that should be considered?  Yes, keep the systems healthy, both Primary and Backup. 
      • Use command WRKDSKSTS and observe the stats during peak times.  This information can be just as important as examining the WRKACTJOB display trying to determine what is impacting performance.
      • Also observe the I/O balance, and act to balance not only storage across disk units but also disk activity.
    • More disk units perform better than fewer disk units. The point of this bullet is more disk arms, not more storage.
    • Yes, even if they are virtual.
    • Yes, even if they are on a SAN. My IBM insider says "more/smaller virtual drives perform better than fewer large drives for IBM i running on Power for the same amount of storage".
    • It seems to me that IBM i's Single Level Storage was invented and then waited for Solid State Drives to become available and affordable. SSDs are a game changer for IBM i performance.
    • Since SSDs are now the topic of discussion, let's think for a moment about the effect of having SSD storage on the production server and spinning drives on the BACKUP node.
      • If the production workload generates update rates that push its I/O beyond what the backup node's I/O can sustain (assuming the bandwidth can deliver the same update rates to the backup system), it is no mystery why there can be significant latency in the apply processes of a replication group, regardless of which software tuning strategies have been applied. Although both fly, jets are faster than prop planes, AND IBM i systems with SSD storage are faster than the same systems with spinning drives. Configuring the DR system with less expensive disk units can be budget friendly. However, expectations may need to be adjusted, because peak usage will result in latency. If later in a 24-hour period the systems are quiet enough for the backup node to 'catch up' and start the next day at zero latency again, that may be a workable solution for some deployments. The ideal solution has minimal or no latency, so that the DR system is 'always' the best possible recovery representation of the data.
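
The 'catch up' reasoning in the last bullet can be modeled as a simple queue. In each interval, production updates arrive, and the backup node applies at most what its I/O can sustain. Anything left over carries forward as latency. This is a minimal sketch that assumes bandwidth is not the bottleneck. The helper `apply_backlog`, the hourly profile and the capacities are hypothetical numbers chosen for illustration, not measurements of iCluster.

```python
from typing import List


def apply_backlog(update_rates: List[float], apply_capacity: float) -> List[float]:
    """Hypothetical model of apply latency as a queue.

    update_rates:   updates arriving from production in each interval.
    apply_capacity: updates the backup node's I/O can apply per interval.
    Assumes bandwidth delivers every update, so I/O is the only limit.
    Returns the backlog (updates still waiting) at the end of each interval.
    """
    backlog = 0.0
    history: List[float] = []
    for arriving in update_rates:
        # New work joins the queue; the backup drains up to its capacity.
        backlog = max(0.0, backlog + arriving - apply_capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical 24-hour profile (thousands of updates per hour):
    # 6 quiet hours, an 8-hour business peak, then 10 quiet hours.
    day = [20] * 6 + [90] * 8 + [20] * 10
    for label, capacity in [("faster I/O", 100), ("slower I/O", 60), ("slowest I/O", 50)]:
        backlog = apply_backlog(day, capacity)
        print(f"{label:12} peak backlog={max(backlog):6.1f}  end of day={backlog[-1]:6.1f}")
```

With these made-up numbers, the 100 capacity never falls behind. The 60 capacity builds a backlog of 240 during the peak and is back to zero by the end of the day, which is the "workable for some deployments" case. The 50 capacity peaks at 320 and starts the next day with 20 still waiting to be applied.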
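
The WRKDSKSTS advice in the list above is to balance disk activity, not just storage. It can be made concrete with a small check. Given the % busy figure you note for each disk unit at peak time, the check flags the units that sit far from the average. This is a minimal sketch. The helper name `busy_outliers`, the sample readings and the 10-point tolerance are assumptions for illustration, not an IBM or Rocket guideline. The readings are typed in by hand rather than parsed from the display.

```python
from statistics import mean
from typing import Dict, List


def busy_outliers(busy_pct: Dict[int, float], tolerance: float = 10.0) -> List[int]:
    """Hypothetical helper: return disk unit numbers whose % busy differs
    from the average of all units by more than `tolerance` percentage points.
    The default tolerance is an illustrative assumption."""
    if not busy_pct:
        return []
    avg = mean(busy_pct.values())
    return sorted(unit for unit, busy in busy_pct.items() if abs(busy - avg) > tolerance)


if __name__ == "__main__":
    # Hypothetical peak-time readings, keyed by disk unit number.
    sample = {1: 38.0, 2: 41.0, 3: 12.0, 4: 40.0, 5: 69.0}
    print("Units to investigate:", busy_outliers(sample))
```

In this made-up sample the average is 40% busy. Unit 5 is overworked and unit 3 is underused, so both are flagged as candidates for rebalancing.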

    In conclusion:
    • CPU demand is typically very low during normal replication. I/O capability, however, is a factor not always considered as part of the equation.
    • When considering the DR hardware footprint, the best possible scenario is to size the DR system to run full production workloads on the day you need to rely on it after a disaster.
    • Memory capacity and technology, along with disk capacity and technology, contribute to optimal I/O performance for both production applications and iCluster replication.
    • For the specific use case of using replication to build a reporting or query instance of the production data (not intended to serve production workloads), 25% to 30% of production capability is a good starting point for sizing the replication target. Adjust as necessary, because individual application workloads will vary on this point. Yours could be less or (rarely, but possibly) more.
    • As always, please consult your IBM sales rep or IBM Power hardware reseller to size your systems appropriately.

    #iClusterPerformance
    #IBMi

    ------------------------------
    Mark Watts
    Rocket Software
    ------------------------------


  • 2.  RE: iCluster Performance - The need for speed

    Posted 02-02-2021 08:13
    Nice article, Mark!

    I was just wondering whether any of your customers are running the new NVMe flash SSD drives yet and, if so, whether they noticed any drastic reductions in replication latency or increases in apply performance. You can read about them here: https://www.itjungle.com/2020/11/09/how-much-does-nvm-express-flash-really-boost-ibm-i-performance/

    Thanks.

    ------------------------------
    Mike Warkentin
    MidRange Computer Group
    ------------------------------



  • 3.  RE: iCluster Performance - The need for speed

    ROCKETEER
    Posted 02-02-2021 10:03
    Hi Mike,

    Thanks so much for the feedback and for providing a link to a very interesting article. That is totally on-topic and helpful stuff. Frankly, I seldom get an opportunity to examine hardware selection details at my desk (outside of problem determination), but other folks on this Forum likely do. One thing I've noticed is that disk unit technology refreshes rarely happen independently. Instead, they are usually part of the choices made when the next IBM Power system is selected as part of a total technology refresh. It would be very interesting indeed to hear about someone's experience with a 'disk unit only' refresh that included the newest SSD tech. (Really, we love technology, so any SSD or tech stories that improved performance would be fun to read.) If you (or any Forum members) have anecdotes to share, I'm certain all of us would be very interested to hear about the experience. Feel free to write your own articles to contribute to the Forum.

    ------------------------------
    Mark Watts
    Rocket Software
    ------------------------------