RocketCE for Power


Getting started with the Matrix-Multiply Assist (MMA) capabilities of Power10

    Posted 11-12-2021 06:34
    Edited by David Andrews 11-12-2021 06:33
    The matrix-multiply assist (MMA) facility was introduced in version 3.1 of the Power Instruction Set Architecture (ISA). Its instructions implement numerical linear algebra operations on small matrices and are meant to accelerate computation-intensive kernels such as matrix multiplication, convolution, and the discrete Fourier transform. To execute MMA operations efficiently, the Power10 processor core implements a dense math engine (DME) microarchitecture, which effectively provides an accelerator for cognitive computing, machine learning, and artificial intelligence inferencing workloads. The DME encapsulates compute-efficient pipelines, a physical register file, and the associated data flow that keeps the resulting accumulator data local to the compute units.

    Each MMA pipeline performs outer-product matrix operations, reading from and writing back to a 512-bit accumulator register. Power10 implements the MMA accumulator architecture without adding new architected state: each architected 512-bit accumulator register is backed by four 128-bit Vector Scalar eXtension (VSX) registers. Code that uses the MMA instructions is already included in the OpenBLAS and Eigen libraries and can be built with recent versions of the GNU Compiler Collection (GCC).
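
    The outer-product accumulation described above can be sketched in plain Python. This is only an illustration of the arithmetic pattern, not of the MMA instructions themselves: it assumes a 4 x 4 accumulator of 32-bit elements (one way to fill 512 bits), and the names rank1_update and matmul_by_outer_products are hypothetical helpers invented for this example.

```python
# Illustrative model only: a 4 x 4 accumulator of 32-bit elements
# occupies 4 * 4 * 32 = 512 bits, the accumulator size named in the post.
ROWS, COLS = 4, 4
ELEMENT_BITS = 32
ACC_BITS = ROWS * COLS * ELEMENT_BITS  # 512


def rank1_update(acc, x, y):
    """Add the outer product of x (length 4) and y (length 4) into acc in place."""
    for i in range(ROWS):
        for j in range(COLS):
            acc[i][j] += x[i] * y[j]


def matmul_by_outer_products(a, b):
    """Multiply a (4 x k) by b (k x 4) as a sum of k rank-1 updates.

    The accumulator is read and written on every step, which mirrors
    the read-modify-write pattern of an accumulator register.
    """
    k = len(b)
    acc = [[0.0] * COLS for _ in range(ROWS)]
    for p in range(k):
        column_of_a = [a[i][p] for i in range(ROWS)]
        row_of_b = b[p]
        rank1_update(acc, column_of_a, row_of_b)
    return acc


# Small demonstration with k = 2.
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
b = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 2.0]]
for row in matmul_by_outer_products(a, b):
    print(row)
```

    Keeping the running sum in one small block while streaming columns of one input and rows of the other is the reason the accumulator data can stay local to the compute units.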

    The latest version of OpenBLAS can be downloaded at: https://github.com/xianyi/OpenBLAS. OpenBLAS is used by the Python NumPy library, PyTorch, and other frameworks, which makes it easy to get the performance benefit of the Power10 MMA accelerator in AI workloads.

    For more detail on the implementation of the Power10 processor's high-throughput math engine, see the paper "A matrix math facility for Power ISA processors" at: https://arxiv.org/pdf/2104.03142. The fundamental MMA architecture principles, along with detailed instruction set usage, register file management concepts, and various supporting facilities, are explained in the IBM Redpaper "Matrix-Multiply Assist (MMA) Best Practices Guide" at: https://www.redbooks.ibm.com/abstracts/redp5612.html


    To verify that the conda package can use MMA, run the command below. It lists the package's dependencies, which should include libopenblas at version 0.3.17 or later:
    conda search -c rocketce pytorch-base=1.8.1 --info
    - libopenblas >=0.3.17,<1.0a0
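
    If you want to check an installed libopenblas version string against that lower bound in a script, a minimal sketch follows. The helpers parse_version and meets_minimum are hypothetical names written for this example, not part of conda, and the parsing is deliberately simplified: it keeps only the leading digits of each dot-separated component.

```python
import re


def parse_version(text):
    """Turn a string such as '0.3.17' into the tuple (0, 3, 17).

    Simplification: only the leading digits of each component are kept,
    so a pre-release suffix like the 'a0' in '1.0a0' is ignored.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum="0.3.17"):
    """Return True when version is at or above the minimum from the post."""
    return parse_version(version) >= parse_version(minimum)


# Comparing as tuples avoids the string-comparison trap where
# "0.3.9" sorts after "0.3.17".
for candidate in ("0.3.9", "0.3.17", "0.3.21"):
    print(candidate, meets_minimum(candidate))
```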

    ------------------------------
    Rajesh Nukala
    Rocket Internal - All Brands
    ------------------------------