The matrix-multiply assist (MMA) facility was introduced in Power Instruction Set Architecture (ISA) v3.1. The related instructions implement numerical linear algebra operations on small matrices and are meant to accelerate computation-intensive kernels such as matrix multiplication, convolution, and the discrete Fourier transform. To execute MMA operations efficiently, the Power10 processor core implements a dense math engine (DME) microarchitecture that acts as an on-core accelerator for cognitive computing, machine learning, and artificial intelligence inferencing workloads. The DME encapsulates compute-efficient pipelines, a physical register file, and the associated data flow that keeps accumulator results local to the compute units.
Each MMA pipeline performs outer-product matrix operations, reading from and writing back a 512-bit accumulator register. Power10 implements the MMA accumulator architecture without adding additional architected state: each architected 512-bit accumulator register is backed by four 128-bit Vector Scalar eXtension (VSX) registers. Code leveraging the MMA instructions is already included in the OpenBLAS and Eigen libraries and can be built with recent versions of the GNU Compiler Collection (GCC).
The latest version of OpenBLAS can be downloaded at: https://github.com/xianyi/OpenBLAS
OpenBLAS is used by the Python NumPy library, PyTorch, and other frameworks, which makes it easy to leverage the performance benefit of the Power10 MMA accelerator for AI workloads.
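Because NumPy routes matrix multiplication to whatever BLAS it was built against, a quick way to confirm that an MMA-enabled OpenBLAS is in use is to inspect NumPy's build configuration. A minimal sketch (the exact library names printed depend on your installation):

```python
import numpy as np

# Print the BLAS/LAPACK libraries NumPy was linked against; on a Power10
# system with an MMA-enabled build, this should name OpenBLAS >= 0.3.17.
np.show_config()

# Any matrix multiply now goes through that backend automatically.
A = np.arange(16, dtype=np.float32).reshape(4, 4)
B = np.eye(4, dtype=np.float32)
assert np.allclose(A @ B, A)
```

No application changes are needed: frameworks that link against an MMA-enabled OpenBLAS pick up the acceleration transparently.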
More detailed information on the implementation of the Power10 processor's high-throughput math engine is given in the paper A matrix math facility for Power ISA processors, available at: https://arxiv.org/pdf/2104.03142
The fundamental MMA architecture principles, with detailed instruction set usage, register file management concepts, and various supporting facilities, are explained in the IBM Redpaper publication Matrix-Multiply Assist (MMA) Best Practices Guide, available at: https://www.redbooks.ibm.com/abstracts/redp5612.html
We can verify the MMA capability of the conda package with the following command, which lists the package dependencies:

conda search -c rocketce pytorch-base=1.8.1 --info

The dependency list should include a libopenblas entry of version 0.3.17 or later, such as:

- libopenblas >=0.3.17,<1.0a0