Getting started with the matrix-multiply assist (MMA) capabilities of Power10
The matrix-multiply assist (MMA) facility was introduced in version 3.1 of the Power Instruction Set Architecture (ISA). Its instructions implement numerical linear algebra operations on small matrices and are meant to accelerate computation-intensive kernels such as matrix multiplication, convolution, and the discrete Fourier transform. To execute MMA operations efficiently, the Power10 processor core implements a dense math engine (DME) microarchitecture, which effectively provides an accelerator for cognitive computing, machine learning, and artificial intelligence inferencing workloads. The DME encapsulates compute-efficient pipelines, a physical register file, and the associated data flow that keeps the resulting accumulator data local to the compute units. Each MMA pipeline performs outer-product matrix operations, reading from and writing back to a 512-bit accumulator register. Power10 implements the MMA accumulator architecture without adding architected state. Each architected 5