Tensor Core Equivalent Likely to Get Embedded in AMD RDNA3

AMD software engineers have begun submitting new patches for the forthcoming GFX11 architecture, also known as RDNA3. According to a recent patch, AMD is adding its own instructions for operating on matrices.



AMD's RDNA3 graphics IP is on its way, and details of the new architecture are starting to emerge. AMD developers have added a new instruction, Wave Matrix Multiply-Accumulate (WMMA), to the LLVM compiler's back end. The instruction will be available on the GFX11 (RDNA3) GPU architecture. Tensor Cores have been present on NVIDIA GPUs since the Volta design, and AMD's WMMA can be viewed as a response to them. NVIDIA uses Tensor Cores to accelerate, among other things, its AI-based super-resolution technology, DLSS.


Intel also has its own XMX/DPAS instructions that operate on matrices and can be used to accelerate its upcoming XeSS technology. With WMMA, AMD will be able to handle 16x16x16 matrix multiply-accumulate operations on FP16 and BF16 data. As with NVIDIA's Tensor Cores, these instructions give the GPU a dedicated way to perform matrix multiply-accumulate operations.
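
To make the operation concrete, here is a minimal CPU sketch in plain Python of what a single 16x16x16 multiply-accumulate computes: D = A x B + C on 16x16 tiles. It is not AMD's instruction or any vendor API; the names `tile_mma` and `to_fp16` are hypothetical, and rounding the inputs to half precision while accumulating in a wider type is an assumption made purely for illustration.

```python
import struct

TILE = 16  # tile edge for the 16x16x16 shape mentioned above


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def tile_mma(a, b, c):
    """Return D = A x B + C for 16x16 tiles given as lists of rows.

    Inputs A and B are rounded to FP16 before multiplying (an assumption
    for illustration); the running sum is kept in a Python float.
    """
    d = [[0.0] * TILE for _ in range(TILE)]
    for i in range(TILE):
        for j in range(TILE):
            acc = c[i][j]
            for k in range(TILE):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d


identity = [[1.0 if i == j else 0.0 for j in range(TILE)] for i in range(TILE)]
ones = [[1.0] * TILE for _ in range(TILE)]
zeros = [[0.0] * TILE for _ in range(TILE)]

d_identity = tile_mma(identity, ones, ones)  # I x ones + ones
d_ones = tile_mma(ones, ones, zeros)         # each entry sums 16 products
```

A hardware instruction would produce the whole tile in one step across the lanes of a wavefront, whereas the sketch spells out all 16 x 16 x 16 scalar multiply-adds in nested loops.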

“rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.

rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.”
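
The block-wise decomposition described in the quote can be sketched on the CPU as follows. This is plain Python, not the rocWMMA C++ API: the helper names (`load_tile`, `mma`, `blocked_gemm`) are hypothetical, matrix sizes are assumed to be multiples of 16, and the tile loops run sequentially here, whereas on a GPU the output tiles would be distributed across wavefronts.

```python
TILE = 16  # fragment edge, matching the 16x16x16 shape


def load_tile(m, row, col):
    """Copy the 16x16 block of m whose top-left corner is (row, col)."""
    return [r[col:col + TILE] for r in m[row:row + TILE]]


def mma(a, b, acc):
    """Accumulate a x b into acc in place (all three are 16x16 tiles)."""
    for i in range(TILE):
        for j in range(TILE):
            s = acc[i][j]
            for k in range(TILE):
                s += a[i][k] * b[k][j]
            acc[i][j] = s


def blocked_gemm(a, b):
    """Compute a x b by walking 16x16 output tiles and 16-wide K steps."""
    m, kk, n = len(a), len(b), len(b[0])
    assert m % TILE == 0 and kk % TILE == 0 and n % TILE == 0
    c = [[0.0] * n for _ in range(m)]
    for bi in range(0, m, TILE):          # one output tile per (bi, bj)
        for bj in range(0, n, TILE):
            acc = [[0.0] * TILE for _ in range(TILE)]
            for bk in range(0, kk, TILE):  # step along the shared dimension
                mma(load_tile(a, bi, bk), load_tile(b, bk, bj), acc)
            for i in range(TILE):          # store the finished tile
                c[bi + i][bj:bj + TILE] = acc[i]
    return c


def naive_gemm(a, b):
    """Reference triple loop used to check the blocked version."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


a = [[float((i + j) % 5) for j in range(48)] for i in range(32)]
b = [[float((i * j) % 3) for j in range(32)] for i in range(48)]
c_blocked = blocked_gemm(a, b)
c_naive = naive_gemm(a, b)
```

Because each output tile depends only on one row of tiles from the first matrix and one column of tiles from the second, the tiles can be computed independently, which is what makes the work easy to spread across wavefronts.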
