RNGD: A 5nm Tensor-Contraction Processor for Power-Efficient Inference on Large Language Models
  • Lee, Sang Min
  • Kim, Hanjoon
  • Yeon, Jeseung
  • Kim, Minho
  • Park, Changjae
  • ... Han, Ki Jin
  • and 25 others
Citations

SCOPUS

1

Abstract

There is a need for an AI accelerator optimized for large language models (LLMs) that combines high memory bandwidth and dense compute power while minimizing power consumption. Traditional architectures [1]-[4] typically map tensor contractions, the core computational task in machine learning models, onto matrix multiplication units. However, this approach often falls short of fully leveraging the parallelism and data locality inherent in tensor contractions. In this work, tensor contraction is used as a primitive instead of matrix multiplication, enabling massive parallelism and time-axis pipelining similar to vector processors. Large coarse-grained PEs can be split into smaller compute units called slices, as illustrated in Fig. 16.2.1. Depending on the setup of the fetch network connecting the slices, these slices can function either as one large processing element or as small, independent compute units. Input data are continuously fetched in a pipelined manner through the fetch network, allowing high throughput and efficient data reuse. Since the operation units compute deterministically as configured, accurate cost models for performance and energy can be developed for optimization. The chip specifications are also shown in Fig. 16.2.1. © 2025 IEEE.
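
To make the contrast in the abstract concrete, the following is a minimal sketch of what a tensor contraction is and how it can be flattened onto a 2-D matrix multiplication. It does not describe RNGD's hardware or software; the function names (`contract_direct`, `contract_via_matmul`, `matmul`), the tensor shapes, and the choice of flattening the leading axes are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's method):
# C[b][i][j] = sum over k of A[b][i][k] * B[k][j]

def contract_direct(A, B):
    """Contract the shared k axis directly, keeping the b and i axes distinct."""
    nb, ni, nk = len(A), len(A[0]), len(A[0][0])
    nj = len(B[0])
    return [[[sum(A[b][i][k] * B[k][j] for k in range(nk))
              for j in range(nj)]
             for i in range(ni)]
            for b in range(nb)]

def matmul(X, Y):
    """Plain 2-D matrix multiplication on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def contract_via_matmul(A, B):
    """Flatten (b, i) into one row axis, multiply, then restore the shape.

    This stands in for the conventional mapping onto a matrix unit; the
    flattening step is an assumption made for illustration.
    """
    nb, ni = len(A), len(A[0])
    flat = [row for batch in A for row in batch]   # shape (b*i, k)
    out = matmul(flat, B)                          # shape (b*i, j)
    return [out[b * ni:(b + 1) * ni] for b in range(nb)]

A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape (2, 2, 2)
B = [[1, 0, 2], [0, 1, 3]]                # shape (2, 3)

# Both routes give the same numbers; they differ in how the index
# structure (and hence parallelism and reuse) is exposed.
assert contract_direct(A, B) == contract_via_matmul(A, B)
```

Both functions return the same values. The direct form keeps each tensor axis visible as a separate loop, which is the kind of structure the abstract says a matrix-multiplication mapping tends to hide.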

Keywords

Matrix Algebra; Parallel Architectures; Pipeline Processing Systems; Problem Oriented Languages; Computational Task; Data Locality; High Memory Bandwidth; Language Model; Machine Learning Models; Matrix Multiplication; Power; Power Efficient; Tensor Contraction; Traditional Architecture; Tensors
Title
RNGD: A 5nm Tensor-Contraction Processor for Power-Efficient Inference on Large Language Models
Authors
Lee, Sang Min; Kim, Hanjoon; Yeon, Jeseung; Kim, Minho; Park, Changjae; Bae, Byeongwook; Cha, Yojung; Choe, Wooyoung; Choi, Jonguk; Choi, Younggeun; Han, Ki Jin; Hwang, Seokha; Jang, Kiseok; Jeon, Jaewoo; Jeong, Hyunmin; Jung, Yeonsu; Kim, Hyewon; Kim, Sewon; Kim, Suhyung; Kim, Won; Kim, Yongseung; Kim, Youngsik; Kwon, Hyukdong; Lee, Jeong Ki; Lee, Juyun; Lee, Kyungjae; Lee, Seokho; Noh, Minwoo; Park, Junyoung; Seo, Jimin; Paik, June
DOI
10.1109/ISSCC49661.2025.10904727
Publication date
2025-02
Type
Conference paper
Journal name
2025 IEEE International Solid-State Circuits Conference (ISSCC)
Pages
284-286