RNGD: A 5nm Tensor-Contraction Processor for Power-Efficient Inference on Large Language Models
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lee, Sang Min | - |
| dc.contributor.author | Kim, Hanjoon | - |
| dc.contributor.author | Yeon, Jeseung | - |
| dc.contributor.author | Kim, Minho | - |
| dc.contributor.author | Park, Changjae | - |
| dc.contributor.author | Bae, Byeongwook | - |
| dc.contributor.author | Cha, Yojung | - |
| dc.contributor.author | Choe, Wooyoung | - |
| dc.contributor.author | Choi, Jonguk | - |
| dc.contributor.author | Choi, Younggeun | - |
| dc.contributor.author | Han, Ki Jin | - |
| dc.contributor.author | Hwang, Seokha | - |
| dc.contributor.author | Jang, Kiseok | - |
| dc.contributor.author | Jeon, Jaewoo | - |
| dc.contributor.author | Jeong, Hyunmin | - |
| dc.contributor.author | Jung, Yeonsu | - |
| dc.contributor.author | Kim, Hyewon | - |
| dc.contributor.author | Kim, Sewon | - |
| dc.contributor.author | Kim, Suhyung | - |
| dc.contributor.author | Kim, Won | - |
| dc.contributor.author | Kim, Yongseung | - |
| dc.contributor.author | Kim, Youngsik | - |
| dc.contributor.author | Kwon, Hyukdong | - |
| dc.contributor.author | Lee, Jeong Ki | - |
| dc.contributor.author | Lee, Juyun | - |
| dc.contributor.author | Lee, Kyungjae | - |
| dc.contributor.author | Lee, Seokho | - |
| dc.contributor.author | Noh, Minwoo | - |
| dc.contributor.author | Park, Junyoung | - |
| dc.contributor.author | Seo, Jimin | - |
| dc.contributor.author | Paik, June | - |
| dc.date.accessioned | 2025-04-08T04:30:12Z | - |
| dc.date.available | 2025-04-08T04:30:12Z | - |
| dc.date.issued | 2025-02 | - |
| dc.identifier.issn | 0193-6530 | - |
| dc.identifier.issn | 2376-8606 | - |
| dc.identifier.uri | https://scholarworks.dongguk.edu/handle/sw.dongguk/58072 | - |
| dc.description.abstract | There is a need for an AI accelerator optimized for large language models (LLMs) that combines high memory bandwidth and dense compute power while minimizing power consumption. Traditional architectures [1]-[4] typically map tensor contractions, the core computational task in machine learning models, onto matrix multiplication units. However, this approach often falls short of fully exploiting the parallelism and data locality inherent in tensor contractions. In this work, tensor contraction is used as a primitive instead of matrix multiplication, enabling massive parallelism and time-axis pipelining similar to vector processors. Large coarse-grained PEs can be split into smaller compute units called slices, as illustrated in Fig. 16.2.1. Depending on the configuration of the fetch network connecting the slices, they can function either as one large processing element or as small, independent compute units. Input data are continuously fetched in a pipelined manner through the fetch network, allowing high throughput and efficient data reuse. Since the operation units compute deterministically as configured, accurate cost models for performance and energy can be developed for optimization. The chip specifications are also shown in Fig. 16.2.1. © 2025 IEEE. (A minimal sketch illustrating the contraction-as-primitive idea follows the metadata record below.) | - |
| dc.format.extent | 3 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | IEEE | - |
| dc.title | RNGD: A 5nm Tensor-Contraction Processor for Power-Efficient Inference on Large Language Models | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/ISSCC49661.2025.10904727 | - |
| dc.identifier.scopusid | 2-s2.0-105000830515 | - |
| dc.identifier.bibliographicCitation | 2025 IEEE International Solid-State Circuits Conference (ISSCC), pp. 284-286 | - |
| dc.citation.title | 2025 IEEE International Solid-State Circuits Conference (ISSCC) | - |
| dc.citation.startPage | 284 | - |
| dc.citation.endPage | 286 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordAuthor | Matrix Algebra | - |
| dc.subject.keywordAuthor | Parallel Architectures | - |
| dc.subject.keywordAuthor | Pipeline Processing Systems | - |
| dc.subject.keywordAuthor | Problem Oriented Languages | - |
| dc.subject.keywordAuthor | Computational Task | - |
| dc.subject.keywordAuthor | Data Locality | - |
| dc.subject.keywordAuthor | High Memory Bandwidth | - |
| dc.subject.keywordAuthor | Language Model | - |
| dc.subject.keywordAuthor | Machine Learning Models | - |
| dc.subject.keywordAuthor | Matrix Multiplication | - |
| dc.subject.keywordAuthor | Power | - |
| dc.subject.keywordAuthor | Power Efficient | - |
| dc.subject.keywordAuthor | Tensor Contraction | - |
| dc.subject.keywordAuthor | Traditional Architecture | - |
| dc.subject.keywordAuthor | Tensors | - |
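
The abstract contrasts mapping tensor contractions onto matrix multiplication units with treating the contraction itself as the primitive. The sketch below is a minimal, hypothetical illustration of that distinction at the NumPy level; the shapes, variable names, and einsum spelling are editorial assumptions for illustration and do not reflect RNGD's actual ISA or programming model.

```python
import numpy as np

# Hypothetical illustration (not RNGD's ISA): a batched attention-score
# computation expressed two ways.

B, H, S, D = 2, 8, 128, 64          # batch, heads, sequence, head dim
Q = np.random.randn(B, H, S, D).astype(np.float32)
K = np.random.randn(B, H, S, D).astype(np.float32)

# Matrix-multiplication view: one matmul per (batch, head) pair,
# driven by an outer loop.
scores_mm = np.empty((B, H, S, S), dtype=np.float32)
for b in range(B):
    for h in range(H):
        scores_mm[b, h] = Q[b, h] @ K[b, h].T

# Tensor-contraction view: the entire computation is a single
# contraction over the shared head dimension d; the free axes
# b, h, s, t remain explicit in the expression itself.
scores_tc = np.einsum("bhsd,bhtd->bhst", Q, K)

assert np.allclose(scores_mm, scores_tc, atol=1e-4)
```

In the contraction form, the free axes are visible to the compiler or hardware rather than hidden inside loop control, which is the kind of parallelism and data locality the abstract says a slice-based fetch network can exploit directly.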
