A mixed-precision memristor and SRAM compute-in-memory AI processor
Publication type:
Article
Authors:
Khwa, Win-San; Wen, Tai-Hao; Hsu, Hung-Hsi; Huang, Wei-Hsing; Chang, Yu-Chen; Chiu, Ting-Chien; Ke, Zhao-En; Chin, Yu-Hsiang; Wen, Hua-Jin; Hsu, Wei-Ting; Lo, Chung-Chuan; Liu, Ren-Shuo; Hsieh, Chih-Cheng; Tang, Kea-Tiong; Ho, Mon-Shu; Lele, Ashwin Sanjay; Teng, Shih-Hsin; Chou, Chung-Cheng; Chih, Yu-Der; Chang, Tsung-Yung Jonathan; Chang, Meng-Fan
Affiliations:
Taiwan Semiconductor Manufacturing Company
Journal:
Nature
ISSN/ISBN:
0028-0836
DOI:
10.1038/s41586-025-08639-2
Publication date:
2025-03-20
Keywords:
CMOS
macro
Abstract:
Artificial intelligence (AI) edge devices(1-12) demand high-precision, energy-efficient computation, large on-chip model storage, rapid wakeup-to-response time and cost-effective, foundry-ready solutions. Floating-point (FP) computation provides precision exceeding that of integer (INT) formats at the cost of higher power and storage overhead. Multi-level-cell (MLC) memristor compute-in-memory (CIM)(13-15) provides compact non-volatile storage and energy-efficient computation but is prone to accuracy loss owing to process variation. Digital static random-access memory (SRAM)-CIM(16-22) enables lossless computation; however, storage capacity is low as a result of the large bit-cell area, and model loading is required during inference. Thus, conventional approaches using homogeneous CIM architectures and computation formats impose a trade-off between efficiency, storage, wakeup latency and inference accuracy. Here we present a mixed-precision heterogeneous CIM AI edge processor that supports layer-granular/kernel-granular partitioning of network layers among on-chip CIM architectures (that is, memristor-CIM, SRAM-CIM and tiny-digital units) and computation number formats (INT and FP) based on sensitivity to error. This layer-granular/kernel-granular flexibility allows simultaneous optimization within the two-dimensional design space at the hardware level. The proposed hardware achieved high energy efficiency (40.91 TFLOPS W-1 for ResNet-20 with CIFAR-100 and 28.63 TFLOPS W-1 for MobileNet-v2 with ImageNet), low accuracy degradation (<0.45% for ResNet-20 with CIFAR-100 and for MobileNet-v2 with ImageNet) and rapid wakeup-to-response time (373.52 μs).
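The abstract describes partitioning network layers among CIM back-ends and number formats according to each layer's sensitivity to computation error. The sketch below is a minimal, hypothetical illustration of that idea only; the function name, thresholds and formats are assumptions for exposition and are not taken from the paper's actual mapping flow.

```python
# Hypothetical sketch (not the authors' implementation): layer-granular
# partitioning of a network among CIM back-ends and number formats,
# driven by each layer's measured sensitivity to computation error.

from dataclasses import dataclass


@dataclass
class LayerPlan:
    name: str
    backend: str   # "memristor-CIM", "SRAM-CIM" or "tiny-digital"
    fmt: str       # "INT8" or "FP16" (illustrative formats)


def partition_layers(sensitivities, fp_threshold=0.05, int_threshold=0.02):
    """Assign each layer a back-end and number format.

    `sensitivities` maps layer name -> accuracy drop (fraction) observed
    when that layer's computation is perturbed; thresholds are illustrative.
    """
    plan = []
    for layer, s in sensitivities.items():
        if s >= fp_threshold:
            # Highly error-sensitive: lossless digital SRAM-CIM with FP.
            plan.append(LayerPlan(layer, "SRAM-CIM", "FP16"))
        elif s >= int_threshold:
            # Moderately sensitive: SRAM-CIM with INT to save energy.
            plan.append(LayerPlan(layer, "SRAM-CIM", "INT8"))
        else:
            # Error-tolerant: non-volatile memristor-CIM with INT for
            # compact storage and fast wakeup (no model reloading).
            plan.append(LayerPlan(layer, "memristor-CIM", "INT8"))
    return plan


if __name__ == "__main__":
    example = {"conv1": 0.08, "conv2": 0.03, "conv3": 0.005}
    for p in partition_layers(example):
        print(p)
```

The same per-layer scoring could in principle be applied at kernel granularity; this sketch stays at the layer level for brevity.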