Next-gen high-speed memory for AI and HPC


At the International Electron Devices Meeting (IEDM) 2022, there was a session on next-generation high-speed memory for AI and HPC.

John Wuu, AMD, presented an overview of memory solutions for HPC and AI. Demand for HPC and AI performance continues to accelerate, and processors will need to be complemented by advances in memory. Memories need lower cost and process complexity, along with higher density and capacity. They should offer faster access times and sit physically closer to processors.

They should deliver higher bandwidth through on-die integration or 2.5D/3D packaging, lower power (both stand-by and read/write), and a smaller footprint. There are opportunities in new memory technologies and packaging innovations in the future.

Hidehiro Fujiwara, TSMC, presented high-speed SRAMs for future HPC and AI, covering multiple design challenges and solutions. Area scaling can be achieved with DTCO (design-technology co-optimization). VMIN can be lowered with write assist and dual-rail design. The RC issue can be addressed with routing optimization for global signals and architecture changes such as a folded bitline (BL).

Multiport SRAMs are suitable for image processing and GPUs. There is a PPA (power, performance, area) trade-off between an 8T2P cell and a 6T cell with double pumping, and a custom bitcell enables more multiport SRAM configurations. There is also data-movement and MAC (multiply-accumulate) optimization in compute-in-memory (CIM). A digital approach leverages technology scaling directly. This new field opens many opportunities for research.
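
The "digital approach" to CIM can be illustrated with a toy bit-serial MAC in Python. This is a sketch of the general technique only, not TSMC's macro; the function name and bit widths are illustrative. Weight bits stay resident in the array, input bit-planes are streamed one significance at a time, and the in-array AND plus adder-tree step is modeled as a popcount.

```python
import numpy as np

def digital_cim_mac(inputs, weights, in_bits=8, w_bits=8):
    """Toy bit-serial multiply-accumulate, as a digital CIM macro
    might compute a dot product: weight bits stay in the array,
    input bits are streamed one significance at a time."""
    acc = 0
    for ib in range(in_bits):
        in_slice = (inputs >> ib) & 1          # one input bit-plane
        for wb in range(w_bits):
            w_slice = (weights >> wb) & 1      # one weight bit-plane
            popcount = int(np.sum(in_slice & w_slice))  # in-array AND + adder tree
            acc += popcount << (ib + wb)       # shift-and-add by significance
    return acc

x = np.array([3, 5, 7], dtype=np.int64)
w = np.array([2, 4, 6], dtype=np.int64)
assert digital_cim_mac(x, w) == int(np.dot(x, w))  # 3*2 + 5*4 + 7*6 = 68
```

Because only bitwise ANDs, popcounts, and shifts are needed, the arithmetic maps onto standard logic and scales with the process node, which is the appeal of the digital approach.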

Kyomin Sohn, VP of Technology, DRAM Design Team, Samsung Electronics, presented next-generation DRAM solutions for HPC and AI. Memory for HPC and AI systems requires more bandwidth and capacity, and he proposed a potential memory hierarchy. New memory solutions will add capacity and expandability; these include LLC DRAM and CXL-DRAM.

Samsung is pursuing processing-in/near-DRAM solutions to overcome the memory wall. These include HBM-PIM, LPDDR-PIM, AXDIMM, and CXL-PNM. Let's re-architect the memory hierarchy!
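
The motivation for processing in or near DRAM can be sketched with a toy traffic model (illustrative numbers only, not Samsung's figures). For a reduction such as summing a large vector, host-side execution moves every element across the memory bus, while a PIM-style design reduces inside each bank and moves only the per-bank partial results.

```python
def bytes_moved(n_elems, elem_bytes=4, banks=16, pim=False):
    """Toy traffic model for a reduction (e.g., summing a large vector).
    Host-side: every element crosses the memory bus.
    PIM-style: each bank reduces locally; only per-bank partials cross."""
    if pim:
        return banks * elem_bytes      # one partial sum per bank
    return n_elems * elem_bytes        # whole vector shipped to the host

n = 1 << 20  # 1M elements
print(bytes_moved(n) // bytes_moved(n, pim=True))  # traffic reduction factor
```

The point of the sketch is that for bandwidth-bound kernels the savings grow with the data size, which is why PIM targets the memory wall rather than raw compute.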

Naveen Verma, Princeton University, presented on future prospects of in-memory and near-memory computing. State-of-the-art AI models require solving compute and data-movement bottlenecks, and in-memory computing (IMC) is distinctly suited to this. IMC introduces fundamental energy/throughput vs. SNR trade-offs, which drive macro technologies and approaches.
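
The energy/throughput vs. SNR trade-off can be illustrated with a small Monte-Carlo sketch. This is a generic model, not Verma's macro, and all parameters are illustrative: summing more rows in a single analog read raises throughput per access, but a fixed-resolution ADC must then cover a wider output range, so quantization error grows and SNR falls.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantized_dot_snr(rows, adc_bits=6, trials=2000):
    """Monte-Carlo sketch of the IMC trade-off: more rows summed per
    analog read means more throughput, but a fixed-resolution ADC
    spreads its levels over a wider range, degrading SNR."""
    x = rng.integers(0, 2, size=(trials, rows))      # binary inputs
    w = rng.integers(0, 2, size=(trials, rows))      # binary weights
    exact = (x * w).sum(axis=1).astype(float)        # ideal column sum
    step = rows / (2 ** adc_bits)                    # ADC LSB over full range
    quant = np.round(exact / step) * step            # ADC readout
    err = quant - exact
    return 10 * np.log10(exact.var() / err.var())    # SNR in dB

for rows in (128, 512, 2048):
    print(rows, round(quantized_dot_snr(rows), 1))
```

In this model the signal variance grows linearly with the number of rows while the quantization noise variance grows quadratically, so SNR drops by roughly 3 dB per doubling of rows, a simple stand-in for the fundamental trade-off described in the talk.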

IMC faces architectural challenges for programmability and efficient execution, and parallelism overheads must be addressed. Early works, especially at the architectural level, are illuminating the challenges and possible solutions; more work is needed.

Prof. Shimeng Yu, Georgia Institute of Technology, presented on high-speed emerging memories for AI hardware accelerators. For an AI inference engine, the global buffer requires MB-level high-speed memory. SRAM is a natural choice that enjoys CMOS scaling to leading-edge nodes (3nm and beyond). Among emerging memories, the requirements of high speed, long endurance, and low power narrow the candidates to the 2T gain cell (with oxide channel), STT-MRAM (cache version), and BEOL FeFET (tailored as eDRAM).

To outperform an SRAM global buffer in density and performance, emerging memories need to employ a monolithic 3D integration scheme, where the memory tier on top uses a relaxed design rule and the logic tier on the bottom keeps the leading-edge design rule. Reducing the write voltage of the BEOL FeFET to 1V or below is desired to make it fully logic compatible. Improving the mobility of the oxide channel in the 2T gain cell is desired to improve its write/read speed at logic-compatible voltages.

A global buffer for training, which holds GB-level intermediate data, also needs architectural and technological solutions. A dual-mode FeRAM is proposed to leverage the data lifetime across the training pipeline stages.
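
The talk summary does not detail the dual-mode FeRAM, but the underlying idea of matching the storage mode to a datum's expected lifetime can be sketched as a toy policy. All names and thresholds here are hypothetical, purely to illustrate the concept.

```python
def choose_mode(expected_lifetime_us, volatile_threshold_us=1_000.0):
    """Toy lifetime-aware policy for a dual-mode memory (hypothetical
    API): short-lived training data (e.g., activations consumed by the
    next pipeline stage) can use a fast, low-energy volatile mode,
    while long-lived data (e.g., weights) uses the non-volatile mode."""
    if expected_lifetime_us < volatile_threshold_us:
        return "volatile"       # cheap write, retention only needs to span the stage
    return "nonvolatile"        # costlier write, data persists across steps

assert choose_mode(50.0) == "volatile"       # activation, consumed almost immediately
assert choose_mode(1e9) == "nonvolatile"     # weights, must persist
```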

Geert Van der Plas and Eric Beyne, 3D system integration program, imec, presented on 3D system integration for memory-compute applications. They outlined the vision and grand challenges for 3D system integration, including high performance, heat spreading, and heat removal, along with improved power-delivery network architectures. Different 3D technology flavors need to be combined to realize the most effective 3D system-level integration.