Emerging Directions in Optical Computing and Information Processing

June 26 and 27, 2025 – Virtual Conference


Energy Scaling of Optical Transformers

Peter McMahon

Abstract

Transformers are the dominant neural-network architecture for language modeling. I will present results from a simulation and experimental study we conducted to assess the prospects for using optical matrix-vector multipliers to reduce the energy consumption of Transformer inference. We found that Transformers can operate with no loss of accuracy, relative to a digital-electronic implementation using 8-bit arithmetic, with a number of photons per multiply-accumulate that decreases as the vector size (the embedding dimension, in language models) grows. I will explain how we conclude that large (>100x) improvements in overall system energy efficiency for Transformers may be possible using scaled and carefully engineered optical hardware. Reference: M. Anderson et al., "Optical Transformers," TMLR (2024); preprint: arXiv:2302.10360.
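As a back-of-the-envelope illustration of why photons per multiply-accumulate can shrink as the vector dimension d grows (a simplified sketch, not the paper's actual simulation or experimental setup): in an optical matrix-vector multiplier the elementwise products are summed in the optical domain and detected once per output, so shot noise is amortized over the whole dot product. In the toy NumPy model below (all function names and parameter values are illustrative assumptions), the photon budget per MAC is reduced as 1/d while the detected photon count per output, and hence the relative error, stays roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)

def optical_dot(x, w, photons_per_mac, rng):
    """Toy shot-noise model of an optical dot product (illustrative only).

    The d elementwise products are summed in the optical domain and a
    single photodetector measures the total, so Poisson shot noise is
    added once per output rather than once per multiply. Assumes
    nonnegative signals for simplicity.
    """
    d = len(x)
    y = float(x @ w)                          # ideal dot product (>= 0)
    mean_photons = photons_per_mac * d * y    # photons reaching the detector
    n = rng.poisson(mean_photons)             # shot-noise-limited detection
    return n / (photons_per_mac * d)          # rescale back to signal units

for d in [64, 256, 1024, 4096]:
    x = rng.uniform(0.0, 1.0, size=d)
    w = rng.uniform(0.0, 1.0, size=d) / d     # keep y = x @ w of order 1
    photons_per_mac = 1e3 / d                 # photon budget shrinks as 1/d
    y_true = x @ w
    errs = [optical_dot(x, w, photons_per_mac, rng) - y_true
            for _ in range(500)]
    print(f"d={d:5d}  photons/MAC={photons_per_mac:7.3f}  "
          f"relative error={np.std(errs) / y_true:.3f}")
```

In this toy model the relative error stays roughly flat across dimensions even as photons per MAC fall by nearly two orders of magnitude, which is the amortization effect underlying the scaling result described in the abstract.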

Thursday, 10:30 – live, 30 minutes

Agenda