Researcher(s)
- Daryl Tapel, Computer Engineering, University of Delaware
Faculty Mentor(s)
- Chengmo Yang, Electrical and Computer Engineering, University of Delaware
- Jhon Ordoñez, Electrical and Computer Engineering, University of Delaware
Abstract
Modern AI models demand large amounts of computational power when run on GPUs, making GPUs poorly suited to low-power applications. To address this problem, this research runs these models on an FPGA, which is more power-efficient and allows customizable, hardware-level optimization. Specifically, the project focuses on transformers, deep learning models widely used in natural language processing, computer vision, and other domains. The core operations of a transformer include matrix multiplication, the GeLU activation function, and softmax, all of which are resource-intensive but essential for model accuracy. To implement and evaluate these operations on the FPGA, High-Level Synthesis (HLS) was used: modules written in C/C++ are converted into hardware components, and built-in directives guide the tool toward better speed and hardware resource usage. To further optimize the design for hardware, the costly floating-point operations were replaced with fixed-point arithmetic, which significantly lowered logic and memory overhead while maintaining acceptable numerical precision for AI inference. Through this implementation, transformer-based computations were shown to achieve significantly improved power efficiency and real-time performance on FPGA hardware, offering a strong alternative to traditional GPU-based approaches for AI deployment in embedded and energy-constrained systems.
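As a minimal sketch of the approach, the kernel below shows what a fixed-point matrix multiply with an HLS pipelining directive might look like, assuming the AMD/Xilinx Vitis HLS toolchain and its ap_fixed library; the matrix size, word width, and function name are illustrative and not taken from the project.

    #include <ap_fixed.h>

    // Hypothetical 16-bit fixed-point type: 6 integer bits, 10 fractional bits.
    // The project's actual word-length/precision split may differ.
    typedef ap_fixed<16, 6> fix_t;

    const int N = 64;  // illustrative matrix dimension

    // C = A * B with fixed-point multiply-accumulate. The PIPELINE directive
    // asks the tool to start a new MAC every clock cycle (II=1).
    void matmul(const fix_t A[N][N], const fix_t B[N][N], fix_t C[N][N]) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                fix_t acc = 0;
                for (int k = 0; k < N; k++) {
    #pragma HLS PIPELINE II=1
                    acc += A[i][k] * B[k][j];  // maps to DSP/integer logic
                }
                C[i][j] = acc;
            }
        }
    }

Because ap_fixed addition and multiplication synthesize to simple integer logic, the same loop in floating point would cost substantially more LUTs, DSPs, and pipeline latency, which is the overhead reduction the abstract describes.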
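For the GeLU activation, transformer implementations commonly use the tanh-based approximation shown below; whether the project used this exact variant, or a lookup-table version of it in fixed point, is not stated in the abstract.

    #include <math.h>

    // Tanh approximation of GeLU, as popularized by BERT/GPT implementations:
    // GeLU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    float gelu(float x) {
        const float k = 0.7978845608f;  // sqrt(2/pi)
        float x3 = x * x * x;
        return 0.5f * x * (1.0f + tanhf(k * (x + 0.044715f * x3)));
    }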
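Softmax is typically implemented with the standard max-subtraction trick for numerical stability. A plain C reference of the kind HLS can synthesize is sketched below; float is shown for clarity, and in a fixed-point design the exponential would usually come from a lookup table or the toolchain's math library, a detail the abstract does not specify.

    #include <math.h>

    const int LEN = 64;  // illustrative vector length

    // Numerically stable softmax: subtract the row maximum before
    // exponentiating so exp() never overflows.
    void softmax(const float in[LEN], float out[LEN]) {
        float max_val = in[0];
        for (int i = 1; i < LEN; i++)
            if (in[i] > max_val) max_val = in[i];

        float sum = 0.0f;
        for (int i = 0; i < LEN; i++) {
            out[i] = expf(in[i] - max_val);
            sum += out[i];
        }

        float inv = 1.0f / sum;  // one costly divide, then cheap multiplies
        for (int i = 0; i < LEN; i++)
            out[i] *= inv;
    }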