Field programmable gate arrays (FPGAs) are making their way into data centers (DCs). They serve to offload and accelerate service-oriented tasks such as web-page ranking, memory caching, deep learning, network encryption, video conversion, and high-frequency trading.
However, FPGAs are not yet available at scale to general cloud users who want to accelerate their own workload processing. This puts the cloud deployment of compute-intensive workloads at a disadvantage compared with on-site infrastructure installations, where the performance and energy efficiency of FPGAs are increasingly being exploited.
IBM’s cloudFPGA platform solves this issue by offering FPGAs as an IaaS resource to cloud users. Using the cloudFPGA system, users can rent FPGAs in the same way they rent VMs in the cloud, thus paving the way for large-scale utilization of FPGAs in DCs.
The cloudFPGA system is built on three main pillars:
- The use of stand-alone network-attached FPGAs,
- A hyperscale infrastructure for deploying the above FPGAs at a large scale and in a cost-effective way, and
- An accelerator service that integrates and manages the stand-alone network-attached FPGAs in the cloud.
Stand-alone network-attached FPGA
The concept of stand-alone network-attached FPGA builds on two main initiatives:
- Changing the traditional way of attaching an FPGA to a CPU by moving from PCIe attachment to network attachment.
- Promoting the FPGAs to the rank of remote peer processors by disaggregating them from the servers and provisioning them as independent and self-managed resources in the cloud.
The network attachment sets the FPGA free from the traditional CPU–FPGA attachment by connecting the FPGA directly to the DC network. As a result, the number of distributed FPGAs becomes independent of the number of servers.
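The remote-peer model described above can be made concrete with a minimal sketch: a client addresses a network-attached FPGA directly by IP, with no host CPU or PCIe bus in the path. The IP address, port, and raw-datagram payload format below are illustrative assumptions, not cloudFPGA specifics.

```python
# Minimal sketch of the "remote peer" model: a network-attached FPGA
# is reached over the DC network like any other network endpoint.
# The address, port, and payload format are hypothetical placeholders.
import socket

FPGA_ADDR = ("10.12.200.7", 2718)  # hypothetical FPGA IP and UDP port

def send_to_fpga(payload: bytes, addr=FPGA_ADDR) -> None:
    """Send a datagram directly to the FPGA, as to any network peer --
    no host server or PCIe bus mediates the exchange."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)
```

Because nothing ties the FPGA's address to a particular server, the number of such endpoints can grow independently of the server count, as noted above.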
Hyperscale infrastructure
To enable cloud users to rent, use, and release large numbers of FPGAs on the cloud, the FPGA resource must become plentiful in DCs.
The cloudFPGA infrastructure is the key enabler of such a large-scale deployment of FPGAs in DCs. It was designed from the ground up to provide the world’s highest-density and most energy-efficient rack unit of FPGAs.
The infrastructure combines a passive and an active water-cooling approach to pack 64 FPGAs into one 19"×2U chassis. Such a chassis is made up of two sleds, each carrying 32 FPGAs and one 64-port 10 GbE Ethernet switch that provides 640 Gb/s of bisection bandwidth.
In all, 16 such chassis fit into a 42U rack for a total of 1024 FPGAs and 16 TB of DRAM.
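The density figures above are consistent with a quick back-of-the-envelope check. Note that the 16 GB of DRAM per FPGA used below is inferred from the stated totals (16 TB across 1024 FPGAs), not given directly in the text.

```python
# Back-of-the-envelope check of the cloudFPGA rack capacity figures.
FPGAS_PER_SLED = 32
SLEDS_PER_CHASSIS = 2
CHASSIS_PER_RACK = 16          # 16 x 2U chassis in a 42U rack
DRAM_PER_RACK_TB = 16

fpgas_per_chassis = FPGAS_PER_SLED * SLEDS_PER_CHASSIS       # 64
fpgas_per_rack = fpgas_per_chassis * CHASSIS_PER_RACK        # 1024
dram_per_fpga_gb = DRAM_PER_RACK_TB * 1024 / fpgas_per_rack  # inferred: 16.0

print(fpgas_per_chassis, fpgas_per_rack, dram_per_fpga_gb)
```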
Accelerator service
Today, the prevailing way to incorporate an FPGA into a server is to connect it to the CPU over a high-speed, point-to-point interconnect such as the PCIe bus, and to treat that FPGA resource as a co-processor worker under the control of the server CPU. However, because of this master–slave programming paradigm, such an FPGA is typically integrated into the cloud only as an add-on to the primary host compute resource to which it belongs. As a result, PCIe-attached FPGAs are usually made available in the cloud via virtual machines or containers.
In the cloudFPGA deployment, by contrast, a stand-alone network-attached FPGA can be requested independently of any host via the cloudFPGA Management Framework, which provides a RESTful API for integration into the data-center management stack.
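A host-independent allocation through such a RESTful API might look like the sketch below. The endpoint path, JSON fields, and auth scheme are hypothetical placeholders, not the actual cloudFPGA Management Framework API; consult its documentation for the real interface.

```python
# Illustrative sketch of allocating a stand-alone FPGA through a
# RESTful management API. Endpoint path, field names, and auth scheme
# are hypothetical, NOT the actual cloudFPGA Management Framework API.
import json
import urllib.request

MGMT_URL = "http://cloudfpga-mgmt.example.com"  # hypothetical address

def build_fpga_request(image_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the management
    framework for one network-attached FPGA loaded with the given
    bitstream image. All names here are illustrative."""
    payload = json.dumps({"image_id": image_id}).encode()
    return urllib.request.Request(
        f"{MGMT_URL}/instances",  # hypothetical endpoint
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The response to such a request would carry the FPGA's network address, since the accelerator is reached over the DC network rather than through a host's PCIe bus.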