"To get this high density, we integrate an FPGA module every 7.6 mm. As a comparison, this corresponds to a rough third of the PCIe pitch. Obviously, with such a small stride, there is no space for air-cooled heat-sinks and fans. Instead, we mount a copper-based heat spreader on top of the die of the FPGA to transport the thermal energy laterally away from the chip. The heat is transferred by conduction to the sides of the heat spreader where it is thermally contacted with two active liquid-cooled heat sinks, referred to as cooling rails. Water is circulating in the cooling rails to collect and extract the heat produced by the 32 FPGA modules, the Ethernet switch and the service processor of the sled."
cloudFPGA vs. GPU chassis
The NVIDIA Tesla P100 is one of the most popular GPUs. In the presentation An FPGA platform for hyperscalers, IBM compares their S822LC "Minsky" chassis (which boasts two POWER8 CPUs and four NVIDIA Tesla P100 GPUs) with the cloudFPGA chassis.
These two systems make for an interesting comparison because both are 2U chassis (i.e. the same physical space) with similar power consumption (between 2.3 and 2.5 kW). Here are the results:
Compute density—S822LC (aka Minsky) vs. cloudFPGA chassis
The figure shows that the total performance of cloudFPGA is 25% higher for single- and half-precision operations.
Naturally, this performance depends on the accelerator devices that each chassis mounts, i.e. the NVIDIA Tesla P100 vs. the Xilinx Kintex UltraScale KU060 FPGA.
This highlights another of the greatest advantages of the cloudFPGA platform: its modularity. As IBM says in the Future Work section of Promoting FPGAs to become 1st-class citizens in datacenters, more FPGA modules can be developed for use within the same cloudFPGA chassis.
If a new HBM module were developed for the cloudFPGA platform, the computing power of the platform would increase substantially.
cloudFPGA vs. FPGA chassis
Another way to illustrate cloudFPGA's compute density is to compare it against other FPGA chassis available on the market. A good candidate is the AMD EPYC/Xilinx Alveo BOXX.
As mentioned in the introduction, "high-end FPGAs (i.e. Alveo cards) are highly overpriced: while they offer only about 30% more resources than mid-range FPGAs, they are ten times (or even more) as expensive."
Network performance
From the paper "Disaggregated FPGAs: network performance comparison against bare-metal servers, virtual machines and Linux containers" we can see that (cloudFPGA) FPGAs:
Achieve by far the best round-trip-time (RTT, in µs) results using UDP (left), and
Reach the highest payload throughput (Gb/s), even attaining the theoretical maximum (right).
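To make the "theoretical maximum" concrete, here is a back-of-the-envelope calculation of the best achievable UDP payload throughput (goodput) on a 10 GbE link. The framing overheads are standard Ethernet/IP/UDP figures, assumed here rather than taken from the paper:

```python
# Theoretical maximum UDP goodput on 10 GbE: the share of the line rate
# left after standard Ethernet/IP/UDP framing overhead per frame.
LINE_RATE_GBPS = 10.0
MTU = 1500                      # bytes of IP payload per Ethernet frame
IP_HDR, UDP_HDR = 20, 8
ETH_OVERHEAD = 14 + 4 + 8 + 12  # MAC header + FCS + preamble/SFD + inter-frame gap

payload = MTU - IP_HDR - UDP_HDR     # 1472 bytes of UDP payload per frame
wire_bytes = MTU + ETH_OVERHEAD      # 1538 bytes occupy the wire per frame
goodput = LINE_RATE_GBPS * payload / wire_bytes
print(f"max UDP goodput: {goodput:.2f} Gb/s")  # ~9.57 Gb/s
```

With MTU-sized datagrams, the ceiling is about 9.57 Gb/s, which is the kind of limit the FPGA throughput curves approach.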
Comparison with bare-metal servers, virtual machines, and Linux containers
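For illustration, a UDP round-trip time of the kind reported in the paper can be measured with a simple echo exchange. The sketch below runs a local echo server so it is self-contained; the host, port, and payload size are placeholders and say nothing about the paper's actual testbed:

```python
import socket
import threading
import time

def udp_rtt_us(host, port, payload=b"x" * 64, timeout=1.0):
    """Measure one UDP round-trip time, in microseconds, against an echo server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        t0 = time.perf_counter()
        s.sendto(payload, (host, port))
        s.recvfrom(len(payload))          # block until the echoed datagram returns
        return (time.perf_counter() - t0) * 1e6

# Minimal local echo server so the example can run anywhere.
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))                # let the OS pick a free port
def _echo_once():
    data, addr = srv.recvfrom(4096)
    srv.sendto(data, addr)                # bounce the datagram back to the sender
threading.Thread(target=_echo_once, daemon=True).start()

rtt = udp_rtt_us("127.0.0.1", srv.getsockname()[1])
print(f"loopback RTT: {rtt:.0f} us")
```

A real benchmark would average many such exchanges and discard warm-up samples; this only shows the basic send/echo/timestamp mechanics.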
Workload acceleration
In the paper Network-attached FPGAs for datacenter applications, IBM implemented and ported a distributed text-analytics application onto cloudFPGA. They compared the performance of cloudFPGA with a software-only implementation and an implementation accelerated with PCIe-attached FPGAs. The results show that the network-attached FPGAs outperform both other implementations by large margins: