"To get this high density, we integrate an FPGA module every 7.6 mm. As a comparison, this corresponds to a rough third of the PCIe pitch. Obviously, with such a small stride, there is no space for air-cooled heat-sinks and fans. Instead, we mount a copper-based heat spreader on top of the die of the FPGA to transport the thermal energy laterally away from the chip. The heat is transferred by conduction to the sides of the heat spreader where it is thermally contacted with two active liquid-cooled heat sinks, referred to as cooling rails. Water is circulating in the cooling rails to collect and extract the heat produced by the 32 FPGA modules, the Ethernet switch and the service processor of the sled."
cloudFPGA vs. GPU chassis
The NVIDIA Tesla P100 is one of the most popular GPUs. In the presentation An FPGA platform for hyperscalers, IBM compares their S822LC "Minsky" chassis (which boasts two POWER8 CPUs and four NVIDIA Tesla P100 GPUs) with the cloudFPGA chassis.
These two systems make for an interesting comparison because both are 2U chassis (i.e. the same physical space) with similar power consumption (between 2.3 and 2.5 kW). Here are the results:
Compute density—S822LC (aka Minsky) vs. cloudFPGA chassis
The figure shows that the total performance of cloudFPGA is 25% higher for single- and half-precision operations.
Naturally, this performance depends on the accelerator devices that each chassis mounts, i.e. the NVIDIA Tesla P100 vs. the Xilinx Kintex UltraScale KU060 FPGA.
This highlights another of the greatest advantages of the cloudFPGA platform: its modularity. As IBM says in the Future Work section of Promoting FPGAs to become 1st-class citizens in datacenters, more FPGA modules can be developed for use within the same cloudFPGA chassis.
If a new HBM module were developed for the cloudFPGA platform, the computing power of the platform would increase substantially.
cloudFPGA vs. FPGA chassis
Another way to illustrate cloudFPGA's compute density is to compare it against other FPGA chassis available on the market. A good candidate is the AMD EPYC/Xilinx Alveo BOXX.
As mentioned in the introduction, "high-end FPGAs (i.e. Alveo cards) are highly overpriced: while they offer only about 30% more resources than mid-range FPGAs, they are ten times (or even more) as expensive."
Network performance
From the paper "Disaggregated FPGAs: network performance comparison against bare-metal servers, virtual machines and Linux containers" we can see that (cloudFPGA) FPGAs:
Achieve by far the best round-trip-time (RTT, in µs) results using UDP (left), and
Reach the highest payload throughput (Gb/s), even attaining the theoretical maximum (right).
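To make the "theoretical maximum" concrete, here is a back-of-the-envelope calculation of the best achievable UDP payload throughput (goodput) on a 10 GbE link. The framing overheads are standard Ethernet/IP/UDP figures, assumed here rather than taken from the paper:

```python
# Theoretical maximum UDP goodput on 10 GbE: the share of the line rate
# left after standard Ethernet/IP/UDP framing overhead per frame.
LINE_RATE_GBPS = 10.0
MTU = 1500                      # bytes of IP payload per Ethernet frame
IP_HDR, UDP_HDR = 20, 8
ETH_OVERHEAD = 14 + 4 + 8 + 12  # MAC header + FCS + preamble/SFD + inter-frame gap

payload = MTU - IP_HDR - UDP_HDR     # 1472 bytes of UDP payload per frame
wire_bytes = MTU + ETH_OVERHEAD      # 1538 bytes occupy the wire per frame
goodput = LINE_RATE_GBPS * payload / wire_bytes
print(f"max UDP goodput: {goodput:.2f} Gb/s")  # ~9.57 Gb/s
```

With MTU-sized datagrams, the ceiling is about 9.57 Gb/s, which is the kind of limit the FPGA throughput curves approach.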
Comparison with bare-metal servers, virtual machines, and Linux containers
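For illustration, a UDP round-trip time of the kind reported in the paper can be measured with a simple echo exchange. The sketch below runs a local echo server so it is self-contained; the host, port, and payload size are placeholders and say nothing about the paper's actual testbed:

```python
import socket
import threading
import time

def udp_rtt_us(host, port, payload=b"x" * 64, timeout=1.0):
    """Measure one UDP round-trip time, in microseconds, against an echo server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        t0 = time.perf_counter()
        s.sendto(payload, (host, port))
        s.recvfrom(len(payload))          # block until the echoed datagram returns
        return (time.perf_counter() - t0) * 1e6

# Minimal local echo server so the example can run anywhere.
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))                # let the OS pick a free port
def _echo_once():
    data, addr = srv.recvfrom(4096)
    srv.sendto(data, addr)                # bounce the datagram back to the sender
threading.Thread(target=_echo_once, daemon=True).start()

rtt = udp_rtt_us("127.0.0.1", srv.getsockname()[1])
print(f"loopback RTT: {rtt:.0f} us")
```

A real benchmark would average many such exchanges and discard warm-up samples; this only shows the basic send/echo/timestamp mechanics.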
Workload acceleration
In the paper Network-attached FPGAs for datacenter applications, IBM implemented and ported a distributed text-analytics application onto cloudFPGA. They compared the performance of cloudFPGA with a software-only implementation and an implementation accelerated with PCIe-attached FPGAs. The results show that the network-attached FPGAs outperform both other implementations by large margins: