FPGA

As of August 2022, Grid'5000 features 2 nodes, each equipped with one AMD/Xilinx FPGA. This document describes how those FPGAs can be used in Grid'5000.

AMD Xilinx Alveo U200

Hardware description

The Grenoble site of Grid'5000 hosts 2 servers (Servan cluster), each equipped with an AMD/Xilinx FPGA attached via PCIe. The FPGAs are AMD/Xilinx Alveo U200 cards, listed in the Xilinx catalog as Datacenter Accelerator cards.

Detailed specifications are available from AMD/Xilinx (the installed cards are the variant with passive thermal cooling).

Technically, the installation of those FPGAs in the Servan nodes has the following characteristics:

  • JTAG
    • The Xilinx Alveo U200 exposes JTAG programming through a USB port on the card. In the Grid'5000 installation, that port is cabled back to the hosting machine itself. The FPGA hosted in servan-1, for example, can therefore be programmed (e.g. with Vivado) from servan-1 itself.
  • Ethernet ports
    • Both Ethernet ports of each FPGA are cabled to the site network along with all servers of the site.
      Grenoble site network
    • The FPGA Ethernet ports are not shown as NICs in the operating system of the hosting machine (unless the FPGA is programmed to expose them).
    • Ports are cabled to 100Gbps ports on the Grenoble site router/switch. Switch ports are configured with Auto-Negotiation disabled and Speed forced to 100Gbps (other settings did not work in our tests).
    • Kavlan is supported on the FPGA Ethernet ports just like any NIC of a server of the site (including the servan servers' own NICs). FPGA ports are named servan-1-fpga0, servan-1-fpga1, servan-2-fpga0, servan-2-fpga1 in kavlan. IP addresses are provided via DHCP to the FPGA ports in kavlan where the DHCP service is available.
    • Note: using the 100Gbps capability of the FPGA ports requires acquiring a free-of-charge Xilinx licence.
  • Wattmeter
    • The energy consumption of each servan node is measured by a wattmeter. Measurements are available in Kwollect. (Work in progress)
    • Energy consumption can also be retrieved using Xilinx tools (e.g. xbutil, xbtop) from the host operating system when the FPGA is running XRT (see below).
  • Licenses
    • The FPGA software stack, IP, etc. are subject to licenses (EULA to be signed, etc.). See the Xilinx FAQ. Grid'5000 does not provide licenses. It is left to the end user to obtain the required licenses (some are free of charge).
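
The kavlan names listed above follow a <node>-fpga<N> pattern with two ports per card. As a small convenience sketch (the function name fpga_port_names is hypothetical, not a Grid'5000 tool), the names for a given node can be generated like this:

```shell
# Print the kavlan names of the two FPGA Ethernet ports of one servan node.
# Assumption: ports are numbered 0 and 1, as in the names listed above.
# Usage: fpga_port_names servan-1
fpga_port_names() {
    node=$1
    for port in 0 1; do
        printf '%s-fpga%s\n' "$node" "$port"
    done
}
```

The output can then be passed to whichever kavlan command is used to move ports into a VLAN.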

Using the FPGA

Programming

The FPGA can be used in several ways:

  • either using higher-level abstractions, e.g. using Xilinx's Vitis.
  • or using lower-level abstractions, e.g. using Xilinx's Vivado.

When used through the higher-level abstractions with Vitis, the FPGA card is managed by the XRT framework and shows up as a datacenter accelerator card. However, it is sometimes necessary to program the card at a lower level, for instance to turn it into a network card (NIC). In that case the card is fully reprogrammed, so that even its PCI ID changes. Users therefore have to decide at which level they want to program the FPGA.
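
Since a full reprogramming changes how the card enumerates on PCIe, it can help to check what the host currently sees. The sketch below filters lspci output on the Xilinx PCI vendor ID (10ee). The function names are illustrative, and the device IDs that show up depend on the design loaded on the FPGA.

```shell
# Keep only the lines of `lspci -nn` output whose [vendor:device] pair
# has the Xilinx vendor ID (10ee).
filter_xilinx() {
    grep -i '\[10ee:[0-9a-f]\{4\}]'
}

# Show how the card currently enumerates on the host.
# Run it before and after reprogramming to compare the PCI IDs.
show_fpga_pci() {
    lspci -nn | filter_xilinx
}
```

An empty result from show_fpga_pci suggests that the current design exposes no PCIe function, or that a reboot is needed before the host rescans the bus.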

When deploying a program on the FPGA with Vivado, several options are available:

  • Via PCI-e.
    • PCI-e programming may not be available, since it requires the FPGA to already be running a design that supports programming over PCI-e.
  • Via JTAG, by flashing the program on the board-embedded non-volatile memory that lives beside the FPGA (using a .mcs file).
    • Flashing the non-volatile memory requires a subsequent cold reboot of the hosting server before the FPGA uses the flashed program. It makes the programming persistent, which means that flashing a factory golden image will be required to revert the FPGA to its original operating mode.
  • Via JTAG, directly in the FPGA's volatile memory (using a bitstream, .bit file).
    • When the volatile memory is programmed, the FPGA runs the program straight away. A warm reboot may be required for the program to become functional (e.g. if it modifies the PCI-e interface). A cold reboot reverts the FPGA to the program installed in the non-volatile memory of the board.

As a result, it is strongly recommended to program the FPGA via JTAG in the VOLATILE memory, so that the new programming is NOT persistent and the FPGA returns to its default operating mode after a cold reboot, typically after the reservation/job.
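
The recommended volatile JTAG path can be sketched as a small shell helper that generates a Vivado Tcl script. This is a minimal sketch, not a tested Grid'5000 procedure: the helper name write_program_tcl is hypothetical, it assumes the FPGA is the first device on the JTAG chain, and it assumes a Vivado version that provides open_hw_manager (older releases use open_hw). The helper refuses .mcs files, since those target the persistent flash.

```shell
# Write a Vivado Tcl script that loads a bitstream over JTAG into the
# FPGA's volatile memory.
# Usage: write_program_tcl <bitstream.bit> <output.tcl>
write_program_tcl() {
    bitfile=$1
    tclfile=$2
    case "$bitfile" in
        *.bit) ;;
        *.mcs)
            echo "refusing $bitfile: .mcs files target the non-volatile flash" >&2
            return 1 ;;
        *)
            echo "refusing $bitfile: expected a .bit bitstream" >&2
            return 1 ;;
    esac
    # \$dev is escaped so that it reaches the Tcl script as a Tcl variable.
    cat > "$tclfile" <<EOF
open_hw_manager
connect_hw_server
open_hw_target
# Assumption: the FPGA is the first device on the JTAG chain.
set dev [lindex [get_hw_devices] 0]
current_hw_device \$dev
set_property PROGRAM.FILE {$bitfile} \$dev
program_hw_devices \$dev
close_hw_manager
EOF
}
```

A possible use on the node hosting the card, with design.bit as a placeholder name: write_program_tcl design.bit program.tcl && vivado -mode batch -source program.tcl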

FPGA software stack

The servan nodes, just like all Grid'5000 nodes, run Debian stable by default. No FPGA support is provided in that default operating system environment.

AMD/Xilinx supports only a limited list of operating systems for operating the FPGA (see the AMD/Xilinx documentation). Pierre Neyron provides a kadeploy Ubuntu 20.04 image that includes the Xilinx tools. Anyone may deploy it using:

kadeploy -u pneyron -e ubuntu2004-fpga

That ubuntu2004-fpga system environment is built using a kameleon recipe available in https://gitlab.inria.fr/neyron/ubuntu2004-fpga.

Because of the high disk space required (~50GB), Vitis and Vivado are installed on an NFS shared storage (/tools/ in the deployed Ubuntu system). Those tools are subject to end-user license agreements (EULA). Access to the tools in the NFS shared storage can be requested from Pierre Neyron.
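
After deployment, a quick check can tell whether the shared tools are reachable from the node. The sketch below assumes that missing access shows up as an absent, unreadable, or empty /tools directory; how access is actually restricted is not described here, and the function name tools_available is hypothetical.

```shell
# Return 0 if the shared tools directory exists, is readable and is not empty.
# Usage: tools_available [dir]   (dir defaults to /tools)
tools_available() {
    dir=${1:-/tools}
    [ -d "$dir" ] && [ -r "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Example:
#   tools_available || echo "no access to /tools, request it first" >&2
```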

Developing code for the FPGA

Help on how to write a program for the FPGA is outside the scope of this document.