Recently, major cloud and HPC providers like Amazon, Alibaba, Huawei and Nimbix have started deploying FPGAs in their data centers. FPGA vendors like Xilinx have released powerful new FPGA cards, such as the Alveo family (U50, U200 and U250), that provide a great solution for applications that need high performance and low latency. Intel has also announced last month the D5005 programmable acceleration card (PAC), designed for accelerating compute-intensive workloads such as AI inferencing, big data and streaming analytics, network security and image transcoding.
System vendors like HPE (ProLiant DL380 Gen10) and Dell EMC also offer servers that host multiple FPGA cards (e.g. up to 8 per server), which can be used for applications that need high performance without consuming excessive power.
However, FPGA clusters are still not widely utilized. One of the main barriers to their widespread deployment has been the lack of an efficient framework for the automated deployment, scaling and management of FPGA clusters. In the CPU domain there are several frameworks, such as Kubernetes, Mesos and YARN, that allow the easy deployment and management of CPU clusters. In the FPGA domain, however, there was no framework that allowed the easy deployment of FPGA clusters.
If a user wanted to deploy several FPGAs (either on the same server or across multiple servers), they would have to configure the FPGAs, distribute the data and collect the results from each device manually; a tedious and error-prone process.
Moreover, if multiple users, applications or threads needed access to the resources of an FPGA cluster, a serialization mechanism would be required to guarantee that requests to the FPGA resources are handled in an orderly way and without conflicts.
To solve these issues, InAccel has released the Coral FPGA resource manager.
Coral is a scalable, reliable and fault-tolerant distributed acceleration system responsible for monitoring, virtualizing and orchestrating clusters of FPGAs. Coral also introduces high-level abstractions by exposing the FPGAs as a single pool of accelerators that any application developer can easily invoke through simple API calls. Finally, Coral runs as a microservice and is able to run on top of other state-of-the-art resource managers like Hadoop YARN and Kubernetes.
Coral has the following primary goals:
- Serve as a universal orchestrator for FPGA resources and acceleration requests.
- Improve scalability and maximize performance of deployed accelerators, ensuring the secure sharing of the available resources.
- Abstract away cumbersome parallel programming languages (like OpenCL) without compromising flexibility.
- Encompass bitstream management and protection capabilities through a central bitstream repository based on JFrog Artifactory.
The Coral resource manager serves as an abstraction layer on top of the OpenCL runtime system. The OpenCL runtime uses the drivers from the FPGA vendors and can support multiple kernels on the same FPGA.
InAccel Coral API
Coral provides a straightforward, simplified API that allows accelerators to be deployed in the same way as ordinary software function calls. This avoids the complex OpenCL procedures for configuring bitstreams and for managing and allocating buffers and kernels.
The source code presented below shows the advantages of this approach. Even though the API was designed to be simple and intuitive, let's break it down and analyze each part to gain a holistic understanding.
Let's assume we want to add the corresponding elements of two arrays and store their sum in a third array.
// Allocate three vectors
inaccel::vector&lt;float&gt; a(size), b(size), c(size);

// Send a synchronous request for the 'addition' kernel to the Coral
// FPGA Resource Manager.
// Request arguments must comply with the accelerator's specific
// argument list.
inaccel::Request add_req{"com.inaccel.math.vector.addition"};
add_req.Arg(a).Arg(b).Arg(c).Arg(size);
inaccel::Coral::Submit(add_req);
The inaccel::Request object represents an accelerator request that will later be transmitted to Coral for execution. The name of the request (passed as a constructor argument) should follow the naming convention explained below.
By calling inaccel::Coral::Submit with your request as an argument, you transmit the accelerator request to Coral for scheduling and execution.
inaccel::Coral::Submit executes the accelerator request synchronously (i.e. it blocks until completion). Alternatively, you can force asynchronous submission by invoking inaccel::Coral::SubmitAsync.
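As a rough sketch of the asynchronous path (not official API documentation), the snippet below reuses the add_req request from above and assumes that SubmitAsync returns a waitable handle; the exact return type and wait call may differ in the actual Coral API.

// Hypothetical sketch of asynchronous submission: the assumption is
// that SubmitAsync returns a handle that can be waited on later.
auto pending = inaccel::Coral::SubmitAsync(add_req);

// ... perform other CPU work while the FPGA executes the kernel ...

// Block only when the accelerated result is actually needed.
pending.wait();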
The main advantage of the proposed approach is how simply it scales. When Coral starts, the user only has to specify on how many FPGAs the IP cores will be deployed, using the simple interface shown below.
inaccel coral start --fpgas=all
If the user wants to deploy to all the available FPGAs, this single command is enough. The Coral resource manager will identify the cards and download the right bitstreams for the specific cards that are used.
Coral allows users to deploy their applications to multiple FPGA cards on the same server. If a user needs to deploy applications to multiple server nodes (each with multiple FPGAs), the user only needs to install Coral on all of the available servers. Coral provides all the required APIs to connect with typical distributed-systems frameworks like Kubernetes.
Besides the instant scalability that Coral provides, its main advantage is the seamless virtualization of the available resources. Multiple applications, or multi-threaded applications, can utilize the Coral resource manager by simply invoking the functions that they need to accelerate. Coral enables the sharing of the available FPGA resources when accelerators are requested by multiple applications or threads. Therefore, software developers can speed up their applications without the complexity of typical heterogeneous platforms.
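As an illustrative sketch of this sharing (it simply reuses the Request and Submit calls shown earlier with the same addition accelerator, and omits the Coral header include), several threads can submit their own requests concurrently and let Coral schedule them across the shared pool of FPGAs, with no explicit locking in the application code:

#include &lt;thread&gt;
#include &lt;vector&gt;
// (include the Coral C++ header provided by the InAccel package)

// Each worker allocates its own buffers and submits its own request.
void accelerated_add(size_t size) {
  inaccel::vector&lt;float&gt; a(size), b(size), c(size);
  inaccel::Request add_req{"com.inaccel.math.vector.addition"};
  add_req.Arg(a).Arg(b).Arg(c).Arg(size);
  // Coral serializes and schedules the concurrent requests on the
  // available FPGAs; each caller just blocks until its own request
  // completes.
  inaccel::Coral::Submit(add_req);
}

int main() {
  std::vector&lt;std::thread&gt; workers;
  for (int i = 0; i &lt; 4; i++)
    workers.emplace_back(accelerated_add, 1024 * 1024);
  for (auto&amp; worker : workers) worker.join();
  return 0;
}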
The Coral FPGA resource manager is available for free to the FPGA community. Users can request a license and start deploying their FPGA cluster on the cloud or on-premises instantly. The free version allows up to 8 FPGAs per cluster. An enterprise version is also available with an unlimited number of FPGAs, a detailed monitoring tool and full support.