According to David A. Patterson, professor emeritus at the University of California, Berkeley, the only path left to keep increasing the performance and energy efficiency of computing systems is through domain-specific architectures. FPGAs have been widely used in the embedded systems world to provide domain-specific architectures, especially in networking and communication systems. FPGAs can offer high performance (e.g. 20x speedup compared to multi-core CPUs), flexibility (through reconfiguration, as opposed to ASICs/ASSPs) and high energy efficiency compared to CPUs and GPUs.
One of the main barriers to widespread deployment has been high programming complexity. Domain-specific accelerators had to be written in hardware description languages such as VHDL and Verilog, which made mapping software functions to hardware a difficult task and limited the use of FPGAs. Over the last few years, however, it has become easier to develop specialized architectures from high-level languages. FPGA vendors like Xilinx and Intel now offer EDA tools that let developers create specialized architectures using OpenCL or C/C++ (high-level synthesis, HLS). This makes the development of specialized accelerators much easier. FPGAs can now be used in applications such as machine learning, quantitative finance, image processing, data analytics and databases, and in general as accelerators for the most computationally intensive tasks.
The second barrier to widespread adoption in the data center and HPC world has been the lack of an efficient framework for easy integration, management, deployment and scaling of FPGA resources. Currently, most of the available frameworks require the user to explicitly instantiate and invoke FPGA resources through the OpenCL API. This means the user has to care about the underlying hardware structure, the connectivity of the resources (for example, which memory banks an FPGA kernel is connected to), and so on. Moreover, with the current toolbox it is not easy to scale one application to multiple FPGAs or multiple kernels. Similarly, it is not easy for multiple users, applications or processes to share one or more FPGAs. By contrast, one of the main reasons for the wide adoption of cloud computing is the broad availability of frameworks that allow easy management, deployment, scaling and virtualization of computing resources. To this end, InAccel developed a framework that orchestrates FPGA acceleration tasks, making the use of an FPGA cluster as easy as it is in the software world.
The following figure depicts how the InAccel Coral FPGA resource manager allows easy deployment and seamless sharing of FPGA resources.
- Orchestration: In the software domain, the OS schedules requests and performs memory management. Users do not have to specify on which thread or CPU an application will run, nor do they have to handle the transfers to the appropriate memories. Similarly, InAccel Coral serves as an OS for the FPGA resources. It manages the FPGAs, schedules requests and performs all memory transactions with them. A user is no longer required to select specific FPGAs for their workloads or to explicitly invoke memory transfers.
- Invocation: In software, a function is called using its name and a list of arguments. In the same way, the InAccel FPGA resource manager allows specific tasks to be offloaded to the FPGA with a simple request invocation. InAccel’s unified API across C++, Python, Java and Scala provides this functionality, enabling one-line, software-like invocation of hardware functions.
- Functions: In the software world, tasks are packed into functions that users can invoke using the function’s required input and output arguments. In the FPGA world, we use the word kernel to describe the IP core that is used to accelerate the specific task. Each FPGA can host several kernels. The kernels can be the same (multiple copies of the same kernel) or different ones. The main idea is that multiple kernels act like multiple specialized CPUs for the specific function.
- Executables/Libraries: In the software world we have binaries and libraries. In the FPGA world, the binary that configures the FPGA to perform a specific task is called a bitstream. The main difference is that each type of FPGA needs its own bitstream. Each bitstream can include one or more kernels.
- Packaging: In the software world we use package managers like npm, pip and maven for easy distribution and deployment of software packages. At InAccel we have developed a CLI for managing bitstream artifacts. It lets a user manage and deploy bitstreams to an FPGA cluster, and it also provides the functionality for managing the cluster itself, viewing logs, encrypting bitstreams, etc.
- Distribution: In software, we use distribution systems for libraries (e.g. repositories). The main novelty in InAccel is that we have decoupled the bitstreams (IP cores) from the software functions that use them. We have developed a bitstream repository where one can host bitstreams (for several kernels/functions and FPGAs). This makes deploying and using FPGAs from languages like C/C++, Python and Java much easier, since a software developer does not need to know anything about bitstreams. The InAccel FPGA manager automatically fetches the bitstreams and configures and manages the FPGA cluster based on the function invoked by the software developer.
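
The ideas in the list above (kernels as named functions, one bitstream per FPGA type, a repository of bitstreams, and a manager that picks the device) can be illustrated with a small toy model in plain Python. This is only a sketch of the concept and not the InAccel API: every name in it (`Bitstream`, `Device`, `ToyResourceManager`, `submit`) is hypothetical, the "kernels" are ordinary Python functions standing in for hardware, and the scheduling policy (least-used compatible device first) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Bitstream:
    """Hypothetical bitstream artifact: built for one FPGA type, holds one or more kernels."""
    name: str
    fpga_type: str
    kernels: Dict[str, Callable]  # kernel name -> Python stand-in for the hardware kernel


@dataclass
class Device:
    """Stand-in for one FPGA board in the cluster."""
    device_id: int
    fpga_type: str
    loaded: Optional[Bitstream] = None  # bitstream currently configured on the board
    served: int = 0                     # number of requests this board has handled


class ToyResourceManager:
    """Toy manager: the caller names a kernel, the manager chooses the device and bitstream."""

    def __init__(self, devices, repository):
        self.devices = list(devices)
        self.repository = list(repository)  # plays the role of the bitstream repository

    def _find_bitstream(self, kernel, fpga_type):
        # Look up a bitstream that contains the kernel and targets this FPGA type.
        for bitstream in self.repository:
            if bitstream.fpga_type == fpga_type and kernel in bitstream.kernels:
                return bitstream
        return None

    def submit(self, kernel, *args):
        """Run `kernel` on some compatible device; return (device_id, result)."""
        candidates = []
        for dev in self.devices:
            if dev.loaded is not None and kernel in dev.loaded.kernels:
                # Already configured with a suitable bitstream: no reconfiguration needed.
                candidates.append((dev.served, 0, dev.device_id, dev, dev.loaded))
            else:
                bitstream = self._find_bitstream(kernel, dev.fpga_type)
                if bitstream is not None:
                    candidates.append((dev.served, 1, dev.device_id, dev, bitstream))
        if not candidates:
            raise LookupError(f"no bitstream provides kernel {kernel!r} for this cluster")
        # Assumed policy: least-used device, then one that needs no reconfiguration.
        _, _, _, dev, bitstream = min(candidates, key=lambda c: c[:3])
        dev.loaded = bitstream  # "configure" the FPGA
        dev.served += 1
        return dev.device_id, bitstream.kernels[kernel](*args)


def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(a, factor):
    return [x * factor for x in a]


# A two-board cluster with different FPGA types and a small repository.
manager = ToyResourceManager(
    devices=[Device(0, "type-a"), Device(1, "type-b")],
    repository=[
        Bitstream("vadd-a", "type-a", {"vector_add": vector_add}),
        Bitstream("vadd-b", "type-b", {"vector_add": vector_add}),
        Bitstream("scale-a", "type-a", {"scale": scale}),
    ],
)

# The caller only names the function and passes arguments, as in software.
device_id, result = manager.submit("vector_add", [1, 2], [3, 4])
```

In this sketch the caller never mentions a device, a bitstream or a memory transfer. The same `submit` call would keep working if boards were added to the cluster or if a bitstream for another FPGA type were pushed to the repository, which is the decoupling the Distribution item describes.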