Easy deployment and scaling of Vitis AI accelerators using InAccel

Vitis AI is Xilinx’s development stack for hardware-accelerated AI inference on Xilinx FPGA platforms, including both edge devices and data-center FPGA cards. It consists of optimized IP, tools, libraries, and pre-trained models. It allows data scientists to fully utilize the potential of AI acceleration on Xilinx FPGAs.

Vitis AI provides a comprehensive set of pre-optimized models that are ready to be deployed on Xilinx FPGA devices. It also provides a powerful open source quantizer that supports pruned and unpruned model quantization, calibration, and fine-tuning.

To deploy your application on top of Vitis AI, you need to integrate the Vitis AI runtime engine (VART). This translates into several steps, such as creating a VART runner, querying the model's input and output tensors, executing the DPU task, and so on. Although the workflow sounds simple, bringing up a fleet of FPGA DPUs and running multiple concurrent Vitis-AI workloads in real-world scenarios can be a nightmare.
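
For reference, here is roughly what the bare VART flow looks like, based on the public XIR/VART Python API; the model path and the int8 buffer types below are illustrative assumptions, not taken from a specific example in this post.

```python
import numpy as np
import vart
import xir

# Load the compiled model and locate its DPU subgraph
graph = xir.Graph.deserialize('yolov3_adas_pruned_0_9.xmodel')
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu = next(s for s in subgraphs
           if s.has_attr('device') and s.get_attr('device').upper() == 'DPU')

# Create a VART runner and query its tensor metadata to size the host buffers
runner = vart.Runner.create_runner(dpu, 'run')
input_tensor = runner.get_input_tensors()[0]
output_tensor = runner.get_output_tensors()[0]
inputs = [np.empty(tuple(input_tensor.dims), dtype=np.int8)]
outputs = [np.empty(tuple(output_tensor.dims), dtype=np.int8)]

# ... preprocess and quantize the input data into inputs[0] ...

# Execute the DPU task and block until it finishes
job_id = runner.execute_async(inputs, outputs)
runner.wait(job_id)
```

Multiply this by a fleet of devices and concurrent workloads, and the boilerplate and scheduling logic quickly add up.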

To ease the deployment, scaling, and management of Vitis AI applications, InAccel has developed the InAccel Vitis AI runtime, an abstraction layer that hides the integration complexity from software developers and ML engineers and simplifies the deployment and maintenance of FPGA-powered AI services. It also offers FPGA provisioning and auto-scaling capabilities that can meet production-grade requirements.

InAccel Vitis AI consists of the following key components:

  • InAccel (HuggingFace) Spaces — ML apps that demonstrate the capabilities of the whole InAccel Vitis AI platform… in your browser.
  • InAccel (HuggingFace) Models — A version-controlled “mirror” of the Xilinx AI Model Zoo, for automating integration, deployment and delivery of FPGA models.
  • Xilinx AI Model Zoo — A comprehensive set of pre-optimized models that are ready to deploy on Xilinx devices.
  • Xilinx AI Optimizer — An optional model optimizer that can prune a model by up to 90%. It is available separately under a commercial license.
  • Xilinx AI Quantizer — A powerful quantizer that supports model quantization, calibration, and fine-tuning.
  • Xilinx AI Compiler — Compiles the quantized model into a highly efficient instruction set and data flow.
  • Xilinx AI Profiler — Performs an in-depth analysis of the efficiency and utilization of the AI inference implementation.
  • InAccel Coral API — Offers high-level C/C++, Java, Python and Rust APIs for Vitis-AI applications. It is based on a client-server architecture and uses InAccel Coral to transform FPGA resources into a single pool of Vitis-AI runners.
  • InAccel Coral — Orchestration framework that allows the distributed acceleration of large data sets across clusters of FPGA resources using simple programming models. It is designed to scale up from single devices to hundreds of FPGAs, each offering local computation and storage.
  • InAccel Vitis-AI Runtime — An InAccel Coral-compliant runtime atop the open source Xilinx VART library.
  • Xilinx DPU — Efficient and scalable IP cores that can be customized to meet the needs of diverse AI applications.

Using InAccel Vitis AI, the host code simply allocates enough memory for the input and output tensors with the InAccel allocator, then submits requests and waits for them to complete, changing only the input tensor data each time.
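
A rough illustration of this pattern, assuming the InAccel Coral Python API (inaccel.coral); the accelerator name and tensor shapes here are placeholders for the sake of the example:

```python
import inaccel.coral as inaccel
import numpy as np

# Tensors created inside the allocator context live in InAccel-managed memory
# and can be reused across requests.
with inaccel.allocator:
    in_tensor = np.ndarray((1, 256, 512, 3), dtype=np.int8)
    out_tensor = np.ndarray((1, 32, 64, 75), dtype=np.int8)

for _ in range(8):  # e.g. a stream of frames
    # ... overwrite in_tensor with the next preprocessed input ...
    request = inaccel.request('vitis-ai-model')  # placeholder accelerator id
    request.arg(in_tensor).arg(out_tensor)
    inaccel.submit(request).result()  # block until the DPU task completes
    # ... post-process out_tensor ...
```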

Before executing an InAccel Vitis AI application, you first have to package your custom models (*.xmodel + bitstream.json) and install them in your local InAccel repository. Vitis-AI uses the same bitstream (dpu.xclbin) for all kinds of models (the neural network itself is described in the *.xmodel file), so the bitstream.json file only describes the DPU runner details for a specific DPU xmodel.

For example, for the yolov3_adas_pruned xmodel the bitstream.json file looks like this:

https://huggingface.co/inaccel/yolov3_adas_pruned_0_9/blob/main/u50/1.3.1/bitstream.json

The main concepts of the InAccel Runtime are the resource, the memory, the buffer, and the compute unit. In order to design and implement the new InAccel Coral-compliant runtime for Vitis-AI, we had to decide what each component would represent.

In the InAccel Vitis AI Runtime, each of them holds the following information:

  • A resource represents the FPGA DPU which can be reprogrammed with compatible xmodels.
  • A memory encapsulates the available device-side address space for DPU input/output data.
  • A buffer includes the host-side memory address of a tensor, as well as its size.
  • A compute unit stores information related to a DPU runner.

The resource holds all the graph-related information located inside the xmodel file. With this knowledge, a new VART runner can be represented by a compute unit. If multiple requests arrive at the same time, multiple compute units, each containing a VART runner, are spawned. Once the creation of a compute unit is completed, we proceed to create the input and output tensors, taking into account the memory that the host code pre-allocated through the buffers. Finally, the request is executed on the FPGA DPU.

Using the InAccel Vitis AI runtime, the development of FPGA inference applications becomes much simpler, since you can integrate any model (e.g. YOLOv3 object detection) in 4 simple steps, sketched in the code example after these steps:

1. Allocate input and output tensors using the InAccel allocator.

2. Create an acceleration request for your target model.

3. Populate its argument list.

4. Submit and wait for the request to be completed.
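
Put together, and again assuming the InAccel Coral Python API (inaccel.coral), the four steps might look like the sketch below; the accelerator name 'yolov3_adas_pruned_0_9' and the tensor shapes are illustrative assumptions, not exact identifiers.

```python
import inaccel.coral as inaccel
import numpy as np

# 1. Allocate input and output tensors using the InAccel allocator.
with inaccel.allocator:
    image = np.ndarray((1, 256, 512, 3), dtype=np.int8)  # preprocessed, quantized frame
    boxes = np.ndarray((1, 32, 64, 75), dtype=np.int8)   # raw DPU output (model-specific shape)

# ... fill `image` with preprocessed pixel data ...

# 2. Create an acceleration request for your target model (name is hypothetical).
request = inaccel.request('yolov3_adas_pruned_0_9')

# 3. Populate its argument list.
request.arg(image).arg(boxes)

# 4. Submit and wait for the request to be completed.
inaccel.submit(request).result()
```

Post-processing (e.g. decoding the raw output tensor into bounding boxes) still happens on the host, just as with plain VART.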

The InAccel Vitis AI runtime also allows seamless integration with platforms like Hugging Face, which means that you can deploy and run resnet50 and yolov3 models directly in your browser:

https://huggingface.co/inaccel

Xilinx, Alveo, Vitis and Vitis AI are all trademarks of Xilinx Inc.



