Vitis AI provides several C++ and Python examples to demonstrate the use of the unified cloud-edge runtime programming APIs.
Note: The sample code is meant to help users get started with the new runtime (VART), not to demonstrate high performance. If you want to become familiar with the unified APIs, use the VART examples, but keep in mind that they are written for understanding the APIs rather than for speed. The APIs are compatible between edge and cloud, although cloud boards may apply different software optimizations, such as batching, and edge devices require multi-threading to achieve higher performance. Users who need higher performance should refer to the Vitis AI Library samples and demo software.
If you want to optimize the examples yourself to achieve higher performance, consider the following suggestions:
- Rearrange the thread pipeline structure so that every DPU thread has its own DPU runner object.
- Optimize the display thread so that it skips frames when the DPU frame rate is higher than the display rate. 200 FPS is far more than video display needs.
- Pre-decode the video. The video file may be H.264-encoded, and the decoder is much slower than the DPU and consumes a lot of CPU resources, so decode the video and convert it into raw format beforehand.
- Batch mode needs special consideration because it can cause video frame jitter. Batch mode applies only to the U50; the ZCU102 has no batch mode support.
- OpenCV `cv::imshow` is slow, so use libdrm.so instead. This applies only to local display, not to display through an X server.
- Other optimizations.
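The frame-skipping suggestion can be sketched in a few lines of Python. This is a minimal sketch, not code from the examples: `DisplayThrottle` is a hypothetical helper that a display thread would call for each result coming out of the DPU, passing `time.monotonic()` as the timestamp. Any frame that arrives less than `1 / max_fps` seconds after the last displayed frame is skipped.

```python
from typing import Optional


class DisplayThrottle:
    """Hypothetical helper: let at most max_fps frames per second reach the display."""

    def __init__(self, max_fps: float) -> None:
        self.min_interval = 1.0 / max_fps
        self._last_shown: Optional[float] = None

    def should_show(self, timestamp: float) -> bool:
        """Return True if the frame produced at `timestamp` (seconds) should be drawn."""
        if self._last_shown is None or timestamp - self._last_shown >= self.min_interval:
            self._last_shown = timestamp
            return True
        return False  # too soon after the last displayed frame: skip it


if __name__ == "__main__":
    # Simulate a DPU producing 16 frames/s for one second, displayed at up to 4 frames/s.
    throttle = DisplayThrottle(max_fps=4)
    shown = [i for i in range(16) if throttle.should_show(i / 16)]
    print(shown)  # [0, 4, 8, 12]
```

Skipping a frame only avoids the rendering cost; the DPU threads keep producing results at full rate. A common alternative with a similar effect is to have the display thread drain its input queue and draw only the newest frame.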
The following table describes these Vitis AI examples.
| ID | Example Name | Model | Framework | Notes |
|----|--------------|-------|-----------|-------|
| 1 | resnet50 | ResNet50 | Caffe | Image classification with Vitis AI unified C++ APIs. |
| 2 | resnet50_mt_py | ResNet50 | TensorFlow | Multi-threading image classification with Vitis AI unified Python APIs. |
| 3 | inception_v1_mt_py | Inception-v1 | TensorFlow | Multi-threading image classification with Vitis AI unified Python APIs. |
| 4 | pose_detection | SSD, Pose detection | Caffe | Pose detection with Vitis AI unified C++ APIs. |
| 5 | video_analysis | SSD | Caffe | Traffic detection with Vitis AI unified C++ APIs. |
| 6 | adas_detection | YOLO-v3 | Caffe | ADAS detection with Vitis AI unified C++ APIs. |
| 7 | segmentation | FPN | Caffe | Semantic segmentation with Vitis AI unified C++ APIs. |
A typical code snippet for deploying models with the Vitis AI unified C++ high-level APIs is as follows:

```cpp
// create the DPU runners for the compiled model in "vitis_rundir"
auto runners = vitis::ai::DpuRunner::create_dpu_runner("vitis_rundir");
auto runner = runners[0].get();
// populate input/output tensors
auto job_data = runner->execute_async(inputs, outputs);
// job_data.first is the job ID; -1 blocks until the job completes
runner->wait(job_data.first, -1);
// process outputs
```
A typical code snippet for deploying models with the Vitis AI unified Python high-level APIs is shown below:

```python
runner = Runner('vitis_rundir')
# populate input/output tensors
jid = runner.execute_async(fpgaInput, fpgaOutput)
runner.wait(jid)
# process fpgaOutput
```
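The following sketch combines this `execute_async`/`wait` call pattern with the first optimization suggestion, giving every worker thread its own runner object. It assumes only that a runner exposes `execute_async(inputs, outputs)` returning a job ID and `wait(job_id)`, as in the snippet. `RunnerLike`, `StubRunner`, `run_pipeline`, and `make_runner` are hypothetical names, and `StubRunner` fakes inference by doubling values so the sketch runs without a board. In a real application, `make_runner` would construct the runner with `Runner('vitis_rundir')`, and the buffers would be sized from the model's tensors.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional, Protocol, Sequence, Tuple


class RunnerLike(Protocol):
    """Interface assumed from the snippet: submit a job, then block on its ID."""

    def execute_async(self, inputs: Sequence[float], outputs: List[float]) -> int: ...

    def wait(self, job_id: int) -> None: ...


class StubRunner:
    """Hypothetical stand-in for a DPU runner; 'inference' doubles each input."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: Dict[int, Tuple[Sequence[float], List[float]]] = {}

    def execute_async(self, inputs: Sequence[float], outputs: List[float]) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._pending[job_id] = (inputs, outputs)
        return job_id

    def wait(self, job_id: int) -> None:
        # Do the fake work when the caller waits, writing into its output buffer.
        inputs, outputs = self._pending.pop(job_id)
        for i, value in enumerate(inputs):
            outputs[i] = value * 2


def run_pipeline(
    frames: Sequence[Sequence[float]],
    make_runner: Callable[[], RunnerLike],
    num_threads: int = 4,
) -> List[List[float]]:
    """Run every frame through a pool of workers, each owning a private runner."""
    work: "queue.Queue[Optional[Tuple[int, Sequence[float]]]]" = queue.Queue()
    results: List[List[float]] = [[] for _ in frames]

    def worker() -> None:
        runner = make_runner()  # created inside the thread, never shared
        while True:
            item = work.get()
            if item is None:  # sentinel: no more frames for this worker
                return
            index, frame = item
            output = [0.0] * len(frame)  # toy output buffer sized like the input
            job_id = runner.execute_async(frame, output)
            runner.wait(job_id)
            results[index] = output  # each index is written by exactly one worker

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for index, frame in enumerate(frames):
        work.put((index, frame))
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1.0, 2.0], [3.0], [0.5]], StubRunner, num_threads=2))
    # [[2.0, 4.0], [6.0], [1.0]]
```

Each thread blocks on its own job, so throughput comes from several runners working concurrently rather than from a single shared runner.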