Programming with VOE - 3.5 English

Vitis AI User Guide (UG1414)

ONNX Runtime

The Vitis AI Execution Provider (Vitis AI EP) offers hardware-accelerated AI inference with the AMD DPU. It enables you to run quantized ONNX models directly on the target board. The current Vitis AI EP inside ONNX Runtime accelerates neural network model inference on embedded devices, including Zynq UltraScale+ MPSoC, Versal, Versal AI Edge, and Kria cards.

The Vitis AI ONNX Runtime Engine (VOE) is the implementation library of the Vitis AI EP.

Figure 1. VOE Overview


  • Supports ONNX opset version 18, ONNX Runtime 1.16.0, and ONNX version 1.13
  • Provides C++ and Python APIs (Python 3 is supported)
  • Supports incorporating other execution providers, such as the ACL EP, to accelerate inference in addition to the AMD DPU and the Vitis AI EP
  • Supports computation on Arm® Cortex®-A72 (ARM64) cores; the supported target in Vitis AI 3.5 is the VEK280


Combining execution providers in this way offers the following benefits:

  • Versatility: You can deploy subgraphs on the AMD DPU while using other execution providers, such as Arm® NN and Arm® ACL, for additional operators. This flexibility enables the deployment of models that might not be directly supported on target boards.
  • Improved performance: By leveraging specialized execution providers such as the AMD DPU for specific operations and using other providers for the remaining operators, you can achieve optimized performance for your models.
  • Expanded model support: Enhancing ONNX Runtime enables the deployment of models with operators that the DPU does not natively support. By incorporating additional execution providers, you can execute many models, including those from the ONNX model zoo.
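As a rough illustration of this partitioning idea, consecutive operators can be grouped by the provider that handles them. This is a toy sketch, not the real partitioner, and the operator names in DPU_SUPPORTED are illustrative, not the DPU's actual coverage:

```python
# Toy illustration: split a model's operator sequence into segments that
# run on the DPU versus a fallback provider such as the CPU.
# DPU_SUPPORTED is an illustrative subset, not the DPU's real operator set.
DPU_SUPPORTED = {"Conv", "Relu", "MaxPool", "Add", "GlobalAveragePool"}

def partition(ops, supported=DPU_SUPPORTED):
    """Group consecutive operators by the provider that would execute them."""
    segments = []
    for op in ops:
        target = "DPU" if op in supported else "CPU"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)  # extend the current segment
        else:
            segments.append((target, [op]))  # start a new segment
    return segments
```

A graph containing an operator the DPU cannot run, such as Softmax here, still deploys: the unsupported segment simply falls back to another provider.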

Runtime Options

Vitis AI ONNX Runtime integrates a compiler responsible for compiling the model graph and weights into a micro-coded executable. This executable is deployed on the target accelerator.

During the initiation of the ONNX Runtime session, the model is compiled, and this compilation must complete before the first inference pass. Compilation time varies by model and can take a few minutes. After the model is compiled, the model executable is cached, and subsequent inference runs can use the cached executable.

Several runtime variables can be set to configure the inference session, as listed in the following table. The config_file variable is required and must point to the configuration file's location. The cacheDir and cacheKey variables are optional.

Table 1. Runtime Variables

| Runtime Variable | Default Value | Details |
|------------------|---------------|---------|
| config_file      | ""            | Required. The configuration file path. The configuration file vaip_config.json is contained in vitis_ai_2023.1-r3.5.0.tar.gz. |
| cacheDir         |               | Optional. The cache directory. |
| cacheKey         |               | Optional. The cache key, used to distinguish between different models. |

The final cache directory is {cacheDir}/{cacheKey}. In addition, environment variables can be set to customize the Vitis AI EP.
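The cacheKey comment in the deployment example below suggests using the model file's MD5 (onnx_model_md5) as the key. A minimal, hypothetical pair of helpers following that convention; the EP only requires that cacheKey distinguish models, so any stable identifier works:

```python
import hashlib

def model_cache_key(model_bytes: bytes) -> str:
    """Derive a cacheKey from the raw bytes of the .onnx file (MD5 convention)."""
    return hashlib.md5(model_bytes).hexdigest()

def cache_location(cache_dir: str, cache_key: str) -> str:
    """The final cache directory is {cacheDir}/{cacheKey}."""
    return cache_dir.rstrip("/") + "/" + cache_key
```

For example, `cache_location("/tmp/my_cache", model_cache_key(open("resnet50_pt.onnx", "rb").read()))` yields a per-model cache directory under /tmp/my_cache.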

Table 2. Environment Variables

| Environment Variable | Default Value | Details |
|----------------------|---------------|---------|
| XLNX_ENABLE_CACHE    | 1             | Whether to use the cache. If set to 0, the cached executable is ignored and the model is recompiled. |
| XLNX_CACHE_DIR       |               | Optional. Configures the cache path. |
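As a toy sketch of the cache switch described above (assuming the variable is named XLNX_ENABLE_CACHE; check your release for the exact name), the decision logic amounts to:

```python
import os

def use_cached_executable(environ=None) -> bool:
    """Return True when the cached executable should be reused.

    Mirrors the documented behavior: the cache is used unless the
    environment variable is set to 0, in which case the model is
    recompiled. The variable name is an assumption for illustration.
    """
    if environ is None:
        environ = os.environ
    return environ.get("XLNX_ENABLE_CACHE", "1") != "0"
```

Setting `XLNX_ENABLE_CACHE=0` before launching the application therefore forces a recompile on the next run.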

Install and Deploy

Vitis AI 3.5 offers over ten deployment examples based on ONNX Runtime; you can find them in the Vitis AI repository. The following steps describe how to use VOE to deploy an ONNX model:

  1. Prepare the quantized model in ONNX format. Use Vitis AI Quantizer to quantize the model and output the quantized model in ONNX format.
  2. Download the ONNX runtime package vitis_ai_2023.1-r3.5.0.tar.gz and install it on the target board.
    tar -xzvf vitis_ai_2023.1-r3.5.0.tar.gz -C /
    Then, download voe-0.1.0-py3-none-any.whl and onnxruntime_vitisai-1.16.0-py3-none-any.whl. Ensure the device is online, then install the packages:
    pip3 install voe*.whl
    pip3 install onnxruntime_vitisai*.whl
  3. Vitis AI 3.5 supports the ONNX Runtime C++ API and Python API. For details on the ONNX Runtime API, refer to the ONNX Runtime documentation. The following is an ONNX model deployment code snippet based on the C++ API:

    C++ example

    // ...
    #include <experimental_onnxruntime_cxx_api.h>
    #include <string>
    #include <unordered_map>
    #include <vector>
    // include user header files
    // ...
    std::string onnx_model_path = "resnet50_pt.onnx";
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "resnet50_pt");
    auto session_options = Ort::SessionOptions();
    auto options = std::unordered_map<std::string, std::string>({});
    options["config_file"] = "/etc/vaip_config.json";
    // optional, e.g. cache path: /tmp/my_cache/abcdefg
    options["cacheDir"] = "/tmp/my_cache";
    options["cacheKey"] = "abcdefg"; // Replace abcdefg with your model name, e.g. onnx_model_md5
    // Create an inference session using the Vitis AI execution provider
    session_options.AppendExecutionProvider("VitisAI", options);
    auto session = Ort::Experimental::Session(env, onnx_model_path, session_options);
    auto input_shapes = session.GetInputShapes();
    // preprocess input data
    // ...
    // Create input tensors and populate input data
    std::vector<Ort::Value> input_tensors;
    input_tensors.push_back(Ort::Experimental::Value::CreateTensor<float>(
        input_data.data(), input_data.size(), input_shapes[0]));
    auto output_tensors = session.Run(session.GetInputNames(), input_tensors,
                                      session.GetOutputNames());
    // postprocess output data
    // ...

    To leverage the Python APIs, use the following example for reference:

    import onnxruntime
    # Add other imports
    # ...
    # Create an inference session using the Vitis AI execution provider
    session = onnxruntime.InferenceSession(
        "resnet50_pt.onnx",
        providers=["VitisAIExecutionProvider"],
        provider_options=[{"config_file": "/etc/vaip_config.json"}])
    input_shape = session.get_inputs()[0].shape
    input_name = session.get_inputs()[0].name
    # Load inputs and do preprocessing by input_shape
    input_data = [...]
    result = session.run([], {input_name: input_data})
  4. Create a build script or copy one from the Vitis AI Library ONNX examples and modify it. Then, build the program:
    result=0 && pkg-config --list-all | grep opencv4 && result=1
    if [ $result -eq 1 ]; then
        OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv4)
    else
        OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv)
    fi
    lib_x=" -lglog -lunilog -lvitis_ai_library-xnnpp -lvitis_ai_library-model_config -lprotobuf -lxrt_core -lvart-xrt-device-handle -lvaip-core -lxcompiler-core -labsl_city -labsl_low_level_hash -lvart-dpu-controller -lxir -lvart-util -ltarget-factory -ljson-c"
    lib_onnx=" -lonnxruntime"
    lib_opencv=" -lopencv_videoio -lopencv_imgcodecs -lopencv_highgui -lopencv_imgproc -lopencv_core "
    if [[ "$CXX" == *"sysroot"* ]]; then
        inc_x="-I=/usr/include/onnxruntime -I=/install/Release/include/onnxruntime -I=/install/Release/include -I=/usr/include/xrt"
        link_x=" -L=/install/Release/lib"
    else
        inc_x=" -I/usr/include/onnxruntime -I/usr/include/xrt"
        link_x=" "
    fi
    name=$(basename $PWD)
    $CXX -O2 -fno-inline -I. \
        ${inc_x} \
        ${link_x} \
        -o ${name}_onnx -std=c++17 \
        $PWD/${name}_onnx.cpp \
        ${OPENCV_FLAGS} \
        ${lib_opencv} \
        ${lib_x} \
        ${lib_onnx}
  5. Copy the executable program and the quantized ONNX model to the target. Then, run the program.
    Note: For ONNX model deployment, the input model is the quantized ONNX model. If the environment variable WITH_XCOMPILER is on, the model is first compiled online when you run the program, which might take some time.
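The examples above stop at "postprocess output data". For a classifier such as the resnet50 model used in the snippets, a minimal postprocessing sketch could look like the following, assuming the output tensor has been flattened to a plain list of logits:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1(logits):
    """Return the index of the most likely class and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```

The returned index can then be mapped to a class label, for example through the ImageNet label list for a resnet50 model.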