Programming with VOE - 3.0 English

Vitis AI User Guide (UG1414)

Vitis AI ONNX Runtime Engine (VOE) is a new feature in Vitis AI 3.0. It allows users to run a quantized ONNX model directly on the target board. The Vitis AI execution provider (VitisAI EP) accelerates inference on the Xilinx DPU. The following figure gives an overview of VOE in Vitis AI.

Figure 1. VOE Overview

Vitis AI 3.0 provides more than 10 deployment examples based on ONNX Runtime, available among the Vitis AI Library ONNX examples. The following steps show how to use VOE to deploy an ONNX model.

  1. Prepare the quantized model in ONNX format. Use the Vitis AI quantizer to quantize the model and export the result in ONNX format.
  2. Download the ONNX runtime package vitis_ai_2022.2-r3.0.0.tar.gz and install it on the target board.
    tar -xzvf vitis_ai_2022.2-r3.0.0.tar.gz -C /
  3. Use the ONNX Runtime C++ API to create the application program. For details, refer to the ONNX Runtime API documentation. The following code snippet, based on the C++ API, shows the deployment of a segmentation model.

    C++ example

    // Create a session.
    // Select a set of execution providers (EPs), if any; here "VITISAI_EP" is selected.
    env = Ort::Env(ORT_LOGGING_LEVEL_WARNING, "Segmentation");
    session_options = Ort::SessionOptions();
    CheckStatus(OrtSessionOptionsAppendExecutionProvider_VITISAI(session_options, ""));
    std::string model_name_(model_name);
    session = std::unique_ptr<Ort::Experimental::Session>(
        new Ort::Experimental::Session(env, model_name_, session_options));
    //Do the pre-process and set the input
    cv::Mat resize_image; 
    auto height = input_shapes[0][2]; 
    auto width = input_shapes[0][3]; 
    auto size = cv::Size((int)width, (int)height); 
    cv::resize(image[0], resize_image, size); 
    if (input_tensors.size()) {
      input_tensors[0] = Ort::Experimental::Value::CreateTensor<float>(
          input_tensor_values.data(), input_tensor_values.size(), input_shapes[0]);
    } else {
      input_tensors.push_back(Ort::Experimental::Value::CreateTensor<float>(
          input_tensor_values.data(), input_tensor_values.size(), input_shapes[0]));
    }
    //Run the session
    output_tensors = session->Run(session->GetInputNames(), input_tensors, session->GetOutputNames()); 
    output_tensor_ptr[0] = output_tensors[0].GetTensorMutableData<float>(); 
    //Get the output and do the post-process
    auto oc = output_shapes[0][1]; 
    auto oh = output_shapes[0][2]; 
    auto ow = output_shapes[0][3]; 
    auto hwc = permute(output_tensor_ptr[0], oc, oh, ow); 
    cv::Mat result(oh, ow, CV_8UC1); 
    max_index_c(hwc.data(), oc, oh * ow, result.data);
  4. Create a build script as shown below, or copy one from the Vitis AI Library ONNX examples and modify it. Then, build the program.
    result=0 && pkg-config --list-all | grep opencv4 && result=1
    if [ $result -eq 1 ]; then
    	OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv4)
    else
    	OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv)
    fi
    lib_x=" -lglog -lunilog -lvitis_ai_library-xnnpp -lvitis_ai_library-model_config -lprotobuf -lxrt_core -lvart-xrt-device-handle -lvaip-core -lxcompiler-core -labsl_city -labsl_low_level_hash -lvart-dpu-controller -lxir -lvart-util -ltarget-factory -ljson-c" 
    lib_onnx=" -lonnxruntime" 
    lib_opencv=" -lopencv_videoio -lopencv_imgcodecs -lopencv_highgui -lopencv_imgproc -lopencv_core " 
    if [[ "$CXX"  == *"sysroot"* ]];then
     inc_x="-I=/usr/include/onnxruntime -I=/install/Release/include/onnxruntime -I=/install/Release/include -I=/usr/include/xrt"
     link_x="  -L=/install/Release/lib"
    else
     inc_x=" -I/usr/include/onnxruntime  -I/usr/include/xrt"
     link_x="  "
    fi
    name=$(basename $PWD)
    $CXX -O2 -fno-inline -I. \
     ${inc_x} \
     ${link_x} \
     -o ${name}_onnx -std=c++17 \
     $PWD/${name}_onnx.cpp \
     ${OPENCV_FLAGS} \
     ${lib_opencv} \
     ${lib_x} \
     ${lib_onnx}
  5. Copy the executable program and the quantized ONNX model to the target. Then, run the program.
    Note: For ONNX model deployment, the input is the quantized ONNX model. When you run the program, the model is first compiled online, which may take some time.
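
The post-processing part of the C++ snippet in step 3 calls two helpers, permute and max_index_c, whose definitions are not shown. The following is a minimal sketch of what such helpers might look like. It assumes the output tensor is laid out as CHW (channels, height, width) floats and that the result is a single-channel 8-bit class map. Both function bodies are illustrative assumptions written for this guide, not the code shipped with the Vitis AI Library examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for permute(): reorders a CHW float buffer into HWC
// so that all channel scores of one pixel are stored contiguously.
std::vector<float> permute(const float* chw, std::size_t c, std::size_t h,
                           std::size_t w) {
  assert(chw != nullptr);
  std::vector<float> hwc(c * h * w);
  for (std::size_t ci = 0; ci < c; ++ci) {
    for (std::size_t hi = 0; hi < h; ++hi) {
      for (std::size_t wi = 0; wi < w; ++wi) {
        hwc[(hi * w + wi) * c + ci] = chw[(ci * h + hi) * w + wi];
      }
    }
  }
  return hwc;
}

// Hypothetical stand-in for max_index_c(): for each of `pixels` positions in
// an HWC buffer, writes the index of the highest-scoring channel to `out`.
// The channel count must fit in 8 bits because the class map is 8-bit.
void max_index_c(const float* hwc, std::size_t c, std::size_t pixels,
                 std::uint8_t* out) {
  assert(hwc != nullptr && out != nullptr);
  assert(c > 0 && c <= 256);
  for (std::size_t p = 0; p < pixels; ++p) {
    const float* scores = hwc + p * c;
    std::size_t best = 0;
    for (std::size_t ci = 1; ci < c; ++ci) {
      if (scores[ci] > scores[best]) best = ci;
    }
    out[p] = static_cast<std::uint8_t>(best);
  }
}
```

With helpers of this shape, each byte of the cv::Mat result in step 3 holds the predicted class index of one pixel. Permuting to HWC first keeps the per-pixel channel scan contiguous in memory, which is one reasonable design choice; an argmax directly over the CHW buffer would also work.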