Developing with the ONNX Runtime Engine API - 3.0 English

Vitis AI Library User Guide (UG1354)

Document ID: UG1354
Release Date: 2023-01-12
Version: 3.0 English

This section describes how to deploy the quantized ONNX model on the Edge board.

In Vitis AI 3.0, the Vitis AI ONNX Runtime Engine (VOE) is supported, and the Vitis AI execution provider (Vitis AI EP) is provided to accelerate inference with the Xilinx DPU. Subgraphs that the DPU supports are dispatched to the DPU through the Vitis AI EP, while the remaining operators fall back to the default CPU execution provider. The following figure shows an overview of the ONNX Runtime engine in Vitis AI.

Figure 1. ONNX Runtime Overview

More than ten deployment examples based on the ONNX Runtime are provided in Vitis AI 3.0. You can find them in the samples_onnx folder. To deploy an ONNX model using Vitis AI, follow these steps:

  1. Clone the Vitis AI repository, which contains the Vitis AI Library, from https://github.com/Xilinx/Vitis-AI.
  2. Install the cross-compilation system on the host side. Refer to Installation for instructions.
  3. Prepare the quantized model in the ONNX format: use the Vitis AI Quantizer to quantize the model and export it in the ONNX format.
  4. Download the ONNX runtime package vitis_ai_2022.2-r3.0.0.tar.gz and install it on the target board.
    tar -xzvf vitis_ai_2022.2-r3.0.0.tar.gz -C /
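    After extraction, you can optionally confirm that the ONNX Runtime library is present in the target root filesystem. The library path below is an assumption and may differ on your board image:
    ls -l /usr/lib/libonnxruntime*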
  5. Use the ONNX Runtime API to create the application program. The C++ API of ONNX Runtime is supported. To learn more about the ONNX Runtime API, see the ONNX Runtime API docs. The programming flow is shown below; helper routines such as CheckStatus and the pre- and post-processing functions are defined in the sample sources (a sketch of CheckStatus follows the code).
    // Create a session.
    // Select a set of execution providers (EPs) if any; here "VITISAI_EP" is selected.
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "resnet50_pt");
    auto session_options = Ort::SessionOptions();
    session_options.EnableProfiling("profile_resnet50_pt");
    CheckStatus(OrtSessionOptionsAppendExecutionProvider_VITISAI(
        session_options, json_config.c_str()));
    auto session = Ort::Experimental::Session(env, model_name, session_options);
    
    // Do the pre-processing and set the input.
    preprocess_resnet50(g_image_files, input_tensor_values, input_shape);
    std::vector<Ort::Value> input_tensors;
    input_tensors.push_back(Ort::Experimental::Value::CreateTensor<float>(
        input_tensor_values.data(), input_tensor_values.size(), input_shape));
    
    // Run the session.
    auto output_tensors = session.Run(
        session.GetInputNames(), input_tensors, session.GetOutputNames());
    
    // Get the output and do the post-processing.
    auto output_shape = output_tensors[0].GetTensorTypeAndShapeInfo().GetShape();
    postprocess_resnet50(g_image_files, output_tensors[0]);
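    The snippet above relies on helpers defined in the sample sources: json_config, model_name, preprocess_resnet50, postprocess_resnet50, and CheckStatus. As a minimal sketch, a CheckStatus that reports and aborts on an ONNX Runtime error could look like this (illustrative, not the sample's exact code):
    #include <onnxruntime_cxx_api.h>
    #include <cstdlib>
    #include <iostream>
    
    // Print a readable message and abort if an ONNX Runtime C API call failed.
    static void CheckStatus(OrtStatus* status) {
      if (status != nullptr) {
        std::cerr << "ONNX Runtime error: "
                  << Ort::GetApi().GetErrorMessage(status) << std::endl;
        Ort::GetApi().ReleaseStatus(status);
        std::exit(1);
      }
    }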
    
  6. Create a build.sh file as shown below, or copy one from the Vitis AI Library ONNX examples and modify it.
    # Use OpenCV 4 if pkg-config knows about it; otherwise fall back to the
    # legacy "opencv" package.
    result=0 && pkg-config --list-all | grep opencv4 && result=1
    if [ $result -eq 1 ]; then
    	OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv4)
    else
    	OPENCV_FLAGS=$(pkg-config --cflags --libs-only-L opencv)
    fi
    
    # Libraries for the Vitis AI Library stack, ONNX Runtime, and OpenCV.
    lib_x=" -lglog -lunilog -lvitis_ai_library-xnnpp -lvitis_ai_library-model_config -lprotobuf -lxrt_core -lvart-xrt-device-handle -lvaip-core -lxcompiler-core -labsl_city -labsl_low_level_hash -lvart-dpu-controller -lxir -lvart-util -ltarget-factory -ljson-c"
    lib_onnx=" -lonnxruntime"
    lib_opencv=" -lopencv_videoio -lopencv_imgcodecs -lopencv_highgui -lopencv_imgproc -lopencv_core "
    # Paths beginning with '=' are resolved relative to the cross-compiler sysroot.
    inc_x=" -I=/usr/include/onnxruntime -I=/install/Release/include/onnxruntime -I=/install/Release/include -I=/usr/include/xrt "
    link_x=" -L=/install/Release/lib"
    
    # The sample directory name is used as the program name.
    name=$(basename $PWD)
    
    CXX=${CXX:-g++}
    $CXX -O2 -fno-inline -I. \
     ${inc_x} \
     ${link_x} \
     -o ${name}_onnx -std=c++17 \
     $PWD/${name}_onnx.cpp \
     ${OPENCV_FLAGS} \
     ${lib_opencv} \
     ${lib_x} \
     ${lib_onnx}
  7. Cross-compile the program.
    sh -x build.sh
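    If the compiler or sysroot cannot be found, make sure the cross-compilation environment from step 2 is active in the current shell first. For example (the SDK installation path is an assumption and depends on where you installed it):
    source ~/petalinux_sdk_2022.2/environment-setup-cortexa72-cortexa53-xilinx-linux
    sh -x build.sh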
  8. Copy the executable program and the quantized ONNX model to the target board using the scp command.
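    For example (the board address and file names are illustrative):
    scp resnet50_onnx resnet50_quantized.onnx root@<board_ip>:~/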
  9. Execute the program on the target board. Before running it, ensure that the Vitis AI Library is installed on the target board, and prepare the images you want to test.
    ./resnet50_onnx <Onnx model> <image>
    Note: For ONNX model deployment, the input model is the quantized ONNX model. The model is compiled online when you run the program, so the first run takes additional time.