The overall model quantization flow is detailed in the following figure.
The Vitis AI quantizer takes a floating-point model as input (a frozen GraphDef file for the TensorFlow version, or prototxt and caffemodel files for the Caffe version), performs pre-processing (folding batch normalization layers and removing nodes not needed for inference), and then quantizes the weights/biases and activations to the given bit width.
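To illustrate the core idea of reducing weights to a given bit width, the sketch below shows a simple symmetric post-training quantization of a weight tensor. This is a minimal, generic example for intuition only; it is not the Vitis AI quantizer's actual algorithm, and the sample weight values are made up.

```python
import numpy as np

def quantize_symmetric(x, bit_width=8):
    """Map a float array onto signed integers of the given bit width
    using one symmetric scale factor (illustrative sketch only)."""
    qmax = 2 ** (bit_width - 1) - 1            # e.g. 127 for 8-bit
    scale = float(np.max(np.abs(x))) / qmax    # one scale for the tensor
    if scale == 0.0:
        scale = 1.0                            # guard against all-zero input
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int64)
    return q, scale

# Example weight values (made up for illustration)
weights = np.array([-0.52, 0.03, 0.41, 0.99])
q, scale = quantize_symmetric(weights)
dequantized = q * scale                        # approximate reconstruction
```

The dequantized values differ from the originals by at most half a quantization step, which is why low-bit inference can stay close to floating-point accuracy.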
To capture activation statistics and improve the accuracy of the quantized model, the Vitis AI quantizer runs several iterations of inference to calibrate the activations. A calibration image dataset is therefore required as input. In general, the quantizer works well with 100 to 1,000 calibration images. Because calibration does not involve back-propagation, an unlabeled dataset is sufficient.
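The calibration data is typically supplied through a user-written input function that the quantizer calls once per calibration iteration and that returns a dict mapping each graph input node name to a numpy batch. The sketch below shows the shape of such a function; the node name "input", the 224x224x3 shape, the batch size, and the random data are placeholder assumptions standing in for a real image-loading pipeline.

```python
import numpy as np

BATCH_SIZE = 10  # assumed batch size; tune for your model and memory

def calib_input(iter):
    # In a real flow, load and preprocess BATCH_SIZE unlabeled images here,
    # e.g. the slice images[iter * BATCH_SIZE : (iter + 1) * BATCH_SIZE].
    # Random data is used only so this sketch is self-contained.
    batch = np.random.rand(BATCH_SIZE, 224, 224, 3).astype(np.float32)
    # Keys must match the model's input node names ("input" is assumed here).
    return {"input": batch}
```

Since no labels are returned, any representative set of unlabeled images from the deployment domain can serve as calibration data.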
After calibration, the quantized model is transformed into a DPU-deployable model (named deploy_model.pb for vai_q_tensorflow, or deploy.prototxt/deploy.caffemodel for vai_q_caffe), which follows the data format of the DPU. This model can then be compiled by the Vitis AI compiler and deployed to the DPU. Note that the quantized model cannot be loaded by the standard versions of the Caffe or TensorFlow frameworks.