Quantizing the Model Using vai_q_tensorflow - 3.5 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

Run the following command to quantize the model:

$ vai_q_tensorflow quantize \
				--input_frozen_graph  frozen_graph.pb \
				--input_nodes    ${input_nodes} \
				--input_shapes   ${input_shapes} \
				--output_nodes   ${output_nodes} \
				--input_fn  input_fn \
				[options]

The input_nodes and output_nodes arguments are the name lists of the input nodes and output nodes of the quantize graph. They serve as the start and end points of quantization. The main graph between them is quantized if it is quantizable, as shown in the following figure.

Figure 1. Quantization Flow for TensorFlow

It is recommended to set --input_nodes to the last nodes of the pre-processing part and --output_nodes to the last nodes of the main graph, because some operations in the pre- and post-processing parts are not quantizable. Including them in the quantize graph might cause errors when the model is compiled by the Vitis AI compiler and deployed to the DPU.
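As a rough illustration of how these arguments fit together, the following Python sketch assembles the command line shown above as an argument list without running it. The node names, the shape string, and the module name my_input_fn are hypothetical placeholders. Joining multiple node names with commas, and passing the options as --weight_bit, --activation_bit, and --method, are assumptions about the command-line syntax rather than something this section states.

```python
import shlex


def build_quantize_command(frozen_graph, input_nodes, input_shapes,
                           output_nodes, input_fn, **options):
    """Assemble a vai_q_tensorflow argument list (nothing is executed)."""
    cmd = [
        "vai_q_tensorflow", "quantize",
        "--input_frozen_graph", frozen_graph,
        # Assumption: multiple node names are passed as a comma-separated list.
        "--input_nodes", ",".join(input_nodes),
        "--input_shapes", input_shapes,
        "--output_nodes", ",".join(output_nodes),
        "--input_fn", input_fn,
    ]
    # Optional parameters, for example weight_bit=8 or method=1.
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_quantize_command(
    frozen_graph="frozen_graph.pb",
    input_nodes=["preprocess/output"],   # hypothetical: last node of pre-processing
    input_shapes="?,224,224,3",          # hypothetical shape string
    output_nodes=["logits/BiasAdd"],     # hypothetical: last node of the main graph
    input_fn="my_input_fn.input_fn",     # hypothetical module and function name
    weight_bit=8,
    activation_bit=8,
    method=1,
)
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run in an environment where vai_q_tensorflow is installed.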

The input nodes might not be the same as the placeholder nodes of the graph. The placeholder nodes should be set as input nodes if the frozen graph does not contain in-graph pre-processing.

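The input_fn function supplies the calibration data that is fed to the placeholder nodes, so the names and shapes it returns should be consistent with those nodes.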
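A minimal sketch of such a function is shown below. It assumes that input_fn takes the calibration step number as an integer and returns a dictionary that maps each placeholder node name to a NumPy array; check the tool's option reference for the exact contract. The placeholder name, batch size, and image shape are hypothetical, and random data stands in for real calibration images.

```python
import numpy as np

# Hypothetical values: replace them with the placeholder name and shape of your graph.
PLACEHOLDER_NAME = "input"
BATCH_SIZE = 8
IMAGE_SHAPE = (224, 224, 3)


def input_fn(step):
    """Return one calibration batch for the given step as {placeholder_name: array}."""
    # Stand-in data: a real input_fn would load calibration images here and apply
    # the same pre-processing that the model expects at its placeholder nodes.
    rng = np.random.default_rng(seed=step)
    batch = rng.random((BATCH_SIZE, *IMAGE_SHAPE), dtype=np.float32)
    return {PLACEHOLDER_NAME: batch}
```

The key of the returned dictionary must match the placeholder node name, and the array shape must match what that placeholder accepts. This is what "consistent with the placeholder nodes" amounts to in practice.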

[options] stands for optional parameters. The most commonly used options are:

weight_bit
Bit width for quantized weight and bias (the default value is 8).
activation_bit
Bit width for quantized activation (the default value is 8).
method
Quantization method: 0 for non-overflow, 1 for min-diffs, and 2 for min-diffs with normalization. The non-overflow method ensures that no values are saturated during quantization, but its results can be affected by outliers. The min-diffs method allows saturation during quantization to achieve a lower quantization difference. It is more robust to outliers and usually results in a narrower range than the non-overflow method.
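The difference between the non-overflow and min-diffs methods can be illustrated with a small Python sketch. It is not the tool's implementation. It assumes a signed fixed-point format whose step size is a power of two, and it takes "quantization difference" to mean the sum of squared errors; all function names are hypothetical. The min-diffs with normalization method is not covered.

```python
def int_range(bit_width):
    """Smallest and largest integers representable in a signed bit_width format."""
    return -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1


def quantize(values, frac_bits, bit_width=8):
    """Round to a grid with step 2**-frac_bits and saturate at the range limits."""
    scale = 2.0 ** frac_bits
    lo, hi = int_range(bit_width)
    return [min(max(round(v * scale), lo), hi) / scale for v in values]


def squared_error(values, frac_bits, bit_width=8):
    """Sum of squared differences between the float and quantized values."""
    quantized = quantize(values, frac_bits, bit_width)
    return sum((v - q) ** 2 for v, q in zip(values, quantized))


def non_overflow_frac_bits(values, bit_width=8):
    """Finest step (largest frac_bits) at which no value saturates."""
    lo, hi = int_range(bit_width)
    for frac_bits in range(32, -33, -1):
        scale = 2.0 ** frac_bits
        if all(lo <= round(v * scale) <= hi for v in values):
            return frac_bits
    raise ValueError("values do not fit in the searched range")


def min_diffs_frac_bits(values, bit_width=8, extra=4):
    """Allow saturation: try finer steps and keep the one with the lowest error."""
    base = non_overflow_frac_bits(values, bit_width)
    candidates = range(base, base + extra + 1)
    return min(candidates, key=lambda f: squared_error(values, f, bit_width))


# Toy data: many small values plus a single outlier at 2.0.
values = [i / 200.0 for i in range(-100, 101)] + [2.0]
no_f = non_overflow_frac_bits(values)
md_f = min_diffs_frac_bits(values)
print("non-overflow:", no_f, "error:", squared_error(values, no_f))
print("min-diffs:   ", md_f, "error:", squared_error(values, md_f))
```

In this toy data the outlier alone determines the non-overflow range, so the small values are represented with a coarser step. The min-diffs search accepts a slightly saturated outlier in exchange for a finer step and a narrower range, which lowers the total error.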