vai_q_caffe Usage - 2.0 English

Vitis AI User Guide (UG1414)

The vai_q_caffe quantizer takes a floating-point model as input and uses a calibration dataset to generate a quantized model. In the following command line, [options] stands for optional parameters.

vai_q_caffe quantize -model float.prototxt -weights float.caffemodel [options]

The options supported by vai_q_caffe are shown in the following table. The three most commonly used options are weights_bit, data_bit, and method.
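As a rough sketch, the three commonly used options can be spelled out on the command line as shown here. The values are the defaults listed in Table 1, so this invocation should behave like the bare command without [options]. The `run` helper and the `RUN` environment variable are hypothetical conveniences of this sketch (they print the command and execute it only on request); they are not part of vai_q_caffe.

```shell
#!/bin/sh
# Sketch: pass weights_bit, data_bit, and method explicitly.
set -eu

WEIGHTS_BIT=8   # bit width for quantized weights and biases
DATA_BIT=8      # bit width for quantized activations
METHOD=1        # 0 = non-overflow, 1 = min-diffs

# Hypothetical helper: print the command, and execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

run vai_q_caffe quantize \
    -model float.prototxt \
    -weights float.caffemodel \
    -weights_bit "$WEIGHTS_BIT" \
    -data_bit "$DATA_BIT" \
    -method "$METHOD"
```

Running the script as is only prints the command; setting `RUN=1` in the environment executes it.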

Table 1. vai_q_caffe Options List

| Name | Type | Required/Optional | Default | Description |
|---|---|---|---|---|
| model | String | Required | - | Floating-point prototxt file (such as float.prototxt). |
| weights | String | Required | - | The pre-trained floating-point weights (such as float.caffemodel). |
| weights_bit | Int32 | Optional | 8 | Bit width for quantized weight and bias. |
| data_bit | Int32 | Optional | 8 | Bit width for quantized activation. |
| method | Int32 | Optional | 1 | Quantization method: 0 for non-overflow, 1 for min-diffs. The non-overflow method ensures that no values are saturated during quantization; it is sensitive to outliers. The min-diffs method allows saturation during quantization to achieve a lower quantization difference; it is more robust to outliers and usually results in a narrower range than the non-overflow method. |
| calib_iter | Int32 | Optional | 100 | Maximum iterations for calibration. |
| auto_test | - | Optional | Absent | Adding this option performs testing after calibration, using the test dataset specified in the prototxt file. To turn on this option, the floating-point prototxt file must be a workable prototxt for accuracy calculation in both TRAIN and TEST mode. |
| test_iter | Int32 | Optional | 50 | Maximum iterations for testing. |
| output_dir | String | Optional | quantize_results | Output directory for the quantized results. |
| gpu | String | Optional | 0 | GPU device ID for calibration and test. |
| ignore_layers | String | Optional | none | List of layers to ignore during quantization. |
| ignore_layers_file | String | Optional | none | Protobuf file that defines the layers to ignore during quantization, starting with ignore_layers. |
| sigmoided_layers | String | Optional | none | List of layers that precede a sigmoid operation; these are quantized with optimization for sigmoid accuracy. |
| input_blob | String | Optional | data | Name of the input data blob. |
| keep_fixed_neuron | Bool | Optional | FALSE | Retain FixedNeuron layers in the deployed model. Set this flag if your target hardware platform is DPUCAHX8H. |
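Because the two quantization methods trade saturation against range differently, it can be useful to run both and compare the reported accuracy. The sketch below is one possible way to do that, using only options from the table above. The output directory names are hypothetical, and the `run` helper with its `RUN` switch is a convenience of this sketch rather than a vai_q_caffe feature. It assumes float.prototxt is usable in both TRAIN and TEST mode, as auto_test requires.

```shell
#!/bin/sh
# Sketch: quantize once per method into separate output directories.
set -eu

# Hypothetical helper: print the command, and execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

quantize_with_method() {
    method=$1
    # Hypothetical naming scheme so the two runs do not overwrite each other.
    out_dir="quantize_results_method_${method}"
    run vai_q_caffe quantize \
        -model float.prototxt \
        -weights float.caffemodel \
        -method "$method" \
        -calib_iter 100 \
        -auto_test \
        -test_iter 50 \
        -output_dir "$out_dir" \
        -gpu 0
}

# 0 = non-overflow, 1 = min-diffs
for m in 0 1; do
    quantize_with_method "$m"
done
```

Keeping the calibration and test iteration counts identical across the two runs keeps the comparison limited to the method itself.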


Quantize a Floating-Point Model
vai_q_caffe quantize -model float.prototxt -weights float.caffemodel -gpu 0
Quantize with Auto-Test
vai_q_caffe quantize -model float.prototxt -weights float.caffemodel -gpu 0 -auto_test -test_iter 50
Quantize with the Non-Overflow Method
vai_q_caffe quantize -model float.prototxt -weights float.caffemodel -gpu 0 -method 0
Finetune the Quantized Model
vai_q_caffe finetune -solver solver.prototxt -weights quantize_results/float_train_test.caffemodel -gpu 0
Deploy the Quantized Model
vai_q_caffe deploy -model quantize_results/quantize_train_test.prototxt -weights quantize_results/float_train_test.caffemodel -gpu 0
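The quantize and deploy commands above can be chained into a single script. The following is a minimal sketch, assuming the quantize step writes the two files that the deploy example reads from the output directory. The `run` and `require_file` helpers and the `RUN` switch are hypothetical additions of this sketch. The optional finetune step is left out because it needs a user-provided solver.prototxt.

```shell
#!/bin/sh
# Sketch: quantize, check the expected outputs, then deploy.
set -eu

OUT_DIR=quantize_results   # default output_dir from the options table

# Hypothetical helper: print the command, and execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: fail if a file is missing, but only when the
# commands are really executed (a dry run produces no files).
require_file() {
    if [ "${RUN:-0}" = "1" ] && [ ! -f "$1" ]; then
        echo "missing expected file: $1" >&2
        return 1
    fi
}

run vai_q_caffe quantize \
    -model float.prototxt \
    -weights float.caffemodel \
    -output_dir "$OUT_DIR" \
    -gpu 0

require_file "$OUT_DIR/quantize_train_test.prototxt"
require_file "$OUT_DIR/float_train_test.caffemodel"

run vai_q_caffe deploy \
    -model "$OUT_DIR/quantize_train_test.prototxt" \
    -weights "$OUT_DIR/float_train_test.caffemodel" \
    -gpu 0
```

With `set -e`, a failed quantize step or a missing output file stops the script before deploy is attempted.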