The vai_q_caffe quantizer takes a floating-point model as input and uses a calibration dataset to generate a quantized model. In the following command line, [options] stands for optional parameters.
vai_q_caffe quantize -model float.prototxt -weights float.caffemodel [options]
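As a rough illustration, the command line above can be assembled from a script. The following Python sketch only builds and prints the argument list; it does not run the tool. The helper name `build_quantize_command` is hypothetical, and the way options are passed (`-name value`, with a bare `-name` for Bool options) is an assumption modeled on the `-model` and `-weights` arguments shown above.

```python
import shlex


def build_quantize_command(model, weights, **options):
    """Build the argv list for a vai_q_caffe quantize call (hypothetical helper)."""
    argv = ["vai_q_caffe", "quantize", "-model", model, "-weights", weights]
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: a Bool option is enabled by passing the bare flag.
            if value:
                argv.append(f"-{name}")
        else:
            # Assumption: other options follow the same "-name value" form
            # as -model and -weights.
            argv.extend([f"-{name}", str(value)])
    return argv


argv = build_quantize_command(
    "float.prototxt",
    "float.caffemodel",
    weights_bit=8,
    data_bit=8,
    method=1,
    calib_iter=100,
    output_dir="quantize_results",
)
# Print a shell-ready command line; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if a path contains spaces.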
The options supported by vai_q_caffe are shown in the following table. The three most commonly used options are weights_bit, data_bit, and method.
|Name||Type||Required/Optional||Default||Description|
|model||String||Required||-||Floating-point prototxt file.|
|weights||String||Required||-||The pre-trained floating-point weights (such as float.caffemodel).|
|weights_bit||Int32||Optional||8||Bit width for quantized weight and bias.|
|data_bit||Int32||Optional||8||Bit width for quantized activation.|
The method option selects the quantization method: 0 for non-overflow, 1 for min-diffs.
The non-overflow method ensures that no values saturate during quantization. Because the quantization range must cover every value, it is sensitive to outliers.
The min-diffs method allows some values to saturate in order to achieve a lower quantization difference. It is more robust to outliers and usually results in a narrower range than the non-overflow method.
|calib_iter||Int32||Optional||100||Maximum iterations for calibration.|
|auto_test||Bool||Optional||FALSE||Run a test after calibration. Requires a test dataset.|
|test_iter||Int32||Optional||50||Maximum iterations for testing.|
|output_dir||String||Optional||quantize_results||Output directory for the quantized results.|
|gpu||String||Optional||0||GPU device ID for calibration and test.|
|ignore_layers||String||Optional||none||List of layers to ignore during quantization.|
|ignore_layers_file||String||Optional||none||Protobuf file that defines the layers to ignore during quantization, starting with ignore_layers.|
|sigmoided_layers||String||Optional||none||List of layers before the sigmoid operation, to be quantized with optimization for sigmoid accuracy.|
|input_blob||String||Optional||data||Name of the input data blob.|
|keep_fixed_neuron||Bool||Optional||FALSE||Retain FixedNeuron layers in the deployed model. Set this flag if your target hardware platform is DPUv3.|
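The difference between the two values of the method option can be illustrated with a small, self-contained sketch. This is not the vai_q_caffe implementation, which is not described here. It assumes signed fixed-point quantization with a power-of-two scale, and it assumes that "quantization difference" means the sum of absolute differences between the original and quantized values. All function names are hypothetical.

```python
def int_range(bit_width):
    """Smallest and largest signed integers for the given bit width."""
    return -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1


def quantize(values, frac_bits, bit_width=8):
    """Round to fixed point with frac_bits fractional bits, saturating at the limits."""
    lo, hi = int_range(bit_width)
    scale = 2.0 ** frac_bits
    return [min(max(round(v * scale), lo), hi) / scale for v in values]


def total_abs_diff(values, frac_bits, bit_width=8):
    """Sum of absolute differences between original and quantized values."""
    quantized = quantize(values, frac_bits, bit_width)
    return sum(abs(v - q) for v, q in zip(values, quantized))


def non_overflow_frac_bits(values, bit_width=8, candidates=range(-8, 16)):
    """Finest scale at which no value saturates (non-overflow style)."""
    lo, hi = int_range(bit_width)
    fitting = [
        f for f in candidates
        if all(lo <= round(v * 2.0 ** f) <= hi for v in values)
    ]
    if not fitting:
        raise ValueError("no candidate scale avoids saturation")
    return max(fitting)


def min_diffs_frac_bits(values, bit_width=8, candidates=range(-8, 16)):
    """Scale with the lowest total difference, saturation allowed (min-diffs style)."""
    return min(candidates, key=lambda f: total_abs_diff(values, f, bit_width))


# Many small values plus a single large outlier.
values = [i / 1000 for i in range(-500, 501)] + [12.0]

f_no = non_overflow_frac_bits(values)
f_md = min_diffs_frac_bits(values)
hi = int_range(8)[1]

for label, f in (("non-overflow", f_no), ("min-diffs", f_md)):
    print(label, "fractional bits:", f,
          "largest representable value:", hi / 2.0 ** f,
          "total difference:", total_abs_diff(values, f))
```

In this example the single outlier forces the non-overflow choice to use a coarse step for every value, while the min-diffs choice saturates the outlier and represents the remaining values more finely, which is the narrower range described for that method.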