vitis_quantize.VitisQuantizer.quantize_model

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

This function performs post-training quantization (PTQ) of the float model, including model optimization, weight quantization, and activation calibration.


vitis_quantize.VitisQuantizer.quantize_model(
    calib_dataset=None,
    calib_batch_size=None,
    calib_steps=None,
    verbose=0,
    add_shape_info=False,
    **kwargs)

Arguments

calib_dataset
A tf.data.Dataset, keras.utils.Sequence, or numpy.ndarray object. It is the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or other datasets as calib_dataset.
calib_steps
Int. The total number of steps for calibration. Ignored with the default value of None. If calib_dataset is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size
Int. The number of samples per batch for calibration. If calib_dataset is a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy.array object, the default batch size is 32.
verbose
Int. The verbosity of the logging. A greater value generates more detailed logging. The default value is 0.
add_shape_info
Bool. Indicates whether to add shape inference information for custom layers. It must be set to True for models with custom layers.
**kwargs
Dict. The user-defined configurations of the quantize strategy. It overrides the default built-in quantize strategy. Detailed user-defined configurations are listed below.
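Putting the arguments together, a typical call might look like the following sketch. The import path follows standard Vitis AI usage; `float_model` and the calibration file are assumed placeholders, and running the snippet requires a Vitis AI installation.

```python
import numpy as np
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# `float_model` is an assumed, already-loaded float Keras model, e.g.:
# float_model = tf.keras.models.load_model("float_model.h5")

# Hypothetical calibration data: an array of preprocessed input samples.
calib_images = np.load("calib_images.npy")

quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(
    calib_dataset=calib_images,
    calib_batch_size=32,       # default for numpy array inputs
    verbose=1,
    # quantize-strategy overrides passed through **kwargs:
    bias_bit=16,               # quantize all biases with 16-bit quantizers
    include_fast_ft=True,      # layer-by-layer fast fine-tuning
    fast_ft_epochs=10)
```

The keyword overrides after `verbose` are forwarded through **kwargs and replace the corresponding entries of the built-in quantize strategy, as described in the next section.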

Arguments in **kwargs

**kwargs in this API is a dict of the user-defined configurations of the quantize strategy. It overrides the default built-in quantize strategy. For example, setting bias_bit=16 lets the tool quantize all the biases with 16-bit quantizers. The following are the detailed user-defined configurations:

separate_conv_act
A bool object, whether to separate activation functions from the Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers. The default value is True.
fold_conv_bn
A bool object, whether to fold the batch norm layers into previous Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers.
convert_bn_to_dwconv
Named fold_bn in Vitis AI 2.0 and previous versions.
A bool object, whether to convert the standalone BatchNormalization layer into DepthwiseConv2D layers.
convert_sigmoid_to_hard_sigmoid
Named replace_sigmoid in Vitis AI 2.0 and previous versions.
A bool object, whether to replace the Activation(activation='sigmoid') and Sigmoid layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and scheduled on the CPU.
convert_relu_to_relu6
Named replace_relu6 in Vitis AI 2.0 and previous versions.
A bool object, whether to replace the ReLU6 layers with ReLU layers.
include_cle
A bool object, whether to implement Cross-Layer Equalization before quantization.
cle_steps
An int object, the number of iteration steps for Cross-Layer Equalization.
cle_to_relu6
Named forced_cle in Vitis AI 2.0 and previous versions.
A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
include_fast_ft
A bool object, whether to perform fast fine-tuning. Fast fine-tuning adjusts the weights layer by layer with the calibration dataset and can achieve better accuracy for some models. It is turned off by default because it takes longer than normal PTQ (though still much less time than QAT, as calib_dataset is much smaller than the training dataset). Turn it on if you encounter accuracy issues.
fast_ft_epochs
An int object, the iteration epochs to do fast fine-tuning for each layer.
output_format
A string object, indicating the format in which to save the quantized model. Options are: '' to skip saving, 'h5' to save an .h5 file, 'tf' to save a SavedModel directory, and 'onnx' to save an .onnx file. The default value is ''.
onnx_opset_version
An int object, the ONNX opset version. Takes effect only when output_format is 'onnx'. The default value is 11.
output_dir
A string object, indicating the directory in which to save the quantized model. The default value is './quantize_results'.
convert_datatype
A string object, which indicates the target data type for the float model. Options are 'float16', 'bfloat16', 'float32', and 'float64'. The default value is 'float16'.
input_layers
A list(string) object, names of the start layers to be quantized. Layers before these layers in the model will not be optimized or quantized. For example, this argument can skip some pre-processing layers or stop quantizing the first layer. The default value is [].
output_layers
A list(string) object, names of the end layers to be quantized. Layers after these layers in the model will not be optimized or quantized. For example, this argument can skip some post-processing layers or stop quantizing the last layer. The default value is [].
ignore_layers
A list(string) object, names of the layers to be ignored during quantization. For example, this argument can be used to skip quantizing some sensitive layers to improve accuracy. The default value is [].
input_bit
An int object, the bit width of all inputs. The default value is 8.
input_method
An int object, the method to calculate scale factors in quantizing all inputs. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, and 3 for Percentile. The default value is 0.
input_symmetry
A bool object, whether to do symmetry or asymmetry quantization for all inputs. The default value is True.
input_per_channel
A bool object, whether to do per-channel or per-tensor quantization for all inputs. The default value is False.
input_round_mode
An int object, the rounding mode used to quantize all inputs. Options are 0 for HALF_TO_EVEN, 1 for HALF_UP, and 2 for HALF_AWAY_FROM_ZERO. The default value is 1.
input_unsigned
A bool object, whether to use unsigned integer quantization for all inputs. It is usually used for non-negative numeric inputs (ranging from 0 to 1). The default value is False.
weight_bit
An int object, the bit width of all weights. The default value is 8.
weight_method
An int object, the method to calculate scale factors in quantizing all weights. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, and 3 for Percentile. The default value is 1.
weight_symmetry
A bool object, whether to do symmetry or asymmetry quantization for all weights. The default value is True.
weight_per_channel
A bool object, whether to do per-channel or per-tensor quantization for all weights. The default value is False.
weight_round_mode
An int object, the rounding mode used to quantize all weights. Options are 0 for HALF_TO_EVEN, 1 for HALF_UP, and 2 for HALF_AWAY_FROM_ZERO. The default value is 0.
weight_unsigned
A bool object, whether to use unsigned integer quantization for all weights. It is usually used when weight_symmetry is false. The default value is False.
bias_bit
An int object, the bit width of all biases. The default value is 8.
bias_method
An int object, the method to calculate scale factors in quantizing all biases. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, and 3 for Percentile. The default value is 0.
bias_symmetry
A bool object, whether to do symmetry or asymmetry quantization for all biases. The default value is True.
bias_per_channel
A bool object, whether to do per-channel or per-tensor quantization for all biases. The default value is False.
bias_round_mode
An int object, the rounding mode used to quantize all biases. Options are 0 for HALF_TO_EVEN, 1 for HALF_UP, and 2 for HALF_AWAY_FROM_ZERO. The default value is 0.
bias_unsigned
A bool object, whether to use unsigned integer quantization for all biases. It is usually used when bias_symmetry is false. The default value is False.
activation_bit
An int object, the bit width of all activations. The default value is 8.
activation_method
An int object, the method to calculate scale factors in quantizing all activations. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, and 3 for Percentile. The default value is 1.
activation_symmetry
A bool object, whether to do symmetry or asymmetry quantization for all activations. The default value is True.
activation_per_channel
A bool object, whether to do per-channel or per-tensor quantization for all activations. The default value is False.
activation_round_mode
An int object, the rounding mode used to quantize all activations. Options are 0 for HALF_TO_EVEN, 1 for HALF_UP, and 2 for HALF_AWAY_FROM_ZERO. The default value is 1.
activation_unsigned
A bool object, whether to use unsigned integer quantization for all activations. It is usually used for non-negative numeric activations (such as ReLU or ReLU6) when activation_symmetry is true. The default value is False.
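The method, bit-width, and rounding-mode parameters above interact. The following pure-Python sketch is illustrative only (not Vitis AI source code): it shows how the Non_Overflow method (0) might pick a scale for signed 8-bit values, and how the three rounding modes differ. All helper names are ours, not part of the API.

```python
import math

def round_half_to_even(x):
    # Mode 0: HALF_TO_EVEN ("banker's rounding"); Python's round() does this.
    return round(x)

def round_half_up(x):
    # Mode 1: HALF_UP; ties round toward positive infinity.
    return math.floor(x + 0.5)

def round_half_away_from_zero(x):
    # Mode 2: HALF_AWAY_FROM_ZERO; ties round away from zero.
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def quantize(values, bit=8, rounding=round_half_to_even):
    # Non_Overflow (method 0): choose the scale so the largest magnitude
    # maps inside the signed integer range without clipping.
    qmax = 2 ** (bit - 1) - 1                    # 127 for 8-bit signed
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    return [max(-qmax - 1, min(qmax, rounding(v / scale))) for v in values]

weights = [-1.0, -0.3, 0.0, 0.42, 1.0]
print(quantize(weights))                         # → [-127, -38, 0, 53, 127]
```

The three modes only diverge on exact ties: 2.5 becomes 2 under HALF_TO_EVEN but 3 under the other two, and -2.5 becomes -2 under HALF_UP but -3 under HALF_AWAY_FROM_ZERO.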