vitis_quantize.VitisQuantizer.quantize_model

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2022-06-15
Version: 2.5

This function performs post-training quantization (PTQ) of the float model, including model optimization, weight quantization, and activation quantization calibration. It returns the quantized model.


vitis_quantize.VitisQuantizer.quantize_model(
    calib_dataset=None,
    calib_batch_size=None,
    calib_steps=None,
    verbose=0,
    add_shape_info=False,
    **kwargs)

Arguments

calib_dataset
A tf.data.Dataset, keras.utils.Sequence, or np.ndarray object, the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or another dataset as calib_dataset.
calib_steps
An int object, the total number of calibration steps. Defaults to None. If calib_dataset is a tf.data dataset, a generator, or a keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size
An int object, the number of samples per calibration batch. If calib_dataset is a dataset, a generator, or a keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a NumPy array, the batch size defaults to 32.
verbose
An int object, the logging verbosity. Larger values produce more detailed logs. Defaults to 0.
add_shape_info
A bool object, whether to add shape inference information for custom layers. Must be set to True for models with custom layers. Defaults to False.
**kwargs
A dict object, user-defined configurations of the quantize strategy that override the default built-in quantize strategy. The available configurations are listed below.
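The calibration rules for calib_dataset, calib_steps, and calib_batch_size can be illustrated with a small pure-Python sketch. It is not the quantizer's internal code: the helpers iter_calib_batches and calibrate_max_abs are hypothetical, a plain Python list stands in for a NumPy array, and a running maximum of |output| stands in for whatever statistics the real calibration collects.

```python
import itertools

def iter_calib_batches(calib_dataset, calib_batch_size=None, calib_steps=None):
    """Hypothetical helper that yields calibration batches following the rules above."""
    if isinstance(calib_dataset, list):
        # Array input (a list stands in for np.ndarray): split into fixed-size batches.
        if calib_steps is not None:
            raise ValueError("calib_steps is not supported with array inputs")
        batch_size = calib_batch_size or 32  # documented default for array inputs
        for start in range(0, len(calib_dataset), batch_size):
            yield calib_dataset[start:start + batch_size]
    else:
        # Dataset or generator input: it is already batched; stop after calib_steps
        # batches, or run until it is exhausted when calib_steps is None.
        yield from itertools.islice(calib_dataset, calib_steps)

def calibrate_max_abs(model_fn, calib_dataset, **batch_kwargs):
    """Run model_fn on every calibration sample and track the largest |output|."""
    max_abs = 0.0
    for batch in iter_calib_batches(calib_dataset, **batch_kwargs):
        for sample in batch:
            max_abs = max(max_abs, abs(model_fn(sample)))
    return max_abs

samples = [float(i) for i in range(100)]
print([len(b) for b in iter_calib_batches(samples)])  # [32, 32, 32, 4]
batched_gen = ([x, -x] for x in range(10))
print(len(list(iter_calib_batches(batched_gen, calib_steps=3))))  # 3
print(calibrate_max_abs(lambda x: 2 * x, samples))  # 198.0
```

With 100 array samples and no calib_batch_size, the sketch produces batches of 32, 32, 32, and 4 samples; with an already-batched generator, calib_steps=3 stops after three batches.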

Arguments in **kwargs

**kwargs in this API is a dict of user-defined quantize strategy configurations that override the default built-in quantize strategy. For example, setting bias_bit=16 makes the tool quantize all biases with 16-bit quantizers. The available configurations are listed below.
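A minimal sketch of the override mechanism: the strategy overrides are collected in a plain dict and unpacked into quantize_model. The keys are configuration names documented below, the values are illustrative, and the layer name "conv2d" is hypothetical. The quantize_model call is left as a comment because it needs a float Keras model, a calibration dataset, and the Vitis AI TF2 quantizer; the import path shown is the one used in Vitis AI TF2 examples and should be checked against your installation.

```python
# Strategy overrides passed to quantize_model through **kwargs.
# "conv2d" is a hypothetical layer name; replace it with a name from your model.
quant_overrides = {
    "bias_bit": 16,               # quantize all biases with 16-bit quantizers
    "include_fast_ft": True,      # enable layer-by-layer fast fine-tuning
    "fast_ft_epochs": 10,         # fast fine-tuning epochs per layer (illustrative value)
    "ignore_layers": ["conv2d"],  # leave this sensitive layer unquantized
}

# With the Vitis AI TF2 quantizer installed, the overrides are unpacked as keyword arguments:
# from tensorflow_model_optimization.quantization.keras import vitis_quantize
# quantizer = vitis_quantize.VitisQuantizer(float_model)
# quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, **quant_overrides)
print(sorted(quant_overrides))
```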

separate_conv_act
A bool object, whether to separate activation functions from the Conv2D/DepthwiseConv2D/Conv2DTranspose/Dense layers. Defaults to True.
fold_conv_bn
A bool object, whether to fold BatchNormalization layers into the preceding Conv2D/DepthwiseConv2D/Conv2DTranspose/Dense layers.
convert_bn_to_dwconv
Named fold_bn in Vitis AI 2.0 and previous versions.
A bool object, whether to convert standalone BatchNormalization layers into DepthwiseConv2D layers.
convert_sigmoid_to_hard_sigmoid
Named replace_sigmoid in Vitis AI 2.0 and previous versions.
A bool object, whether to replace Activation(activation='sigmoid') and Sigmoid layers with hard sigmoid layers and quantize them. If False, the sigmoid layers are left unquantized and are scheduled on the CPU.
convert_relu_to_relu6
Named replace_relu6 in Vitis AI 2.0 and previous versions.
A bool object, whether to replace the ReLU6 layers with ReLU layers.
include_cle
A bool object, whether to do Cross-Layer Equalization before quantization.
cle_steps
An int object, the number of iteration steps for Cross-Layer Equalization.
cle_to_relu6
Named forced_cle in Vitis AI 2.0 and previous versions.
A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
include_fast_ft
A bool object, whether to do fast fine-tuning. Fast fine-tuning adjusts the weights layer by layer using the calibration dataset and may improve accuracy for some models. It is disabled by default. It takes longer than normal PTQ, but still much less time than QAT because calib_dataset is much smaller than the training dataset. Turn it on if you encounter accuracy issues.
fast_ft_epochs
An int object, the number of fast fine-tuning epochs for each layer.
input_layers
A list(string) object, names of the start layers to be quantized. Layers before these layers in the model are neither optimized nor quantized. For example, use this argument to skip pre-processing layers or to leave the first layer unquantized. Defaults to [].
output_layers
A list(string) object, names of the end layers to be quantized. Layers after these layers in the model are neither optimized nor quantized. For example, use this argument to skip post-processing layers or to leave the last layer unquantized. Defaults to [].
ignore_layers
A list(string) object, names of the layers to skip during quantization. For example, use this argument to leave some sensitive layers unquantized to improve accuracy. Defaults to [].
input_bit
An int object, the bit width of all inputs. Defaults to 8.
input_method
An int object, the method used to calculate scale factors when quantizing all inputs. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 0.
input_symmetry
A bool object, whether to do symmetric (True) or asymmetric (False) quantization for all inputs. Defaults to True.
input_per_channel
A bool object, whether to do per-channel (True) or per-tensor (False) quantization for all inputs. Defaults to False.
input_round_mode
An int object, the rounding mode used when quantizing all inputs. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 1.
weight_bit
An int object, the bit width of all weights. Defaults to 8.
weight_method
An int object, the method used to calculate scale factors when quantizing all weights. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 1.
weight_symmetry
A bool object, whether to do symmetric (True) or asymmetric (False) quantization for all weights. Defaults to True.
weight_per_channel
A bool object, whether to do per-channel (True) or per-tensor (False) quantization for all weights. Defaults to False.
weight_round_mode
An int object, the rounding mode used when quantizing all weights. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 0.
bias_bit
An int object, the bit width of all biases. Defaults to 8.
bias_method
An int object, the method used to calculate scale factors when quantizing all biases. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 0.
bias_symmetry
A bool object, whether to do symmetric (True) or asymmetric (False) quantization for all biases. Defaults to True.
bias_per_channel
A bool object, whether to do per-channel (True) or per-tensor (False) quantization for all biases. Defaults to False.
bias_round_mode
An int object, the rounding mode used when quantizing all biases. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 0.
activation_bit
An int object, the bit width of all activations. Defaults to 8.
activation_method
An int object, the method used to calculate scale factors when quantizing all activations. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 1.
activation_symmetry
A bool object, whether to do symmetric (True) or asymmetric (False) quantization for all activations. Defaults to True.
activation_per_channel
A bool object, whether to do per-channel (True) or per-tensor (False) quantization for all activations. Defaults to False.
activation_round_mode
An int object, the rounding mode used when quantizing all activations. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 1.
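fold_conv_bn and convert_bn_to_dwconv both rely on the fact that inference-mode batch normalization is a per-channel affine map, y' = scale * y + shift, with scale = gamma / sqrt(var + eps) and shift = beta - mean * scale. The pure-Python sketch below folds that map into a preceding Dense layer and checks that the outputs match; a standalone BatchNormalization layer can be expressed the same way as a depthwise 1x1 convolution whose per-channel weight is scale and whose bias is shift. The helper names are hypothetical, eps=1e-3 mirrors the Keras BatchNormalization default, and the tool's actual graph transformation may differ in details.

```python
import math

def bn_scale_shift(gamma, beta, mean, var, eps=1e-3):
    """Per-channel scale and shift that BatchNormalization applies at inference."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift

def fold_bn_into_dense(kernel, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold BN into the preceding Dense layer; kernel[i][j] maps input i to output j."""
    scale, shift = bn_scale_shift(gamma, beta, mean, var, eps)
    new_kernel = [[w * scale[j] for j, w in enumerate(row)] for row in kernel]
    new_bias = [b * scale[j] + shift[j] for j, b in enumerate(bias)]
    return new_kernel, new_bias

def dense(x, kernel, bias):
    return [sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def batch_norm(y, gamma, beta, mean, var, eps=1e-3):
    """Reference inference-mode BatchNormalization, applied per channel."""
    return [g * (yj - m) / math.sqrt(vj + eps) + b
            for yj, g, b, m, vj in zip(y, gamma, beta, mean, var)]

kernel, bias = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [0.9, 1.1]
x = [1.0, -2.0]
reference = batch_norm(dense(x, kernel, bias), gamma, beta, mean, var)
folded_kernel, folded_bias = fold_bn_into_dense(kernel, bias, gamma, beta, mean, var)
print(reference, dense(x, folded_kernel, folded_bias))  # equal up to float rounding
```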
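For convert_sigmoid_to_hard_sigmoid, the sketch below compares sigmoid with a piecewise-linear hard sigmoid. It assumes the common ReLU6(x + 3) / 6 form; the exact variant the tool inserts is not stated in this section (the older tf.keras.activations.hard_sigmoid, for example, used clip(0.2 * x + 0.5, 0, 1)), so treat the comparison as indicative only. The hard form needs only an add, a clip, and a constant multiply, which map to fixed-point arithmetic far more easily than the exponential in sigmoid.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    """Assumed form: ReLU6(x + 3) / 6, i.e. clip(x + 3, 0, 6) / 6."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

# Largest gap between the two curves on a grid over [-8, 8] with step 0.1.
worst = max(abs(sigmoid(i / 10) - hard_sigmoid(i / 10)) for i in range(-80, 81))
print(f"max |sigmoid - hard_sigmoid| on [-8, 8]: {worst:.3f}")
```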
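A sketch of one Cross-Layer Equalization step (include_cle) for two Dense layers joined by a ReLU, following the usual pairwise formulation: channel c is divided by s_c in the first layer and multiplied by s_c in the second, with s_c = sqrt(r1_c / r2_c), where r1_c and r2_c are the largest absolute weights of that channel in each layer. Because ReLU(z / s) = ReLU(z) / s for s > 0, the network output is unchanged while both layers end up with the same per-channel range sqrt(r1_c * r2_c), which suits per-tensor weight quantization better. The function names are hypothetical, every channel is assumed to have nonzero weights, and the tool presumably repeats such steps across the model (cle_steps).

```python
import math

def relu(v):
    return [max(0.0, z) for z in v]

def dense(x, kernel, bias):
    """kernel[i][j] is the weight from input i to output j."""
    return [sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def two_layer(x, k1, b1, k2, b2):
    return dense(relu(dense(x, k1, b1)), k2, b2)

def equalize_pair(k1, b1, k2):
    """One CLE step for Dense(k1, b1) -> ReLU -> Dense(k2, ...).

    Assumes every channel has a nonzero weight range in both layers.
    """
    channels = len(b1)
    r1 = [max(abs(row[c]) for row in k1) for c in range(channels)]  # output channel c of layer 1
    r2 = [max(abs(w) for w in k2[c]) for c in range(channels)]       # input channel c of layer 2
    s = [math.sqrt(r1[c] / r2[c]) for c in range(channels)]
    new_k1 = [[w / s[c] for c, w in enumerate(row)] for row in k1]
    new_b1 = [b / s[c] for c, b in enumerate(b1)]
    new_k2 = [[w * s[c] for w in k2[c]] for c in range(channels)]
    return new_k1, new_b1, new_k2

k1, b1 = [[4.0, 0.1], [-2.0, 0.05]], [0.5, -0.01]
k2, b2 = [[0.1, -0.2], [3.0, 1.0]], [0.0, 0.1]
e_k1, e_b1, e_k2 = equalize_pair(k1, b1, k2)
x = [1.0, -0.5]
print(two_layer(x, k1, b1, k2, b2), two_layer(x, e_k1, e_b1, e_k2, b2))  # equal up to float rounding
```

ReLU6 lacks this scaling property (ReLU6(12 / 2) = 6, but ReLU6(12) / 2 = 3), so equalizing across ReLU6 layers changes the model slightly, which is presumably why it is a separate, forced option (cle_to_relu6).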
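The *_method options choose the scale factor. As a rough sketch of the difference between 0 (Non_Overflow) and 1 (Min_MSE), the code below assumes symmetric quantization with power-of-two scales 2**-frac_bits; this fixed-point style is an assumption about the tool's internals, not a description of them. Non_Overflow picks the finest scale at which the largest magnitude still fits, whereas Min_MSE also tries finer scales and keeps the one with the lowest quantize-dequantize error, accepting some clipping of outliers. per_channel_frac_bits shows what *_per_channel=True changes: one scale per output channel instead of one per tensor. All function names and the search window are hypothetical, and Min_KL and Percentile are not shown. By default, inputs and biases use Non_Overflow and weights and activations use Min_MSE.

```python
import math

def fake_quant(values, frac_bits, bit_width=8):
    """Symmetric quantize-dequantize with a power-of-two scale of 2**-frac_bits."""
    qmin, qmax = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    scale = 2.0 ** -frac_bits
    # round() rounds ties to even here; clipping keeps codes in [qmin, qmax].
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def non_overflow_frac_bits(values, bit_width=8):
    """Method 0 sketch: finest power-of-two scale at which max |v| does not overflow."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0  # all zeros: any scale is exact
    qmax = 2 ** (bit_width - 1) - 1
    return math.floor(math.log2(qmax / max_abs))

def min_mse_frac_bits(values, bit_width=8, search=4):
    """Method 1 sketch: try finer scales than Non_Overflow and keep the lowest-MSE one."""
    start = non_overflow_frac_bits(values, bit_width)
    candidates = range(start, start + search + 1)
    return min(candidates, key=lambda fb: mse(values, fake_quant(values, fb, bit_width)))

def per_channel_frac_bits(weights, method=non_overflow_frac_bits):
    """weights[c] holds the values of output channel c; one scale per channel."""
    return [method(channel) for channel in weights]

vals = [0.02, -0.15, 0.4, -0.07, 1.9]
for name, fb in (("Non_Overflow", non_overflow_frac_bits(vals)), ("Min_MSE", min_mse_frac_bits(vals))):
    print(name, fb, mse(vals, fake_quant(vals, fb)))
print(per_channel_frac_bits([[0.5, -0.25], [4.0, -2.0]]))  # [7, 4]
```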
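The three *_round_mode options differ only on ties, that is, values exactly halfway between two integers. The sketch maps the documented mode numbers to plain-Python rounding functions; it assumes HALF_UP means ties go toward +infinity, which is the reading that keeps it distinct from HALF_AWAY_FROM_ZERO. The function names are hypothetical.

```python
import math

def round_half_to_even(x):
    """Mode 0, HALF_TO_EVEN: ties go to the nearest even integer."""
    return round(x)  # Python's built-in round() already rounds ties to even

def round_half_up(x):
    """Mode 1, HALF_UP (assumed): ties go toward +infinity."""
    return math.floor(x + 0.5)

def round_half_away_from_zero(x):
    """Mode 2, HALF_AWAY_FROM_ZERO: ties go away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

ROUND_MODES = {0: round_half_to_even, 1: round_half_up, 2: round_half_away_from_zero}

for value in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5):
    print(value, [ROUND_MODES[mode](value) for mode in (0, 1, 2)])
```

For -2.5 the three modes give -2, -2, and -3; for 2.5 they give 2, 3, and 3. Values that are not ties round identically in all three modes.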