vitis_quantize.VitisQuantizer.quantize_model - 2.5 English

Vitis AI User Guide (UG1414)


This function performs post-training quantization (PTQ) of the float model. This includes model optimization, weight quantization, and calibration of the activation quantizers.
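The three PTQ stages can be made concrete with a minimal pure-Python sketch on a single toy dense unit. This is not the Vitis AI implementation, and the helper names (fold_batch_norm, pof2_scale, fake_quantize, dense, post_training_quantize) are hypothetical. The sketch makes these assumptions:

- It uses 8-bit signed symmetric quantization.
- Scales are powers of two, chosen so that no value overflows. This is one simple reading of the Non_Overflow method listed below.
- The bias is left in float for brevity.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy sketch of the PTQ stages; NOT the Vitis AI implementation.
# Assumes 8-bit signed symmetric quantization with power-of-two scales.


def fold_batch_norm(w: Sequence[float], b: float, gamma: float, beta: float,
                    mean: float, var: float, eps: float = 1e-3
                    ) -> Tuple[List[float], float]:
    """Model optimization: fold y = gamma * (z - mean) / sqrt(var + eps) + beta,
    where z = w . x + b, into new weights and a new bias for the same unit."""
    factor = gamma / math.sqrt(var + eps)
    return [wi * factor for wi in w], (b - mean) * factor + beta


def pof2_scale(max_abs: float, bits: int = 8) -> float:
    """Smallest power-of-two scale for which max_abs does not overflow."""
    q_max = 2 ** (bits - 1) - 1
    if max_abs == 0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(max_abs / q_max))


def fake_quantize(x: float, scale: float, bits: int = 8) -> float:
    """Round to the integer grid (ties to even), clamp, and map back to float."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(q_min, min(q_max, round(x / scale))) * scale


def dense(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """A single linear unit: w . x + b."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def post_training_quantize(w, b, bn, calib_samples):
    """bn is (gamma, beta, mean, var, eps); returns (q_w, b, activation_scale)."""
    w, b = fold_batch_norm(w, b, *bn)                       # 1. model optimization
    w_scale = pof2_scale(max(abs(v) for v in w))            # 2. weight quantization
    q_w = [fake_quantize(v, w_scale) for v in w]
    act_max = max(abs(dense(x, q_w, b)) for x in calib_samples)
    return q_w, b, pof2_scale(act_max)                      # 3. activation calibration
```

The calibration stage only observes outputs on representative samples to pick a range. It does not change the weights, which is why a small calib_dataset is usually sufficient.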



A dataset, keras.utils.Sequence, or NumPy array object, the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or other datasets as calib_dataset.
An int object, the total number of calibration steps. The default value of None sets no step limit. If "calib_dataset" is a dataset, generator, or keras.utils.Sequence instance and this argument is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
An int object, the number of samples per calibration batch. If "calib_dataset" is a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If "calib_dataset" is a numpy.array object, the default batch size is 32.
An int object, the verbosity of the logging. Larger values produce more detailed logging. Defaults to 0.
A bool object, whether to add shape inference information for custom layers. Must be set to True for models with custom layers.
A dict object, the user-defined configurations of the quantize strategy. These override the default built-in quantize strategy. The available configurations are listed below.
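The batching rules for the calibration steps and batch-size arguments above can be sketched as follows. The helper calibration_batches is hypothetical and is not part of vitis_quantize. In the sketch:

- A Python list stands in for a NumPy array.
- Any other iterable stands in for a dataset, generator, or keras.utils.Sequence that already yields batches.
- Raising an error for steps with array input is this sketch's way of expressing "not supported".

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def calibration_batches(data: Iterable, batch_size: Optional[int] = None,
                        steps: Optional[int] = None) -> Iterator:
    """Hypothetical helper mirroring the batching rules described above."""
    if isinstance(data, list):  # stands in for a NumPy array input
        if steps is not None:
            raise ValueError("steps is not supported with array inputs")
        size = 32 if batch_size is None else batch_size  # documented default
        return (data[i:i + size] for i in range(0, len(data), size))
    # Dataset-like input already yields batches, so batch_size is ignored;
    # with steps=None the iterator simply runs until it is exhausted.
    batches = iter(data)
    return batches if steps is None else islice(batches, steps)
```

With 100 array samples and the default batch size, this sketch produces four batches of 32, 32, 32, and 4 samples.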

Arguments in **kwargs

**kwargs in this API is a dict of user-defined configurations of the quantize strategy, which override the default built-in quantize strategy. For example, setting "bias_bit=16" makes the tool quantize all biases with 16-bit quantizers. The available configurations are listed below.
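Conceptually, each keyword argument replaces one entry of the built-in strategy and leaves the rest unchanged. The sketch below models that with a plain dict. BUILT_IN_STRATEGY and apply_overrides are hypothetical. Only bias_bit comes from the text above; the other key names are assumed for illustration. Rejecting unknown keys is a design choice of this sketch, not documented behavior of the tool.

```python
from typing import Any, Dict

# Hypothetical defaults. Only "bias_bit" is named in the text; the other keys
# are assumed names. Values match the documented 8-bit defaults.
BUILT_IN_STRATEGY: Dict[str, Any] = {
    "weight_bit": 8,
    "bias_bit": 8,
    "activation_bit": 8,
}


def apply_overrides(defaults: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Return a copy of defaults with user-supplied values taking precedence."""
    unknown = sorted(set(kwargs) - set(defaults))
    if unknown:
        raise ValueError(f"unknown quantize strategy options: {unknown}")
    merged = dict(defaults)  # copy, so the built-in strategy is not mutated
    merged.update(kwargs)
    return merged


strategy = apply_overrides(BUILT_IN_STRATEGY, bias_bit=16)
```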

A bool object, whether to separate activation functions from the Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers. Defaults to True.
A bool object, whether to fold the batch norm layers into the preceding Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers.
A bool object, whether to convert standalone BatchNormalization layers into DepthwiseConv2D layers. Named fold_bn in Vitis-AI 2.0 and previous versions.
A bool object, whether to replace Activation(activation='sigmoid') and Sigmoid layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and scheduled on the CPU. Named replace_sigmoid in Vitis-AI 2.0 and previous versions.
A bool object, whether to replace ReLU6 layers with ReLU layers. Named replace_relu6 in Vitis-AI 2.0 and previous versions.
A bool object, whether to do Cross-Layer Equalization before quantization.
An int object, the number of Cross-Layer Equalization iterations.
A bool object, whether to force Cross-Layer Equalization for ReLU6 layers. Named forced_cle in Vitis-AI 2.0 and previous versions.
A bool object, whether to perform fast fine-tuning. Fast fine-tuning adjusts the weights layer by layer using the calibration dataset and may give better accuracy for some models. It is disabled by default. It takes longer than normal PTQ, though still much less time than QAT, because calib_dataset is much smaller than the training dataset. Enable it if you encounter accuracy issues.
An int object, the number of fast fine-tuning epochs for each layer.
A list(string) object, the names of the start layers to be quantized. Layers before these layers in the model are not optimized or quantized. For example, this argument can be used to skip some preprocessing layers or to avoid quantizing the first layer. Defaults to [].
A list(string) object, the names of the end layers to be quantized. Layers after these layers in the model are not optimized or quantized. For example, this argument can be used to skip some post-processing layers or to avoid quantizing the last layer. Defaults to [].
A list(string) object, the names of the layers to be ignored during quantization. For example, this argument can be used to skip quantizing some sensitive layers to improve accuracy. Defaults to [].
An int object, the bit width of all inputs. Defaults to 8.
An int object, the method used to calculate the scale factors when quantizing all inputs. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 0.
A bool object, whether to use symmetric (True) or asymmetric (False) quantization for all inputs. Defaults to True.
A bool object, whether to use per-channel (True) or per-tensor (False) quantization for all inputs. Defaults to False.
An int object, the rounding mode used when quantizing all inputs. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 1.
An int object, the bit width of all weights. Defaults to 8.
An int object, the method used to calculate the scale factors when quantizing all weights. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 1.
A bool object, whether to use symmetric (True) or asymmetric (False) quantization for all weights. Defaults to True.
A bool object, whether to use per-channel (True) or per-tensor (False) quantization for all weights. Defaults to False.
An int object, the rounding mode used when quantizing all weights. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 0.
An int object, the bit width of all biases. Defaults to 8.
An int object, the method used to calculate the scale factors when quantizing all biases. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 0.
A bool object, whether to use symmetric (True) or asymmetric (False) quantization for all biases. Defaults to True.
A bool object, whether to use per-channel (True) or per-tensor (False) quantization for all biases. Defaults to False.
An int object, the rounding mode used when quantizing all biases. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 0.
An int object, the bit width of all activations. Defaults to 8.
An int object, the method used to calculate the scale factors when quantizing all activations. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 1.
A bool object, whether to use symmetric (True) or asymmetric (False) quantization for all activations. Defaults to True.
A bool object, whether to use per-channel (True) or per-tensor (False) quantization for all activations. Defaults to False.
An int object, the rounding mode used when quantizing all activations. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 1.
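Each group of options above combines five settings: a bit width, a scale-calculation method, symmetry, granularity, and a rounding mode. The following pure-Python sketch illustrates those settings. It is not the Vitis AI quantizer, and all function names are hypothetical. The sketch makes these assumptions:

- Scales are real-valued, not powers of two.
- quantize takes a Non_Overflow-style scale from the full observed range.
- min_mse_scale uses an assumed clip-threshold search for a Min_MSE-style scale.
- HALF_UP rounds ties toward positive infinity.

Min_KL and Percentile are not modeled.

```python
import math
from typing import List, Sequence, Tuple

# Rounding-mode codes, numbered as in the option lists above.
HALF_TO_EVEN, HALF_UP, HALF_AWAY_FROM_ZERO = 0, 1, 2


def round_value(x: float, mode: int) -> int:
    if mode == HALF_TO_EVEN:
        return round(x)  # Python's round() sends ties to the even neighbor
    if mode == HALF_UP:
        return math.floor(x + 0.5)  # assumed: ties go toward +infinity
    if mode == HALF_AWAY_FROM_ZERO:
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    raise ValueError(f"unknown rounding mode: {mode}")


def quantize(values: Sequence[float], bits: int = 8, symmetric: bool = True,
             mode: int = HALF_TO_EVEN) -> Tuple[List[int], float, int]:
    """Quantize one tensor (or one channel); returns (ints, scale, zero_point).

    The scale covers the whole observed range, so nothing overflows
    (a Non_Overflow-style choice).
    """
    if symmetric:
        q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        scale = max(abs(v) for v in values) / q_max or 1.0
        zero_point = 0
    else:
        # Extend the range to include 0 so that 0.0 is exactly representable.
        lo, hi = min(min(values), 0.0), max(max(values), 0.0)
        q_min, q_max = 0, 2 ** bits - 1
        scale = (hi - lo) / (q_max - q_min) or 1.0
        zero_point = round_value(-lo / scale, mode)
    ints = [min(q_max, max(q_min, round_value(v / scale, mode) + zero_point))
            for v in values]
    return ints, scale, zero_point


def quantize_per_channel(channels: Sequence[Sequence[float]], **kwargs
                         ) -> List[Tuple[List[int], float, int]]:
    """Per-channel: every channel gets its own scale and zero point."""
    return [quantize(ch, **kwargs) for ch in channels]


def min_mse_scale(values: Sequence[float], bits: int = 8,
                  num_candidates: int = 64) -> float:
    """Min_MSE-style search (assumed form): try smaller clipping thresholds
    and keep the symmetric scale with the lowest squared reconstruction error."""
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values) or 1.0
    best_scale, best_err = max_abs / q_max, math.inf
    for i in range(1, num_candidates + 1):
        scale = max_abs * i / num_candidates / q_max
        err = sum((v - min(q_max, max(q_min, round(v / scale))) * scale) ** 2
                  for v in values)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```

The following calls show the trade-off that the per-channel options control:

- quantize([0.5, -0.5, 2.0, -2.0]) maps the small values to ±32, because one scale is shared across the whole tensor.
- quantize_per_channel([[0.5, -0.5], [2.0, -2.0]]) gives both channels the full ±127 range.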