This function performs post-training quantization (PTQ) of the float model, including model optimization, weight quantization, and calibration of the activation quantizers (a usage sketch follows the argument list below).
vitis_quantize.VitisQuantizer.quantize_model(
    calib_dataset=None,
    calib_batch_size=None,
    calib_steps=None,
    verbose=0,
    add_shape_info=False,
    **kwargs)
Arguments
- calib_dataset
- A tf.data.Dataset, keras.utils.Sequence, or numpy.array object, the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or other datasets as calib_dataset.
- calib_steps
- An int object, the total number of steps for calibration. Ignored with the default value of None. If "calib_dataset" is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration will run until the dataset is exhausted. This argument is not supported with array inputs.
- calib_batch_size
- An int object, the number of samples per batch for calibration. If the "calib_dataset" is in the form of a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If the "calib_dataset" is in the form of a numpy.array object, the default batch size is 32.
- verbose
- An int object, the verbosity of the logging. A greater value generates more detailed logging. Defaults to 0.
- add_shape_info
- A bool object, whether to add shape inference information for custom layers. Must be set to True for models with custom layers.
- **kwargs
- A dict object, the user-defined configurations of the quantize strategy. It overrides the default built-in quantize strategy. Detailed user-defined configurations are listed below.
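A minimal usage sketch (not taken from the toolkit's own examples), assuming the vai_q_tensorflow2 package layout where vitis_quantize is imported from tensorflow_model_optimization.quantization.keras; the toy model and the random calibration data below are placeholders for a real float model and a representative dataset.

import numpy as np
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Toy float model standing in for a real trained model.
float_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10)])

# A few hundred unlabeled, preprocessed samples are typically used for
# calibration; random data stands in for them here.
calib_images = np.random.rand(128, 32, 32, 3).astype('float32')

quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(
    calib_dataset=calib_images,
    calib_batch_size=32,   # only used because calib_dataset is a numpy array
    calib_steps=None,      # default: calibrate on the whole calib_dataset
    verbose=1)

quantized_model.save('quantized_model.h5')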
Arguments in **kwargs
**kwargs in this API is a dict of user-defined configurations of the quantize strategy. It overrides the default built-in quantize strategy. For example, setting "bias_bit=16" lets the tool quantize all the biases with 16-bit quantizers, as in the sketch below. Detailed user-defined configurations are listed after it.
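As an illustration of that bias_bit override, reusing the quantizer and calib_images from the earlier sketch:

quantized_model = quantizer.quantize_model(
    calib_dataset=calib_images,
    bias_bit=16)   # override: quantize all biases with 16-bit quantizers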
- separate_conv_act
- A bool object, whether to separate activation functions from the Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers. Defaults to True.
- fold_conv_bn
- A bool object, whether to fold the batch norm layers into the previous Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers.
- convert_bn_to_dwconv
- Named fold_bn in Vitis-AI 2.0 and previous versions.
- convert_sigmoid_to_hard_sigmoid
- Named replace_sigmoid in Vitis-AI 2.0 and previous versions.
- convert_relu_to_relu6
- Named replace_relu6 in Vitis-AI 2.0 and previous versions.
- include_cle
- A bool object, whether to do Cross-Layer Equalization before quantization.
- cle_steps
- An int object, the number of iteration steps for Cross-Layer Equalization.
- cle_to_relu6
- Named forced_cle in Vitis-AI 2.0 and previous versions.
- include_fast_ft
- A bool object, whether to do fast fine-tuning. Fast fine-tuning adjusts the weights layer by layer with the calibration dataset and may yield better accuracy for some models. It is disabled by default. It takes longer than normal PTQ (though still much shorter than QAT, since calib_dataset is much smaller than the training dataset). Turn it on if you encounter accuracy issues; see the combined example at the end of this list.
- fast_ft_epochs
- An int object, the number of epochs of fast fine-tuning for each layer.
- output_format
- A string object, indicating the format in which to save the quantized model. Options are: '' to skip saving, 'h5' to save a .h5 file, 'tf' to save a saved_model file, and 'onnx' to save a .onnx file. Defaults to ''.
- onnx_opset_version
- An int object, the ONNX opset version. Takes effect only when output_format is 'onnx'. Defaults to 11.
- output_dir
- A string object, indicating the directory in which to save the quantized model. Defaults to './quantize_results'.
- convert_datatype
- A string object, indicating the target data type for the float model. Options are 'float16', 'bfloat16', 'float32', and 'float64'. Defaults to 'float16'.
- input_layers
- A list(string) object, names of the start layers to be quantized. Layers before these layers in the model will not be optimized or quantized. For example, this argument can be used to skip some pre-processing layers or to stop quantizing the first layer. Defaults to [].
- output_layers
- A list(string) object, names of the end layers to be quantized. Layers after these layers in the model will not be optimized or quantized. For example, this argument can be used to skip some post-processing layers or to stop quantizing the last layer. Defaults to [].
- ignore_layers
- A list(string) object, names of the layers to be ignored during quantization. For example, this argument can be used to skip quantizing some sensitive layers to improve accuracy. Defaults to [].
- input_bit
- An int object, the bit width of all inputs. Defaults to 8.
- input_method
- An int object, the method to calculate scale factors in quantization of all inputs. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 0.
- input_symmetry
- A bool object, whether to do symmetric or asymmetric quantization for all inputs. Defaults to True.
- input_per_channel
- A bool object, whether to do per-channel or per-tensor quantization for all inputs. Defaults to False.
- input_round_mode
- An int object, the rounding mode used in quantization of all inputs. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 1.
- input_unsigned
- A bool object, whether to use unsigned integer quantization for all inputs. It is usually used for non-negative numeric inputs (for example, values in the range 0 to 1). Defaults to False.
- weight_bit
- An int object, the bit width of all weights. Defaults to 8.
- weight_method
- An int object, the method to calculate scale factors in quantization of all weights. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 1.
- weight_symmetry
- A bool object, whether to do symmetric or asymmetric quantization for all weights. Defaults to True.
- weight_per_channel
- A bool object, whether to do per-channel or per-tensor quantization for all weights. Defaults to False.
- weight_round_mode
- An int object, the rounding mode used in quantization of all weights. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 0.
- weight_unsigned
- A bool object, whether to use unsigned integer quantization for all weights. It is usually used when weight_symmetry is false. Defaults to False.
- bias_bit
- An int object, the bit width of all biases. Defaults to 8.
- bias_method
- An int object, the method to calculate scale factors in quantization of all biases. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 0.
- bias_symmetry
- A bool object, whether to do symmetric or asymmetric quantization for all biases. Defaults to True.
- bias_per_channel
- A bool object, whether to do per-channel or per-tensor quantization for all biases. Defaults to False.
- bias_round_mode
- An int object, the rounding mode used in quantization of all biases. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 0.
- bias_unsigned
- A bool object, whether to use unsigned integer quantization for all biases. It is usually used when bias_symmetry is false. Defaults to False.
- activation_bit
- An int object, the bit width of all activations. Defaults to 8.
- activation_method
- An int object, the method to calculate scale factors in quantization of all activations. Options are: 0 for Non_Overflow, 1 for Min_MSE, 2 for Min_KL, 3 for Percentile. Defaults to 1.
- activation_symmetry
- A bool object, whether to do symmetric or asymmetric quantization for all activations. Defaults to True.
- activation_per_channel
- A bool object, whether to do per-channel or per-tensor quantization for all activations. Defaults to False.
- activation_round_mode
- An int object, the rounding mode used in quantization of all activations. Options are: 0 for HALF_TO_EVEN, 1 for HALF_UP, 2 for HALF_AWAY_FROM_ZERO. Defaults to 1.
- activation_unsigned
- A bool object, whether to use unsigned integer quantization for all activations. It is usually used for non-negative numeric activations (such as ReLU or ReLU6) when activation_symmetry is true. Defaults to False.
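Putting several of these options together, a combined sketch (the values and the layer name 'conv2d' are illustrative, not recommendations), again reusing the quantizer and calib_images from the first sketch:

quantized_model = quantizer.quantize_model(
    calib_dataset=calib_images,
    include_cle=True,                # Cross-Layer Equalization before quantization
    cle_steps=10,
    include_fast_ft=True,            # layer-by-layer fast fine-tuning
    fast_ft_epochs=10,
    weight_per_channel=True,         # per-channel quantization for weights
    input_layers=['conv2d'],         # hypothetical layer name to start quantizing from
    output_format='onnx',            # also export a .onnx file
    onnx_opset_version=13,
    output_dir='./quantize_results')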