This function performs post-training quantization (PTQ) of the float model, including model optimization, weight quantization, and activation calibration.
vitis_quantize.VitisQuantizer.quantize_model(
    calib_dataset=None,
    calib_batch_size=None,
    calib_steps=None,
    verbose=0,
    fold_conv_bn=True,
    fold_bn=True,
    replace_sigmoid=True,
    replace_relu6=True,
    include_cle=True,
    cle_steps=10,
    forced_cle=False,
    include_fast_ft=False,
    fast_ft_epochs=10,
    add_shape_info=False)
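A minimal end-to-end call might look like the following sketch. It assumes the Vitis AI TensorFlow 2 import path and that the quantizer is constructed from an existing float Keras model; float_model and calib_dataset are placeholders for your own model and representative data.

from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Build a quantizer from the float Keras model
# (constructor usage assumed from the surrounding flow).
quantizer = vitis_quantize.VitisQuantizer(float_model)

# Run PTQ with default settings; calibration uses the
# representative dataset described in the arguments below.
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset)

# The result is a quantized Keras model and can be saved as usual.
quantized_model.save('quantized_model.h5')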
Arguments:

- calib_dataset: A tf.data.Dataset, keras.utils.Sequence, or np.numpy object, the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or other datasets as calib_dataset.
- calib_steps: An int object, the total number of steps for calibration. Ignored with the default value of None. If calib_dataset is a tf.data.Dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
- calib_batch_size: An int object, the number of samples per batch for calibration. If calib_dataset is a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy.array object, the default batch size is 32.
- fold_conv_bn: A bool object, whether to fold the batch norm layers into the previous conv layers.
- fold_bn: A bool object, whether to convert standalone batch norm layers into DepthwiseConv2D layers.
- replace_sigmoid: A bool object, whether to replace Activation(activation='sigmoid') layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and scheduled on the CPU.
- replace_relu6: A bool object, whether to replace ReLU6 layers with ReLU layers.
- include_cle: A bool object, whether to do Cross-Layer Equalization before quantization.
- cle_steps: An int object, the number of iteration steps for Cross-Layer Equalization.
- forced_cle: A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
- include_fast_ft: A bool object, whether to do fast fine-tuning. Fast fine-tuning adjusts the weights layer by layer using the calibration dataset and may yield better accuracy for some models. It is disabled by default. It takes longer than normal PTQ (still much shorter than QAT, since calib_dataset is much smaller than the training dataset). Turn it on if you run into accuracy issues; see the sketch after this list.
- fast_ft_epochs: An int object, the number of iteration epochs of fast fine-tuning for each layer.
- add_shape_info: A bool object, whether to add shape inference information for custom layers. Must be set to True for models with custom layers.
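When the defaults are not accurate enough, the optional arguments above can be combined. The sketch below is illustrative only: calib_images is a hypothetical numpy array of representative inputs, and the chosen values are examples, not recommendations.

# calib_images: a numpy array of representative inputs (placeholder name)
quantized_model = quantizer.quantize_model(
    calib_dataset=calib_images,
    calib_batch_size=64,     # arrays default to batches of 32; override here
    include_fast_ft=True,    # layer-by-layer fast fine-tuning on the calib data
    fast_ft_epochs=10,       # fine-tuning epochs per layer
    add_shape_info=True)     # only required for models with custom layers

With a tf.data.Dataset, generator, or keras.utils.Sequence instead of an array, the batch size comes from the dataset itself, and calib_steps can cap how many batches are consumed.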