TensorFlow 2.x - 3.5 English

Vitis AI User Guide (UG1414)

Quantization API

quantize_strategy = 'pof2s', 
custom_quantize_strategy = None,
calib_dataset = None, 
calib_steps = None, 
calib_batch_size = None, 
save_path = './vai_wego/quantized.h5', 
verbose = 0, 
add_shape_info = False, 
dump = False, 
dump_output_dir = './vai_dump/') 

This function performs post-training quantization (PTQ) of the float model, including model optimization, weight quantization, and activation quantization calibration.
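To give a feel for the weight quantization step, the following is a minimal sketch of a power-of-2 scale quantizer, the kind of quantizer used by the default 'pof2s' strategy. It is not the Vitis AI implementation: the function name pof2_quantize, the 8-bit width, and the rule that picks the scale from the largest absolute value are all assumptions made for illustration.

```python
import math


def pof2_quantize(values, bit_width=8):
    """Fake-quantize floats with a power-of-2 scale (illustrative only).

    Assumption: the scale is chosen from the largest absolute value; the
    real quantizer may select it differently.
    """
    qmin = -(2 ** (bit_width - 1))
    qmax = 2 ** (bit_width - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values), 0
    # Largest number of fractional bits for which max_abs still fits in range.
    frac_bits = math.floor(math.log2(qmax / max_abs))
    scale = 2.0 ** frac_bits
    quantized = []
    for v in values:
        # Round to the integer grid, clamp, then map back to float.
        q = max(qmin, min(qmax, round(v * scale)))
        quantized.append(q / scale)
    return quantized, frac_bits


weights = [0.5, -1.0, 0.3]
quantized, frac_bits = pof2_quantize(weights)
print(frac_bits, quantized)
```

Because the scale is restricted to a power of 2, rescaling reduces to a bit shift on integer hardware, which is the usual motivation for this kind of quantizer.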


The float tf.keras.Model object to be quantized.
quantize_strategy: A string object, the quantize strategy type. Available values are pof2s, pof2s_tqt, fs, and fsx.
  • pof2s is the default strategy. It uses a power-of-2 scale quantizer and the Straight-Through Estimator.
  • pof2s_tqt, introduced in Vitis AI 1.4, uses trained thresholds in power-of-2 scale quantizers and may generate better results for QAT. It should only be used in QAT, together with init_quant=True, to get the best performance.
  • fs, introduced in Vitis AI 2.5, performs float scale quantization for the inputs and weights of Conv2D, DepthwiseConv2D, Conv2DTranspose, and Dense layers.
  • fsx quantizes more layer types than fs, such as Add, MaxPooling2D, and AveragePooling2D. It also quantizes biases and activations.
  • fs and fsx are designed for target devices with floating-point support. The DPU does not currently support floating point, so models quantized with these strategies cannot be deployed to it.
custom_quantize_strategy: A string object, the file path of a custom quantize strategy JSON file.
calib_dataset: A tf.data.Dataset, keras.utils.Sequence, or numpy.array object, the representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or another dataset as calib_dataset.
calib_steps: An int object, the total number of steps for calibration. Ignored with the default value of None. If calib_dataset is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size: An int object, the number of samples per batch for calibration. If calib_dataset is a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy.array object, the default batch size is 32.
save_path: A string object, the path where the quantized model is saved.
verbose: An int object, the verbosity of the logging. A greater value generates more detailed logging. The default value is 0.
add_shape_info: A bool object, whether to add shape inference information for custom layers. Must be set to True for models with custom layers.
dump: A bool object, whether to dump results. Dumping is enabled when dump=True and disabled when dump=False.
dump_output_dir: A string object, the directory to save the dump results.
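The interaction between calib_dataset, calib_steps, and calib_batch_size described above can be sketched in plain Python. This is an illustration of the documented rules only, not the library's code: calibration_batches is a hypothetical helper, a plain list stands in for an array input, and any other iterable stands in for a dataset, generator, or keras.utils.Sequence.

```python
from itertools import islice


def calibration_batches(calib_dataset, calib_steps=None, calib_batch_size=None):
    """Yield calibration batches following the documented argument rules.

    Hypothetical helper for illustration; it does not call Vitis AI.
    """
    if isinstance(calib_dataset, list):
        # Array-like input: calib_steps is not supported, and the batch
        # size defaults to 32 when calib_batch_size is None.
        if calib_steps is not None:
            raise ValueError("calib_steps is not supported with array inputs")
        batch_size = 32 if calib_batch_size is None else calib_batch_size
        for start in range(0, len(calib_dataset), batch_size):
            yield calib_dataset[start:start + batch_size]
    else:
        # Dataset-like input: batches arrive pre-formed, so calib_batch_size
        # is not used. With calib_steps=None, islice runs until exhaustion.
        yield from islice(calib_dataset, calib_steps)


# Array-like input: 100 samples split into batches of the default size.
samples = list(range(100))
print([len(batch) for batch in calibration_batches(samples)])

# Dataset-like input: stop after 3 pre-formed batches.
dataset = ([i, i + 1] for i in range(10))
print(len(list(calibration_batches(dataset, calib_steps=3))))
```

Since the helper is a generator, the ValueError for an array input combined with calib_steps is raised when iteration starts, not when the function is called.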

For more information on how to use on-the-fly quantization in WeGO TensorFlow 2.x, see WeGO examples.