TensorFlow 2.x - 3.5 English

Vitis AI User Guide (UG1414)

Document ID
UG1414
Release Date
2023-09-28
Version
3.5 English

Quantization API

vitis_vai.quantize(
    input_float,
    quantize_strategy='pof2s',
    custom_quantize_strategy=None,
    calib_dataset=None,
    calib_steps=None,
    calib_batch_size=None,
    save_path='./vai_wego/quantized.h5',
    verbose=0,
    add_shape_info=False,
    dump=False,
    dump_output_dir='./vai_dump/')

This function performs post-training quantization (PTQ) of the float model, including model optimization, weight quantization, and activation quantization.

Parameters

input_float
A tf.keras.Model float object to be quantized.
quantize_strategy
A string object of the quantize strategy type. Available values are pof2s, pof2s_tqt, fs, and fsx.
  • pof2s is the default strategy. It uses a power-of-2 scale quantizer with the Straight-Through Estimator.
  • pof2s_tqt is a strategy introduced in Vitis AI 1.4. It uses trained thresholds in the power-of-2 scale quantizers and can yield better QAT results.
  • fs is a strategy introduced in Vitis AI 2.5. It performs float-scale quantization of the inputs and weights of Conv2D, DepthwiseConv2D, Conv2DTranspose, and Dense layers.
  • fsx extends the fs strategy to more layer types, including Add, MaxPooling2D, and AveragePooling2D, and also quantizes biases and activations.
Note:
  • The pof2s_tqt strategy should be used only in QAT with init_quant=True for the best performance.
  • The fs and fsx strategies are designed for target devices with floating-point support. The DPU does not currently support floating-point, so models quantized with these strategies cannot be deployed to it.
custom_quantize_strategy
A string object. The file path of the custom quantize strategy JSON file.
calib_dataset
A tf.data.Dataset, keras.utils.Sequence, or np.ndarray object. The representative dataset for calibration. You can use all or part of eval_dataset, train_dataset, or another dataset as calib_dataset.
calib_steps
An int object. The total number of steps for calibration. Ignored when left at the default value of None. If calib_dataset is a tf.data.Dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
calib_batch_size
An int object. The number of samples per batch for calibration. If calib_dataset is a tf.data.Dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is a numpy array, the batch size defaults to 32.
save_path
A string object. The file path where the quantized model is saved.
verbose
An int object. The verbosity of the logging. A greater value generates more detailed logging. The default value is 0.
add_shape_info
A bool object. Determines whether to add shape inference information for custom layers. It must be set to True for models with custom layers.
dump
A bool object. Whether to enable dump. When set to True, the quantization results are dumped to dump_output_dir. The default value is False.
dump_output_dir
A string object. The directory to save the dump results.
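The parameters above can be combined into a minimal PTQ sketch. This is an illustrative example, not a verbatim recipe from this guide: the float_model.h5 path is a placeholder, and the 100-sample random calibration array stands in for real unlabeled samples drawn from your training or evaluation data. With an array input, the quantizer batches the data itself (default calib_batch_size of 32), so 100 samples yield 4 calibration steps.

```python
import numpy as np

# Hypothetical representative calibration set: 100 unlabeled samples
# matching the model's input shape. Labels are not needed for PTQ.
# In practice, replace this random data with real samples.
calib_images = np.random.rand(100, 224, 224, 3).astype(np.float32)

# With a numpy array, batching is handled by the quantizer:
# ceil(100 / 32) = 4 calibration steps at the default batch size.
calib_batch_size = 32
calib_steps = int(np.ceil(len(calib_images) / calib_batch_size))

try:
    # vitis_vai and TensorFlow are available inside the
    # Vitis AI WeGO TensorFlow 2.x environment.
    import tensorflow as tf
    import vitis_vai

    # Placeholder path for the float model to be quantized.
    float_model = tf.keras.models.load_model('float_model.h5')

    quantized_model = vitis_vai.quantize(
        float_model,
        calib_dataset=calib_images,
        calib_batch_size=calib_batch_size,
        save_path='./vai_wego/quantized.h5')
except ImportError:
    pass  # Outside the Vitis AI container, only the data-prep part runs.
```

Passing a tf.data.Dataset or keras.utils.Sequence as calib_dataset works the same way, except that the batch size then comes from the dataset itself and calib_batch_size is not used.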

For more information on how to use on-the-fly quantization in WeGO TensorFlow 2.x, see WeGO examples.