Quantization for the DPU uses power-of-2 scale, symmetric, per-tensor quantizers and needs some special processes to simulate DPU behaviors. Other devices that support floating-point scales need a different quantize strategy, so float scale quantization is introduced.
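The difference between the two scale types can be seen in a small numeric sketch. The snippet below is illustrative only and is not the tool's internal implementation; the quantize helper and the sample values are assumptions made for this example. It compares a power-of-2 scale with a free floating-point scale for a symmetric, per-tensor 8-bit quantizer.

import numpy as np

def quantize(x, scale, bit_width=8):
    # Symmetric, per-tensor quantization: round(x / scale), clamp to the
    # integer range, then de-quantize back to float for simulation.
    qmin, qmax = -2 ** (bit_width - 1), 2 ** (bit_width - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

x = np.array([0.37, -1.42, 0.05], dtype=np.float32)

# DPU-style scale: restricted to a power of 2, e.g. 2**-5 = 1/32.
pow2_scale = 2.0 ** -5
# Float scale: any positive float, e.g. max(|x|) / 127, usually a tighter fit.
float_scale = np.abs(x).max() / 127.0

print(quantize(x, pow2_scale))   # power-of-2 scale result
print(quantize(x, float_scale))  # float scale result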
Two float scale quantize strategies are provided:
- fs: Do quantization for the inputs and weights of Dense layers. By default, it will not do Conv-BN folding.
- fsx: Do quantization for more layer types than the fs quantize strategy, such as AveragePooling2D. Moreover, it also quantizes the biases and activations of Dense layers. By default, it will do Conv-BN folding (the folding arithmetic is sketched after this list).
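Conv-BN folding merges a BatchNormalization layer into the preceding convolution or dense layer so that the pair can be quantized as a single layer. The following numpy sketch shows the standard folding arithmetic; it is an illustration of the general technique under assumed tensor layouts (TF-style kernels with the output-channel axis last), not the tool's internal code, and the function name is made up.

import numpy as np

def fold_conv_bn(w, b, gamma, beta, moving_mean, moving_var, eps=1e-3):
    # Fold y = gamma * (conv(x, w) + b - mean) / sqrt(var + eps) + beta
    # into a single conv with adjusted weights and bias.
    std = np.sqrt(moving_var + eps)
    w_folded = w * (gamma / std)                    # scales each output channel
    b_folded = (b - moving_mean) * (gamma / std) + beta
    return w_folded, b_folded

# Example with a 3x3 kernel, 8 input channels, 16 output channels.
w = np.random.randn(3, 3, 8, 16).astype(np.float32)
b = np.zeros(16, dtype=np.float32)
gamma, beta = np.ones(16), np.zeros(16)
mean, var = np.random.randn(16), np.abs(np.random.randn(16))
w_f, b_f = fold_conv_bn(w, b, gamma, beta, mean, var)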
Note: The fs and fsx quantize strategies are designed for target devices with floating-point support. The DPU does not have floating-point support now, so models quantized with these quantize strategies cannot be deployed to it.
Users can switch to float scale quantization by setting quantize_strategy to 'fs' or 'fsx' in the construct function of VitisQuantizer; example code is shown below:
model = tf.keras.models.load_model('float_model.h5')
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(model, quantize_strategy='fs')
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset,
                                           calib_steps=100,
                                           calib_batch_size=10,
                                           **kwargs)
- calib_dataset is used as a representative calibration dataset for calibration. You can use all or part of the train_dataset, or other datasets.
- calib_steps is the total number of steps for calibration. It has a default value of None. If calib_dataset is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration will run until the dataset is exhausted. This argument is not supported with array inputs.
- calib_batch_size is the number of samples per batch for calibration. If calib_dataset is in the form of a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If calib_dataset is in the form of a numpy.array object, the default batch size is 32.
- **kwargs is a dict of user-defined configurations of the quantize strategy. It will override the default built-in quantize strategy. For example, setting bias_bit=16 will let the tool quantize all the biases with 16-bit quantizers (see the end-to-end sketch after this list). See the vai_q_tensorflow2 Usage section for more information on the user-defined configurations.
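Putting the arguments together, the following end-to-end sketch builds a tf.data calibration dataset, quantizes with the fs strategy, and overrides the bias bit width through **kwargs. The input shape, file names, and random calibration data are illustrative assumptions; in practice, use a slice of your own training or evaluation data.

import numpy as np
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

model = tf.keras.models.load_model('float_model.h5')

# Stand-in calibration images (assumed 224x224x3 inputs); with a tf.data
# dataset the batch size is taken from the dataset itself, so
# calib_batch_size is not needed here.
calib_images = np.random.rand(100, 224, 224, 3).astype(np.float32)
calib_dataset = tf.data.Dataset.from_tensor_slices(calib_images).batch(10)

quantizer = vitis_quantize.VitisQuantizer(model, quantize_strategy='fs')
quantized_model = quantizer.quantize_model(
    calib_dataset=calib_dataset,
    calib_steps=10,   # 100 samples / batch size 10; None would exhaust the dataset
    bias_bit=16)      # user-defined override of the built-in quantize strategy

quantized_model.save('quantized_model.h5')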