vai_q_tensorflow2 Fast Finetuning - 2.5 English

Vitis AI User Guide (UG1414)

Quantization usually costs a small amount of accuracy, but for some networks, such as MobileNets, the loss can be large. Fast finetuning uses the AdaQuant algorithm to adjust the weights and quantization parameters layer by layer, using the unlabeled calibration dataset, which improves accuracy for some models. It takes longer than normal post-training quantization (PTQ) but is still much faster than quantization-aware training (QAT), because calib_dataset is smaller than the training dataset. Fast finetuning is disabled by default; turn it on if you encounter accuracy issues. A recommended workflow is to try PTQ without fast finetuning first, and then quantize with fast finetuning if the accuracy is not acceptable. QAT is another way to improve accuracy, but it takes more time and requires the training dataset. To activate fast finetuning, set include_fast_ft=True during post-training quantization:

quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_step=None, calib_batch_size=None, include_fast_ft=True, fast_ft_epochs=10) 


  • include_fast_ft specifies whether to run fast finetuning.
  • fast_ft_epochs specifies the number of finetuning epochs to run for each layer.
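
The recommended workflow (plain PTQ first, fast finetuning only if accuracy is not acceptable) can be sketched as a small helper. This is a minimal illustration and not part of the Vitis AI API: quantize_with_fallback, make_quantizer, evaluate, and min_accuracy are hypothetical names introduced here. The only calls taken from this guide are quantize_model and its calib_dataset, include_fast_ft, and fast_ft_epochs arguments. The sketch assumes make_quantizer returns a fresh quantizer object for each attempt and that evaluate returns a single accuracy number for a quantized model.

```python
def quantize_with_fallback(make_quantizer, calib_dataset, evaluate,
                           min_accuracy, fast_ft_epochs=10):
    """Try plain PTQ first; rerun with fast finetuning only if accuracy is too low.

    make_quantizer, evaluate and min_accuracy are hypothetical helpers/values
    supplied by the caller. Returns (quantized_model, accuracy, used_fast_ft).
    """
    # Step 1: normal PTQ. Fast finetuning is disabled by default.
    model = make_quantizer().quantize_model(calib_dataset=calib_dataset)
    accuracy = evaluate(model)
    if accuracy >= min_accuracy:
        return model, accuracy, False

    # Step 2: accuracy is not acceptable, so quantize again with fast finetuning.
    # Assumption: a fresh quantizer is created for the second attempt.
    model = make_quantizer().quantize_model(
        calib_dataset=calib_dataset,
        include_fast_ft=True,
        fast_ft_epochs=fast_ft_epochs,
    )
    return model, evaluate(model), True
```

In a vai_q_tensorflow2 environment, make_quantizer would typically wrap the construction of the quantizer for your float model, and evaluate would run the quantized model on a labeled validation set. If the accuracy is still not acceptable after fast finetuning, QAT remains the next option.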