Generally, quantization incurs a small accuracy loss, but for some networks, such as MobileNets, the loss can be large. In this situation, try fast finetune first. If fast finetune still does not yield satisfactory results, QAT can be used to further improve the accuracy of the quantized models.
The AdaQuant algorithm [1] uses a small set of unlabeled data. It not only calibrates the activations but also finetunes the weights. The Vitis AI quantizer implements this algorithm under the name "fast finetuning". Though slightly slower, fast finetuning can achieve better performance than quantize calibration. As with QAT, each run of fast finetuning may produce a different result.
Fast finetuning does not train the model and needs only a limited number of iterations. For classification models on the ImageNet dataset, 5120 images proved sufficient in experiments. Annotation information is not needed in the fast finetuning flow, so unlabeled data works fine. Fast finetuning requires only small modifications to the model evaluation script; there is no need to set up an optimizer for training. To use fast finetuning, a function that performs model forwarding iterations is needed, and it is called during fast finetuning. Re-calibration with the original inference code is recommended.
A complete example can be found in the open-source examples.
# fast finetune model or load finetuned parameter before test
if fast_finetune:
    ft_loader, _ = load_data(
        subset_len=5120,
        train=False,
        batch_size=batch_size,
        sample_method='random',
        data_dir=args.data_dir,
        model_name=model_name)
    if quant_mode == 'calib':
        quantizer.fast_finetune(evaluate, (quant_model, ft_loader, loss_fn))
    elif quant_mode == 'test':
        quantizer.load_ft_param()
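The `evaluate` callable passed to `quantizer.fast_finetune` above is an ordinary forward-iteration loop over a data loader; no backward pass or optimizer is involved. A minimal sketch of its shape, with plain-Python stand-ins for the model, loss function, and loader (so the calling convention is visible without a dataset or GPU), might look like this:

```python
# Hedged sketch of a forward-iteration function for fast finetuning.
# `model`, `val_loader`, and `loss_fn` below are illustrative stand-ins,
# not part of the Vitis AI API; in a real script they would be the
# quantized model, a torch DataLoader, and e.g. a cross-entropy loss.

def evaluate(model, val_loader, loss_fn):
    """Run forward passes only; no optimizer or backward pass is needed."""
    total_loss, total_samples = 0.0, 0
    for images, labels in val_loader:
        outputs = model(images)          # forward pass
        loss = loss_fn(outputs, labels)  # per-batch loss
        total_loss += loss * len(images)
        total_samples += len(images)
    return total_loss / total_samples

# Plain-Python stand-ins to demonstrate the calling convention:
model = lambda xs: [x * 2 for x in xs]
loss_fn = lambda outs, labels: sum(abs(o - l) for o, l in zip(outs, labels)) / len(outs)
val_loader = [([1, 2], [2, 4]), ([3], [6])]
print(evaluate(model, val_loader, loss_fn))  # prints 0.0 for this toy data
```

In the real flow, `quantizer.fast_finetune` calls this function with the `(quant_model, ft_loader, loss_fn)` tuple shown in the snippet above.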
Run calibration with fast finetuning:

python resnet18_quant.py --quant_mode calib --fast_finetune

Evaluate the fast-finetuned quantized model:

python resnet18_quant.py --quant_mode test --fast_finetune

Deploy the fast-finetuned quantized model:

python resnet18_quant.py --quant_mode test --fast_finetune --subset_len 1 --batch_size 1 --deploy
[1] Itay Hubara et al., Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming, arXiv:2006.10518, 2020.