The TensorFlow2 quantizer supports two approaches to quantizing a deep learning model:
- Post-training quantization (PTQ)
- PTQ converts a pre-trained float model into a quantized model with little degradation in model accuracy. It requires a representative dataset: a few batches of inference are run on the float model to capture the distributions of the activations. This step is also called quantize calibration.
- Quantization aware training (QAT)
- QAT models the quantization errors in both the forward and backward passes during training. For QAT, starting from a pre-trained floating-point model with good accuracy is recommended over training from scratch.
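To make the quantize-calibration step concrete, here is a minimal, self-contained sketch of the underlying math: the min/max of activations is tracked over a few calibration batches, and an 8-bit affine quantization (scale and zero point) is derived from the observed range. This is an illustration of the concept only; the function names and the plain-Python lists standing in for activation tensors are invented here, and the quantizer's actual implementation differs.

```python
# Illustrative sketch of quantize calibration -- not the quantizer's API.
# Observe activation ranges over calibration batches, then map the observed
# float range [lo, hi] onto the uint8 range [0, 255].

def calibrate(batches):
    """Track the global min/max of activations across calibration batches."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi

def quant_params(lo, hi, num_bits=8):
    """Affine quantization parameters mapping [lo, hi] onto [0, 2^bits - 1]."""
    qmax = (1 << num_bits) - 1
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(x, scale, zero_point):
    q = round(x / scale) + zero_point
    return max(0, min(255, q))  # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

# "Activations" collected from a few batches of inference on the float model:
batches = [[-1.0, 0.2, 0.9], [-0.4, 1.5, 0.0], [0.7, -0.8, 1.1]]
lo, hi = calibrate(batches)          # observed range: lo = -1.0, hi = 1.5
scale, zp = quantize_params = quant_params(lo, hi)
x = 0.5
q = quantize(x, scale, zp)
print(round(dequantize(q, scale, zp), 3))  # recovers ~0.5 up to rounding error
```

The quality of the calibration data matters: if the representative batches miss the true activation range, values outside [lo, hi] are clipped, which is one source of PTQ accuracy loss.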
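How QAT "models the quantization errors in both passes" can likewise be sketched in a few lines. In the forward pass the weight goes through a quantize-then-dequantize step ("fake quantization"), so the loss sees the rounding error; in the backward pass the non-differentiable rounding is bypassed with the straight-through estimator (STE), which passes the gradient through unchanged. Again, this is a conceptual sketch under assumed names, not the quantizer's training loop.

```python
# Illustrative sketch of QAT's fake quantization -- not the quantizer's API.
# Forward: the loss is computed on the quantized-dequantized weight.
# Backward: round() has zero gradient almost everywhere, so the STE
# treats d(w_q)/dw as 1 and updates the float "shadow" weight.

def fake_quantize(w, scale):
    """Simulate int8 quantization in the forward pass."""
    q = max(-128, min(127, round(w / scale)))
    return q * scale

def qat_step(w, x, target, scale, lr=0.1):
    """One training step of y = w_q * x with an L2 loss and the STE."""
    w_q = fake_quantize(w, scale)      # forward pass uses the quantized weight
    y = w_q * x
    grad_y = 2 * (y - target)          # dL/dy for L = (y - target)^2
    grad_w = grad_y * x                # STE: gradient passes through round()
    return w - lr * grad_w             # update the float shadow weight

w = 0.30
for _ in range(20):
    w = qat_step(w, x=1.0, target=0.5, scale=0.05)
print(round(fake_quantize(w, 0.05), 2))  # quantized weight converges to 0.5
```

Because training sees the same rounding error that deployment will, the optimizer steers the weights toward values that survive quantization, which is why QAT typically recovers accuracy that PTQ loses.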