On-the-fly Quantization in WeGO - 3.0 English

Vitis AI User Guide (UG1414)

In the original WeGO workflow, WeGO accepts only a quantized INT8 model as its input, so a separate quantization step must be run first, explicitly invoking the Vitis AI quantizer to convert the float32 model into an INT8 model. This creates extra work for users, such as switching conda environments between the quantizer and WeGO and understanding the relationship between the Vitis AI quantizer and WeGO. To improve ease of use and make the entire process from quantization to deployment smoother, WeGO has integrated the Vitis AI quantizer into its flow, enabling on-the-fly quantization when a float32 model is provided as WeGO's input. Besides the original WeGO API for compilation, a new API is introduced in WeGO for quantization purposes, and the quantizer details are transparent to end users. The quantization integration in WeGO is at an early stage, so there are some limitations:
  1. Only PTQ (Post-Training Quantization) is supported in the integration flow now. If the accuracy falls far short of expectations, fine-tuning or QAT (Quantization-Aware Training) must be used to improve it by following the native Vitis AI quantization flow.
  2. Only CPUs are used for quantization in WeGO; GPU devices are currently not supported. As a result, quantizing large models may take a long time.
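
As background on what a PTQ step computes, the following is a minimal, illustrative sketch of symmetric per-tensor INT8 quantization in NumPy. It is not WeGO's actual implementation, and the function names are hypothetical; it only shows the basic float32-to-INT8 mapping that the integrated quantizer performs on the user's behalf.

```python
import numpy as np

def quantize_int8(w):
    """Illustrative symmetric per-tensor PTQ: map float32 values
    onto the INT8 range [-127, 127] using a single scale factor.
    (Hypothetical helper, not part of the WeGO API.)"""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from INT8 codes."""
    return q.astype(np.float32) * scale

# Toy float32 weight tensor standing in for a real model layer.
w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_rec = dequantize(q, scale)
```

The reconstruction error per element is bounded by half the scale, which is why PTQ alone can fall short on sensitive models and QAT may be needed, as noted in limitation 1 above.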