In the original WeGO workflow, WeGO only supports a quantized INT8 model as its input, so a separate quantization flow has to be executed first by explicitly invoking the Vitis AI quantizer to quantize the float32 model into an INT8 model. This creates extra tasks for users, such as switching conda environments between the quantizer and WeGO and figuring out the relationship between the Vitis AI quantizer and WeGO. To improve ease of use and make the entire process from quantization to deployment smoother, WeGO has integrated the Vitis AI quantizer into its flow, enabling on-the-fly quantization when a float32 model is offered as WeGO's input.
Besides the original WeGO API for compilation, a new API is introduced in WeGO for quantization purposes, and the quantizer details are transparent to end users; a hedged usage sketch follows the list below. The quantization integration in WeGO is at an early stage, so there are some limitations:
- Only PTQ (Post-Training Quantization) is supported in the integrated flow for now. If the resulting accuracy is far from expected, fine-tuning or QAT (Quantization-Aware Training) must be used to improve it by following the native Vitis AI quantization flow.
- Only CPUs are used for quantization in WeGO; GPU devices are currently not supported. As a result, quantizing large models may take a long time.
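
The sketch below shows what the integrated flow could look like from the user's side: a float32 PyTorch model goes in, WeGO quantizes it on the fly with a small calibration set, and the result is compiled with the original WeGO compilation API. The `wego_torch` package name and the `quantize`/`compile` signatures here are assumptions for illustration, not the confirmed WeGO API; consult the WeGO examples shipped with your Vitis AI release for the exact calls.

```python
# Illustrative sketch only: wego_torch and the quantize()/compile()
# signatures below are assumed names, not a confirmed WeGO API.
import torch
import torch.nn as nn
import wego_torch  # assumed package name for the WeGO PyTorch frontend

# A stand-in float32 model; in practice this is your trained network.
float_model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
float_model.eval()

# A small calibration set drives the on-the-fly PTQ step.
calib_batches = [torch.randn(1, 3, 32, 32) for _ in range(8)]

# Quantize inside WeGO (runs on CPU only, per the limitation above),
# then compile with the original WeGO API. Both calls are hypothetical.
quantized_model = wego_torch.quantize(float_model, calib_data=calib_batches)
wego_model = wego_torch.compile(quantized_model)

# Run inference on the deployed model as usual.
with torch.no_grad():
    print(wego_model(torch.randn(1, 3, 32, 32)).shape)
```

Note that the user never invokes the Vitis AI quantizer directly or leaves the WeGO conda environment; the quantization step is simply one more call in the same script.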