Some default quantize strategies are provided, but sometimes users need to modify quantize configurations for different targets or to get better performance. For example, some target devices may need the biases to be quantized into 32 bit and some may need to quantize only part of the model. This part shows how to configure the quantizer to meet your needs.
Quantize Strategy
The quantize tool is configured by a `quantize_strategy`. Internally, each `quantize_strategy` is a JSON file containing the following configurations:

- `pipeline_config`: These configurations control the work pipeline of the quantize tool, including some optimizations during quantization, e.g., whether to fold Conv2D + BatchNorm layers, whether to perform the Cross-Layer-Equalization algorithm, and so on. It can be further divided into `optimize_pipeline_config`, `quantize_pipeline_config`, `refine_pipeline_config`, and `finalize_pipeline_config`.
- `quantize_registry_config`: These configurations control which layer types are quantizable, where to insert the quantize ops, and what kind of quantize op is inserted. It includes layer-specific configurations and user-defined global configurations. For example, a layer configuration for `Conv2D` looks like this:
```json
{
  "layer_type": "tensorflow.keras.layers.Conv2D",
  "quantizable_weights": ["kernel"],
  "weight_quantizers": [
    {
      "quantizer_type": "Pof2SQuantizer",
      "quantizer_params": {"bit_width": 8, "method": 0, "round_mode": 1, "symmetry": true, "per_channel": true, "channel_axis": -1, "narrow_range": false}
    }
  ],
  "quantizable_biases": ["bias"],
  "bias_quantizers": [
    {
      "quantizer_type": "Pof2SQuantizer",
      "quantizer_params": {"bit_width": 8, "method": 0, "round_mode": 1, "symmetry": true, "per_channel": false, "channel_axis": -1, "narrow_range": false}
    }
  ],
  "quantizable_activations": ["activation"],
  "activation_quantizers": [
    {
      "quantizer_type": "FSQuantizer",
      "quantizer_params": {"bit_width": 8, "method": 2, "method_percentile": 99.9999, "round_mode": 1, "symmetry": true, "per_channel": false, "channel_axis": -1}
    }
  ]
}
```
As you can see, with this quantize configuration the weights, biases, and activations of the Conv2D layer are quantized. The weights and biases use `Pof2SQuantizer` (power-of-2 scale quantizer) while the activations use `FSQuantizer` (float scale quantizer); you can apply different quantizers to different objects within one layer. A quantizer in these configurations means the quantize operation applied to each object: it consumes a float tensor and outputs a quantized tensor. Please note that the quantization is 'fake', which means the input is quantized to int and then de-quantized back to float.

Using Built-in Quantize Strategy

Four built-in quantize strategies are provided:
- `pof2s`: power-of-2 scale quantization, mainly used for DPU targets now. This is the default quantize strategy of the quantizer.
- `pof2s_tqt`: power-of-2 scale quantization with trained thresholds, mainly used for QAT for DPU now.
- `fs`: float scale quantization, mainly used for devices supporting floating-point calculation, such as CPUs and GPUs.
- `fsx`: float scale quantization with extended support for more data types and operations; also used for devices supporting floating-point calculation.
Users can switch between the built-in quantize strategies by assigning the `quantize_strategy` argument in the constructor of `VitisQuantizer`. Moreover, two handy ways to configure the quantize strategy are provided.
Configure by kwargs in `VitisQuantizer.quantize_model()`
This is an easy way for users who need to override the default pipeline configurations or make global modifications to the quantize operations. The `kwargs` here is a dict object whose keys match the quantize configurations in the JSON file. See `vitis_quantize.VitisQuantizer.quantize_model` for more information about the available keys. The example code below shows how to use it.
```python
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

model = tf.keras.models.load_model('float_model.h5')
quantizer = vitis_quantize.VitisQuantizer(model)
quantizer.quantize_model(calib_dataset,
                         input_layers=['conv2'],
                         bias_bit=32,
                         activation_bit=32,
                         weight_per_channel=True)
```
In this example, the quantizer is configured to quantize only part of the model: layers before `conv2` will not be optimized or quantized. Moreover, all activations and biases are quantized to 32 bit instead of 8 bit, and per-channel quantization is used for all weights.
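Conceptually, these kwargs act as overrides merged on top of the default quantize strategy before quantization runs. A minimal sketch of that idea, with hypothetical names (`apply_overrides` and the flat defaults dict are illustrations, not library API) and a deliberately simplified flat merge — the real quantizer resolves each kwarg into the matching field of the nested JSON configuration:

```python
def apply_overrides(config, **kwargs):
    """Return a copy of `config` with keyword overrides applied.

    Illustrative only: the real tool maps each kwarg onto the
    corresponding entry of the nested quantize-strategy JSON.
    """
    merged = dict(config)   # leave the defaults untouched
    merged.update(kwargs)   # kwargs win over defaults
    return merged

defaults = {'bias_bit': 8, 'activation_bit': 8, 'weight_per_channel': False}
overridden = apply_overrides(defaults, bias_bit=32, activation_bit=32,
                             weight_per_channel=True)
print(overridden)
# {'bias_bit': 32, 'activation_bit': 32, 'weight_per_channel': True}
```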
Configure by VitisQuantizer.set_quantize_strategy()
For advanced users who want full control of the quantize tool, this API is provided to set a new quantize strategy JSON file. Users can first dump the current configurations to a JSON file and make modifications to it. This allows users to override the default configurations, make more fine-grained quantizer configurations, or extend the quantize config to make more layer types quantizable. The user can then set the new JSON file on the quantizer to apply these modifications.
```python
quantizer = VitisQuantizer(model)

# Dump the current quantize strategy
quantizer.dump_quantize_strategy(dump_file='my_quantize_strategy.json', verbose=0)

# Make modifications to the dumped file 'my_quantize_strategy.json'

# Then, set the modified JSON on the quantizer and do quantization
quantizer.set_quantize_strategy(new_quantize_strategy='my_quantize_strategy.json')
quantizer.quantize_model(calib_dataset)
```
`verbose` is an int argument that controls the verbosity of the dumped JSON file. A greater value dumps a more detailed quantize strategy; setting `verbose` to a value greater than or equal to 2 dumps the full quantize strategy.
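The "make modifications" step can also be scripted with the standard `json` module. The sketch below raises every bias quantizer to 32 bits; note that the nesting used here (a `quantize_registry_config` holding a `layer_quantize_config` list) is an assumption modeled on the Conv2D example earlier, so adapt the paths to whatever your dumped file actually contains.

```python
import json

# Assumed shape, modeled on the Conv2D layer configuration shown
# earlier; the real dumped file may nest these fields differently.
strategy = {
    "quantize_registry_config": {
        "layer_quantize_config": [
            {
                "layer_type": "tensorflow.keras.layers.Conv2D",
                "bias_quantizers": [
                    {"quantizer_type": "Pof2SQuantizer",
                     "quantizer_params": {"bit_width": 8}}
                ],
            }
        ]
    }
}

# Raise every bias quantizer to 32 bits.
for layer in strategy["quantize_registry_config"]["layer_quantize_config"]:
    for quantizer in layer.get("bias_quantizers", []):
        quantizer["quantizer_params"]["bit_width"] = 32

# Write the modified strategy back out for set_quantize_strategy().
with open("my_quantize_strategy.json", "w") as f:
    json.dump(strategy, f, indent=2)
```

In practice you would `json.load` the file produced by `dump_quantize_strategy` instead of building the dict inline, apply the edits, and write it back under a new name.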