vai_q_tensorflow Quantize Finetuning

Vitis AI User Guide (UG1414)
Document ID: UG1414
Release Date: 2021-02-03
Version: 1.3 English
Quantize finetuning is similar to float model finetuning, except that the vai_q_tensorflow APIs are used to rewrite the float graph into a quantized graph before training starts. The typical workflow is as follows:
  1. Preparation: Before finetuning, prepare the following files:
    Table 1. Input Files for vai_q_tensorflow Quantize Finetuning
    No. | Name | Description
    1 | Checkpoint files | Floating-point checkpoint files to start from. Can be omitted when training from scratch.
    2 | Dataset | The training dataset with labels.
    3 | Train scripts | The Python scripts used to run float training/finetuning of the model.
  2. Evaluate the float model (optional): Evaluate the float checkpoint files before quantize finetuning to verify the correctness of the scripts and dataset. The accuracy and loss of the float checkpoint also serve as a baseline for quantize finetuning.
  3. Modify the training scripts: To create the quantize training graph, modify the training scripts to call decent_q.CreateQuantizeTrainingGraph after the float graph is built. The following is an example:
    # train.py
    import tensorflow as tf

    # ...
    
    # Create the float training graph
    model = model_fn(is_training=True)
    
    # *Set the quantize configurations
    from tensorflow.contrib import decent_q
    q_config = decent_q.QuantizeConfig(input_nodes=['net_in'],
                                       output_nodes=['net_out'], 
                                       input_shapes=[[-1, 224, 224, 3]])
    # *Call the vai_q_tensorflow API to create the quantize training graph
    decent_q.CreateQuantizeTrainingGraph(config=q_config)
    
    # Create the optimizer (GradientDescentOptimizer requires a learning rate;
    # choose a value suited to finetuning)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    
    # Start the training/finetuning; you can use sess.run(), tf.train, tf.estimator, tf.slim, and so on
    # ...

    The QuantizeConfig contains the configurations for quantization.

    Basic configurations such as input_nodes, output_nodes, and input_shapes must be set according to your model structure.

    Other configurations such as weight_bit, activation_bit, and method have default values and can be modified as needed. See vai_q_tensorflow Usage for detailed information on all configurations.
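    As a minimal sketch only, the snippet below overrides the default quantization settings when building the config. It assumes weight_bit, activation_bit, and method are accepted as QuantizeConfig keyword arguments under exactly those names; consult vai_q_tensorflow Usage for the authoritative argument list.

    # Sketch only: the keyword names below mirror the configuration names in this
    # guide and are assumptions; see vai_q_tensorflow Usage for supported arguments.
    q_config = decent_q.QuantizeConfig(input_nodes=['net_in'],
                                       output_nodes=['net_out'],
                                       input_shapes=[[-1, 224, 224, 3]],
                                       weight_bit=8,
                                       activation_bit=8,
                                       method=1)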

    input_nodes/output_nodes
    These are used together to determine the subgraph range to quantize. The pre-processing and post-processing parts are usually not quantizable and should be left outside this range. The input_nodes and output_nodes must be the same for the float training graph and the float evaluation graph so that the quantization operations can be matched between them.
    Note: Operations with multiple output tensors (such as FIFO) are currently not supported as input nodes. In that case, add a tf.identity node to create an alias for the input tensor, which yields a single-output node to use as the input node; see the sketch after the input_shapes description below.
    input_shapes
    The shape list for the input_nodes. Each node requires a 4-dimensional shape, comma separated, for example, [[1,224,224,3], [1,128,128,1]]. An unknown batch size is supported with -1, for example, [[-1,224,224,3]].
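    The following is a minimal sketch of the tf.identity alias mentioned in the note above; multi_output_op is a hypothetical operation with several output tensors, and the node and tensor names are placeholders for your own model.

    # Sketch: wrap the desired output tensor of a multi-output op in tf.identity
    # so the graph gains a single-output node that can serve as an input node.
    net_in = tf.identity(multi_output_op.outputs[0], name='net_in')
    q_config = decent_q.QuantizeConfig(input_nodes=['net_in'],
                                       output_nodes=['net_out'],
                                       input_shapes=[[-1, 224, 224, 3]])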
  4. Evaluate the quantized model and generate the deploy model: After quantize finetuning, generate the deploy model. Before that, evaluate the quantized graph with a checkpoint file. Both are done by calling the following functions after building the float evaluation graph. Because the deploy step runs on the quantize evaluation graph, the two functions are usually called together.
    # eval.py
    
    # ...
    
    # Create the float evaluation graph
    model = model_fn(is_training=False)
    
    # *Set the quantize configurations
    from tensorflow.contrib import decent_q
    q_config = decent_q.QuantizeConfig(input_nodes=['net_in'],
                                       output_nodes=['net_out'], 
                                       input_shapes=[[-1, 224, 224, 3]])
    # *Call the vai_q_tensorflow API to create the quantize evaluation graph
    decent_q.CreateQuantizeEvaluationGraph(config=q_config)
    # *Call the vai_q_tensorflow API to freeze the model and generate the deploy model
    decent_q.CreateQuantizeDeployGraph(checkpoint="path to checkpoint folder", config=q_config)
    
    # Start the evaluation; you can use sess.run(), tf.train, tf.estimator, tf.slim, and so on
    # ...
    

Generated Files

After you have performed the previous steps, the following files are generated in ${output_dir}:

Table 2. Generated File Information
Name | TensorFlow Compatible | Usage | Description
quantize_train_graph.pb | Yes | Training | The quantized training graph.
quantize_eval_graph_{suffix}.pb | Yes | Evaluation with a checkpoint | The quantized evaluation graph with the quantize information frozen inside. It contains no weights and must be used together with the checkpoint file during evaluation.
quantize_eval_model_{suffix}.pb | Yes | 1. Evaluation; 2. Dump; 3. Input to the VAI compiler (DPUCAHX8H) | The frozen quantized evaluation graph; the weights from the checkpoint and the quantize information are frozen inside. It can be used to evaluate the quantized model on the host or to dump the outputs of each layer for cross-checking against DPU outputs. The XIR compiler takes it as input.
deploy_model_{suffix}.pb | No | Input to the VAI compiler (DPUCZDX8G) | The deploy model, with operations and quantize information fused. The DNNC compiler takes it as input.

The suffix contains the iteration number from the checkpoint file and the date, making it easy to match the generated files with their checkpoint files. For example, if the checkpoint file is "model.ckpt-2000.*" and the date is 20200611, the suffix is "2000_20200611000000".
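As a rough sketch of the host evaluation mentioned in Table 2, the frozen quantize_eval_model_{suffix}.pb can be loaded with standard TensorFlow 1.x APIs. The file path, tensor names, and dummy input batch below are placeholders for your own ${output_dir}, model, and data.

# eval_frozen.py (sketch only; placeholder path and tensor names)
import numpy as np
import tensorflow as tf

# Load the frozen quantize evaluation model; replace the path with your
# ${output_dir} and the actual suffix of the generated file.
graph_def = tf.GraphDef()
with tf.gfile.GFile("output_dir/quantize_eval_model_2000_20200611000000.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

# Placeholder input batch; substitute your own preprocessed data.
preprocessed_batch = np.zeros((1, 224, 224, 3), dtype=np.float32)

# Run inference with the input/output tensor names used in the examples above.
with tf.Session(graph=graph) as sess:
    net_in = graph.get_tensor_by_name("net_in:0")
    net_out = graph.get_tensor_by_name("net_out:0")
    predictions = sess.run(net_out, feed_dict={net_in: preprocessed_batch})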