vai_q_tensorflow Usage - 2.0 English

Vitis AI User Guide (UG1414)

Document ID
UG1414
Release Date
2022-01-20
Version
2.0 English

The options supported by vai_q_tensorflow are shown in the following tables.

Table 1. vai_q_tensorflow Options
Name Type Description
Common Configuration
--input_frozen_graph String TensorFlow frozen inference GraphDef file for the floating-point model. It is used for quantize calibration.
--input_nodes String Specifies the comma-separated list of input node names of the quantize graph. Used together with --output_nodes. Input nodes and output nodes are the start and end points of quantization. The subgraph between them is quantized if it is quantizable.
--output_nodes String Specifies the comma-separated list of output node names of the quantize graph. Used together with --input_nodes. Input nodes and output nodes are the start and end points of quantization. The subgraph between them is quantized if it is quantizable.
--input_shapes String Specifies the shape list of input nodes. Each node requires a 4-dimensional shape with comma-separated values, for example 1,224,224,3. An unknown batch size is supported, for example ?,224,224,3. For multiple input nodes, separate the shape of each node with a colon (:), for example ?,224,224,3:?,300,300,1.
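
The shape list format can be illustrated with a short Python sketch. This is not the parser used by vai_q_tensorflow; the function name parse_input_shapes is hypothetical, and representing an unknown batch size as None is an assumption made for the illustration.

```python
def parse_input_shapes(spec):
    """Split an --input_shapes string into one 4-element shape per input node.

    Shapes are separated by ':' and dimensions by ','.
    A '?' dimension (unknown batch size) is represented as None here.
    """
    shapes = []
    for node_spec in spec.split(":"):
        dims = [None if d.strip() == "?" else int(d) for d in node_spec.split(",")]
        if len(dims) != 4:
            raise ValueError("expected a 4-dimensional shape, got: " + node_spec)
        shapes.append(dims)
    return shapes


print(parse_input_shapes("?,224,224,3:?,300,300,1"))
# [[None, 224, 224, 3], [None, 300, 300, 1]]
```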
--input_fn String Provides the input data for the graph used with the calibration dataset. The function format is module_name.input_fn_name (for example, my_input_fn.input_fn). The input_fn should take an int object as input, which indicates the calibration step, and should return a dict(placeholder_node_name, numpy.array) object for each call, which is then fed into the placeholder operations of the model.

For example, assign --input_fn to my_input_fn.calib_input, and write the calib_input function in my_input_fn.py as:

def calib_input(iter):
    # iter is the calibration step (int); read the images for this step and preprocess them
    # into numpy arrays (input_1_nparray and input_2_nparray) here
    return {"placeholder_1": input_1_nparray, "placeholder_2": input_2_nparray}
Note: You do not need to repeat in-graph pre-processing in input_fn because the subgraph before --input_nodes remains during quantization. The pre-defined input functions (including default and random) have been removed because they are not commonly used. Any pre-processing that is not in the graph file should be handled in the input_fn.
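
A fuller input function might look like the following sketch. It is illustrative only: the placeholder name "input", the batch size of 4, the random stand-in images, and the scaling to [0, 1] are all assumptions. A real input function would load and pre-process actual calibration images.

```python
import numpy as np

BATCH_SIZE = 4        # assumed batch size
INPUT_NODE = "input"  # hypothetical placeholder node name

# Stand-in for a calibration set: 8 random RGB images of size 224x224.
_rng = np.random.default_rng(0)
_images = _rng.integers(0, 256, size=(8, 224, 224, 3), dtype=np.uint8)


def preprocess(batch):
    """Example pre-processing that is not part of the graph: scale to [0, 1]."""
    return batch.astype(np.float32) / 255.0


def calib_input(iter):
    """Return the feed dict for calibration step `iter` (an int)."""
    start = (iter * BATCH_SIZE) % len(_images)  # wrap around the small dataset
    batch = _images[start:start + BATCH_SIZE]
    return {INPUT_NODE: preprocess(batch)}
```

With this layout, each calibration step consumes BATCH_SIZE images, so the number of images used is calib_iter * BATCH_SIZE.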
Quantize Configuration
--weight_bit Int32 Specifies the bit width for quantized weight and bias.

Default value: 8

--activation_bit Int32 Specifies the bit width for quantized activation.

Default value: 8

--nodes_bit String Specifies the bit width of nodes. Each node name and its bit width form a pair joined by a colon; the pairs are comma separated. When the name of a conv op is specified, vai_q_tensorflow quantizes only the weights of the conv op using the specified bit width. For example, 'conv1/Relu:16,conv1/weights:8,conv1:16'.
--method Int32 Specifies the method for quantization.
  • 0: Non-overflow method in which no values are saturated during quantization. Sensitive to outliers.
  • 1: Min-diffs method that allows saturation during quantization to get a lower quantization difference. Higher tolerance to outliers. Usually results in narrower ranges than the non-overflow method.
  • 2: Min-diffs method with a strategy for depthwise weights. It allows saturation of large values during quantization to get smaller quantization errors. A special strategy is applied for depthwise weights. It is slower than method 0 but has higher tolerance to outliers.

Default value: 1
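
The difference between methods 0 and 1 can be sketched in a few lines of Python. This is not the vai_q_tensorflow implementation. It assumes power-of-two scales (a fixed-point position, here called fix_pos), a squared-error criterion, and a small search window; all function names are hypothetical.

```python
import math


def quantize(values, fix_pos, bit_width=8):
    """Round to a signed grid with step 2**-fix_pos; out-of-range values saturate."""
    scale = 2.0 ** fix_pos
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    return [min(max(round(v * scale), lo), hi) / scale for v in values]


def non_overflow_pos(values, bit_width=8):
    """Method 0 idea: the finest grid on which no value saturates."""
    max_abs = max(abs(v) for v in values)
    hi = 2 ** (bit_width - 1) - 1
    return math.floor(math.log2(hi / max_abs))


def min_diffs_pos(values, bit_width=8, search=4):
    """Method 1 idea: try finer grids too and keep the one with the lowest error."""
    start = non_overflow_pos(values, bit_width)

    def squared_error(pos):
        return sum((v - q) ** 2 for v, q in zip(values, quantize(values, pos, bit_width)))

    return min(range(start, start + search + 1), key=squared_error)


# Many small values plus a single outlier at 1.2.
values = [i / 20000 for i in range(-10000, 10001)] + [1.2]
print(non_overflow_pos(values), min_diffs_pos(values))
```

In this example the non-overflow position is dictated by the single outlier, while the min-diffs search accepts saturating that outlier in exchange for a finer grid for the remaining values.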

--nodes_method String Specifies the quantization method of nodes. Each node name and its method form a pair joined by a colon; the pairs are comma separated. When the name of a conv op is specified, vai_q_tensorflow quantizes only the weights of the conv op using the specified method. For example, 'conv1/Relu:1,depthwise_conv1/weights:2,conv1:1'.
--calib_iter Int32 Specifies the iterations of calibration. Total number of images for calibration = calib_iter * batch_size.

Default value: 100

--ignore_nodes String Specifies the name list of nodes to be ignored during quantization. Ignored nodes are left unquantized during quantization.
--skip_check Int32 If set to 1, the check for float model is skipped. Useful when only part of the input model is quantized.

Range: [0, 1]

Default value: 0

--align_concat Int32 Specifies the strategy for aligning the input quantize positions of concat nodes.
  • 0: Aligns all the concat nodes
  • 1: Aligns the output concat nodes
  • 2: Disables alignment

Default value: 0

--simulate_dpu Int32 Set to 1 to enable simulation of the DPU. The behavior of the DPU for some operations differs from TensorFlow. For example, the division in LeakyRelu and AvgPooling is replaced by bit shifting, so there might be a slight difference between DPU outputs and CPU/GPU outputs. The vai_q_tensorflow quantizer simulates the behavior of these operations if this flag is set to 1.

Range: [0, 1]

Default value: 1
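
The effect of replacing a division with bit shifting can be seen in a small sketch. The multiplier 7 and shift 6 used here to approximate a division by 9 are assumptions chosen for illustration and are not confirmed DPU constants; the function names are hypothetical.

```python
def avg_exact(window):
    """Floating-point mean, as computed on CPU/GPU."""
    return sum(window) / len(window)


def avg_shift(window, multiplier, shift):
    """Integer-only approximation of the mean: (sum * multiplier) >> shift."""
    return (sum(window) * multiplier) >> shift


# A 3x3 pooling window, flattened. 7 / 2**6 = 0.109375 approximates 1 / 9.
window = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(avg_exact(window))        # 50.0
print(avg_shift(window, 7, 6))  # 49
```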

--adjust_shift_bias Int32 Specifies the strategy for shift bias check and adjustment for DPU compiler.
  • 0: Disables shift bias check and adjustment
  • 1: Enables with static constraints
  • 2: Enables with dynamic constraints

Default value: 1

--adjust_shift_cut Int32 Specifies the strategy for shift cut check and adjustment for DPU compiler.
  • 0: Disables shift cut check and adjustment
  • 1: Enables with static constraints

Default value: 1

--arch_type String Specifies the arch type for fix neuron. 'DEFAULT' means the quantization range of both weights and activations is [-128, 127]. 'DPUCADF8H' means the weight quantization range is [-128, 127] while the activation range is [-127, 127].
--output_dir String Specifies the directory in which to save the quantization results.

Default value: "./quantize_results"

--max_dump_batches Int32 Specifies the maximum number of batches for dumping.

Default value: 1

--dump_float Int32 If set to 1, the float weights and activations are dumped.

Range: [0, 1]

Default value: 0

--dump_input_tensors String Specifies the input tensor name of the graph when the graph entrance is not a placeholder. A placeholder is added to the tensor given in dump_input_tensors so that input_fn can feed data.
--scale_all_avgpool Int32 Set to 1 to scale the output of the AvgPooling op to simulate the DPU. Only AvgPooling ops with kernel_size <= 64 are scaled. This operation does not affect special cases such as kernel_size=3,5,6,7,14.

Default value: 1

--do_cle Int32 Specifies whether to apply cross layer equalization.
  • 1: Enables cross layer equalization to adjust the weight distribution
  • 0: Skips cross layer equalization

Default value: 0
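
As background, the general idea of cross layer equalization can be sketched as follows. This is a generic illustration of the technique for two consecutive layers joined by ReLU, with biases omitted; it is not the vai_q_tensorflow implementation, and the function names are hypothetical.

```python
import math


def equalize(w1, w2):
    """Rescale channel i of layer 1 by 1/s and the matching inputs of layer 2 by s.

    w1: rows are output channels of layer 1.
    w2: rows are output channels of layer 2, columns are its input channels.
    """
    new_w1 = [row[:] for row in w1]
    new_w2 = [row[:] for row in w2]
    for i in range(len(w1)):
        r1 = max(abs(v) for v in w1[i])        # range of channel i in layer 1
        r2 = max(abs(row[i]) for row in w2)    # range of channel i in layer 2
        if r1 == 0 or r2 == 0:
            continue
        s = math.sqrt(r1 / r2)                 # makes both ranges sqrt(r1 * r2)
        new_w1[i] = [v / s for v in w1[i]]
        for row in new_w2:
            row[i] *= s
    return new_w1, new_w2


def forward(w1, w2, x):
    """Two linear layers with ReLU in between (no bias)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


w1 = [[0.1, -0.2], [4.0, 2.0]]   # channel ranges differ by a factor of 20
w2 = [[1.0, 0.05]]
e1, e2 = equalize(w1, w2)
print(forward(w1, w2, [1.0, -0.5]), forward(e1, e2, [1.0, -0.5]))
```

Because ReLU commutes with multiplication by a positive scale, the network output is unchanged up to floating-point rounding while the per-channel weight ranges become balanced. The same does not hold for ReLU6, whose upper clip at 6 is not scale invariant.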

--replace_relu6 Int32 Specifies whether to replace ReLU6 with ReLU. Available only when do_cle=1.
  • 1: Replaces ReLU6 with ReLU
  • 0: Skips replacement

Default value: 1

Session Configurations
--gpu String Specifies the IDs of the GPU device used for quantization separated by commas.
--gpu_memory_fraction Float Specifies the GPU memory fraction used for quantization, between 0 and 1.

Default value: 0.5

Others
--help Shows all available options of vai_q_tensorflow.
--version Shows the version information for vai_q_tensorflow.

Examples

show help: vai_q_tensorflow --help
quantize: 
vai_q_tensorflow quantize --input_frozen_graph frozen_graph.pb \
                          --input_nodes inputs \
                          --output_nodes predictions \
                          --input_shapes ?,224,224,3 \
                          --input_fn my_input_fn.calib_input
dump quantized model: 
vai_q_tensorflow dump --input_frozen_graph quantize_results/quantize_eval_model.pb \
                      --input_fn my_input_fn.dump_input
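
When the quantize command is driven from a script, the argument list can be assembled in Python. The following is a minimal sketch that uses only options from Table 1. The helper name build_quantize_command is hypothetical, and the command is only printed here, not executed.

```python
import shlex


def build_quantize_command(frozen_graph, input_nodes, output_nodes,
                           input_shapes, input_fn, **extra):
    """Assemble the argument list for `vai_q_tensorflow quantize`.

    input_shapes is a list of 4-element shapes; None stands for an unknown batch size.
    Extra keyword arguments are passed through as --name value pairs.
    """
    shapes = ":".join(
        ",".join("?" if d is None else str(d) for d in shape)
        for shape in input_shapes
    )
    cmd = [
        "vai_q_tensorflow", "quantize",
        "--input_frozen_graph", frozen_graph,
        "--input_nodes", ",".join(input_nodes),
        "--output_nodes", ",".join(output_nodes),
        "--input_shapes", shapes,
        "--input_fn", input_fn,
    ]
    for name, value in extra.items():
        cmd += ["--" + name, str(value)]
    return cmd


cmd = build_quantize_command(
    "frozen_graph.pb", ["inputs"], ["predictions"],
    [[None, 224, 224, 3]], "my_input_fn.calib_input",
    calib_iter=100,
)
# Print a shell-safe form; to run it, pass the list to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell interpretation of the ? character in the shape string.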

Refer to Xilinx Model Zoo for more TensorFlow model quantization examples.