vai_q_tensorflow Usage - 3.5 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

The following table shows the vai_q_tensorflow options.

Table 1. vai_q_tensorflow Options
Name Type Description
Common Configuration
--input_frozen_graph String TensorFlow frozen inference GraphDef file for the floating-point model. It is used for post-training quantization.
--input_nodes String Specifies the name list of input nodes of the quantize graph, used together with --output_nodes and separated by commas. Input nodes and output nodes are the starting and ending points of quantization; the subgraph between them is quantized if it is quantizable.
--output_nodes String Specifies the name list of output nodes of the quantize graph, used together with --input_nodes and separated by commas. Input nodes and output nodes are the starting and ending points of quantization; the subgraph between them is quantized if it is quantizable.
--input_shapes String Specifies the shape list of input nodes. It must be a four-dimensional shape for each node, separated by commas, for example, 1,224,224,3. An unknown batch size is supported, for example, ?,224,224,3. For multiple input nodes, assign the shape list of each node separated by a colon, for example, ?,224,224,3:?,300,300,1.
--input_fn String

Provides input data for the graph; used with the calibration dataset. The function follows the module_name.input_fn_name format (for example, my_input_fn.input_fn). The --input_fn should take an int object as input, representing the calibration step, and return a dict of {placeholder_node_name: numpy.Array} pairs for each call. The dict is then fed into the placeholder operations of the model.

For instance, you can assign --input_fn to my_input_fn.calib_input and create the calib_input function in my_input_fn.py as follows:

def calib_input(iter):
    # read the images for calibration step `iter` and do some preprocessing
    return {"placeholder_1": input_1_nparray, "placeholder_2": input_2_nparray}
Note: You do not need to perform in-graph pre-processing again in input_fn because the subgraph before --input_nodes remains during quantization. The pre-defined input functions (including default and random) have been removed because they are not commonly used. The pre-processing part that is not in the graph file should be handled in input_fn.
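
As a fuller illustration, a complete input_fn module might look like the following sketch. The file list, batch size, image size, and preprocessing are assumptions for illustration; only the signature (one int argument) and the {placeholder_name: numpy array} return value are required, and the dict key must match the placeholder name passed to --input_nodes.

# my_input_fn.py -- illustrative calibration input function
import numpy as np
from PIL import Image

calib_images = open("calib_list.txt").read().splitlines()  # assumed image list
BATCH_SIZE = 32

def calib_input(iter):
    # load and preprocess one batch of calibration images for this step
    batch = []
    for path in calib_images[iter * BATCH_SIZE:(iter + 1) * BATCH_SIZE]:
        img = Image.open(path).convert("RGB").resize((224, 224))
        batch.append(np.asarray(img, dtype=np.float32) / 255.0)
    return {"input": np.stack(batch)}  # key must match the placeholder name
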
Quantize Configuration
--weight_bit Int32 Specifies the bit width for quantized weight and bias.

Default value: 8

--activation_bit Int32 Specifies the bit width for quantized activation.

Default value: 8

--nodes_bit String Specifies the bit width of nodes. Node names and bit widths form a pair of parameters joined by a colon; the pairs are comma separated. When the name of a conv op is specified, vai_q_tensorflow quantizes only the weights of that conv op using the specified bit width. For example, conv1/Relu:16,conv1/weights:8,conv1:16.
--method Int32 Specifies the method for quantization; a sketch of the non-overflow and min-diffs strategies follows this entry.
  • 0: Non-overflow method in which no values are saturated during quantization. Sensitive to outliers.
  • 1: Min-diffs method that enables saturation during quantization to get a lower quantization difference. Higher tolerance to outliers. Usually ends with narrower ranges than the non-overflow method.
  • 2: Min-diffs method with a strategy for depthwise layers. It enables saturation for large values during quantization to get smaller quantization errors. A particular strategy is applied to depthwise weights. It is slower than method 0 but has a higher tolerance to outliers.

Default value: 1
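
The difference between the two strategies can be illustrated with a small sketch. It assumes symmetric quantization with a power-of-two scale; the search range and helper names are illustrative, not the quantizer's internals.

import numpy as np

def quantize_pow2(x, fp, bit=8):
    # symmetric quantization with scale 2**fp
    qmax = 2 ** (bit - 1) - 1
    return np.clip(np.round(x * 2.0 ** fp), -qmax - 1, qmax) / 2.0 ** fp

def pos_non_overflow(x, bit=8):
    # method 0: largest fraction width at which no value saturates
    m = max(np.abs(x).max(), 1e-12)
    return int(np.floor(np.log2((2 ** (bit - 1) - 1) / m)))

def pos_min_diffs(x, bit=8):
    # method 1: fraction width with the smallest quantization error;
    # rare outliers are allowed to saturate, narrowing the range
    return min(range(-16, 16),
               key=lambda fp: np.sum((x - quantize_pow2(x, fp, bit)) ** 2))

For a tensor dominated by small values plus a few large outliers, pos_min_diffs typically picks a larger fraction width (a narrower representable range) than pos_non_overflow, matching the descriptions above.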

--nodes_method String Specifies the method of nodes. Node names and methods form a pair of parameters joined by a colon; the pairs are comma separated. When the name of a conv op is specified, vai_q_tensorflow quantizes only the weights of that conv op using the specified method, for example, 'conv1/Relu:1,depthwise_conv1/weights:2,conv1:1'.
--calib_iter Int32 Specifies the number of calibration iterations. The total number of images used for calibration is calib_iter * batch_size.

Default value: 100

--ignore_nodes String Specifies the list of nodes to ignore during quantization. Ignored nodes are left unquantized.
--skip_check Int32 If set to 1, the check for the floating-point model is skipped. Useful when only part of the input model is quantized.

Range: [0, 1]

Default value: 0

--align_concat Int32 Specifies the strategy for aligning the input quantize position for concat nodes.
  • 0: Aligns all the concat nodes
  • 1: Aligns the output concat nodes
  • 2: Disables alignment

Default value: 0

--align_pool Int32 Specifies the strategy for aligning the input quantize position for maxpool/avgpool nodes.
  • 0: Aligns all the maxpool/avgpool nodes
  • 1: Aligns the output maxpool/avgpool nodes
  • 2: Disables alignment

Default value: 0

--simulate_dpu Int32 Set to 1 to enable DPU simulation. The behavior of the DPU differs from TensorFlow for some operations. For example, the division in LeakyRelu and AvgPooling is replaced by bit-shifting, so there might be a slight difference between DPU outputs and CPU/GPU outputs. The vai_q_tensorflow quantizer simulates the behavior of these operations if this flag is set to 1 (see the sketch after this entry).

Range: [0, 1]

Default value: 1
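
As an illustration of the bit-shift behavior, the sketch below replaces the 0.1 slope of LeakyRelu with multiply-by-26 and shift-right-by-8 (26/256 ≈ 0.1016). The specific constant is an assumption for illustration, not a statement of the DPU's exact implementation.

import numpy as np

def leaky_relu_float(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_shift(x_int):
    # integer version: 0.1 approximated as 26/256 via arithmetic shift
    return np.where(x_int > 0, x_int, (x_int * 26) >> 8)

x = np.array([-128, -10, 5, 100], dtype=np.int32)
print(leaky_relu_float(x.astype(np.float32)))  # [-12.8  -1.    5.  100.]
print(leaky_relu_shift(x))                     # [-13  -2   5 100]

The -2 versus -1.0 for the input -10 is exactly the kind of slight CPU/GPU-versus-DPU difference this flag simulates.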

--adjust_shift_bias Int32 Specifies the strategy for shift bias check and adjustment for the DPU compiler.
  • 0: Disables shift bias check and adjustment
  • 1: Enables with static constraints
  • 2: Enables with dynamic constraints

Default value: 1

--adjust_shift_cut Int32 Specifies the shift cut check and adjustment strategy for the DPU compiler.
  • 0: Disables shift cut check and adjustment
  • 1: Enables with static constraints

Default value: 1

--arch_type String Specifies the architecture type for the fixed-point neuron. DEFAULT means the quantization range of both weights and activations is [-128, 127]. DPUCADF8H means the weight quantization range is [-128, 127] while the activation range is [-127, 127].
--output_dir String Specifies the directory to save the quantization results.

Default value: './quantize_results'

--max_dump_batches Int32 Specifies the maximum number of batches for dumping.

Default value: 1

--dump_float Int32 If set to 1, the float weights and activations are dumped.

Range: [0, 1]

Default value: 0

--dump_input_tensors String Specifies the graph's input tensor name when the graph entrance is not a placeholder. vai_q_tensorflow adds a placeholder for the specified tensor so that input_fn can feed data.
--scale_all_avgpool Int32 Set to 1 to scale the output of AvgPooling ops to simulate the DPU. Only kernel sizes <= 64 are scaled. This operation does not affect special cases such as kernel_size = 3, 5, 6, 7, and 14.

Default value: 1

--do_cle Int32
  • 1: Enables cross-layer equalization to adjust the weight distribution (a sketch follows this entry)
  • 0: Skips cross-layer equalization

Default value: 0
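
A minimal sketch of the idea behind cross-layer equalization, shown for two dense layers with a ReLU between them (illustrative only; bias handling is omitted and the quantizer's actual implementation may differ):

import numpy as np

def cross_layer_equalize(W1, W2, eps=1e-8):
    # Scaling output channel c of layer 1 by 1/s[c] and the matching
    # input column of layer 2 by s[c] leaves W2 @ relu(W1 @ x) unchanged
    # (ReLU is positive-homogeneous) while equalizing per-channel ranges.
    r1 = np.abs(W1).max(axis=1)   # per-output-channel range of layer 1
    r2 = np.abs(W2).max(axis=0)   # per-input-channel range of layer 2
    s = np.sqrt(r1 / (r2 + eps))
    return W1 / s[:, None], W2 * s[None, :]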

--replace_relu6 Int32 Available only when do_cle=1.
  • 1: Replaces ReLU6 with ReLU
  • 0: Skips replacement

Default value: 1

--replace_sigmoid Int32
  • 1: Replaces sigmoid with hard-sigmoid (a common definition is sketched below this entry)
  • 0: Skips replacement

Default value: 0
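
A common hard-sigmoid definition is relu6(x + 3) / 6. Whether the quantizer uses exactly this form is an assumption here; the sketch is only to make the replacement concrete.

import numpy as np

def hard_sigmoid(x):
    # piecewise-linear approximation of sigmoid: relu6(x + 3) / 6
    return np.clip(x + 3.0, 0.0, 6.0) / 6.0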

--replace_softmax Int32
  • 1: Replaces softmax with hard-softmax
  • 0: Skips replacement

Default value: 0

--convert_datatype Int32 Converts the model data type after BN folding (BN folding is sketched below this entry).
  • 4: Does BN folding and converts to fp32
  • 3: Does BN folding and converts to bfloat16
  • 2: Does BN folding and converts to double
  • 1: Does BN folding and converts to fp16
  • 0: Skips conversion

Default value: 0
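
BN folding merges a batch normalization layer into the preceding convolution's weights and bias. A standard sketch follows; the weight layout with the output channel leading is an assumption, and eps matches TensorFlow's batch-norm default.

import numpy as np

def fold_bn(W, b, gamma, beta, mean, var, eps=1e-3):
    # y = gamma * (conv(x, W) + b - mean) / sqrt(var + eps) + beta
    # folds into a single conv with rescaled weights and a shifted bias
    scale = gamma / np.sqrt(var + eps)            # per-output-channel
    W_folded = W * scale.reshape(-1, *([1] * (W.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded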

--output_format String Specifies the format in which to save the quantized model: pb saves a TensorFlow frozen pb, and onnx saves an ONNX model.

Default value: 'pb'

Session Configurations
--gpu String Specifies GPU device IDs used for quantization, separated by commas.
--gpu_memory_fraction Float Specifies the GPU memory fraction used for quantization, between 0 and 1.

Default value: 0.5

Others
--help Shows all available vai_q_tensorflow options.
--version Shows the vai_q_tensorflow version information.

Examples

show help: vai_q_tensorflow --help
quantize: 
vai_q_tensorflow quantize --input_frozen_graph frozen_graph.pb \
                          --input_nodes inputs \
                          --output_nodes predictions \
                          --input_shapes ?,224,224,3 \
                          --input_fn my_input_fn.calib_input
dump quantized model: 
vai_q_tensorflow dump --input_frozen_graph quantize_results/quantize_eval_model.pb \
                      --input_fn my_input_fn.dump_input

Refer to AMD Model Zoo for more TensorFlow model quantization examples.