The options supported by vai_q_tensorflow
are shown in the following tables.
Name | Type | Description |
---|---|---|
Common Configuration | ||
--input_frozen_graph | String | TensorFlow frozen inference GraphDef file for the floating-point model, used for quantize calibration. |
--input_nodes | String | The name list of input nodes of the quantize graph, comma separated; used together with --output_nodes. Input nodes and output nodes are the start and end points of quantization; the subgraph between them is quantized if it is quantizable. If you need to deploy the quantized model to the DPU, it is recommended to set --input_nodes to the last nodes of the preprocessing part and --output_nodes to the last nodes before the post-processing part, because some operations in the pre- and post-processing parts are not quantizable and might cause errors when compiled by the Vitis AI compiler. The input nodes might not be the same as the placeholder nodes of the graph. |
--output_nodes | String | The name list of output nodes of the quantize graph, comma separated; used together with --input_nodes. Input nodes and output nodes are the start and end points of quantization; the subgraph between them is quantized if it is quantizable. If you need to deploy the quantized model to the DPU, it is recommended to set --input_nodes to the last nodes of the preprocessing part and --output_nodes to the last nodes before the post-processing part, because some operations in the pre- and post-processing parts are not quantizable and might cause errors when compiled by the Vitis AI compiler. |
--input_shapes | String | The shape list of input_nodes. Must be a 4-dimensional shape for each node, comma separated, for example, 1,224,224,3. An unknown batch size is supported, for example, ?,224,224,3. In the case of multiple input nodes, assign the shape list of each node separated by a colon, for example, ?,224,224,3:?,300,300,1. |
--input_fn | String | This function provides input data for the graph used with the calibration dataset. The function format is module_name.input_fn_name (for example, my_input_fn.input_fn). The input_fn should take an int object as input, which indicates the calibration step, and should return a dict(placeholder_node_name, numpy.Array) object for each call, which is then fed into the placeholder operations of the model. For example, assign --input_fn to my_input_fn.calib_input and write the calib_input function in my_input_fn.py. Note: You do not need to do in-graph preprocessing again in input_fn, because the subgraph before --input_nodes remains during quantization. The pre-defined input functions (including default and random) have been removed because they are not commonly used. The preprocessing part that is not in the graph file should be handled in the input_fn. |
Quantize Configuration | ||
--weight_bit | Int32 | Bit width for quantized weight and bias. Default: 8 |
--activation_bit | Int32 | Bit width for quantized activation. Default: 8 |
--nodes_bit | String | Specify the bit width of individual nodes. Each node name and bit width form a pair joined by a colon, and pairs are comma separated. When only a conv op name is specified, vai_q_tensorflow quantizes the weights of that conv op using the specified bit width. For example: 'conv1/Relu:16,conv1/weights:8,conv1:16' |
--method | Int32 | The method for quantization. 0: Non-overflow method. Makes sure that no values are saturated during quantization. Sensitive to outliers. 1: Min-diffs method. Allows saturation for quantization to get a lower quantization difference. Higher tolerance to outliers. Usually ends with narrower ranges than the non-overflow method. Choices: [0, 1] Default: 1 |
--nodes_method | String | Specify the quantization method of individual nodes. Each node name and method form a pair joined by a colon, and pairs are comma separated. When only a conv op name is specified, vai_q_tensorflow quantizes the weights of that conv op using the specified method. For example: 'conv1/Relu:1,depthwise_conv1/weights:2,conv1:1' |
--calib_iter | Int32 | The number of iterations for calibration. Total number of images for calibration = calib_iter * batch_size. Default: 100 |
--ignore_nodes | String | The name list of nodes to ignore during quantization; ignored nodes are left unquantized. |
--skip_check | Int32 | If set to 1, the check for the float model is skipped. Useful when only part of the input model is quantized. Choices: [0, 1] Default: 0 |
--align_concat | Int32 | The strategy for aligning the input quantize position for concat nodes. Set to 0 to align all concat nodes, 1 to align the output concat nodes, and 2 to disable alignment. Choices: [0, 1, 2] Default: 0 |
--simulate_dpu | Int32 | Set to 1 to enable simulation of the DPU. The behavior of the DPU for some operations is different from TensorFlow. For example, division in LeakyRelu and AvgPooling is replaced by bit-shifting, so there might be a slight difference between DPU outputs and CPU/GPU outputs. The vai_q_tensorflow quantizer simulates the behavior for these operations if this flag is set to 1. Choices: [0, 1] Default: 1 |
--adjust_shift_bias | Int32 | The strategy for shift bias check and adjustment for the DPU compiler. Set to 0 to disable the shift bias check and adjustment, 1 to enable it with static constraints, and 2 to enable it with dynamic constraints. Choices: [0, 1, 2] Default: 1 |
--adjust_shift_cut | Int32 | The strategy for shift cut check and adjustment for the DPU compiler. Set to 0 to disable the shift cut check and adjustment, and 1 to enable it with static constraints. Choices: [0, 1] Default: 1 |
--arch_type | String | Specify the architecture type for the fix neuron. 'DEFAULT' means the quantization range of both weights and activations is [-128, 127]. 'DPUCADF8H' means the weights quantization range is [-128, 127] while the activation range is [-127, 127]. |
--output_dir | String | The directory in which to save the quantization results. Default: "./quantize_results" |
--max_dump_batches | Int32 | The maximum number of batches for dumping. Default: 1 |
--dump_float | Int32 | If set to 1, the float weights and activations are also dumped. Choices: [0, 1] Default: 0 |
--dump_input_tensors | String | Specify the input tensor name of the graph when the graph entrance is not a placeholder. A placeholder is added according to dump_input_tensors so that input_fn can feed data. |
Session Configurations | ||
--gpu | String | The ID of the GPU device used for quantization, comma separated. |
--gpu_memory_fraction | Float | The GPU memory fraction used for quantization, between 0 and 1. Default: 0.5 |
Others | ||
--help | | Show all available options of vai_q_tensorflow. |
--version | | Show vai_q_tensorflow version information. |
Examples
Show help: vai_q_tensorflow --help
Quantize:
vai_q_tensorflow quantize --input_frozen_graph frozen_graph.pb \
--input_nodes inputs \
--output_nodes predictions \
--input_shapes ?,224,224,3 \
--input_fn my_input_fn.calib_input
Dump a quantized model:
vai_q_tensorflow dump --input_frozen_graph quantize_results/quantize_eval_model.pb \
--input_fn my_input_fn.dump_input
Refer to Xilinx Model Zoo for more TensorFlow model quantization examples.
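The `--input_fn` callable described above can be sketched as follows. This is a minimal example, not the definitive implementation: the placeholder name `inputs`, the batch size, the 224x224x3 shape, and the use of random data in place of a real calibration set are all illustrative assumptions; in practice you would load and preprocess real images for each calibration step.

```python
# my_input_fn.py -- minimal sketch of a calibration input_fn.
# Assumptions (replace for your model): placeholder node named "inputs",
# input shape ?,224,224,3, and random data standing in for real images.
import numpy as np

BATCH_SIZE = 10

def calib_input(iter_num):
    """Called once per calibration step (iter_num is an int).

    Must return a dict mapping placeholder node names to numpy arrays,
    which vai_q_tensorflow feeds into the model's placeholder ops.
    """
    # In practice, load and preprocess the slice of your calibration set
    # for this step, e.g. images[iter_num*BATCH_SIZE:(iter_num+1)*BATCH_SIZE].
    images = np.random.rand(BATCH_SIZE, 224, 224, 3).astype(np.float32)
    return {"inputs": images}
```

With `--calib_iter 100` and this batch size, 100 * 10 = 1000 samples would be consumed during calibration.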