Quantization API
```python
def quantize(
    input_frozen_graph="",
    input_nodes="",
    input_shapes="",
    output_nodes="",
    input_fn="",
    method=1,
    calib_iter=100,
    output_dir="./quantize_results",
    **kargs)
```
This function invokes the `vai_q_tensorflow` command-line tool in WeGO TensorFlow r1.15 and converts the input floating-point model into a fixed-point model for accelerated DPU deployment. To remain fully compatible with the native `vai_q_tensorflow` quantizer, all parameters received by this API are forwarded directly to the `vai_q_tensorflow` command-line tool. The function returns a quantized `GraphDef` object, or `None` on failure.

Note: Only PTQ is currently supported for on-the-fly quantization in WeGO. For more information on fast fine-tuning and QAT quantization, see vai_q_tensorflow Quantization Aware Training.
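For illustration, a minimal sketch of calling this API is shown below. The model path, node names, and `input_fn` module name are hypothetical placeholders, and the import location of `quantize` depends on your WeGO TensorFlow installation.

```python
# Assumes `quantize` has been imported from your WeGO TensorFlow installation;
# all paths and node names below are hypothetical placeholders.
quantized_graph_def = quantize(
    input_frozen_graph="./float_model.pb",
    input_nodes="input",
    input_shapes="?,224,224,3",
    output_nodes="logits",
    input_fn="my_input_fn.input_fn",  # module_name.input_fn_name
    method=1,
    calib_iter=100,
    output_dir="./quantize_results")

if quantized_graph_def is None:
    raise RuntimeError("on-the-fly quantization failed")
```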
Parameters
- input_frozen_graph
- string: the path to the input frozen graph (.pb) file (default: "")
- input_nodes
- string: the comma-separated name list of input nodes of the subgraph to be quantized. Used together with output_nodes. When generating the model for deployment, only the subgraph between input_nodes and output_nodes is included. Set it to the beginning of the main body of the model to quantize, such as the nodes after data pre-processing and augmentation. (default: "")
- input_shapes
- string: the comma-separated shape list of the input_nodes. Each node's shape must be a 4-dimensional, comma-separated shape, for example, 1,224,224,3. An unknown batch size is supported, for example, ?,224,224,3. In the case of multiple input_nodes, assign the shape list of each node, separated by ":", for example, ?,224,224,3:?,300,300,1. (default: "")
- output_nodes
- string: the comma-separated name list of output nodes of the subgraph to be quantized. Used together with input_nodes. When generating the model for deployment, only the subgraph between input_nodes and output_nodes is included. Set it to the end of the main body of the model to quantize, such as the nodes before post-processing. (default: "")
- input_fn
- string: the Python importable function that provides the input data. The format is module_name.input_fn_name, for example, my_input_fn.input_fn. The input_fn should take an int object as input, indicating the calibration step, and should return a dict (placeholder_node_name : numpy.Array) object for each call, which is then fed into the model's placeholder nodes. A minimal sketch of such a function is shown after this parameter list. (default: "")
- method
- int32: {0, 1, 2} (default: 1). The method for quantization. Options are:
- 0: non-overflow method. Ensures no values are saturated during quantization. It may produce poor results when there are outliers.
- 1: min-diffs method. Allows saturation of large values during quantization to reduce the quantization error. This method is slower than method 0 but is more robust to outliers.
- 2: min-diffs method with a strategy for depthwise weights. Allows saturation of large values during quantization to reduce the quantization error. It applies a special strategy to depthwise convolution weights while using method 1 for the other weights and activations. This method is slower than method 0 but is more robust to outliers.
- calib_iter
- int32: the number of calibration iterations. The total number of images used for calibration = calib_iter * batch_size. (default: 100)
- output_dir
- string: the directory to save the quantization results (default: ./quantize_results).
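As referenced in the input_fn description above, the following is a minimal sketch of a calibration input function. The dataset file, batch size, and placeholder name are illustrative assumptions, not part of the API; the only contract is that the function takes an int step and returns a dict mapping placeholder node names to numpy arrays.

```python
import numpy as np

BATCH_SIZE = 32  # assumed batch size; total calibration images = calib_iter * BATCH_SIZE
# Hypothetical pre-processed calibration set with shape (N, 224, 224, 3).
calib_images = np.load("calib_images.npy")

def input_fn(iter_num):
    """Return one calibration batch for step `iter_num` (an int)."""
    start = (iter_num * BATCH_SIZE) % len(calib_images)
    batch = calib_images[start:start + BATCH_SIZE]
    # The key must match the placeholder node name passed as input_nodes.
    return {"input": batch}
```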
Note: For on-the-fly quantization examples for WeGO TensorFlow 1.x, see the examples.