XIR Based Flow for DPUv3

Xilinx Intermediate Representation (XIR) is a graph-based intermediate representation of AI algorithms, designed for compiling and efficiently deploying models on the Domain-specific Processing Unit (DPU) on FPGA platforms. It is composed of Op, Tensor, Graph, and Subgraph libraries. In the future, the Vitis™ AI quantizer, compiler, runtime, and many other tools will use XIR to exchange data. Advanced users can also achieve ML+X, getting more out of the FPGA, by extending XIR to support customized IP in the Vitis AI flow. Currently, the DPUv3 is enabled by the XIR based flow. This section describes the DPUv3 compiler and the steps for using the common VAI_C interface to create a compiled xmodel from the vai_quantizer outputs.

Figure 1. XIR Based Flow

The XIR based compiler for DPUv3 takes a quantized TensorFlow or Caffe model as input. It first transforms the input model into the XIR format, which serves as the foundation for the subsequent steps. Most of the differences between frameworks are eliminated at this stage and mapped to a unified representation in XIR. The compiler then applies various optimizations to the graph and partitions it into several subgraphs according to whether each OP can be executed on the DPU. Further architecture-aware optimizations are applied to each subgraph. For each DPU subgraph, the compiler generates the instruction stream and attaches it to the subgraph. Finally, the optimized graph, with the information and instructions required by VART, is serialized to a compiled xmodel file.
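The partitioning step can be pictured with a small sketch. This is only an illustration of the idea, not the compiler's actual algorithm: it assumes the ops are already in topological order as a flat list, and the set `DPU_OP_TYPES`, the op names, and the function `partition_by_device` are all hypothetical.

```python
from itertools import groupby

# Hypothetical set of op types treated as DPU-capable, for illustration only.
DPU_OP_TYPES = {"conv2d", "maxpool", "eltwise_add", "concat"}


def partition_by_device(ops):
    """Group a topologically ordered list of (name, op_type) pairs into
    consecutive runs that target the same device (DPU or CPU)."""

    def device_of(op):
        _, op_type = op
        return "DPU" if op_type in DPU_OP_TYPES else "CPU"

    # groupby merges only adjacent ops with the same device, so the
    # original execution order is preserved across subgraphs.
    return [
        (device, [name for name, _ in run])
        for device, run in groupby(ops, key=device_of)
    ]


# A toy network: the input and softmax ops fall back to the CPU.
ops = [
    ("input", "data"),
    ("conv1", "conv2d"),
    ("pool1", "maxpool"),
    ("prob", "softmax"),
]
subgraphs = partition_by_device(ops)
for device, names in subgraphs:
    print(device, names)
```

A real graph is not a flat list, so an actual partitioner has to respect data dependencies between branches; the sketch only shows why a model typically ends up as alternating DPU and CPU subgraphs.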

The steps to compile Caffe or TensorFlow models for DPUv3 with VAI_C are the same as for previous DPUs. It is assumed that you have successfully installed the Vitis AI package, including VAI_C, and compressed your model with vai_quantizer.

Caffe

For Caffe, vai_q_caffe is expected to generate a PROTOTXT (deploy.prototxt) and a MODEL (deploy.caffemodel). Make sure you specify the “-keep_fixed_neuron” option for vai_q_caffe, which is essential for the DPUv3 compiler. The following command is then nearly all you need to get the compiled xmodel.

vai_c_caffe -p /PATH/TO/deploy.prototxt -c /PATH/TO/deploy.caffemodel -a /PATH/TO/arch/dpuv3e/arch.json -o /OUTPUTPATH -n netname

The compiler creates three files in the OUTPUTPATH directory. ‘netname_org.xmodel’ is the pre-compiled xmodel generated by the compiler frontend. ‘netname.xmodel’ is the compiled xmodel, which contains instructions and other necessary information. ‘meta.json’ is for the runtime.
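After a compile, it can be handy to confirm that all three files are present before moving on to deployment. The following is a minimal sketch that relies only on the file names given above; the helper names `expected_outputs` and `missing_outputs` and the role labels are hypothetical, and the demo uses a temporary directory in place of a real OUTPUTPATH.

```python
import tempfile
from pathlib import Path


def expected_outputs(output_dir, net_name):
    """Map a descriptive role to each file the compiler is documented to create."""
    out = Path(output_dir)
    return {
        "frontend_xmodel": out / f"{net_name}_org.xmodel",  # pre-compiled xmodel
        "compiled_xmodel": out / f"{net_name}.xmodel",      # instructions included
        "runtime_meta": out / "meta.json",                  # used by the runtime
    }


def missing_outputs(output_dir, net_name):
    """Return the roles whose files are absent from output_dir."""
    return [
        role
        for role, path in expected_outputs(output_dir, net_name).items()
        if not path.is_file()
    ]


# Demo: simulate an output directory that holds only the compiled xmodel.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "netname.xmodel").touch()
    demo_missing = missing_outputs(tmp, "netname")

print(demo_missing)
```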

See Model Deployment Overview for more information on deploying the network on DPU with those files.

TensorFlow

For TensorFlow, vai_q_tensorflow is expected to generate a pb file (quantize_eval_model.pb). Note that vai_q_tensorflow generates two pb files, and ‘quantize_eval_model.pb’ is the correct one for the DPUv3 compiler, which differs from DPUv2. The compilation command is similar.

vai_c_tensorflow -f /PATH/TO/quantize_eval_model.pb -a /PATH/TO/arch/dpuv3e/arch.json -o /OUTPUTPATH -n netname

The outputs are the same as for Caffe.
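Because the two commands differ only in how the model files are passed, a build script can assemble either one from a single helper. The sketch below uses only the flags shown in the commands in this section (-p, -c, -f, -a, -o, -n); the function name `build_compile_cmd` and the dictionary keys `prototxt`, `caffemodel`, and `frozen_pb` are hypothetical names chosen for illustration.

```python
import shlex


def build_compile_cmd(framework, model_files, arch_json, output_dir, net_name):
    """Assemble the VAI_C argument list for a Caffe or TensorFlow model."""
    if framework == "caffe":
        cmd = [
            "vai_c_caffe",
            "-p", model_files["prototxt"],
            "-c", model_files["caffemodel"],
        ]
    elif framework == "tensorflow":
        cmd = ["vai_c_tensorflow", "-f", model_files["frozen_pb"]]
    else:
        raise ValueError(f"unsupported framework: {framework!r}")
    # The architecture file, output directory, and net name are passed
    # the same way for both frameworks.
    return cmd + ["-a", arch_json, "-o", output_dir, "-n", net_name]


tf_cmd = build_compile_cmd(
    "tensorflow",
    {"frozen_pb": "/PATH/TO/quantize_eval_model.pb"},
    "/PATH/TO/arch/dpuv3e/arch.json",
    "/OUTPUTPATH",
    "netname",
)
print(shlex.join(tf_cmd))
```

The returned list can be handed to `subprocess.run(tf_cmd, check=True)` on a machine where the Vitis AI tools are installed; keeping the arguments as a list avoids shell quoting problems with unusual paths.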

Currently Supported Operators

Xilinx is continuously improving the DPUv3 IP and compiler to support more operators with better performance. DPUv3 currently supports the OPs defined by Caffe and TensorFlow, with the limitations listed below.

Table 1. Currently Supported Operators
Typical Layers in CNN	Parameters	DPU Support
Convolution (Caffe: Convolution) (Tensorflow: Conv2d, SeparableConv2D…)	Kernel size	W: [1, 8], H: [1, 8]
	Strides	W: [1, 4], H: [1, 4]
	Paddings	Left, Right: [1, kernel_w-1] Top, Bottom: [1, kernel_h-1]
	In/Out Size	Arbitrary
	In/Out Channels	[1, 256 * channel_parallel]
	Activation	ReLU, LeakyReLU or ReLU6
	Dilation	Dilation * input_channel <= 256 * channel_parallel && stride ==1
	Group* (Caffe)	Group==1
Deconvolution (Caffe: Deconvolution) (Tensorflow: Conv2DTranspose)	Kernel size	W: [1, 8], H: [1, 8]
	Strides	W: [1, 4], H: [1, 4]
	Paddings	Left, Right: [1, kernel_w-1] Top, Bottom: [1, kernel_h-1]
	In/Out Size	Arbitrary
	In/Out Channels	[1, 256 * channel_parallel]
	Activation	ReLU, LeakyReLU or ReLU6
Max Pooling (Caffe: Pooling) (Tensorflow: MaxPool2D)	Kernel size	W: [1, 8], H: [1, 8]
	Strides	W: [1, 4], H: [1, 4]
	Paddings	Left, Right: [1, kernel_w-1] Top, Bottom: [1, kernel_h-1]
Average Pooling (Caffe: Pooling) (Tensorflow: AveragePooling2D, Mean)	Kernel size	W: [1, 8], H: [1, 8]
	Strides	W: [1, 4], H: [1, 4]
	Paddings	Left, Right: [1, kernel_w-1] Top, Bottom: [1, kernel_h-1]
Element-wise Sum (Caffe: Eltwise) (Tensorflow: Add)	Input Size	Arbitrary
	Input Channel	[1, 256 * channel_parallel]
	Activation	ReLU or LeakyReLU
Concat (Caffe: Concat) (Tensorflow: Concatenate)	Number, Axis	Arbitrary
Concat (Caffe: Concat) (Tensorflow: Concatenate)	Out Channel	[1, 256 * channel_parallel]
Reorg* (Caffe)	Strides*	stride ^ 2 * input_channel <= 256 * channel_parallel
Reorg* (Caffe)	Scale, Reverse	Arbitrary
Fully Connected (Caffe: Inner Product) (Tensorflow: Matmul, Mul)	Input Channel	Input_channel < 2048 * channel_parallel
	Output Channel	Arbitrary
Group* and Reorg* are specific to Caffe. The parameter channel_parallel is determined by the DPU configuration; for DPUv3, channel_parallel is 16. Both the VALID and SAME pad_mode values are supported for TensorFlow operators.
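With channel_parallel equal to 16, the channel limit 256 * channel_parallel in Table 1 works out to 4096. The sketch below turns a few of the convolution rows of Table 1 into a pre-check that can be run on a layer's parameters before compilation. It is an illustration rather than the compiler's own validation: the function name `conv_violations` is hypothetical, padding is not checked, and it assumes the dilation rule applies only when dilation is greater than 1.

```python
CHANNEL_PARALLEL = 16  # value given for DPUv3 in this section


def conv_violations(kernel_w, kernel_h, stride_w, stride_h,
                    in_channels, out_channels, dilation=1,
                    channel_parallel=CHANNEL_PARALLEL):
    """Return a list of Table 1 convolution limits that the layer exceeds."""
    max_channels = 256 * channel_parallel  # 4096 for DPUv3
    problems = []

    if not (1 <= kernel_w <= 8 and 1 <= kernel_h <= 8):
        problems.append("kernel size outside [1, 8]")
    if not (1 <= stride_w <= 4 and 1 <= stride_h <= 4):
        problems.append("stride outside [1, 4]")
    for label, channels in (("input", in_channels), ("output", out_channels)):
        if not 1 <= channels <= max_channels:
            problems.append(f"{label} channels outside [1, {max_channels}]")

    # Assumption: the dilation constraint is only relevant for dilation > 1.
    if dilation > 1:
        if dilation * in_channels > max_channels:
            problems.append("dilation * input_channel exceeds channel limit")
        if (stride_w, stride_h) != (1, 1):
            problems.append("dilated convolution requires stride 1")

    return problems


# A 3x3, stride-1 convolution with 64 -> 128 channels is within the limits.
print(conv_violations(3, 3, 1, 1, 64, 128))
# A 9x9 kernel is flagged.
print(conv_violations(9, 9, 1, 1, 64, 128))
```

An empty list means none of the checked limits is exceeded; it does not guarantee that the compiler will map the layer to the DPU.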

Operators listed above are commonly used in CNN models, and DPU can support many configurations of these operators.

The operators below are primitives defined in the various deep learning frameworks. The compiler automatically parses these operators and assigns them to the DPU or CPU. They are only partially supported by the tools and are listed here for reference.

Table 2. Operators Information
Operators	Framework	Parameters	DPU Support
Const	Tensorflow	-	Arbitrary
Shape	Tensorflow	-	Arbitrary
Identity	Tensorflow	-	Arbitrary
Batchnorm+	Caffe	-	Arbitrary
Neg*	Tensorflow	-	Partially
Mul*	Tensorflow	-	Partially
Sub*	Tensorflow	-	Partially
Gstiling*	Caffe	reverse, stride	Partially
Permute*	Caffe	order	Partially
Flatten*	Caffe/TensorFlow	start_dim, end_dim	Partially
Squeeze*	Tensorflow	dims	Partially
Reshape*	Tensorflow	shape	Partially
Stack*	Tensorflow	axis	Partially
Matmul*	Tensorflow	transpose_a, transpose_b	Partially
Strided_Slice*	Tensorflow	begin, end, strides, begin_mask, end_mask, ellipsis_mask, new_axis_mask, shrink_axis_mask	Partially
Mean*	Tensorflow	dims, keep_dims	Avgpool-like configurations
Resize*	Tensorflow	scale, align_corners, mode	scale = 2, false, NEAREST
Pad*	Tensorflow	pad, pad_mode, constant_value	“Constant” and pad with 0, “SYMMETRIC”
Resize_nearest*	Tensorflow	align_corners	False
DeephiResize*	Caffe	scale, mode	Scale = 2, NEAREST
Upsample2D**	Tensorflow	align_corners	-
Resize_bilinear**	Tensorflow	align_corners	-
Space_to_batch**	Tensorflow	block_shape, Paddings	-
Batch_to_space**	Tensorflow	block_shape, Paddings	-
Prior_box**	Caffe	-	-
Softmax**	Tensorflow	axis	-

XIR Based Flow for DPUv3 - 1.1 English

Vitis AI User Guide (UG1414)
