| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| Parameter / tensor / zeros | data | const | data, shape, data_type | Allocates memory for the input data. |
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| Conv2d | in_channels, out_channels, kernel_size, stride, padding, padding_mode ('zeros'), groups, dilation | conv2d (groups = 1) / depthwise-conv2d (groups = input channels) | kernel, stride, pad, pad_mode (FLOOR), dilation | If groups equals the number of input channels, the convolution is compiled to the Depthwise-Convolution Engine; if groups == 1, it is mapped to the Convolution Engine; otherwise it is mapped to the CPU. |
| ConvTranspose2d | in_channels, out_channels, kernel_size, stride, padding, output_padding, padding_mode ('zeros'), groups, dilation | transposed-conv2d (groups = 1) / depthwise-transposed-conv2d (groups = input channels) | kernel, stride, pad, output_padding, pad_mode (FLOOR), dilation | If groups equals the number of input channels, the convolution is compiled to the Depthwise-Convolution Engine; if groups == 1, it is mapped to the Convolution Engine; otherwise it is mapped to the CPU. The DPU does not yet support output_padding, so if its value is not all zeros the operator is assigned to the CPU. |
| matmul | transpose_a, transpose_b | conv2d / matmul | | The matmul is transformed to conv2d and compiled to the Convolution Engine; if the transformation fails, the matmul is implemented on the CPU. |
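The groups-based dispatch for the convolution rows above can be sketched in plain Python (the function name and return strings are illustrative, not part of the compiler's API):

```python
def conv2d_engine(groups: int, in_channels: int) -> str:
    """Illustrative dispatch rule for Conv2d / ConvTranspose2d mapping.

    groups == in_channels -> Depthwise-Convolution Engine
    groups == 1           -> Convolution Engine
    anything else         -> CPU fallback
    """
    if groups == in_channels:
        return "Depthwise-Convolution Engine"
    if groups == 1:
        return "Convolution Engine"
    return "CPU"

print(conv2d_engine(1, 64))    # standard convolution
print(conv2d_engine(64, 64))   # depthwise convolution
print(conv2d_engine(4, 64))    # grouped convolution falls back to CPU
```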
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| MaxPool2d / AdaptiveMaxPool2d | kernel_size, stride, padding, ceil_mode, output_size (adaptive) | maxpool2d | kernel, stride, pad, pad_mode, global | Mapped to the Pooling Engine. |
| AvgPool2d / AdaptiveAvgPool2d | kernel_size, stride, padding, ceil_mode, count_include_pad, output_size (adaptive) | avgpool2d | kernel, stride, pad, pad_mode, count_include_pad, count_include_invalid (true), global | Mapped to the Pooling Engine. |
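For the adaptive variants, output_size = 1 reduces the pooling to a reduction over the whole spatial extent, which is what the XIR `global` attribute expresses. A minimal single-channel sketch in plain Python:

```python
def global_avg_pool(channel):
    """AdaptiveAvgPool2d(output_size=1) on one channel: the mean over
    every spatial position, i.e. global average pooling."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def global_max_pool(channel):
    """AdaptiveMaxPool2d(output_size=1) on one channel."""
    return max(v for row in channel for v in row)

fmap = [[1.0, 3.0], [5.0, 7.0]]
print(global_avg_pool(fmap))  # 4.0
print(global_max_pool(fmap))  # 7.0
```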
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| ReLU | | relu | | Activations are fused into adjacent operations such as convolutions. |
| LeakyReLU | negative_slope | leakyrelu | alpha | |
| ReLU6 | | relu6 | | |
| Hardtanh | min_val = 0, max_val = 6 | relu6 | | |
| Hardsigmoid | | hard-sigmoid | | |
| Hardswish | | hardswish | | |
| ConstantPad2d / ZeroPad2d | padding, value = 0 | pad | paddings, constant_values, mode ("CONSTANT") | The compiler first tries to fuse "CONSTANT" padding into adjacent operations such as convolutions and pooling. If no such operator exists, the pad can still be mapped to the DPU when the padding dimension equals 4 and the hardware requirements are met. |
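The Hardtanh row relies on Hardtanh with min_val = 0 and max_val = 6 being exactly ReLU6, which is why it maps onto the same relu6 operator. A scalar sketch in plain Python:

```python
def hardtanh(x, min_val=-1.0, max_val=1.0):
    # Clamp x into [min_val, max_val], as torch.nn.Hardtanh does per element.
    return max(min_val, min(max_val, x))

def relu6(x):
    # ReLU6 clamps into [0, 6].
    return min(max(x, 0.0), 6.0)

# With min_val = 0 and max_val = 6 the two functions agree everywhere.
for x in (-2.0, 0.0, 3.5, 6.0, 9.0):
    assert hardtanh(x, 0.0, 6.0) == relu6(x)
```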
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| add | | add | | An element-wise add is mapped to the DPU Element-wise Add Engine. A channel-wise add is fused, where possible, with adjacent operations such as convolutions. Shape-related adds are removed during compilation, and adds that are components of a coarse-grained operation are fused with adjacent operations; otherwise the add falls back to a CPU implementation. |
| sub / rsub | | sub | | |
| mul | | mul | | mul is mapped to the Depthwise-Convolution Engine if one of its inputs is constant. If its two inputs have the same shape, it may be mapped to the Misc Engine as an element-wise multiplication. A mul that is part of a special operator combination is fused into that combination; otherwise it is mapped to the CPU. |
| neg | | neg | | |
| sum | dim, keepdim | reduction_sum | axis, keep_dims | |
| max | dim, keepdim | reduction_max | axis, keep_dims | |
| mean | dim, keepdim | reduction_mean | axis, keep_dims | |
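The mul dispatch described above can be summarized as a rule of thumb (a deliberate simplification: the real compiler also considers fusion into special operator combinations, which is omitted here):

```python
def mul_mapping(shape_a, shape_b, has_const_input):
    """Illustrative mapping decision for an XIR mul (not a compiler API)."""
    if has_const_input:
        # One constant input: realizable as a per-channel scale,
        # i.e. a depthwise convolution.
        return "Depthwise-Convolution Engine"
    if shape_a == shape_b:
        # Two same-shape tensors: element-wise multiplication.
        return "Misc Engine"
    return "CPU"

print(mul_mapping((1, 64, 8, 8), (1, 64, 1, 1), True))   # constant scale
print(mul_mapping((1, 64, 8, 8), (1, 64, 8, 8), False))  # element-wise
print(mul_mapping((1, 64, 8, 8), (1, 1, 8, 8), False))   # CPU fallback
```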
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| interpolate / upsample / upsample_bilinear / upsample_nearest | size, scale_factor, mode, align_corners | resize | size, mode, align_corners, half_pixel_centers = !align_corners | A 'BILINEAR' resize with align_corners = false, half_pixel_centers = false, and size = 2, 4, or 8, or with align_corners = false, half_pixel_centers = true, and size = 2 or 4, can be transformed to a DPU implementation (pad + depthwise-transposed-conv2d). A 'NEAREST' resize with integer sizes is mapped to a DPU implementation. |
| transpose | dim0, dim1 | transpose | order | These operations are transformed to the reshape operation in some cases. The compiler also searches for opportunities to fuse dimension transformations into the special load or save instructions of adjacent operations to reduce overhead; otherwise they are mapped to the CPU. |
| permute | dims | transpose | order | |
| view / reshape | size | reshape | shape | |
| flatten | start_dim, end_dim | reshape / flatten | start_axis, end_axis | |
| squeeze | dim | reshape / squeeze | axis | |
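The flatten row above is a pure shape transformation: flatten(start_dim, end_dim) is the reshape that merges those dimensions into one. A sketch of the target-shape computation (argument names mirror the PyTorch/XIR attributes):

```python
def flatten_shape(shape, start_dim=0, end_dim=-1):
    """Target shape of reshape/flatten with XIR start_axis/end_axis."""
    if end_dim < 0:
        end_dim += len(shape)
    merged = 1
    for d in shape[start_dim:end_dim + 1]:
        merged *= d
    return shape[:start_dim] + (merged,) + shape[end_dim + 1:]

print(flatten_shape((2, 3, 4, 5), start_dim=1))             # (2, 60)
print(flatten_shape((2, 3, 4, 5), start_dim=0, end_dim=1))  # (6, 4, 5)
```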
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| cat | dim | concat | axis | The overhead of a concat is reduced through special reading and writing strategies and careful on-chip memory allocation. |
| aten::slice* | dim, start, end, step | strided_slice | begin, end, strides | If the strided_slice is shape-related or is a component of a coarse-grained operation, it is removed; otherwise it is compiled into a CPU implementation. |
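The aten::slice attributes map directly onto Python slice syntax. A one-axis sketch of the strided_slice semantics (for non-negative bounds and positive strides):

```python
def strided_slice_1d(seq, begin, end, strides):
    """One-axis strided_slice: equivalent to seq[begin:end:strides]."""
    out = []
    i = begin
    while i < end:
        out.append(seq[i])
        i += strides
    return out

data = list(range(10))
# x[2:8:2] in Python becomes aten::slice(dim=0, start=2, end=8, step=2),
# i.e. strided_slice with begin=2, end=8, strides=2.
print(strided_slice_1d(data, 2, 8, 2))  # [2, 4, 6]
```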
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| BatchNorm2d | eps | depthwise-conv2d / scale | epsilon, axis, moving_mean, moving_var, gamma, beta | If the batch_norm is quantized and can be transformed into an equivalent depthwise-conv2d, it is transformed and the compiler searches for opportunities to map it to a DPU implementation; otherwise the batch_norm is executed on the CPU. |
| softmax | dim | softmax | axis | softmax, tanh, and sigmoid are only compiled into CPU implementations. |
| Tanh | | tanh | | |
| Sigmoid | | sigmoid | | |
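The BatchNorm2d row depends on folding the normalization into an affine per-channel transform, which is exactly a 1x1 depthwise convolution. For a single channel, the folding arithmetic looks like this (a sketch; variable names follow the XIR attributes above):

```python
import math

def fold_batchnorm(gamma, beta, moving_mean, moving_var, epsilon):
    """Fold y = gamma * (x - mean) / sqrt(var + eps) + beta into
    y = scale * x + bias, i.e. the weights of a 1x1 depthwise-conv2d."""
    scale = gamma / math.sqrt(moving_var + epsilon)
    bias = beta - moving_mean * scale
    return scale, bias

scale, bias = fold_batchnorm(gamma=2.0, beta=1.0,
                             moving_mean=0.5, moving_var=4.0, epsilon=0.0)
x = 3.0
direct = 2.0 * (x - 0.5) / math.sqrt(4.0 + 0.0) + 1.0
assert abs((scale * x + bias) - direct) < 1e-12
```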
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| PixelShuffle | upscale_factor | pixel_shuffle | scale, upscale = True | Transformed to tile when a convolution produces its input. |
| PixelUnshuffle | downscale_factor | pixel_shuffle | scale, upscale = False | |
- If a tensor slice in PyTorch is written in Python slicing syntax, it is transformed into aten::slice.
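The pixel_shuffle rows are rearrangements between channels and space. A sketch of the shape effect on a (C, H, W) tensor, with `upscale` following the XIR attribute above:

```python
def pixel_shuffle_shape(c, h, w, scale, upscale=True):
    """Shape effect of pixel_shuffle.

    upscale=True  mirrors PixelShuffle:   (C*r*r, H, W) -> (C, H*r, W*r)
    upscale=False mirrors PixelUnshuffle: (C, H*r, W*r) -> (C*r*r, H, W)
    """
    if upscale:
        assert c % (scale * scale) == 0, "channels must divide by scale**2"
        return (c // (scale * scale), h * scale, w * scale)
    assert h % scale == 0 and w % scale == 0, "H and W must divide by scale"
    return (c * scale * scale, h // scale, w // scale)

print(pixel_shuffle_shape(16, 4, 4, scale=2))                # (4, 8, 8)
print(pixel_shuffle_shape(4, 8, 8, scale=2, upscale=False))  # (16, 4, 4)
```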