Operators Supported by PyTorch - 3.5 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2023-09-28
Version: 3.5 English

Table 1. Operators Supported by PyTorch

| PyTorch API | PyTorch Attributes | XIR OP Name | XIR Attributes | DPU Implementation |
| --- | --- | --- | --- | --- |
| Parameter / tensor / zeros | data | const | data | Allocate memory for input data. |
| | | | shape | |
| | | | data_type | |
| Conv2d | in_channels | conv2d (groups = 1) / depthwise-conv2d (groups = input channel) | | If groups == input channel, the convolution would be compiled into the Depthwise-Convolution Engine. If groups == 1, the convolution would be mapped to the Convolution Engine. Otherwise, it would be mapped to the CPU (see the sketch following the table). |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| ConvTranspose2d | in_channels | transposed-conv2d (groups = 1) / depthwise-transposed-conv2d (groups = input channel) | | If groups == input channel, the convolution would be compiled into the Depthwise-Convolution Engine. If groups == 1, the convolution would be mapped to the Convolution Engine. Otherwise, it would be mapped to the CPU. The DPU does not yet support the output_padding feature, so if its value is not all zeros, this operator is assigned to the CPU. |
| | out_channels | | | |
| | kernel_size | | kernel | |
| | stride | | stride | |
| | padding | | pad | |
| | output_padding | | output_padding | |
| | padding_mode ('zeros') | | pad_mode (FLOOR) | |
| | groups | | | |
| | dilation | | dilation | |
| matmul | | conv2d / matmul | transpose_a | The matmul would be transformed to conv2d and compiled to the Convolution Engine. If the matmul fails to be transformed, it would be implemented by the CPU. |
| | | | transpose_b | |
| MaxPool2d / AdaptiveMaxPool2d | kernel_size | maxpool2d | kernel | Pooling Engine |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | output_size (adaptive) | | global | |
| AvgPool2d / AdaptiveAvgPool2d | kernel_size | avgpool2d | kernel | Pooling Engine |
| | stride | | stride | |
| | padding | | pad | |
| | ceil_mode | | pad_mode | |
| | count_include_pad | | count_include_pad | |
| | | | count_include_invalid (true) | |
| | output_size (adaptive) | | global | |
| ReLU | | relu | | Activations would be fused with adjacent operations such as convolution. |
| LeakyReLU | negative_slope | leakyrelu | alpha | |
| ReLU6 | | relu6 | | |
| Hardtanh | min_val = 0 | | | |
| | max_val = 6 | | | |
| Hardsigmoid | | hard-sigmoid | | |
| Hardswish | | hardswish | | |
| ConstantPad2d / ZeroPad2d | padding | pad | paddings | The compiler first tries to fuse "CONSTANT" padding into adjacent operations, for example, convolution and pooling. If no such operator exists, the padding can still be mapped to the DPU when the padding dimension equals four and meets the hardware requirements. |
| | value = 0 | | constant_values | |
| | | | mode ("CONSTANT") | |
| add | | add | | If the add is an element-wise add, it would be mapped to the DPU Element-wise Add Engine. If the add is a channel-wise add, the compiler searches for opportunities to fuse it with adjacent operations such as convolutions. If these operations are shape-related, they would be removed during compilation. If they are components of a coarse-grained operation, they would be fused with adjacent operations. Otherwise, they would be compiled into CPU implementations. Mul can be mapped to the Depthwise-Convolution Engine if one of its inputs is constant. If its two inputs have the same shape, it may be mapped to the Misc Engine as an element-wise multiplication. A mul that is part of a special operator combination can be fused into that combination. Otherwise, it would be mapped to the CPU. |
| sub / rsub | | sub | | |
| mul | | mul | | |
| neg | | neg | | |
| sum | dim | reduction_sum | axis | |
| | keepdim | | keep_dims | |
| max | dim | reduction_max | axis | |
| | keepdim | | keep_dims | |
| mean | dim | reduction_mean | axis | |
| | keepdim | | keep_dims | |
| interpolate / upsample / upsample_bilinear / upsample_nearest | size | resize | size | If the resize mode is 'BILINEAR', the cases align_corners = false, half_pixel_centers = false with size = 2, 4, or 8, and align_corners = false, half_pixel_centers = true with size = 2 or 4, can be transformed to DPU implementations (pad + depthwise-transposed-conv2d). If the resize mode is 'NEAREST' and the size is an integer, the resize would be mapped to DPU implementations. |
| | scale_factor | | | |
| | mode | | mode | |
| | align_corners | | align_corners | |
| | | | half_pixel_centers = !align_corners | |
| transpose | dim0 | transpose | order | These operations would be transformed to the reshape operation in some cases. The compiler also searches for opportunities to fuse the dimension-transformation operations into special load or save instructions of adjacent operations to reduce the overhead. Otherwise, they would be mapped to the CPU. |
| | dim1 | | | |
| permute | dims | | | |
| view / reshape | size | reshape | shape | |
| flatten | start_dim | reshape / flatten | start_axis | |
| | end_dim | | end_axis | |
| squeeze | dim | reshape / squeeze | axis | |
| cat | dim | concat | axis | The compiler reduces the overhead resulting from the concat through special reading or writing strategies and by allocating the on-chip memory carefully. |
| aten::slice (see note 1) | dim | strided_slice | | If the strided_slice is shape-related or is a component of a coarse-grained operation, it would be removed. Otherwise, the strided_slice would be compiled into CPU implementations. |
| | start | | begin | |
| | end | | end | |
| | step | | strides | |
| BatchNorm2d | eps | depthwise-conv2d / scale | epsilon | If the batch_norm is quantized and can be transformed to a depthwise-conv2d equivalently, it would be transformed to depthwise-conv2d and the compiler would search for compilation opportunities to map the batch_norm into DPU implementations. Otherwise, the batch_norm would be executed by the CPU. |
| | | | axis | |
| | | | moving_mean | |
| | | | moving_var | |
| | | | gamma | |
| | | | beta | |
| softmax | dim | softmax | axis | These would only be compiled into CPU implementations. |
| Tanh | | tanh | | |
| Sigmoid | | sigmoid | | |
| PixelShuffle | upscale_factor | pixel_shuffle | scale | They would be transformed to tile if a convolution is the input. |
| | | | upscale = True | |
| PixelUnshuffle | downscale_factor | pixel_shuffle | scale | |
| | | | upscale = False | |

Note 1: If a tensor slice in PyTorch is written with Python slicing syntax, it is transformed into aten::slice (illustrated below).
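
To make the Conv2d mapping rule concrete, the following is a minimal sketch (not taken from UG1414; the layer names are illustrative only) of the three `groups` cases the compiler distinguishes:

```python
import torch.nn as nn

in_ch, out_ch = 32, 32

# groups == 1: standard convolution, mapped to the Convolution Engine.
std_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=1)

# groups == in_channels: depthwise convolution, compiled into the
# Depthwise-Convolution Engine.
dw_conv = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)

# 1 < groups < in_channels: grouped convolution that matches neither rule,
# so it falls back to the CPU.
grouped_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=4)
```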
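
As a small illustration of note 1 (the module below is a made-up example, not part of Vitis AI), Python slicing on a tensor appears as aten::slice nodes in the TorchScript graph:

```python
import torch

class SliceModel(torch.nn.Module):
    def forward(self, x):
        # Python slicing syntax on a tensor; TorchScript records it as aten::slice.
        return x[:, 0:8]

scripted = torch.jit.script(SliceModel())
print(scripted.graph)  # the printed graph contains aten::slice nodes
```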