| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| Parameter / tensor / zeros | data | const | data, shape, data_type | Allocates memory for the input data. |
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| Conv2d | in_channels, out_channels, kernel_size, stride, padding, padding_mode ('zeros'), groups, dilation | conv2d (groups = 1) / depthwise-conv2d (groups = input channels) | kernel, stride, pad, pad_mode (FLOOR), dilation | If groups equals the number of input channels, the convolution is compiled to the Depthwise-Convolution Engine; if groups == 1, it is mapped to the Convolution Engine; otherwise it is mapped to the CPU. |
| ConvTranspose2d | in_channels, out_channels, kernel_size, stride, padding, output_padding, padding_mode ('zeros'), groups, dilation | transposed-conv2d (groups = 1) / depthwise-transposed-conv2d (groups = input channels) | kernel, stride, pad, output_padding, pad_mode (FLOOR), dilation | If groups equals the number of input channels, the convolution is compiled to the Depthwise-Convolution Engine; if groups == 1, it is mapped to the Convolution Engine; otherwise it is mapped to the CPU. The DPU does not yet support output_padding, so if its value is not all zeros the operator is assigned to the CPU. |
| matmul | transpose_a, transpose_b | conv2d / matmul | | The matmul is transformed to conv2d and compiled to the Convolution Engine; if the transformation fails, the matmul is implemented on the CPU. |
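The groups-based dispatch for the convolution rows above can be sketched in plain Python (the function name and return strings are illustrative, not part of the compiler's API):

```python
def conv2d_engine(groups: int, in_channels: int) -> str:
    """Illustrative dispatch rule for Conv2d / ConvTranspose2d mapping.

    groups == in_channels -> Depthwise-Convolution Engine
    groups == 1           -> Convolution Engine
    anything else         -> CPU fallback
    """
    if groups == in_channels:
        return "Depthwise-Convolution Engine"
    if groups == 1:
        return "Convolution Engine"
    return "CPU"

print(conv2d_engine(1, 64))    # standard convolution
print(conv2d_engine(64, 64))   # depthwise convolution
print(conv2d_engine(4, 64))    # grouped convolution falls back to CPU
```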
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| MaxPool2d / AdaptiveMaxPool2d | kernel_size, stride, padding, ceil_mode, output_size (adaptive) | maxpool2d | kernel, stride, pad, pad_mode, global | Mapped to the Pooling Engine. |
| AvgPool2d / AdaptiveAvgPool2d | kernel_size, stride, padding, ceil_mode, count_include_pad, output_size (adaptive) | avgpool2d | kernel, stride, pad, pad_mode, count_include_pad, count_include_invalid (true), global | Mapped to the Pooling Engine. |
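For the adaptive variants, output_size = 1 reduces the pooling to a reduction over the whole spatial extent, which is what the XIR `global` attribute expresses. A minimal single-channel sketch in plain Python:

```python
def global_avg_pool(channel):
    """AdaptiveAvgPool2d(output_size=1) on one channel: the mean over
    every spatial position, i.e. global average pooling."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def global_max_pool(channel):
    """AdaptiveMaxPool2d(output_size=1) on one channel."""
    return max(v for row in channel for v in row)

fmap = [[1.0, 3.0], [5.0, 7.0]]
print(global_avg_pool(fmap))  # 4.0
print(global_max_pool(fmap))  # 7.0
```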
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| ReLU | | relu | | Activations are fused into adjacent operations such as convolutions. |
| LeakyReLU | negative_slope | leakyrelu | alpha | |
| ReLU6 | | relu6 | | |
| Hardtanh | min_val = 0, max_val = 6 | relu6 | | |
| Hardsigmoid | | hard-sigmoid | | |
| Hardswish | | hardswish | | |
| ConstantPad2d / ZeroPad2d | padding, value = 0 | pad | paddings, constant_values, mode ("CONSTANT") | The compiler first tries to fuse "CONSTANT" padding into adjacent operations such as convolutions and pooling. If no such operator exists, the pad can still be mapped to the DPU when the padding dimension equals 4 and the hardware requirements are met. |
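The Hardtanh row relies on Hardtanh with min_val = 0 and max_val = 6 being exactly ReLU6, which is why it maps onto the same relu6 operator. A scalar sketch in plain Python:

```python
def hardtanh(x, min_val=-1.0, max_val=1.0):
    # Clamp x into [min_val, max_val], as torch.nn.Hardtanh does per element.
    return max(min_val, min(max_val, x))

def relu6(x):
    # ReLU6 clamps into [0, 6].
    return min(max(x, 0.0), 6.0)

# With min_val = 0 and max_val = 6 the two functions agree everywhere.
for x in (-2.0, 0.0, 3.5, 6.0, 9.0):
    assert hardtanh(x, 0.0, 6.0) == relu6(x)
```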
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| add | | add | | An element-wise add is mapped to the DPU Element-wise Add Engine. A channel-wise add is fused, where possible, with adjacent operations such as convolutions. Shape-related adds are removed during compilation, and adds that are components of a coarse-grained operation are fused with adjacent operations; otherwise the add falls back to a CPU implementation. |
| sub / rsub | | sub | | |
| mul | | mul | | mul is mapped to the Depthwise-Convolution Engine if one of its inputs is constant. If its two inputs have the same shape, it may be mapped to the Misc Engine as an element-wise multiplication. A mul that is part of a special operator combination is fused into that combination; otherwise it is mapped to the CPU. |
| neg | | neg | | |
| sum | dim, keepdim | reduction_sum | axis, keep_dims | |
| max | dim, keepdim | reduction_max | axis, keep_dims | |
| mean | dim, keepdim | reduction_mean | axis, keep_dims | |
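The mul dispatch described above can be summarized as a rule of thumb (a deliberate simplification: the real compiler also considers fusion into special operator combinations, which is omitted here):

```python
def mul_mapping(shape_a, shape_b, has_const_input):
    """Illustrative mapping decision for an XIR mul (not a compiler API)."""
    if has_const_input:
        # One constant input: realizable as a per-channel scale,
        # i.e. a depthwise convolution.
        return "Depthwise-Convolution Engine"
    if shape_a == shape_b:
        # Two same-shape tensors: element-wise multiplication.
        return "Misc Engine"
    return "CPU"

print(mul_mapping((1, 64, 8, 8), (1, 64, 1, 1), True))   # constant scale
print(mul_mapping((1, 64, 8, 8), (1, 64, 8, 8), False))  # element-wise
print(mul_mapping((1, 64, 8, 8), (1, 1, 8, 8), False))   # CPU fallback
```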
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| interpolate / upsample / upsample_bilinear / upsample_nearest | size, scale_factor, mode, align_corners | resize | size, mode, align_corners, half_pixel_centers = !align_corners | A 'BILINEAR' resize with align_corners = false, half_pixel_centers = false, and size = 2, 4, or 8, or with align_corners = false, half_pixel_centers = true, and size = 2 or 4, can be transformed to a DPU implementation (pad + depthwise-transposed-conv2d). A 'NEAREST' resize with integer sizes is mapped to a DPU implementation. |
| transpose | dim0, dim1 | transpose | order | These operations are transformed to the reshape operation in some cases. The compiler also searches for opportunities to fuse dimension transformations into the special load or save instructions of adjacent operations to reduce overhead; otherwise they are mapped to the CPU. |
| permute | dims | transpose | order | |
| view / reshape | size | reshape | shape | |
| flatten | start_dim, end_dim | reshape / flatten | start_axis, end_axis | |
| squeeze | dim | reshape / squeeze | axis | |
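The flatten row above is a pure shape transformation: flatten(start_dim, end_dim) is the reshape that merges those dimensions into one. A sketch of the target-shape computation (argument names mirror the PyTorch/XIR attributes):

```python
def flatten_shape(shape, start_dim=0, end_dim=-1):
    """Target shape of reshape/flatten with XIR start_axis/end_axis."""
    if end_dim < 0:
        end_dim += len(shape)
    merged = 1
    for d in shape[start_dim:end_dim + 1]:
        merged *= d
    return shape[:start_dim] + (merged,) + shape[end_dim + 1:]

print(flatten_shape((2, 3, 4, 5), start_dim=1))             # (2, 60)
print(flatten_shape((2, 3, 4, 5), start_dim=0, end_dim=1))  # (6, 4, 5)
```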
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| cat | dim | concat | axis | The overhead of a concat is reduced through special reading and writing strategies and careful on-chip memory allocation. |
| aten::slice* | dim, start, end, step | strided_slice | begin, end, strides | If the strided_slice is shape-related or is a component of a coarse-grained operation, it is removed; otherwise it is compiled into a CPU implementation. |
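The aten::slice attributes map directly onto Python slice syntax. A one-axis sketch of the strided_slice semantics (for non-negative bounds and positive strides):

```python
def strided_slice_1d(seq, begin, end, strides):
    """One-axis strided_slice: equivalent to seq[begin:end:strides]."""
    out = []
    i = begin
    while i < end:
        out.append(seq[i])
        i += strides
    return out

data = list(range(10))
# x[2:8:2] in Python becomes aten::slice(dim=0, start=2, end=8, step=2),
# i.e. strided_slice with begin=2, end=8, strides=2.
print(strided_slice_1d(data, 2, 8, 2))  # [2, 4, 6]
```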
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| BatchNorm2d | eps | depthwise-conv2d / scale | epsilon, axis, moving_mean, moving_var, gamma, beta | If the batch_norm is quantized and can be transformed into an equivalent depthwise-conv2d, it is transformed and the compiler searches for opportunities to map it to a DPU implementation; otherwise the batch_norm is executed on the CPU. |
| softmax | dim | softmax | axis | softmax, tanh, and sigmoid are only compiled into CPU implementations. |
| Tanh | | tanh | | |
| Sigmoid | | sigmoid | | |
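The BatchNorm2d row depends on folding the normalization into an affine per-channel transform, which is exactly a 1x1 depthwise convolution. For a single channel, the folding arithmetic looks like this (a sketch; variable names follow the XIR attributes above):

```python
import math

def fold_batchnorm(gamma, beta, moving_mean, moving_var, epsilon):
    """Fold y = gamma * (x - mean) / sqrt(var + eps) + beta into
    y = scale * x + bias, i.e. the weights of a 1x1 depthwise-conv2d."""
    scale = gamma / math.sqrt(moving_var + epsilon)
    bias = beta - moving_mean * scale
    return scale, bias

scale, bias = fold_batchnorm(gamma=2.0, beta=1.0,
                             moving_mean=0.5, moving_var=4.0, epsilon=0.0)
x = 3.0
direct = 2.0 * (x - 0.5) / math.sqrt(4.0 + 0.0) + 1.0
assert abs((scale * x + bias) - direct) < 1e-12
```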
| PyTorch op | PyTorch attributes | XIR op | XIR attributes | DPU implementation notes |
| --- | --- | --- | --- | --- |
| PixelShuffle | upscale_factor | pixel_shuffle | scale, upscale = True | Transformed to tile when a convolution produces its input. |
| PixelUnshuffle | downscale_factor | pixel_shuffle | scale, upscale = False | |
- If a tensor slice in PyTorch is written in Python slicing syntax, it is transformed into aten::slice.
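The pixel_shuffle rows are rearrangements between channels and space. A sketch of the shape effect on a (C, H, W) tensor, with `upscale` following the XIR attribute above:

```python
def pixel_shuffle_shape(c, h, w, scale, upscale=True):
    """Shape effect of pixel_shuffle.

    upscale=True  mirrors PixelShuffle:   (C*r*r, H, W) -> (C, H*r, W*r)
    upscale=False mirrors PixelUnshuffle: (C, H*r, W*r) -> (C*r*r, H, W)
    """
    if upscale:
        assert c % (scale * scale) == 0, "channels must divide by scale**2"
        return (c // (scale * scale), h * scale, w * scale)
    assert h % scale == 0 and w % scale == 0, "H and W must divide by scale"
    return (c * scale * scale, h // scale, w // scale)

print(pixel_shuffle_shape(16, 4, 4, scale=2))                # (4, 8, 8)
print(pixel_shuffle_shape(4, 8, 8, scale=2, upscale=False))  # (16, 4, 4)
```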