Operators Supported by TensorFlow

Vitis AI User Guide (UG1414)

Document ID

UG1414

Release Date

2023-09-28

Version

3.5 (Simplified Chinese edition)

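Table 1. Operators Supported by TensorFlow
TensorFlow		XIR		DPU Implementation
OP Type	Attributes	OP Name	Attributes	DPU Implementation
placeholder / inputlayer*	shape	data	shape	Allocates memory for input data.
placeholder / inputlayer*	shape	data	data_type	Allocates memory for input data.
const		const	data	Allocates memory for constant data.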
			shape
			data_type
conv2d	filter	conv2d	kernel	Convolution engine.
	strides		stride
			pad([0, 0, 0, 0])
	padding		pad_mode (SAME or VALID)
	dilations		dilation
conv2d*	kernel_size	conv2d	kernel
	strides		stride
	padding		pad([0, 0, 0, 0])
	dilation_rate		dilation
	use_bias
	group		group
depthwiseconv2dnative	filter	depthwise-conv2d	kernel	Depthwise convolution engine.
	strides		stride
	explicit_paddings		pad
	padding		pad_mode (SAME or VALID)
	dilations		dilation
conv2dbackpropinput / conv2dtranspose*	filter	transposed-conv2d	kernel	Convolution engine.
	strides		stride
			pad([0, 0, 0, 0])
	padding		pad_mode (SAME or VALID)
	dilations		dilation
spacetobatchnd + conv2d + batchtospacend	block_shape	conv2d	dilation	When specific requirements set by AMD are met, spacetobatchnd, conv2d, and batchtospacend are mapped to the convolution engine.
	padding		pad
	filter		kernel
	strides		stride
	padding		pad_mode(SAME)
	dilations		dilation
	block_shape
	crops
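matmul / dense*	transpose_a	conv2d / matmul	transpose_a	If the equivalent conv2d meets the hardware requirements and can be mapped to the DPU, matmul is converted to a conv2d operation.
matmul / dense*	transpose_b	conv2d / matmul	transpose_b	If the equivalent conv2d meets the hardware requirements and can be mapped to the DPU, matmul is converted to a conv2d operation.
maxpool / maxpooling2d* / globalmaxpool2d*	ksize	maxpool2d	kernel	Pooling engine. When the original pooling operator requires a global reduction, the global attribute is set to true.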
	strides		stride
			pad([0, 0, 0, 0])
	padding		pad_mode (SAME or VALID)
			global
avgpool / averagepooling2d* / globalaveragepooling2d*	pool_size	avgpool2d	kernel	Pooling engine. When the original pooling operator requires a global reduction, the global attribute is set to true.
	strides		stride
			pad([0, 0, 0, 0])
	padding		pad_mode (SAME or VALID)
			count_include_pad (false)
			count_include_invalid (true)
			global
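mean	axis	avgpool / reduction_mean	axis	If the equivalent avgpool meets the hardware requirements and can be mapped to the DPU, the mean operation is converted to avgpool.
mean	keep_dims	avgpool / reduction_mean	keep_dims	If the equivalent avgpool meets the hardware requirements and can be mapped to the DPU, the mean operation is converted to avgpool.
relu		relu		Activations are fused into adjacent operations such as convolution.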
relu6		relu6
leakyrelu	alpha	leaky_relu	alpha
fixneuron / quantizelayer*	bit_width	fix	bit_width	During compilation, this operation is split into float2fix and fix2float, which are then fused with adjacent operations into low-precision operations.
	quantize_pos		fix_point
			if_signed
			round_mode
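identity		identity		Identity is removed.
add, addv2		add		If add is an element-wise addition, it is mapped to the DPU element-wise add engine. If add is a channel-wise addition, AMD looks for opportunities to fuse it with adjacent operations such as convolution.
mul		mul		If either input of mul is a constant, it can be mapped to the depthwise convolution engine. If its two inputs have the same shape, it can be mapped to the Misc engine as an element-wise multiplication. If the mul is part of a special operator combination, it can be fused into that combination. Otherwise, it is mapped to the CPU.
concatv2 / concatenate*	axis	concat	axis	AMD reduces the overhead of concat through special read or write strategies and careful allocation of on-chip memory.
pad / zeropadding2d*	paddings	pad	paddings	The compiler first tries to fuse "CONSTANT" padding into adjacent operations such as convolution and pooling. If no such operator exists, the padding can still be mapped to the DPU when the padding dimension equals 4 and the hardware requirements are met. "SYMMETRIC" padding is mapped to the DPU. "REFLECT" padding is not supported by the DPU.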
	mode		mode
			constant_values
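shape		shape		The shape operation is removed.
stridedslice	begin	strided_slice	begin	If these operations are shape-related, they are removed during compilation. If they are part of a low-precision operation, they are fused with adjacent operations. Otherwise, they are compiled into CPU implementations.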
	end		end
	strides		strides
pack	axis	stack	axis
neg		neg
realdiv		div
sub		sub
prod	axis	reduction_product	axis
prod	keep_dims	reduction_product	keep_dims
sum	axis	reduction_sum	axis
sum	keep_dims	reduction_sum	keep_dims
max	axis	reduction_max	axis
max	keep_dims	reduction_max	keep_dims
resizebilinear	size	resize	size	If the resize mode is "BILINEAR", the following configurations can be converted to a DPU implementation (pad + depthwise-transposed conv2d): align_corners=false, half_pixel_centers=false, size = 2, 4, 8; and align_corners=false, half_pixel_centers=true, size = 2, 4. If the resize mode is "NEAREST" and size is an integer, the resize is mapped to a DPU implementation.
	align_corners		align_corners
	half_pixel_centers		half_pixel_centers
			mode="BILINEAR"
resizenearestneighbor	size	resize	size
	align_corners		align_corners
	half_pixel_centers		half_pixel_centers
			mode="NEAREST"
upsample2d / upsampling2d*	size	resize	scale
			align_corners
			half_pixel_centers
	interpolation		mode
reshape	shape	reshape	shape	In some cases, it is converted to a reshape operation. Otherwise, it is mapped to the CPU.
reshape*	target_shape	reshape	shape
transpose	perm	transpose	order
squeeze	axis	squeeze	axis
exp		exp		Compiled to CPU implementations only.
softmax	axis	softmax	axis
sigmoid		sigmoid
square + rsqrt + maximum		l2_normalize	axis	output = x / sqrt(max(sum(x ^ 2), epsilon)) is fused into l2_normalize in XIR.
square + rsqrt + maximum		l2_normalize	epsilon
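All of the TensorFlow operations (OPs) listed above are supported in XIR, and all of them have CPU implementations in the toolchain. Operators marked with * apply to TensorFlow versions later than 2.0.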
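
The formula in the l2_normalize row, output = x / sqrt(max(sum(x ^ 2), epsilon)), can be illustrated in plain Python. This is a minimal sketch for checking the arithmetic, not XIR or Vitis AI code: the function name `l2_normalize`, the 1-D list input, and the default `epsilon` of 1e-12 are assumptions chosen for illustration, and the `axis` attribute is not modeled.

```python
import math


def l2_normalize(x, epsilon=1e-12):
    """Hypothetical reference for output = x / sqrt(max(sum(x ^ 2), epsilon)).

    Works on a 1-D sequence of floats; epsilon default is an assumption.
    """
    # square + sum: accumulate the squared magnitude of the vector
    sum_sq = sum(v * v for v in x)
    # maximum + rsqrt: clamp before the inverse square root so an
    # all-zero input does not divide by zero
    inv_norm = 1.0 / math.sqrt(max(sum_sq, epsilon))
    return [v * inv_norm for v in x]


if __name__ == "__main__":
    print(l2_normalize([3.0, 4.0]))  # components close to 0.6 and 0.8
    print(l2_normalize([0.0, 0.0]))  # stays all zeros thanks to the epsilon clamp
```

Clamping with `max` before taking the inverse square root is what keeps an all-zero input finite, which is why the pattern appears as square + rsqrt + maximum on the TensorFlow side.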