Multiple Lanes Multiplication - sliding_mul

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID: UG1079
Release Date: 2023-12-04
Version: 2023.2 (Japanese edition)

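The AI Engine provides hardware support for accelerating a form of multi-lane multiplication called sliding multiplication. It performs MAC operations on multiple lanes simultaneously and adds the results to an accumulator. This is particularly useful when implementing finite impulse response (FIR) filters.

These special multiplication structures, or APIs, are named aie::sliding_mul*. They take coefficients and data as inputs. The aie::sliding_mul_sym* variants can also pre-add the data inputs symmetrically before the multiplication. The following classes are available: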

aie::sliding_mul_ops
aie::sliding_mul_x_ops
aie::sliding_mul_y_ops
aie::sliding_mul_xy_ops
aie::sliding_mul_sym_ops
aie::sliding_mul_sym_x_ops
aie::sliding_mul_sym_y_ops
aie::sliding_mul_sym_xy_ops
aie::sliding_mul_sym_uct_ops

For details about these APIs and their supported parameters, see the AI Engine API User Guide (UG1529).

For example, the aie::sliding_mul_ops class provides a parameterized multiplication that implements the following compute pattern:

DSX = DataStepX
DSY = DataStepY
CS = CoeffStep
P = Points
L = Lanes
c_s = coeff_start
d_s = data_start 
out[0] = coeff[c_s] * data[d_s + 0] + coeff[c_s + CS] * data[d_s + DSX] + ... + coeff[c_s + (P-1) * CS] * data[d_s + (P-1) * DSX]
out[1] = coeff[c_s] * data[d_s + DSY] + coeff[c_s + CS] * data[d_s + DSY + DSX] + ... + coeff[c_s + (P-1) * CS] * data[d_s + DSY + (P-1) * DSX]
...
out[L-1] = coeff[c_s] * data[d_s + (L-1) * DSY] + coeff[c_s + CS] * data[d_s + (L-1) * DSY + DSX] + ... + coeff[c_s + (P-1) * CS] * data[d_s + (L-1) * DSY + (P-1) * DSX]

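Table 1. Template Parameters
Parameter	Description
Lanes	Number of output elements.
Points	Number of data elements used to compute each lane.
CoeffStep	Step used to select elements from the coefficient register. This step applies to element selection within a lane.
DataStepX	Step used to select elements from the data register. This step applies to element selection within a lane.
DataStepY	Step used to select elements from the data register. This step applies to element selection across lanes.
CoeffType	Coefficient element type.
DataType	Data element type.
AccumTag	Accumulator tag that specifies the required number of accumulation bits. The accumulator class (real/complex) must be compatible with the result of multiplying the coefficient and data types.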
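
The compute pattern above can be modelled in plain scalar C++ to check index arithmetic off-target. The following is a minimal sketch, not the AI Engine API: the function name sliding_mul_ref and the use of std::vector are assumptions made for illustration, and indices wrap modulo the buffer size to reflect the circular-buffer behaviour described in the note later in this section.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference model of the sliding_mul compute pattern.
// Hypothetical helper for illustration; it is not part of the AI Engine API.
template <unsigned Lanes, unsigned Points, int CoeffStep = 1,
          int DataStepX = 1, int DataStepY = DataStepX>
std::vector<std::int64_t> sliding_mul_ref(const std::vector<std::int32_t>& coeff,
                                          unsigned coeff_start,
                                          const std::vector<std::int32_t>& data,
                                          unsigned data_start) {
    // Wrap a possibly negative index into [0, size): registers act as circular buffers.
    auto wrap = [](long long idx, std::size_t size) {
        const long long n = static_cast<long long>(size);
        return static_cast<std::size_t>(((idx % n) + n) % n);
    };

    std::vector<std::int64_t> out(Lanes, 0);
    for (unsigned lane = 0; lane < Lanes; ++lane) {
        for (unsigned p = 0; p < Points; ++p) {
            // coeff[c_s + p * CS]
            const long long ci = static_cast<long long>(coeff_start) +
                                 static_cast<long long>(p) * CoeffStep;
            // data[d_s + lane * DSY + p * DSX]
            const long long di = static_cast<long long>(data_start) +
                                 static_cast<long long>(lane) * DataStepY +
                                 static_cast<long long>(p) * DataStepX;
            out[lane] += static_cast<std::int64_t>(coeff[wrap(ci, coeff.size())]) *
                         data[wrap(di, data.size())];
        }
    }
    return out;
}
```

With all steps left at 1, each lane is one output sample of a Points-tap FIR, and moving from one lane to the next slides the data window by one element.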

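The following figure shows how to perform a sliding multiplication using the aie::sliding_mul_ops class and its member function mul. It also shows how each parameter maps onto the multiplication.

Figure 1. sliding_mul_ops Usage Example

In addition to the aie::sliding_mul* classes, the AI Engine API provides aie::sliding_mul* functions that perform sliding multiplication and aie::sliding_mac* functions that perform sliding multiply-accumulate. These functions are convenience helpers that use the aie::sliding_mul*_ops classes internally. Some of them are listed below: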

aie::sliding_mul
aie::sliding_mac
aie::sliding_mul_sym
aie::sliding_mac_sym
aie::sliding_mul_antisym
aie::sliding_mac_antisym
aie::sliding_mul_sym_uct
aie::sliding_mac_sym_uct
aie::sliding_mul_antisym_uct
aie::sliding_mac_antisym_uct

The following are examples of asymmetric sliding multiplication. The template prototypes are included in comments for quick reference.

/*template<unsigned Lanes, unsigned Points, int CoeffStep, 
  int DataStepX, int DataStepY, ElemBaseType CoeffType, 
  ElemBaseType DataType, 
  AccumElemBaseType AccumTag = detail::default_accum_tag_t<CoeffType, DataType>>
 */

/*struct aie::sliding_mul_ops< Lanes, Points, 
  CoeffStep, DataStepX, DataStepY, 
  CoeffType, DataType, AccumTag >
 */

// template<VectorOrOp VecCoeff, VectorOrOp VecData>

/*static constexpr accum_type mul (const VecCoeff &coeff, unsigned coeff_start, 
  const VecData &data, unsigned data_start)
*/

aie::vector<int16,16> va;
aie::vector<int16,64> vb0,vb1;
aie::accum<acc48,8>  acc = aie::sliding_mul_ops<8, 8, 1, 1, 1, int16, int16, acc48>::mul(va, 0, vb0, 0);
acc = aie::sliding_mul_ops<8, 8, 1, 1, 1, int16, int16, acc48>::mac(acc, va, 8, vb1, 0);
*outIter++=acc.to_vector<int32>(15);

/*template<unsigned Lanes, unsigned Points, int CoeffStep = 1, 
  int DataStepX = 1, int DataStepY = DataStepX, 
  AccumElemBaseType AccumTag = accauto, 
  VectorOrOp VecCoeff = void, VectorOrOp VecData = void>
 */

/*auto sliding_mul (const VecCoeff &coeff, unsigned coeff_start, 
  const VecData &data, unsigned data_start)
 */

aie::vector<int32,32> data_buff;
aie::vector<int32,8> coeff_buff;
aie::accum<acc80,8> acc_buff = aie::sliding_mul<8, 8>(coeff_buff, 0, data_buff, 0);
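
The mul-then-mac sequence in the first example splits a longer sum across two 8-point operations by advancing coeff_start from 0 to 8. The scalar sketch below illustrates that chaining under stated assumptions: Acc8, sliding_mac_model, sliding_mul_model, and fir16 are hypothetical names, all steps are fixed at 1, and a single data buffer with data_start = 8 stands in for the second data register (vb1) used in the original snippet.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for an 8-lane accumulator (not aie::accum).
using Acc8 = std::array<std::int64_t, 8>;

// Scalar model of a sliding multiply-accumulate with all steps equal to 1:
// acc[lane] += sum over p of coeff[coeff_start + p] * data[data_start + lane + p].
// Indices wrap around, treating both buffers as circular.
template <unsigned Points, std::size_t NC, std::size_t ND>
Acc8 sliding_mac_model(Acc8 acc,
                       const std::array<std::int16_t, NC>& coeff, unsigned coeff_start,
                       const std::array<std::int16_t, ND>& data, unsigned data_start) {
    for (unsigned lane = 0; lane < 8; ++lane) {
        for (unsigned p = 0; p < Points; ++p) {
            acc[lane] += static_cast<std::int64_t>(coeff[(coeff_start + p) % NC]) *
                         data[(data_start + lane + p) % ND];
        }
    }
    return acc;
}

// A sliding multiply is modelled as a multiply-accumulate into a zeroed accumulator.
template <unsigned Points, std::size_t NC, std::size_t ND>
Acc8 sliding_mul_model(const std::array<std::int16_t, NC>& coeff, unsigned coeff_start,
                       const std::array<std::int16_t, ND>& data, unsigned data_start) {
    return sliding_mac_model<Points>(Acc8{}, coeff, coeff_start, data, data_start);
}

// 16-tap sum built from two 8-point steps: taps 0..7 first, then taps 8..15.
template <std::size_t ND>
Acc8 fir16(const std::array<std::int16_t, 16>& taps,
           const std::array<std::int16_t, ND>& data) {
    Acc8 acc = sliding_mul_model<8>(taps, 0, data, 0);
    return sliding_mac_model<8>(acc, taps, 8, data, 8);
}
```

In this model the two-step result equals a single 16-point operation over the same buffers, which is the property that lets a filter longer than one operation be accumulated piecewise.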

The following are examples of symmetric sliding multiplication.

/*template<unsigned Lanes, unsigned Points, 
  int CoeffStep, int DataStepX, int DataStepY, 
  ElemBaseType CoeffType, ElemBaseType DataType, 
  AccumElemBaseType AccumTag = detail::default_accum_tag_t<CoeffType, DataType>>
 */

/*struct aie::sliding_mul_sym_ops<Lanes, Points, CoeffStep, 
  DataStepX, DataStepY, CoeffType, DataType, AccumTag > 
 */

// template<VectorOrOp VecCoeff, VectorOrOp VecData>

/*static constexpr accum_type mul_sym (const VecCoeff &coeff, 
  unsigned coeff_start, const VecData &data, unsigned data_start)
 */

aie::vector<cint16,16> data_buff;
aie::vector<int16,8> coeff_buff;
auto acc_buff = aie::sliding_mul_sym_ops<4, 16, 1, 1, 1, int16, cint16, cacc48>::mul_sym(coeff_buff, 0, data_buff, 0);

/*template<unsigned Lanes, unsigned Points, int CoeffStep = 1, 
  int DataStepX = 1, int DataStepY = DataStepX,
  AccumElemBaseType AccumTag = accauto, VectorOrOp VecCoeff = void, 
  VectorOrOp VecData = void>
 */

/*auto sliding_mul_sym (const VecCoeff &coeff, unsigned coeff_start, 
  const VecData &data, unsigned data_start)
 */

auto acc_buff2 = aie::sliding_mul_sym<4, 16, 1, 1, 1>(coeff_buff, 0, data_buff, 0);

/*template<unsigned Lanes, unsigned Points, int CoeffStep = 1, 
  int DataStepX = 1, int DataStepY = DataStepX, 
  AccumElemBaseType AccumTag = accauto, 
  VectorOrOp VecCoeff = void, VectorOrOp VecData = void>
 */

/*auto sliding_mul_sym (const VecCoeff &coeff, unsigned coeff_start, 
  const VecData &ldata, unsigned ldata_start, const VecData &rdata, unsigned rdata_start)
 */

aie::vector<cint16,16> ldata,rdata;
aie::vector<int16,8> coeff;

// symmetric sliding_mul using two data registers
auto acc = aie::sliding_mul_sym<4, 8, 1, 1, 1>(coeff, 0, ldata, 0, rdata, 8);

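Note: All registers used in a sliding multiplication must be treated as circular buffers. When indexing reaches the end of a register, it wraps around to the beginning.

The following figures show how the symmetric sliding multiplication examples above are computed.

Figure 2. sliding_mul_sym_ops Usage Example

Figure 3. sliding_mul_sym Function Using Two Data Registers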
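
The symmetric pre-add can also be sketched in scalar C++ for the single-data-register case. This section does not spell out the symmetric compute pattern, so the pairing used here is an assumption: each coefficient multiplies the sum of the p-th element and its mirror element (P-1-p) within the lane's window. The name sliding_mul_sym_ref is hypothetical, only an even number of points is handled, and the authoritative definition is the one in the AI Engine API User Guide (UG1529).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar sketch of a symmetric sliding multiply with one data buffer.
// ASSUMED pattern (for illustration only):
//   out[l] = sum over p < P/2 of
//            coeff[c_s + p*CS] * (data[d_s + l*DSY + p*DSX] + data[d_s + l*DSY + (P-1-p)*DSX])
template <unsigned Lanes, unsigned Points, int CoeffStep = 1,
          int DataStepX = 1, int DataStepY = DataStepX>
std::vector<std::int64_t> sliding_mul_sym_ref(const std::vector<std::int32_t>& coeff,
                                              unsigned coeff_start,
                                              const std::vector<std::int32_t>& data,
                                              unsigned data_start) {
    static_assert(Points % 2 == 0, "this sketch only handles an even number of points");

    // Wrap a possibly negative index into [0, size): registers act as circular buffers.
    auto wrap = [](long long idx, std::size_t size) {
        const long long n = static_cast<long long>(size);
        return static_cast<std::size_t>(((idx % n) + n) % n);
    };

    std::vector<std::int64_t> out(Lanes, 0);
    for (unsigned lane = 0; lane < Lanes; ++lane) {
        const long long base = static_cast<long long>(data_start) +
                               static_cast<long long>(lane) * DataStepY;
        for (unsigned p = 0; p < Points / 2; ++p) {
            const long long ci = static_cast<long long>(coeff_start) +
                                 static_cast<long long>(p) * CoeffStep;
            const long long left = base + static_cast<long long>(p) * DataStepX;
            const long long right = base + static_cast<long long>(Points - 1 - p) * DataStepX;
            // Pre-add the mirrored data elements, then multiply once by the shared coefficient.
            const std::int64_t pre_add =
                static_cast<std::int64_t>(data[wrap(left, data.size())]) +
                data[wrap(right, data.size())];
            out[lane] += coeff[wrap(ci, coeff.size())] * pre_add;
        }
    }
    return out;
}
```

Under this assumption only Points / 2 coefficients are read, which is consistent with the example above that pairs Points = 16 with an 8-element coefficient vector.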

Notes on Using sliding_mul

The following restrictions apply:

Data width <= 1024 bits and coefficient width <= 256 bits
Lanes * Points >= the number of MACs per cycle for that type
int8 is not supported by sliding_mul_sym_ops and sliding_mul_sym_uct_ops.
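
The first two restrictions can be expressed as a compile-time check. The sketch below is illustrative only: SlidingMulShape and shape_ok are hypothetical names, and the MACs-per-cycle figure is supplied by the caller because it depends on the device and the data types and is not given in this section.

```cpp
#include <cstddef>

// Hypothetical description of one sliding_mul configuration (not an AI Engine API type).
struct SlidingMulShape {
    unsigned lanes;               // Lanes template parameter
    unsigned points;              // Points template parameter
    std::size_t data_elems;       // number of elements in the data vector
    std::size_t data_elem_bits;   // bits per data element
    std::size_t coeff_elems;      // number of elements in the coefficient vector
    std::size_t coeff_elem_bits;  // bits per coefficient element
    unsigned macs_per_cycle;      // caller-supplied, from the device documentation
};

// Returns true when the width and Lanes * Points restrictions listed above hold.
constexpr bool shape_ok(const SlidingMulShape& s) {
    return s.data_elems * s.data_elem_bits <= 1024 &&
           s.coeff_elems * s.coeff_elem_bits <= 256 &&
           s.lanes * s.points >= s.macs_per_cycle;
}
```

For instance, a 64-element int16 data vector is exactly 1024 bits and a 16-element int16 coefficient vector is exactly 256 bits, so both sit at the stated limits; whether the Lanes * Points condition holds depends on the MAC rate passed in.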