Profiling Graph Latency


AI Engine Tools and Flows User Guide (UG1076)

Document ID

UG1076

Release Date

2023-12-04

Version

2023.2 Japanese

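The event::io_stream_start_difference_cycles enumeration value can be used to measure the latency between two PLIO or GMIO ports. After event::start_profiling() is called, two performance counters start incrementing every cycle, each waiting for its own independent net to receive its first data. When the first data passes through a net, the corresponding performance counter stops. The value read back by event::read_profiling() is the difference between the two counter values.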

After event::stop_profiling() is called, the performance counters are cleared and released.
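The counter behavior described above can be illustrated with a small software model. This is only a sketch in standard C++ and is not part of the ADF API: the names CounterModel and model_start_difference are hypothetical, and the order of the subtraction (second port minus first port) is an assumption inferred from the sign convention this section gives for the two-port case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of one performance counter (not an ADF type).
struct CounterModel {
    std::int64_t value = 0;  // cycles counted so far
    bool running = true;     // true until first data passes the monitored net
};

// Models the two-counter measurement: both counters start at cycle 0
// (the start_profiling call) and each stops when its net sees first data.
// Arguments are the arrival cycles of first data, relative to cycle 0.
// Assumption: the reported value is (second counter) - (first counter).
std::int64_t model_start_difference(std::int64_t first_port_arrival,
                                    std::int64_t second_port_arrival) {
    assert(first_port_arrival >= 0 && second_port_arrival >= 0);
    CounterModel first;
    CounterModel second;
    for (std::int64_t cycle = 0; first.running || second.running; ++cycle) {
        // A counter stops in the cycle where first data passes its net.
        if (cycle == first_port_arrival) first.running = false;
        if (cycle == second_port_arrival) second.running = false;
        // Counters that are still waiting increment once per cycle.
        if (first.running) ++first.value;
        if (second.running) ++second.value;
    }
    return second.value - first.value;
}
```

In this model, model_start_difference(100, 340) and model_start_difference(150, 390) return the same value, because the delay between starting the counters and the first data arrival is common to both counters and cancels in the subtraction.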

Profiling Graph Latency

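Graph latency is the time from when the graph receives its first input data to when it produces its first output data. It does not depend on the number of graph iterations. The following examples use the event API to profile graph latency in the AI Engine simulation flow and in the hardware and hardware emulation flows.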

Note: In this mode, event::start_profiling() takes two different PLIO ports as parameters.

AI Engine simulation:

event::handle handle = event::start_profiling(gr_pl.in, gr_pl.dataout, event::io_stream_start_difference_cycles);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations); // Data transfer starts after graph.run()
gr_pl.wait();
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches long long
event::stop_profiling(handle); // Performance counter is released and cleared
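Because graph latency is defined as a time while event::read_profiling() returns a cycle count, a conversion step is usually needed when reporting results. The helper below is a minimal sketch, assuming the counters advance once per cycle of a clock whose frequency you know; the function name cycles_to_ns is hypothetical, and the frequency passed in must come from your own design settings.

```cpp
#include <cassert>

// Hypothetical helper (not part of the ADF API): converts a cycle count
// to nanoseconds for a counter clock running at clock_hz.
// Assumption: the counter ticks exactly once per cycle of that clock.
double cycles_to_ns(long long cycles, double clock_hz) {
    assert(clock_hz > 0.0);  // a zero or negative frequency is meaningless
    return static_cast<double>(cycles) * 1e9 / clock_hz;
}
```

For example, cycles_to_ns(cycle_count, 1.0e9) treats each cycle as 1 ns; the 1 GHz figure is purely illustrative. Negative cycle counts convert to negative durations, so the sign of a two-port difference is preserved.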

Hardware and hardware emulation flows:

auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
event::handle handle = event::start_profiling(gr_pl.in, gr_pl.dataout, event::io_stream_start_difference_cycles);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Input data transfer starts
s2mm_run.wait(); // Make sure both ports have data transferred
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches long long
event::stop_profiling(handle); // Performance counter is released and cleared

Note: Input data transfer starts as soon as the PL kernel (mm2s) starts. To keep the overhead of graph.run() out of the measured graph latency, the profiling code starts the PL kernel mm2s after event::start_profiling() and after graph.run().

Profiling the Latency Difference Between Two Ports

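This method is not limited to profiling the latency between an input port and an output port of the same graph. It can profile the latency between any two ports. For example, it can profile the latency between two output ports that share a common input port.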

AI Engine Simulation

The following is example code for the simulation flow.

event::handle handle = event::start_profiling(gr_pl.dataout, gr_pl.dataout2, event::io_stream_start_difference_cycles);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait();
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches long long
event::stop_profiling(handle); // Performance counter is released and cleared

Hardware Emulation and Hardware

The following is example code for these flows.


auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
event::handle handle = event::start_profiling(gr_pl.dataout, gr_pl.dataout2, event::io_stream_start_difference_cycles);
if (handle == event::invalid_handle) {
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S);
s2mm_run.wait(); // Make sure both ports have data transferred
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches long long
event::stop_profiling(handle); // Performance counter is released and cleared

Here, a positive number indicates that data arrives at gr_pl.dataout2 later than at gr_pl.dataout, and a negative number indicates that data arrives at gr_pl.dataout2 earlier than at gr_pl.dataout.
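The sign convention above can be turned into a readable report with a small helper. This is a sketch in standard C++ only: the function name describe_port_difference is hypothetical and not part of the ADF API, and the zero case (both ports seeing first data in the same cycle) is an assumption that follows from the counter difference rather than something this section states.

```cpp
#include <string>

// Hypothetical helper: describes a value returned by event::read_profiling()
// for two ports passed to event::start_profiling() in the order
// (first_port, second_port).
// Positive: second_port received its first data later than first_port.
// Negative: second_port received its first data earlier than first_port.
std::string describe_port_difference(long long cycle_count,
                                     const std::string& first_port,
                                     const std::string& second_port) {
    if (cycle_count > 0) {
        return second_port + " received first data " +
               std::to_string(cycle_count) + " cycles after " + first_port;
    }
    if (cycle_count < 0) {
        return second_port + " received first data " +
               std::to_string(-cycle_count) + " cycles before " + first_port;
    }
    // Assumed interpretation of zero: both counters stopped in the same cycle.
    return second_port + " and " + first_port +
           " received first data in the same cycle";
}
```

A call such as describe_port_difference(cycle_count, "gr_pl.dataout", "gr_pl.dataout2") could replace the plain printf in the examples above when the direction of the skew matters more than the raw number.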