Profiling Graph Bandwidth - 2022.1 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2022-05-25
Version: 2022.1 English

The event::io_total_stream_running_to_idle_cycles enumeration accumulates the running and stall events that occur on the profiled AI Engine - PL interface. In other words, it counts the cycles in which data passes through the interface plus the cycles in which the interface is stalled. Cycles in which the interface is idle are not counted.

After event::start_profiling() is called, the performance counter waits for data to start running, and it pauses whenever the stream is idle. Once paused, the counter resumes when new data arrives. After event::stop_profiling() is called, the performance counter is cleared and released.
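The counting rule described above can be pictured with a small software model. This is only a conceptual sketch, not the hardware or the ADF API: the StreamState enumeration and the count_running_and_stalled function are hypothetical names introduced for illustration, and the model assumes that stalled cycles seen before the first running cycle are not counted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-cycle stream states, used only for this illustration.
enum class StreamState { Idle, Running, Stalled };

// Toy model of the counter behavior described in the text:
// - counting begins with the first Running cycle,
// - Running and Stalled cycles are accumulated,
// - Idle cycles pause the counter and are skipped.
// Assumption: Stalled cycles before the first Running cycle are ignored.
std::uint64_t count_running_and_stalled(const std::vector<StreamState>& trace) {
    std::uint64_t count = 0;
    bool started = false;
    for (StreamState s : trace) {
        if (s == StreamState::Running) {
            started = true;
            ++count;
        } else if (s == StreamState::Stalled && started) {
            ++count;
        }
    }
    return count;
}
```

For a trace of idle, running, stalled, idle, running, the model returns 3: two running cycles plus one stalled cycle, with both idle cycles skipped.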

Profile Graph Bandwidth Using the Input Port

The bandwidth of the graph can be defined as the fraction of the time that the graph can accept data.

Example code to measure the graph bandwidth through the graph input port is as follows:
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph input port
event::handle handle = event::start_profiling(gr_pl.in, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S); // Start the PL kernel mm2s that sends data to the graph
gr_pl.run(iterations);
gr_pl.wait(); // Wait for the graph to finish all iterations
long long cycle_count = event::read_profiling(handle); // Running + stalled cycles
// Running cycles (four bytes per cycle) divided by running + stalled cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle);//Performance counter is released and cleared

Here, the total number of running cycles is calculated from the number of bytes transferred. At four bytes per cycle, the total running cycles are WINDOW_SIZE_in_bytes*iterations/4. The total number of running and stalled cycles is read from the performance counter by event::read_profiling().
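As a worked instance of this arithmetic, the values in the example give 8192 * 999 / 4 = 2,045,952 running cycles. The sketch below wraps the same calculation in two standalone helpers so that it can be checked without hardware. The names kBytesPerCycle, running_cycles, and profiled_bandwidth are hypothetical and are not part of the ADF API; the four bytes per cycle figure is taken from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Taken from the text: the interface transfers four bytes per cycle.
constexpr std::uint64_t kBytesPerCycle = 4;

// Cycles in which data actually moved, derived from the bytes transferred.
std::uint64_t running_cycles(std::uint64_t window_bytes, std::uint64_t iterations) {
    return window_bytes * iterations / kBytesPerCycle;
}

// Ratio of running cycles to the running + stalled cycles read from the counter.
// cycle_count stands in for the value returned by event::read_profiling().
double profiled_bandwidth(std::uint64_t window_bytes,
                          std::uint64_t iterations,
                          std::uint64_t cycle_count) {
    assert(cycle_count > 0);  // a zero reading would make the ratio undefined
    return static_cast<double>(running_cycles(window_bytes, iterations)) /
           static_cast<double>(cycle_count);
}
```

With a counter reading of 2,045,952 the helper returns 1.0. With a purely illustrative reading of 4,091,904 (twice the running cycles) it returns 0.5, meaning the port was stalled for half of the counted cycles.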

If the profiled bandwidth is 1, the graph is running faster than the PL kernel mm2s, so the input port has not been stalled.

If the profiled bandwidth is less than 1, the PL kernel mm2s can send data faster than the graph can accept it or faster than the PL kernel s2mm can receive it. You might need to evaluate whether the bandwidth drop is caused by the graph or by the PL kernel s2mm.

Profile Graph Bandwidth Using the Output Port

The bandwidth of the graph can be defined as the fraction of the time that the graph can send data.

If the profiled bandwidth is 1, the graph is not blocked by the PL kernel s2mm.

If the profiled bandwidth is less than 1, the graph is blocked for some fraction of the time by s2mm because of back pressure.

Example code to profile the graph bandwidth through the graph output port is as follows:
const int WINDOW_SIZE_in_bytes=8192;
int iterations=999;
// Start counting running and stalled cycles on the graph output port
event::handle handle = event::start_profiling(gr_pl.dataout, event::io_total_stream_running_to_idle_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
gr_pl.run(iterations);
gr_pl.wait(); // Wait for the graph to finish all iterations
long long cycle_count = event::read_profiling(handle); // Running + stalled cycles
// Running cycles (four bytes per cycle) divided by running + stalled cycles
double bandwidth = (double) (WINDOW_SIZE_in_bytes*iterations/4) / cycle_count;
event::stop_profiling(handle);//Performance counter is released and cleared
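
Because the counter holds running plus stalled cycles, and the running cycles follow from the bytes transferred, the stalled portion can be estimated by subtraction. The following is a minimal sketch under that assumption. StallEstimate and estimate_stall are hypothetical names, not ADF API calls, and the counter reading used in the note after the code is illustrative rather than measured.

```cpp
#include <cstdint>

// Hypothetical result type for this illustration.
struct StallEstimate {
    std::uint64_t stalled_cycles;  // counted cycles in which no data moved
    double stalled_fraction;       // stalled cycles / counted cycles
};

// bytes_transferred: total bytes that crossed the port (window size * iterations).
// cycle_count: stands in for the value returned by event::read_profiling().
StallEstimate estimate_stall(std::uint64_t bytes_transferred, std::uint64_t cycle_count) {
    const std::uint64_t running = bytes_transferred / 4;  // four bytes per cycle
    // Guard against an empty reading or one that implies no stall at all.
    if (cycle_count == 0 || running >= cycle_count) {
        return {0, 0.0};
    }
    const std::uint64_t stalled = cycle_count - running;
    return {stalled, static_cast<double>(stalled) / static_cast<double>(cycle_count)};
}
```

For 8192 * 999 bytes and an illustrative reading of 2,557,440 cycles, the sketch reports 511,488 stalled cycles, a stalled fraction of 0.2, which corresponds to a profiled bandwidth of 0.8.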