Profiling Graph Throughput

Profiling Graph Throughput - 2021.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID

UG1076

Release Date

2021-12-17

Version

2021.2 English

Graph throughput can be defined as the average number of bytes produced (or consumed) per second. The following example shows how to profile graph throughput using the event API. In the example, gr is the application graph object, plio_out is the PLIO object connecting to the graph output port, and the graph is designed to produce 256 int32 data in eight iterations.

gr.init();
event::handle handle = event::start_profiling(plio_out, event::io_stream_start_to_bytes_transferred_cycles, 256*sizeof(int32));
if(handle==event::invalid_handle){
  printf("ERROR:Invalid handle. Only two performance counter in a AIE-PL interface tile\n");
  return 1;
}
gr.run(8);
gr.wait();
long long cycle_count = event::read_profiling(handle);
event::stop_profiling(handle);
double throughput = (double)256 * sizeof(int32) / (cycle_count * 1e-9); // byte per second

In the example, after the graph is initialized, event::start_profiling is called to configure the AI Engine to count the clock cycles from the stream start event to the event that indicates 256 × sizeof(int32) bytes have been transferred, assuming that the stream stops right after the specified number of bytes are transferred. If the stream continues after the number of bytes transferred, the counter continues and never ends. The first argument in event::start_profiling is plio_out, the second argument is set to event::io_stream_start_to_bytes_transferred_cycles, and the third argument specifies the number of bytes to be transferred before stopping the counter. The graph throughput is derived by dividing the total number of bytes produced in eight iterations (256 × sizeof(int32)) by the time spent from the first output data to the last output data (cycle_count × 1e^-9, assuming the AI Engine is running at 1 GHz).

Note: The DMA for the input window from PL is active after configuration. When the PL output is asserted, the input window can start receiving data even before graph::run. One way to profile a PLIO input is to assert the PL after event::start_profiling. Otherwise, call event::read_profiling immediately after graph::wait because the performance counter may not hit the stop condition and will run continuously.