Profiling Graph Throughput

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2020-11-24
Version: 2020.2 English

Graph throughput can be defined as the average number of bytes produced (or consumed) per second. The following example shows how to profile graph throughput using the event API. In the example, gr is the application graph object, plio_out is the PLIO object connected to the graph output port, and the graph is designed to produce 256 int32 values in eight iterations.

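gr.init();
// Count cycles from the stream start event until 256*sizeof(int32) bytes have been transferred on plio_out.
event::handle handle = event::start_profiling(plio_out, event::io_stream_start_to_bytes_transferred_cycles, 256*sizeof(int32));
gr.run(8);   // eight iterations produce 256 int32 values in total
gr.wait();
long long cycle_count = event::read_profiling(handle);
event::stop_profiling(handle);   // stop profiling and release the counter resources
// Assumes a 1 GHz AI Engine clock, so one cycle corresponds to 1e-9 seconds.
double throughput = (double)256 * sizeof(int32) / (cycle_count * 1e-9); // bytes per second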

In the example, after the graph is initialized, event::start_profiling configures the AI Engine to count the clock cycles from the stream start event to the event indicating that 256 × sizeof(int32) bytes (1024 bytes) have been transferred. This assumes that the stream stops right after the specified number of bytes is transferred; if the stream continues beyond that point, the counter continues and never stops. The first argument of event::start_profiling is plio_out, the second argument is event::io_stream_start_to_bytes_transferred_cycles, and the third argument specifies the number of bytes to be transferred before the counter stops.

The graph throughput is derived by dividing the total number of bytes produced in eight iterations (256 × sizeof(int32)) by the time elapsed from the first output data to the last output data (cycle_count × 1e-9 seconds, assuming the AI Engine runs at 1 GHz, so that one clock cycle lasts 1 ns).