Graph throughput can be defined as the average number of bytes produced (or consumed) per second. The following example shows how to profile graph throughput using the event API. In the example, gr is the application graph object, plio_out is the PLIO object connected to the graph output port, and the graph is designed to produce 256 int32 values in eight iterations.
gr.init();
// Count cycles from the stream start event until 256*sizeof(int32) bytes have been transferred.
event::handle handle = event::start_profiling(plio_out, event::io_stream_start_to_bytes_transferred_cycles, 256*sizeof(int32));
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters in an AIE-PL interface tile\n");
    return 1;
}
gr.run(8);
gr.wait();
long long cycle_count = event::read_profiling(handle);
event::stop_profiling(handle);
// Assumes a 1 GHz AI Engine clock, so one cycle lasts 1e-9 seconds.
double throughput = (double)256 * sizeof(int32) / (cycle_count * 1e-9); // bytes per second
In the example, after the graph is initialized, event::start_profiling is called to configure the AI Engine to count the clock cycles from the stream start event to the event that indicates 256 × sizeof(int32) bytes have been transferred. This assumes that the stream stops right after the specified number of bytes has been transferred. If the stream continues beyond that point, the counter keeps running and never stops. The first argument to event::start_profiling is plio_out, the second argument is event::io_stream_start_to_bytes_transferred_cycles, and the third argument specifies the number of bytes to be transferred before the counter stops. The graph throughput is derived by dividing the total number of bytes produced in eight iterations (256 × sizeof(int32)) by the time elapsed between the first and the last output data (cycle_count × 1e-9 seconds, assuming the AI Engine is running at 1 GHz).
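The throughput formula above hard-codes a 1 GHz clock. The following is a minimal sketch of the same arithmetic with the clock frequency as a parameter. The function name throughput_bytes_per_sec and its parameters are hypothetical and not part of the event API. It assumes that cycle_count is the value returned by event::read_profiling and that the caller knows the actual AI Engine clock frequency.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the event API): converts a profiled
// cycle count into bytes per second for a given AI Engine clock frequency.
// Returns 0.0 when cycle_count is not positive, i.e. no usable measurement.
double throughput_bytes_per_sec(std::size_t bytes, long long cycle_count, double clock_hz) {
    assert(clock_hz > 0.0);  // the clock frequency must be known and positive
    if (cycle_count <= 0) {
        return 0.0;
    }
    // Elapsed time in seconds is cycle_count / clock_hz, so
    // throughput = bytes / (cycle_count / clock_hz).
    return static_cast<double>(bytes) * clock_hz / static_cast<double>(cycle_count);
}
```

As an illustration only (512 cycles is a made-up reading, not a measured result), throughput_bytes_per_sec(1024, 512, 1e9) evaluates to 2e9 bytes per second for the 1024 bytes produced in the example.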
graph::run. One way to profile a PLIO input is to assert the PL after event::start_profiling. Otherwise, call event::read_profiling immediately after graph::wait, because the performance counter may not hit the stop condition and will run continuously.