Profiling Platform I/O Port Bandwidth - 2021.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)


The bandwidth of a platform I/O port is the average number of bytes transferred per second: the total number of bytes transferred divided by the time during which the port is either transferring or stalled (for example, due to back pressure). The following example shows how to profile I/O port bandwidth using the event API. In the example, gr is the application graph object, plio_out is the PLIO object connected to the graph output port, and the graph is designed to produce 256 int32 data samples in eight iterations.

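gr.init(); // initialize the graph before configuring the performance counter
event::handle handle = event::start_profiling(plio_out, event::io_total_stream_running_to_idle_cycles);
if (handle == event::invalid_handle) {
  printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
  return 1;
}
gr.run(8); gr.wait(); // run eight iterations and wait for them to finish
long long cycle_count = event::read_profiling(handle);
event::stop_profiling(handle); // release the performance counter for other uses
double bandwidth = (double)256 * sizeof(int32) / (cycle_count * 1e-9); // bytes per second, assuming a 1 GHz AI Engine clock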

In the example, after the graph is initialized, event::start_profiling is called to configure the AI Engine to count the accumulated clock cycles between the stream running event and the stream idle event. In other words, it counts the number of cycles during which the stream port is either running or stalled. The first argument to event::start_profiling can be a PLIO or a GMIO object; in this case, it is plio_out. The second argument is a value of the event::io_profiling_option enumeration; in this case, it is event::io_total_stream_running_to_idle_cycles. event::start_profiling returns a handle, which is used later to read the counter value and to stop profiling.

After the graph finishes eight iterations, call event::read_profiling with the handle to read the counter value. When profiling is done, it is recommended to stop the performance counter by calling event::stop_profiling with the handle, so that the hardware resources configured for profiling are released for other uses.

Finally, the bandwidth is derived by dividing the total number of bytes transferred (256 × sizeof(int32)) by the time during which the stream port is active (cycle_count × 1e-9 seconds, assuming the AI Engine is running at 1 GHz).