Profiling Graph Latency - 2022.1 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2022-05-25
Version: 2022.1 English

The event::io_stream_start_difference_cycles enumeration can be used to measure the latency between two PLIO or GMIO ports. After event::start_profiling() is called, two performance counters start incrementing every cycle, each waiting for one of two independent nets to receive its first data. When the first data passes through a net, the corresponding performance counter stops. The value read back by event::read_profiling() is the difference between the two performance counter values.

After event::stop_profiling() is called, the performance counters are cleared and released.
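The counter behavior described above can be illustrated with a small software model. This is only a sketch in plain C++, not the ADF API or the hardware implementation: the StartDifferenceModel type and the simulate_start_difference() helper are hypothetical names, and the sign convention (second port minus first port) is assumed from the two-output example later in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model of the two performance counters (illustrative only, not the ADF API).
struct StartDifferenceModel {
    std::int64_t counter_a = 0;  // cycles counted while waiting on the first net
    std::int64_t counter_b = 0;  // cycles counted while waiting on the second net
    bool a_running = false;
    bool b_running = false;

    // Models event::start_profiling(): both counters start from zero.
    void start() {
        counter_a = 0;
        counter_b = 0;
        a_running = true;
        b_running = true;
    }

    // Advances one cycle. A counter stops once its net sees its first data.
    void tick(bool a_has_data, bool b_has_data) {
        if (a_running) {
            if (a_has_data) a_running = false; else ++counter_a;
        }
        if (b_running) {
            if (b_has_data) b_running = false; else ++counter_b;
        }
    }

    // Models event::read_profiling(): assumed to be second counter minus first.
    std::int64_t read() const { return counter_b - counter_a; }

    // Models event::stop_profiling(): counters are cleared and released.
    void stop() {
        counter_a = 0;
        counter_b = 0;
        a_running = false;
        b_running = false;
    }
};

// Hypothetical helper: first_a and first_b are the cycles (counted from
// start) at which each net sees its first data.
std::int64_t simulate_start_difference(std::int64_t first_a, std::int64_t first_b) {
    assert(first_a >= 0 && first_b >= 0);
    StartDifferenceModel model;
    model.start();
    const std::int64_t last = std::max(first_a, first_b);
    for (std::int64_t cycle = 0; cycle <= last; ++cycle) {
        model.tick(cycle == first_a, cycle == first_b);
    }
    const std::int64_t difference = model.read();
    model.stop();
    return difference;
}
```

For example, simulate_start_difference(3, 10) returns 7 because the second net sees its first data seven cycles after the first net, and swapping the arguments returns -7.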

Profile Graph Latency

Graph latency can be defined as the time spent from receiving the first input data to producing the first output data. The following example shows how to profile graph latency using the event API.

Note: Here, event::start_profiling() takes two different PLIO ports as its first two parameters.
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);
gr_pl.run(iterations);
event::handle handle = event::start_profiling(gr_pl.in, gr_pl.dataout, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
} 
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S);
gr_pl.wait(); // make sure both ports have stopped
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches the long long return value
event::stop_profiling(handle); // performance counters are released and cleared

Here, graph.run() is called before event::start_profiling() so that any overhead introduced by graph.run() is excluded from the measured graph latency.
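The value returned by event::read_profiling() is a cycle count, so reporting latency as time requires a clock frequency. The following is a minimal sketch, assuming the counters tick at a clock whose frequency you supply. The cycles_to_ns() and format_latency() helpers are hypothetical names, and the 1.25 GHz figure in the usage note is only a placeholder, not a value from this guide.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Converts a cycle count to nanoseconds.
// clock_hz is an assumption: pass the frequency of the clock that drives
// the performance counters in your design.
double cycles_to_ns(long long cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return static_cast<double>(cycles) * 1e9 / clock_hz;
}

// Formats the latency for logging. %lld is the printf conversion that
// matches a long long cycle count.
std::string format_latency(long long cycles, double clock_hz) {
    char buffer[96];
    std::snprintf(buffer, sizeof buffer, "Latency: %lld cycles (%.1f ns)",
                  cycles, cycles_to_ns(cycles, clock_hz));
    return std::string(buffer);
}
```

With a placeholder clock of 1.25 GHz, format_latency(1250, 1.25e9) produces "Latency: 1250 cycles (1000.0 ns)".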

Profile Latency Difference Between Two Ports

This method is not limited to profiling the latency between an input port and an output port of the same graph. It can be used to profile the latency between any two ports. For example, it can profile the latency between two output ports that share a common input port. Example code is as follows:
gr_pl.run(iterations);
event::handle handle = event::start_profiling(gr_pl.dataout, gr_pl.dataout2, event::io_stream_start_difference_cycles);
if(handle==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
} 
auto mm2s_run = mm2s(nullptr, OUTPUT_SIZE_MM2S);
gr_pl.wait(); // make sure both ports have stopped
long long cycle_count = event::read_profiling(handle);
printf("Latency cycles: %lld\n", cycle_count); // %lld matches the long long return value
event::stop_profiling(handle); // performance counters are released and cleared

Here, a positive number indicates that the first data arrives at gr_pl.dataout2 later than at gr_pl.dataout, while a negative number indicates that the first data arrives at gr_pl.dataout2 earlier than at gr_pl.dataout.
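The sign convention above can be captured in a small reporting helper. This is a sketch only: describe_difference() is a hypothetical name, the port names are passed as plain strings for display, and treating a value of zero as "same cycle" is an assumption that follows from the two counters holding equal values.

```cpp
#include <cassert>
#include <string>

// Turns the signed cycle difference into a readable statement.
// first_port and second_port are display names for the first and second
// port arguments given to event::start_profiling().
std::string describe_difference(long long diff,
                                const std::string& first_port,
                                const std::string& second_port) {
    assert(!first_port.empty() && !second_port.empty());
    if (diff > 0) {
        // Positive: data reached the second port later than the first port.
        return second_port + " received its first data " +
               std::to_string(diff) + " cycle(s) after " + first_port;
    }
    if (diff < 0) {
        // Negative: data reached the second port earlier than the first port.
        return second_port + " received its first data " +
               std::to_string(-diff) + " cycle(s) before " + first_port;
    }
    // Zero (assumed): both counters stopped at the same count.
    return first_port + " and " + second_port +
           " received their first data in the same cycle";
}
```

For the example above, a reading of 12 would be reported as "dataout2 received its first data 12 cycle(s) after dataout".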