Even if a deadlock does not appear in the AI Engine simulator or hardware emulation flows, it can still occur in the hardware flow.
The PS code to profile how much data has been transferred for the input and output is shown below:
```
event::handle handle = event::start_profiling(*dout, event::io_stream_running_event_count);
event::handle handle2 = event::start_profiling(*din, event::io_stream_running_event_count);
if(handle==event::invalid_handle || handle2==event::invalid_handle){
    printf("ERROR: Invalid handle. Only two performance counters are available in an AIE-PL interface tile\n");
    return 1;
}
//kernel run
auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);//1st run for s2mm has started
auto mm2s_run = mm2s(in_bo, nullptr, OUTPUT_SIZE);
gr.run(4);
// Wait graph for some cycles
gr.wait(50000); // wait for AIE kernel to complete or at most 50000 cycles
long long data_out_count = event::read_profiling(handle);
long long data_in_count = event::read_profiling(handle2);
event::stop_profiling(handle);
event::stop_profiling(handle2);
std::cout<<"Output data received:"<<data_out_count<<std::endl;
std::cout<<"Input data sent:"<<data_in_count<<std::endl;
```
Note: `mm2s` must be started after `event::start_profiling`. Otherwise, the data transfer begins as soon as `mm2s` starts, which is before `event::start_profiling` and `gr.run(4)`, so the transfers that occur before profiling starts are not counted.
The output is similar to the following:
```
Output data received:0
Input data sent:104
```
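The status of the design can be estimated from how much data has been transferred on the input and output. For example, input data being sent while no output data is received, as in the output above, suggests that the graph may be stalled. The `gr.wait(50000)` call in the above code can be replaced with the `sleep` or `usleep` APIs to wait for a fixed amount of time, chosen according to the scale of the design.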
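As a rough illustration of this estimate, the following sketch takes two snapshots of the counters separated by a fixed sleep, and flags a possible stall when neither counter advances while the output is still short of what is expected. `Sample`, `looks_stalled`, and `sample_and_check` are hypothetical names, not part of the AI Engine API. The reader callback stands in for a pair of `event::read_profiling` calls, on the assumption that the counters can be read more than once before `event::stop_profiling`, and `expected_out` is assumed to use the same units as the counter.

```cpp
#include <chrono>
#include <thread>

// Hypothetical snapshot of the two profiling counters (not an AI Engine API type).
struct Sample {
    long long in_count;   // e.g. the value read for the input port (handle2)
    long long out_count;  // e.g. the value read for the output port (handle)
};

// Returns true when neither counter advanced between the two snapshots and the
// output counter is still below expected_out.
// Assumption: expected_out uses the same units as the counter.
inline bool looks_stalled(const Sample& before, const Sample& after,
                          long long expected_out) {
    const bool no_progress = after.in_count == before.in_count &&
                             after.out_count == before.out_count;
    return no_progress && after.out_count < expected_out;
}

// Takes one snapshot, sleeps for a fixed interval (the role sleep/usleep would
// play in place of gr.wait), takes a second snapshot, and compares the two.
// read_counts is any callable returning a Sample; on hardware it would wrap
// the two event::read_profiling calls (an assumption, not shown here).
template <typename ReadFn>
bool sample_and_check(ReadFn read_counts, std::chrono::milliseconds interval,
                      long long expected_out) {
    const Sample before = read_counts();
    std::this_thread::sleep_for(interval);
    const Sample after = read_counts();
    return looks_stalled(before, after, expected_out);
}
```

With a reader that keeps returning 104 for the input and 0 for the output, as in the output above, `sample_and_check` returns true for any positive `expected_out`. A slow design can look stalled if the interval is too short, so the interval should be scaled to the design.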
If necessary, an Integrated Logic Analyzer (ILA) can be inserted to probe the interfaces of the PL kernels to detect the AI Engine and PL kernels’ running status.
Refer to AI Engine Status Analysis for how to use Vitis Analyzer to understand the AI Engine status in both hardware and hardware emulation.