From the trace information in the run_summary in the Vitis Analyzer, navigate to the output port for which you want to calculate the throughput (the upscale kernel in this case). Add a marker at the start of the first output sample, as highlighted in the following figure. Then click the Go to last time icon, and observe that the cursor moves to the end of the last iteration. Now, click the previous transition icon to go to the start of the last iteration. Add one more marker at the end, and observe that the time difference is 2282.320 ns.
The number of bytes transferred is 128 samples × 4 bytes × 7 iterations = 3584 bytes.

Throughput = 3584 bytes / (2282.320 × 10^-9 s) ≈ 1.57 × 10^9 bytes/s, or about 1.57 GBPS.
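The same arithmetic can be written as a small helper, which is handy when you repeat the measurement for other ports or iteration counts. This is a minimal sketch; the function name is hypothetical, and the inputs (128 samples, 4 bytes per sample, 7 iterations, 2282.320 ns between the two markers) are taken from the measurement above.

```python
def measured_throughput_bytes_per_s(samples_per_iter, bytes_per_sample, iterations, elapsed_ns):
    """Hypothetical helper: total bytes moved divided by the time between the two trace markers."""
    total_bytes = samples_per_iter * bytes_per_sample * iterations
    return total_bytes / (elapsed_ns * 1e-9)  # convert ns to s

# Values read from the Vitis Analyzer trace for the upscale output port
total = 128 * 4 * 7
tp = measured_throughput_bytes_per_s(128, 4, 7, 2282.320)
print(f"{total} bytes, {tp / 1e9:.2f} GB/s")  # prints "3584 bytes, 1.57 GB/s"
```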
Theoretically, the AI Engine can transfer four bytes per cycle, and one cycle is 0.8 ns in this case. Transferring 3584 bytes of data therefore requires 3584 / 4 = 896 cycles, or 896 × 0.8 ns = 716.8 ns. The theoretical throughput is 3584 bytes / 716.8 ns = 5 GBPS.
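The theoretical bound can be checked the same way. This sketch assumes, as stated above, a stream rate of 4 bytes per cycle and a 0.8 ns cycle; the function name is hypothetical, and rounding a partial final transfer up to a whole cycle is an assumption added for generality (it has no effect here because 3584 divides evenly by 4).

```python
import math

def theoretical_throughput(total_bytes, bytes_per_cycle, cycle_ns):
    """Hypothetical helper: ideal transfer time and throughput at a fixed bytes-per-cycle rate."""
    cycles = math.ceil(total_bytes / bytes_per_cycle)  # assume a partial transfer costs a full cycle
    time_ns = cycles * cycle_ns
    return cycles, time_ns, total_bytes / (time_ns * 1e-9)  # bytes per second

cycles, time_ns, tp = theoretical_throughput(3584, 4, 0.8)
print(cycles, f"{time_ns:.1f} ns", f"{tp / 1e9:.2f} GB/s")  # prints "896 716.8 ns 5.00 GB/s"
```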