The profiling code that uses the C++ class API is the same for Linux systems across platforms. The Timer
class is defined as follows:
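#include <chrono>

// Measures elapsed wall-clock time in microseconds.
class Timer {
    std::chrono::high_resolution_clock::time_point mTimeStart;
public:
    Timer() { reset(); }
    // Returns the microseconds elapsed since construction or the last reset().
    long long stop() {
        std::chrono::high_resolution_clock::time_point timeEnd = std::chrono::high_resolution_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(timeEnd - mTimeStart).count();
    }
    // Restarts the measurement from the current time.
    void reset() { mTimeStart = std::chrono::high_resolution_clock::now(); }
};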
The code to start profiling is as follows:
Timer timer;
The code to end profiling and calculate performance is as follows:
// timer.stop() returns microseconds, so bytes / microseconds gives M Bytes/s (10^6 bytes per second).
double timer_stop=timer.stop();
double throughput=(BLOCK_SIZE_in_Bytes+BLOCK_SIZE_out_Bytes)*NUM/timer_stop;
std::cout<<"Throughput (by timer GMIO in num="<<num<<",out num="<<num<<"):\t"<<throughput<<"M Bytes/s"<<std::endl;
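The throughput arithmetic above can be sketched as a standalone function. This is only an illustration, not part of the original host code: the name throughput_mbytes_per_s and its parameters are hypothetical. It relies on the fact that a byte count divided by a time in microseconds is already in units of 10^6 bytes per second, so no extra scaling factor is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original host code): converts the bytes
// moved per iteration and the elapsed time in microseconds into M Bytes/s.
// bytes / microseconds == 10^6 bytes / second, so no further scaling is applied.
double throughput_mbytes_per_s(std::uint64_t bytes_in_per_iter,
                               std::uint64_t bytes_out_per_iter,
                               std::uint64_t iterations,
                               double elapsed_us) {
    assert(elapsed_us > 0.0);  // guard against division by zero
    // Convert to double before multiplying so large transfers do not overflow.
    const double total_bytes =
        static_cast<double>(bytes_in_per_iter + bytes_out_per_iter) *
        static_cast<double>(iterations);
    return total_bytes / elapsed_us;
}
```

For example, 1024 bytes in and 1024 bytes out over 4 iterations in 8 microseconds is 8192 bytes / 8 us, which this function reports as 1024 M Bytes/s.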
The code is guarded by the macro __TIMER__. To use this method of profiling, define __TIMER__ for the g++ cross compiler in sw/Makefile:
CXXFLAGS += -std=c++17 -D__TIMER__ ......
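A minimal sketch of how such a guard might look in host code is shown here, assuming the timing calls are wrapped in #ifdef __TIMER__ blocks. The functions run_workload and profile_workload are hypothetical stand-ins for the code being measured, and the Timer is the class defined earlier in this section. The actual host code may arrange its guards differently.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>

#ifdef __TIMER__
// Same Timer as defined earlier in this section; compiled only with -D__TIMER__.
class Timer {
    std::chrono::high_resolution_clock::time_point mTimeStart;
public:
    Timer() { reset(); }
    long long stop() {
        auto timeEnd = std::chrono::high_resolution_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(timeEnd - mTimeStart).count();
    }
    void reset() { mTimeStart = std::chrono::high_resolution_clock::now(); }
};
#endif

// Hypothetical stand-in for the work being measured (sums 0..n-1).
long long run_workload(int n) {
    assert(n >= 0);
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// Hypothetical wrapper: the result is identical with or without __TIMER__;
// only the timing and reporting are conditionally compiled.
long long profile_workload(int n) {
#ifdef __TIMER__
    Timer timer;  // starts timing on construction
#endif
    long long result = run_workload(n);
#ifdef __TIMER__
    long long elapsed_us = timer.stop();
    std::cout << "Elapsed: " << elapsed_us << " us" << std::endl;
#endif
    return result;
}
```

Built without -D__TIMER__, the timing code is left out entirely, so the profiling adds no overhead to a normal build.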
To run it in hardware, use the following make command to build the hardware image:
make package TARGET=hw
After packaging completes, boot Linux from an SD card, log in (use petalinux/petalinux), and run the following commands at the Linux prompt:
cd /run/media/mmcblk0p1
./host.exe a.xclbin
The output in hardware is similar to the following:
Throughput (by timer GMIO in num=4,out num=4):9882.79M Bytes/s