Profiling using the Event API - 2023.2 English

Vitis Tutorials: AI Engine

Document ID: XD100
Release Date: 2023-11-29
Version: 2023.2 English

The AI Engine has hardware performance counters that can be configured to count hardware events for measuring performance metrics. The event API used in this example profiles graph throughput on specific GMIO ports. Using the event API on multiple GMIO ports can cause conflicts, because a performance counter is shared between GMIO ports that access the same AI Engine-PL interface column. For this reason, all GMIO ports in this example are constrained to different columns.

The code to start profiling is as follows:

    event::handle handle[NUM];
    for (int i = 0; i < NUM; i++) {
        handle[i] = event::start_profiling(gr.gmioOut[i], event::io_stream_start_to_bytes_transferred_cycles, BLOCK_SIZE_out_Bytes);
    }

The code to end profiling and calculate performance is as follows:

    long long cycle_count[NUM];
    long long total_cycle_count = 0; // largest cycle count across all profiled ports
    for (int i = 0; i < NUM; i++) {
        cycle_count[i] = event::read_profiling(handle[i]);
        event::stop_profiling(handle[i]);
        if (cycle_count[i] > total_cycle_count) {
            total_cycle_count = cycle_count[i];
        }
    }
    // bytes / (cycles * 0.8 ns per cycle) is bytes/ns; multiplying by 1000 converts to MB/s
    double bandwidth = (double)(BLOCK_SIZE_in_Bytes + BLOCK_SIZE_out_Bytes) * NUM / ((double)total_cycle_count * 0.8) * 1000; // MB/s
    std::cout << "Throughput (by event API) bandwidth=" << bandwidth << "M Bytes/s" << std::endl;

In this example, event::start_profiling configures the AI Engine to count the clock cycles from the stream start event to the event that indicates BLOCK_SIZE_out_Bytes bytes have been transferred. This assumes that the stream stops right after the specified number of bytes is transferred.
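
The throughput arithmetic used above can be checked in isolation, without hardware. The following is a minimal sketch, not part of the tutorial sources: the helper names slowest_cycle_count and bandwidth_mbytes_per_s are hypothetical, and the default of 0.8 ns per cycle is taken from the 0.8 factor in the tutorial code (it corresponds to a 1.25 GHz clock, which is an inference rather than something the tutorial states).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of the event API): returns the largest
// cycle count across all profiled ports, i.e. the slowest port.
long long slowest_cycle_count(const std::vector<long long>& cycle_counts) {
    if (cycle_counts.empty()) {
        return 0;
    }
    return *std::max_element(cycle_counts.begin(), cycle_counts.end());
}

// Hypothetical helper: converts a byte count and an elapsed cycle count
// into MB/s. ns_per_cycle defaults to 0.8, mirroring the tutorial code
// (assumption: this corresponds to a 1.25 GHz clock).
double bandwidth_mbytes_per_s(long long total_bytes, long long cycles,
                              double ns_per_cycle = 0.8) {
    if (cycles <= 0) {
        return 0.0; // avoid dividing by zero when no cycles were counted
    }
    const double elapsed_ns = static_cast<double>(cycles) * ns_per_cycle;
    // bytes per nanosecond times 1000 equals megabytes per second
    return static_cast<double>(total_bytes) / elapsed_ns * 1000.0;
}
```

As an illustration with made-up numbers: if the ports together move 10000 bytes and the slowest port reports 1250 cycles, the elapsed time is 1250 * 0.8 = 1000 ns, so the result is 10000 / 1000 * 1000 = 10000 MB/s.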

For details on using the event API, refer to the Versal Adaptive SoC AI Engine Programming Environment User Guide (UG1076).

The code is guarded by the macro __USE_EVENT_PROFILE__. To use this method of profiling, define __USE_EVENT_PROFILE__ for the g++ cross compiler in sw/Makefile:

CXXFLAGS += -std=c++17 -D__USE_EVENT_PROFILE__ ......
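
The effect of the -D__USE_EVENT_PROFILE__ flag can be illustrated with a small stand-alone sketch. The functions profiling_enabled and report_profiling_mode below are hypothetical and do not appear in the tutorial sources. In the real host code, the guarded region would contain the event:: calls shown earlier.

```cpp
#include <iostream>

// Hypothetical helper: true only when the translation unit is compiled
// with -D__USE_EVENT_PROFILE__, false otherwise.
constexpr bool profiling_enabled() {
#ifdef __USE_EVENT_PROFILE__
    return true;
#else
    return false;
#endif
}

// Hypothetical stand-in for the guarded host code. Only one of the two
// branches is compiled, so the unused path adds no runtime cost.
void report_profiling_mode() {
#ifdef __USE_EVENT_PROFILE__
    std::cout << "Event API profiling compiled in" << std::endl;
#else
    std::cout << "Event API profiling compiled out" << std::endl;
#endif
}
```

Because the guard is resolved by the preprocessor, switching profiling on or off requires rebuilding the host application with or without the flag.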

The commands to build and run in hardware are the same as previously shown. The output in hardware is similar to the following:

Throughput (by event API) bandwidth=10571.6M Bytes/s