XRT provides an xrt::error class and its member functions to retrieve asynchronous errors in the user-space host code. This section walks through a methodology for handling errors that originate in the underlying driver, system, hardware, and so on.
To illustrate how the XRT error-handling APIs are used, an out-of-bounds access is introduced in the kernel code, which in turn causes the AI Engine graph controlled from the host code to fail during execution.
Add a memory read violation to the kernel code by opening cmd_src/aie/kernels/peak_detect.cc and changing line 26 to v_in = *(InIter+8500000500).
Replace the cmd_src/sw/host.cpp file with the Hardware/host_xrtErrorAPI.cpp file. Make sure to back up the original file first.
Observe lines 87-93.
xrt::error -> Class to retrieve the asynchronous errors in the host code.
get_timestamp() -> Member function to get the timestamp of the last error.
get_error_code() -> Member function to get the last error code.
to_string() -> Member function to get the description string of a given error code.
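The member functions listed above can be combined into a small reporting helper. The following is a minimal sketch, not the code from host_xrtErrorAPI.cpp: report_async_error and StandInError are hypothetical names, and the stand-in type only mimics the three member functions so that the example builds without the XRT headers. It also assumes that an error code of zero means no error has been recorded.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in that mimics the three xrt::error member functions
// named above, so this sketch builds without the XRT headers.
struct StandInError {
  std::uint64_t code;
  std::uint64_t timestamp;
  std::string text;

  std::uint64_t get_error_code() const { return code; }
  std::uint64_t get_timestamp() const { return timestamp; }
  std::string to_string() const { return text; }
};

// Hypothetical helper: formats the last asynchronous error held by `err`.
// ASSUMPTION: an error code of zero means "no error recorded", in which
// case an empty string is returned.
template <typename ErrorT>
std::string report_async_error(const ErrorT& err) {
  if (err.get_error_code() == 0)
    return {};
  std::ostringstream os;
  os << "Asynchronous error 0x" << std::hex << err.get_error_code()
     << std::dec << " at timestamp " << err.get_timestamp() << '\n'
     << err.to_string();
  return os.str();
}
```

Because the helper is a template, a real host application could pass an xrt::error object in place of the stand-in, for example one queried for the AI Engine error class after the graph run fails. How that object is constructed is not shown in this section, so treat that part as an assumption to check against the XRT headers.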
Run make all TARGET=hw to build the AI Engine kernels, s2mm and mm2s, and the host application, and to run the link and package steps that generate the SD card image.
Repeat steps 3 and 4 from Running the Design on Hardware to run the design on hardware.
Observe the output from the Linux console.
aie aie0: Asserted tile error event 60 at col 25 row 1
The message above is the error propagated from the AI Engine array, and it is used to debug application-specific errors. For the list of error events, refer to the topic AI Engine Error Events. Notice error event 60 above, which represents a DM address out of range; the out-of-range access occurs at col 25 row 1.
You can open the graph compile summary in Vitis Analyzer and identify the kernel that corresponds to this tile, which is peak_detect in this case.
You can debug this out-of-bounds access at the AI Engine simulation level. Refer to Debugging memory access violations for more information.
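When the console log is long, it can help to pull the event number and tile coordinates out of such lines programmatically before looking them up. The following is a minimal sketch that assumes the line format shown in the console output above; TileError and parse_tile_error are hypothetical names and are not part of XRT or the AI Engine driver.

```cpp
#include <cstdio>
#include <string>

// Fields carried by a console line such as
// "aie aie0: Asserted tile error event 60 at col 25 row 1".
struct TileError {
  int event;
  int col;
  int row;
};

// Hypothetical helper: extracts the event number, column, and row from one
// console line. Returns false (leaving `out` untouched) if the line does not
// contain the expected pattern.
bool parse_tile_error(const std::string& line, TileError& out) {
  static const std::string marker = "error event ";
  const std::string::size_type pos = line.find(marker);
  if (pos == std::string::npos)
    return false;
  TileError t{};
  if (std::sscanf(line.c_str() + pos, "error event %d at col %d row %d",
                  &t.event, &t.col, &t.row) != 3)
    return false;
  out = t;
  return true;
}
```

For the line shown above, the helper fills in event 60, col 25, and row 1; the event number can then be looked up in AI Engine Error Events and the tile located in the graph compile summary.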
The other message in the console is the asynchronous error output.
Error Driver (4): DRIVER_AIE
Error Severity (3): SEVERITY_CRITICAL
Error Module (3): MODULE_AIE_CORE
Error Class (2): CLASS_AIE
Timestamp: 1667916688683323200
XRT maintains the latest error for each class, along with a timestamp for when the error was generated. The error fields can be interpreted using the enumerations in xrt_error_code.h.
For example, Error Module (3): MODULE_AIE_CORE corresponds to XRT_ERROR_MODULE_AIE_CORE in the enumeration xrtErrorModule.
Press Ctrl+Z to suspend the execution.
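The Timestamp field in the asynchronous error output is easier to correlate with other logs once it is converted to a calendar time. The following is a minimal sketch under the assumption that the value counts nanoseconds since the Unix epoch; the magnitude of the value shown above is consistent with that, but this section does not state the unit. timestamp_to_utc is a hypothetical helper name.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Hypothetical helper: converts an error timestamp to a UTC date string.
// ASSUMPTION: the value counts nanoseconds since the Unix epoch.
std::string timestamp_to_utc(std::uint64_t ns_since_epoch) {
  // Drop the sub-second part; only whole seconds are printed.
  const std::time_t secs =
      static_cast<std::time_t>(ns_since_epoch / 1000000000ULL);
  std::tm tm_utc{};
  // std::gmtime returns a pointer to shared static storage, so copy it out.
  if (const std::tm* p = std::gmtime(&secs))
    tm_utc = *p;
  char buf[32];
  std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", &tm_utc);
  return buf;
}
```

Under that assumption, timestamp_to_utc(1667916688683323200ULL) yields 2022-11-08 14:11:28 UTC.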