Vitis Analyzer for Application End-to-end Timeline Analysis - 2023.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID: XD099
Release Date: 2023-11-13
Version: 2023.2 English

Vitis Analyzer is a graphical tool that lets you browse many aspects of the design, from the whole system down to the details of a kernel.

Instructions for Vitis Analyzer:
  1. Open a terminal and set up Vitis.

  2. Change directory to ./build.

  3. Run vitis_analyzer xrt.run_summary.

  4. Navigate around in Vitis Analyzer.

    Make sure to check:

    1. Profile summary

    2. Guidance reports: indicate areas for improvement

    3. Timeline Trace: described in more detail below
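Steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not part of the tutorial: the helper name `build_analyzer_launch` is hypothetical, and it assumes that Vitis is already set up in the calling environment (step 1) so that `vitis_analyzer` is on the `PATH`. The helper only checks that the run summary exists and returns the command together with the working directory; the commented lines show how the tool might then be launched.

```python
from pathlib import Path


def build_analyzer_launch(build_dir="./build", summary="xrt.run_summary"):
    """Return (command, working_directory) for opening a run summary.

    Hypothetical helper: mirrors "cd ./build" followed by
    "vitis_analyzer xrt.run_summary" from the steps above.
    """
    build_path = Path(build_dir)
    if not (build_path / summary).is_file():
        raise FileNotFoundError(f"{summary} not found in {build_path}")
    return ["vitis_analyzer", summary], build_path


# Example launch (assumes the Vitis environment is already set up):
# import subprocess
# cmd, cwd = build_analyzer_launch()
# subprocess.run(cmd, cwd=cwd, check=True)
```

Returning the command instead of running it keeps the check usable on machines where Vitis is not installed.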

The Timeline has the following structure:

  • Host

    • OpenCL API Calls: All OpenCL API calls are traced here. The activity time is measured from the host perspective.

      • General: All general OpenCL API calls, such as clCreateProgramWithBinary, clCreateContext, and clCreateCommandQueue, are traced here.

      • Queue: OpenCL API calls that are associated with a specific command queue are traced here. This includes commands such as clEnqueueMigrateMemObjects and clEnqueueNDRangeKernel. If the user application creates multiple command queues, this section shows all the queues and their activities.

    • Data Transfer: This section traces the direct memory access (DMA) transfers between the host and the device memory. The OpenCL runtime implements multiple DMA threads, and there is typically an equal number of DMA channels. The user application initiates a DMA transfer by calling OpenCL APIs such as clEnqueueMigrateMemObjects. These DMA requests are forwarded to the runtime, which delegates each one to one of the threads. Transfers from the host to the device appear under Write, because the data is written by the host, and transfers from the device to the host appear under Read.

    • Kernel Enqueues: The kernels enqueued by the host program are shown here. These kernels should not be confused with the kernels/compute units on the device. Here, kernel refers to the NDRange kernels and tasks created by the OpenCL commands clEnqueueNDRangeKernel and clEnqueueTask. They are plotted against time as measured from the host's perspective. Multiple kernels can be scheduled to execute at the same time, and each is traced from the point it is scheduled to run until the end of its execution. Overlapping kernel executions are shown in separate rows, so the number of rows depends on how many executions overlap.

  • Device “name”

    • Binary Container “name”: The name of the binary container.

      • Accelerator “name”: The name of the compute unit (also known as the accelerator) on the FPGA.
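To illustrate why overlapping kernel enqueues occupy separate rows under Kernel Enqueues, here is a minimal Python sketch of a greedy row assignment. It is an illustration only, not Vitis Analyzer's implementation; the function name `assign_rows` and the sample timestamps are hypothetical. Each enqueue is a (start, end) pair in host time, and an enqueue reuses the first row whose previous entry has already finished.

```python
def assign_rows(intervals):
    """Assign each (start, end) interval to the lowest-numbered free row.

    Returns a list of row indices in the same order as `intervals`.
    """
    # Process intervals in order of start time, remembering original positions.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    row_end = []                      # end time of the last entry in each row
    rows = [None] * len(intervals)
    for i in order:
        start, end = intervals[i]
        for r, busy_until in enumerate(row_end):
            if busy_until <= start:   # this row is free again
                row_end[r] = end
                rows[i] = r
                break
        else:                         # every row is busy: open a new one
            row_end.append(end)
            rows[i] = len(row_end) - 1
    return rows


# Hypothetical enqueue times (ms): the first two overlap, the third does not.
enqueues = [(0, 10), (5, 12), (10, 15)]
rows = assign_rows(enqueues)
print(rows)               # [0, 1, 0]
print(max(rows) + 1)      # 2 rows are needed
```

In this example the third enqueue starts exactly when the first one ends, so it can share row 0, and two rows are enough for three enqueues.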