Vitis AI Utilities - 1.4 English

DSight

DSight is the Vitis AI performance profiler for the edge DPU: a visual analysis tool for model performance profiling. The following figure shows its usage.

Figure 1. DSight Help Info

By processing the log file produced by the N2Cube runtime, DSight generates an HTML page with charts that visualize the utilization and scheduling efficiency of the DPU cores.

DExplorer

DExplorer is a utility running on the target board. It provides DPU running mode configuration, DNNDK version checking, DPU status checking, and DPU core signature checking. The following figure shows the help information for the usage of DExplorer.

Figure 2. DExplorer Usage Options

Check DNNDK Version

Running dexplorer -v displays version information for each component in DNNDK, including N2Cube, DPU driver, DExplorer, and DSight.

Check DPU Status

DExplorer provides DPU status information, including the running mode of N2Cube, DPU timeout threshold, DPU debugging level, DPU core status, DPU register information, and DPU memory resources and utilization. The following figure shows a screenshot of DPU status.

Figure 3. DExplorer Status

Configuring DPU Running Mode

The edge DPU runtime N2Cube supports three DPU execution modes to help developers debug and profile Vitis AI applications.

Normal Mode: In normal mode, the DPU application can get the best performance without any overhead.
Profile Mode: In profile mode, the DPU turns on its profiling switch. While a deep learning application runs in this mode, N2Cube prints layer-by-layer performance data to the console as the neural network executes and, at the same time, writes a profile named dpu_trace_[PID].prof to the current folder. This file can be used with the DSight tool.
Debug Mode: In this mode, the DPU dumps raw data for each DPU computation node during execution, including DPU instruction code in binary format, network parameters, DPU input tensor, and output tensor. This makes it possible to debug and locate issues in a DPU application.
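As a small illustration of the profile mode naming convention, the following Python sketch collects the dpu_trace_[PID].prof files found in a folder so that they can be passed on to DSight. It assumes that [PID] is a decimal process ID; the helper name find_dpu_traces is hypothetical and is not part of the Vitis AI tools.

```python
import re
from pathlib import Path

# File name pattern from the text: dpu_trace_[PID].prof
# Assumption: [PID] is a decimal process ID.
TRACE_PATTERN = re.compile(r"^dpu_trace_(\d+)\.prof$")


def find_dpu_traces(folder="."):
    """Return {pid: path} for every dpu_trace_[PID].prof file in folder."""
    traces = {}
    for entry in Path(folder).iterdir():
        match = TRACE_PATTERN.match(entry.name)
        if match and entry.is_file():
            traces[int(match.group(1))] = entry
    return traces


if __name__ == "__main__":
    # List the trace files in the current folder, ordered by PID.
    for pid, path in sorted(find_dpu_traces().items()):
        print(f"PID {pid}: {path}")
```

Keying the result by PID makes it easy to match a trace file to the application run that produced it.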

Checking DPU Signature

New DPU cores have been introduced to meet various deep learning acceleration requirements across different Xilinx® FPGA devices. For example, DPU architectures B1024F, B1152F, B1600F, B2304F, and B4096F are available. Each DPU architecture can implement a different version of the DPU instruction set (named as a DPU target version) to support the rapid improvements in deep learning algorithms.

The DPU signature refers to the specification information of a specific DPU architecture version, covering the target version, working frequency, number of DPU cores, hardened acceleration modules (such as softmax), and so on. The -w option can be used to check the DPU signature. The following figure shows a screen capture of a sample run of dexplorer -w.

For a configurable DPU, dexplorer displays all configuration parameters of the DPU signature, as shown in the following figure.

Figure 4. Sample DPU Signature with Configuration Parameters

DDump

DDump is a utility that dumps the information encapsulated inside a DPU ELF file, hybrid executable, or DPU shared library, helping users analyze and debug various issues. Refer to DPU Shared Library for more details.

DDump is available on both the runtime container vitis-ai-docker-runtime and the Vitis AI evaluation boards. Usage information is shown in the figure below. In the runtime container, it is accessible from the path /opt/vitis-ai/utility/ddump. On evaluation boards, it is installed under the Linux system path and can be used directly.

Figure 5. DDump Usage Options

Check DPU Kernel Info

DDump can dump the following information for each DPU kernel from DPU ELF file, hybrid executable, or DPU shared library.

Mode: The mode in which the DPU kernel was compiled by the VAI_C compiler, either NORMAL or DEBUG.
Code Size: The DPU instruction code size of the DPU kernel, in MB, KB, or bytes.
Param Size: The parameter size of the DPU kernel, including weights and biases, in MB, KB, or bytes.
Workload MACs: The computation workload of the DPU kernel, in MOPS.
IO Memory Space: The DPU memory space required for intermediate feature maps, in MB, KB, or bytes. For each created DPU task, N2Cube automatically allocates a DPU memory buffer for the intermediate feature maps.
Mean Value: The mean values for DPU kernel.
Node Count: The total number of DPU nodes for DPU kernel.
Tensor Count: The total number of DPU tensors for DPU kernel.
Tensor In(H*W*C): The DPU input tensor list and their shape information in the format of height*width*channel.
Tensor Out(H*W*C): The DPU output tensor list and their shape information in the format of height*width*channel.
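To make the height*width*channel notation concrete, here is a minimal Python sketch that parses one shape string and counts its elements. It assumes that a shape is written as three integers separated by asterisks; the example values and the helper names parse_hwc and element_count are illustrative and are not taken from actual DDump output.

```python
def parse_hwc(shape):
    """Parse a 'height*width*channel' string into an (H, W, C) tuple of ints."""
    parts = shape.strip().split("*")
    if len(parts) != 3:
        raise ValueError(f"expected height*width*channel, got {shape!r}")
    height, width, channel = (int(p) for p in parts)
    return height, width, channel


def element_count(shape):
    """Number of elements in a tensor described as height*width*channel."""
    height, width, channel = parse_hwc(shape)
    return height * width * channel


print(parse_hwc("224*224*3"))      # (224, 224, 3)
print(element_count("224*224*3"))  # 150528
```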

The following figure shows the screenshot of DPU kernel information for ResNet50 DPU ELF file dpu_resnet50_0.elf with command ddump -f dpu_resnet50_0.elf -k.

Figure 6. DDump DPU Kernel Information for ResNet50

Check DPU Arch Info

For each DPU kernel, the VAI_C compiler automatically wraps the DPU configuration information from the DPU DCF into the DPU ELF file. VAI_C then generates the appropriate DPU instructions according to the DPU configuration parameters. Refer to Zynq DPU v3.1 IP Product Guide (PG338) for more details about configurable DPU descriptions.

DDump can dump out the following DPU architecture information:

DPU Target Ver: The version of DPU instruction set.
DPU Arch Type: The type of DPU architecture, such as B512, B800, B1024, B1152, B1600, B2304, B3136, and B4096.
RAM Usage: Low or high RAM usage.
DepthwiseConv: DepthwiseConv engine enabled or not.
DepthwiseConv+Relu6: The operator pattern of DepthwiseConv followed by Relu6, enabled or not.
Conv+Leakyrelu: The operator pattern of Conv followed by Leakyrelu, enabled or not.
Conv+Relu6: The operator pattern of Conv followed by Relu6, enabled or not.
Channel Augmentation: An optional feature to improve DPU computation efficiency against channel dimension, especially for those layers whose input channels are much less than DPU channel parallelism.
Average Pool: The average pool engine, enabled or not.

DPU architecture information may vary with the version of the DPU IP. The following figure shows one set of DPU architecture information used by VAI_C to compile the ResNet50 model, obtained by running the command ddump -f dpu_resnet50_0.elf -d.

Figure 7. DDump DPU Arch Information for ResNet50

Check VAI_C Info

VAI_C version information is automatically embedded into DPU ELF file while compiling network model. DDump can help to dump out this VAI_C version information, which users can provide to the Xilinx AI support team for debugging purposes.

The VAI_C information for the ResNet50 model, obtained by running the command ddump -f dpu_resnet50_0.elf -c, is shown in the following figure.

Figure 8. DDump VAI_C Info for ResNet50

Legacy Support

DDump also supports dumping information from legacy DPU ELF files, hybrid executables, and DPU shared libraries. The main difference is that no detailed DPU architecture information is available for them.

An example of dumping all of the information for legacy ResNet50 DPU ELF file with command ddump -f dpu_resnet50_0.elf -a is shown in the following figure.
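The DDump invocations used in this section (-k, -d, -c, and -a, each combined with -f) can be scripted. The sketch below is one possible Python wrapper, not an official interface: the selector names ("kernel", "arch", "vai_c", "all") and the functions ddump_command and run_ddump are hypothetical, and only the options mentioned in this section are used.

```python
import shutil
import subprocess

# Options named in this section; no other ddump flags are assumed.
# The dictionary keys are hypothetical labels chosen for this sketch.
DDUMP_OPTIONS = {
    "kernel": "-k",  # DPU kernel information
    "arch": "-d",    # DPU architecture information
    "vai_c": "-c",   # VAI_C version information
    "all": "-a",     # all of the information
}


def ddump_command(elf_path, what="all", executable="ddump"):
    """Build the argument list for one ddump invocation."""
    if what not in DDUMP_OPTIONS:
        raise ValueError(f"unknown selector {what!r}")
    return [executable, "-f", elf_path, DDUMP_OPTIONS[what]]


def run_ddump(elf_path, what="all", executable="ddump"):
    """Run ddump and return its standard output, or None if it is not installed."""
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        ddump_command(elf_path, what, executable),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    print(" ".join(ddump_command("dpu_resnet50_0.elf", "kernel")))
```

In the runtime container, the executable argument can be set to /opt/vitis-ai/utility/ddump when the tool is not on the search path.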

DLet

DLet is a host tool designed to parse and extract various edge DPU configuration parameters from the DPU hardware handoff (HWH) file generated by Vivado. The following figure shows the usage information of DLet.

Figure 9. DLet Usage Options

For a Vivado project, the DPU HWH file is located under the following directory by default, where <prj_name> is the Vivado project name and <bd_name> is the Vivado block design name.

 <prj_name>/<prj_name>.srcs/sources_1/bd/<bd_name>/hw_handoff/<bd_name>.hwh
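The default location above can be assembled programmatically. The following Python sketch fills in the two placeholders; the function name default_hwh_path and the example names dpu_prj and design_1 are hypothetical.

```python
from pathlib import PurePosixPath


def default_hwh_path(prj_name, bd_name):
    """Default HWH location for a Vivado project and block design name."""
    return (
        PurePosixPath(prj_name)
        / f"{prj_name}.srcs"
        / "sources_1"
        / "bd"
        / bd_name
        / "hw_handoff"
        / f"{bd_name}.hwh"
    )


print(default_hwh_path("dpu_prj", "design_1"))
# dpu_prj/dpu_prj.srcs/sources_1/bd/design_1/hw_handoff/design_1.hwh
```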

When the command dlet -f <bd_name>.hwh is run, DLet outputs the DPU configuration file (DCF), named in the format dpu-dd-mm-yyyy-hh-mm.dcf, where dd-mm-yyyy-hh-mm is the timestamp of when the DPU HWH was created. With the specified DCF file, the VAI_C compiler automatically produces DPU code instructions suited to the DPU configuration parameters.
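The DCF naming scheme can be illustrated with a short Python sketch that formats a timestamp into a DCF file name and parses it back. It assumes zero-padded fields and a 24-hour clock, which the text does not state explicitly; the function names dcf_name and dcf_timestamp are hypothetical.

```python
from datetime import datetime

# Naming scheme from the text: dpu-dd-mm-yyyy-hh-mm.dcf
# Assumption: zero-padded fields and a 24-hour clock.
DCF_FORMAT = "dpu-%d-%m-%Y-%H-%M.dcf"


def dcf_name(created):
    """Format an HWH creation time as a DCF file name."""
    return created.strftime(DCF_FORMAT)


def dcf_timestamp(name):
    """Recover the HWH creation time encoded in a DCF file name."""
    return datetime.strptime(name, DCF_FORMAT)


print(dcf_name(datetime(2021, 7, 5, 9, 30)))  # dpu-05-07-2021-09-30.dcf
```

Parsing the name back is useful for checking which hardware build a given DCF file corresponds to.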