- DSight is the Vitis AI performance profiler for the edge DPU, a visual
analysis tool for model performance profiling. The following figure shows its
usage.
Figure 1. DSight Help Info
- By processing the log file produced by the N2Cube runtime, DSight generates an HTML page with charts that show the utilization and scheduling efficiency of the DPU cores.
- DExplorer is a utility that runs on the target board. It provides DPU running
mode configuration, DNNDK version checking, DPU status checking, and DPU core
signature checking. The following figure shows the DExplorer help information.
Figure 2. DExplorer Usage Options
- Check DNNDK Version
- Running dexplorer -v displays version information for each component in DNNDK, including N2Cube, the DPU driver, DExplorer, and DSight.
- Check DPU Status
- DExplorer provides DPU status information, including the running mode of
N2Cube, DPU timeout threshold, DPU debugging level, DPU core status, DPU register
information, and DPU memory resources and utilization. The following figure
shows a screenshot of the DPU status.
Figure 3. DExplorer Status
- Configuring DPU Running Mode
- The edge DPU runtime, N2Cube, supports three DPU execution modes to help
developers debug and profile Vitis AI applications.
- Normal Mode
- In normal mode, the DPU application can get the best performance without any overhead.
- Profile Mode
- In profile mode, the DPU turns on the profiling switch. When a deep learning application runs in profile mode, N2Cube prints layer-by-layer performance data to the console while executing the neural network. At the same time, it writes a profile named dpu_trace_[PID].prof to the current folder. This file can be used with the DSight tool.
- Debug Mode
- In this mode, the DPU dumps raw data for each DPU computation node during execution, including DPU instruction code in binary format, network parameters, DPU input tensor, and output tensor. This makes it possible to debug and locate issues in a DPU application.
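Profile mode leaves one dpu_trace_[PID].prof file per run in the current folder, so a small helper can be handy for collecting them before handing them to DSight. The following is a minimal sketch, not part of DNNDK. The function name find_profiles is hypothetical, and the sketch assumes that the PID part of the file name consists of decimal digits only.

```python
import re
from pathlib import Path

# File-name pattern taken from the text: dpu_trace_[PID].prof
# Assumption: [PID] is written as plain decimal digits.
PROFILE_NAME = re.compile(r"^dpu_trace_(\d+)\.prof$")


def find_profiles(folder="."):
    """Return {pid: path} for every dpu_trace_[PID].prof file in folder."""
    profiles = {}
    for path in Path(folder).iterdir():
        match = PROFILE_NAME.match(path.name)
        if match and path.is_file():
            profiles[int(match.group(1))] = path
    return profiles


if __name__ == "__main__":
    # List the profiles found in the current folder, ordered by PID.
    for pid, path in sorted(find_profiles().items()):
        print(f"PID {pid}: {path}")
```

Files whose names do not match the pattern are ignored rather than reported as errors.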
- Checking DPU Signature
- New DPU cores have been introduced to meet various deep learning acceleration requirements across different Xilinx® FPGA devices. For example, DPU architectures B1024F, B1152F, B1600F, B2304F, and B4096F are available. Each DPU architecture can implement a different version of the DPU instruction set (named as a DPU target version) to support the rapid improvements in deep learning algorithms.
- The DPU signature is the specification information of a specific DPU architecture version, covering the target version, working frequency, number of DPU cores, hardened acceleration modules (such as softmax), and so on. Use the -w option to check the DPU signature. The following figure shows a screen capture of a sample run of dexplorer -w.
- For a configurable DPU, DExplorer can display all configuration
parameters of the DPU signature, as shown in the following figure.
Figure 4. Sample DPU Signature with Configuration Parameters
- DDump is a utility that dumps the information encapsulated inside a DPU ELF file, hybrid executable, or DPU shared library, which helps users analyze and debug various issues. Refer to DPU Shared Library for more details.
- DDump is available in both the runtime container vitis-ai-docker-runtime and
on Vitis AI evaluation boards. Usage information is shown in the figure below.
In the runtime container, it is accessible from the path
/opt/vitis-ai/utility/ddump. On evaluation boards, it is installed under the
Linux system path and can be used directly.
Figure 5. DDump Usage Options
- Check DPU Kernel Info
- DDump can dump the following information for each DPU kernel from DPU ELF
file, hybrid executable, or DPU shared library.
- The mode in which the VAI_C compiler compiled the DPU kernel: NORMAL or DEBUG.
- Code Size
- The size of the DPU instruction code for the DPU kernel, in MB, KB, or bytes.
- Param Size
- The size of the parameters for the DPU kernel, including weights and biases, in MB, KB, or bytes.
- Workload MACs
- The computation workload of the DPU kernel, in MOPS.
- IO Memory Space
- The DPU memory space required for the intermediate feature maps, in MB, KB, or bytes. For each created DPU task, N2Cube automatically allocates a DPU memory buffer for the intermediate feature maps.
- Mean Value
- The mean values for DPU kernel.
- Node Count
- The total number of DPU nodes for DPU kernel.
- Tensor Count
- The total number of DPU tensors for DPU kernel.
- Tensor In(H*W*C)
- The DPU input tensor list and their shape information in the format of height*width*channel.
- Tensor Out(H*W*C)
- The DPU output tensor list and their shape information in the format of height*width*channel.
- The following figure shows the DPU kernel information for the ResNet50 DPU
ELF file, produced by running ddump -f dpu_resnet50_0.elf -k.
Figure 6. DDump DPU Kernel Information for ResNet50
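The Tensor In and Tensor Out fields report shapes in the height*width*channel format. As a minimal sketch of how such a string could be interpreted in a script, the helper below splits one shape and counts its elements. The function names are hypothetical, and the shape 224*224*3 is only an illustrative value, not output copied from DDump.

```python
from math import prod


def parse_shape(text):
    """Parse a 'height*width*channel' string such as '224*224*3'."""
    parts = text.strip().split("*")
    if len(parts) != 3:
        raise ValueError(f"expected height*width*channel, got {text!r}")
    height, width, channel = (int(part) for part in parts)
    return height, width, channel


def element_count(shape):
    """Number of elements in a tensor with the given (H, W, C) shape."""
    return prod(shape)


if __name__ == "__main__":
    shape = parse_shape("224*224*3")  # illustrative shape only
    print(shape, element_count(shape))
```

The element count says nothing about the size in bytes, which depends on the data type the DPU uses for the tensor.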
- Check DPU Arch Info
- The VAI_C compiler automatically wraps the DPU configuration information from the DPU DCF into the DPU ELF file for each DPU kernel. VAI_C then generates the appropriate DPU instructions according to the DPU configuration parameters. Refer to Zynq DPU v3.1 IP Product Guide (PG338) for more details about the configurable DPU.
- DDump can dump out the following DPU architecture information:
- DPU Target Ver
- The version of DPU instruction set.
- DPU Arch Type
- The type of DPU architecture, such as B512, B800, B1024, B1152, B1600, B2304, B3136, and B4096.
- RAM Usage
- Low or high RAM usage.
- Whether the DepthwiseConv engine is enabled.
- Whether the operator pattern of DepthwiseConv followed by Relu6 is enabled.
- Whether the operator pattern of Conv followed by Leakyrelu is enabled.
- Whether the operator pattern of Conv followed by Relu6 is enabled.
- Channel Augmentation
- An optional feature that improves DPU computation efficiency along the channel dimension, especially for layers whose input channel count is much smaller than the DPU channel parallelism.
- Average Pool
- The average pool engine, enabled or not.
- DPU architecture information may vary with the version of the DPU IP. The following figure shows one set of DPU architecture information used by VAI_C to compile the ResNet50 model, obtained by running ddump -f dpu_resnet50_0.elf -d.
Figure 7. DDump DPU Arch Information for ResNet50
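The situation that channel augmentation targets, a layer with far fewer input channels than the DPU channel parallelism, can be illustrated with a rough model. The sketch below is not how the DPU or DDump computes anything. It assumes a hypothetical engine that consumes a fixed number of input channels per step (channel_parallelism, set to 16 here purely for illustration) and estimates the fraction of those channel lanes that a layer keeps busy.

```python
from math import ceil


def lane_utilization(input_channels, channel_parallelism=16):
    """Fraction of channel lanes doing useful work in a simplified model.

    Assumption (not taken from the DPU documentation): channels are
    processed in groups of `channel_parallelism`, and a partly filled
    group still costs a full step.
    """
    if input_channels <= 0 or channel_parallelism <= 0:
        raise ValueError("both arguments must be positive")
    steps = ceil(input_channels / channel_parallelism)
    return input_channels / (steps * channel_parallelism)


if __name__ == "__main__":
    for channels in (3, 16, 64):
        print(channels, lane_utilization(channels))
```

Under these assumptions, a 3-channel input layer fills only 3 of 16 lanes (0.1875), while layers whose channel count is a multiple of 16 fill all of them. Layers of the first kind are where an optimization along the channel dimension has the most room to help.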
- Check VAI_C Info
- VAI_C version information is automatically embedded into DPU ELF file while compiling network model. DDump can help to dump out this VAI_C version information, which users can provide to the Xilinx AI support team for debugging purposes.
- The following figure shows the VAI_C information for the ResNet50 model,
obtained by running ddump -f dpu_resnet50_0.elf -c.
Figure 8. DDump VAI_C Info for ResNet50
- Legacy Support
- DDump can also dump information from legacy DPU ELF files, hybrid executables, and DPU shared libraries. The main difference is that legacy files contain no detailed DPU architecture information.
- The following figure shows an example of dumping all of the information for a
legacy ResNet50 DPU ELF file by running ddump -f dpu_resnet50_0.elf -a.
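The DDump invocations quoted in this section differ only in their last option, so scripts that call DDump repeatedly can build the command line from a small table. The following is a minimal sketch: the function name ddump_command and the keys of the table are hypothetical, and only the options that appear in this section (-f, -k, -d, -c, -a) are used.

```python
import shlex

# Options quoted in this section; any other ddump options are not covered.
DDUMP_OPTIONS = {
    "kernel": "-k",  # DPU kernel information
    "arch": "-d",    # DPU architecture information
    "vai_c": "-c",   # VAI_C version information
    "all": "-a",     # all of the information
}


def ddump_command(elf_path, what="all", ddump="ddump"):
    """Build the argument list for one ddump run (nothing is executed)."""
    if what not in DDUMP_OPTIONS:
        raise ValueError(f"unknown selection {what!r}")
    return [ddump, "-f", elf_path, DDUMP_OPTIONS[what]]


if __name__ == "__main__":
    # Print the command line for the kernel-information example.
    print(shlex.join(ddump_command("dpu_resnet50_0.elf", "kernel")))
```

In the runtime container, the ddump argument can be set to /opt/vitis-ai/utility/ddump. The returned list can be passed to subprocess.run to execute the command.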
- DLet is a host tool designed to parse and extract various edge DPU
configuration parameters from the DPU hardware handoff (HWH) file generated by
Vivado. The following figure shows the usage information of DLet.
Figure 9. DLet Usage Options
- For a Vivado project, the DPU HWH is located under the
following directory by default.
<prj_name> is the Vivado project name, and
<bd_name> is the Vivado block design name.
- When you run dlet -f <bd_name>.hwh, DLet outputs the DPU configuration file (DCF), named in the format dpu-dd-mm-yyyy-hh-mm.dcf, where dd-mm-yyyy-hh-mm is the timestamp of when the DPU HWH was created. With the specified DCF file, the VAI_C compiler automatically produces DPU code instructions suited to the DPU configuration parameters.
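Because the DCF file name encodes the HWH creation time, a script can recover that time from the name alone, for example to pick the most recent DCF in a folder. The following is a minimal sketch. The function name dcf_timestamp and the sample file name are hypothetical, and the sketch assumes zero-padded fields with hh on a 24-hour clock, which the text does not state explicitly.

```python
import re
from datetime import datetime

# Name format from the text: dpu-dd-mm-yyyy-hh-mm.dcf
# Assumption: all fields are zero-padded and hh is a 24-hour value.
DCF_NAME = re.compile(r"^dpu-(\d{2}-\d{2}-\d{4}-\d{2}-\d{2})\.dcf$")


def dcf_timestamp(filename):
    """Return the timestamp encoded in a DCF file name as a datetime."""
    match = DCF_NAME.match(filename)
    if not match:
        raise ValueError(f"not a dpu-dd-mm-yyyy-hh-mm.dcf name: {filename!r}")
    return datetime.strptime(match.group(1), "%d-%m-%Y-%H-%M")


if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    print(dcf_timestamp("dpu-05-11-2019-14-30.dcf"))
```

Sorting DCF files by this value rather than by name matters because the day-first format does not sort chronologically as plain text.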