Debugging - 1.4 English

After a model is deployed on the edge DPU, the running results may not be as expected, typically showing lower accuracy. In this situation, first check the model's accuracy after quantization by the Vitis AI quantizer. If that accuracy is acceptable, two suspects remain to be debugged. One is the deployment source code, which should be checked very carefully. The other is DPU execution itself. This section describes how to debug the DPU running result. Normally, this involves the following five steps.

1. Run the Vitis AI quantizer to generate the golden baseline from the quantized model.
2. Use the Vitis AI compiler to build the model as a debug mode DPU kernel, with the option --dump fused_graph_info specified.
3. Before launching the DPU application, run the command dexplorer -m debug to switch the N2Cube runtime into debug mode, or call dpuEnableTaskDebug() to enable debug mode for one specific DPU task only (other tasks are not affected).
4. Run the DPU application and get the raw dump data for each node of the DPU task.
5. Compare the DPU raw dump data with the golden baseline from the quantizer.

The DNNDK sample debugging is delivered with the Vitis AI package to demonstrate how to debug the DPU. The sample deploys the TensorFlow Inception-v1 model and contains two sub-folders: decent_golden and dpu_deployment. The folder decent_golden holds all the files required to generate the golden baseline, together with the evaluation version of the model, quantize_eval_model.pb, generated by the quantizer (the deployable version of the model cannot be used). Run the script decent_dump_golden.sh to dump the golden baseline for the input image decent_golden/dataset/images/cropped_224x224.jpg and save it into the folder decent_golden/dump_golden/dump_results_0/.

For a Caffe model, use the following command to generate the golden baseline from the quantized model. When it completes, the golden results are dumped into the folder dump_gpu by default.

DECENT_DEBUG=5 vai_q_caffe test -model quantize_model/quantize_train_test.prototxt \
                                  -weights quantize_model/quantize_train_test.caffemodel \
                                  -test_iter 1 \
                                  2>&1 | tee ./log/dump.log

When the Inception-v1 model is compiled with the option --dump fused_graph_info passed to the Vitis AI compiler, a file named fused_graph_kernel_0.txt is produced along with the DPU kernel dpu_tf_inception_v1_0. The folder dpu_deployment holds the deployment source code for Inception-v1, in which dpuEnableTaskDump() is used to enable DPU raw data dumping. The source file main.cc deliberately omits pre-processing and post-processing for the Inception-v1 model, which helps isolate the effects of deployment code while debugging the DPU.

The file fused_graph_kernel_0.txt describes the mapping between each DPU node (or super-layer), which may contain several fused layers or operators, and the layers or operators of the quantized model. For each DPU node, the mapped layers or operators are divided into two types, in and out. For a Caffe model, the layer names are identical to those in the original floating-point model. For a TensorFlow model, the operator names differ slightly from those in the original floating-point model because the Vitis AI quantizer fuses some operators. Given the name of a layer or operator in the quantized model, you can locate its corresponding dump files in the quantizer golden baseline.

For the kernel dpu_tf_inception_v1_0.elf of the TensorFlow Inception-v1 model, the mapping information for its input node input and its output node InceptionV1_Logits_Conv2d_0c_1x1_Conv2D is shown below. For the input node input, the out operator is input. For the output node InceptionV1_Logits_Conv2d_0c_1x1_Conv2D, the out operator is InceptionV1_Logits_Conv2d_0c_1x1_Conv2D.

input :  
{
in(0): null
out(0): input
};

InceptionV1_Logits_Conv2d_0c_1x1_Conv2D :  
{
in(0): InceptionV1_Logits_AvgPool_0a_7x7_AvgPool
out(0): InceptionV1_Logits_Conv2d_0c_1x1_Conv2D
};
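The mapping entries above are regular enough to read programmatically. The following is a minimal sketch, assuming the whole file follows the same layout as this excerpt (a node name followed by a colon, then in(N)/out(N) lines between braces) and that null means no operator. The function name parse_fused_graph_info is hypothetical and not part of Vitis AI.

```python
import re

# Sample text copied from the two mapping entries shown in this section.
SAMPLE = """\
input :
{
in(0): null
out(0): input
};

InceptionV1_Logits_Conv2d_0c_1x1_Conv2D :
{
in(0): InceptionV1_Logits_AvgPool_0a_7x7_AvgPool
out(0): InceptionV1_Logits_Conv2d_0c_1x1_Conv2D
};
"""


def parse_fused_graph_info(text):
    """Map each DPU node name to its lists of 'in' and 'out' operators.

    Assumption: the file layout matches the excerpt in this section.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "{":
            continue
        if line.startswith("}"):
            current = None  # end of the current node's block
            continue
        entry = re.match(r"^(in|out)\((\d+)\)\s*:\s*(\S+)$", line)
        if entry and current is not None:
            kind, _index, name = entry.groups()
            if name != "null":  # assumption: 'null' means no operator
                nodes[current][kind].append(name)
            continue
        header = re.match(r"^(\S+)\s*:$", line)
        if header:
            current = header.group(1)
            nodes[current] = {"in": [], "out": []}
    return nodes


mapping = parse_fused_graph_info(SAMPLE)
for node, ops in mapping.items():
    print(node, "-> out operators:", ops["out"])
```

The out operator names returned for a node are the names to look for among the quantizer golden baseline dump files.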

For the out type operator input, the corresponding text format dump file from the Vitis AI quantizer is input_aquant_int8.txt (_aquant_int8 is the added suffix), which can be found in decent_golden/dump_golden/dump_results_0/. Feed the Int8 input data from input_aquant_int8.txt into the DPU input node input. After you compile and run this DPU application, the raw data for each DPU node is dumped into a folder such as dump_2134 (the number 2134 is the process ID).

For the last DPU node, InceptionV1_Logits_Conv2d_0c_1x1_Conv2D, the DPU Int8 running result is in the file tf_inception_v1_0_InceptionV1_Logits_Conv2d_0c_1x1_Conv2D_out0.bin (the prefix tf_inception_v1_0_ is the kernel name, and the suffix out0 indicates that it is the first output tensor of this DPU node). Use the node's out type operator, InceptionV1_Logits_Conv2d_0c_1x1_Conv2D, to find the golden output file from the quantizer. Because the quantizer may fuse operators while quantizing a TensorFlow model, the golden file name may not match exactly. For the Inception-v1 model, look for the similarly named dump file InceptionV1_Logits_Conv2d_0c_1x1_BiasAdd_aquant_int8.bin (Conv2d and BiasAdd are two adjacent operators within the model, and _aquant_int8 is the added suffix).

Lastly, check whether the DPU output in tf_inception_v1_0_InceptionV1_Logits_Conv2d_0c_1x1_Conv2D_out0.bin and the quantizer output in InceptionV1_Logits_Conv2d_0c_1x1_BiasAdd_aquant_int8.bin are equal. If they are the same, Inception-v1 runs on the DPU as expected. Otherwise, there may be an issue related to DPU execution. Contact Xilinx and report the bug.
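The final comparison can be scripted. The following is a minimal sketch, assuming both .bin files are headerless raw dumps with one signed 8-bit value per byte and the same element order. The function name compare_int8_dumps is hypothetical and not part of Vitis AI or DNNDK.

```python
from array import array


def compare_int8_dumps(dpu_path, golden_path):
    """Compare two raw Int8 dump files element by element.

    Returns (equal, mismatch_count, first_mismatch), where first_mismatch
    is None or a tuple (index, dpu_value, golden_value).
    Assumption: both files hold raw signed 8-bit values with no header.
    """
    with open(dpu_path, "rb") as f:
        dpu = array("b", f.read())      # 'b' = signed char, 1 byte each
    with open(golden_path, "rb") as f:
        golden = array("b", f.read())

    if len(dpu) != len(golden):
        raise ValueError(
            "size mismatch: %d vs %d elements" % (len(dpu), len(golden))
        )

    mismatches = [i for i, (a, b) in enumerate(zip(dpu, golden)) if a != b]
    first = None
    if mismatches:
        i = mismatches[0]
        first = (i, dpu[i], golden[i])
    return (not mismatches, len(mismatches), first)


# Example call for the sample in this section (dump_2134 depends on the
# process ID of your own run):
# compare_int8_dumps(
#     "dump_2134/tf_inception_v1_0_InceptionV1_Logits_Conv2d_0c_1x1_Conv2D_out0.bin",
#     "decent_golden/dump_golden/dump_results_0/"
#     "InceptionV1_Logits_Conv2d_0c_1x1_BiasAdd_aquant_int8.bin",
# )
```

Reporting the index and values of the first mismatch, rather than only a pass/fail flag, gives a concrete data point to include when reporting an issue.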