ZCU102 Evaluation Kit - 2.5 English

Vitis AI Library User Guide (UG1354)

Document ID
UG1354
Release Date
2022-06-15
Version
2.5 English

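The ZCU102 evaluation kit uses the mid-range ZU9 UltraScale+™ device. The kit exists in two hardware versions: one whose serial number starts with 0432055-04 and one whose serial number starts with 0432055-05. Because the two versions differ in DDR memory performance, their Vitis AI Library performance differs as well. The 0432055-04 version of the ZCU102 has been discontinued, so the following table shows only the performance of the ZCU102 (0432055-05) evaluation kit. On the ZCU102 evaluation kit, three B4096 DPU cores are implemented in the programmable logic and deliver 3.45 TOPS of peak INT8 performance for deep learning inference acceleration.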

The following table lists the throughput (in frames per second, fps) of various neural network samples on the ZCU102 (0432055-05) with the DPU running at 281 MHz.
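As a quick consistency check, the 3.45 TOPS peak figure follows from the core count, the DPU configuration, and the 281 MHz clock. The sketch below assumes that B4096 denotes 4096 operations per clock cycle per core, which this section does not state. The name dpu_peak_gops is a hypothetical helper, not part of the Vitis AI Library.

```shell
# Peak INT8 throughput = cores x operations per cycle per core x clock frequency.
# Assumption: "B4096" means 4096 operations per DPU clock cycle per core.
dpu_peak_gops() {
  # $1 = number of DPU cores
  # $2 = operations per cycle per core
  # $3 = DPU clock in MHz
  # cores * ops/cycle * MHz yields mega-operations per second;
  # integer division by 1000 converts that to GOPS (truncated).
  echo $(( $1 * $2 * $3 / 1000 ))
}

dpu_peak_gops 3 4096 281   # prints 3452, i.e. about 3.45 TOPS
```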

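Note: The DPU on the ZCU102 has a hardware softmax acceleration module. Because of the limitations of the hardware softmax module, software softmax is faster when the number of classes reaches 1000. Set XLNX_ENABLE_C_SOFTMAX=1 to enable software softmax (softmax_c). The default value of XLNX_ENABLE_C_SOFTMAX is 0, which means that the softmax method is selected according to the following priority:
  1. Neon acceleration
  2. Hardware softmax
  3. Software softmax_c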

For the ZCU102, use the following command to test classification performance.

env XLNX_ENABLE_C_SOFTMAX=1 ./test_performance_classification resnet50 test_performance_classification.list -t 8 -s 60
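
To see how the softmax setting affects a given model, the same benchmark can be run once for each value of XLNX_ENABLE_C_SOFTMAX. The following is a minimal sketch: compare_softmax is a hypothetical helper name, not part of the Vitis AI Library, and it simply forwards its arguments as the command to run. In the Vitis AI Library performance tests, -t sets the number of threads and -s the run time in seconds.

```shell
# Run one command twice: with the default softmax selection (0)
# and with software softmax forced on (1).
compare_softmax() {
  for mode in 0 1; do
    echo "XLNX_ENABLE_C_SOFTMAX=$mode"
    # "$@" is the benchmark command line, passed through unchanged.
    env XLNX_ENABLE_C_SOFTMAX="$mode" "$@"
  done
}

# Example invocation on the board (not executed here):
# compare_softmax ./test_performance_classification resnet50 \
#     test_performance_classification.list -t 8 -s 60
```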
Table 1. ZCU102 (0432055-05) Performance
No. Neural Network Input Size GOPS Performance (fps) (Single Thread) Performance (fps) (Multiple Threads)
1 bcc_pt 800x1000 268.9 3.3 10.9
2 c2d2_lite 512x512 6.86 2.8 5.2
3 centerpoint 2560x40x4 54 16 47.9
4 chen_color_resnet18_pt 224x224 3.627 204.7 499.9
5 clocs 12000x100x4 41 2.8 10.1
6 densebox_320_320 320x320 0.49 501.6 1750.8
7 densebox_640_360 360x640 1.1 249.4 864.5
8 drunet_pt 528x608 2.59 60.8 189.3
9 efficientdet_d2_tf 768x768 11.06 3.1 6
10 efficientnet-b0_tf2 224x224 0.36 78 152.6
11 efficientNet-edgetpu-L_tf 300x300 19.36 35 90.2
12 efficientNet-edgetpu-M_tf 240x240 7.34 79.9 205.2
13 efficientNet-edgetpu-S_tf 224x224 4.72 115 308.1
14 ENet_cityscapes_pt 512x1024 8.6 10.1 36.3
15 face_landmark 96x72 0.14 959.8 1601.8
16 face_mask_detection_pt 512x512 0.593 115.7 400.8
17 face-quality 80x60 0.06 3038.2 8824.5
18 face-quality_pt 80x60 0.06 2981.6 8775.5
19 facerec_resnet20 112x96 3.5 168.4 338
20 facerec_resnet64 112x96 11 73.3 183.4
21 facerec-resnet20_mixed_pt 112x96 3.5 168.8 338.3
22 facereid-large_pt 96x96 0.5 941.5 2276.7
23 facereid-small_pt 80x80 0.09 2243.1 6336.8
24 fadnet 576x960 441 1.2 1.6
25 fadnet_pruned 576x960 154 1.8 2.7
26 FairMot_pt 640x480 36 22.6 66
27 fpn 256x512 8.9 34.7 149.8
28 FPN_Res18_Medical_segmentation 320x320 45.3 12.6 47.3
29 FPN-resnet18_covid19-seg_pt 352x352 22.7 37.1 107.2
30 FPN-resnet18_Endov 240x320 13.75 37.4 155.8
31 HardNet_MSeg_pt 352x352 22.78 24.7 56.8
32 hfnet_tf 960x960 20.09 3.6 15.9
33 hourglass-pe_mpii 256x256 10.2 18.8 70.6
34 inception_resnet_v2_tf 299x299 26.4 23.9 52
35 inception_v1 224x224 3.2 187.6 471.6
36 inception_v1_tf 224x224 3 191.3 470.6
37 inception_v2 224x224 4 136.3 303.9
38 inception_v2_tf 224x224 3.88 93 229.8
39 inception_v3 299x299 11.4 59.9 135.8
40 inception_v3_pt 299x299 5.7 59.8 136
41 inception_v3_tf 299x299 11.5 59.8 135.3
42 inception_v3_tf2 299x299 11.5 59.2 136.4
43 inception_v4 299x299 24.5 28.8 68.5
44 inception_v4_2016_09_09_tf 299x299 24.6 28.8 68.6
45 medical_seg_cell_tf2 128x128 5.3 154.3 393.1
46 MLPerf_resnet50_v1.5_tf 224x224 8.19 78.9 173.6
47 mlperf_ssd_resnet34_tf 1200x1200 433 1.9 7.1
48 mobilenet_1_0_224_tf2 224x224 1.1 321.5 939.6
49 mobilenet_edge_0_75_tf 224x224 0.62 262.6 720.5
50 mobilenet_edge_1_0_tf 224x224 0.99 214.7 554.8
51 mobilenet_v1_0_25_128_tf 128x128 0.027 1331.1 4759.2
52 mobilenet_v1_0_5_160_tf 160x160 0.15 914.6 3130.3
53 mobilenet_v1_1_0_224_tf 224x224 1.1 326.8 951.7
54 mobilenet_v2 224x224 0.6 274.9 745.5
55 mobilenet_v2_1_0_224_tf 224x224 0.6 267.9 707.5
56 mobilenet_v2_1_4_224_tf 224x224 1.2 191.1 473
57 mobilenet_v2_cityscapes_tf 1024x2048 132.74 1.7 5.3
58 mobilenet_v3_small_1_0_tf2 224x224 0.132 338.1 966.3
59 movenet_ntd_pt 192x192 0.5 94.1 390.9
60 MT-resnet18_mixed_pt 512x320 13.65 32.8 103
61 multi_task 288x512 14.8 39.7 129
62 multi_task_v3_pt 320x512 25.44 17.1 61.3
63 ocr_pt 960x960 875.7 1.1 3.4
64 ofa_depthwise_res50_pt 176x176 1.25 106 369.1
65 ofa_rcan_latency_pt 360x640 45.7 17 28.1
66 ofa_resnet50_0_9B_pt 160x160 0.9 183.9 354.8
67 ofa_yolo_pruned_0_30_pt 640x640 34.71 21.5 54.5
68 ofa_yolo_pruned_0_50_pt 640x640 24.62 27.7 71.3
69 ofa_yolo_pt 640x640 48.88 16.9 42.8
70 openpose_pruned_0_3 368x368 49.9 3.8 15.1
71 person-orientation_pruned_558m_pt 224x112 0.558 661.1 1428.5
72 personreid-res18_pt 176x80 1.1 370.6 690.7
73 personreid-res50_pt 256x128 5.4 107.5 237.7
74 plate_detect 320x320 0.49 628.3 2242.2
75 plate_num 96x288 1.75 189.2 548.2
76 pmg_pt 224x224 2.28 151.7 366.9
77 pointpainting_nuscenes_pt 40000x64x16 112 1.3 4.2
78 pointpillars_kitti_pt 12000x100x4 10.8 19.6 49.3
79 pointpillars_nuscenes_pt 40000x64x5 108 2.2 9.5
80 rcan_pruned_tf 360x640 86.95 8.6 18
81 refinedet_baseline 480x360 123 8.6 24.8
82 refinedet_pruned_0_8 360x480 25 33 98.4
83 refinedet_pruned_0_92 360x480 10.1 65.6 199.5
84 refinedet_pruned_0_96 360x480 5.1 92.2 282.9
85 refinedet_VOC_tf 320x320 81.9 11.3 34.5
86 RefineDet-Medical_EDD_tf 320x320 9.8 67.6 229.4
87 reid 80x160 0.95 371 699.3
88 resnet_v1_101_tf 224x224 14.4 46.4 110.7
89 resnet_v1_152_tf 224x224 21.8 31.7 77.3
90 resnet_v1_50_tf 224x224 7 87.9 191.1
91 resnet_v2_101_tf 299x299 26.78 23.6 55.6
92 resnet_v2_152_tf 299x299 40.47 16.1 38.2
93 resnet_v2_50_tf 299x299 13.1 45 99.7
94 resnet18 224x224 3.7 196.9 488.4
95 resnet50 224x224 7.7 88.4 193.8
96 resnet50_pt 224x224 4.1 78.3 173.4
97 resnet50_tf2 224x224 7.7 87.3 192.9
98 retinaface 360x640 1.11 140 567.6
99 SA_gate_base_pt 360x360 178 3.3 9.5
100 salsanext_pt 64x2048 20.4 5.6 21.3
101 salsanext_v2_pt 64x2048 32 4.1 11
102 semantic_seg_citys_tf2 512x1024 54 7.4 23.9
103 SemanticFPN_cityscapes_pt 256x512 10 35.5 161.9
104 SemanticFPN_Mobilenetv2_pt 512x1024 5.4 10.5 52.5
105 SESR_S_pt 360x640 7.48 88.2 140.6
106 solo_pt 640x640 107 1.4 4.8
107 sp_net 128x224 0.55 583.4 1632.8
108 squeezenet 227x227 0.76 543.5 1435.5
109 squeezenet_pt 224x224 0.82 575.4 1498.6
110 ssd_adas_pruned_0_95 360x480 6.3 91.7 296.4
111 ssd_inception_v2_coco_tf 300x300 9.6 39.4 102.2
112 ssd_mobilenet_v1_coco_tf 300x300 2.5 110.9 331.1
113 ssd_mobilenet_v2 360x480 6.6 40.7 117.5
114 ssd_mobilenet_v2_coco_tf 300x300 3.8 81.3 213.3
115 ssd_pedestrian_pruned_0_97 360x360 5.9 80.7 278
116 ssd_resnet_50_fpn_coco_tf 640x640 178.4 2.9 5.2
117 ssd_traffic_pruned_0_9 360x480 11.6 56.4 199.8
118 ssdlite_mobilenet_v2_coco_tf 300x300 1.5 105.6 305.7
119 ssr_pt 256x256 39.72 6 14.4
120 superpoint_tf 480x640 52.4 12.6 53.8
121 textmountain_pt 960x960 575.2 1.7 4.7
122 tiny_yolov3_vmss 416x416 5.46 125.9 393.9
123 tsd_yolox_pt 640x640 73 13.1 33.7
124 ultrafast_pt 288x800 8.4 35.1 95.9
125 unet_chaos-CT_pt 512x512 23.3 22.8 69.7
126 vehicle_make_resnet18_pt 224x224 3.627 203.4 498.6
127 vehicle_type_resnet18_pt 224x224 3.627 205 500.6
128 vgg_16_tf 224x224 31 20.2 40.9
129 vgg_19_tf 224x224 39.3 17.4 36.5
130 vpgnet_pruned_0_99 480x640 2.5 104.8 354.3
131 yolov2_voc 448x448 34 26.6 69.2
132 yolov2_voc_pruned_0_66 448x448 11.6 65.5 191.8
133 yolov2_voc_pruned_0_71 448x448 9.9 75.3 224.1
134 yolov2_voc_pruned_0_77 448x448 7.8 89.2 269.4
135 yolov3_adas_pruned_0_9 256x512 5.5 98.4 271.8
136 yolov3_bdd 288x512 53.7 12.9 33.9
137 yolov3_coco_416_tf2 416x416 65.9 13.2 34.9
138 yolov3_voc 416x416 65.4 13.4 34.9
139 yolov3_voc_tf 416x416 65.6 13.6 35.1
140 yolov4_leaky_416_tf 416x416 60.3 13.5 34
141 yolov4_leaky_512_tf 512x512 91.2 10.2 25.3
142 yolov4_leaky_spp_m 416x416 60.1 13.7 34.3
143 yolov4_leaky_spp_m_pruned_0_36 416x416 38.2 18.9 46.6
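
When reading the table, it can help to express the multi-thread result as a ratio of the single-thread result. The sketch below is only an illustration of that arithmetic on two rows of the table; thread_speedup is a hypothetical helper name, and the ratio is a derived figure, not a value published in the table.

```shell
# Ratio of multi-thread fps to single-thread fps, rounded to two decimals.
thread_speedup() {
  # $1 = single-thread fps, $2 = multi-thread fps
  # LC_ALL=C keeps the decimal separator a period regardless of locale.
  LC_ALL=C awk -v single="$1" -v multi="$2" \
    'BEGIN { printf "%.2f\n", multi / single }'
}

thread_speedup 88.4 193.8     # resnet50 (row 95): prints 2.19
thread_speedup 501.6 1750.8   # densebox_320_320 (row 6): prints 3.49
```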