Alveo U50/U50LV Data Center Accelerator Card

Vitis AI Library User Guide (UG1354)

Document ID: UG1354
Release Date: 2021-12-11
Version: 1.4.1 English

The Xilinx® Alveo U50 Data Center accelerator cards are Peripheral Component Interconnect Express (PCIe®) Gen3x16 compliant and Gen4x8 compatible cards built on Xilinx 16 nm UltraScale+ technology. In this release, the DPU is implemented in programmable logic for deep learning inference acceleration.

Note: Some models cannot run at the highest DPU frequency and require a reduced DPU frequency. See "For Edge" for the DPU frequency reduction procedure.
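Several entries in the DPU Frequency (MHz) columns below are written as a nominal clock followed by a factor, such as 300x0.9 or 275x0.8. Reading these, as an assumption, as the nominal frequency scaled by that factor, 300x0.9 corresponds to 270 MHz. The following Python sketch converts a table entry to MHz under that reading; `effective_dpu_mhz` is a hypothetical helper name, not part of any Vitis AI API.

```python
def effective_dpu_mhz(entry: str) -> float:
    """Return the DPU clock in MHz for a table entry such as "300" or "300x0.9".

    Assumption: "NxF" means the nominal frequency N scaled by the factor F.
    """
    nominal, sep, factor = entry.strip().partition("x")
    mhz = float(nominal)
    if sep:  # a reduction factor follows the "x"
        mhz *= float(factor)
    # Round away floating-point noise from the multiplication.
    return round(mhz, 3)


if __name__ == "__main__":
    for entry in ("300", "300x0.9", "275x0.8", "275x0.7"):
        print(f"{entry:>8} -> {effective_dpu_mhz(entry)} MHz")
```

Under the same reading, 275x0.7, listed for openpose_pruned_0_3 and refinedet_baseline on the U50LV, corresponds to 192.5 MHz.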

U50 Performance with 6E@300 MHz DPUCAHX8H

The following table shows the throughput performance (in frames/sec or fps) for various neural network samples on U50 Gen3x4 with DPUCAHX8H running at 6E@300 MHz.
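For scripted analysis, each data row in the tables of this section has six fields: index, model name, input size, GOPS, DPU frequency, and multi-thread fps. The following Python sketch parses one data row into a record. It accepts either a pipe-separated or a plain space-separated row, and it assumes that a frequency written as NxF means N MHz scaled by F. The names `PerfRow` and `parse_row` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PerfRow:
    index: int
    model: str
    input_size: Tuple[int, int]  # both dimensions in the order listed, e.g. 360x640
    gops: float                  # value of the GOPS column
    dpu_mhz: float               # effective clock, with any "xF" factor applied
    fps: float                   # multi-thread throughput


def parse_row(line: str) -> PerfRow:
    """Parse one data row (not the header) in pipe- or space-separated form."""
    if "|" in line:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
    else:
        cells = line.split()
    if len(cells) != 6:
        raise ValueError(f"expected 6 columns, got {len(cells)}: {line!r}")
    index, model, size, gops, freq, fps = cells
    dim_a, dim_b = (int(d) for d in size.split("x"))
    nominal, _, factor = freq.partition("x")
    mhz = float(nominal) * (float(factor) if factor else 1.0)
    return PerfRow(int(index), model, (dim_a, dim_b), float(gops), round(mhz, 3), float(fps))


print(parse_row("3 ENet_cityscapes_pt 512x1024 8.6 300x0.9 69.5"))
```

Header and separator rows are rejected or must be skipped by the caller, since they do not contain six numeric-style fields.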

Table 1. U50 Performance with 6E@300 MHz DPUCAHX8H

| No. | Neural Network | Input Size | GOPS | DPU Frequency (MHz) | Performance (fps, multiple threads) |
|---|---|---|---|---|---|
| 1 | densebox_320_320 | 320x320 | 0.49 | 300 | 2231.2 |
| 2 | densebox_640_360 | 360x640 | 1.1 | 300 | 969.4 |
| 3 | ENet_cityscapes_pt | 512x1024 | 8.6 | 300x0.9 | 69.5 |
| 4 | face_landmark | 96x72 | 0.14 | 300 | 10035.1 |
| 5 | face-quality | 80x60 | 0.06 | 300x0.9 | 21786.1 |
| 6 | face-quality_pt | 80x60 | 0.06 | 300x0.9 | 21835.0 |
| 7 | facerec_resnet20 | 112x96 | 3.5 | 300 | 1330.5 |
| 8 | facerec-resnet20_mixed_pt | 112x96 | 3.5 | 300x0.9 | 1213.5 |
| 9 | facerec_resnet64 | 112x96 | 11 | 300 | 491.1 |
| 10 | facereid-large_pt | 96x96 | 0.5 | 300x0.9 | 7096.1 |
| 11 | facereid-small_pt | 80x80 | 0.09 | 300x0.9 | 19550.9 |
| 12 | fpn | 256x512 | 8.9 | 300 | 439.2 |
| 13 | FPN_Res18_Medical_segmentation | 320x320 | 45.3 | 300 | 101.5 |
| 14 | FPN-resnet18_covid19-seg_pt | 352x352 | 22.7 | 300x0.9 | 205.1 |
| 15 | inception_resnet_v2_tf | 299x299 | 26.4 | 300 | 172.1 |
| 16 | inception_v1 | 224x224 | 3.2 | 300 | 1232.8 |
| 17 | inception_v1_tf | 224x224 | 3 | 300 | 1252.0 |
| 18 | inception_v2 | 224x224 | 4 | 300 | 972.4 |
| 19 | inception_v3 | 299x299 | 11.4 | 300 | 406.5 |
| 20 | inception_v3_pt | 299x299 | 5.7 | 300 | 406.6 |
| 21 | inception_v3_tf | 299x299 | 11.5 | 300 | 406.9 |
| 22 | inception_v3_tf2 | 299x299 | 11.5 | 300x0.9 | 372.0 |
| 23 | inception_v4 | 299x299 | 24.5 | 300 | 185.0 |
| 24 | inception_v4_2016_09_09_tf | 299x299 | 24.6 | 300 | 184.8 |
| 25 | medical_seg_cell_tf2 | 128x128 | 5.3 | 300x0.9 | 1051.1 |
| 26 | MLPerf_resnet50_v1.5_tf | 224x224 | 8.19 | 300x0.9 | 500.2 |
| 27 | mlperf_ssd_resnet34_tf | 1200x1200 | 433 | 300x0.9 | 13.8 |
| 28 | multi_task | 288x512 | 14.8 | 300 | 336.4 |
| 29 | openpose_pruned_0_3 | 368x368 | 49.9 | 300x0.9 | 29.1 |
| 30 | personreid-res18_pt | 176x80 | 1.1 | 300x0.9 | 3404.4 |
| 31 | personreid-res50_pt | 256x128 | 5.4 | 300x0.9 | 805.8 |
| 32 | plate_detection | 320x320 | 0.49 | 300 | 5701.5 |
| 33 | plate_num | 96x288 | 1.75 | 300x0.9 | 1184.2 |
| 34 | pmg_pt | 224x224 | 2.28 | 300 | 1090.0 |
| 35 | refinedet_baseline | 480x360 | 123 | 300x0.9 | 49.8 |
| 36 | RefineDet-Medical_EDD_tf | 320x320 | 9.8 | 300x0.9 | 421.0 |
| 37 | refinedet_pruned_0_8 | 360x480 | 25 | 300x0.9 | 193.0 |
| 38 | refinedet_pruned_0_92 | 360x480 | 10.1 | 300x0.9 | 404.4 |
| 39 | refinedet_pruned_0_96 | 360x480 | 5.1 | 300x0.9 | 592.6 |
| 40 | refinedet_VOC_tf | 320x320 | 81.9 | 300x0.9 | 71.2 |
| 41 | reid | 80x160 | 0.95 | 300 | 3849.5 |
| 42 | resnet18 | 224x224 | 3.7 | 300 | 1410.3 |
| 43 | resnet50 | 224x224 | 7.7 | 300 | 572.7 |
| 44 | resnet50_pt | 224x224 | 4.1 | 300 | 554.7 |
| 45 | resnet50_tf2 | 224x224 | 7.7 | 300x0.9 | 516.4 |
| 46 | resnet_v1_101_tf | 224x224 | 14.4 | 300 | 333.6 |
| 47 | resnet_v1_152_tf | 224x224 | 21.8 | 300 | 222.3 |
| 48 | resnet_v1_50_tf | 224x224 | 7 | 300 | 642.6 |
| 49 | salsanext_pt | 64x2048 | 20.4 | 300x0.9 | 143.6 |
| 50 | salsanext_v2_pt | 64x2048 | 32 | 300 | 25.1 |
| 51 | SemanticFPN_cityscapes_pt | 256x512 | 10 | 300x0.9 | 425.8 |
| 52 | semantic_seg_citys_tf2 | 512x1024 | 54 | 300x0.9 | 50.0 |
| 53 | sp_net | 128x224 | 0.55 | 300 | 2826.2 |
| 54 | squeezenet | 227x227 | 0.76 | 300 | 3619.4 |
| 55 | squeezenet_pt | 224x224 | 0.82 | 300 | 2079.3 |
| 56 | ssd_adas_pruned_0_95 | 360x480 | 6.3 | 300 | 654.5 |
| 57 | ssd_pedestrian_pruned_0_97 | 360x360 | 5.9 | 300x0.9 | 568.6 |
| 58 | ssd_resnet_50_fpn_coco_tf | 640x640 | 178.4 | 300x0.9 | 31.2 |
| 59 | ssd_traffic_pruned_0_9 | 360x480 | 11.6 | 300 | 428.9 |
| 60 | tiny_yolov3_vmss | 416x416 | 5.46 | 300x0.9 | 870.6 |
| 61 | unet_chaos-CT_pt | 512x512 | 23.3 | 300x0.9 | 60.2 |
| 62 | vgg_16_tf | 224x224 | 31 | 300 | 156.3 |
| 63 | vgg_19_tf | 224x224 | 39.3 | 300 | 131.0 |
| 64 | vpgnet_pruned_0_99 | 480x640 | 2.5 | 300x0.9 | 480.3 |
| 65 | yolov2_voc | 448x448 | 34 | 300x0.9 | 162.4 |
| 66 | yolov2_voc_pruned_0_66 | 448x448 | 11.6 | 300x0.9 | 411.3 |
| 67 | yolov2_voc_pruned_0_71 | 448x448 | 9.9 | 300x0.9 | 481.3 |
| 68 | yolov2_voc_pruned_0_77 | 448x448 | 7.8 | 300x0.9 | 585.4 |
| 69 | yolov3_adas_pruned_0_9 | 256x512 | 5.5 | 300x0.9 | 611.5 |
| 70 | yolov3_bdd | 288x512 | 53.7 | 300x0.9 | 74.9 |
| 71 | yolov3_voc | 416x416 | 65.4 | 300x0.9 | 76.7 |
| 72 | yolov3_voc_tf | 416x416 | 65.6 | 300x0.9 | 77.2 |
| 73 | yolov4_leaky_spp_m | 416x416 | 60.1 | 300x0.9 | 80.8 |
| 74 | yolov4_leaky_spp_m_pruned_0_36 | 416x416 | 38.2 | 300 | 84.8 |
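Assuming the GOPS column gives the work of one inference in giga-operations (GOP), multiplying it by the measured frame rate gives the sustained operation rate the DPU achieved on that model. The Python sketch below computes this for a few rows copied from Table 1; `achieved_gops` is a hypothetical helper name.

```python
def achieved_gops(gop_per_frame: float, fps: float) -> float:
    """Sustained throughput in GOP/s: work per frame times frames per second."""
    return gop_per_frame * fps


# (GOP per frame, fps) pairs copied from Table 1 (U50, DPUCAHX8H 6E@300 MHz).
table1 = {
    "resnet50": (7.7, 572.7),
    "vgg_16_tf": (31, 156.3),
    "squeezenet": (0.76, 3619.4),
}

for model, (gop, fps) in table1.items():
    # Divide by 1000 to report TOP/s instead of GOP/s.
    print(f"{model:12s} {achieved_gops(gop, fps) / 1000:.2f} TOP/s")
```

With these rows the result is about 4.41 TOP/s for resnet50, 4.85 TOP/s for vgg_16_tf, and 2.75 TOP/s for squeezenet. Some PyTorch (_pt) entries list a smaller GOPS value than their counterparts (for example, 4.1 for resnet50_pt versus 7.7 for resnet50), which may reflect a different operation-counting convention, so this figure is most meaningful when comparing models counted the same way.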

U50 Performance with 1E@333 MHz DPUCAHX8L

The following table shows the throughput performance (in frames/sec or fps) for various neural network samples on U50 Gen3x4 with DPUCAHX8L running at 1E@333 MHz.

Table 2. U50 Performance with 1E@333 MHz DPUCAHX8L

| No. | Neural Network | Input Size | GOPS | DPU Frequency (MHz) | Performance (fps, multiple threads) |
|---|---|---|---|---|---|
| 1 | ENet_cityscapes_pt | 512x1024 | 8.6 | 333 | 5.9 |
| 2 | face_landmark | 96x72 | 0.14 | 333 | 4988.1 |
| 3 | face-quality | 80x60 | 0.06 | 333 | 7036.6 |
| 4 | face-quality_pt | 80x60 | 0.06 | 333 | 7037.8 |
| 5 | facerec_resnet20 | 112x96 | 3.5 | 333 | 323.0 |
| 6 | facerec-resnet20_mixed_pt | 112x96 | 3.5 | 333 | 321.7 |
| 7 | facerec_resnet64 | 112x96 | 11 | 333 | 144.3 |
| 8 | facereid-small_pt | 80x80 | 0.09 | 333 | 4403.3 |
| 9 | fpn | 256x512 | 8.9 | 333 | 39.8 |
| 10 | FPN_Res18_Medical_segmentation | 320x320 | 45.3 | 333 | 14.9 |
| 11 | FPN-resnet18_covid19-seg_pt | 352x352 | 22.7 | 333 | 90.3 |
| 12 | inception_resnet_v2_tf | 299x299 | 26.4 | 333 | 35.2 |
| 13 | inception_v1 | 224x224 | 3.2 | 333 | 343.5 |
| 14 | inception_v1_tf | 224x224 | 3 | 333 | 348.3 |
| 15 | inception_v2 | 224x224 | 3.88 | 333 | 197.3 |
| 16 | inception_v3 | 299x299 | 11.4 | 333 | 107.5 |
| 17 | inception_v3_pt | 299x299 | 5.7 | 333 | 107.2 |
| 18 | inception_v3_tf | 299x299 | 11.5 | 333 | 107.8 |
| 19 | inception_v3_tf2 | 299x299 | 11.5 | 333 | 102.1 |
| 20 | inception_v4 | 299x299 | 24.5 | 333 | 57.0 |
| 21 | inception_v4_2016_09_09_tf | 299x299 | 24.6 | 333 | 57.1 |
| 22 | medical_seg_cell_tf2 | 128x128 | 5.3 | 333 | 116.6 |
| 23 | MLPerf_resnet50_v1.5_tf | 224x224 | 8.19 | 333 | 88.9 |
| 24 | mlperf_ssd_resnet34_tf | 1200x1200 | 433 | 333 | 7.0 |
| 25 | mobilenet_1_0_224_tf2 | 224x224 | 1.1 | 333 | 2104.1 |
| 26 | mobilenet_v1_0_5_160_tf | 160x160 | 0.15 | 333 | 5480.7 |
| 27 | mobilenet_v1_1_0_224_tf | 224x224 | 1.1 | 333 | 2135.2 |
| 28 | mobilenet_v2 | 224x224 | 0.6 | 333 | 1190.2 |
| 29 | mobilenet_v2_1_0_224_tf | 224x224 | 0.6 | 333 | 1186.5 |
| 30 | mobilenet_v2_1_4_224_tf | 224x224 | 1.2 | 333 | 872.4 |
| 31 | multi_task | 288x512 | 14.8 | 333 | 21.5 |
| 32 | openpose_pruned_0_3 | 368x368 | 49.9 | 333 | 17.4 |
| 33 | personreid-res50_pt | 256x128 | 5.4 | 333 | 106.4 |
| 34 | plate_detection | 320x320 | 0.49 | 333 | 1036.7 |
| 35 | rcan_pruned_tf | 360x640 | 86.95 | 333 | 4.1 |
| 36 | refinedet_baseline | 480x360 | 123 | 333 | 30.1 |
| 37 | RefineDet-Medical_EDD_tf | 320x320 | 9.8 | 333 | 152.3 |
| 38 | refinedet_pruned_0_8 | 360x480 | 25 | 333 | 67.4 |
| 39 | refinedet_pruned_0_92 | 360x480 | 10.1 | 333 | 77.4 |
| 40 | refinedet_pruned_0_96 | 360x480 | 5.1 | 333 | 94.5 |
| 41 | refinedet_VOC_tf | 320x320 | 81.9 | 333 | 48.0 |
| 42 | reid | 80x160 | 0.95 | 333 | 612.4 |
| 43 | resnet18 | 224x224 | 3.7 | 333 | 320.9 |
| 44 | resnet50 | 224x224 | 7.7 | 333 | 87.9 |
| 45 | resnet50_pt | 224x224 | 4.1 | 333 | 89.2 |
| 46 | resnet50_tf2 | 224x224 | 7.7 | 333 | 88.0 |
| 47 | resnet_v1_101_tf | 224x224 | 14.4 | 333 | 57.8 |
| 48 | resnet_v1_152_tf | 224x224 | 21.8 | 333 | 38.6 |
| 49 | resnet_v1_50_tf | 224x224 | 7 | 333 | 101.7 |
| 50 | retinaface | 360x640 | 1.11 | 333 | 206.5 |
| 51 | salsanext_pt | 64x2048 | 20.4 | 333 | 12.0 |
| 52 | SemanticFPN_cityscapes_pt | 256x512 | 10 | 333 | 36.5 |
| 53 | SemanticFPN_Mobilenetv2_pt | 512x1024 | 5.4 | 333 | 11.1 |
| 54 | semantic_seg_citys_tf2 | 512x1024 | 54 | 333 | 5.6 |
| 55 | sp_net | 128x224 | 0.55 | 333 | 10455.9 |
| 56 | squeezenet | 227x227 | 0.76 | 333 | 838.0 |
| 57 | squeezenet_pt | 224x224 | 0.82 | 333 | 471.2 |
| 58 | ssd_adas_pruned_0_95 | 360x480 | 6.3 | 333 | 117.5 |
| 59 | ssdlite_mobilenet_v2_coco_tf | 300x300 | 1.5 | 333 | 610.7 |
| 60 | ssd_mobilenet_v1_coco_tf | 300x300 | 2.5 | 333 | 1053.1 |
| 61 | ssd_mobilenet_v2 | 360x480 | 6.6 | 333 | 157.7 |
| 62 | ssd_mobilenet_v2_coco_tf | 300x300 | 3.8 | 333 | 238.4 |
| 63 | ssd_pedestrian_pruned_0_97 | 360x360 | 5.9 | 333 | 107.9 |
| 64 | ssd_traffic_pruned_0_9 | 360x480 | 11.6 | 333 | 112.5 |
| 65 | vgg_16_tf | 224x224 | 31 | 333 | 57.7 |
| 66 | vgg_19_tf | 224x224 | 39.3 | 333 | 51.9 |
| 67 | vpgnet_pruned_0_99 | 480x640 | 2.5 | 333 | 72.3 |
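Tables 1 and 2 measure the same U50 card with two DPU configurations, so dividing matching fps values gives a per-model ratio between them. The following Python sketch does that for a handful of rows copied from the two tables and also lists models that appear in only one sample; `fps_ratios` is a hypothetical helper name, and only the rows shown are included.

```python
# Multi-thread fps copied from Table 1 (DPUCAHX8H, 6E@300 MHz) and
# Table 2 (DPUCAHX8L, 1E@333 MHz); only a few sample rows are included.
dpucahx8h = {"resnet50": 572.7, "inception_v1": 1232.8, "vgg_16_tf": 156.3,
             "sp_net": 2826.2, "yolov3_voc": 76.7}
dpucahx8l = {"resnet50": 87.9, "inception_v1": 343.5, "vgg_16_tf": 57.7,
             "sp_net": 10455.9, "mobilenet_v2": 1190.2}


def fps_ratios(a: dict, b: dict) -> dict:
    """Return a[m] / b[m], rounded to 2 decimals, for every model in both dicts."""
    return {m: round(a[m] / b[m], 2) for m in sorted(a.keys() & b.keys())}


print(fps_ratios(dpucahx8h, dpucahx8l))
print("only in DPUCAHX8H sample:", sorted(dpucahx8h.keys() - dpucahx8l.keys()))
print("only in DPUCAHX8L sample:", sorted(dpucahx8l.keys() - dpucahx8h.keys()))
```

With these rows the ratio is about 6.52 for resnet50, 3.59 for inception_v1, and 2.71 for vgg_16_tf, but 0.27 for sp_net, meaning sp_net ran faster on DPUCAHX8L in these measurements. The model lists also differ: MobileNet models appear only in Table 2, and the YOLO models only in Table 1.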

U50LV Performance with 10E@275 MHz DPUCAHX8H

The following table shows the throughput performance (in frames/sec or fps) for various neural network samples on U50LV Gen3x4 with DPUCAHX8H running at 10E@275 MHz.

Table 3. U50LV Performance with 10E@275 MHz DPUCAHX8H

| No. | Neural Network | Input Size | GOPS | DPU Frequency (MHz) | Performance (fps, multiple threads) |
|---|---|---|---|---|---|
| 1 | densebox_320_320 | 320x320 | 0.49 | 275 | 3247.4 |
| 2 | densebox_640_360 | 360x640 | 1.1 | 275 | 1435.5 |
| 3 | ENet_cityscapes_pt | 512x1024 | 8.6 | 275x0.9 | 104.1 |
| 4 | face_landmark | 96x72 | 0.14 | 275 | 14849.1 |
| 5 | face-quality | 80x60 | 0.06 | 275x0.9 | 23500.4 |
| 6 | face-quality_pt | 80x60 | 0.06 | 275x0.9 | 23449.1 |
| 7 | facerec_resnet20 | 112x96 | 3.5 | 275 | 2026.2 |
| 8 | facerec-resnet20_mixed_pt | 112x96 | 3.5 | 275x0.9 | 1847.0 |
| 9 | facerec_resnet64 | 112x96 | 11 | 275 | 749.7 |
| 10 | facereid-large_pt | 96x96 | 0.5 | 275x0.9 | 10587.9 |
| 11 | facereid-small_pt | 80x80 | 0.09 | 275x0.9 | 25648.8 |
| 12 | fpn | 256x512 | 8.9 | 275x0.9 | 597.8 |
| 13 | FPN_Res18_Medical_segmentation | 320x320 | 45.3 | 275x0.9 | 140.0 |
| 14 | FPN-resnet18_covid19-seg_pt | 352x352 | 22.7 | 275x0.9 | 313.2 |
| 15 | inception_resnet_v2_tf | 299x299 | 26.4 | 275x0.8 | 210.7 |
| 16 | inception_v1 | 224x224 | 3.2 | 275x0.9 | 1700.5 |
| 17 | inception_v1_tf | 224x224 | 3 | 275x0.9 | 1726.9 |
| 18 | inception_v2 | 224x224 | 4 | 275x0.9 | 1342.4 |
| 19 | inception_v3 | 299x299 | 11.4 | 275x0.8 | 498.2 |
| 20 | inception_v3_pt | 299x299 | 5.7 | 275x0.8 | 498.0 |
| 21 | inception_v3_tf | 299x299 | 11.5 | 275x0.8 | 498.4 |
| 22 | inception_v3_tf2 | 299x299 | 11.5 | 275x0.9 | 567.6 |
| 23 | inception_v4 | 299x299 | 24.5 | 275x0.8 | 226.7 |
| 24 | inception_v4_2016_09_09_tf | 299x299 | 24.6 | 275x0.8 | 226.7 |
| 25 | medical_seg_cell_tf2 | 128x128 | 5.3 | 275x0.9 | 1600.3 |
| 26 | MLPerf_resnet50_v1.5_tf | 224x224 | 8.19 | 275x0.9 | 763.1 |
| 27 | mlperf_ssd_resnet34_tf | 1200x1200 | 433 | 275x0.9 | 21.1 |
| 28 | multi_task | 288x512 | 14.8 | 275x0.9 | 462.2 |
| 29 | openpose_pruned_0_3 | 368x368 | 49.9 | 275x0.7 | 34.8 |
| 30 | personreid-res18_pt | 176x80 | 1.1 | 275x0.9 | 5144.3 |
| 31 | personreid-res50_pt | 256x128 | 5.4 | 275x0.9 | 1227.6 |
| 32 | plate_detection | 320x320 | 0.49 | 275 | 6997.9 |
| 33 | plate_num | 96x288 | 1.75 | 275x0.9 | 1656.7 |
| 34 | pmg_pt | 224x224 | 2.28 | 275x0.8 | 1333.7 |
| 35 | refinedet_baseline | 480x360 | 123 | 275x0.7 | 59.2 |
| 36 | RefineDet-Medical_EDD_tf | 320x320 | 9.8 | 275x0.9 | 641.2 |
| 37 | refinedet_pruned_0_8 | 360x480 | 25 | 275x0.9 | 295.6 |
| 38 | refinedet_pruned_0_92 | 360x480 | 10.1 | 275x0.9 | 614.2 |
| 39 | refinedet_pruned_0_96 | 360x480 | 5.1 | 275x0.9 | 902.6 |
| 40 | refinedet_VOC_tf | 320x320 | 81.9 | 275x0.9 | 108.6 |
| 41 | reid | 80x160 | 0.95 | 275 | 5833.1 |
| 42 | resnet18 | 224x224 | 3.7 | 275x0.9 | 1935.5 |
| 43 | resnet50 | 224x224 | 7.7 | 275x0.9 | 787.5 |
| 44 | resnet50_pt | 224x224 | 4.1 | 275x0.9 | 763.6 |
| 45 | resnet50_tf2 | 224x224 | 7.7 | 275x0.9 | 788.3 |
| 46 | resnet_v1_101_tf | 224x224 | 14.4 | 275x0.9 | 459.7 |
| 47 | resnet_v1_152_tf | 224x224 | 21.8 | 275x0.9 | 306.5 |
| 48 | resnet_v1_50_tf | 224x224 | 7 | 275x0.9 | 884.5 |
| 49 | salsanext_pt | 64x2048 | 20.4 | 275x0.9 | 149.7 |
| 50 | salsanext_v2_pt | 64x2048 | 32 | 275x0.8 | 35.9 |
| 51 | SemanticFPN_cityscapes_pt | 256x512 | 10 | 275x0.9 | 640.8 |
| 52 | semantic_seg_citys_tf2 | 512x1024 | 54 | 275x0.9 | 76.5 |
| 53 | sp_net | 128x224 | 0.55 | 275 | 4291.3 |
| 54 | squeezenet | 227x227 | 0.76 | 275x0.9 | 4826.3 |
| 55 | squeezenet_pt | 224x224 | 0.82 | 275x0.9 | 2847.2 |
| 56 | ssd_adas_pruned_0_95 | 360x480 | 6.3 | 275x0.8 | 821.0 |
| 57 | ssd_pedestrian_pruned_0_97 | 360x360 | 5.9 | 275x0.9 | 863.6 |
| 58 | ssd_resnet_50_fpn_coco_tf | 640x640 | 178.4 | 275x0.8 | 42.3 |
| 59 | ssd_traffic_pruned_0_9 | 360x480 | 11.6 | 275 | 653.4 |
| 60 | tiny_yolov3_vmss | 416x416 | 5.46 | 275x0.9 | 1324.6 |
| 61 | unet_chaos-CT_pt | 512x512 | 23.3 | 275x0.9 | 91.4 |
| 62 | vgg_16_tf | 224x224 | 31 | 275x0.9 | 218.6 |
| 63 | vgg_19_tf | 224x224 | 39.3 | 275x0.9 | 182.7 |
| 64 | vpgnet_pruned_0_99 | 480x640 | 2.5 | 275x0.9 | 720.1 |
| 65 | yolov2_voc | 448x448 | 34 | 275x0.8 | 220.8 |
| 66 | yolov2_voc_pruned_0_66 | 448x448 | 11.6 | 275x0.8 | 558.5 |
| 67 | yolov2_voc_pruned_0_71 | 448x448 | 9.9 | 275x0.8 | 654.1 |
| 68 | yolov2_voc_pruned_0_77 | 448x448 | 7.8 | 275x0.8 | 794.0 |
| 69 | yolov3_adas_pruned_0_9 | 256x512 | 5.5 | 275x0.8 | 835.4 |
| 70 | yolov3_bdd | 288x512 | 53.7 | 275x0.8 | 101.8 |
| 71 | yolov3_voc | 416x416 | 65.4 | 275x0.8 | 104.4 |
| 72 | yolov3_voc_tf | 416x416 | 65.6 | 275x0.8 | 104.7 |
| 73 | yolov4_leaky_spp_m | 416x416 | 60.1 | 275x0.8 | 110.0 |
| 74 | yolov4_leaky_spp_m_pruned_0_36 | 416x416 | 38.2 | 275x0.8 | 112.2 |
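The fps values in Table 3 are multi-thread throughput figures. Their reciprocal is the average interval between completed frames, which is not the latency of an individual frame: by Little's law, the mean latency W equals the number of frames in flight N divided by the throughput λ (W = N / λ), so with more than one frame in flight each frame takes longer than 1/λ. The Python sketch below computes both quantities. The helper names are hypothetical, and the number of frames in flight is an assumed input, because the tables do not state how many threads were used.

```python
def ms_per_frame(fps: float) -> float:
    """Average interval between completed frames, in ms (inverse throughput)."""
    return 1000.0 / fps


def latency_ms_littles_law(fps: float, frames_in_flight: int) -> float:
    """Little's law estimate of mean per-frame latency: W = N / throughput.

    frames_in_flight is an assumed value; the tables do not report it.
    """
    return frames_in_flight * ms_per_frame(fps)


# Multi-thread fps copied from Table 3 (U50LV, DPUCAHX8H 10E@275 MHz).
for model, fps in {"resnet50": 787.5, "yolov3_voc": 104.4}.items():
    print(f"{model}: {ms_per_frame(fps):.3f} ms/frame, "
          f"~{latency_ms_littles_law(fps, 4):.1f} ms latency if 4 frames are in flight")
```

For resnet50 at 787.5 fps this gives about 1.270 ms per frame, and roughly 5.1 ms of latency if four frames were in flight at once.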

U50LV Performance with 1E@250 MHz DPUCAHX8L

The following table shows the throughput performance (in frames/sec or fps) for various neural network samples on U50LV Gen3x4 with DPUCAHX8L running at 1E@250 MHz.

Table 4. U50LV Performance with 1E@250 MHz DPUCAHX8L

| No. | Neural Network | Input Size | GOPS | DPU Frequency (MHz) | Performance (fps, multiple threads) |
|---|---|---|---|---|---|
| 1 | ENet_cityscapes_pt | 512x1024 | 8.6 | 250 | 5.7 |
| 2 | face_landmark | 96x72 | 0.14 | 250 | 4592.4 |
| 3 | face-quality | 80x60 | 0.06 | 250 | 6584.9 |
| 4 | face-quality_pt | 80x60 | 0.06 | 250 | 6538.5 |
| 5 | facerec_resnet20 | 112x96 | 3.5 | 250 | 295.3 |
| 6 | facerec-resnet20_mixed_pt | 112x96 | 3.5 | 250 | 293.8 |
| 7 | facerec_resnet64 | 112x96 | 11 | 250 | 129.0 |
| 8 | facereid-small_pt | 80x80 | 0.09 | 250 | 4237.6 |
| 9 | fpn | 256x512 | 8.9 | 250 | 38.9 |
| 10 | FPN_Res18_Medical_segmentation | 320x320 | 45.3 | 250 | 14.2 |
| 11 | FPN-resnet18_covid19-seg_pt | 352x352 | 22.7 | 250 | 82.1 |
| 12 | inception_resnet_v2_tf | 299x299 | 26.4 | 250 | 32.2 |
| 13 | inception_v1 | 224x224 | 3.2 | 250 | 322.5 |
| 14 | inception_v1_tf | 224x224 | 3 | 250 | 326.9 |
| 15 | inception_v2 | 224x224 | 3.88 | 250 | 185.0 |
| 16 | inception_v3 | 299x299 | 11.4 | 250 | 100.0 |
| 17 | inception_v3_pt | 299x299 | 5.7 | 250 | 99.7 |
| 18 | inception_v3_tf | 299x299 | 11.5 | 250 | 99.9 |
| 19 | inception_v3_tf2 | 299x299 | 11.5 | 250 | 93.9 |
| 20 | inception_v4 | 299x299 | 24.5 | 250 | 52.7 |
| 21 | inception_v4_2016_09_09_tf | 299x299 | 24.6 | 250 | 52.6 |
| 22 | medical_seg_cell_tf2 | 128x128 | 5.3 | 250 | 114.0 |
| 23 | MLPerf_resnet50_v1.5_tf | 224x224 | 8.19 | 250 | 80.9 |
| 24 | mlperf_ssd_resnet34_tf | 1200x1200 | 433 | 250 | 6.2 |
| 25 | mobilenet_1_0_224_tf2 | 224x224 | 1.1 | 250 | 1678.8 |
| 26 | mobilenet_v1_0_5_160_tf | 160x160 | 0.15 | 250 | 4660.9 |
| 27 | mobilenet_v1_1_0_224_tf | 224x224 | 1.1 | 250 | 1714.1 |
| 28 | mobilenet_v2 | 224x224 | 0.6 | 250 | 1012.4 |
| 29 | mobilenet_v2_1_0_224_tf | 224x224 | 0.6 | 250 | 1008.8 |
| 30 | mobilenet_v2_1_4_224_tf | 224x224 | 1.2 | 250 | 742.2 |
| 31 | multi_task | 288x512 | 14.8 | 250 | 21.1 |
| 32 | openpose_pruned_0_3 | 368x368 | 49.9 | 250 | 14.4 |
| 33 | personreid-res50_pt | 256x128 | 5.4 | 250 | 99.1 |
| 34 | plate_detection | 320x320 | 0.49 | 250 | 1104.5 |
| 35 | rcan_pruned_tf | 360x640 | 86.95 | 250 | 4.0 |
| 36 | refinedet_baseline | 480x360 | 123 | 250 | 26.3 |
| 37 | RefineDet-Medical_EDD_tf | 320x320 | 9.8 | 250 | 145.2 |
| 38 | refinedet_pruned_0_8 | 360x480 | 25 | 250 | 64.6 |
| 39 | refinedet_pruned_0_92 | 360x480 | 10.1 | 250 | 76.2 |
| 40 | refinedet_pruned_0_96 | 360x480 | 5.1 | 250 | 93.7 |
| 41 | refinedet_VOC_tf | 320x320 | 81.9 | 250 | 39.8 |
| 42 | reid | 80x160 | 0.95 | 250 | 569.5 |
| 43 | resnet18 | 224x224 | 3.7 | 250 | 296.6 |
| 44 | resnet50 | 224x224 | 7.7 | 250 | 80.5 |
| 45 | resnet50_pt | 224x224 | 4.1 | 250 | 81.2 |
| 46 | resnet50_tf2 | 224x224 | 7.7 | 250 | 80.4 |
| 47 | resnet_v1_101_tf | 224x224 | 14.4 | 250 | 52.4 |
| 48 | resnet_v1_152_tf | 224x224 | 21.8 | 250 | 34.9 |
| 49 | resnet_v1_50_tf | 224x224 | 7 | 250 | 93.3 |
| 50 | retinaface | 360x640 | 1.11 | 250 | 197.1 |
| 51 | salsanext_pt | 64x2048 | 20.4 | 250 | 11.6 |
| 52 | SemanticFPN_cityscapes_pt | 256x512 | 10 | 250 | 35.6 |
| 53 | SemanticFPN_Mobilenetv2_pt | 512x1024 | 5.4 | 250 | 11.0 |
| 54 | semantic_seg_citys_tf2 | 512x1024 | 54 | 250 | 5.4 |
| 55 | sp_net | 128x224 | 0.55 | 250 | 10520.8 |
| 56 | squeezenet | 227x227 | 0.76 | 250 | 793.6 |
| 57 | squeezenet_pt | 224x224 | 0.82 | 250 | 471.6 |
| 58 | ssd_adas_pruned_0_95 | 360x480 | 6.3 | 250 | 118.3 |
| 59 | ssdlite_mobilenet_v2_coco_tf | 300x300 | 1.5 | 250 | 522.5 |
| 60 | ssd_mobilenet_v1_coco_tf | 300x300 | 2.5 | 250 | 854.4 |
| 61 | ssd_mobilenet_v2 | 360x480 | 6.6 | 250 | 138.0 |
| 62 | ssd_mobilenet_v2_coco_tf | 300x300 | 3.8 | 250 | 210.1 |
| 63 | ssd_pedestrian_pruned_0_97 | 360x360 | 5.9 | 250 | 109.0 |
| 64 | ssd_traffic_pruned_0_9 | 360x480 | 11.6 | 250 | 111.3 |
| 65 | vgg_16_tf | 224x224 | 31 | 250 | 52.2 |
| 66 | vgg_19_tf | 224x224 | 39.3 | 250 | 46.1 |
| 67 | vpgnet_pruned_0_99 | 480x640 | 2.5 | 250 | 74.0 |
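Tables 2 and 4 both use the 1E DPUCAHX8L configuration, at 333 MHz on the U50 and 250 MHz on the U50LV. Comparing the fps ratio with the clock ratio (333/250 = 1.332) indicates how closely throughput tracks the DPU clock. The Python sketch below computes this scaling efficiency for a few rows; `scaling_efficiency` is a hypothetical helper, and because the two cards may differ in more than clock frequency, the result is only a rough indicator.

```python
# DPUCAHX8L multi-thread fps: Table 2 (U50, 333 MHz) and Table 4 (U50LV, 250 MHz).
U50_MHZ, U50LV_MHZ = 333.0, 250.0
u50 = {"resnet50": 87.9, "inception_v1": 343.5, "mobilenet_v2": 1190.2}
u50lv = {"resnet50": 80.5, "inception_v1": 322.5, "mobilenet_v2": 1012.4}


def scaling_efficiency(fps_hi: float, mhz_hi: float, fps_lo: float, mhz_lo: float) -> float:
    """fps gain divided by clock gain; 1.0 means throughput scaled linearly with clock."""
    return (fps_hi / fps_lo) / (mhz_hi / mhz_lo)


for model in u50:
    eff = scaling_efficiency(u50[model], U50_MHZ, u50lv[model], U50LV_MHZ)
    print(f"{model:13s} efficiency {eff:.2f}")
```

For resnet50, inception_v1, and mobilenet_v2 the efficiency is about 0.82, 0.80, and 0.88. Throughput therefore rose less than the 1.332x clock increase, which suggests these workloads are not limited by the DPU clock alone.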