The Xilinx® Alveo U50 Data Center accelerator cards are peripheral component interconnect express (PCIe®) Gen3x16 compliant and Gen4x8 compatible cards featuring the Xilinx 16 nm UltraScale+ technology. In this release, the DPU is implemented in programmable logic for deep learning inference acceleration.
The following table shows the throughput performance (in frames/sec or fps) for various neural network samples on U50 Gen3x4 with the DPU running at 6E@300 MHz.
Note: Some models cannot run at the highest DPU frequency and require a reduced DPU frequency. See Setting Up the Host for the DPU frequency reduction procedure.
No | Neural Network | Input Size | GOPS | DPU Frequency (MHz) | Performance (fps) (Multiple threads) |
---|---|---|---|---|---|
1 | inception_resnet_v2_tf | 299x299 | 26.4 | 300 | 173.2 |
2 | inception_v1_tf | 224x224 | 3.0 | 300 | 1195.8 |
3 | inception_v3_tf | 299x299 | 11.5 | 300 | 398.5 |
4 | inception_v4_2016_09_09_tf | 299x299 | 24.6 | 300 | 187.6 |
5 | mobilenet_v1_0_25_128_tf | 128x128 | 0.027 | N/A | N/A |
6 | mobilenet_v1_0_5_160_tf | 160x160 | 0.15 | N/A | N/A |
7 | mobilenet_v1_1_0_224_tf | 224x224 | 1.1 | N/A | N/A |
8 | mobilenet_v2_1_0_224_tf | 224x224 | 0.60 | N/A | N/A |
9 | mobilenet_v2_1_4_224_tf | 224x224 | 1.2 | N/A | N/A |
10 | resnet_v1_101_tf | 224x224 | 14.4 | 300 | 365.1 |
11 | resnet_v1_152_tf | 224x224 | 21.8 | 300 | 244.7 |
12 | resnet_v1_50_tf | 224x224 | 7.0 | 300 | 703.8 |
13 | vgg_16_tf | 224x224 | 31.0 | 300 | 164.7 |
14 | vgg_19_tf | 224x224 | 39.3 | 300 | 137 |
15 | ssd_mobilenet_v1_coco_tf | 300x300 | 2.5 | N/A | N/A |
16 | ssd_mobilenet_v2_coco_tf | 300x300 | 3.8 | N/A | N/A |
17 | ssd_resnet_50_fpn_coco_tf | 640x640 | 178.4 | 300x0.9 | 32.7 |
18 | yolov3_voc_tf | 416x416 | 65.6 | 300x0.9 | 79.2 |
19 | mlperf_ssd_resnet34_tf | 1200x1200 | 433 | N/A | N/A |
20 | resnet50 | 224x224 | 7.7 | 300 | 631.2 |
21 | resnet18 | 224x224 | 3.7 | 300 | 1430 |
22 | inception_v1 | 224x224 | 3.2 | 300 | 1183.3 |
23 | inception_v2 | 224x224 | 4.0 | 300 | 983.6 |
24 | inception_v3 | 299x299 | 11.4 | 300 | 405.4 |
25 | inception_v4 | 299x299 | 24.5 | 300 | 187.7 |
26 | mobilenet_v2 | 224x224 | 0.6 | N/A | N/A |
27 | squeezenet | 227x227 | 0.76 | 300 | 3016.1 |
28 | ssd_pedestrain_pruned_0_97 | 360x360 | 5.9 | 300 | 621.5 |
29 | ssd_traffic_pruned_0_9 | 360x480 | 11.6 | 300 | 433 |
30 | ssd_adas_pruned_0_95 | 360x480 | 6.3 | 300 | 629 |
31 | ssd_mobilenet_v2 | 360x480 | 6.6 | N/A | N/A |
32 | refinedet_pruned_0_8 | 360x480 | 25 | 300x0.9 | 193.6 |
33 | refinedet_pruned_0_92 | 360x480 | 10.1 | 300x0.9 | 420.6 |
34 | refinedet_pruned_0_96 | 360x480 | 5.1 | 300x0.9 | 617.2 |
35 | vpgnet_pruned_0_99 | 480x640 | 2.5 | 300 | 478.8 |
36 | fpn | 256x512 | 8.9 | 300 | 450.9 |
37 | sp_net | 128x224 | 0.55 | 300 | 1158.5 |
38 | openpose_pruned_0_3 | 368x368 | 49.9 | 300x0.9 | 29.1 |
39 | densebox_320_320 | 320x320 | 0.49 | 300 | 1929.2 |
40 | densebox_640_360 | 360x640 | 1.1 | 300 | 877.3 |
41 | face_landmark | 96x72 | 0.14 | 300 | 8513.7 |
42 | reid | 80x160 | 0.95 | 300 | 3612.9 |
43 | multi_task | 288x512 | 14.8 | 300 | 237.3 |
44 | yolov3_adas_pruned_0_9 | 256x512 | 5.5 | 300x0.9 | 642.9 |
45 | yolov3_voc | 416x416 | 65.4 | 300x0.9 | 79 |
46 | yolov3_bdd | 288x512 | 53.7 | 300x0.9 | 77.2 |
47 | yolov2_voc | 448x448 | 34 | 300x0.9 | 165.6 |
48 | yolov2_voc_pruned_0_66 | 448x448 | 11.6 | 300x0.9 | 409.6 |
49 | yolov2_voc_pruned_0_71 | 448x448 | 9.9 | 300x0.9 | 481.5 |
50 | yolov2_voc_pruned_0_77 | 448x448 | 7.8 | 300x0.9 | 585.4 |
51 | facerec_resnet20 | 112x96 | 3.5 | 300 | 1278.3 |
52 | facerec_resnet64 | 112x96 | 11.0 | 300 | 495.7 |
53 | plate_detection | 320x320 | 0.49 | 300 | 5135.8 |
54 | plate_recognition | 96x288 | 1.75 | N/A | N/A |
55 | FPN_Res18_Medical_segmentation | 320x320 | 45.3 | 300 | 103.1 |
56 | refinedet_baseline | 480x360 | 123 | 300x0.9 | 50 |
57 | resnet50_pt | 224x224 | 4.1 | 300 | 546.4 |
58 | squeezenet_pt | 224x224 | 0.82 | 300 | 2024.4 |
59 | inception_v3_pt | 299x299 | 5.7 | 300 | 405.3 |
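The DPU Frequency column mixes plain values such as 300 with scaled entries such as 300x0.9. The following is a minimal sketch for reading those cells, assuming that an entry of the form AxB means the base clock A (in MHz) multiplied by the factor B. That interpretation is not stated in this section, and the helper name `effective_dpu_mhz` is hypothetical.

```python
from typing import Optional


def effective_dpu_mhz(entry: str) -> Optional[float]:
    """Return the effective DPU clock in MHz for one table cell.

    Assumption: "300x0.9" means a 300 MHz base clock scaled by 0.9.
    "N/A" cells (model not run on this configuration) return None.
    """
    entry = entry.strip()
    if entry.upper() == "N/A":
        return None
    # partition() leaves `scale` empty when the cell has no "x" factor.
    base, _, scale = entry.partition("x")
    return float(base) * (float(scale) if scale else 1.0)


if __name__ == "__main__":
    for cell in ("300", "300x0.9", "275x0.8", "N/A"):
        print(cell, "->", effective_dpu_mhz(cell))
```

Under this assumption, rows marked 300x0.9 were measured at a lower clock than rows marked 300, so their fps figures are not directly comparable on a per-clock basis.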
The following table shows the throughput performance (in frames/sec or fps) for various neural network samples on U50lv Gen3x4 with the DPU running at 9E@275 MHz.
No | Neural Network | Input Size | GOPS | DPU Frequency (MHz) | Performance (fps) (Multiple threads) |
---|---|---|---|---|---|
1 | inception_resnet_v2_tf | 299x299 | 26.4 | 275 | 224.1 |
2 | inception_v1_tf | 224x224 | 3.0 | 275 | 1607.4 |
3 | inception_v3_tf | 299x299 | 11.5 | 275 | 549.7 |
4 | inception_v4_2016_09_09_tf | 299x299 | 24.6 | 275 | 256.5 |
5 | mobilenet_v1_0_25_128_tf | 128x128 | 0.027 | N/A | N/A |
6 | mobilenet_v1_0_5_160_tf | 160x160 | 0.15 | N/A | N/A |
7 | mobilenet_v1_1_0_224_tf | 224x224 | 1.1 | N/A | N/A |
8 | mobilenet_v2_1_0_224_tf | 224x224 | 0.60 | N/A | N/A |
9 | mobilenet_v2_1_4_224_tf | 224x224 | 1.2 | N/A | N/A |
10 | resnet_v1_101_tf | 224x224 | 14.4 | 275 | 458.1 |
11 | resnet_v1_152_tf | 224x224 | 21.8 | 275 | 305.9 |
12 | resnet_v1_50_tf | 224x224 | 7.0 | 275 | 880.6 |
13 | vgg_16_tf | 224x224 | 31.0 | 275 | 228.9 |
14 | vgg_19_tf | 224x224 | 39.3 | 275 | 189.9 |
15 | ssd_mobilenet_v1_coco_tf | 300x300 | 2.5 | N/A | N/A |
16 | ssd_mobilenet_v2_coco_tf | 300x300 | 3.8 | N/A | N/A |
17 | ssd_resnet_50_fpn_coco_tf | 640x640 | 178.4 | 275x0.9 | 42.6 |
18 | yolov3_voc_tf | 416x416 | 65.6 | 275x0.9 | 104 |
19 | mlperf_ssd_resnet34_tf | 1200x1200 | 433 | N/A | N/A |
20 | resnet50 | 224x224 | 7.7 | 275 | 802.5 |
21 | resnet18 | 224x224 | 3.7 | 275 | 1927.4 |
22 | inception_v1 | 224x224 | 3.2 | 275 | 1565.3 |
23 | inception_v2 | 224x224 | 4.0 | 275 | 1289.1 |
24 | inception_v3 | 299x299 | 11.4 | 275 | 552.4 |
25 | inception_v4 | 299x299 | 24.5 | 275 | 256.2 |
26 | mobilenet_v2 | 224x224 | 0.6 | N/A | N/A |
27 | squeezenet | 227x227 | 0.76 | 275 | 3767.1 |
28 | ssd_pedestrain_pruned_0_97 | 360x360 | 5.9 | 275 | 664.2 |
29 | ssd_traffic_pruned_0_9 | 360x480 | 11.6 | 275 | 483.3 |
30 | ssd_adas_pruned_0_95 | 360x480 | 6.3 | 275 | 715 |
31 | ssd_mobilenet_v2 | 360x480 | 6.6 | N/A | N/A |
32 | refinedet_pruned_0_8 | 360x480 | 25 | 275 | 235.6 |
33 | refinedet_pruned_0_92 | 360x480 | 10.1 | 275 | 514.7 |
34 | refinedet_pruned_0_96 | 360x480 | 5.1 | 275 | 725.8 |
35 | vpgnet_pruned_0_99 | 480x640 | 2.5 | 275 | 595.3 |
36 | fpn | 256x512 | 8.9 | 275x0.9 | 530.9 |
37 | sp_net | 128x224 | 0.55 | 275 | 2687.7 |
38 | openpose_pruned_0_3 | 368x368 | 49.9 | 275 | 43.3 |
39 | densebox_320_320 | 320x320 | 0.49 | 275 | 2431.2 |
40 | densebox_640_360 | 360x640 | 1.1 | 275 | 1074.4 |
41 | face_landmark | 96x72 | 0.14 | 275 | 11759.4 |
42 | reid | 80x160 | 0.95 | 275 | 5013.9 |
43 | multi_task | 288x512 | 14.8 | 275 | 192.2 |
44 | yolov3_adas_pruned_0_9 | 256x512 | 5.5 | 275x0.9 | 810 |
45 | yolov3_voc | 416x416 | 65.4 | 275x0.9 | 104.2 |
46 | yolov3_bdd | 288x512 | 53.7 | 275x0.9 | 103 |
47 | yolov2_voc | 448x448 | 34 | 275x0.9 | 227.5 |
48 | yolov2_voc_pruned_0_66 | 448x448 | 11.6 | 275x0.9 | 565.2 |
49 | yolov2_voc_pruned_0_71 | 448x448 | 9.9 | 275x0.9 | 662.6 |
50 | yolov2_voc_pruned_0_77 | 448x448 | 7.8 | 275x0.9 | 807.8 |
51 | facerec_resnet20 | 112x96 | 3.5 | 275 | 1760.9 |
52 | facerec_resnet64 | 112x96 | 11.0 | 275 | 663.7 |
53 | plate_detection | 320x320 | 0.49 | 275 | 5563.8 |
54 | plate_recognition | 96x288 | 1.75 | N/A | N/A |
55 | FPN_Res18_Medical_segmentation | 320x320 | 45.3 | 275 | 140.2 |
56 | refinedet_baseline | 480x360 | 123 | 275 | 70.5 |
57 | resnet50_pt | 224x224 | 4.1 | 275 | 768.1 |
58 | squeezenet_pt | 224x224 | 0.82 | 275 | 2540.6 |
59 | inception_v3_pt | 299x299 | 5.7 | 275 | 551.5 |
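The GOPS and fps columns can be combined into a rough measure of sustained compute. The sketch below assumes that the GOPS column gives the workload of a single inference in giga-operations per frame, which this section does not state explicitly, so the product with fps is the achieved giga-operations per second. The function name `achieved_gop_per_second` is hypothetical, and the sample rows are copied from the U50lv 9E@275 MHz table above.

```python
def achieved_gop_per_second(gop_per_frame: float, fps: float) -> float:
    """Sustained throughput in giga-operations per second.

    Assumption: the table's GOPS column is the per-frame workload.
    """
    return gop_per_frame * fps


# (model, GOPS column, fps column) from the U50lv 9E@275 MHz table.
SAMPLE_ROWS = [
    ("resnet50", 7.7, 802.5),
    ("squeezenet", 0.76, 3767.1),
    ("vgg_16_tf", 31.0, 228.9),
]

if __name__ == "__main__":
    for name, gop, fps in SAMPLE_ROWS:
        print(f"{name}: {achieved_gop_per_second(gop, fps):.1f} GOP/s")
```

A comparison of this kind shows how well each model keeps the DPU busy: small networks can post very high fps while sustaining fewer operations per second than larger ones.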
The following table shows the throughput performance (in frames/sec or fps) for various neural network samples on U50lv Gen3x4 with the DPU running at 10E@275 MHz.
No | Neural Network | Input Size | GOPS | DPU Frequency (MHz) | Performance (fps) (Multiple threads) |
---|---|---|---|---|---|
1 | inception_resnet_v2_tf | 299x299 | 26.4 | N/A | N/A |
2 | inception_v1_tf | 224x224 | 3.0 | 275x0.9 | 1552.5 |
3 | inception_v3_tf | 299x299 | 11.5 | N/A | N/A |
4 | inception_v4_2016_09_09_tf | 299x299 | 24.6 | N/A | N/A |
5 | mobilenet_v1_0_25_128_tf | 128x128 | 0.027 | N/A | N/A |
6 | mobilenet_v1_0_5_160_tf | 160x160 | 0.15 | N/A | N/A |
7 | mobilenet_v1_1_0_224_tf | 224x224 | 1.1 | N/A | N/A |
8 | mobilenet_v2_1_0_224_tf | 224x224 | 0.60 | N/A | N/A |
9 | mobilenet_v2_1_4_224_tf | 224x224 | 1.2 | N/A | N/A |
10 | resnet_v1_101_tf | 224x224 | 14.4 | 275x0.9 | 458.9 |
11 | resnet_v1_152_tf | 224x224 | 21.8 | 275x0.9 | 306.5 |
12 | resnet_v1_50_tf | 224x224 | 7.0 | 275x0.9 | 882.93 |
13 | vgg_16_tf | 224x224 | 31.0 | 275x0.9 | 229.3 |
14 | vgg_19_tf | 224x224 | 39.3 | 275x0.9 | 189.9 |
15 | ssd_mobilenet_v1_coco_tf | 300x300 | 2.5 | N/A | N/A |
16 | ssd_mobilenet_v2_coco_tf | 300x300 | 3.8 | N/A | N/A |
17 | ssd_resnet_50_fpn_coco_tf | 640x640 | 178.4 | 275x0.8 | 41.8 |
18 | yolov3_voc_tf | 416x416 | 65.6 | 275x0.8 | 102.4 |
19 | mlperf_ssd_resnet34_tf | 1200x1200 | 433 | N/A | N/A |
20 | resnet50 | 224x224 | 7.7 | 275x0.9 | 802.5 |
21 | resnet18 | 224x224 | 3.7 | 275x0.9 | 1934.5 |
22 | inception_v1 | 224x224 | 3.2 | 275x0.9 | 1536.6 |
23 | inception_v2 | 224x224 | 4.0 | 275x0.9 | 1314 |
24 | inception_v3 | 299x299 | 11.4 | N/A | N/A |
25 | inception_v4 | 299x299 | 24.5 | N/A | N/A |
26 | mobilenet_v2 | 224x224 | 0.6 | N/A | N/A |
27 | squeezenet | 227x227 | 0.76 | 275x0.9 | 3451.1 |
28 | ssd_pedestrain_pruned_0_97 | 360x360 | 5.9 | 275x0.9 | 755.2 |
29 | ssd_traffic_pruned_0_9 | 360x480 | 11.6 | 275x0.9 | 570.8 |
30 | ssd_adas_pruned_0_95 | 360x480 | 6.3 | 275x0.9 | 818.2 |
31 | ssd_mobilenet_v2 | 360x480 | 6.6 | N/A | N/A |
32 | refinedet_pruned_0_8 | 360x480 | 25 | 275x0.9 | 273.8 |
33 | refinedet_pruned_0_92 | 360x480 | 10.1 | 275x0.9 | 574.8 |
34 | refinedet_pruned_0_96 | 360x480 | 5.1 | 275x0.9 | 795.1 |
35 | vpgnet_pruned_0_99 | 480x640 | 2.5 | 275 | 659 |
36 | fpn | 256x512 | 8.9 | 275x0.9 | 552.2 |
37 | sp_net | 128x224 | 0.55 | 275 | 1707 |
38 | openpose_pruned_0_3 | 368x368 | 49.9 | 275x0.8 | 39.7 |
39 | densebox_320_320 | 320x320 | 0.49 | 275 | 2572.7 |
40 | densebox_640_360 | 360x640 | 1.1 | 275 | 1125.1 |
41 | face_landmark | 96x72 | 0.14 | 275 | 12917.2 |
42 | reid | 80x160 | 0.95 | 275 | 5548.1 |
43 | multi_task | 288x512 | 14.8 | 275x0.9 | 177 |
44 | yolov3_adas_pruned_0_9 | 256x512 | 5.5 | 275x0.8 | 771.3 |
45 | yolov3_voc | 416x416 | 65.4 | 275x0.8 | 102.2 |
46 | yolov3_bdd | 288x512 | 53.7 | 275x0.8 | 100.6 |
47 | yolov2_voc | 448x448 | 34 | 275x0.8 | 223.3 |
48 | yolov2_voc_pruned_0_66 | 448x448 | 11.6 | 275x0.8 | 547.6 |
49 | yolov2_voc_pruned_0_71 | 448x448 | 9.9 | 275x0.8 | 639.1 |
50 | yolov2_voc_pruned_0_77 | 448x448 | 7.8 | 275x0.8 | 770.9 |
51 | facerec_resnet20 | 112x96 | 3.5 | 275 | 1943.4 |
52 | facerec_resnet64 | 112x96 | 11.0 | 275 | 736.4 |
53 | plate_detection | 320x320 | 0.49 | 275 | 5521.4 |
54 | plate_recognition | 96x288 | 1.75 | N/A | N/A |
55 | FPN_Res18_Medical_segmentation | 320x320 | 45.3 | 275x0.9 | 139.8 |
56 | refinedet_baseline | 480x360 | 123 | N/A | N/A |
57 | resnet50_pt | 224x224 | 4.1 | 275 | 764.6 |
58 | squeezenet_pt | 224x224 | 0.82 | 275x0.9 | 2393.2 |
59 | inception_v3_pt | 299x299 | 5.7 | N/A | N/A |
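To compare the three configurations for a given model, the fps values can be expressed relative to the U50 6E@300 MHz result. The following is a minimal sketch using a few rows copied from the three tables above. The names `FPS_BY_CONFIG` and `speedup` are hypothetical, and N/A cells are represented as `None`.

```python
from typing import Optional

# fps per model for (U50 6E@300 MHz, U50lv 9E@275 MHz, U50lv 10E@275 MHz).
FPS_BY_CONFIG = {
    "resnet50": (631.2, 802.5, 802.5),
    "vgg_16_tf": (164.7, 228.9, 229.3),
    "inception_v3": (405.4, 552.4, None),  # N/A on the 10E configuration
}


def speedup(baseline_fps: Optional[float], other_fps: Optional[float]) -> Optional[float]:
    """Ratio other/baseline, or None when either measurement is missing."""
    if baseline_fps is None or other_fps is None:
        return None
    return other_fps / baseline_fps


if __name__ == "__main__":
    for model, (u50_6e, u50lv_9e, u50lv_10e) in FPS_BY_CONFIG.items():
        ratios = [speedup(u50_6e, fps) for fps in (u50lv_9e, u50lv_10e)]
        shown = ["N/A" if r is None else f"{r:.2f}x" for r in ratios]
        print(f"{model}: 9E {shown[0]}, 10E {shown[1]}")
```

Because several rows in the 10E table run at a reduced clock (for example 275x0.9), a higher engine count does not always translate into a higher fps figure than the 9E configuration.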