Example PS code for Packet Switching - 2023.2 English

Vitis Tutorials: AI Engine

Document ID
Release Date
2023.2 English

The PS code for hardware emulation and hardware flows is in sw/host.cpp. You can review the code. It opens the XCLBIN using the following code.

     // Open xclbin
     auto dhdl = xrtDeviceOpen(0);//device index=0
     xuid_t uuid;
     xrtDeviceGetXclbinUUID(dhdl, uuid);

It allocates buffers for mm2s kernels and s2mm kernels:

    // output memory
    xrtBufferHandle out_bo1 = xrtBOAlloc(dhdl, mem_size, 0, /*BANK=*/0);
    int *host_out1 = (int*)xrtBOMap(out_bo1);
    // input memory
    xrtBufferHandle in_bo1 = xrtBOAlloc(dhdl, mem_size, 0, /*BANK=*/0);
    int *host_in1 = (int*)xrtBOMap(in_bo1);

It initializes the input memory and then syncs the input memory:

    // initialize input memory
    for(int i=0;i<mem_size/sizeof(int);i++){
    // sync input memory
    xrtBOSync(in_bo1, XCL_BO_SYNC_BO_TO_DEVICE , mem_size,/*OFFSET=*/ 0);

Then it starts the output kernels and input kernels:

    // start output kernels
    xrtKernelHandle s2mm_k1 = xrtPLKernelOpen(dhdl, uuid, "s2mm:{s2mm_1}");
    xrtRunHandle s2mm_r1 = xrtRunOpen(s2mm_k1);
    xrtRunSetArg(s2mm_r1, 0, out_bo1);
    xrtRunSetArg(s2mm_r1, 2, mem_size/sizeof(int));
    xrtKernelHandle hls_packet_receiver_k = xrtPLKernelOpen(dhdl, uuid, "hls_packet_receiver");
    xrtRunHandle hls_packet_receiver_r = xrtRunOpen(hls_packet_receiver_k);
    xrtRunSetArg(hls_packet_receiver_r, 5, total_packet_num);
    // start input kernels
    xrtKernelHandle mm2s_k1 = xrtPLKernelOpen(dhdl, uuid, "mm2s:{mm2s_1}");
    xrtRunHandle mm2s_r1 = xrtRunOpen(mm2s_k1);
    xrtRunSetArg(mm2s_r1, 0, in_bo1);
    xrtRunSetArg(mm2s_r1, 2, mem_size/sizeof(int));
    xrtKernelHandle hls_packet_sender_k = xrtPLKernelOpen(dhdl, uuid, "hls_packet_sender");
    xrtRunHandle hls_packet_sender_r = xrtRunOpen(hls_packet_sender_k);
    xrtRunSetArg(hls_packet_sender_r, 5, packet_num);

Then it starts the graph:

    // start graph
    adf::registerXRT(dhdl, uuid);
    gr.run(2); //Iteration number=2. The amount of data matches for PL kernels and graph

Then it waits for s2mm kernels to complete, and syncs output memory:

    // wait for s2mm to complete
    // sync output memory
    xrtBOSync(out_bo1, XCL_BO_SYNC_BO_FROM_DEVICE , mem_size,/*OFFSET=*/ 0);   

Then, finally, it performs post-processing and releases objects.

Note that there is no special packet switching handling in the PS code. It is already done on the AI Engine and PL side.