Example PL Kernels for Packet Switching - 2022.2 English

Vitis Tutorials: AI Engine Development

Document ID
XD100
Release Date
2022-12-01
Version
2022.2 English

This section describes how PL kernels generate and decode packet headers and distribute packets to their destinations. HLS example code is provided, and it can be run in both the hardware emulation and hardware flows.

The packet switching feature does not depend on the PL kernel type (HLS, Verilog, etc.) or on the kernel's design structure. Its only requirements concern the packet format and the packet ID behavior described in the previous sections.

The following image shows the system design structure of the example.

[Figure: system design structure of the packet switching example]

The previous section introduced the AI Engine side. It receives packets from one PLIO (AXI4-Stream interface), and distributes the packets to different AI Engine kernels. Then all AI Engine outputs are packed with packet headers automatically and sent to one PLIO.

In this example, the PL kernel mm2s1 sends raw data to the HLS packet sender module, which generates packets that meet the packet switching requirements. The packets go through the AI Engine kernel core[0] (aie/aie_core1.cpp). The HLS packet receiver module then decodes the packet header and sends the raw data to the PL kernel s2mm1. Similarly, mm2s2 sends data to s2mm2, mm2s3 to s2mm3, and mm2s4 to s2mm4.

Only the HLS packet sender module and HLS packet receiver module deal with the packet IDs generated by the AI Engine compiler. Other PL kernels focus on the data processing.

In this example, the four mm2s kernels are created by the --nk option of the Vitis (v++) linker. The same applies to the s2mm kernels. See system.cfg for how all PL kernels are created and connected:

[connectivity]
nk=s2mm:4:s2mm_1.s2mm_2.s2mm_3.s2mm_4
nk=mm2s:4:mm2s_1.mm2s_2.mm2s_3.mm2s_4
nk=hls_packet_sender:1:hls_packet_sender_1
nk=hls_packet_receiver:1:hls_packet_receiver_1
stream_connect=hls_packet_sender_1.out:ai_engine_0.Datain0
stream_connect=ai_engine_0.Dataout0:hls_packet_receiver_1.in

stream_connect=mm2s_1.s:hls_packet_sender_1.s0
stream_connect=mm2s_2.s:hls_packet_sender_1.s1
stream_connect=mm2s_3.s:hls_packet_sender_1.s2
stream_connect=mm2s_4.s:hls_packet_sender_1.s3
stream_connect=hls_packet_receiver_1.out0:s2mm_1.s
stream_connect=hls_packet_receiver_1.out1:s2mm_2.s
stream_connect=hls_packet_receiver_1.out2:s2mm_3.s
stream_connect=hls_packet_receiver_1.out3:s2mm_4.s

Next, review the HLS packet sender module in pl_kernels/hls_packet_sender.cpp. You can review the packet format in the previous section if necessary. The packet header, which carries the packet ID, is generated by the function generateHeader. Pay special attention to how the module sends the packet header and then reads data from the corresponding PL kernel:

#include "hls_stream.h"
#include "ap_int.h"
#include "ap_axi_sdata.h"
#include "packet_ids_c.h"

static const unsigned int pktType=0;
static const int PACKET_NUM=4; //How many kernels do packet switching
static const int PACKET_LEN=8; //Length for a packet

static const unsigned int packet_ids[PACKET_NUM]={Datain0_0, Datain0_1, Datain0_2, Datain0_3}; //macro values are generated in packet_ids_c.h

ap_uint<32> generateHeader(unsigned int pktType, unsigned int ID){
#pragma HLS inline
  ap_uint<32> header=0;
  header(4,0)=ID;
  header(11,5)=0;
  header(14,12)=pktType;
  header[15]=0;
  header(20,16)=-1;//source row
  header(27,21)=-1;//source column
  header(30,28)=0;
  header[31]=header(30,0).xor_reduce()?(ap_uint<1>)0:(ap_uint<1>)1;
  return header;
}

void hls_packet_sender(hls::stream<ap_axiu<32,0,0,0>> &s0,hls::stream<ap_axiu<32,0,0,0>> &s1,hls::stream<ap_axiu<32,0,0,0>> &s2,hls::stream<ap_axiu<32,0,0,0>> &s3,
  hls::stream<ap_axiu<32,0,0,0>> &out, const unsigned int num){
  for(unsigned int iter=0;iter<num;iter++){
    for(int i=0;i<PACKET_NUM;i++){//Iterate on PL kernels that do packet switching
      unsigned int ID=packet_ids[i];
      ap_uint<32> header=generateHeader(pktType,ID); //packet header
      ap_axiu<32,0,0,0> tmp;
      tmp.data=header;
      tmp.keep=-1;
      tmp.last=0;
      out.write(tmp);
      for(int j=0;j<PACKET_LEN;j++){ //packet data
        switch(i){//based on which kernel is sending packet, read the corresponding stream
        case 0:tmp=s0.read();break;
        case 1:tmp=s1.read();break;
        case 2:tmp=s2.read();break;
        case 3:tmp=s3.read();break;
        }
        if(j==PACKET_LEN-1){
          tmp.last=1; //last word in a packet has TLAST=1
        }else{
          tmp.last=0;
        }
        out.write(tmp);
      }
    }
  }
}
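
The header layout and packet framing above can also be modeled in plain C++ for quick host-side checks. The following is a minimal sketch, not part of the tutorial sources: it assumes standard fixed-width integers in place of ap_uint and ap_axiu, and the names Beat, xorReduce31, makeHeader, and framePacket are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Software stand-in for one ap_axiu<32,0,0,0> beat: 32-bit payload plus TLAST.
struct Beat {
    std::uint32_t data;
    bool last;
};

// XOR of bits 30..0; returns 1 when an odd number of those bits is set.
inline unsigned xorReduce31(std::uint32_t v) {
    v &= 0x7FFFFFFFu;
    unsigned p = 0;
    while (v) {
        p ^= (v & 1u);
        v >>= 1;
    }
    return p;
}

// Mirrors the field layout used by generateHeader in the listing above.
inline std::uint32_t makeHeader(unsigned pktType, unsigned id) {
    std::uint32_t h = 0;
    h |= (id & 0x1Fu);            // bits 4..0: packet ID
    h |= (pktType & 0x7u) << 12;  // bits 14..12: packet type
    h |= 0x1Fu << 16;             // bits 20..16: source row, all ones (-1)
    h |= 0x7Fu << 21;             // bits 27..21: source column, all ones (-1)
    if (xorReduce31(h) == 0) {
        h |= 1u << 31;            // bit 31: odd parity over the whole word
    }
    return h;
}

// Builds one packet: a header beat followed by the payload, TLAST on the last word.
inline std::vector<Beat> framePacket(unsigned pktType, unsigned id,
                                     const std::vector<std::uint32_t>& payload) {
    std::vector<Beat> out;
    out.push_back({makeHeader(pktType, id), false});
    for (std::size_t j = 0; j < payload.size(); ++j) {
        out.push_back({payload[j], j + 1 == payload.size()});
    }
    return out;
}
```

With this model, makeHeader(0, 0) evaluates to 0x8FFF0000: twelve bits are set in the row and column fields, so the parity bit is set to make the total odd. Comparing such values against the first word observed on the stream during emulation is one way to confirm that the sender frames packets as intended.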

Now, review the HLS packet receiver module in pl_kernels/hls_packet_receiver.cpp. The packet ID is extracted from the packet header by the function getPacketId. Note how the module forwards the packet data to the corresponding PL kernel:

#include "hls_stream.h"
#include "ap_int.h"
#include "ap_axi_sdata.h"
#include "packet_ids_c.h"

static const int PACKET_NUM=4;
static const int PACKET_LEN=8;

static const unsigned int packet_ids[PACKET_NUM]={Dataout0_0, Dataout0_1, Dataout0_2, Dataout0_3};

unsigned int getPacketId(ap_uint<32> header){
#pragma HLS inline
  ap_uint<32> ID=0;
  ID(4,0)=header(4,0);
  return ID;
}

void hls_packet_receiver(hls::stream<ap_axiu<32,0,0,0>> &in, hls::stream<ap_axiu<32,0,0,0>> &out0,hls::stream<ap_axiu<32,0,0,0>> &out1,hls::stream<ap_axiu<32,0,0,0>> &out2,hls::stream<ap_axiu<32,0,0,0>> &out3,
  const unsigned int total_num_packet){
    for(unsigned int iter=0;iter<total_num_packet;iter++){
      ap_axiu<32,0,0,0> tmp=in.read();//first word is packet header
      unsigned int ID=getPacketId(tmp.data);
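      unsigned int channel=packet_ids[ID]; //look up the output channel; indexing by ID is correct when the generated IDs are 0..PACKET_NUM-1 in order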
      for(int j=0;j<PACKET_LEN;j++){
        tmp=in.read();
        switch(channel){
        case 0:out0.write(tmp);break;
        case 1:out1.write(tmp);break;
        case 2:out2.write(tmp);break;
        case 3:out3.write(tmp);break;
        }
      }
  }
}
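
The receiver above indexes packet_ids directly with the decoded ID, which selects the intended channel when the generated IDs are 0 through 3 in order. As a hedged alternative, the sketch below looks up the channel by searching the ID table, so it does not rely on that ordering. It is written in plain C++ rather than HLS types, and kNumChannels, packetIdOf, and channelOf are hypothetical names that do not appear in the tutorial sources.

```cpp
#include <cstddef>
#include <cstdint>

// Number of output channels; matches PACKET_NUM in the listing above.
constexpr std::size_t kNumChannels = 4;

// Extracts the 5-bit packet ID from bits 4..0 of a header word.
inline unsigned packetIdOf(std::uint32_t header) {
    return header & 0x1Fu;
}

// Returns the index i for which ids[i] == id, or -1 when the ID is unknown.
// ids[i] is the packet ID assigned to output channel i.
inline int channelOf(unsigned id, const unsigned (&ids)[kNumChannels]) {
    for (std::size_t i = 0; i < kNumChannels; ++i) {
        if (ids[i] == id) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In an HLS kernel, a search over four entries like this would typically be fully unrolled. Whether the extra logic is worthwhile depends on whether the generated IDs can be anything other than 0 through 3 in order for the design at hand.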

Note that both the packet sender and the packet receiver read the packet IDs from packet_ids_c.h, which is generated by the AI Engine compiler. The AI Engine compilation must therefore complete before the PL kernels are compiled. If a change on the AI Engine side alters the packet IDs, the PL kernels must be recompiled.
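
Because the PL kernels take these IDs from a generated header, a compile-time guard can catch a stale or unexpected ID table early. The following is a minimal sketch under stated assumptions: the #define values are hypothetical stand-ins for what packet_ids_c.h would provide (the real values come from the AI Engine compiler), and kIds, allFitFiveBits, notIn, and allUnique are illustrative names only.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the macros in packet_ids_c.h; used only when
// the real generated header has not been included.
#ifndef Datain0_0
#define Datain0_0 0
#define Datain0_1 1
#define Datain0_2 2
#define Datain0_3 3
#endif

constexpr unsigned kIds[] = {Datain0_0, Datain0_1, Datain0_2, Datain0_3};
constexpr std::size_t kNumIds = sizeof(kIds) / sizeof(kIds[0]);

// True when every ID fits the 5-bit ID field of the packet header.
constexpr bool allFitFiveBits(const unsigned* ids, std::size_t n) {
    return n == 0 ? true : (ids[0] < 32u && allFitFiveBits(ids + 1, n - 1));
}

// True when v does not occur in the first n entries of ids.
constexpr bool notIn(unsigned v, const unsigned* ids, std::size_t n) {
    return n == 0 ? true : (ids[0] != v && notIn(v, ids + 1, n - 1));
}

// True when no ID appears twice.
constexpr bool allUnique(const unsigned* ids, std::size_t n) {
    return n <= 1 ? true
                  : (notIn(ids[0], ids + 1, n - 1) && allUnique(ids + 1, n - 1));
}

static_assert(allFitFiveBits(kIds, kNumIds), "packet ID exceeds the 5-bit header field");
static_assert(allUnique(kIds, kNumIds), "duplicate packet ID on one stream");
```

These checks do not replace recompiling the PL kernels after the IDs change; they only make an inconsistent table fail at build time instead of at run time.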