Creating Traffic Generators in Python and C++ - 2022.1 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID
UG1076
Release Date
2022-05-25
Version
2022.1 English

Overview

Simulation and emulation with external traffic generators are run by launching the simulator or emulator and the traffic generator (TG) in parallel. A TG can be written in either Python or C++, using the multi-threading capabilities of these languages.

Writing a Traffic Generator in Python

Writing the traffic generator in Python requires various libraries to be imported:
# Mandatory
import os, sys

import multiprocessing as mp
import threading
import struct

from xilinx_xtlm import ipc_axis_master_util
from xilinx_xtlm import ipc_axis_slave_util
from xilinx_xtlm import xtlm_ipc

# Optional, just for ease of use
import numpy as np
import logging
Each port to or from the AI Engine array should be handled by a function that is launched as a separate process. First, create a blocking transport utility for the port. The port-handling function uses this utility to exchange data with the AI Engine array:
mm2s_util = ipc_axis_master_util("DataIn1")
self.s2mm_util = ipc_axis_slave_util("DataOut1")
The function handling the port (mm2s in the example below) should be launched as a separate process:
tx = mp.Process(target=mm2s)
tx.start()
When the port-handling function has finished, the parent should wait for the process to terminate:
tx.join()
join() is a blocking call: it returns only after the target function has returned and the process has ended.
The parent process and a child process can communicate through pipes. The parent process creates the pipe, and data is exchanged with the send and recv methods:
parent0, child0 = mp.Pipe()

child0.send(Tx_data)
Rx_data = parent0.recv()
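The pipe pattern above can be sketched as follows. This is a minimal illustration, not part of the traffic generator API: the producer and receive_all functions, the None end-of-data sentinel, and the sample values are all hypothetical. A thread stands in for the child process so that the snippet runs without a simulator attached. With mp.Process(target=producer, args=(child0, tx_data)), the calls on the connection objects would be the same.

```python
import multiprocessing as mp
import threading


def producer(conn, samples):
    """Hypothetical port handler: send each block, then a None sentinel."""
    for block in samples:
        conn.send(block)
    conn.send(None)  # tells the receiver that no more data will arrive
    conn.close()


def receive_all(conn):
    """Collect blocks from the pipe until the None sentinel arrives."""
    received = []
    while True:
        block = conn.recv()  # blocks until the other end sends something
        if block is None:
            return received
        received.append(block)


# The parent declares the pipe; each end is a Connection object.
parent0, child0 = mp.Pipe()

tx_data = [[1, 2, 3, 4], [5, 6, 7, 8]]  # illustrative payloads

# A thread stands in for the child process in this sketch.
worker = threading.Thread(target=producer, args=(child0, tx_data))
worker.start()

rx_data = receive_all(parent0)
worker.join()  # wait for the handler to finish
```

The sentinel is one possible way to signal the end of the data. A fixed, agreed number of blocks would work equally well.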
If the port is an AI Engine to programmable logic port, the data must first be read from the port:
payload = self.s2mm_util.sample_transaction()
The variable payload is a structure that contains the following fields:
  • data_length is the length of the data in bytes.
  • data is the data itself.
  • tlast is the TLAST flag, which is set to true or false.
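As an illustration of how these three fields might be used on the receiving side, the sketch below accumulates data until a packet with tlast set is seen. StandInPacket is a hypothetical stand-in that only mirrors the field names listed above. It is not the xilinx_xtlm packet class, and slicing by data_length is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class StandInPacket:
    """Hypothetical stand-in mirroring the three fields described above."""
    data: bytes
    data_length: int
    tlast: bool


def collect_frame(packets):
    """Concatenate packet data up to and including the first packet with tlast set."""
    frame = bytearray()
    for pkt in packets:
        # Assumption of this sketch: only the first data_length bytes are valid.
        frame += pkt.data[:pkt.data_length]
        if pkt.tlast:
            break
    return bytes(frame)


incoming = [
    StandInPacket(b"\x01\x02\x03\x04", 4, False),
    StandInPacket(b"\x05\x06\x07\x08", 4, True),   # end of the first frame
    StandInPacket(b"\x09\x0a\x0b\x0c", 4, False),  # belongs to the next frame
]
frame = collect_frame(incoming)
```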
If the port is a programmable logic to AI Engine port, you must first create a packet:
payload = xtlm_ipc.axi_stream_packet()
Then set the values of the different fields and send the packet to the AI Engine array using the b_transport method:
mm2s_util.b_transport(payload)

Formatting Data with Traffic Generators in Python

To emulate AXI4-Stream transactions, AXI traffic generators require the payload data to be broken into appropriately sized bursts. For example, sending 128 bytes over a PLIO that is 32 bits (4 bytes) wide requires 128 bytes / 4 bytes = 32 AXI4-Stream transactions. The conversion between byte arrays and AXI transactions can be handled in Python.
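The arithmetic above can be checked with a short sketch. The split_into_beats helper is hypothetical, and raising TLAST only on the final transaction is an assumption made for illustration. A real design may frame the data differently.

```python
def split_into_beats(payload: bytes, plio_width_bits: int):
    """Split payload into (chunk, tlast) pairs, one per AXI4-Stream transaction."""
    beat_bytes = plio_width_bits // 8
    if len(payload) % beat_bytes != 0:
        raise ValueError("payload length must be a multiple of the PLIO width")
    n_beats = len(payload) // beat_bytes
    return [
        # Assumption: tlast is raised only on the final transaction.
        (payload[i * beat_bytes:(i + 1) * beat_bytes], i == n_beats - 1)
        for i in range(n_beats)
    ]


payload = bytes(range(128))            # 128 bytes of illustrative data
beats = split_into_beats(payload, 32)  # PLIO width of 32 bits (4 bytes)
```

With the same 128-byte payload, a 64-bit PLIO gives 16 transactions and a 128-bit PLIO gives 8.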

The Python struct library provides a mechanism to convert between Python and C data types. Specifically, the struct.pack and struct.unpack functions pack and unpack byte arrays according to a format string argument. The following table shows format strings for common C data types and PLIO widths.

For more information see: https://docs.python.org/3/library/struct.html
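A minimal round trip with struct.pack and struct.unpack is sketched below for 16-bit signed integers. The sample values are illustrative. The format string is built the same way as in the table that follows: a byte-order character, an element count, and a type code.

```python
import struct

samples = [-3, 0, 7, 32767]  # illustrative int16 values

# "<" selects little-endian, "h" is a 16-bit signed integer.
format_string = "<" + str(len(samples)) + "h"
byte_array = struct.pack(format_string, *samples)

# Going the other way, the element count is derived from the byte length.
unpack_format = "<" + str(len(byte_array) // 2) + "h"
recovered = list(struct.unpack(unpack_format, byte_array))
```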

Table 1. Format Strings for C Data Types and PLIO Widths

Data Type: cfloat
PLIO Width: PLIO32
Python Code Snippet: N/A

Data Type: cfloat
PLIO Width: PLIO64, PLIO128
Python Code Snippet:
  rVec = np.real(data)
  iVec = np.imag(data)
  out2column = np.zeros((L,2)).astype(np.single)
  out2column.tobytes()
  formatString = "<"+str(len(byte_arry)//4)+"f"

Data Type: cint16
PLIO Width: PLIO32, PLIO64, PLIO128
Python Code Snippet:
  rVec = np.real(data).astype(np.int16)
  iVec = np.imag(data).astype(np.int16)
  formatString = "<"+str(len(byte_arry)//2)+"h"

Data Type: int8
PLIO Width: PLIO32, PLIO64, PLIO128
Python Code Snippet:
  intvec = np.real(data).astype(np.int8)
  formatString = "<"+str(len(byte_arry)//1)+"b"

Data Type: int32
PLIO Width: PLIO32, PLIO64, PLIO128
Python Code Snippet:
  intvec = np.real(data).astype(np.int32)
  formatString = "<"+str(len(byte_arry)//4)+"i"
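For complex types, the real and imaginary parts have to be placed side by side in the byte stream. The sketch below shows one way to do this for cint16 using only the standard library instead of NumPy. The sample values are illustrative, and placing the real part before the imaginary part is an assumption of this sketch.

```python
import struct

data = [complex(1, -2), complex(3, 4), complex(-5, 6)]  # illustrative samples

# Interleave real and imaginary parts: re0, im0, re1, im1, ...
interleaved = []
for value in data:
    interleaved.append(int(value.real))
    interleaved.append(int(value.imag))

# Two int16 values ("h") per complex sample, little-endian.
byte_array = struct.pack("<" + str(len(interleaved)) + "h", *interleaved)

# The format string is rebuilt from the byte length, as in the table.
format_string = "<" + str(len(byte_array) // 2) + "h"
flat = struct.unpack(format_string, byte_array)
recovered = [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
```

Each cint16 sample occupies 4 bytes, so one sample fits in a PLIO32 transaction, two in PLIO64, and four in PLIO128.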

Writing a Traffic Generator in C++

An external traffic generator written in C++ depends on several libraries and their headers. The corresponding Makefile settings are:
# Libraries directories
PROTO_PATH=$(XILINX_VIVADO)/data/simmodels/xsim/2022.1/lnx64/6.2.0/ext/protobuf/
IPC_XTLM= $(XILINX_VIVADO)/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/cpp/src/
IPC_XTLM_INC= $(XILINX_VIVADO)/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/cpp/inc/
LOCAL_IPC= $(IPC_XTLM)../

LD_LIBRARY_PATH:=$(XILINX_VIVADO)/data/simmodels/xsim/2022.1/lnx64/6.2.0/ext/protobuf/:$(XILINX_VIVADO)/lib/lnx64.o/Default:$(XILINX_VIVADO)/lib/lnx64.o/:$(LD_LIBRARY_PATH)

# Kernel directories
PLKERNELS_DIR := ../../pl_kernels
PLKERNELS := $(PLKERNELS_DIR)/polar_clip.cpp
PLHEADERS := $(PLKERNELS_DIR)/polar_clip.hpp $(PLKERNELS_DIR)/s2mm.hpp $(PLKERNELS_DIR)/mm2s.hpp

# XTLM source files
IPC_SRC := $(LOCAL_IPC)/src/axis/*.cpp $(LOCAL_IPC)/src/common/*.cpp $(LOCAL_IPC)/src/common/*.cc

# Compiler/linker flags
INC_FLAGS := -I$(LOCAL_IPC)/inc -I$(LOCAL_IPC)/inc/axis/ -I$(LOCAL_IPC)/inc/common/ -I$(PROTO_PATH)/include/ -I$(PLKERNELS_DIR) -I$(XILINX_HLS)/include
LIB_FLAGS := -L$(PROTO_PATH)/ -lprotobuf -L$(XILINX_VIVADO)/lib/lnx64.o/ -lrdizlib -L$(GCC)/../../lib64/ -lstdc++ -lpthread

# Compilation
compile: main.cpp $(PLHEADERS) $(PLKERNELS)
  $(GCC) -g main.cpp $(PLKERNELS) $(IPC_SRC) $(INC_FLAGS) $(LIB_FLAGS) -o chain
The headers needed to use these libraries are:
// For the traffic generator
#include "xtlm_ipc.h"
#include <chrono>
#include <iostream>
#include <memory>
#include <thread>
The socket declaration depends on whether the traffic generator is a transmitter or a receiver (it can be both):
// Transmitter Traffic Generator
using b_init_socket = xtlm_ipc::axis_initiator_socket_util<xtlm_ipc::BLOCKING>;

// Receiver Traffic Generator
using b_targ_socket = xtlm_ipc::axis_target_socket_util<xtlm_ipc::BLOCKING>;
In this example, each part of the traffic generator is implemented as a class:
class mm2s
{
std::thread m_thread;
std::unique_ptr<b_init_socket> m_socket_ptr;
int count;

void sock_data_handler()
{
m_socket_ptr = std::make_unique<b_init_socket>(m_sock_name);
std::vector<char> data_to_send;

while (count<512)
  {
  // Create the data to send to the AI Engine array (vector of bytes)
  data_to_send = ...;

  m_socket_ptr->transport(data_to_send,count%128==127?true:false); // transport(data, tlast),   128 sample frame

  count++;
  }
}


protected :
// Name of the socket
const std::string m_sock_name;

public:
mm2s(const std::string sock_name) :
m_sock_name(sock_name), m_socket_ptr(nullptr),count(0)
{}

void run()
{
m_thread = std::thread(&mm2s::sock_data_handler, this);
}

// This function allows the user to check for the end of the transmission
int dataTransferred()
{
return(count);
}

// The destructor ends the thread
virtual ~mm2s()
{
std::cout << this->m_sock_name << " before join " << std::endl;
if(m_thread.joinable())
  m_thread.join();
std::cout << this->m_sock_name << " after join " << std::endl;
}
};
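The TLAST condition used in the transport call, count%128==127, marks the last transfer of each 128-sample frame. The short sketch below, written in Python for brevity, lists the transfer indices at which TLAST is raised over the 512 transfers of the loop. The constant and function names are hypothetical.

```python
FRAME_LENGTH = 128     # transfers per frame, as in the class above
TOTAL_TRANSFERS = 512  # loop bound used in sock_data_handler


def is_last_in_frame(count, frame_length=FRAME_LENGTH):
    """Return True when count is the final transfer of a frame."""
    return count % frame_length == frame_length - 1


# Indices at which TLAST would be raised.
tlast_positions = [c for c in range(TOTAL_TRANSFERS) if is_last_in_frame(c)]
```

The 512 transfers therefore form four complete frames, with TLAST raised on transfers 127, 255, 383, and 511.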
The main function is simple: it only starts the components of the traffic generator, inserting delays between them to give the system time to initialize:
int main(int argc, char *argv[])
{

mm2s chain_1_mm2s("DataIn1");
polar_clip chain_1_pc ("clip_in", "clip_out");
s2mm chain_1_s2mm("DataOut1");

using namespace std::chrono_literals;

chain_1_mm2s.run();
std::cout << "Started mm2s " << std::endl;
std::this_thread::sleep_for(500ms);

chain_1_pc.run();
std::cout << "Started polar_clip " << std::endl;
std::this_thread::sleep_for(400ms);

chain_1_s2mm.run();
std::cout << "Started s2mm " << std::endl;

// Waits for the end of the simulation (1024 samples received from S2MM block)
while(chain_1_s2mm.dataTransferred()!=1024)
  {
  // Waits 2s before retesting
  std::this_thread::sleep_for(2s);
  }
return(0);
}
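The start-then-poll structure of main can be sketched in Python as follows. CountingReceiver is a hypothetical stand-in for the s2mm class: it only counts samples on a worker thread and does not open any socket. The lock protects the counter, which is read from one thread and written from another.

```python
import threading
import time


class CountingReceiver:
    """Hypothetical stand-in for the s2mm class: counts samples on a worker thread."""

    def __init__(self, total):
        self._total = total
        self._count = 0
        self._lock = threading.Lock()
        self._thread = None

    def run(self):
        """Start the worker thread, like run() in the C++ classes."""
        self._thread = threading.Thread(target=self._receive)
        self._thread.start()

    def _receive(self):
        for _ in range(self._total):
            with self._lock:
                self._count += 1

    def data_transferred(self):
        """Return the number of samples received so far."""
        with self._lock:
            return self._count

    def join(self):
        if self._thread is not None:
            self._thread.join()


receiver = CountingReceiver(total=1024)
receiver.run()

# Poll until all samples have arrived, sleeping between checks.
while receiver.data_transferred() != 1024:
    time.sleep(0.01)

receiver.join()
final_count = receiver.data_transferred()
```

In C++, declaring the counter as std::atomic<int> would be one way to get the same protection for a value shared between the worker thread and main.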

The advantage of a C++ traffic generator is that you can use and test your HLS kernels as soon as they are written, without having to synthesize them into a .xo file. This lets you make your simulations progressively more realistic and flexible without having to rebuild the .xclbin file.