Free-Running PE - 2023.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID
UG1393
Release Date
2023-12-13
Version
2023.2 English

The PEs in a pipeline operate synchronously on the transactions passing through it. For every compute() call, a PE starts and stops exactly once. However, when a PE is marked as FREE_RUNNING, as described in Guidance Macros, it has the following hardware semantics:

  • The PE does not start, stop, or reset per transaction or compute() call. It is an HLS kernel with the ap_ctrl_none control interface, as described in Block-Level Control Protocols.
  • The interface must have only AXI4-Stream arguments, or scalar inputs
  • Operation is data-driven, with the PE acting only on the input stream words, and unaware of the payload size of the transaction
  • The PE begins execution immediately after the hardware bitstream is programmed into the device
Figure 1. Free-Running

The figure above shows a diagram of the free-running PE. This accelerator contains two PEs: ldst, which has global memory access, and fsk_incr, which is a free-running PE. In the compute() scope, these PEs are connected by two AXI4-Stream interfaces: AS, which moves words from ldst to fsk_incr, and XS, which is the feedback path from fsk_incr back to ldst.

The code for this example is provided below.

class fsk_acc : public VPP_ACC<fsk_acc, NCU>
{
    ZERO_COPY(A);
    ZERO_COPY(X);
    SYS_PORT(A, DDR[0]);
    SYS_PORT(X, DDR[0]);
    FREE_RUNNING (fsk_incr);
public:
    static void compute(DT* A, DT* X, int sz);
    static void ldst(DT* A, DT* X, int sz, hls::stream<DT>& AS,
                     hls::stream<DT>& XS);
    static void fsk_incr(hls::stream<DT>& AS, hls::stream<DT>& XS);
};
void fsk_acc::compute(DT* A, DT* X, int sz)
{
    static vpp::stream<DT> AS, XS;
    ldst(A, X, sz, AS, XS);
    fsk_incr(AS, XS);
}
void fsk_acc::ldst(DT* A, DT* X, int sz, hls::stream<DT>& AS, 
                   hls::stream<DT>& XS)
{
    for (int i = 0; i < sz; i++) {
        AS.write(A[i]);
    }
    for (int i = 0; i < sz; i++) {
        XS.read(X[i]);
    }
}
void fsk_acc::fsk_incr(hls::stream<DT>& AS, hls::stream<DT>& XS)
{
    DT val;
    AS.read(val);
    XS.write(val + 1);
}

The ldst PE operates on sz words, reading them from the global memory port A and writing the results to the global memory port X. In contrast, the free-running PE fsk_incr is agnostic to sz and reacts only to the words arriving on the incoming AS stream.
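The data-driven behavior can be illustrated with a small software model. This is only a sketch, not VSC or HLS code: Stream is a hypothetical stand-in for an AXI4-Stream channel built on std::queue, DT is assumed to be int, and fsk_incr_step and run_transaction are illustrative names. The point it shows is that the free-running step takes no size argument and fires once per available input word, while only the load/store side knows sz.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

using DT = int;  // assumption: the example above does not define DT

// Hypothetical software stand-in for an AXI4-Stream channel (not hls::stream).
struct Stream {
    std::queue<DT> q;
    bool empty() const { return q.empty(); }
    void write(DT v) { q.push(v); }
    DT read() {
        assert(!q.empty());  // the model never reads an empty stream
        DT v = q.front();
        q.pop();
        return v;
    }
};

// One firing of the free-running PE: consume one word, produce one word.
// It has no notion of sz or of transaction boundaries.
void fsk_incr_step(Stream& AS, Stream& XS) {
    XS.write(AS.read() + 1);
}

// Model of one compute() transaction as seen from the ldst side.
std::vector<DT> run_transaction(const std::vector<DT>& A, Stream& AS, Stream& XS) {
    for (DT a : A) {
        AS.write(a);               // ldst sends sz words
    }
    while (!AS.empty()) {
        fsk_incr_step(AS, XS);     // the free-running PE reacts to each word
    }
    std::vector<DT> X;
    for (std::size_t i = 0; i < A.size(); ++i) {
        X.push_back(XS.read());    // ldst receives sz words
    }
    return X;
}
```

In this model the same two Stream objects can be reused for transactions of different sizes, which mirrors the way the free-running PE keeps running across compute() calls.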

The free-running semantics described earlier greatly simplify the implementation of a free-running PE, often reducing the FPGA resource usage and routing required. They enable a streaming pipeline design in which the intermediate PEs are free-running and operate only on their input AXI4-Stream data.

With any pipeline composition, when hardware replication is enabled (NCU is greater than 1), the hardware contains NCU replicated pipelines, and compute() jobs run on any available pipeline slot. Thus, the application layer remains simple, and running data through the multiple hardware pipelines is automated.
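As a purely conceptual sketch of "jobs run on an available pipeline slot", the model below tracks which of the NCU replicated pipelines are busy. PipelinePool, acquire, and release are hypothetical names introduced for illustration; the actual scheduling is handled by the runtime and its policy is not described in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of NCU replicated pipelines; not part of the VSC API.
struct PipelinePool {
    std::vector<bool> busy;

    explicit PipelinePool(std::size_t ncu) : busy(ncu, false) {}

    // Claim the first free pipeline slot; return its index, or -1 if all are busy.
    int acquire() {
        for (std::size_t i = 0; i < busy.size(); ++i) {
            if (!busy[i]) {
                busy[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Mark a slot as free again once its compute() job has finished.
    void release(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < busy.size());
        busy[static_cast<std::size_t>(slot)] = false;
    }
};
```

With NCU set to 2 in this model, two jobs can be in flight at once; a third waits until one of the slots is released. The application code does not need to choose a slot itself.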