Using ap_ctrl_none Inside the Dataflow - 2020.2 English

Vitis High-Level Synthesis User Guide (UG1399)

Document ID
UG1399
Release Date
2021-03-22
Version
2020.2 English

The ap_ctrl_none block-level I/O protocol avoids the rigid synchronization scheme implied by the ap_ctrl_hs and ap_ctrl_chain protocols. These protocols require that all processes in the region are executed exactly the same number of times in order to better match the C/C++ behavior.

However, there are situations where, for example, the intent is to have a faster process that executes more frequently to distribute work to several slower ones.

For any dataflow region (except "dataflow-in-loop"), it is possible to specify #pragma HLS interface ap_ctrl_none port=return as long as all the following conditions are satisfied:

  • The region and all the processes it contains communicates only via FIFOs (hls::stream, streamed arrays, AXIS); that is, excluding memories.

  • All the parents of the region, up to the top level design, must fit the following requirements:
    • They must be dataflow regions (excluding "dataflow-in-loop").
    • They must all specify ap_ctrl_none.

This means that none of the parents of a dataflow region with ap_ctrl_none in the hierarchy can be:

  • A sequential or pipelined FSM
  • A dataflow region inside a for loop ("dataflow-in-loop")

The result of this pragma is that ap_ctrl_chain is not used to synchronize any of the processes inside that region. They are executed or stalled based on the availability of data in their input FIFOs and space in their output FIFOs. For example:

void region(...) {
#pragma HLS dataflow
#pragma HLS interface ap_ctrl_none port=return
    hls::stream<int> outStream1, outStream2;
    demux(inStream, outStream1, outStream2);
    worker1(outStream1, ...);
    worker2(outStream2, ....);

In this example, demux can be executed twice as frequently as worker1 and worker2. For example, it can have II=1 while worker1 and worker2 can have II=2, and still achieving a global II=1 behavior.

Note:
  • Non-blocking reads may need to be used very carefully inside processes that are executed less frequently to ensure that C/C++ simulation works.
  • The pragma is applied to a region, not to the individual processes inside it.
  • Deadlock detection must be disabled in co-simulation. This can be done with the -disable_deadlock_detection option in cosim_design.