Tasks and Channels - 2023.2 (Simplified Chinese)

Vitis High-Level Synthesis User Guide (UG1399)

Document ID: UG1399

Release Date: 2023-12-18

Version: 2023.2 (Simplified Chinese)

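The original DATAFLOW model lets you write sequential functions and then relies on the AMD Vitis™ HLS tool to identify the dataflow processes (tasks), run them in parallel, analyze and manage dependencies, perform scalar propagation, and apply optimizations such as array-to-stream conversion. The hls::task object takes the opposite approach: you explicitly instantiate tasks and channels and manage parallelism yourself in your algorithm design. The purpose of hls::task is to define a programming model that supports parallel tasks connected only by streaming data channels. Tasks are not controlled by function call and return; a task runs whenever data is present on its input streams.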

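Tip: The hls::task library provides concurrent semantics so that C simulation is consistent with the RTL. This eliminates some of the problems found in the sequential dataflow model.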

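The following is an example of tasks and channels. Only streaming interfaces (hls::stream or hls::stream_of_blocks) are used, and the top-level function defines the tasks and stream channels with the hls_thread_local keyword.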

void func1(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
  int data = in.read();
  if (data >= 10)
    out1.write(data);
  else
    out2.write(data);
}
void func2(hls::stream<int> &in, hls::stream<int> &out) {
  out.write(in.read() + 1);
}
void func3(hls::stream<int> &in, hls::stream<int> &out) {
  out.write(in.read() + 2);
}
void top_func(hls::stream<int> &in, hls::stream<int> &out1, hls::stream<int> &out2) {
  hls_thread_local hls::stream<int> s1; // channel connecting t1 and t2
  hls_thread_local hls::stream<int> s2; // channel connecting t1 and t3
 
  hls_thread_local hls::task t1(func1, in, s1, s2); // t1 infinitely runs func1, with input in and outputs s1 and s2
  hls_thread_local hls::task t2(func2, s1, out1);   // t2 infinitely runs func2, with input s1 and output out1
  hls_thread_local hls::task t3(func3, s2, out2);   // t3 infinitely runs func3, with input s2 and output out2
}

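hls::task objects are variables. They should be declared hls_thread_local so that the variable and its underlying thread stay alive across multiple calls to the instantiating function (top_func in the example above). A task object implicitly manages a thread that runs a function continuously, such as func1, func2, or func3 in the example above. That function is the task body, and it has an implicit infinite loop around it.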
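The data-driven behavior described above can be modeled in plain C++ without the HLS library. The sketch below is only an illustration under stated assumptions: `Channel`, the `*_body` functions, `run_until_idle`, and `drain` are hypothetical names, `std::queue` stands in for the stream channels, and a single-threaded loop stands in for the threads that hls::task manages. It does not show how the tool implements tasks.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Plain C++ stand-in for a FIFO channel; this is NOT the hls::stream class.
using Channel = std::queue<int>;

// Task bodies mirror func1/func2/func3 above: each call consumes one input value.
void func1_body(Channel &in, Channel &out1, Channel &out2) {
  assert(!in.empty());  // the scheduler only fires a body when data is present
  int data = in.front();
  in.pop();
  if (data >= 10)
    out1.push(data);
  else
    out2.push(data);
}
void func2_body(Channel &in, Channel &out) {
  assert(!in.empty());
  out.push(in.front() + 1);
  in.pop();
}
void func3_body(Channel &in, Channel &out) {
  assert(!in.empty());
  out.push(in.front() + 2);
  in.pop();
}

// Hypothetical scheduler: keeps firing any task whose input holds data.
// This models the implicit infinite loop around each task body.
void run_until_idle(Channel &in, Channel &out1, Channel &out2) {
  Channel s1, s2;  // internal channels, like s1 and s2 in top_func
  bool fired = true;
  while (fired) {
    fired = false;
    if (!in.empty()) { func1_body(in, s1, s2); fired = true; }
    if (!s1.empty()) { func2_body(s1, out1); fired = true; }
    if (!s2.empty()) { func3_body(s2, out2); fired = true; }
  }
}

// Inspection helper: moves the contents of a channel into a vector.
std::vector<int> drain(Channel &c) {
  std::vector<int> values;
  while (!c.empty()) {
    values.push_back(c.front());
    c.pop();
  }
  return values;
}
```

Pushing 5, 10, 12, 3 into `in` and calling `run_until_idle` leaves 11 and 13 on `out1` and 7 and 5 on `out2`. No caller decides when func2 or func3 runs; the arrival of data does.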

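Each hls::task must be passed a set of arguments consisting of the function name followed by its input and output channels, which are hls::stream or hls::stream_of_blocks objects. In general, an hls::task can read and write only these streaming channels.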

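Both the hls::task and the channels connected to it must be declared hls_thread_local. For the channels, this keeps them alive across multiple calls to the top-level function. Non-stream data, such as scalar and array variables, must be local to the task functions and cannot be passed as arguments, except as noted below in Stable M_AXI and S_AXILITE Accesses.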

Important: Including hls_task.h makes hls::stream and hls::stream_of_blocks read calls blocking in C simulation. Code that previously relied on reading an empty stream will now deadlock during simulation.
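The difference between code that assumes data is present and code that checks for it can be sketched with a small stand-in class. Everything below is an assumption made for illustration: `MockStream` is a hypothetical single-threaded class that borrows only the member names `write`, `read`, `read_nb`, and `empty` from hls::stream, and it throws an exception where a real blocking read would wait forever. `collect_fixed` and `collect_available` are likewise hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a stream channel; NOT the hls::stream class.
template <typename T>
class MockStream {
public:
  void write(const T &v) { q_.push(v); }
  bool empty() const { return q_.empty(); }

  // A blocking read on an empty channel would never return in a
  // single-threaded model, so this stand-in throws instead.
  T read() {
    if (q_.empty())
      throw std::runtime_error("read on empty stream would block");
    T v = q_.front();
    q_.pop();
    return v;
  }

  // Non-blocking read: returns false and leaves v untouched when empty.
  bool read_nb(T &v) {
    if (q_.empty()) return false;
    v = q_.front();
    q_.pop();
    return true;
  }

private:
  std::queue<T> q_;
};

// Fragile pattern: assumes exactly `count` values are available.
std::vector<int> collect_fixed(MockStream<int> &s, std::size_t count) {
  std::vector<int> out;
  for (std::size_t i = 0; i < count; ++i) out.push_back(s.read());
  return out;
}

// Safer pattern: consumes only the values that are actually present.
std::vector<int> collect_available(MockStream<int> &s) {
  std::vector<int> out;
  int v = 0;
  while (s.read_nb(v)) out.push_back(v);
  assert(s.empty());  // everything that was present has been consumed
  return out;
}
```

On a stream holding a single value, `collect_fixed(s, 2)` raises the exception that stands in for a deadlock, while `collect_available(s)` returns that one value and stops.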

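Unsynchronized Pointer-to-Array and Scalar I/O Access

You can also pass scalar values (both local and top-level arguments) and pointers to array arguments of the top-level function, provided they are marked with the STABLE pragma or directive, as described in Un-synchronized I/O in Data-Driven TLP. You must take care that one of two conditions holds. Either these values never change while the kernel is executing, which is almost impossible to guarantee when hls::task is instantiated alone at the top level without regular dataflow processes. Or the kernel's behavior does not depend on when the values of these arguments change: for example, the process can tolerate a change at any point in time, or some other stream-based synchronization mechanism regulates access to them.

Scalar values are passed by reference: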

void test(hls::stream<int> &in, hls::stream<int> &out, int &n)

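For C/RTL co-simulation, stable top-level pointers with the m_axi protocol and an s_axilite offset must be enabled with the cosim.enable_tasks_with_m_axi command, as described in Co-Simulation Configuration.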

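The following hls::task design has a stable by-reference scalar argument. Its behavior is largely insensitive to the exact moment at which the value of that argument changes: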

void task1(hls::stream<int> &in, hls::stream<int> &out) {
...
}

void task2(hls::stream<int> &in, hls::stream<int> &out) {
...
}

void task3(hls::stream<int> &in, hls::stream<int> &out, int &n) {
  int c = in.read();
  out.write(c + n);
}

void test(hls::stream<int> &in, hls::stream<int> &out, int &n) {
#pragma HLS stable variable=n
  HLS_TASK_STREAM<int> s1;
  HLS_TASK_STREAM<int> s2;
  HLS_TASK t1(task1, in, s1);
  HLS_TASK t2(task2, s1, s2);
  HLS_TASK t3(task3, s2, out, n);
}
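When the exact value applied to each sample matters, one option mentioned above is to regulate access through a stream. A minimal plain C++ sketch of that idea follows, assuming the parameter can travel with the data in the same token. `Token`, `add_offset_body`, and `run_all` are hypothetical names, and `std::queue` again stands in for a stream channel; this is a model of the concept rather than HLS code.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// One token carries the sample together with the offset to apply to it.
struct Token {
  int value;
  int offset;
};

// Models a task3-like body, but takes the offset from the token instead
// of reading an unsynchronized by-reference scalar.
void add_offset_body(std::queue<Token> &in, std::queue<int> &out) {
  assert(!in.empty());
  Token t = in.front();
  in.pop();
  out.push(t.value + t.offset);
}

// Fires the body once per available token and returns the results in order.
std::vector<int> run_all(std::queue<Token> &in) {
  std::queue<int> out;
  while (!in.empty()) add_offset_body(in, out);
  std::vector<int> result;
  while (!out.empty()) {
    result.push_back(out.front());
    out.pop();
  }
  return result;
}
```

Because the offset arrives in the same token as the sample, the result depends only on the token order and not on when the producer decided to change the offset. Tokens (1, 100), (2, 100), (3, 200) yield 101, 102, 203.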

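The following hls::task design has a stable m_axi pointer argument in the top-level function. Accesses to the underlying DRAM buffer are not synchronized with process execution. The if (mem) statement can be used to ensure that the DRAM buffer is accessed only after the host code has initialized the offset register with the address of the buffer in DRAM.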

Tip: Because the offset register of the m_axi interface automatically uses the ap_none protocol, both the C++ and the RTL re-read its value only when write_process executes again.

...
void write_process(hls::stream<int>& in, hls::stream<int>& out, int* mem)
{
#pragma HLS PIPELINE style=flp
...
  if (mem) {
    mem[...] = ...;
...
    ... = mem[...];
  }
...
}
...
void stable_pointer(int* mem, hls::stream<int>& in, hls::stream<int>& out)
{
#pragma HLS INTERFACE mode=m_axi port=mem ...
#pragma HLS stable variable=mem

    hls_thread_local hls::stream<int> int_fifo("int_fifo");
    hls_thread_local hls::stream<int> int_fifo2("int_fifo2");

    hls_thread_local hls::task t1(process_23, in, int_fifo);
    hls_thread_local hls::task t2(process_11, int_fifo, int_fifo2);
    hls_thread_local hls::task t3(write_process, int_fifo2, out, mem);
}

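Using Flushing Pipelines

In general, hls::task designs must use flushing pipelines (flp) or free-running pipelines (frp), as described in Flushing Pipelines and Pipeline Types. Non-flushing pipelines introduce dependencies between process executions, which can lead to unexpected deadlocks.

Note: You can configure the default flushing behavior inside hls::tasks with syn.compile.pipeline_flush_in_task, as described in Compile Options.

Nested Tasks

In the following example, task2 uses two instances of task1, and both are themselves instantiated as hls::task instances. This shows that the body of an hls::task can be either a sequential function or a function that contains only hls::task objects.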

void task1(hls::stream<int> &in, hls::stream<int> &out) {
  hls_thread_local hls::stream<int> s1;
 
  hls_thread_local hls::task t1(func2, in, s1);  
  hls_thread_local hls::task t2(func3, s1, out);
}
void task2(hls::stream<int> &in1, hls::stream<int> &in2, hls::stream<int> &out1, hls::stream<int> &out2) {
  hls_thread_local hls::task tA(task1, in1, out1);
  hls_thread_local hls::task tB(task1, in2, out2);
}

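hls_thread_local is still required here. It makes multiple instantiation safe for the intermediate networks tA and tB (both instances of task1 in this example). It does the same for the leaf-level processes: t1, which exists inside both tA and tB and runs separate copies of func2, and t2, which also exists inside both tA and tB.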

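Simulation and Co-simulation

The C simulation behavior of the tasks and channels model is the same as its C/RTL co-simulation behavior. Reading from an empty stream used to be allowed, with only a warning that this condition could cause hangs during simulation. In Vitis HLS 2022.2, reading from an empty stream can cause a deadlock even in C simulation, so it is now an error and produces the following messages:

In designs that contain hls::task objects: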

ERROR [HLS SIM]: deadlock detected when simulating hls::tasks. 
Execute C-simulation in debug mode in the GUI and examine the source code 
location of all the blocked hls::stream::read() calls

In designs that do not use hls::task:

ERROR [HLS SIM]: an hls::stream is read while empty, which may result in 
RTL simulation hanging. If this is not expected, execute C simulation in debug mode
in the GUI and examine the source code location of the blocked hls::stream::read() 
call to debug. If this is expected, add -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to 
-cflags to turn this error into a warning and allow empty hls::stream reads to return
 the default value for the data type.

Tip: Adding -DHLS_STREAM_READ_EMPTY_RETURNS_GARBAGE to -cflags turns this error into a warning.