Border Pixels

Vitis High-Level Synthesis User Guide (UG1399)

Document ID

UG1399

Release Date

2021-12-15

Version

2021.2 Chinese

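The final step of the algorithm replicates the edge pixels into the border region. As in the earlier stages, the algorithm uses hls::stream and local caching to keep the data flowing without interruption and to reuse data.

The following figure shows how the border samples are aligned with the image.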

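Each sample is read from the vconv output of the vertical convolution.
The sample is then cached as one of four possible pixel types.
The sample is then written to the output stream.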

Figure 1. Streaming Border Samples

The code that determines the location of the border pixels is:

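Border:for (int i = 0; i < height; i++) {
  for (int j = 0; j < width; j++) {
    // Note: in plain C++ these would have to be declared outside the loops
    // for the edge values to persist from one iteration to the next.
    T pix_in, l_edge_pix, r_edge_pix, pix_out;
#pragma HLS PIPELINE
    // Read a new sample only on rows that consume fresh vconv data
    if (i == 0 || (i > border_width && i < height - border_width)) {
      if (j < width - (K - 1)) {
        pix_in = vconv.read();
        borderbuf[j] = pix_in;   // cache the sample for reuse
      }
      if (j == 0) {
        l_edge_pix = pix_in;     // left edge pixel of the row just read
      }
      if (j == width - K) {
        r_edge_pix = pix_in;     // right edge pixel of the row just read
      }
    }
    // Select the output: left border, right border, or cached interior sample
    if (j <= border_width) {
      pix_out = l_edge_pix;
    } else if (j >= width - border_width - 1) {
      pix_out = r_edge_pix;
    } else {
      pix_out = borderbuf[j - border_width];
    }
    dst << pix_out;
  }
}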

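A notable difference in this new code is the extensive use of conditionals inside the task. This allows the task, once pipelined, to process data continuously. The outcome of a conditional does not affect the execution of the pipeline: it affects the output values, but the pipeline keeps processing as long as input samples are available.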
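The behavior of the border stage can be checked with a software-only model. The following is a minimal sketch, not the tool's or the guide's implementation: `std::queue` stands in for `hls::stream` so that it compiles without the Vitis HLS headers, the function name `border_model` is hypothetical, and `border_width = K / 2` with an odd `K` is an assumption. The edge variables are declared outside the loops here so that their values persist between iterations.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Software-only model of the border stage (hypothetical helper).
// std::queue replaces hls::stream; the returned vector replaces dst.
template <typename T, int K>
std::vector<T> border_model(int width, int height, std::queue<T> &vconv) {
    const int border_width = K / 2;             // assumption: K is odd
    std::vector<T> borderbuf(width - (K - 1));  // one row of valid samples
    std::vector<T> dst;
    // Declared outside the loops so the values persist across iterations.
    T pix_in = T(), l_edge_pix = T(), r_edge_pix = T();

    for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            T pix_out;
            // Read only on rows that consume a fresh row of vconv data.
            if (i == 0 || (i > border_width && i < height - border_width)) {
                if (j < width - (K - 1)) {
                    pix_in = vconv.front();
                    vconv.pop();
                    borderbuf[j] = pix_in;
                }
                if (j == 0) l_edge_pix = pix_in;
                if (j == width - K) r_edge_pix = pix_in;
            }
            // The conditionals change only the value written, never whether
            // an output is produced: one output sample per iteration.
            if (j <= border_width) {
                pix_out = l_edge_pix;
            } else if (j >= width - border_width - 1) {
                pix_out = r_edge_pix;
            } else {
                pix_out = borderbuf[j - border_width];
            }
            dst.push_back(pix_out);
        }
    }
    return dst;
}
```

With `K = 3` and a 5 x 5 output, a 3 x 3 block of valid samples is expanded by one pixel on every side, and the border rows reuse the cached row instead of reading from the stream.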

The final code for this FPGA-friendly algorithm uses the following optimization directives.

template<typename T, int K>
static void convolution_strm(
    int width,
    int height,
    hls::stream<T> &src,
    hls::stream<T> &dst,
    const T *hcoeff,
    const T *vcoeff)
{
#pragma HLS DATAFLOW
#pragma HLS ARRAY_PARTITION variable=linebuf dim=1 type=complete

  hls::stream<T> hconv("hconv");
  hls::stream<T> vconv("vconv");
  // These assertions let HLS know the upper bounds of loops
  assert(height < MAX_IMG_ROWS);
  assert(width < MAX_IMG_COLS);
  assert(vconv_xlim < MAX_IMG_COLS - (K - 1));

  // Horizontal convolution
  HConvH:for(int col = 0; col < height; col++) {
    HConvW:for(int row = 0; row < width; row++) {
#pragma HLS PIPELINE
      HConv:for(int i = 0; i < K; i++) {
      }
    }
  }
  // Vertical convolution
  VConvH:for(int col = 0; col < height; col++) {
    VConvW:for(int row = 0; row < vconv_xlim; row++) {
#pragma HLS PIPELINE
#pragma HLS DEPENDENCE variable=linebuf type=inter dependent=false
      VConv:for(int i = 0; i < K; i++) {
      }
    }
  }

  // Border pixels
  Border:for (int i = 0; i < height; i++) {
    for (int j = 0; j < width; j++) {
#pragma HLS PIPELINE
    }
  }
}

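Each task is pipelined at the sample level. The line buffer is fully partitioned into registers to ensure that there are no read or write limitations due to insufficient block RAM ports. The line buffer also requires a dependence directive. All of the tasks execute in a dataflow region, which ensures that the tasks run concurrently. The hls::stream objects are automatically implemented as FIFOs with 1 element.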
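To make the role of the line buffer more concrete, the following is a minimal software-only sketch of a vertical convolution stage. It is an illustration under stated assumptions, not the loop body from the guide (which is not shown in the listing above): `std::queue` stands in for `hls::stream`, the names `vconv_model` and `MAX_COLS_SKETCH` are hypothetical, `K` is assumed to be at least 2, and the HLS directives appear only as comments so that the sketch compiles with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical column bound, standing in for MAX_IMG_COLS.
const int MAX_COLS_SKETCH = 64;

// Software-only model of a vertical convolution with a line buffer.
template <typename T, int K>
void vconv_model(int height, int vconv_xlim, std::queue<T> &hconv,
                 std::queue<T> &vconv, const T *vcoeff) {
    // Holds the previous K-1 rows. In an HLS design this is the array that
    // would carry: #pragma HLS ARRAY_PARTITION variable=linebuf dim=1 type=complete
    // so that all K-1 rows of a column can be accessed in the same cycle.
    T linebuf[K - 1][MAX_COLS_SKETCH] = {};

    for (int col = 0; col < height; col++) {           // image rows
        for (int row = 0; row < vconv_xlim; row++) {   // image columns
            // HLS version: #pragma HLS PIPELINE
            // HLS version: #pragma HLS DEPENDENCE variable=linebuf type=inter dependent=false
            // Each iteration reads and writes only column `row` of linebuf,
            // so consecutive iterations touch different addresses.
            T in_val = hconv.front();
            hconv.pop();
            T out_val = 0;
            for (int i = 0; i < K; i++) {
                // Window: K-1 buffered samples plus the newly read sample.
                T vwin_val = (i < K - 1) ? linebuf[i][row] : in_val;
                out_val += vwin_val * vcoeff[i];
                if (i > 0) linebuf[i - 1][row] = vwin_val;  // shift the column up
            }
            // The first K-1 rows only fill the buffer; no output yet.
            if (col >= K - 1) vconv.push(out_val);
        }
    }
}
```

Because every buffered row of a column is read and rewritten within a single iteration, mapping `linebuf` to a memory with a limited number of ports would restrict how often a new sample can be accepted, which is the limitation that complete partitioning is meant to avoid.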