Writing Out Data - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID: XD100
Release Date: 2024-03-05
Version: 2023.2 English

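The writeOut_row function, shown in the following example, is structurally similar to readIn_row. It packs one row of output samples into 128-bit words and writes each word to the output stream. For the cint16 data type, each word carries four samples (16-bit real and imaginary parts); for cfloat, each word carries two samples (32-bit real and imaginary parts).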

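// Packs one row of FFT output samples into 128-bit words and
// writes them to strm_out, one word per loop iteration.
void writeOut_row(hls::stream<qdma_axis<128, 0, 0, 0>> &strm_out,
                  cmpxDataOut out[MAT_COLS])
{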
   #if FFT_2D_DT == 0 // cint16 datatype
      LOOP_FFT_ROW_WRITE_OUT:for(int j = 0; j < MAT_COLS; j += 4) {
         #pragma HLS PIPELINE II=1
         #pragma HLS loop_tripcount min=16 max=512
         qdma_axis<128, 0, 0, 0> qdma;
         
         cmpxDataOut tmp;
         tmp = out[j];

         qdma.data.range( 15,   0) = real(tmp).range(15, 0);
         qdma.data.range( 31,  16) = imag(tmp).range(15, 0);

         tmp = out[j+1];

         qdma.data.range( 47,  32) = real(tmp).range(15, 0);
         qdma.data.range( 63,  48) = imag(tmp).range(15, 0);

         tmp = out[j+2];

         qdma.data.range( 79,  64) = real(tmp).range(15, 0);
         qdma.data.range( 95,  80) = imag(tmp).range(15, 0);

         tmp = out[j+3];

         qdma.data.range(111,  96) = real(tmp).range(15, 0);
         qdma.data.range(127, 112) = imag(tmp).range(15, 0);
         
         strm_out.write(qdma);
      }
   
   #else // cfloat datatype
      LOOP_FFT_ROW_WRITE_OUT:for(int j = 0; j < MAT_COLS; j += 2) {
         #pragma HLS PIPELINE II=1
         #pragma HLS loop_tripcount min=32 max=1024
         
         qdma_axis<128, 0, 0, 0> qdma;
         
         AXI_DATA rowOut;
         cmpxDataOut tmp;
         
         tmp = out[j];
         rowOut.fl_data[0] = real(tmp);
         rowOut.fl_data[1] = imag(tmp);
         
         tmp = out[j+1];
         rowOut.fl_data[2] = real(tmp);
         rowOut.fl_data[3] = imag(tmp);
         
         qdma.data.range( 63,  0) = rowOut.data[0];
         qdma.data.range(127, 64) = rowOut.data[1];
         
         strm_out.write(qdma);
      }
   #endif
}
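
The bit layout that writeOut_row produces can be checked on a host without the HLS headers. The following is a minimal sketch in standard C++, not the tutorial's code. Word128, CInt16, pack_cint16x4, and pack_cfloatx2 are hypothetical names introduced for illustration, the 128-bit word is modeled as two 64-bit halves, and AXI_DATA is assumed to overlay four floats on two 64-bit words with the first float in the least significant bits.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// Host-side stand-in for one 128-bit stream word (hypothetical type).
struct Word128 {
    std::uint64_t lo;  // bits 63..0
    std::uint64_t hi;  // bits 127..64
};

// Hypothetical stand-in for one cint16 sample.
struct CInt16 {
    std::int16_t re;
    std::int16_t im;
};

// Raw 16-bit pattern of a signed value, zero-extended to 64 bits.
static std::uint64_t bits16(std::int16_t v) {
    return static_cast<std::uint16_t>(v);
}

// Raw 32-bit pattern of a float, zero-extended to 64 bits.
static std::uint64_t bits32(float v) {
    std::uint32_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}

// Two cint16 samples per 64-bit half: re0, im0, re1, im1 from the LSB up.
static std::uint64_t pack_pair(CInt16 a, CInt16 b) {
    return bits16(a.re) | (bits16(a.im) << 16) |
           (bits16(b.re) << 32) | (bits16(b.im) << 48);
}

// Mirrors the cint16 branch: four samples fill bits 127..0.
Word128 pack_cint16x4(const CInt16* s) {
    assert(s != nullptr);
    return Word128{pack_pair(s[0], s[1]), pack_pair(s[2], s[3])};
}

// Mirrors the cfloat branch under the assumed AXI_DATA layout:
// sample 0 fills bits 63..0, sample 1 fills bits 127..64.
Word128 pack_cfloatx2(const std::complex<float>* s) {
    assert(s != nullptr);
    return Word128{bits32(s[0].real()) | (bits32(s[0].imag()) << 32),
                   bits32(s[1].real()) | (bits32(s[1].imag()) << 32)};
}
```

With this layout, the cint16 samples (1, 2) and (3, 4) pack into the lower half as 0x0004000300020001, which matches the range assignments for out[j] and out[j+1] in the kernel code.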

The fft_2d kernel uses HLS pragmas to optimize the kernel code and to conform to interface protocols. The following table summarizes the pragmas used in this kernel; the Vitis HLS documentation describes all HLS pragmas in detail.

| Pragma | Description |
| --- | --- |
| #pragma HLS INTERFACE | In C/C++ code, all input and output operations are performed, in zero time, through formal function arguments. In an RTL design, these same operations must be performed through ports in the design interface, typically using a specific input/output (I/O) protocol. The INTERFACE pragma specifies how those ports are created from the function arguments. |
| #pragma HLS PIPELINE II=1 | Reduces the initiation interval (II) of a function or loop by allowing operations to execute concurrently. The default pipeline type is defined by the config_compile -pipeline_style command, but it can be overridden in the PIPELINE pragma or directive. |
| #pragma HLS dataflow | Enables task-level pipelining, which allows functions and loops to overlap in their operation. This increases the concurrency of the RTL implementation and the overall throughput of the design. |
| #pragma HLS array_reshape | Reshapes an array by vertically remapping and concatenating its elements into wider words. The result is a new array with fewer elements but a greater bit width, which reduces the amount of block RAM consumed and allows more data to be accessed in a single clock cycle. |
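
To make the effect of PIPELINE II=1 on the writeOut_row loops concrete, the following sketch estimates loop trip counts and cycle counts in standard C++. It is an illustration under stated assumptions, not part of the kernel: trip_count and pipelined_loop_cycles are hypothetical helper names, the cycle estimate uses the usual pipelined-loop approximation (pipeline depth plus II for each additional iteration), and the depth value is a placeholder because the real depth is reported by the HLS tool.

```cpp
#include <cassert>
#include <cstdint>

// Iterations needed to write mat_cols samples when each 128-bit word
// carries samples_per_word samples (4 for cint16, 2 for cfloat).
std::uint32_t trip_count(std::uint32_t mat_cols, std::uint32_t samples_per_word) {
    assert(samples_per_word != 0 && mat_cols % samples_per_word == 0);
    return mat_cols / samples_per_word;
}

// Approximate cycles for a pipelined loop: the first iteration completes
// after `depth` cycles, and each later iteration starts `ii` cycles after
// the previous one. `depth` is an assumed value, not taken from the kernel.
std::uint64_t pipelined_loop_cycles(std::uint32_t trips, std::uint32_t ii,
                                    std::uint32_t depth) {
    if (trips == 0) {
        return 0;
    }
    return depth + static_cast<std::uint64_t>(ii) * (trips - 1);
}
```

The loop_tripcount bounds in writeOut_row are consistent with this arithmetic: a row of 64 to 2048 columns gives 16 to 512 iterations for cint16 and 32 to 1024 iterations for cfloat. With an assumed depth of 3, a 512-iteration loop takes about 514 cycles at II=1 and about 1025 cycles at II=2.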