Writing Out Data - 2023.2 English

Vitis Tutorials: AI Engine (XD100)

Document ID

XD100

Release Date

2024-03-05

Version

2023.2 English

The writeOut_row function is defined in the following example. It is structurally similar to readIn_row.

void writeOut_row(hls::stream<qdma_axis<128, 0, 0, 0>> &strm_out,
                  cmpxDataOut out[MAT_COLS]
                 )
{
   #if FFT_2D_DT == 0 // cint16 datatype
      LOOP_FFT_ROW_WRITE_OUT:for(int j = 0; j < MAT_COLS; j += 4) {
         #pragma HLS PIPELINE II=1
         #pragma HLS loop_tripcount min=16 max=512
         qdma_axis<128, 0, 0, 0> qdma;
         
         cmpxDataOut tmp;
         tmp = out[j];

         qdma.data.range( 15,   0) = real(tmp).range(15, 0);
         qdma.data.range( 31,  16) = imag(tmp).range(15, 0);

         tmp = out[j+1];

         qdma.data.range( 47,  32) = real(tmp).range(15, 0);
         qdma.data.range( 63,  48) = imag(tmp).range(15, 0);

         tmp = out[j+2];

         qdma.data.range( 79,  64) = real(tmp).range(15, 0);
         qdma.data.range( 95,  80) = imag(tmp).range(15, 0);

         tmp = out[j+3];

         qdma.data.range(111,  96) = real(tmp).range(15, 0);
         qdma.data.range(127, 112) = imag(tmp).range(15, 0);
         
         strm_out.write(qdma);
      }
   
   #else // cfloat datatype
      LOOP_FFT_ROW_WRITE_OUT:for(int j = 0; j < MAT_COLS; j += 2) {
         #pragma HLS PIPELINE II=1
         #pragma HLS loop_tripcount min=32 max=1024
         
         qdma_axis<128, 0, 0, 0> qdma;
         
         AXI_DATA rowOut;
         cmpxDataOut tmp;
         
         tmp = out[j];
         rowOut.fl_data[0] = real(tmp);
         rowOut.fl_data[1] = imag(tmp);
         
         tmp = out[j+1];
         rowOut.fl_data[2] = real(tmp);
         rowOut.fl_data[3] = imag(tmp);
         
         qdma.data.range( 63,  0) = rowOut.data[0];
         qdma.data.range(127, 64) = rowOut.data[1];
         
         strm_out.write(qdma);
      }
   #endif
}

The fft_2d kernel specifies HLS pragmas to help optimize the kernel code and adhere to interface protocols. See this page for detailed documentation of all HLS pragmas. A summary of the HLS pragmas used in this kernel is given in the following table.

Switch	Description
#pragma HLS INTERFACE	In C/C++ code, all input and output operations are performed, in zero time, through formal function arguments. In a RTL design, these same input and output operations must be performed through a port in the design interface and typically operate using a specific input/output (I/O) protocol. For more information, see this page.
#pragma HLS PIPELINE II=1	Reduces the initiation interval (II) for a function or loop by allowing the concurrent execution of operations. The default type of pipeline is defined by the `config_compile -pipeline_style` command, but can be overridden in the `PIPELINE` pragma or directive. For more information, see this page.
#pragma HLS dataflow	The `DATAFLOW` pragma enables task-level pipelining, allowing functions and loops to overlap in their operation, increasing the concurrency of the RTL implementation and increasing the overall throughput of the design. For more information, see this page.
#pragma HLS array_reshape	The `ARRAY_RESHAPE` pragma reforms the array with vertical remapping and concatenating elements of arrays by increasing bit widths. This reduces the amount of block RAM consumed while providing parallel access to the data. This pragma creates a new array with fewer elements but with greater bit width, allowing more data to be accessed in a single clock cycle. For more information, see this page.

PL Data Mover Kernel