xfcvDataMovers - 2023.2 English

Vitis Libraries

Release Date: 2023-12-20
Version: 2023.2 English

The xfcvDataMovers class provides a high-level API abstraction to initiate data transfers from DDR to the AIE cores and vice versa for hw-emulation / hw runs. Because each AIE core has a limited amount of local memory, which is not sufficient to hold an entire high-resolution image (input / output), each image needs to be partitioned into smaller tiles and then sent to the AIE core for computation. After computation, the output tiles are stitched back together to regenerate the high-resolution image. This process involves complex computation, as the tiling needs to ensure proper border handling and overlap processing in the case of convolution-based kernels.

An xfcvDataMovers class object takes a few simple parameters from the user and provides a simple data-transaction API in which the user does not have to deal with this complexity. Moreover, it provides a template parameter with which an application can switch seamlessly from PL-based to GMIO-based data movement (and vice versa).

Table 254: xfcvDataMovers Template Parameters

Parameter                  Description
KIND                       Type of object (TILER / STITCHER)
DATA_TYPE                  Data type of the AIE core kernel input or output
TILE_HEIGHT_MAX            Maximum tile height
TILE_WIDTH_MAX             Maximum tile width
AIE_VECTORIZATION_FACTOR   AIE core vectorization factor
CORES                      Number of AIE cores to be used
PL_AXI_BITWIDTH            Data width for AXI transfers between DDR and PL (PL-based data movers only)
USE_GMIO                   Set to true to use GMIO-based data transfer
Table 255: xfcvDataMovers Constructor Parameters

Parameter   Description
overlapH    Horizontal overlap of the AIE core / pipeline
overlapV    Vertical overlap of the AIE core / pipeline

Note

The horizontal and vertical overlaps should be computed for the complete pipeline. For example, if the pipeline has a single 3x3 2D filter, the overlap size (both horizontal and vertical) is 1. However, for two such filter operations back to back, the overlap size is 2. Currently, it is expected that users provide this input correctly.
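As a minimal sketch of this rule, assuming a pipeline of square, odd-sized convolution kernels, the total overlap is the sum of each kernel's border radius. The helper below is hypothetical and not part of the library:

#include <vector>

// Hypothetical helper: each KxK convolution needs a border of (K - 1) / 2
// pixels, and back-to-back filters accumulate their borders.
int pipelineOverlap(const std::vector<int>& kernelSizes) {
    int overlap = 0;
    for (int k : kernelSizes) overlap += (k - 1) / 2;
    return overlap;
}

// pipelineOverlap({3})    == 1  (single 3x3 filter)
// pipelineOverlap({3, 3}) == 2  (two back-to-back 3x3 filters)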

Data transfer using the xfcvDataMovers class can be done in one of two ways:

  1. PLIO data movers

    This is the default mode of xfcvDataMovers class operation. When this method is used, data is transferred using the hardware Tiler / Stitcher IPs provided by Xilinx. The Makefiles of the design examples shipped with the library give the location of the .xo files for these IPs and show how to incorporate them into the Vitis build system. The user needs to create one object of the xfcvDataMovers class per input / output image, as shown in the code below.

    Important

    The implementations of Tiler and Stitcher for PLIO are provided as .xo files in the ‘L1/lib/hw’ folder. By using these files, you are agreeing to the terms and conditions specified in the LICENSE.txt file available in the same directory.

    int overlapH = 1;
    int overlapV = 1;
    xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> tiler(overlapH, overlapV);
    xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR> stitcher;
    

    The choice of MAX_TILE_HEIGHT / MAX_TILE_WIDTH constrains the image tile size, which in turn governs local memory usage. The image tile size in bytes can be computed as below:

    Image tile size = (TILE_HEADER_SIZE_IN_BYTES + MAX_TILE_HEIGHT*MAX_TILE_WIDTH*sizeof(DATA_TYPE))

    Here, TILE_HEADER_SIZE_IN_BYTES is 128 bytes for the current version of Tiler / Stitcher, and DATA_TYPE in the above example is int16_t (2 bytes).
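    As a worked example, assuming illustrative values MAX_TILE_HEIGHT = 16 and MAX_TILE_WIDTH = 256 (not library defaults), the computation looks like this:

    #include <cstdint>

    // Worked tile-size example with illustrative (assumed) values:
    constexpr int TILE_HEADER_SIZE_IN_BYTES = 128; // current Tiler / Stitcher header size
    constexpr int MAX_TILE_HEIGHT = 16;
    constexpr int MAX_TILE_WIDTH = 256;
    constexpr int tileSizeBytes =
        TILE_HEADER_SIZE_IN_BYTES + MAX_TILE_HEIGHT * MAX_TILE_WIDTH * sizeof(int16_t);
    // 128 + 16 * 256 * 2 = 8320 bytes of AIE local memory per tile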

    Note

    The current version of the HW data movers has an 8_16 configuration (that is, 8-bit image element data type on the host side and 16-bit image element data type on the AIE kernel side). More such configurations (for example, 8_8 / 16_16) will be provided in the future.

    The Tiler / Stitcher IPs use PL resources available on VCK boards. For the 8_16 configuration, the table below lists the resource utilization of these IPs. The numbers correspond to a single instance of each IP.

    Table 256: Tiler / Stitcher resource utilization (8_16 config)

               LUTs   FFs    BRAMs   DSPs   Fmax
    Tiler      2761   3832   5       13     400 MHz
    Stitcher   2934   3988   5       7      400 MHz
    Total      5695   7820   10      20
  2. GMIO data movers

    The transition to GMIO-based data movers can be achieved by using a specialized template implementation of the above class. All the above constraints with respect to image tile size calculation apply here as well. Sample code is shown below:

    xF::xfcvDataMovers<xF::TILER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> tiler(1, 1);
    xF::xfcvDataMovers<xF::STITCHER, int16_t, MAX_TILE_HEIGHT, MAX_TILE_WIDTH, VECTORIZATION_FACTOR, 1, 0, true> stitcher;
    

    Note

    The last template parameter (USE_GMIO) is set to true, selecting the GMIO specialization.
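
    For reference, here is a commented restatement of the tiler declaration above, assuming the template arguments follow the order of Table 254:

    xF::xfcvDataMovers<xF::TILER,            // KIND
                       int16_t,              // DATA_TYPE
                       MAX_TILE_HEIGHT,      // TILE_HEIGHT_MAX
                       MAX_TILE_WIDTH,       // TILE_WIDTH_MAX
                       VECTORIZATION_FACTOR, // AIE_VECTORIZATION_FACTOR
                       1,                    // CORES
                       0,                    // PL_AXI_BITWIDTH (unused for GMIO)
                       true>                 // USE_GMIO
        tiler(1, 1);                         // overlapH = 1, overlapV = 1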

Once the objects are constructed, simple API calls can be made to initiate the data transfers. Sample code is shown below:

//For PLIO
auto tiles_sz = tiler.host2aie_nb(src_hndl, srcImageR.size());
stitcher.aie2host_nb(dst_hndl, dst.size(), tiles_sz);

//For GMIO
auto tiles_sz = tiler.host2aie_nb(srcData.data(), srcImageR.size(), {"gmioIn[0]"});
stitcher.aie2host_nb(dstData.data(), dst.size(), tiles_sz, {"gmioOut[0]"});

Note

GMIO data transfers take an additional argument, which is the corresponding GMIO port to be used.

Note

For GMIO-based transfers, blocking methods are available as well (host2aie(…) / aie2host(…)). For PLIO-based data transfers, only non-blocking API calls are provided.
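
A minimal sketch of the blocking GMIO flow, assuming the blocking methods mirror the signatures of the non-blocking calls shown above:

//For GMIO (blocking variants; signatures assumed to mirror the non-blocking calls)
auto tiles_sz = tiler.host2aie(srcData.data(), srcImageR.size(), {"gmioIn[0]"});
// ... run the graph here (see below) so the blocking receive can complete ...
stitcher.aie2host(dstData.data(), dst.size(), tiles_sz, {"gmioOut[0]"});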

Using ‘tiles_sz’ returned by the tiler, the user can run the graph the appropriate number of times (once per tile):

filter_graph.run(tiles_sz[0] * tiles_sz[1]);

After the runs are started, the user needs to wait for all transactions to complete:

filter_graph.wait();
tiler.wait();
stitcher.wait();

Note

The current implementation of xfcvDataMovers supports only one core. Multi-core support is planned for future releases.