Assigning DDR Bank in Host Code - 2023.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2023-12-13
Version: 2023.2 English
Important: This is optional and only needed in specific cases as described below.

During the Vitis tool flow, the kernel port to memory bank connectivity can be established using the --connectivity.sp switch as described in Mapping Kernel Ports to Memory. The xclbin generated by v++ contains the information about the kernel port to memory connectivity so that XRT can allocate buffers appropriately. When a buffer is created in the host code, XRT automatically assigns the buffer to a memory bank based on the connectivity recorded in the xclbin, and manages the buffers internally. If a single kernel port is connected to multiple memory banks, XRT always starts from the lowest-numbered bank.

In most cases, this approach is sufficient. However, in some specific cases you may need to manually assign the buffer location (or a special property) in the host code. For this purpose, the AMD OpenCL vendor extension provides a buffer extension flag called CL_MEM_EXT_PTR_XILINX, used together with the extension pointer cl_mem_ext_ptr_t, to manage bank assignment in the host code. The following code example shows the required header file and code for assigning input and output buffers to DDR bank 0 and bank 1:

#include <CL/cl_ext.h>
…
int main(int argc, char** argv) 
{
…
    cl_mem_ext_ptr_t inExt, outExt;  // Declaring two extensions for both buffers
    inExt.flags  = 0|XCL_MEM_TOPOLOGY; // Specify Bank0 Memory for input memory
    outExt.flags = 1|XCL_MEM_TOPOLOGY; // Specify Bank1 Memory for output Memory
    inExt.obj = 0   ; outExt.obj = 0; // Setting Obj and Param to Zero
    inExt.param = 0 ; outExt.param = 0;

    int err;
    //Allocate Buffer in Bank0 of Global Memory for Input Image using Xilinx Extension
    cl_mem buffer_inImage = clCreateBuffer(world.context, CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX,
            image_size_bytes, &inExt, &err);
    if (err != CL_SUCCESS){
        std::cout << "Error: Failed to allocate device Memory" << std::endl;
        return EXIT_FAILURE;
    }
    //Allocate Buffer in Bank1 of Global Memory for Output Image using Xilinx Extension
    cl_mem buffer_outImage = clCreateBuffer(world.context, CL_MEM_WRITE_ONLY | CL_MEM_EXT_PTR_XILINX,
            image_size_bytes, &outExt, &err); // pass &err so the check below tests this call
    if (err != CL_SUCCESS){
        std::cout << "Error: Failed to allocate device Memory" << std::endl;
        return EXIT_FAILURE;
    }
…
}

The extension pointer cl_mem_ext_ptr_t is a struct as defined below:

typedef struct{
    unsigned flags;
    void *obj;
    void *param;
  } cl_mem_ext_ptr_t;
  • Valid values for flags are:
    • XCL_MEM_DDR_BANK0
    • XCL_MEM_DDR_BANK1
    • XCL_MEM_DDR_BANK2
    • XCL_MEM_DDR_BANK3
    • <id> | XCL_MEM_TOPOLOGY
      Note: The <id> is determined by looking at the Memory Configuration section in the xxx.xclbin.info file generated next to the xxx.xclbin file. In the xxx.xclbin.info file, the global memory (DDR, HBM, PLRAM, etc.) is listed with an index representing the <id>.
  • obj is the pointer to the host memory associated with the CL memory buffer. Set it only if the CL_MEM_USE_HOST_PTR flag is passed to the clCreateBuffer API; otherwise, set it to NULL.
  • param is reserved for future use. Always assign it to 0 or NULL.
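
The flags encoding described above can be sketched without the OpenCL headers. The following is a minimal, self-contained illustration, not XRT code: ext_ptr_sketch mirrors the layout of cl_mem_ext_ptr_t shown above, kTopologyFlag is a stand-in for XCL_MEM_TOPOLOGY whose value (bit 31) is an assumption that should be checked against the header shipped with your XRT installation, and make_topology_ext and topology_index are hypothetical helpers. In real host code, include CL/cl_ext.h and use the vendor definitions instead.

```cpp
#include <cassert>

// Local mirror of the cl_mem_ext_ptr_t layout shown above (illustration only).
struct ext_ptr_sketch {
    unsigned flags;
    void*    obj;
    void*    param;
};

// Stand-in for XCL_MEM_TOPOLOGY. ASSUMPTION: the marker occupies bit 31;
// verify against the vendor header rather than relying on this value.
constexpr unsigned kTopologyFlag = 1u << 31;

// Hypothetical helper: build an extension pointer that selects the memory
// whose index <id> is listed in the Memory Configuration section of the
// xxx.xclbin.info file.
inline ext_ptr_sketch make_topology_ext(unsigned id, void* host_ptr = nullptr) {
    // The index must not collide with the marker bit.
    assert((id & kTopologyFlag) == 0);
    ext_ptr_sketch ext{};
    ext.flags = id | kTopologyFlag;  // <id> | XCL_MEM_TOPOLOGY
    ext.obj   = host_ptr;            // non-null only with CL_MEM_USE_HOST_PTR
    ext.param = nullptr;             // reserved; always 0 or NULL
    return ext;
}

// Hypothetical helper: recover the memory index from a flags word
// produced by make_topology_ext.
inline unsigned topology_index(unsigned flags) {
    return flags & ~kTopologyFlag;
}
```

For example, make_topology_ext(1) yields a structure whose flags field corresponds to 1 | XCL_MEM_TOPOLOGY under the stated assumption, with obj and param both set to null.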

Here are some specific cases where you might want to use the extension pointer:

P2P Buffer
For an explanation and example, refer to https://xilinx.github.io/XRT/master/html/p2p.html.
Host-Memory Buffer
For an explanation and example, refer to https://xilinx.github.io/XRT/master/html/hm.html.
Allocating the host buffer to a specific bank when the kernel port is connected to multiple banks
For example, DDR[0:1]. This use case is described in detail in the Using Multiple DDR Banks lab of the Vitis Optimizing Accelerated FPGA Applications: Bloom Filter Example tutorial.

Example of Allocating the Host Buffer to a Specific Bank

An example of the third case listed above, where you might need to use cl_mem_ext_ptr_t, is when the host and kernel both access DDR memory simultaneously, and you would like to split the data so that the kernel and host access memory banks in a ping-pong fashion. While the host is writing to or reading from one memory bank, the kernel is writing to or reading from another bank, so that host and kernel accesses do not compete for the same bank and degrade performance. For this scenario, you must manage the buffer allocation yourself.

The kernel port in the xclbin is connected to DDR bank 1 and bank 2, and the kernel reads the data from these banks alternately. The connectivity is established during linking by the Vitis compiler using the --connectivity.sp switch:

[connectivity]
sp=runOnfpga_1.input_words:DDR[1:2]

From the host code, you can send the input_words data to DDR banks 1 and 2 alternately. Two AMD extension pointer (cl_mem_ext_ptr_t) objects are created as shown in the example code below. The flags field of each object determines which DDR bank the corresponding buffer is assigned to for the kernel to access. The kernel argument can be set to input_words[0] and input_words[1] for consecutive kernel enqueues.

#include <CL/cl_ext.h>
…
int main(int argc, char** argv) 
{
cl_mem_ext_ptr_t buffer_words_ext[2];

buffer_words_ext[0].flags = 1 | XCL_MEM_TOPOLOGY; // DDR[1]
buffer_words_ext[0].param = 0;
buffer_words_ext[0].obj   = input_doc_words;
buffer_words_ext[1].flags = 2 | XCL_MEM_TOPOLOGY; // DDR[2]
buffer_words_ext[1].param = 0;
buffer_words_ext[1].obj   = input_doc_words;
…
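
The ping-pong pattern described in this section reduces to alternating between the two buffers on consecutive kernel enqueues. The sketch below shows only that index arithmetic, with no OpenCL or XRT calls. The helpers pingpong_slot, bank_for_slot, and bank_schedule are hypothetical names introduced for illustration, and the bank numbers 1 and 2 follow the DDR[1:2] connectivity shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: which of the two extension-pointer entries
// (buffer_words_ext[0] or buffer_words_ext[1]) a given enqueue uses.
inline std::size_t pingpong_slot(std::size_t enqueue_index) {
    return enqueue_index % 2;
}

// Hypothetical helper: DDR bank backing a slot, following the
// sp=runOnfpga_1.input_words:DDR[1:2] connectivity (slot 0 -> DDR[1],
// slot 1 -> DDR[2]).
inline unsigned bank_for_slot(std::size_t slot) {
    assert(slot < 2);
    return slot == 0 ? 1u : 2u;
}

// Bank used by each of n consecutive kernel enqueues.
inline std::vector<unsigned> bank_schedule(std::size_t n) {
    std::vector<unsigned> banks;
    banks.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        banks.push_back(bank_for_slot(pingpong_slot(i)));
    }
    return banks;
}
```

With this schedule, while the kernel works on the buffer in one bank, the host can transfer data to or from the buffer in the other bank.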