An advantage of the FPGA over other compute devices for OpenCL programs is that the application programmer
can customize the memory architecture throughout the system and into the compute unit. By
default, the Vitis compiler generates a memory
architecture within the compute unit that maximizes local and private memory bandwidth based
on static code analysis of the kernel code. Further optimization of these memories is
possible based on attributes in the kernel source code, which can be used to specify
physical layouts and implementations of local and private memories. The attribute in the
Vitis compiler that controls the physical layout of
memories in a compute unit is xcl_array_partition.
For one-dimensional arrays, the XCL_ARRAY_PARTITION attribute implements an
array declared within kernel code as multiple physical memories instead of a single physical
memory. The selection of which partitioning scheme to use depends on the specific
application and its performance goals. The array partitioning schemes available in the
Vitis compiler are cyclic, block, and complete.
Place the attribute with the definition of the array variable.
__attribute__((xcl_array_partition(<type>, <factor>, <dimension>)))
<type>: Specifies one of the following partition types:
cyclic: Cyclic partitioning is the implementation of an array as a set of smaller physical memories that can be accessed simultaneously by the logic in the compute unit. The array is partitioned cyclically by putting one element into each memory before coming back to the first memory to repeat the cycle until the array is fully partitioned.
block: Block partitioning is the physical implementation of an array as a set of smaller memories that can be accessed simultaneously by the logic inside the compute unit. In this case, each memory block is filled with elements from the array before moving on to the next memory.
complete: Complete partitioning decomposes the array into individual elements. For a one-dimensional array, this corresponds to resolving a memory into individual registers. The default <type> is complete.
<factor>: For cyclic type partitioning, the <factor> specifies how many physical memories to partition the original array into in the kernel code. For block type partitioning, the <factor> specifies the number of elements from the original array to store in each physical memory. Important: For complete type partitioning, the <factor> is not specified.
<dimension>: Specifies which array dimension to partition. Specified as an integer from 1 to <N>. The Vitis core development kit supports arrays of N dimensions and can partition the array on any single dimension.
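The cyclic and block descriptions above amount to two different index mappings. The following plain C sketch models them for a one-dimensional array. It is not compiler output, and the names location_t, cyclic_location, and block_location are hypothetical. It assumes the <factor> meanings stated above: the number of memories for cyclic, and the number of elements per memory for block.

```c
#include <stddef.h>

/* Where one array element ends up after partitioning:
 * which physical memory holds it, and at which offset inside that memory. */
typedef struct {
    size_t memory;
    size_t offset;
} location_t;

/* Cyclic: `factor` is the number of physical memories.
 * Consecutive elements go to consecutive memories, wrapping around. */
location_t cyclic_location(size_t index, size_t factor)
{
    location_t loc = { index % factor, index / factor };
    return loc;
}

/* Block: following the description above, `factor` is the number of
 * elements stored in each memory. One memory is filled before the next. */
location_t block_location(size_t index, size_t factor)
{
    location_t loc = { index / factor, index % factor };
    return loc;
}
```

With a factor of 4, element 6 of a 16-element array lands in memory 2 at offset 1 under the cyclic mapping, and in memory 1 at offset 2 under the block mapping.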
For example, consider the following array declaration.
int buffer[16];
The integer array, named buffer, stores 16 values that are each 32 bits wide. Cyclic partitioning can be applied to this array with the following declaration.
int buffer[16] __attribute__((xcl_array_partition(cyclic,4,1)));
In this example, the cyclic <type> tells the Vitis compiler to distribute the contents of the array among four physical
memories. This attribute increases the immediate memory bandwidth for operations accessing
the array buffer by a factor of four.
In the context of the Vitis core development kit, each array inside a compute unit can sustain a maximum of two concurrent accesses. By dividing the original array in the code into four physical memories, the resulting compute unit can sustain a maximum of eight concurrent accesses to the array buffer.
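The arithmetic behind that figure is two accesses per physical memory multiplied by the number of memories. The sketch below restates it in plain C and adds a hypothetical helper, fits_in_one_cycle, that checks whether a run of consecutive elements stays within two accesses per memory under a cyclic layout. Treating the accesses as happening in a single cycle is a simplifying assumption. The actual schedule is decided by the compiler.

```c
#include <stddef.h>

/* From the text: each physical memory sustains two concurrent accesses. */
#define PORTS_PER_MEMORY 2
/* Arbitrary bound for this model only. */
#define MAX_MEMORIES 64

/* Upper bound on concurrent accesses once the array is split. */
size_t max_concurrent_accesses(size_t num_memories)
{
    return PORTS_PER_MEMORY * num_memories;
}

/* Returns 1 if `count` consecutive elements starting at `first` can be
 * accessed together under a cyclic partition into `num_memories` memories
 * without exceeding the per-memory limit, and 0 otherwise. */
int fits_in_one_cycle(size_t first, size_t count, size_t num_memories)
{
    size_t hits[MAX_MEMORIES] = {0};  /* accesses counted per memory */

    if (num_memories == 0 || num_memories > MAX_MEMORIES)
        return 0;

    for (size_t i = 0; i < count; i++) {
        size_t memory = (first + i) % num_memories;  /* cyclic mapping */
        if (++hits[memory] > PORTS_PER_MEMORY)
            return 0;
    }
    return 1;
}
```

For the four-memory example, eight consecutive elements fit, while a ninth element would require a third access to one of the memories.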
Using the same integer array as in the cyclic example, block partitioning can be applied to the array with the following declaration.
int buffer[16] __attribute__((xcl_array_partition(block,4,1)));
Because the size of the block is four and the array holds 16 elements, the Vitis compiler generates four physical memories, sequentially filling each memory with four consecutive elements of the array.
Using the same integer array as in the cyclic example, complete partitioning can be applied to the array with the following declaration.
int buffer[16] __attribute__((xcl_array_partition(complete, 1)));
In this example, the array is completely partitioned into distributed RAM, or 16 independent registers in the programmable logic of the kernel. Because complete is the default, the same effect can also be accomplished with the following declaration.
int buffer[16] __attribute__((xcl_array_partition));
While this creates an implementation with the highest possible memory bandwidth, it is not suited to all applications. Whether the kernel code accesses the data through constant or data-dependent indexes affects the amount of supporting logic that the Vitis compiler has to build around each register to ensure functional equivalence with the original code. As a general best practice for the Vitis core development kit, complete partitioning is best suited for arrays in which at least one dimension is accessed through constant indexes.
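The difference between constant and data-dependent indexes can be pictured with a small software model. In the plain C sketch below, the struct regs16_t stands in for the 16 registers of a completely partitioned array. The type and both function names are hypothetical, and the code only illustrates the kind of selection logic involved, not what the Vitis compiler actually generates.

```c
/* Model of a 16-element array after complete partitioning:
 * one slot per independent register. */
typedef struct {
    int r[16];
} regs16_t;

/* Constant indexes: every operand is one fixed register, so nothing has
 * to choose between registers before the additions. */
int sum_first_four(const regs16_t *regs)
{
    return regs->r[0] + regs->r[1] + regs->r[2] + regs->r[3];
}

/* Data-dependent index: any of the 16 registers may be the one read, so
 * a selector over all of them is needed. Each loop iteration stands for
 * one comparison and one selector input. Out-of-range indexes yield 0
 * in this model. */
int read_dynamic(const regs16_t *regs, unsigned idx)
{
    int value = 0;
    for (unsigned k = 0; k < 16; k++) {
        if (k == idx)
            value = regs->r[k];
    }
    return value;
}
```

The first function touches only the registers it names. The second has to consider all 16 on every read, which is the supporting logic that grows when indexes are not known at compile time.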