To leverage computational parallelism when implementing an algorithm on the FPGA, the synthesis tool must first be able to recognize that parallelism in the source code. Loops and functions are prime candidates for expressing computational parallelism and compute units in the source description. Even then, it is key to verify that the implementation actually takes advantage of the parallelism, because in some cases the Vitis technology might not be able to apply the desired transformation due to the structure of the source code.
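As a minimal sketch of this distinction, the C++ fragment below contrasts a loop whose iterations are independent with one that carries a dependency from each iteration to the next. The function names, the trip count `N`, and the placement of the `UNROLL` pragma are illustrative assumptions and are not taken from a specific Vitis example. Whether the tool actually builds parallel hardware for either loop should still be confirmed in the synthesis reports.

```cpp
constexpr int N = 4;  // hypothetical fixed trip count, chosen for illustration

// Independent iterations: out[i] depends only on a[i] and b[i], so the
// loop is a natural candidate for unrolling into N parallel adders.
void vadd(const int a[N], const int b[N], int out[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS UNROLL
        out[i] = a[i] + b[i];
    }
}

// Loop-carried dependency: every iteration reads the value of acc written
// by the previous one, so the iterations form a chain. Requesting an unroll
// here does not by itself remove that chain.
int recurrence(const int in[N]) {
    int acc = 0;
    for (int i = 0; i < N; ++i) {
        acc = acc * 3 + in[i];
    }
    return acc;
}
```

A standard C++ compiler ignores the `HLS` pragma, so the same source can be checked functionally in software before synthesis.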
It is quite common that some computational parallelism is not reflected in the source code to begin with. In this case, it needs to be added. A typical example is a kernel that is described to operate on a single input value, while the FPGA implementation might execute computations more efficiently in parallel on multiple values. This kind of parallel modeling is described in Task Parallelism.
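One way to add such parallelism, sketched here under stated assumptions, is to wrap the single-value computation in a function that accepts several values per call. The names `scale_one`, `Pack`, `scale_pack`, and the lane count `LANES` are hypothetical and serve only to illustrate the idea.

```cpp
constexpr int LANES = 4;  // hypothetical number of values handled per call

// Original description: the computation is written for a single input value.
int scale_one(int x) {
    return 2 * x + 1;
}

// A small bundle of LANES values that travel together.
struct Pack {
    int v[LANES];
};

// Widened description: the same computation applied to LANES values.
// The lanes do not depend on each other, so the loop can be unrolled
// into LANES copies of the scale_one logic.
Pack scale_pack(const Pack& in) {
    Pack out;
    for (int l = 0; l < LANES; ++l) {
#pragma HLS UNROLL
        out.v[l] = scale_one(in.v[l]);
    }
    return out;
}
```

The scalar function stays unchanged, so the widened version can be checked against it lane by lane.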
A 512-bit interface can be created using OpenCL vector data types such as int16 or the C/C++ arbitrary precision data type ap_int<512>. These vector types can also be used as a powerful way to model data parallelism within a kernel, with up to 16 data paths operating in parallel in the case of int16. Refer to the Median Filter Example in the vision category at Xilinx Getting Started Example on GitHub for the recommended method to use vectors.
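The following portable C++ sketch models the idea of a 512-bit word carrying 16 lanes of 32 bits each. It deliberately uses a plain struct instead of `ap_int<512>` or the OpenCL vector type so that it compiles without the Vitis headers. The names `Word512`, `kLanes`, and `vadd512`, as well as the pragma placement, are assumptions made for illustration and are not the method used in the Median Filter Example.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // 16 lanes x 32 bits = 512 bits

// Software stand-in for one 512-bit interface word.
struct Word512 {
    std::int32_t lane[kLanes];
};
static_assert(sizeof(Word512) * 8 == 512, "Word512 must be 512 bits wide");

// Adds two buffers one 512-bit word at a time. The outer loop consumes one
// word per iteration; the inner loop covers the 16 lanes of that word and is
// the part intended to become 16 parallel data paths.
void vadd512(const Word512* a, const Word512* b, Word512* out,
             std::size_t words) {
    for (std::size_t w = 0; w < words; ++w) {
#pragma HLS PIPELINE II=1
        const Word512 x = a[w];
        const Word512 y = b[w];
        Word512 r;
        for (std::size_t l = 0; l < kLanes; ++l) {
#pragma HLS UNROLL
            r.lane[l] = x.lane[l] + y.lane[l];
        }
        out[w] = r;
    }
}
```

In an actual kernel, the struct would typically be replaced by one of the vector or arbitrary precision types named above, with the lanes addressed through that type's own element or bit-range access.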