Currently, Xilinx devices on Data Center accelerator cards use stacked silicon consisting of several Super Logic Regions (SLRs) to provide device resources, including global memory. When assigning ports to global memory banks, as described in Mapping Kernel Ports to Memory, the best performance is obtained when the CU instance is assigned to the same SLR as the global memory it is connected to. In this case, manually assign the kernel instance, or CU, to that SLR.
A CU can be assigned to an SLR using the connectivity.slr option in a config file. The syntax of the connectivity.slr option in the config file is as follows:
[connectivity]
#slr=<compute_unit_name>:<slr_ID>
slr=vadd_1:SLR2
slr=vadd_2:SLR3
<compute_unit_name> is the instance name of the CU as determined by the connectivity.nk option, described in Creating Multiple Instances of a Kernel, or is simply <kernel_name>_1 if multiple CUs are not specified.
<slr_ID> is the SLR number to which the CU is assigned, in the form SLR0, SLR1,...
Assigning a CU to an SLR is optional, but each assignment must be specified separately for each CU. If an assigned CU is connected to global memory located in another SLR, the tool automatically inserts SLR crossing registers to help with timing closure. In the absence of an SLR assignment, the v++ linker is free to assign the CU to any SLR.
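As an illustration of keeping a CU and its global memory in the same SLR, the following config file sketch combines the connectivity.nk, connectivity.sp, and connectivity.slr options. The kernel name vadd, the argument names in1 and out, and the SLR in which each DDR bank resides are assumptions made for this example only. Check your platform documentation for the actual bank-to-SLR layout.

```
[connectivity]
# Create two CUs of the (hypothetical) vadd kernel, named vadd_1 and vadd_2
nk=vadd:2:vadd_1.vadd_2

# Map kernel arguments to global memory banks
# (argument names in1/out and the bank choices are assumptions)
sp=vadd_1.in1:DDR[0]
sp=vadd_1.out:DDR[0]
sp=vadd_2.in1:DDR[3]
sp=vadd_2.out:DDR[3]

# Place each CU in the SLR assumed to contain its memory bank
# (here DDR[0] is assumed to be in SLR0 and DDR[3] in SLR2)
slr=vadd_1:SLR0
slr=vadd_2:SLR2
```

With each CU placed in the same SLR as the bank it accesses, no SLR crossing should be needed between the CU and its memory.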
After editing the config file to include the SLR assignments, use it during the v++ linking process by specifying the config file using the --config option:
v++ -l --config config_slr.cfg ...