This message reports the kernel port that accounts for the largest share of the total host data transferred to the kernel.
Excessive data transfers can occur when each successive execution of an algorithm needs only a small amount of new data compared with the previous call, yet the full input is transferred again.
For example, a brute-force 3x3 convolution transfers nine values for every output location it computes. When an image is processed, however, transferring a single new value per output location is sufficient if line buffers (internal memory banks) are used in the implementation.
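The difference between the two approaches can be sketched in plain C++. This is a minimal software model, not accelerator code: the names `ConvResult`, `conv3x3_brute`, and `conv3x3_linebuf` are hypothetical, the `port_reads` counter stands in for traffic on the kernel's input port, and the two `std::vector` rows stand in for on-chip line buffers. Only the valid output region (no border padding) is computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: outputs plus a count of values read from the
// simulated input port.
struct ConvResult {
    std::vector<int> out;        // (height-2) x (width-2) outputs, row-major
    std::size_t port_reads = 0;  // values fetched from the input image
};

// Brute force: every output location re-reads its full 3x3 neighbourhood.
ConvResult conv3x3_brute(const std::vector<int>& img, std::size_t width,
                         std::size_t height, const int (&k)[3][3]) {
    assert(width >= 3 && height >= 3 && img.size() == width * height);
    ConvResult r;
    for (std::size_t y = 0; y + 2 < height; ++y) {
        for (std::size_t x = 0; x + 2 < width; ++x) {
            int acc = 0;
            for (std::size_t i = 0; i < 3; ++i) {
                for (std::size_t j = 0; j < 3; ++j) {
                    acc += k[i][j] * img[(y + i) * width + (x + j)];
                    ++r.port_reads;  // nine reads per output
                }
            }
            r.out.push_back(acc);
        }
    }
    return r;
}

// Line-buffered: each pixel is read from the port exactly once. The two
// previous rows and a 3x3 sliding window are kept in local storage.
ConvResult conv3x3_linebuf(const std::vector<int>& img, std::size_t width,
                           std::size_t height, const int (&k)[3][3]) {
    assert(width >= 3 && height >= 3 && img.size() == width * height);
    ConvResult r;
    std::vector<int> row_above2(width, 0);  // row y-2
    std::vector<int> row_above1(width, 0);  // row y-1
    int win[3][3] = {};
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const int px = img[y * width + x];
            ++r.port_reads;  // one read per input pixel
            // Shift the window one column to the left.
            for (std::size_t i = 0; i < 3; ++i) {
                win[i][0] = win[i][1];
                win[i][1] = win[i][2];
            }
            // Load the new right-hand column from the line buffers.
            win[0][2] = row_above2[x];
            win[1][2] = row_above1[x];
            win[2][2] = px;
            // Age the line buffers for the next row.
            row_above2[x] = row_above1[x];
            row_above1[x] = px;
            // The window is fully populated from the third row and column on.
            if (y >= 2 && x >= 2) {
                int acc = 0;
                for (std::size_t i = 0; i < 3; ++i)
                    for (std::size_t j = 0; j < 3; ++j)
                        acc += k[i][j] * win[i][j];
                r.out.push_back(acc);
            }
        }
    }
    return r;
}
```

For a 5x4 image, the brute-force version performs 54 port reads (9 for each of the 6 outputs), while the line-buffered version performs 20 (one per pixel) and produces the same outputs.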
To identify such situations, examine how much of the total transferred data each port accounts for. Ensure that the dominant port is not repeatedly transferring data that could be stored on the accelerator between algorithm invocations.
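The per-port comparison amounts to simple arithmetic, sketched below. This is only an illustration of the calculation, not how the tool derives its report: `PortTransfer`, `PortShare`, and `dominant_port` are hypothetical names, and the byte counts are assumed to come from whatever transfer statistics are available for the run.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: bytes moved through one kernel port.
struct PortTransfer {
    std::string port;
    std::uint64_t bytes;
};

// Hypothetical summary: the busiest port and its share of all traffic.
struct PortShare {
    std::string port;
    double percent;
};

// Returns the port with the most transferred bytes and its percentage of
// the total across all ports (0.0 if nothing was transferred).
PortShare dominant_port(const std::vector<PortTransfer>& ports) {
    assert(!ports.empty());
    std::uint64_t total = 0;
    const PortTransfer* top = &ports.front();
    for (const PortTransfer& p : ports) {
        total += p.bytes;
        if (p.bytes > top->bytes) top = &p;
    }
    const double percent =
        total == 0 ? 0.0
                   : 100.0 * static_cast<double>(top->bytes) /
                         static_cast<double>(total);
    return PortShare{top->port, percent};
}
```

A port whose share is far larger than the algorithm's unique input would suggest is a candidate for on-accelerator buffering.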
Understanding the algorithm being implemented is key to achieving an optimized implementation on the accelerator. This is especially true of the algorithm's interface requirements. If the same data is transferred multiple times through the interfaces, consider alternative implementations that keep the data in temporary storage on the accelerator to reduce data transfer.