A Vitis accelerated system includes a global memory subsystem that is used to share data between the kernels and the host application. Global memory available on the host system, outside of the Xilinx device, provides very large amounts of storage space, but at the cost of a longer access time than local memory on the Xilinx device. One measure of the performance of a system or application is throughput, defined as the number of bytes transferred in a given time frame. Because global memory accesses have long latency, inefficient data transfers to and from global memory spend more time for each byte moved. This lowers throughput and can increase kernel execution time, degrading overall system performance.
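As a worked illustration of the throughput definition, using hypothetical numbers rather than measured results: if a kernel reads 256 MB from global memory in 0.5 s, and inefficient accesses stretch the same transfer to 2 s, the throughput drops by a factor of four.

$$\text{Throughput} = \frac{\text{bytes transferred}}{\text{transfer time}}$$

$$\frac{256\ \text{MB}}{0.5\ \text{s}} = 512\ \text{MB/s}, \qquad \frac{256\ \text{MB}}{2\ \text{s}} = 128\ \text{MB/s}$$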
Development of accelerated applications in Vitis HLS should include two phases: developing the kernel, and improving system performance. Design Principles for Software Programmers describes a kernel development approach that implements a cache-like Load-Compute-Store structure, in which the load and store functions read data from and write data to global memory. Improving system performance then involves implementing efficient load and store functions that reduce kernel execution time. This chapter describes the features and metrics that affect the throughput of the load-store (LS) functions, and how to improve it. Refer to Vitis-HLS-Introductory-Examples/Interface/Memory on GitHub for examples of some of the following concepts.
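The Load-Compute-Store structure can be sketched in C++ as follows. This is a minimal, hypothetical example, not code from Design Principles for Software Programmers or the GitHub examples: the kernel name `scale_kernel`, the tile size `TILE`, the bundle names `gmem0`/`gmem1`, and the placeholder computation are all assumptions made for illustration. Only `load` and `store` touch the global-memory pointers; they copy data to and from local arrays with simple sequential loops, an access pattern that HLS can often turn into burst transfers. The `compute` function works only on local buffers.

```cpp
#include <cassert>

// Hypothetical tile size for the local (on-chip) buffers; illustration only.
constexpr int TILE = 64;

// Load: copy `count` elements from global memory into a local buffer.
// A sequential, pipelined loop like this is a candidate for burst reads.
static void load(const int* in, int local[TILE], int offset, int count) {
    for (int i = 0; i < count; ++i) {
#pragma HLS PIPELINE II=1
        local[i] = in[offset + i];
    }
}

// Compute: operate only on local buffers (placeholder arithmetic).
static void compute(const int src[TILE], int dst[TILE], int count) {
    for (int i = 0; i < count; ++i) {
#pragma HLS PIPELINE II=1
        dst[i] = src[i] * 2 + 1;
    }
}

// Store: copy `count` results from a local buffer back to global memory.
// Again a sequential loop, a candidate for burst writes.
static void store(int* out, const int local[TILE], int offset, int count) {
    for (int i = 0; i < count; ++i) {
#pragma HLS PIPELINE II=1
        out[offset + i] = local[i];
    }
}

// Top-level kernel (hypothetical name). The m_axi interfaces map the
// pointers to global memory; the tile loop runs load -> compute -> store.
extern "C" void scale_kernel(const int* in, int* out, int n) {
#pragma HLS INTERFACE m_axi port=in offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem1
    assert(n >= 0);
    for (int base = 0; base < n; base += TILE) {
        // The last tile may be shorter than TILE.
        int count = (n - base < TILE) ? (n - base) : TILE;
        int in_buf[TILE];
        int out_buf[TILE];
        load(in, in_buf, base, count);
        compute(in_buf, out_buf, count);
        store(out, out_buf, base, count);
    }
}
```

In this sketch the three stages run one after another for each tile, so load, compute, and store do not overlap; designs aiming for higher throughput commonly connect such stages through streams and overlap them with the DATAFLOW pragma. A standard C++ compiler ignores the `#pragma HLS` lines, so the same code can be compiled and checked on the host before synthesis.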