Write code in such a way that bursting can be inferred. Ensure that none of the preconditions are violated.
Bursting does not mean that you will get all your data in one shot – it is about merging the requests together into one request, but the data will arrive sequentially, one after another.
A burst length of 16 is ideal, but even a burst length of 8 is enough. Bigger bursts have more latency, while shorter bursts can be pipelined. Do not confuse bursting with pipelining, but note that bursts can be pipelined with other bursts.
If your bursts are of fixed length, you can unroll the inner loop where bursts are inferred and pipeline the outer loop. This achieves the same burst length while also pipelining between the bursts, which enables higher throughput.
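The fixed-length case can be sketched as follows. This is a minimal illustration rather than a reference design: the kernel name `add_one`, the constants `BURST_LEN` and `NUM_BURSTS`, and the bundle names are assumptions made for the example, and whether the tool actually infers the bursts still depends on the preconditions being met.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t BURST_LEN = 16;  // assumed fixed burst length
constexpr std::size_t NUM_BURSTS = 8;  // assumed number of bursts per call

// Hypothetical kernel: adds 1 to every element, one fixed-length burst per
// outer iteration.
void add_one(const std::uint32_t *in, std::uint32_t *out) {
#pragma HLS INTERFACE m_axi port=in bundle=gmem0 depth=128
#pragma HLS INTERFACE m_axi port=out bundle=gmem1 depth=128
    for (std::size_t b = 0; b < NUM_BURSTS; ++b) {
#pragma HLS PIPELINE
        // Outer loop is pipelined so that consecutive bursts can overlap.
        for (std::size_t i = 0; i < BURST_LEN; ++i) {
#pragma HLS UNROLL
            // Inner loop has a constant trip count and sequential addresses,
            // so it can be fully unrolled.
            out[b * BURST_LEN + i] = in[b * BURST_LEN + i] + 1;
        }
    }
}
```

A standard C++ compiler ignores the pragmas, so the same function can be checked in C simulation before synthesis.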
For greater throughput, focus on widening the interface up to 512 bits rather than simply achieving longer bursts.
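One way to picture a widened interface is to move sixteen 32-bit values per access instead of one. The sketch below uses a plain struct, `Word512`, as a self-contained stand-in. `Word512`, `scale_wide`, and the bundle names are hypothetical, and the example assumes the tool maps the 64-byte struct to a 512-bit port. In an actual design a type such as `ap_uint<512>` or `hls::vector` is the more usual choice.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t LANES = 16;  // 16 lanes x 32 bits = 512 bits per word

// Hypothetical 512-bit word used only for this illustration.
struct alignas(64) Word512 {
    std::uint32_t lane[LANES];
};
static_assert(sizeof(Word512) == 64, "Word512 must be exactly 512 bits");

// Hypothetical kernel: multiplies every element by `factor`, one wide word
// per loop iteration.
void scale_wide(const Word512 *in, Word512 *out, std::size_t num_words,
                std::uint32_t factor) {
#pragma HLS INTERFACE m_axi port=in bundle=gmem0 depth=64
#pragma HLS INTERFACE m_axi port=out bundle=gmem1 depth=64
    for (std::size_t w = 0; w < num_words; ++w) {
#pragma HLS PIPELINE II=1
        Word512 word = in[w];  // a single wide read
        Word512 result;
        for (std::size_t l = 0; l < LANES; ++l) {
#pragma HLS UNROLL
            result.lane[l] = word.lane[l] * factor;  // all lanes in parallel
        }
        out[w] = result;  // a single wide write
    }
}
```

Each beat of a burst then carries 64 bytes, so the same burst length moves sixteen times as much data as a 32-bit port.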
Bigger bursts have higher priority with the AXI interconnect. No dynamic arbitration is done inside the kernel.
You can have two m_axi ports connected to the same DDR to model mutually exclusive access inside the kernel, but the AXI interconnect outside the kernel will arbitrate competing requests.
One way to get around the out-of-order access restriction is to create your own buffer in BRAM, store the bursts in this buffer, and then use this buffer to do out-of-order accesses. This is typically called a line buffer and is a common optimization used in video processing.
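The local-buffer approach can be sketched as follows. The kernel name `reverse_line`, the constant `LINE_LEN`, and the bundle names are assumptions made for the example. The reads from and writes to global memory stay sequential so that they remain burst candidates, and only the local array is accessed out of order.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t LINE_LEN = 64;  // assumed number of elements per line

// Hypothetical kernel: writes one line of data back in reversed order.
void reverse_line(const std::uint32_t *in, std::uint32_t *out) {
#pragma HLS INTERFACE m_axi port=in bundle=gmem0 depth=64
#pragma HLS INTERFACE m_axi port=out bundle=gmem1 depth=64
    // Local array; a buffer of this size is typically implemented in BRAM.
    std::uint32_t line[LINE_LEN];

    // Sequential reads from global memory fill the local buffer.
    for (std::size_t i = 0; i < LINE_LEN; ++i) {
#pragma HLS PIPELINE II=1
        line[i] = in[i];
    }

    // The out-of-order access happens on the local buffer only; the writes
    // to global memory are still in increasing address order.
    for (std::size_t i = 0; i < LINE_LEN; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = line[LINE_LEN - 1 - i];
    }
}
```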
Review the Burst Optimization section of the Synthesis Summary report to learn more about burst optimizations in the design, and missed burst opportunities. If automatic burst is not occurring in your design, you may want to use the hls::burst_maxi data type for manual burst, as described in Using Manual Burst.