The following table for an example system uses an improvement multiplier rather than the efficiency figure. For example, for 64 B Read Only transactions, the measured bandwidth is 4225 MB/s without RAMA and 40730 MB/s with RAMA, an improvement of almost 10 times.
Access Type | 32 B | 64 B | 128 B | 256 B | 512 B |
---|---|---|---|---|---|
Read Only | 10 | 10 | 5 | 3 | 2 |
Write Only | 2 | 2 | 1.5 | 1 | 1 |
Read/Write | 3 | 3 | 2 | 1 | 1 |
The improvement depends on the following factors:
- The number of memories used.
- How much of the HBM Subsystem switch is spanned (that is, how many memories are accessed by each port).
- The transaction size.
In some cases, with a ratio of 1 port to 2 memories, a further two-fold performance increase can be seen.
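As a quick check of how the multipliers in the table above are derived, the following minimal sketch recomputes the 64 B Read Only figure from the two bandwidths quoted in the text. The function name `improvement_multiplier` is hypothetical and is used here only for illustration; it is not part of the RAMA IP or any associated tool.

```python
def improvement_multiplier(bw_without_mbps: float, bw_with_mbps: float) -> float:
    """Return how many times higher the bandwidth is with RAMA than without."""
    if bw_without_mbps <= 0:
        raise ValueError("baseline bandwidth must be positive")
    return bw_with_mbps / bw_without_mbps


# Bandwidths quoted in the text for 64 B Read Only transactions (MB/s).
multiplier = improvement_multiplier(4225, 40730)
print(f"{multiplier:.2f}x")  # prints "9.64x", shown as 10 in the table
```

The table entries appear to be rounded, so a computed value of about 9.64 corresponds to the 10 listed for 64 B Read Only.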
Latency
Latency figures should be considered carefully. The RAMA IP adds latency to an individual transaction due to data buffering and reordering. However, because of the bandwidth improvement, the time between transaction request and completion is, in general, much shorter when the RAMA IP is used. The following table shows mean latency figures for 2000 Read Only and Write Only transactions.
Transaction Size (B) | Read Only, Without RAMA (AXI Clock Cycles) | Read Only, With RAMA (AXI Clock Cycles) | Write Only, Without RAMA (AXI Clock Cycles) | Write Only, With RAMA (AXI Clock Cycles) |
---|---|---|---|---|
32 | 225 | 597 | 44 | 591 |
64 | 247 | 532 | 46 | 134 |
128 | 240 | 497 | 55 | 49 |
256 | 263 | 512 | 77 | 78 |
512 | 304 | 564 | 119 | 137 |
To illustrate why latency figures can be misleading, consider the following: a given number of read transactions of 32 bytes in size can take 100 μs to complete without RAMA. This means that the last transaction would be delayed by almost 100 μs after it could have been issued by the master. Because bandwidth is 10 times better for 32 bytes using RAMA, the same number of transactions would be completed within 10 μs, plus latency added by buffering and reordering in the RAMA IP (in this case typically 1.3 μs).
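The arithmetic in this illustration can be sketched as follows. This is a simplified model that assumes the total completion time scales inversely with bandwidth and that the RAMA buffering and reordering latency is added once at the end. The function name `completion_time_us` is hypothetical, and the input values are the ones quoted in the paragraph above.

```python
def completion_time_us(baseline_us: float,
                       bandwidth_multiplier: float,
                       added_latency_us: float) -> float:
    """Estimate the time to complete a batch of transactions with RAMA.

    baseline_us: time the batch takes without RAMA.
    bandwidth_multiplier: bandwidth improvement factor with RAMA.
    added_latency_us: extra latency from buffering and reordering.
    """
    if bandwidth_multiplier <= 0:
        raise ValueError("bandwidth multiplier must be positive")
    return baseline_us / bandwidth_multiplier + added_latency_us


# Values from the text: 100 us without RAMA, 10x bandwidth, 1.3 us added latency.
without_rama_us = 100.0
with_rama_us = completion_time_us(without_rama_us, 10.0, 1.3)

print(f"Without RAMA: {without_rama_us:.1f} us")
print(f"With RAMA:    {with_rama_us:.1f} us")
print(f"Net speed-up: {without_rama_us / with_rama_us:.2f}x")
```

Under these assumptions the batch completes in about 11.3 μs instead of 100 μs, so the higher per-transaction latency is outweighed by the bandwidth gain.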