Performance Improvement

The following table, for an example system, reports an improvement multiplier rather than an efficiency figure. For example, for 64 B read-only transactions, the measured bandwidth is 4225 MB/s without RAMA and 40730 MB/s with RAMA, an almost 10-fold improvement in bandwidth.

Table 1. Random Access Performance Improvement
Access Type	32 B	64 B	128 B	256 B	512 B
Read Only	10	10	5	3	2
Write Only	2	2	1.5	1	1
Read/Write	3	3	2	1	1

Note: Results shown are for a specific test scenario with four AXI masters, each randomly accessing four HBM pseudo-channels. The relative improvements quoted are the results with RAMA IP on each master compared to without RAMA IP on each master.

Note: For random access, there can be a significant advantage in using a higher ratio of enabled memories to connected ports. This depends on:

The number of memories used.
How much of the HBM Subsystem switch is spanned (that is, how many memories are accessed by each port).
The transaction size.

In some cases, a ratio of 1 port to 2 memories yields a further two-fold performance increase.
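As a quick check of the multiplier arithmetic, the 64 B read-only figures quoted earlier (4225 MB/s without RAMA, 40730 MB/s with RAMA) can be plugged into a short sketch. The function name `improvement_multiplier` is hypothetical and used here for illustration only; it is not part of the RAMA IP or any tool.

```python
def improvement_multiplier(bw_without_mb_s: float, bw_with_mb_s: float) -> float:
    """Return the bandwidth improvement multiplier (with RAMA / without RAMA).

    Hypothetical helper for illustration; both arguments are measured
    bandwidths in MB/s.
    """
    if bw_without_mb_s <= 0:
        raise ValueError("baseline bandwidth must be positive")
    return bw_with_mb_s / bw_without_mb_s


# 64 B read-only figures quoted in the text.
multiplier = improvement_multiplier(4225, 40730)
print(f"{multiplier:.2f}x")  # prints "9.64x"
```

The result of about 9.64 is shown rounded to 10 in the 64 B Read Only cell of Table 1.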

Latency

Latency figures should be interpreted carefully. The RAMA IP adds latency to an individual transaction due to data buffering and reordering. However, because of the bandwidth improvement, the time between transaction request and completion is, in general, much shorter when using the RAMA IP. The following table shows mean latency figures for 2000 Read Only and Write Only transactions.

Table 2. RAMA IP Latency
Transaction Size (Bytes)	Read Only, Without RAMA (AXI Clock Cycles)	Read Only, With RAMA (AXI Clock Cycles)	Write Only, Without RAMA (AXI Clock Cycles)	Write Only, With RAMA (AXI Clock Cycles)
32	225	597	44	591
64	247	532	46	134
128	240	497	55	49
256	263	512	77	78
512	304	564	119	137

To illustrate why latency figures can be misleading, consider the following: a given number of 32-byte read transactions takes 100 μs to complete without RAMA. This means that the last transaction is delayed by almost 100 μs from the point at which the master could have issued it. Because bandwidth is 10 times better for 32-byte reads using RAMA, the same number of transactions completes within 10 μs, plus the latency added by buffering and reordering in the RAMA IP (in this case typically 1.3 μs).
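The arithmetic in this example can be written out as a minimal sketch. It assumes the simple model implied by the text: the batch time scales inversely with the bandwidth multiplier, and the RAMA buffering and reordering latency is added once on top. The function and parameter names are hypothetical.

```python
def completion_time_us(baseline_us: float,
                       bandwidth_multiplier: float,
                       added_latency_us: float) -> float:
    """Estimate batch completion time with RAMA, in microseconds.

    Assumed model (illustration only): the time without RAMA is divided
    by the bandwidth multiplier, then the latency added by the RAMA IP
    is added once.
    """
    if bandwidth_multiplier <= 0:
        raise ValueError("bandwidth multiplier must be positive")
    return baseline_us / bandwidth_multiplier + added_latency_us


# Figures from the text: 100 us without RAMA, 10x bandwidth, 1.3 us added latency.
with_rama_us = completion_time_us(100.0, 10.0, 1.3)
print(f"{with_rama_us:.1f} us with RAMA vs 100.0 us without")
```

Under these assumptions the batch finishes in roughly 11.3 μs, so the last transaction completes far sooner even though each individual transaction sees a higher latency.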

RAMA 1.1 LogiCORE IP Product Guide (PG310)