System-Level Considerations

AXI High Bandwidth Memory Controller LogiCORE IP Product Guide (PG276)

Document ID: PG276
Release Date: 2022-11-02
Version: 1.0 English

The Global Addressing option adds flexibility in accessing the HBM array: an AXI command can enter at any ingress port, and the AXI Switch routes it to the destination address. The routing is determined first by the Stack Select bit, which is the most significant bit of the AXI address. The destination AXI port is then determined by the following four address bits. The AXI Port Assignments table in Port Descriptions gives a visual representation of the Stack and AXI Port mapping. As described in the Lateral AXI Switch Access Throughput Loss section, there are latency and performance implications when AXI commands traverse the Switch. Weigh these limitations against your user logic implementation and access pattern to determine how much traversal your use case can support and whether Global Addressing is a viable option. If excessive traversal degrades performance too much, it might be necessary to rearrange the traffic masters so that they drive AXI ports closer to their destination memory controllers.
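The field split described above can be illustrated with a short Python model. This is only a sketch: the function and type names are hypothetical, and the address width is a caller-supplied assumption (the 33-bit value in the example is illustrative, not taken from this section), so confirm the actual width for your device and configuration.

```python
from typing import NamedTuple


class GlobalAddress(NamedTuple):
    stack: int   # Stack Select bit (most significant address bit)
    port: int    # destination AXI port field (the following four bits)
    offset: int  # remaining low-order address bits


def decode_global_address(addr: int, addr_width: int) -> GlobalAddress:
    """Split an AXI address into stack, port, and offset fields.

    addr_width is an assumption supplied by the caller; check the address
    width of your device and configuration before relying on it.
    """
    if addr_width < 6:
        raise ValueError("addr_width must leave room for stack, port, and offset bits")
    if not 0 <= addr < (1 << addr_width):
        raise ValueError("address does not fit in addr_width bits")
    stack = (addr >> (addr_width - 1)) & 0x1
    port = (addr >> (addr_width - 5)) & 0xF
    offset = addr & ((1 << (addr_width - 5)) - 1)
    return GlobalAddress(stack, port, offset)


# Hypothetical 33-bit address: stack 1, port field 5, offset 0x123.
example = decode_global_address((1 << 32) | (0x5 << 28) | 0x123, addr_width=33)
```

Comparing the decoded port field with the port a master actually drives gives a quick indication of whether an access has to traverse the Switch at all.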

If a user application requires low latency, both the Global Addressing option and the 64-entry deep AXI reordering queue should be disabled. When Global Addressing is disabled, AXI commands are no longer routed from any ingress port to any destination. Instead, each AXI command enters the ingress port and routes directly to the pseudo channel attached to that AXI port. This bypasses the AXI Switch logic and enables the lowest latency path to the memory controller. Disabling AXI reordering also decreases latency because commands are consumed directly by the controller and reordered only in its local 12-entry deep queue.

If AXI reordering is enabled, commands might sit in the queue for some time before they are serviced. Additionally, if a user application requires low latency, the AXI access pattern of the traffic masters and the HBM memory map must be analyzed carefully. HBM follows the same basic protocol principles as DDR4, so once the access pattern is well defined, review the HBM controller options and memory map to ensure the highest efficiency operation for the given use case. The optimal settings are specific to each application and workload, so no single configuration suits every design.

Depending on the application, the traffic masters might issue only a single AXI ID or multiple AXI IDs. If a master generates only a single AXI ID, transactions with the same ID are blocked at the channel level and execute in the order in which they are received. If multiple AXI IDs are generated, the transactions are reordered within the AXI Reordering core if it is enabled. If the user logic does not manage AXI IDs or accesses to ensure coherency in these scenarios, enable the Enable Coherency in Reordering option.
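A testbench or software model can check the ordering behavior described above: transactions that share an AXI ID must complete in issue order, while transactions with different IDs can be interleaved. The following Python sketch shows one way to express that check. The transaction labels, the `Txn` tuple layout, and the function names are hypothetical and are not part of the IP.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Txn = Tuple[str, int]  # (transaction label, AXI ID); both are hypothetical


def per_id_sequences(txns: Sequence[Txn]) -> Dict[int, List[str]]:
    """Group transaction labels by AXI ID, keeping their relative order."""
    seqs: Dict[int, List[str]] = defaultdict(list)
    for label, axi_id in txns:
        seqs[axi_id].append(label)
    return dict(seqs)


def preserves_per_id_order(issued: Sequence[Txn], completed: Sequence[Txn]) -> bool:
    """True if, for every AXI ID, completions appear in issue order."""
    return per_id_sequences(issued) == per_id_sequences(completed)


issued = [("r0", 0), ("r1", 1), ("r2", 0)]
reordered_ok = [("r1", 1), ("r0", 0), ("r2", 0)]   # IDs interleaved, ID 0 still in order
reordered_bad = [("r2", 0), ("r0", 0), ("r1", 1)]  # ID 0 completes out of order
```

Here `preserves_per_id_order(issued, reordered_ok)` evaluates to `True` and `preserves_per_id_order(issued, reordered_bad)` evaluates to `False`.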

Additional care should be taken when Global Addressing is enabled. The time an AXI access takes to navigate the Switch is not deterministic because the access contends with all the other accesses and routing occurring in real time. In addition to using the Enable Coherency in Reordering option, the user logic should manage coherency by waiting for the AXI write response signal (BRESP) before issuing a subsequent access that depends on the first access having been accepted. Within the 12-entry command queue of the memory controller, coherency is always guaranteed because, as is the case with traditional DDR controllers, the controller does not break Write/Read order dependency for efficiency.
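The BRESP rule above amounts to simple bookkeeping in the user logic: remember the address range of every write whose response has not yet returned, and hold back any access that overlaps one of those ranges. The following Python behavioral model sketches that idea. It is not RTL, and the class name, method names, and tag scheme are hypothetical.

```python
from typing import Dict, Hashable, Tuple


class DependentAccessGate:
    """Holds back accesses that overlap a write still waiting for BRESP."""

    def __init__(self) -> None:
        # write tag -> (first byte address, one past the last byte address)
        self._pending: Dict[Hashable, Tuple[int, int]] = {}

    def issue_write(self, tag: Hashable, addr: int, length: int) -> None:
        """Record a write that has been issued but not yet acknowledged."""
        if length <= 0:
            raise ValueError("length must be positive")
        self._pending[tag] = (addr, addr + length)

    def on_bresp(self, tag: Hashable) -> None:
        """Call when the write response (BRESP) for this tag is received."""
        self._pending.pop(tag, None)

    def can_issue(self, addr: int, length: int) -> bool:
        """True if the access overlaps no write that is still outstanding."""
        end = addr + length
        return all(end <= lo or addr >= hi for lo, hi in self._pending.values())


gate = DependentAccessGate()
gate.issue_write("w0", 0x1000, 64)
blocked = gate.can_issue(0x1020, 32)     # overlaps w0, must wait for BRESP
unrelated = gate.can_issue(0x2000, 32)   # no overlap, can proceed
gate.on_bresp("w0")
released = gate.can_issue(0x1020, 32)    # w0 acknowledged, can proceed
```

Tracking byte ranges rather than stalling on every outstanding write keeps independent traffic flowing. A simpler design could stall all accesses until every BRESP has returned, at the cost of throughput.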

Whether or not reordering is performed, and regardless of AXI ID, AXI commands are returned in the order in which they arrived at the Memory Controller. If Global Addressing is disabled, the commands come from a single AXI master, are processed by the Memory Controller (potentially reordered for efficiency), and are returned to the AXI master in the order received. If Global Addressing is enabled, the latency through the lateral switch can vary and each Memory Controller processes commands independently, so the commands might not return in the same order as they were sent from the AXI master.
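A toy timing model can make the difference between the two cases concrete. The Python sketch below assumes that command i is issued at time i, that each command reaches its Memory Controller after a per-command switch delay, and that every controller returns responses in arrival order with a fixed service time. All names, delays, and the service time are hypothetical and do not represent real HBM timing.

```python
from typing import Dict, List, Sequence, Tuple

Command = Tuple[str, int, int]  # (label, memory controller index, switch delay)


def return_order(commands: Sequence[Command], service_time: int = 2) -> List[str]:
    """Return command labels in the order their responses come back."""
    # Command i is issued at time i and arrives after its switch delay.
    arrivals = [(i + delay, i, label, mc)
                for i, (label, mc, delay) in enumerate(commands)]
    last_done: Dict[int, int] = {}  # controller -> time of its previous response
    completions = []
    for arrival, issue_idx, label, mc in sorted(arrivals):
        # Each controller returns responses in arrival order, one at a time.
        done = max(arrival, last_done.get(mc, 0)) + service_time
        last_done[mc] = done
        completions.append((done, issue_idx, label))
    return [label for _, _, label in sorted(completions)]


# Global Addressing disabled: one master, one controller, no switch delay.
direct = return_order([("A", 0, 0), ("B", 0, 0), ("C", 0, 0)])

# Global Addressing enabled: A and C cross the switch to a distant controller.
routed = return_order([("A", 0, 6), ("B", 1, 1), ("C", 0, 6)])
```

Under these assumptions `direct` is `["A", "B", "C"]`, while `routed` is `["B", "A", "C"]`: the command sent to the nearby controller overtakes the ones that traverse the switch, yet A and C, which target the same controller, still return in arrival order.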

For user applications that have small transaction sizes or highly random addressing, or whose traffic master can generate only a single AXI ID, consider using the Xilinx® Random Access Memory Attachment (RAMA) IP. The RAMA IP is specifically designed to assist HBM-based designs with non-ideal traffic masters and use cases. It can resize and reorder transactions to improve efficiency and bandwidth, and it can generate substitute AXI IDs to prevent ID blocking if a master generates only a single AXI ID. More information on the RAMA IP can be found in the RAMA LogiCORE IP Product Guide (PG310).