Improving Performance Through the NoC - 2021.2 English

Versal ACAP System Integration and Validation Methodology Guide (UG1388)

Document ID
UG1388
Release Date
2021-11-19
Version
2021.2 English

To improve NoC performance, check for the following common issues.

No Communication between Master and Slave

  • Check the NoC Connectivity tab. Is the master connected to the correct slaves?
  • Check the address editor to ensure the master is sending to the correct system address range. For example, is the traffic generator configured with an address range that matches the address editor?

    If the LPD is connected to the NoC, verify whether the RPU uses this path. The RPU can only access the first 32 bits of the address range and should not be used to access any resources (PL or DDR) above this range. The AI Engine is an exception that the tools handle.

  • Check that the memory controller passed calibration. For details, see the Versal ACAP Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
  • Check that the PMC PLL reference clock frequency is correct and matches the CIPS Wizard input clock frequency.
  • Check the NoC frequency in the Clocks tab of CIPS.
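The 32-bit limit on the RPU path can be checked mechanically against the segments listed in the address editor. The following is a minimal sketch, not a tool-provided check: the function name, the segment names, and the base and size values are hypothetical and only illustrate the comparison.

```python
# First address that no longer fits in 32 bits.
RPU_ADDR_LIMIT = 1 << 32


def fits_in_32_bits(base: int, size: int) -> bool:
    """Return True if the range [base, base + size) lies entirely below 2**32."""
    if base < 0 or size <= 0:
        raise ValueError("base must be non-negative and size must be positive")
    return base + size <= RPU_ADDR_LIMIT


# Hypothetical segments (name -> (base, size)), for illustration only.
# Replace them with the values shown in your own address editor.
segments = {
    "ddr_low": (0x0000_0000, 0x8000_0000),
    "ddr_high": (0x8_0000_0000, 0x8000_0000),
}

# Segments the RPU should not target through the LPD-to-NoC path.
unreachable = [
    name for name, (base, size) in segments.items()
    if not fits_in_32_bits(base, size)
]
print(unreachable)
```

Any segment reported here sits partly or fully above the 32-bit range and should not be mapped for RPU access through this path.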

Lower Than Expected Bandwidth

  • Check the Connectivity tab. For best performance, use all four NSU ports of the DDRMC.
  • Check the bandwidth values, the traffic class, and the following settings in the NoC QoS tab.
    Note: Bandwidth values return to default values if you change the Connectivity tab.
    • Best Effort: Default setting and the lowest priority. Use for general purpose masters.
    • Low Latency: High priority read-only setting. This setting has priority over Best Effort in the NPS/DDRMC and is recommended only for APU cache refills. Overuse can decrease system performance.
    • Isochronous: High priority setting. This setting has priority over Best Effort and Low Latency with a timer.
  • Check the master behavior.
    • Masters that issue short AXI bursts to random access generally have lower bandwidth.
    • Masters that request more bandwidth than allocated can cause other masters to have lower than expected bandwidth.
    • A single NMU cannot saturate the bandwidth of a DDRMC. Increase the number of masters issuing requests to increase bandwidth from the DDR.
    • Make sure you know the data width for the master (e.g., IP integrator might inherit the wrong data width), and maximize the data width if needed.
    • A short burst must match the NoC packet size.
  • Check the slave behavior.
    • The most common slave for the NoC is the integrated DDRMC. For information, see the Versal ACAP Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
    • Verify that the bus width and AXI clock frequencies match between the ingress master to the NoC and the egress slave connection.
    • Check for large average latency stackup. Measure this at both the master and the slave.
      • Start with the slave. If there is already excessive latency, the issue is with the slave.
      • If the slave and master latencies are within 5% of each other and both have an average latency greater than 1000 clock cycles, the issue is likely within the NoC, and the NoC might need to be constrained further.
    • Perform a secondary analysis.
      • Inspect traffic through the NoC and look for switch contention, such as virtual channel (VC) assignment to QoS traffic classes.
      • Run an integrated logic analyzer (ILA) on the slave AXI4-Stream interfaces, and compute bus efficiency.
      • Redesign with an AXI performance monitor (APM) on the slave and master AXI4-Stream interfaces. Set up and capture extended metrics: bus efficiency, dead cycles, and slave-ready delays.
    • The DDRMC slaves contain information about activates, bus turn-arounds, etc. For information, see the Versal ACAP Programmable Network on Chip and Integrated Memory Controller LogiCORE IP Product Guide (PG313).
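
The latency stackup check above can be written as a small decision helper. This is a sketch under stated assumptions: the function name is hypothetical, "within 5%" is interpreted as the relative difference against the larger of the two averages, and `slave_limit_clks` is a user-chosen value for what counts as excessive latency at the slave, which this guide does not define.

```python
def triage_latency(
    master_avg_clks: float,
    slave_avg_clks: float,
    slave_limit_clks: float,
    noc_threshold_clks: float = 1000.0,
    tolerance: float = 0.05,
) -> str:
    """Classify where a latency problem most likely sits.

    Returns "slave", "noc", or "inconclusive".
    """
    # Start with the slave: if its latency is already excessive
    # (user-defined limit, an assumption), the issue is with the slave.
    if slave_avg_clks > slave_limit_clks:
        return "slave"

    # Assumed interpretation of "within 5%": relative difference
    # measured against the larger of the two averages.
    larger = max(master_avg_clks, slave_avg_clks)
    close = larger > 0 and abs(master_avg_clks - slave_avg_clks) / larger <= tolerance

    # Both averages above 1000 clock cycles and close to each other:
    # the issue is likely within the NoC.
    if close and min(master_avg_clks, slave_avg_clks) > noc_threshold_clks:
        return "noc"

    return "inconclusive"


print(triage_latency(1500, 1480, slave_limit_clks=2000))
print(triage_latency(3000, 2500, slave_limit_clks=2000))
```

An "inconclusive" result is a cue to move on to the secondary analysis rather than a statement that the system is healthy.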
Tip: You can measure NoC performance using the Xilinx open source ChipScoPy API. For more information, see the GitHub repository at http://www.github.com/Xilinx/chipscopy.
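
Once valid/ready samples have been captured from an interface (for example, exported from an ILA capture), the bus efficiency figure mentioned in the secondary analysis can be computed offline. The sketch below does not use the ChipScoPy API and assumes simple working definitions that might differ from those used by the APM: a transfer cycle has both valid and ready high, a stall cycle has valid high and ready low (a slave-ready delay), and an idle cycle has valid low (a dead cycle). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class BusStats:
    total_cycles: int
    transfer_cycles: int  # valid and ready both high
    stall_cycles: int     # valid high, ready low (waiting on the slave)
    idle_cycles: int      # valid low (nothing offered on the bus)

    @property
    def efficiency(self) -> float:
        """Fraction of sampled cycles that moved data."""
        if self.total_cycles == 0:
            return 0.0
        return self.transfer_cycles / self.total_cycles


def bus_stats(samples: Iterable[Tuple[int, int]]) -> BusStats:
    """Summarize per-cycle (valid, ready) samples from one interface."""
    total = transfer = stall = idle = 0
    for valid, ready in samples:
        total += 1
        if valid and ready:
            transfer += 1
        elif valid:
            stall += 1
        else:
            idle += 1
    return BusStats(total, transfer, stall, idle)


# Made-up capture of eight clock cycles, for illustration only.
capture = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0), (1, 1), (1, 0), (1, 1)]
stats = bus_stats(capture)
print(stats, stats.efficiency)
```

Comparing the stall and idle counts indicates whether lost cycles come from the slave not being ready or from the master not offering data.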