Interrupts and Error Handling

Versal ACAP CPM CCIX Architecture Manual (AM016)

Document ID: AM016
Release Date: 2020-11-24
Revision: 1.1 English

Errors detected by a component in the CMN are classified into three main categories, as shown in the following table:

Table 1. Interrupts and Error Handling
Error Type: Correctable Errors
Description: Errors that can be corrected using ECC or other methods.
Examples:
  • A single-bit ECC error in any of the RAMs
  • An error that is recovered using a local retry
Action taken by hardware:
  1. Logs the error.
  2. Counts the occurrence of these errors.
  3. Signals the error to the global RAS block, which can be controlled using a threshold count.

Error Type: Deferred Errors
Description: Uncorrectable errors detected in one node of the CMN, where the data is not used within the same node and poison bits are set for the data. With these errors, the system can typically operate for a period of time without being corrupted.
Examples:
  • A request packet received with an unsupported opcode
Action taken by hardware:
  1. Sends a response with a RespErr value of data error or non-data error.
  2. Logs the error.
  3. Signals the error to the global RAS block within the CMN.

Error Type: Uncorrectable Fatal Errors
Description: Errors in the control logic in a node, where continuing operation might corrupt the system beyond recovery.
Examples:
  • A double-bit ECC error in data read from the snoop filter
  • A packet received with an error in the target ID
  • An internal logic error
Action taken by hardware:
  1. Logs the error.
  2. Signals the error to the global RAS block.
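
The hardware actions listed in the table can be summarized as a small behavioral model. The following Python sketch is illustrative only: the names `ErrorClass`, `NodeErrorModel`, and `report` are hypothetical, the action strings are labels rather than register or signal names, and the threshold behavior (signal once the correctable-error count reaches the threshold) is an assumption about how the threshold count gates signaling.

```python
from dataclasses import dataclass, field
from enum import Enum


class ErrorClass(Enum):
    CORRECTABLE = "correctable"
    DEFERRED = "deferred"
    UNCORRECTABLE_FATAL = "uncorrectable_fatal"


@dataclass
class NodeErrorModel:
    """Toy model of one CMN node's error handling (not a register map)."""
    threshold: int = 1            # assumed: signal once this many correctable errors are counted
    correctable_count: int = 0
    log: list = field(default_factory=list)

    def report(self, error_class: ErrorClass, detail: str) -> list:
        """Return the ordered actions the table lists for this error class."""
        actions = []
        if error_class is ErrorClass.DEFERRED:
            # Deferred errors first answer the requester with a RespErr value.
            actions.append("send_resperr_response")
        self.log.append((error_class, detail))
        actions.append("log")
        if error_class is ErrorClass.CORRECTABLE:
            self.correctable_count += 1
            actions.append("count")
            # Assumption: signaling is held back until the count reaches the threshold.
            if self.correctable_count >= self.threshold:
                actions.append("signal_global_ras")
        else:
            actions.append("signal_global_ras")
        return actions
```

With `NodeErrorModel(threshold=2)`, for example, the first correctable error is only logged and counted, while the second one also signals the global RAS block. Deferred and uncorrectable fatal errors signal on every occurrence in this model.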

The global RAS (reliability, availability, and serviceability) block signals these error interrupts, which are handled by the CCIX firmware. The firmware is responsible for generating the protocol error reporting (PER) message packet, as appropriate. Details on PER generation are covered in CCIX Capable PCIe Controller.
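
The firmware's role can be pictured as a dispatch step that runs when the global RAS block raises an interrupt. The sketch below is a minimal illustration under stated assumptions, not the actual CCIX firmware: `ErrorRecord`, `handle_ras_interrupt`, and the `should_report` policy callback are hypothetical names, and the dictionary it returns is a placeholder that does not model the real PER packet layout.

```python
from typing import Callable, Iterable, List, NamedTuple


class ErrorRecord(NamedTuple):
    source_node: int   # hypothetical: ID of the node that logged the error
    error_class: str   # "correctable", "deferred", or "uncorrectable_fatal"


def handle_ras_interrupt(
    pending: Iterable[ErrorRecord],
    should_report: Callable[[ErrorRecord], bool],
) -> List[dict]:
    """Turn logged errors into PER message requests where the policy allows."""
    per_requests = []
    for record in pending:
        if should_report(record):
            # Placeholder payload; the real PER packet format is not modeled here.
            per_requests.append(
                {"node": record.source_node, "class": record.error_class}
            )
    return per_requests
```

Passing the policy in as a callback keeps the "as appropriate" decision separate from the dispatch loop. A caller might, for instance, use `lambda r: r.error_class != "correctable"` to request PER messages only for deferred and uncorrectable fatal errors.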