PMU Error Handling and Propagation Logic

Zynq UltraScale+ Device Technical Reference Manual (UG1085)

Document ID: UG1085
Release Date: 2023-12-21
Revision: 2.4 English

The PMU is responsible for capturing, reporting, and taking an appropriate action with respect to each error. Each system error is identified in the PMU_GLOBAL error status registers. The PMU also includes the necessary registers, logic, and interfaces for handling this functionality.

The PMU provides a set of error input signals through which all system-level hardware errors are routed and captured. These errors are recorded in error status registers 1 and 2 (ERROR_STATUS_1 and ERROR_STATUS_2) within the PMU and are not cleared by a system reset or an internal POR. A captured error can be cleared only by explicitly writing a 1 to the corresponding error status bit. Every error can generate an interrupt to the PMU, and this interrupt can be masked per error. Propagation of an error to the error status registers can be disabled with the corresponding bit in the PMU global error enable registers (ERROR_EN_1 and ERROR_EN_2).

The PMU also includes registers that capture software-generated errors. Software errors are errors that occur during the execution of the PMU ROM, the PMU firmware, and the CSU ROM.

As with hardware errors, software errors are recorded in the PMU and are cleared only by an external POR or by explicitly writing a 1 to the corresponding error status register bit. All software errors except those recorded by the PMU during its pre-boot execution can generate an interrupt to the PMU. As with hardware errors, this interrupt can be masked per error.
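The capture and clear behavior described above can be illustrated with a small software model. This is only a sketch, not the hardware implementation: the class name `ErrorStatusModel` and its methods are hypothetical, the all-enabled default for the enable register is an assumption rather than a documented reset value, and the two bit positions are taken from Table 6-12.

```python
MASK32 = 0xFFFF_FFFF


class ErrorStatusModel:
    """Toy model of one 32-bit error status register and its enable register."""

    def __init__(self, enable=MASK32):
        # `enable` stands in for ERROR_EN_n; all-enabled is an assumption.
        self.enable = enable & MASK32
        # `status` stands in for ERROR_STATUS_n.
        self.status = 0

    def capture(self, error_inputs):
        """Latch incoming errors; bits disabled in `enable` never reach status."""
        self.status |= error_inputs & self.enable

    def write_status(self, value):
        """Write-1-to-clear: each 1 clears that bit, each 0 leaves it unchanged."""
        self.status &= ~value & MASK32

    def system_reset(self):
        """System reset or internal POR: the captured status is left untouched."""
        return None


LPD_SWDT = 1 << 12  # ERROR_STATUS_1 [12] in Table 6-12
OCM_UE = 1 << 1     # ERROR_STATUS_1 [1] in Table 6-12

reg = ErrorStatusModel()
reg.capture(LPD_SWDT | OCM_UE)
reg.system_reset()
print(hex(reg.status))  # 0x1002: both errors survive the reset
reg.write_status(LPD_SWDT)
print(hex(reg.status))  # 0x2: only the bit written with 1 is cleared
```

Writing 0 to a bit has no effect in this model, so software can clear one error without a read-modify-write sequence and without disturbing other captured errors.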

For each error processed by the error handling logic, you can decide what action is taken when the error occurs. The action can be one or a combination of the following:

Assertion of the PS_ERROR_OUT signal on the device.

Generation of an interrupt to the PMU processor (PMU_Int).

Generation of a system reset (SRST).

Generation of a power-on-reset (POR).

Four mask registers are associated with each of the ERROR_STATUS registers (ERROR_STATUS_1 and ERROR_STATUS_2). These mask registers select whether an error generates a POR, an SRST, a PMU interrupt (if firmware is installed), or a PS_ERROR_OUT signal. To set the mask, write a 1 to the appropriate bit in the ERROR_INT_EN register (ERROR_INT_EN_1 or ERROR_INT_EN_2). To clear the mask, write a 1 to the appropriate bit in the ERROR_INT_DIS register (ERROR_INT_DIS_1 or ERROR_INT_DIS_2). If you select the PMU interrupt for a specific error, user firmware must be present to process the error; otherwise, a no-firmware error occurs. The signal states can be unmasked as desired. Table 6-12 lists all possible sources of error and the corresponding reset state of the ERROR_SIG_MASK_n mask registers for the PS_ERROR_OUT device pin signal. All other error mask registers reset to 1 (masked).
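As a rough illustration of how the four masks select actions, the sketch below models each mask as a 32-bit value in which a 1 means masked, as stated above. The names `Action`, `triggered_actions`, and `SIG_MASK_1_RESET` are hypothetical. The PS_ERROR_OUT reset value is built only from the ERROR_STATUS_1 rows of Table 6-12, reading M as masked, and reserved bits are assumed to be 0.

```python
from enum import Enum

MASK32 = 0xFFFF_FFFF


class Action(Enum):
    PS_ERROR_OUT = "PS_ERROR_OUT pin"
    PMU_INT = "PMU interrupt"
    SRST = "system reset"
    POR = "power-on reset"


def triggered_actions(status, masks):
    """Return the actions that have at least one captured, unmasked error.

    `masks` maps each Action to its 32-bit mask value (bit = 1 means masked).
    """
    return {action for action, mask in masks.items() if status & ~mask & MASK32}


# ERROR_STATUS_1 bits that Table 6-12 marks M at reset for PS_ERROR_OUT:
# RPU lock-step [7:6], OCM uncorrectable ECC [1], DDR uncorrectable ECC [0].
SIG_MASK_1_RESET = (0b11 << 6) | (1 << 1) | (1 << 0)  # reserved bits assumed 0

reset_masks = {
    Action.PS_ERROR_OUT: SIG_MASK_1_RESET,
    Action.PMU_INT: MASK32,  # all other masks reset to 1 (masked)
    Action.SRST: MASK32,
    Action.POR: MASK32,
}

LPD_SWDT = 1 << 12  # ERROR_STATUS_1 [12]
OCM_UE = 1 << 1     # ERROR_STATUS_1 [1]

print(triggered_actions(LPD_SWDT, reset_masks))  # only PS_ERROR_OUT fires
print(triggered_actions(OCM_UE, reset_masks))    # empty set: masked everywhere

# Unmasking SRST for the LPD SWDT error combines two actions for one error.
custom_masks = dict(reset_masks)
custom_masks[Action.SRST] = MASK32 & ~LPD_SWDT
print(sorted(a.name for a in triggered_actions(LPD_SWDT, custom_masks)))
```

The model treats the four masks independently, which is what allows one error to raise any combination of the four actions.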

Table 6-12: PMU Error Sources and Reset State Masks

| System Error | ERROR_SIG_MASK_n Reset State (U = unmasked, M = masked) | ERROR_STATUS Register and [Bits] | JTAG Error Register | GIC IRQ | Description |
|---|---|---|---|---|---|
| Software Errors | | | | | |
| CSU BootROM detected error | U | _2 [26] | [0] | ~ | BootROM in CSU experienced an error during boot, including bitstream authentication failure. |
| PMU ROM code preboot errors | U | _2 [25] | [1] | ~ | PMU ROM code experienced an error during the preboot process. |
| PMU ROM code service errors | U | _2 [24] | [2] | ~ | PMU ROM code experienced an error processing a service request. |
| PMU firmware defined interrupt bits | U | _2 [21:18] | [6:3] | ~ | PMU user firmware reported an error code. |
| FSBL detected errors | | | | ~ | |
| Hardware Errors | | | | | |
| PMU hardware errors | U | _2 [17] | [7] | | PMU ROM validation, TMR fault, RAM UE ECC, or register address access error. |
| CSU error | U | _2 [16] | [8] | | CSU hardware errors. Includes CSU ROM validation error. |
| PMU_PB | | _2 [25] | | | |
| PLL lock errors | M | _2 [12:8] | [13:9] | | PMU unmasks these bits when PLL is functioning. An error is signaled when a PLL loses lock; bits are in ERROR_STATUS_2. |
| Generic PL errors | U | _2 [5:2] | [17:14] | | Generic PL errors communicated to PS. |
| FPD bus timeout error | U | _2 [1] | [18] | 153 | OR of all timeout signals from the FPD AIB units; APB and AXI. |
| LPD bus timeout error | U | _2 [0] | [19] | 86 | OR of all timeout signals from the LPD AIB units; APB and AXI. |
| Clock monitor error | U | _1 [26] | [25] | 60 | Error from clock monitor logic. |
| FPD XMPU isolation error | U | _1 [25] | [26] | 166 | OR of violation signals from the FPD and DDRx XMPU protection units. |
| LPD XMPU isolation error | U | _1 [24] | [27] | 120 | OR of violation signals from the OCM XMPU and the XPPU protection units. |
| Power supply failures detected by PS SYSMON unit | U | _1 [23:16] | [35:28] | ~ | [16]: VCC_PSINTLP, [17]: VCC_PSINTFP, [18]: VCC_PSAUX, [19]: VCCO_PSDDR, [20]: VCC_PSIO3, [22]: VCC_PSIO0, [21]: VCC_PSIO1, [23]: VCC_PSIO2 |
| FPD SWDT error | U | _1 [13] | [36] | 145 | Timeout error from the FPD SWDT. |
| LPD SWDT error | U | _1 [12] | [37] | 84 | Timeout error from the LPD SWDT. |
| RPU CCF | U | _1 [9] | [38] | ~ | All RPU CCFs ORed together after the RPU_CCF_MASK register. |
| RPU lock-step errors | M | _1 [7:6] | [40:39] | ~ | RPU lock-step errors from RPU MPCore. |
| FPD over temperature | U | _1 [5] | [41] | ~ | FPD temperature near APU indicates a shutdown alert from the PS SysMon unit. |
| LPD over temperature | U | _1 [4] | [42] | ~ | LPD temperature near RPU indicates a shutdown alert from the PS SysMon unit. |
| RPU hardware errors | U | _1 [3:2] | [44:43] | 45, 44 | RPU0 or RPU1 error including both correctable and uncorrectable errors. |
| OCM uncorrectable ECC | M | _1 [1] | [45] | 42 | The OCM reported an uncorrectable ECC error during an OCM memory access. |
| DDR uncorrectable ECC | M | _1 [0] | [46] | | The DDR reported an uncorrectable ECC error during a DDR memory access. |
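For firmware or host-side debug, the bit assignments in Table 6-12 can be turned into a lookup that names the errors present in a pair of raw ERROR_STATUS values. The following is a minimal sketch under stated assumptions: `ERROR_FIELDS` and `decode_error_status` are hypothetical names, the two register values are assumed to have been read already, and the FSBL and PMU_PB rows are left out because the table gives no distinct bits for them.

```python
# (register number, high bit, low bit, name), copied from Table 6-12.
ERROR_FIELDS = [
    (2, 26, 26, "CSU BootROM detected error"),
    (2, 25, 25, "PMU ROM code preboot errors"),
    (2, 24, 24, "PMU ROM code service errors"),
    (2, 21, 18, "PMU firmware defined interrupt bits"),
    (2, 17, 17, "PMU hardware errors"),
    (2, 16, 16, "CSU error"),
    (2, 12, 8, "PLL lock errors"),
    (2, 5, 2, "Generic PL errors"),
    (2, 1, 1, "FPD bus timeout error"),
    (2, 0, 0, "LPD bus timeout error"),
    (1, 26, 26, "Clock monitor error"),
    (1, 25, 25, "FPD XMPU isolation error"),
    (1, 24, 24, "LPD XMPU isolation error"),
    (1, 23, 16, "Power supply failures"),
    (1, 13, 13, "FPD SWDT error"),
    (1, 12, 12, "LPD SWDT error"),
    (1, 9, 9, "RPU CCF"),
    (1, 7, 6, "RPU lock-step errors"),
    (1, 5, 5, "FPD over temperature"),
    (1, 4, 4, "LPD over temperature"),
    (1, 3, 2, "RPU hardware errors"),
    (1, 1, 1, "OCM uncorrectable ECC"),
    (1, 0, 0, "DDR uncorrectable ECC"),
]


def decode_error_status(status_1, status_2):
    """Return (name, field value) for every Table 6-12 field that is non-zero."""
    registers = {1: status_1, 2: status_2}
    hits = []
    for reg, hi, lo, name in ERROR_FIELDS:
        width = hi - lo + 1
        value = (registers[reg] >> lo) & ((1 << width) - 1)
        if value:
            hits.append((name, value))
    return hits


# Hypothetical register contents, chosen only to exercise the decoder.
for name, value in decode_error_status(status_1=0x1002, status_2=0x0300):
    print(f"{name}: {value:#x}")
```

Multi-bit fields are returned as raw field values, because Table 6-12 does not state which bit within a range belongs to which individual source (the power supply field is the exception).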

All the errors listed in Table 6-12 and the five reserved errors are also routed to the PL and are directly accessible through JTAG. In addition, the 74 bits of software errors from the PMU_PB_ERR, CSU_BR_ERR, and PMU_SERV_ERR registers are directly accessible through JTAG. You can permanently suppress JTAG access to these errors by blowing an eFUSE. Table 6-13 lists the assignment of errors in the JTAG status register and in the error status interface to the PL.

Note: The eFUSE suppresses access to the errors through JTAG, but the errors remain accessible inside the device.

Table 6-13: JTAG Error Register Description

| Error Source | Bit on JTAG Error Status | Bit on Error Status to PL |
|---|---|---|
| CSU ROM error (same as bit 120). | 0 | 0 |
| PMU pre-boot error (same as bit 78). | 1 | 1 |
| PMU ROM service error (same as bit 99). | 2 | 2 |
| PMU firmware error (same as bits 103:100). | 6:3 | 6:3 |
| Uncorrectable PMU error. Includes ROM validation, TMR, uncorrectable RAM ECC, and local register address errors. | 7 | 7 |
| CSU error. | 8 | 8 |
| PLL lock errors [VideoPLL, DDRPLL, APUPLL, RPUPLL, IOPLL]. | 13:9 | 13:9 |
| PL generic errors passed to PS. | 17:14 | 17:14 |
| Full-power subsystem time-out error. | 18 | 18 |
| Low-power subsystem time-out error. | 19 | 19 |
| Reserved errors. | 24:20 | 24:20 |
| Clock monitor error. | 25 | 25 |
| XMPU errors [FPD XMPU, LPD XMPU]. | 27:26 | 27:26 |
| Supply Detection Failure Errors [VCCO_PSIO_2, VCCO_PSIO_1, VCCO_PSIO_0, VCCO_PSIO_3, VCCO_PSDDR, VCC_PSAUX, VCC_PSINTFP, VCC_PSINTLP] | 35:28 | 35:28 |
| FPD System Watch-Dog Timer Error | 36 | 36 |
| LPD System Watch-Dog Timer Error | 37 | 37 |
| RPU CCF error | 38 | 38 |
| RPU Lockstep Error | 40:39 | 40:39 |
| FPD Temperature Shutdown Alert | 41 | 41 |
| LPD Temperature Shutdown Alert | 42 | 42 |
| RPU1 Error (Both Correctable and Uncorrectable Errors) | 43 | 43 |
| RPU0 Error (Both Correctable and Uncorrectable Errors) | 44 | 44 |
| OCM Uncorrectable ECC Error | 45 | 45 |
| DDR Uncorrectable ECC Error | 46 | 46 |
| PMU Preboot Errors (PMU_PB_ERR.PBERR_Data) | 77:47 | 77:47 |
| PMU Preboot Error Flag (PMU_PB_ERR.PBERR_Flag) | 78 | 78 |
| PMU Service Errors (PMU_SERV_ERR.SERVERR_Data) | 98:79 | 98:79 |
| PMU Service Error Flag (PMU_SERV_ERR.SERVERR_Flag) | 99 | 99 |
| PMU Firmware Error (PMU_SERV_ERR.FWERR) | 103:100 | 103:100 |
| CSU BootROM Errors (CSU_BR_ERR.ERR_TYPE) | 119:104 | 119:104 |
| CSU BootROM Errors (CSU_BR_ERR.BR_ERROR) | 120 | 120 |
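The layout in Table 6-13 can be checked and applied with a short host-side sketch. It assumes the 121-bit JTAG error status value has already been captured into a Python integer with bit 0 as the least significant bit; how the value is read over JTAG is outside the scope of this example. The names `JTAG_FIELDS` and `decode_jtag_error_status` and the sample value are hypothetical.

```python
# (high bit, low bit, name), copied from Table 6-13.
JTAG_FIELDS = [
    (0, 0, "CSU ROM error"),
    (1, 1, "PMU pre-boot error"),
    (2, 2, "PMU ROM service error"),
    (6, 3, "PMU firmware error"),
    (7, 7, "Uncorrectable PMU error"),
    (8, 8, "CSU error"),
    (13, 9, "PLL lock errors"),
    (17, 14, "PL generic errors"),
    (18, 18, "Full-power subsystem time-out error"),
    (19, 19, "Low-power subsystem time-out error"),
    (24, 20, "Reserved errors"),
    (25, 25, "Clock monitor error"),
    (27, 26, "XMPU errors"),
    (35, 28, "Supply detection failure errors"),
    (36, 36, "FPD SWDT error"),
    (37, 37, "LPD SWDT error"),
    (38, 38, "RPU CCF error"),
    (40, 39, "RPU lockstep error"),
    (41, 41, "FPD temperature shutdown alert"),
    (42, 42, "LPD temperature shutdown alert"),
    (43, 43, "RPU1 error"),
    (44, 44, "RPU0 error"),
    (45, 45, "OCM uncorrectable ECC error"),
    (46, 46, "DDR uncorrectable ECC error"),
    (77, 47, "PMU_PB_ERR.PBERR_Data"),
    (78, 78, "PMU_PB_ERR.PBERR_Flag"),
    (98, 79, "PMU_SERV_ERR.SERVERR_Data"),
    (99, 99, "PMU_SERV_ERR.SERVERR_Flag"),
    (103, 100, "PMU_SERV_ERR.FWERR"),
    (119, 104, "CSU_BR_ERR.ERR_TYPE"),
    (120, 120, "CSU_BR_ERR.BR_ERROR"),
]

# Consistency checks against the text: 121 bits in total, of which the
# software error register fields (bits 120:47) account for 74.
TOTAL_BITS = sum(hi - lo + 1 for hi, lo, _ in JTAG_FIELDS)
SOFTWARE_BITS = sum(hi - lo + 1 for hi, lo, _ in JTAG_FIELDS if lo >= 47)


def decode_jtag_error_status(value):
    """Split a captured JTAG error status value into its Table 6-13 fields."""
    return {
        name: (value >> lo) & ((1 << (hi - lo + 1)) - 1)
        for hi, lo, name in JTAG_FIELDS
    }


# Hypothetical capture: CSU BootROM error flag set with an arbitrary error type.
sample = (1 << 120) | (0x0012 << 104) | (1 << 0)
fields = decode_jtag_error_status(sample)
print(TOTAL_BITS, SOFTWARE_BITS)
print({name: hex(v) for name, v in fields.items() if v})
```

The same table applies to the error status interface to the PL, because both columns of Table 6-13 carry identical bit assignments.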