ACP Usage

Zynq UltraScale+ Device Technical Reference Manual (UG1085), Revision 2.4 English, 2023-12-21

Compared with a legacy scheme that explicitly flushes and reloads the caches, the ACP provides a low-latency path between the PS and accelerators implemented in the PL. The steps that take place in an example PL-based accelerator flow are as follows (a CPU-side software sketch follows the list).

1. The CPU prepares input data for the accelerator within its local cache space.

2. The CPU sends a message to the accelerator using one of the HPM AXI master interfaces to the PL.

3. The accelerator fetches the data through the ACP, processes the data, and returns the result through the ACP.

4. The accelerator sets a flag by writing to a known location to indicate that the data processing is complete. The status of this flag can be polled by the processor, or it can generate an interrupt.
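
The following bare-metal C sketch illustrates the CPU side of this sequence. The accelerator base address and register map are hypothetical placeholders for a user-defined PL block reached through an M_AXI_HPM port; they are not taken from UG1085.

```c
/*
 * Minimal CPU-side sketch of the four-step sequence above, written for a
 * bare-metal environment.  The ACCEL_* register map and base address are
 * hypothetical placeholders for a user-defined PL accelerator.
 */
#include <stdint.h>

#define ACCEL_BASE      0xA0000000UL            /* hypothetical HPM0_FPD-mapped address */
#define ACCEL_SRC       (ACCEL_BASE + 0x00u)    /* source buffer address                */
#define ACCEL_DST       (ACCEL_BASE + 0x04u)    /* destination buffer address           */
#define ACCEL_LEN       (ACCEL_BASE + 0x08u)    /* transfer length in 32-bit words      */
#define ACCEL_START     (ACCEL_BASE + 0x0Cu)    /* write 1 to start                     */
#define ACCEL_DONE      (ACCEL_BASE + 0x10u)    /* reads 1 when processing is complete  */

static inline void reg_write(uintptr_t addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;
}

static inline uint32_t reg_read(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

void run_accelerator(const uint32_t *src, uint32_t *dst, uint32_t words)
{
    /* Step 1: the CPU has already produced src[] in its caches.  No cache
     * flush is required because the accelerator fetches the data coherently
     * through the ACP.  (This sketch assumes the buffers sit below 4 GB so
     * their addresses fit in the 32-bit registers used here.) */

    /* Step 2: program and start the accelerator through an HPM master port. */
    reg_write(ACCEL_SRC, (uint32_t)(uintptr_t)src);
    reg_write(ACCEL_DST, (uint32_t)(uintptr_t)dst);
    reg_write(ACCEL_LEN, words);
    reg_write(ACCEL_START, 1u);

    /* Step 3 happens in the PL: the accelerator reads src and writes dst
     * through the ACP. */

    /* Step 4: poll the completion flag; an interrupt could be used instead. */
    while (reg_read(ACCEL_DONE) == 0u) {
        /* spin */
    }
}
```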

When compared to a tightly-coupled coprocessor, ACP access latencies are relatively long, so the ACP is not recommended for fine-grained, instruction-level acceleration. For coarse-grained acceleration, such as video frame-level processing, the ACP has no clear advantage over traditional memory-mapped PL acceleration, because the transaction overhead is small relative to the transaction time, and routing large data sets through the caches can cause undesirable cache thrashing. The ACP is therefore best suited to medium-grained acceleration, such as a block-level crypto accelerator or video macro-block level processing.

The ACP port also has practical limitations: it supports at most four outstanding transactions, it accepts only two transaction burst lengths (64-byte and 16-byte), and it can adversely affect CPU cluster performance because every ACP transaction is treated as coherent.
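
With the 128-bit ACP data width, those two burst lengths correspond to a four-beat burst of 16-byte beats (64 bytes) or a single 16-byte beat. The sketch below shows how a PL master's configuration or verification code might check a planned burst against these shapes; the helper name and the 64-byte-alignment requirement for full-line bursts are assumptions, not statements from UG1085.

```c
/*
 * Hypothetical helper illustrating the two burst shapes the 128-bit ACP port
 * accepts.  The 64-byte-alignment check for full cache-line bursts is an
 * assumption, not taken from UG1085.
 */
#include <stdbool.h>
#include <stdint.h>

bool acp_burst_is_legal(uint32_t axlen, uint32_t axsize, uint64_t axaddr)
{
    uint32_t beat_bytes  = 1u << axsize;              /* bytes per beat  */
    uint32_t total_bytes = beat_bytes * (axlen + 1u); /* bytes per burst */

    if (beat_bytes != 16u)        /* full 128-bit beats only */
        return false;

    if (total_bytes == 16u)       /* single-beat, 16-byte transaction */
        return true;

    /* Four-beat, 64-byte (cache-line) transaction, assumed 64-byte aligned. */
    return total_bytes == 64u && (axaddr & 0x3Fu) == 0u;
}
```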

RECOMMENDED: For the best power and performance, AMD recommends using either an S_AXI_HPCx_FPD port or the ACE port, rather than the ACP, to provide I/O coherency.

Table 35-4: PS-PL AXI Interfaces

| Interface Name | Abbreviation | Type | Master | Data Width (bits) | Master ID Width (bits) | Usage Description |
|---|---|---|---|---|---|---|
| S_AXI_HP{0:3}_FPD | HP{0:3} | AXI4 | PL | 128/64/32 | 6 | Non-coherent paths from PL to FPD main switch and DDR. No L2 cache allocation. |
| S_AXI_LPD | PL_LPD | AXI4 | PL | 128/64/32 | 6 | Non-coherent path from PL to IOP in LPD. |
| S_AXI_ACE_FPD | ACE | ACE | PL | 128 | 6 | Two-way coherent path between memory in PL and CCI. |
| S_AXI_ACP_FPD | ACP | AXI4 | PL | 128 | 5 | I/O coherent with CCI. With L2 cache allocation. |
| S_AXI_HPC{0, 1}_FPD | HPC{0, 1} | AXI4 | PL | 128 | 6 | I/O coherent with CCI. No L2 cache allocation. |
| M_AXI_HPM{0, 1}_FPD | HPM{0, 1} | AXI4 | PS | 128/64/32 | 16 | FPD masters to PL slaves. |
| M_AXI_HPM0_LPD | LPD_PL | AXI4 | PS | 128/64/32 | 16 | LPD masters to PL slaves. |

CAUTION! Avoid using the ACP in security- or safety-critical applications that require isolation within the APU and/or between the PS and PL. When enabled, the ACP has unrestricted access to the entire APU L2 cache, and untrusted IP in the PL can make its transactions appear secure by setting its own AxPROT bits. If the ACP is used in such a system, two precautions are highly recommended. The first is to wrap any PL IP that has ACP access with trusted security logic that drives the AxPROT bits, rather than allowing the IP to do so. The second is to ensure that this IP's access to the entirety of the L2 cache does not violate the system's security goals.