AI Engine-PL Interface Performance - 2022.2 English

AI Engine Kernel and Graph Programming Guide (UG1079)

Document ID
UG1079
Release Date
2022-10-19
Version
2022.2 English

Versal AI Core series devices include an AI Engine array with the following column categories.

PL column
provides PL stream access. Each column supports eight 64-bit slave channels for streaming data into the AI Engine and six 64-bit master channels for streaming data to the PL.
NoC column
provides connectivity between the AI Engine array and the NoC. These interfaces can also connect to the PL.

To instruct the AI Engine compiler to select higher-frequency interfaces, use the --pl-freq=<number> option to specify the clock frequency (in MHz) for the PL kernels. The default value is one quarter of the AI Engine frequency, and the maximum supported value is one half of the AI Engine frequency; both values therefore depend on the speed grade of the device. Following are examples:

  • Option to enable an AI Engine-PL frequency of 300 MHz for all AI Engine-PL interfaces:
    --pl-freq=300
  • To set a different frequency for a specific PLIO interface, pass the frequency (in MHz) as the last constructor argument in the ADF graph:
    adf::PLIO *<input> = new adf::PLIO(<logical_name>, <plio_width>, <file>, <FreqMHz>);
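
The quarter and half relationships described above can be sketched as a small standalone helper. This is only an illustration of the arithmetic: `PlFreqLimits`, `pl_freq_limits`, and `pl_freq_is_supported` are hypothetical names that are not part of the ADF API or the AI Engine compiler, and the AI Engine frequency passed in is an assumed input that in practice is determined by the device speed grade.

```cpp
#include <cassert>

// Hypothetical helper, not part of the ADF API: derives the PL clock limits
// described in the text from an assumed AI Engine clock frequency in MHz.
struct PlFreqLimits {
    double default_mhz;  // one quarter of the AI Engine frequency
    double max_mhz;      // one half of the AI Engine frequency
};

inline PlFreqLimits pl_freq_limits(double aie_freq_mhz) {
    assert(aie_freq_mhz > 0.0);
    return PlFreqLimits{aie_freq_mhz / 4.0, aie_freq_mhz / 2.0};
}

// True when a requested --pl-freq value is positive and does not exceed
// the maximum supported value for the assumed AI Engine frequency.
inline bool pl_freq_is_supported(double requested_mhz, double aie_freq_mhz) {
    return requested_mhz > 0.0 &&
           requested_mhz <= pl_freq_limits(aie_freq_mhz).max_mhz;
}
```

For example, assuming an AI Engine clock of 1000 MHz, the helper gives a default of 250 MHz and a maximum of 500 MHz, so --pl-freq=300 would fall inside the supported range.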
Note: The following information applies to the AI Engine device architecture documented in Versal ACAP AI Engine Architecture Manual (AM009).

The AI Engine-PL AXI4-Stream channels use boundary logic interface (BLI) connections that include optional BLI registers, with the exception of slave channels 3 and 7. These two slave channels are slower interfaces. The performance of data transfer between the AI Engine and the PL depends on whether the optional BLI registers are enabled.

For less timing-critical designs, all eight channels can be used without the BLI registers, and PL timing can still be met. However, for higher-frequency designs, only the six fast channels (0, 1, 2, 4, 5, and 6) can be used, and the timing paths from the PL must be registered using the BLI registers.

To control the use of BLI registers across the AI Engine-PL channels, use the --pl-register-threshold=<number> compiler option, specified in MHz. The default value is one eighth of the AI Engine frequency, which depends on the speed grade. Following is an example:

  • --pl-register-threshold=125

    The compiler maps any PLIO interface with an AI Engine-PL frequency higher than this setting (125 MHz in this case) to the high-speed channels with the BLI registers enabled. If the PLIO interface frequency is not higher than the pl-register-threshold value, any of the AI Engine-PL channels can be used.

In summary, if pl-freq <= pl-register-threshold, all eight channels can be used unregistered. If pl-freq > pl-register-threshold, only the six fast channels can be used, with registering. The pl-register-threshold option therefore sets the frequency above which only the fast channels can be used (with registering).
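
The rule summarized above can be written as a short decision function. This is a minimal sketch of the documented behavior, not the compiler's mapping algorithm: `ChannelPlan` and `plan_channels` are hypothetical names, and the function only reports which slave channels are eligible, not which one the compiler actually picks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the documented rule; not the compiler's implementation.
struct ChannelPlan {
    std::vector<int> channels;  // slave channel indices eligible for mapping
    bool bli_registered;        // whether the BLI registers are enabled
};

inline ChannelPlan plan_channels(double pl_freq_mhz, double threshold_mhz) {
    assert(pl_freq_mhz > 0.0 && threshold_mhz > 0.0);
    if (pl_freq_mhz > threshold_mhz) {
        // Above the threshold: only the six fast channels, registered.
        return ChannelPlan{{0, 1, 2, 4, 5, 6}, true};
    }
    // At or below the threshold: all eight channels, unregistered.
    return ChannelPlan{{0, 1, 2, 3, 4, 5, 6, 7}, false};
}
```

With --pl-freq=300 and --pl-register-threshold=125, the sketch returns the six fast channels with registering; with a PL frequency of 125 MHz or lower it returns all eight channels without registering.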

Note: TLAST is required for a 64-bit stream between the AI Engine and PL if single 32-bit words are sent. The AI Engine compiler automatically up-sizes 32-bit stream interfaces between the AI Engine and the PL to 64-bit interfaces internally. When 32-bit stream data is sent in either direction between the AI Engine and the PL, a single 32-bit word without TLAST is held in the interface until a second 32-bit word arrives to complete the 64-bit up-sizing. The workaround is to send TLAST after the 32-bit stream is sent.
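
The holding behavior in the note can be illustrated with a toy software model. This is a sketch under stated assumptions, not a description of the hardware: `Beat64` and `Upsizer32To64` are hypothetical names, TLAST is modeled as a flag accompanying the final 32-bit word, and the content of the unused upper half of a flushed beat is left unspecified (marked with `hi_valid = false`) because the source does not describe it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One 64-bit beat leaving the modeled up-sizer.
struct Beat64 {
    std::uint32_t lo;
    std::uint32_t hi;
    bool hi_valid;  // false when TLAST flushed a lone 32-bit word (assumption)
    bool tlast;
};

// Toy model of 32-bit to 64-bit up-sizing; hypothetical, not the hardware logic.
class Upsizer32To64 {
public:
    void push(std::uint32_t word, bool tlast, std::vector<Beat64>& out) {
        if (has_pending_) {
            // A second word completes the 64-bit beat.
            out.push_back({pending_, word, true, tlast});
            has_pending_ = false;
        } else if (tlast) {
            // TLAST releases a lone word instead of leaving it held.
            out.push_back({word, 0u, false, true});
        } else {
            // A lone word without TLAST is held until more data arrives.
            pending_ = word;
            has_pending_ = true;
        }
    }
    bool holding_word() const { return has_pending_; }

private:
    std::uint32_t pending_ = 0;
    bool has_pending_ = false;
};

// Usage sketch: the first word is held, the second completes a beat,
// and a third word sent with TLAST is released immediately.
inline void upsizer_usage_example() {
    std::vector<Beat64> out;
    Upsizer32To64 up;
    up.push(0xAu, false, out);
    assert(out.empty() && up.holding_word());
    up.push(0xBu, false, out);
    assert(out.size() == 1 && !up.holding_word());
    up.push(0xCu, true, out);
    assert(out.size() == 2 && out[1].tlast);
}
```

In this model an odd number of 32-bit words sent without TLAST leaves the last word held in the up-sizer, which is the situation the workaround in the note avoids.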