Kernel Feature - 2023.2 English

Vitis Tutorials: Hardware Acceleration (XD099)

Document ID: XD099
Release Date: 2023-11-13
Version: 2023.2 English

Refer to the following block diagram of the krnl_cbc kernel. It has four identical CBC engines, which receive input data from the AXI read master via the engine control unit. The engines send that data to the krnl_aes kernel and receive the output data back over AXI-Stream ports, then send the result to the AXI write master via the engine control unit.

An AXI control slave module is used to set the necessary kernel arguments. The krnl_cbc kernel processes input and output data as groups of words stored in global memory. Each internal engine handles one group of words at a time. The engine control module assigns consecutive input groups to different internal CBC engines in round-robin fashion. The krnl_cbc kernel uses a single kernel clock for all internal modules.
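
The round-robin assignment described above can be illustrated with a minimal sketch. It assumes that the first group goes to cbc_engine_0 and that the order simply wraps around after cbc_engine_3; the function name assign_groups is hypothetical and is not part of the kernel source.

```python
NUM_ENGINES = 4  # the krnl_cbc kernel contains four CBC engines


def assign_groups(num_groups, num_engines=NUM_ENGINES):
    """Return the engine index for each input group, in round-robin order.

    Assumption: group 0 is given to engine 0 and the order wraps around.
    """
    return [group % num_engines for group in range(num_groups)]


if __name__ == "__main__":
    for group, engine in enumerate(assign_groups(6)):
        print(f"group {group} -> cbc_engine_{engine}")
```

With six groups, the fifth and sixth groups wrap back to the first two engines, so consecutive groups never share an engine unless more than four are outstanding.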

[Figure: block diagram of the krnl_cbc kernel, showing axi_ctrl_slave, engine control, cbc_engine_0 through cbc_engine_3, the AXI read master, the AXI write master, and the AXIS slave, AXIS master, and AXI master ports]

The krnl_cbc kernel supports the ap_ctrl_chain execution model. ap_ctrl_chain is an extension of the ap_ctrl_hs model in which kernel execution is divided into an input sync stage and an output sync stage. The control signals ap_start and ap_ready are used for input sync, while ap_done and ap_continue are used for output sync. Refer to Supported Kernel Execution Models for detailed explanations.

The following figure shows an example waveform of the ap_ctrl_chain module with two beats of input sync and two beats of output sync (the kernel executes two jobs consecutively).

ap_ctrl_hs mode

For input sync, at clock edges a and b, the asserted ap_start signal is acknowledged by ap_ready and then deasserted, and kernel execution is triggered at the same time. (This is somewhat similar to TVALID being acknowledged by TREADY in the AXI-Stream protocol.) The XRT scheduler checks the status of the ap_start signal and asserts it when it is low, which means the kernel can accept a new task. The ap_ready signal is generated by the kernel to indicate its status.

For output sync, at clock edges c and d, the asserted ap_done signal is acknowledged by ap_continue and then deasserted, which marks the completion of one kernel job. When the XRT scheduler detects that ap_done has been asserted, XRT asserts ap_continue. Generally, ap_continue should be implemented as a self-clearing signal, so that it stays asserted for only one cycle.

From the waveform, you can see that before ap_done is asserted, the kernel uses the ap_ready signal to tell XRT that it can accept new input data. This scheme acts as back-pressure on the input sync stage and lets the task pipeline fully utilize the hardware capability. In the example waveform above, XRT writes the ap_start bit and the ap_continue bit twice each in the AXI control slave register.
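
To make the two handshakes concrete, the following is a toy behavioral sketch of ap_ctrl_chain for two consecutive jobs. It is not the kernel RTL and not XRT code: the class ApCtrlChainModel, the latency and depth parameters, and the polling loop in run_jobs are all hypothetical and chosen only for illustration. The model clears ap_start when ap_ready pulses, and clears ap_done when the host writes ap_continue.

```python
from collections import deque


class ApCtrlChainModel:
    """Toy cycle-level model of the ap_ctrl_chain handshake (not the RTL)."""

    def __init__(self, latency=3, depth=2):
        self.latency = latency    # hypothetical cycles needed per job
        self.depth = depth        # hypothetical number of outstanding jobs
        self.ap_start = 0
        self.ap_done = 0
        self.in_flight = deque()  # remaining cycles of each accepted job
        self.finished = 0         # completed jobs not yet shown on ap_done

    def host_write_start(self):
        """Host sets ap_start only when it is currently low."""
        if self.ap_start:
            return False
        self.ap_start = 1
        return True

    def host_write_continue(self):
        """Host writes ap_continue after seeing ap_done; this clears ap_done."""
        if not self.ap_done:
            return False
        self.ap_done = 0
        return True

    def clock(self):
        """Advance one kernel clock; return True if ap_ready pulsed."""
        self.in_flight = deque(c - 1 for c in self.in_flight)
        while self.in_flight and self.in_flight[0] <= 0:
            self.in_flight.popleft()
            self.finished += 1
        # Output sync: present one finished job at a time on ap_done.
        if self.finished and not self.ap_done:
            self.ap_done = 1
            self.finished -= 1
        # Input sync: accept a job only while there is room for it.
        outstanding = len(self.in_flight) + self.finished + self.ap_done
        ap_ready = bool(self.ap_start and outstanding < self.depth)
        if ap_ready:
            self.ap_start = 0  # ap_ready acknowledges and clears ap_start
            self.in_flight.append(self.latency)
        return ap_ready


def run_jobs(num_jobs, model):
    """Drive the model like a polling host; return counts and event order."""
    events, started, completed = [], 0, 0
    start_writes = continue_writes = 0
    while completed < num_jobs:
        if started < num_jobs and model.host_write_start():
            started += 1
            start_writes += 1
        if model.clock():
            events.append("ap_ready")
        if model.host_write_continue():
            events.append("ap_done")
            continue_writes += 1
            completed += 1
    return start_writes, continue_writes, events


if __name__ == "__main__":
    print(run_jobs(2, ApCtrlChainModel()))
```

In this sketch the second ap_ready occurs before the first ap_done, which mirrors the overlap the waveform describes, and the host ends up writing ap_start and ap_continue twice each.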

The following table lists all the control registers and kernel arguments accessible through the AXI slave port. There is no interrupt support in this kernel.

Name         Addr Offset  Width (bits)  Description
CTRL         0x000        5             Control signals:
                                          bit 0 - ap_start
                                          bit 1 - ap_done
                                          bit 2 - ap_idle
                                          bit 3 - ap_ready
                                          bit 4 - ap_continue
MODE         0x010        1             Kernel cipher mode:
                                          0 - decryption
                                          1 - encryption
IV_W3        0x018        32            AES-CBC mode initial vector, Word 3
IV_W2        0x020        32            AES-CBC mode initial vector, Word 2
IV_W1        0x028        32            AES-CBC mode initial vector, Word 1
IV_W0        0x030        32            AES-CBC mode initial vector, Word 0
WORDS_NUM    0x038        32            Number of 128-bit words to process
SRC_ADDR_0   0x040        32            Input data buffer address, LSB
SRC_ADDR_1   0x044        32            Input data buffer address, MSB
DEST_ADDR_0  0x048        32            Output data buffer address, LSB
DEST_ADDR_1  0x04C        32            Output data buffer address, MSB
CBC_MODE     0x050        1             Cipher processing mode:
                                          0 - AES-ECB mode
                                          1 - AES-CBC mode
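
As an illustration of the register layout, the sketch below encodes the offsets and CTRL bit positions from the table and shows how a 64-bit buffer address splits into its LSB and MSB registers. The helper names (decode_ctrl, split_addr, build_arg_writes) are hypothetical. In a normal flow XRT programs these registers from the kernel arguments, so this only models the layout and does not access hardware.

```python
# Register offsets copied from the table above.
REG = {
    "CTRL": 0x000, "MODE": 0x010,
    "IV_W3": 0x018, "IV_W2": 0x020, "IV_W1": 0x028, "IV_W0": 0x030,
    "WORDS_NUM": 0x038,
    "SRC_ADDR_0": 0x040, "SRC_ADDR_1": 0x044,
    "DEST_ADDR_0": 0x048, "DEST_ADDR_1": 0x04C,
    "CBC_MODE": 0x050,
}

# Bit positions inside the CTRL register.
CTRL_BITS = {"ap_start": 0, "ap_done": 1, "ap_idle": 2,
             "ap_ready": 3, "ap_continue": 4}

MASK32 = 0xFFFFFFFF


def decode_ctrl(value):
    """Split a CTRL register value into its named single-bit fields."""
    return {name: (value >> bit) & 1 for name, bit in CTRL_BITS.items()}


def split_addr(addr):
    """Split a 64-bit buffer address into (LSB, MSB) 32-bit halves."""
    return addr & MASK32, (addr >> 32) & MASK32


def build_arg_writes(encrypt, cbc, iv_words, words_num, src, dst):
    """Return a list of (offset, value) pairs for the kernel arguments.

    iv_words maps register names "IV_W0".."IV_W3" to 32-bit values, so no
    assumption is made here about which word is the most significant.
    """
    src_lo, src_hi = split_addr(src)
    dst_lo, dst_hi = split_addr(dst)
    writes = [(REG["MODE"], 1 if encrypt else 0)]
    writes += [(REG[name], iv_words[name] & MASK32)
               for name in ("IV_W3", "IV_W2", "IV_W1", "IV_W0")]
    writes += [
        (REG["WORDS_NUM"], words_num & MASK32),
        (REG["SRC_ADDR_0"], src_lo), (REG["SRC_ADDR_1"], src_hi),
        (REG["DEST_ADDR_0"], dst_lo), (REG["DEST_ADDR_1"], dst_hi),
        (REG["CBC_MODE"], 1 if cbc else 0),
    ]
    return writes


if __name__ == "__main__":
    iv = {"IV_W0": 0, "IV_W1": 0, "IV_W2": 0, "IV_W3": 0}
    for offset, value in build_arg_writes(True, True, iv, 16,
                                          0x1_0000_0000, 0x2_0000_1000):
        print(f"0x{offset:03X} <= 0x{value:08X}")
```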