The functional block diagram of the core is shown in the following figure.
The core instantiates several AES engines internally; the number of engines depends on the AXI4-Stream data width that you select. A scalable tweak-calculation module generates several tweak values in a single clock cycle so that all the engines can process data simultaneously without stalling. The key expansion module generates the round keys independently and provides them to all the engines.
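As an illustration of what generating several tweak values at once involves, the following Python sketch derives one tweak per engine from a starting tweak. It rests on assumptions that this section does not confirm: that the tweak follows the XTS-AES convention (each successive 128-bit block uses the previous tweak multiplied by alpha in GF(2^128)), and that the starting tweak has already been derived from the IV. The names mul_alpha and tweaks_for_cycle are hypothetical and are not part of the core. The hardware would compute these values in parallel within one clock cycle; the loop here only models the resulting values.

```python
MASK128 = (1 << 128) - 1  # keep values within 128 bits


def mul_alpha(tweak: int) -> int:
    """Multiply a 128-bit tweak by alpha in GF(2^128).

    Assumes the XTS-AES convention: the 16-byte tweak is read as a
    little-endian integer and reduced by x^128 + x^7 + x^2 + x + 1.
    """
    carry = tweak >> 127              # bit shifted out of the top
    tweak = (tweak << 1) & MASK128    # multiply by x
    if carry:
        tweak ^= 0x87                 # reduce by the field polynomial
    return tweak


def tweaks_for_cycle(first_tweak: int, num_engines: int) -> list[int]:
    """Return one tweak per engine for a single beat of input data.

    num_engines is a hypothetical parameter; the real engine count is
    set by the selected AXI4-Stream data width.
    """
    tweaks = [first_tweak]
    for _ in range(num_engines - 1):
        tweaks.append(mul_alpha(tweaks[-1]))
    return tweaks


if __name__ == "__main__":
    for i, t in enumerate(tweaks_for_cycle(1 << 126, 4)):
        print(f"engine {i}: {t:032x}")
```

The first tweak for the next beat would then be the last tweak of the current beat multiplied by alpha once more, which is what lets every engine start on its own block without waiting for its neighbour.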
When high-throughput mode is selected, the engines use a pipelined design that enables the core to provide data every cycle at the output. The high-throughput core also supports prefetch, which enables you to provide the key and IV for the next packet while the current packet is being driven to the core. This enables the core to mask the latency introduced by generating the tweak or other metadata that must be ready before the engine can start accepting data. Thus, the design allows you to send packets back-to-back without having to wait for the metadata to be ready for successive packets. This is useful in block-storage applications where the keys for all the blocks are pre-generated and stored in memory within the system.
Prefetch is achieved through a handshake mechanism for the key and IV. You are expected to drive the key_valid signal while sending a valid key and IV. The core asserts the key_fetch signal to indicate that it is ready to take the next key and IV. The key and IV are latched only when both key_valid and key_fetch are asserted in the same clock cycle. The controller within the core manages the keys so that they are not overwritten when you feed the key and IV for the next packet. It also ensures that the keys are provided to the pipeline according to how far the data has moved through it.
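The handshake described above can be sketched as a small cycle-level behavioural model in Python. This is not the core's implementation: the class KeyPrefetchModel, the packet_done input, and the storage depth of two (one active key set plus one prefetched set) are assumptions made for illustration. Only the rule that a key and IV are latched when key_valid and key_fetch are both asserted in the same cycle comes from the text.

```python
from collections import deque


class KeyPrefetchModel:
    """Behavioural sketch of the key/IV prefetch handshake.

    Assumption: the controller holds up to `depth` key/IV sets, the
    oldest being used by the pipeline and the rest being prefetched.
    """

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.slots = deque()  # oldest entry is the active key/IV

    @property
    def key_fetch(self) -> bool:
        # Asserted while there is room for another key/IV set.
        return len(self.slots) < self.depth

    def active(self):
        # Key/IV currently supplied to the pipeline, if any.
        return self.slots[0] if self.slots else None

    def clock(self, key_valid: bool, key=None, iv=None,
              packet_done: bool = False) -> bool:
        """Advance one clock cycle; return True if key/IV were latched."""
        # Sample key_fetch before updating state, as a register would.
        latched = key_valid and self.key_fetch
        if packet_done and self.slots:
            self.slots.popleft()      # retire the finished packet's key
        if latched:
            self.slots.append((key, iv))
        return latched


if __name__ == "__main__":
    m = KeyPrefetchModel()
    print(m.clock(True, "k0", "iv0"))   # latched for the first packet
    print(m.clock(True, "k1", "iv1"))   # prefetched for the next packet
    print(m.clock(True, "k2", "iv2"))   # not latched: key_fetch is low
    m.clock(False, packet_done=True)    # first packet leaves the pipeline
    print(m.clock(True, "k2", "iv2"))   # latched now that a slot is free
```

In this model, holding key_valid high while key_fetch is low has no effect, so the key already in use is never overwritten; the driver simply keeps presenting the next key and IV until the handshake completes.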