Run RNN-T Demo on Versal - 2.0 English

Vitis AI RNN User Guide (UG1563)

Document ID
UG1563
Release Date
2022-01-20
Version
2.0 English

This section describes an RNN-T-based automatic speech recognition (ASR) solution on the Xilinx® Versal® VCK5000 device. Versal is the first adaptive compute acceleration platform (ACAP). It is a fully software-programmable heterogeneous compute platform that combines Scalar Engines, Adaptable Engines, and Intelligent Engines to achieve dramatic performance improvements over FPGA and CPU implementations across a range of applications. For more information, see the Xilinx Versal website. RNN-T is a sequence-to-sequence model that continuously processes input samples and streams output symbols. The speech recognition model used here is a modified RNN-T model from the MLPerf Inference benchmark suite. For more information, see the mlcommons inference repository.
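To illustrate how an RNN-T model streams output symbols as it consumes input, the following sketch shows the standard greedy decoding loop: for each encoder frame, the joint network is queried repeatedly and non-blank symbols are emitted until it predicts blank, which advances decoding to the next frame. The `toy_predict` and `toy_join` functions are simplified stand-ins invented for this example, not the networks used in the demo.

```python
import numpy as np

BLANK = 0  # index of the blank symbol

def rnnt_greedy_decode(enc_frames, predict, join, max_symbols_per_frame=10):
    """Greedy RNN-T decoding.

    For each encoder frame, repeatedly query the joint network and emit
    symbols until it outputs blank, then advance to the next frame.
    The prediction-network state updates only on non-blank emissions.
    """
    hyp = []                          # emitted (non-blank) symbol ids
    g = predict(None)                 # predictor output for empty history
    for f in enc_frames:              # stream over acoustic frames
        for _ in range(max_symbols_per_frame):
            logits = join(f, g)       # combine encoder and predictor branches
            k = int(np.argmax(logits))
            if k == BLANK:
                break                 # blank: consume the next frame
            hyp.append(k)             # non-blank: emit and update predictor
            g = predict(k)
    return hyp

# Toy stand-ins for the prediction and joint networks (illustration only):
V = 3  # toy vocabulary: 0 = blank, 1..2 = labels

def toy_predict(y):
    # After any emission, strongly bias toward blank so each frame
    # yields at most one symbol in this toy setup.
    v = np.zeros(V)
    if y is not None:
        v[BLANK] = 10.0
    return v

def toy_join(f, g):
    return f + g  # a real joint network is a small feed-forward network

print(rnnt_greedy_decode([np.array([0.0, 3.0, 1.0])], toy_predict, toy_join))  # [1]
```

In the real model the encoder, prediction network, and joint network are LSTM- and linear-based modules; the loop structure above is what makes RNN-T naturally streaming.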

The hardware kernel is a 40-AIE-core design. INT8 matrix-matrix multiplications are performed on the AIE cores, and the remaining functions are implemented in programmable logic (PL) at INT16 precision. The following table shows the total resource utilization for the kernel and platform. UltraRAM resources are used mainly as weight buffers and are shared when multiple kernels are instantiated.
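As a rough illustration of the INT8 arithmetic mapped onto the AIE cores (this is a numerical sketch, not the actual kernel code), the following example shows symmetric INT8 quantization of two matrices, an INT8 × INT8 multiply with INT32 accumulation, and dequantization back to floating point. The function names and scale choices are assumptions for this example.

```python
import numpy as np

def quantize_int8(x, scale):
    # Symmetric per-tensor quantization to INT8.
    q = np.round(x / scale)
    return np.clip(q, -128, 127).astype(np.int8)

def int8_matmul(a_q, b_q, scale_a, scale_b):
    # INT8 x INT8 multiply with INT32 accumulation (as on the AIE cores),
    # then dequantize the accumulator back to float.
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (scale_a * scale_b)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 4)).astype(np.float32)
sa, sb = np.abs(a).max() / 127, np.abs(b).max() / 127
approx = int8_matmul(quantize_int8(a, sa), quantize_int8(b, sb), sa, sb)
exact = a @ b
err = np.max(np.abs(approx - exact))  # small quantization error
```

The dequantized result closely tracks the floating-point product, which is why the matrix multiplications can run at INT8 while the PL-side functions keep INT16 precision.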

Table 1. Resources

            CLB LUTs          Registers          Block RAM      UltraRAM       DSP Slices    AIE Cores
Available   899,712           1,799,424          967            463            1,968         400
Utilized    169,163 (18.8%)   241,657 (13.43%)   197 (20.37%)   332 (71.71%)   82 (4.17%)    40 (10%)
Note: For the RNN-T model, the quantizer and compiler are not yet available. The mixed-precision quantization and instruction generation were performed manually and the results are provided for this demo. For more information, see DPU-for-RNN.