Design Validation

Arbitrary Resampling Filter Design (XAPP1373)

Document ID: XAPP1373
Release Date: 2022-02-28
Revision: 1.0 (English)

The heterogeneous ARF design is validated in the Xilinx VC1902 device on a VCK190 evaluation board. The AI Engine and PL portions of the ARF design are packaged as kernels, and so is the tester. The tester drives the input ports of the device under test (DUT) with prestored stimulus and checks the output AXI bus against the reference test vector. The PL tester measures throughput and latency and records them in a set of registers that the processor reads through the AXI4-Lite interface. At the end of the test, the results are summarized and printed via a COM port.

Figure 1. ARF Design Validation Environment

All kernels can have only AXI interfaces. However, when both the source and the destination of an AXI bus are PL kernels, users can customize the signal definitions. In addition to the AXI buses connected to the AI Engine, the ARF PL kernel maps the following signals onto AXI interfaces with custom logic.

Table 1. ARF PL Kernel Signals Mapped to AXI Interfaces

AXI Bus Direction    Signal Name             Mapped to AXI Signal
-----------------------------------------------------------------
Input (375 MHz)      afsrc_in_vld            T_VALID
Input (375 MHz)      afsrc_in_rdy            T_READY
Input (375 MHz)      afsrc_in_soft_reset     T_DATA[63]
Input (375 MHz)      afsrc_in_stp[29:0]      T_DATA[61:32]
Input (375 MHz)      afsrc_in_dat[31:0]      T_DATA[31:0]
Output (500 MHz)     afsrc_out_flags[1:0]    T_USER[1:0]
Output (500 MHz)     afsrc_out_rdy           T_READY
Output (500 MHz)     afsrc_out_vld           T_VALID
Output (500 MHz)     afsrc_out_dat[31:0]     T_DATA[31:0]

The following notes explain these mappings in more detail:

  • The soft reset is mapped to the most significant bit of the input data bus (T_DATA[63]). It should be asserted before valid data arrives in order to:
    • Reset the phase accumulation registers in PL
    • Reset the output FIFOs in PL
    • Clear the overlap memory in AI Engine
  • Under the AXI protocol, a transfer occurs only when both Valid and Ready are High, so data transmission must pause as soon as Ready goes Low. The customized interface relaxes this to FIFO semantics: every write is honored until the buffer is full. Backpressure is signaled by a programmable full flag that asserts when fewer than 16 samples can still be written to the FIFO. This lets the custom logic flush the data already in a pipeline of up to 15 stages.
  • The output Ready signal serves as the timing reference: the ARF starts its output exactly 500 clock cycles after Ready is asserted. This is achieved by precisely controlling the output FIFO read signal.
  • The empty signals of the ARF FIFOs are mapped to T_USER for error detection. While the ARF output is active, a FIFO empty event indicates that the output data could be corrupted.
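To see why a threshold of 16 free slots covers a pipeline of up to 15 stages, the following Python sketch models the worst case, in which the FIFO is never read and only the programmable full flag stops the producer. It is an illustrative model under stated assumptions, not the ARF RTL. The flag is assumed to be evaluated from the current fill level with no extra register delay, and each pipeline stage is assumed to add exactly one cycle. The FIFO depth of 64 and the function name simulate_backpressure are hypothetical.

```python
from collections import deque


def simulate_backpressure(fifo_depth, pipeline_stages, threshold=16, cycles=500):
    """Worst-case model: the FIFO is never read, so only prog_full stops the producer.

    Assumes prog_full is derived directly from the current fill level and that
    every pipeline stage adds one cycle of latency. Returns (peak_occupancy, overflowed).
    """
    count = 0                                  # samples currently stored in the FIFO
    peak = 0
    overflowed = False
    pipe = deque([False] * pipeline_stages)    # True marks a valid sample in a stage
    for _ in range(cycles):
        prog_full = (fifo_depth - count) < threshold   # fewer than 16 free slots left
        leaving = pipe.pop()                   # sample leaving the last stage this cycle
        pipe.appendleft(not prog_full)         # producer injects only while prog_full is Low
        if leaving:                            # FIFO honors every write until truly full
            if count == fifo_depth:
                overflowed = True              # write dropped: data would be lost
            else:
                count += 1
        peak = max(peak, count)
    return peak, overflowed


for stages in (15, 16):
    print(stages, simulate_backpressure(fifo_depth=64, pipeline_stages=stages))
```

With 15 stages, the in-flight samples fill the FIFO exactly to capacity without loss. With 16 stages, one write arrives after the FIFO is full. In real hardware, any register delay on the flag would reduce the pipeline depth that can be absorbed.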

The ARF tester kernel collects the test results in a register map that the processor can access, shown in the following figure. Other fields in the map control the test process. Each test iteration consists of 8192 input samples at 350 MSPS, so the maximum of $2^{32}-1$ iterations lasts $8192 \times (2^{32}-1) / (350 \times 10^{6}\ \text{samples/s}) \approx 1.0 \times 10^{5}$ s, or about 28 hours.

Figure 2. ARF Tester Kernel Register Map

A floating-point MATLAB reference model is first constructed to confirm that the algorithm achieves satisfactory performance. A bit-true MATLAB model is then developed, and its quantization noise is measured by comparing its output with that of the floating-point model. The following figure compares the input waveform, the floating-point resampler output, and the bit-true model output. The two outputs match closely, which indicates high accuracy. The measured signal-to-quantization-noise ratio (SQNR) for this test case is 87 dBc.
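The SQNR comparison described above can be sketched as follows, assuming the usual definition: the power of the floating-point reference divided by the power of the difference between reference and bit-true outputs. This is a minimal illustration, not the MATLAB scripts from the design. The signed 16-bit Q1.15 format, the real-valued test tone, and the names sqnr_db and to_fixed are hypothetical.

```python
import math


def sqnr_db(reference, test):
    """SQNR in dB: reference power divided by the power of (reference - test)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return 10.0 * math.log10(signal / noise)


def to_fixed(x, frac_bits=15):
    """Round x onto a hypothetical signed 16-bit Q1.15 grid, with saturation."""
    scale = 1 << frac_bits
    code = max(-scale, min(scale - 1, round(x * scale)))
    return code / scale


# Floating-point "reference" tone and its quantized "bit-true" counterpart.
reference = [0.9 * math.sin(2 * math.pi * 0.0123 * n) for n in range(4096)]
bit_true = [to_fixed(v) for v in reference]
print(f"SQNR = {sqnr_db(reference, bit_true):.1f} dB")
```

For a single sine at 0.9 of full scale, output rounding alone gives a textbook estimate of about 6.02·16 + 1.76 − 0.9 ≈ 97 dB. A full resampler, with internal rounding at several stages, would be expected to land lower, as the 87 dBc measured for the ARF does.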

Figure 3. MATLAB Model Simulation Results

The test vectors generated by the MATLAB scripts are used for both AI Engine simulation and hardware testing. A fractional ratio of 5333/7993 is selected for testing, where 5333 and 7993 are both prime numbers. The input test vector repeats a 5333-sample waveform until it fills the length of the AI Engine simulation, so the output is expected to repeat every 7993 samples, except for the first few samples of the first iteration.
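The source does not say why prime lengths were chosen. One plausible reason, offered here as an assumption, is phase coverage. If output sample n is taken at input position n·5333/7993, its fractional phase is ((n·5333) mod 7993)/7993. Because gcd(5333, 7993) = 1, every one of the 7993 possible phases occurs exactly once per output period. The Python sketch below checks this under that hypothetical phase model. The function name phase_numerators is illustrative.

```python
from math import gcd


def phase_numerators(p, q):
    """Fractional-phase numerators (n * p) mod q for output samples n = 0..q-1.

    Hypothetical model: output sample n is taken at input position n * p / q,
    so its fractional phase is ((n * p) mod q) / q.
    """
    return [(n * p) % q for n in range(q)]


P, Q = 5333, 7993                            # 5333 inputs map to 7993 outputs per period
print(gcd(P, Q))                             # 1: the ratio is already in lowest terms
print(len(set(phase_numerators(P, Q))))      # 7993: every phase is exercised once per period
print(len(set(phase_numerators(4, 6))))      # 3: a non-reduced ratio revisits only a few phases
```

A ratio that is not in lowest terms, such as 4/6, cycles through only a few phases. A test built on it would leave most of the interpolation phases unexercised.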

The Makefile includes the commands to run the AI Engine simulation and post-process the output data. The following results show that the AI Engine kernel output matches the reference test vector bit-for-bit (zero mismatches). They also show that the measured throughput of 715.7 Msps exceeds the 700 Msps target by about 2%.
$ make aie
------------------------------------
 Arbitrary Resampler AIE Sim Result 
------------------------------------
Throughput = 715.718 Msps
Mismatch   = 0

The RTL design is verified in a standalone RTL simulation environment with self-checking monitors. When the simulation completes, it prints the following results, which confirm that the RTL behaves as expected.
$ make rtlsim

SIN Mismatch =    0
AIN Mismatch =    0
DIN Mismatch =    0
 
Test 0: Mismatch =    0, IdleCycle =   0, Latency = 500 cycles, ErrFlag = 0
Test 1: Mismatch =    0, IdleCycle =   0, Latency = 500 cycles, ErrFlag = 0
 
*************** TEST PASSED  ****************

The AI Engine and PL kernels are now ready for integration. For this design, with two PL kernels in three clock domains, the whole system is integrated with 14 lines of code in the following configuration file. Larger designs with hundreds of AXI buses benefit even more from this approach, because manually connecting thousands of signals in RTL is error-prone.
[connectivity]
# Declare Kernels
nk=tst_arf:1:tst_arf_1
nk=plk_arf:1:plk_arf_1

# TESTER -> PL Kernel
sc=tst_arf_1.arf_in:plk_arf_1.arf_in

# PL Kernel -> AIE
sc=plk_arf_1.aie_sin:ai_engine_0.sin
sc=plk_arf_1.aie_ain:ai_engine_0.ain
sc=plk_arf_1.aie_din:ai_engine_0.din

# AIE -> PL Kernel
sc=ai_engine_0.dout:plk_arf_1.aie_out

# PL Kernel -> TESTER
sc=plk_arf_1.arf_out:tst_arf_1.arf_out

[clock]
# ID=0: 100MHz for Registers
id=0:tst_arf_1.reg_clk

# ID=4: 375MHz for AIE Interface
id=4:plk_arf_1.aie_clk
id=4:tst_arf_1.aie_clk

# ID=3: 500MHz for DAC Interface
id=3:plk_arf_1.dac_clk
id=3:tst_arf_1.dac_clk
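Because the configuration above is plain text, a design with hundreds of buses could generate it from tables instead of writing it by hand. The following Python sketch is hypothetical: render_config and the table names are illustrative and are not part of the XAPP1373 design files. It reproduces the directives above, without the comments, from three tables. The aie_sin/aie_ain/aie_din entries show how a loop can replace repetitive lines.

```python
# Hypothetical generator for the [connectivity] and [clock] sections shown above.
# Instance and port names are copied from the ARF configuration.

KERNELS = [("tst_arf", "tst_arf_1"), ("plk_arf", "plk_arf_1")]

STREAMS = [
    ("tst_arf_1.arf_in", "plk_arf_1.arf_in"),                                    # tester -> PL kernel
    *[(f"plk_arf_1.aie_{p}", f"ai_engine_0.{p}") for p in ("sin", "ain", "din")],  # PL kernel -> AIE
    ("ai_engine_0.dout", "plk_arf_1.aie_out"),                                   # AIE -> PL kernel
    ("plk_arf_1.arf_out", "tst_arf_1.arf_out"),                                  # PL kernel -> tester
]

CLOCKS = {
    0: ["tst_arf_1.reg_clk"],                         # 100 MHz register clock
    4: ["plk_arf_1.aie_clk", "tst_arf_1.aie_clk"],    # 375 MHz AIE interface
    3: ["plk_arf_1.dac_clk", "tst_arf_1.dac_clk"],    # 500 MHz DAC interface
}


def render_config(kernels, streams, clocks):
    """Return the config text: one nk= per kernel, one sc= per stream, one id= per clock pin."""
    lines = ["[connectivity]"]
    lines += [f"nk={name}:1:{inst}" for name, inst in kernels]
    lines += [f"sc={src}:{dst}" for src, dst in streams]
    lines += ["", "[clock]"]
    for clk_id, pins in clocks.items():
        lines += [f"id={clk_id}:{pin}" for pin in pins]
    return "\n".join(lines) + "\n"


print(render_config(KERNELS, STREAMS, CLOCKS))
```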

Debugging with waveforms in a software simulation environment is much easier than debugging directly on hardware, where visibility is limited. The Vitis compiler supports PS+PL+AI Engine co-simulation and uses the Vivado® simulator as the GUI for displaying waveforms, on which the latencies of various signals can be measured. The ARF output signals in the 500 MHz clock domain (dut_out_axi_trdy and dut_out_axi_tvld in the following figure) are fine-tuned to have a fixed latency of 1 μs (500 clock cycles) between them. The cross-clock-domain signals and AXI interfaces introduce some timing uncertainty. However, the output FIFO absorbs this uncertainty completely, so it is transparent to the custom logic.
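The fixed 1 μs latency described above follows from starting the output exactly 500 cycles after Ready, in the 500 MHz domain. The following minimal cycle-level sketch assumes that the FIFO read enable is driven by a simple counter that starts on the first cycle Ready is seen High. The counter scheme and the names first_valid_cycle and cycles_to_us are hypothetical. The source states only that output begins exactly 500 cycles after Ready.

```python
START_DELAY_CYCLES = 500      # output starts 500 cycles after Ready (from the text)
DAC_CLK_HZ = 500_000_000      # 500 MHz output clock domain


def first_valid_cycle(ready_trace, delay=START_DELAY_CYCLES):
    """Cycle index at which a hypothetical delay counter would start FIFO reads (Valid High).

    ready_trace holds one 0/1 Ready sample per 500 MHz clock cycle.
    Returns None if the delay does not elapse within the trace.
    """
    start = None
    for cycle, ready in enumerate(ready_trace):
        if start is None and ready:
            start = cycle                  # first cycle Ready is seen High
        if start is not None and cycle - start == delay:
            return cycle                   # counter expired: begin reading the output FIFO
    return None


def cycles_to_us(cycles, clk_hz=DAC_CLK_HZ):
    """Convert a cycle count in the given clock domain to microseconds."""
    return cycles * 1_000_000 / clk_hz


trace = [0] * 37 + [1] * 1000              # Ready rises at cycle 37
t_valid = first_valid_cycle(trace)
print(t_valid, cycles_to_us(t_valid - 37))   # 537 1.0
```

Because the counter references only the 500 MHz Ready edge, clock-crossing jitter upstream does not change the measured latency, provided the FIFO already holds data when the counter expires.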

Figure 4. ARF Input and Output Timing Diagram

After the design passes software verification, more comprehensive and longer tests are performed on the VCK190 evaluation board. VCK190 boards ship with VC1902-2MP devices by default. In the test platform, however, the part number is changed to VC1902-1LLP, which is recommended for customers who prioritize power efficiency. The software running on the Arm® processor starts and stops the test 10 times. The first test runs one million iterations (about eight billion input samples), and each subsequent test adds 1.2 million iterations (about 10 billion samples). At the end, a short summary is printed via the COM port.
---------------------------------------------------------------
--       ARBITRARY RESAMPLING FILTER TEST SUMMARY            --
---------------------------------------------------------------
 TestID Latency(us)  Outputs    Idle    Mismatch  Flag Result
--------------------------------------------------------------- 
   0      1.000     12279095842    0           0  0x00  PASS
   1      1.000     27437128450    0           0  0x00  PASS
   2      1.000     42595161058    0           0  0x00  PASS
   3      1.000     57753242780    0           0  0x00  PASS
   4      1.000     72911275388    0           0  0x00  PASS
   5      1.000     88069307996    0           0  0x00  PASS
   6      1.000    103227340606    0           0  0x00  PASS
   7      1.000    118385373214    0           0  0x00  PASS
   8      1.000    133543442656    0           0  0x00  PASS
   9      1.000    148701475266    0           0  0x00  PASS
-------------------------------------------------------------

PASS!

The test results confirm that all design targets have been met:

  • All output samples match the reference test vector stored in ROMs.
  • A deterministic latency of 1 μs is measured for all the tests.
  • No idle cycles are observed on the output data bus, which means the Valid signal stays High throughout the test.
  • Error flags are not asserted, which means the FIFOs did not underflow.
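The four criteria above map directly onto the columns of the printed summary. The following Python sketch shows a hypothetical post-processing check, not part of the XAPP1373 software, that applies them to each row. It assumes the whitespace-separated column order shown in the summary: TestID, Latency(us), Outputs, Idle, Mismatch, Flag, Result.

```python
def row_passes(row):
    """Apply the four pass criteria to one whitespace-separated summary row."""
    _test_id, latency_us, _outputs, idle, mismatch, flag, result = row.split()
    return (
        int(mismatch) == 0              # all outputs match the reference test vector
        and float(latency_us) == 1.0    # deterministic 1 us latency
        and int(idle) == 0              # Valid stayed High: no idle cycles
        and int(flag, 16) == 0          # no FIFO-empty (underflow) error flags
        and result == "PASS"
    )


summary_rows = [
    "   0      1.000     12279095842    0           0  0x00  PASS",
    "   9      1.000    148701475266    0           0  0x00  PASS",
]
print(all(row_passes(r) for r in summary_rows))   # True
```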