The heterogeneous ARF design is validated on the Xilinx VC1902 device on a VCK190 evaluation board. The AI Engine and PL portions of the ARF design are packaged as kernels, as is the tester, which drives the input ports of the device under test (DUT) using a prestored stimulus and checks the output AXI bus against the reference test vector. Throughput and latency are measured by the PL tester and recorded in a set of registers accessible by the processor via the AXI4-Lite interface. At the end of the test, the results are summarized and printed via a COM port.
All kernels can only have AXI interfaces. However, when both the source and the destination of an AXI bus are PL kernels, users can customize the signal definitions. Besides the AXI buses connected to the AI Engine, the ARF PL kernel has the following signals mapped to AXI interfaces with custom logic.
AXI Bus Direction | Signal Name | Mapping to AXI Signal |
---|---|---|
Input (375 MHz) | afsrc_in_vld | T_VALID |
Input (375 MHz) | afsrc_in_rdy | T_READY |
Input (375 MHz) | afsrc_in_soft_reset | T_DATA[63] |
Input (375 MHz) | afsrc_in_stp[29:0] | T_DATA[61:32] |
Input (375 MHz) | afsrc_in_dat[31:0] | T_DATA[31:0] |
Output (500 MHz) | afsrc_out_flags[1:0] | T_USER[1:0] |
Output (500 MHz) | afsrc_out_rdy | T_READY |
Output (500 MHz) | afsrc_out_vld | T_VALID |
Output (500 MHz) | afsrc_out_dat[31:0] | T_DATA[31:0] |
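The input-side mapping in the table can be illustrated with a small bit-packing sketch. The field positions come from the table; the helper names `pack_input_word` and `unpack_input_word` are hypothetical, and bit 62 is not listed in the table, so this sketch simply leaves it at zero.

```python
from typing import Tuple

# Bit positions taken from the table: T_DATA[63], T_DATA[61:32], T_DATA[31:0].
# Bit 62 is not listed there; leaving it at zero is an assumption.
SOFT_RESET_BIT = 63
STP_SHIFT, STP_WIDTH = 32, 30
DAT_SHIFT, DAT_WIDTH = 0, 32


def pack_input_word(soft_reset: bool, stp: int, dat: int) -> int:
    """Pack afsrc_in_soft_reset, afsrc_in_stp and afsrc_in_dat into one 64-bit word."""
    if not 0 <= stp < (1 << STP_WIDTH):
        raise ValueError("stp must fit in 30 bits")
    if not 0 <= dat < (1 << DAT_WIDTH):
        raise ValueError("dat must fit in 32 bits")
    return (int(soft_reset) << SOFT_RESET_BIT) | (stp << STP_SHIFT) | (dat << DAT_SHIFT)


def unpack_input_word(word: int) -> Tuple[bool, int, int]:
    """Recover (soft_reset, stp, dat) from a 64-bit T_DATA word."""
    soft_reset = bool((word >> SOFT_RESET_BIT) & 1)
    stp = (word >> STP_SHIFT) & ((1 << STP_WIDTH) - 1)
    dat = (word >> DAT_SHIFT) & ((1 << DAT_WIDTH) - 1)
    return soft_reset, stp, dat


word = pack_input_word(True, 0x12345678, 0xDEADBEEF)
print(hex(word))                # 0x92345678deadbeef
print(unpack_input_word(word))  # (True, 305419896, 3735928559)
```

A helper like this could be used when preparing stimulus words for the tester, with the soft reset set on a word that precedes the valid data.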
Some details are explained in the following:
- A soft reset is mapped to the most significant bit of the input data bus. It should be asserted before the valid data to do the following:
  - Reset the phase accumulation registers in PL
  - Reset the output FIFOs in PL
  - Clear the overlap memory in AI Engine
- The AXI protocol requires the data transmission to pause immediately after the Ready signal goes Low. In the customized AXI interface, the protocol is relaxed to that of a FIFO, which honors all write operations until the buffer is full. Backpressure is signaled by the programmable full signal, which is asserted when fewer than 16 samples can be written to the FIFO. This allows the custom logic to flush out the data in a pipeline of up to 15 stages.
- The output Ready signal serves as a timing reference for the ARF to start output exactly 500 clock cycles after its assertion. This is realized by a carefully controlled output FIFO read signal.
- The empty signals of the ARF FIFOs are mapped to T_USER for error detection. When the ARF output is active, a FIFO empty event indicates that the output data could be corrupted.
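The relaxed backpressure scheme can be checked with a simplified cycle-level model. The sketch below is not the actual RTL: the FIFO depth of 64, the function name `simulate`, and the exact sampling of the programmable full flag are assumptions made for illustration. It models the worst case in which the reader is stalled, the source stops launching samples as soon as programmable full is seen, and everything already in the pipeline is still written.

```python
from collections import deque


def simulate(depth: int, pipeline_stages: int, threshold: int = 16) -> int:
    """Return how many writes overflow the FIFO when nothing is ever read out.

    depth and pipeline_stages (>= 1) are illustrative parameters, not values
    taken from the design. prog_full is asserted when fewer than `threshold`
    samples can still be written.
    """
    fifo_level = 0
    overflows = 0
    # One entry per pipeline stage; True marks a valid sample in flight.
    pipe = deque([False] * pipeline_stages)
    for _ in range(4 * depth):
        prog_full = (depth - fifo_level) < threshold
        # The oldest stage always writes: the FIFO honors writes until full.
        if pipe.popleft():
            if fifo_level < depth:
                fifo_level += 1
            else:
                overflows += 1
        # The source launches a new sample only while prog_full is low.
        pipe.append(not prog_full)
    return overflows


for stages in (8, 15, 16):
    print(stages, simulate(depth=64, pipeline_stages=stages))
```

In this model, a pipeline of 15 stages fills the FIFO exactly without losing data, while a 16th stage would overflow it by one sample, which is consistent with the 16-sample threshold described above.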
The ARF tester kernel collects the test results, which the processor accesses via the register map shown in the following figure. The register map also contains fields that control the test process. Every iteration in the test is 8192 input samples at 350 MSPS, and a maximum of (2^32 - 1) iterations can last for 8192 x (2^32 - 1) x 1/350 MHz = 14 hours.
A floating-point MATLAB reference model is constructed to ensure the algorithm achieves satisfactory performance. Then a bit-true MATLAB model is developed, and the quantization noise is measured by comparing its output with that of the floating-point model. The following figure is a visual comparison of the input waveform, the floating-point resampler output, and the bit-true model output. The three waveforms closely match, which indicates that the bit-true model is highly accurate. The measured signal-to-quantization-noise ratio (SQNR) is 87 dBc for this test case.
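The comparison between a floating-point reference and a quantized output can be sketched as follows. This is a stand-in in Python, not the MATLAB models described above: the test tone, the 15 fractional bits, and the helper names `sqnr_db` and `quantize` are assumptions chosen only to show how such a ratio is computed. The function returns a plain power ratio in dB.

```python
import math


def sqnr_db(reference, quantized):
    """Ratio of reference power to error power, in dB."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, quantized))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with frac_bits fractional bits (assumed format)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


# Hypothetical floating-point "reference" output: a single tone at 0.9 full scale.
reference = [0.9 * math.sin(2 * math.pi * 0.01237 * n) for n in range(4096)]
# Hypothetical "bit-true" output: the same samples rounded to 15 fractional bits.
bit_true = [quantize(x, 15) for x in reference]

print(f"SQNR = {sqnr_db(reference, bit_true):.1f} dB")
```

The figure obtained from this toy example depends entirely on the assumed word length and signal, so it is not comparable to the 87 dBc reported for the actual design.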
The test vectors generated by the MATLAB scripts are used for AI Engine simulation and hardware testing. A fractional ratio of 5333/7993 is selected for testing purposes, where 5333 and 7993 are both prime numbers. The input test vector is a repetition of a 5333-sample waveform until the length of AI Engine simulation is reached. The output is expected to be a repetition of 7993 samples, except for the first several samples in the first iteration.
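The stated periodicity follows from the ratio itself, as the sketch below shows with exact rational arithmetic. It assumes, based on the input and output periods given above, that each output sample advances the input position by 5333/7993 of an input sample; the function name `input_position` is hypothetical.

```python
from fractions import Fraction
from math import gcd

P, Q = 5333, 7993  # from the text: 5333 input samples correspond to 7993 output samples


def input_position(n):
    """Exact input position of output sample n as (integer index, fractional phase)."""
    pos = Fraction(n * P, Q)
    index = pos.numerator // pos.denominator
    return index, pos - index


# The ratio is already in lowest terms.
assert gcd(P, Q) == 1

# Every output sample within one period lands on a different fractional phase.
phases = {input_position(n)[1] for n in range(Q)}
print(len(phases))  # 7993

# After Q outputs the phase is back to zero and exactly P inputs were consumed.
index, phase = input_position(Q)
print(index, phase == 0)  # 5333 True
```

Because the ratio cannot be reduced, a single period of the test vector exercises 7993 distinct fractional phases before the pattern repeats.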
```
$ make aie
------------------------------------
Arbitrary Resampler AIE Sim Result
------------------------------------
Throughput = 715.718 Msps
Mismatch = 0
```
```
$ make rtlsim
SIN Mismatch = 0
AIN Mismatch = 0
DIN Mismatch = 0
Test 0: Mismatch = 0, IdleCycle = 0, Latency = 500 cycles, ErrFlag = 0
Test 1: Mismatch = 0, IdleCycle = 0, Latency = 500 cycles, ErrFlag = 0
*************** TEST PASSED ****************
```
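The RTL simulation reports latency in clock cycles of the 500 MHz output domain. A minimal sketch for parsing the per-test lines and converting the latency to microseconds is shown below. The regular expression is written against the log lines shown in this section only, and the pass criterion in `parse_line` (a hypothetical helper) is an assumption rather than the test bench's own rule.

```python
import re

DAC_CLOCK_MHZ = 500  # output clock domain stated in the text

LINE_RE = re.compile(
    r"Test (\d+): Mismatch = (\d+), IdleCycle = (\d+), "
    r"Latency = (\d+) cycles, ErrFlag = (\d+)"
)


def parse_line(line):
    """Parse one per-test log line; return None for lines in any other format."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    test_id, mismatch, idle, latency, err_flag = (int(g) for g in match.groups())
    return {
        "test_id": test_id,
        "latency_us": latency / DAC_CLOCK_MHZ,  # cycles / MHz = microseconds
        # Assumed criterion: no mismatch, no idle cycle, no error flag.
        "passed": mismatch == 0 and idle == 0 and err_flag == 0,
    }


log = """SIN Mismatch = 0
Test 0: Mismatch = 0, IdleCycle = 0, Latency = 500 cycles, ErrFlag = 0
Test 1: Mismatch = 0, IdleCycle = 0, Latency = 500 cycles, ErrFlag = 0"""

for line in log.splitlines():
    result = parse_line(line)
    if result is not None:
        print(result)
```

With 500 cycles at 500 MHz, each test line converts to a latency of 1.0 μs.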
```
[connectivity]
# Declare Kernels
nk=tst_arf:1:tst_arf_1
nk=plk_arf:1:plk_arf_1
# TESTER -> PL Kernel
sc=tst_arf_1.arf_in:plk_arf_1.arf_in
# PL Kernel -> AIE
sc=plk_arf_1.aie_sin:ai_engine_0.sin
sc=plk_arf_1.aie_ain:ai_engine_0.ain
sc=plk_arf_1.aie_din:ai_engine_0.din
# AIE -> PL Kernel
sc=ai_engine_0.dout:plk_arf_1.aie_out
# PL Kernel -> TESTER
sc=plk_arf_1.arf_out:tst_arf_1.arf_out
[clock]
# ID=0: 100MHz for Registers
id=0:tst_arf_1.reg_clk
# ID=4: 375MHz for AIE Interface
id=4:plk_arf_1.aie_clk
id=4:tst_arf_1.aie_clk
# ID=3: 500MHz for DAC Interface
id=3:plk_arf_1.dac_clk
id=3:tst_arf_1.dac_clk
```
Debugging with waveform views in a software simulation environment is much easier than doing so directly on hardware with limited visibility. The Vitis compiler supports PS+PL+AI Engine co-simulation and uses the Vivado® simulator as the GUI to display waveforms, on which latencies of various signals can be measured. The ARF output signals in the 500 MHz clock domain (dut_out_axi_trdy and dut_out_axi_tvld in the figure below) are fine-tuned to have a fixed latency of 1 μs between them. The cross-clock-domain signals and AXI interfaces will have some timing uncertainties. However, they are completely absorbed by the output FIFO and transparent to the custom logic.
```
---------------------------------------------------------------
--         ARBITRARY RESAMPLING FILTER TEST SUMMARY          --
---------------------------------------------------------------
TestID  Latency(us)  Outputs       Idle  Mismatch  Flag  Result
---------------------------------------------------------------
0       1.000        12279095842   0     0         0x00  PASS
1       1.000        27437128450   0     0         0x00  PASS
2       1.000        42595161058   0     0         0x00  PASS
3       1.000        57753242780   0     0         0x00  PASS
4       1.000        72911275388   0     0         0x00  PASS
5       1.000        88069307996   0     0         0x00  PASS
6       1.000        103227340606  0     0         0x00  PASS
7       1.000        118385373214  0     0         0x00  PASS
8       1.000        133543442656  0     0         0x00  PASS
9       1.000        148701475266  0     0         0x00  PASS
---------------------------------------------------------------
PASS!
```
The test result confirms all the design targets have been met:
- All output samples match the reference test vector stored in ROMs.
- A deterministic latency of 1 μs is measured for all the tests.
- No idle cycle is observed in the output data bus, which means the Valid signal stays solid High during the test.
- Error flags are not asserted, which means the FIFOs did not underflow.