Experiencing Acceleration Performance


Vitis Tutorials: Hardware Acceleration (XD099)

Document ID: XD099
Release Date: 2023-11-13
Version: 2023.2 English

In this lab, you will experience the acceleration potential by running the application first as a software-only version and then as an optimized FPGA-accelerated version using a precompiled FPGA accelerator.

Run the following command to set up the application.

# Point LAB_WORK_DIR at the tutorial directory inside your downloaded copy of the repository
export LAB_WORK_DIR=<Downloaded Github repository>/Hardware_Acceleration/Design_Tutorials/02-bloom

Next, build the C application:

Navigate to the cpu_src directory.

Use the following commands to build and run the original application, with the number of documents as the argument, and to generate the golden output file for comparison.

cd $LAB_WORK_DIR/cpu_src/
make run

The computed scores are stored in the cpu_profile_score array in the host code, which holds the outputs for all of the specified documents. The results look similar to the following:

./host 100000
Initializing data
Creating documents - total size : 1398.903 MBytes (349725824 words)
Creating profile weights

Total execution time of CPU          |  2949.3867 ms
Compute Hash processing time         |  2569.3266 ms
Compute Score processing time        |   380.0601 ms
--------------------------------------------------------------------
Execution COMPLETE
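
As a quick check on where the CPU time goes, the timings in the sample log above can be expressed as fractions of the total. The sketch below uses only the numbers printed in that log; the variable names are illustrative and are not taken from the tutorial's host code.

```python
# Timings copied from the sample CPU log above (milliseconds).
total_ms = 2949.3867
hash_ms = 2569.3266
score_ms = 380.0601

# Share of the total runtime spent in each stage.
hash_share = hash_ms / total_ms
score_share = score_ms / total_ms

print(f"Compute Hash : {hash_share:.1%} of total")
print(f"Compute Score: {score_share:.1%} of total")
```

In this sample run, the hash computation accounts for roughly 87% of the total time. Timings on your machine will differ.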

Run the application on the FPGA. For the purposes of this lab, the FPGA accelerator is implemented with an 8x parallelization factor.

Eight input words are processed in parallel, producing eight output flags in parallel during each clock cycle.
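
The 8x parallelization described above can be pictured with a small software model. This is only a sketch: `flag_word` is a hypothetical stand-in for the accelerator's per-word test, not the tutorial's kernel code, and the cycle count is an idealized figure that ignores pipeline latency and memory stalls.

```python
PARALLEL_FACTOR = 8  # parallelization factor stated in this lab


def flag_word(word: int) -> int:
    """Hypothetical per-word test; stands in for the real hash lookup."""
    return 1 if word % 3 == 0 else 0


def process(words: list[int]) -> tuple[list[int], int]:
    """Consume PARALLEL_FACTOR words per modeled cycle; return flags and cycle count."""
    flags = []
    cycles = 0
    for start in range(0, len(words), PARALLEL_FACTOR):
        group = words[start:start + PARALLEL_FACTOR]
        # In hardware, these eight evaluations happen side by side in one cycle.
        flags.extend(flag_word(w) for w in group)
        cycles += 1
    return flags, cycles


flags, cycles = process(list(range(32)))
print(len(flags), cycles)  # 32 flags produced in 4 modeled cycles

# Idealized cycle count for the word total printed in the CPU log.
total_words = 349_725_824
print(total_words // PARALLEL_FACTOR)  # 43715728
```

Under these assumptions, the model needs one eighth as many cycles as a design that handles a single word per cycle.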

To run the optimized application on the FPGA, run the following make command.

   ```
   make run_fpga SOLUTION=1
   ```

   The following output displays.

   ```
   Processing 1398.905 MBytes of data
   Splitting data in 8 sub-buffers of 174.863 MBytes for FPGA processing
   --------------------------------------------------------------------
   Executed FPGA accelerated version  |   427.1341 ms   ( FPGA 230.345 ms )
   Executed Software-Only version     |   3057.6307 ms
   --------------------------------------------------------------------
   Verification: PASS
   ```

   The computed throughput is:

   Throughput = Total data / Total time = 1.39 GB / 427.1341 ms = 3.25 GB/s

   By efficiently leveraging FPGA acceleration, the throughput of the application increases by a factor of about 7 (3057.6307 ms / 427.1341 ms ≈ 7.2).
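
The throughput and speedup figures can be reproduced from the sample FPGA log. The sketch below uses the rounded 1.39 GB data size from the throughput calculation and the two end-to-end times from the log; the variable names are illustrative.

```python
# Values taken from the sample FPGA log and the throughput calculation.
data_gb = 1.39              # total data size, rounded as in the text
fpga_total_ms = 427.1341    # FPGA-accelerated version, end to end
sw_total_ms = 3057.6307     # software-only version

# Throughput in GB/s: convert milliseconds to seconds first.
throughput_gb_s = data_gb / (fpga_total_ms / 1000.0)

# Speedup of the accelerated run over the software-only run.
speedup = sw_total_ms / fpga_total_ms

print(f"Throughput: {throughput_gb_s:.2f} GB/s")  # 3.25 GB/s
print(f"Speedup   : {speedup:.2f}x")              # 7.16x
```

Both times are end-to-end measurements from a single sample run, so expect some variation when you repeat the experiment.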