Overall throughput depends on two phases: transferring data to and from the FPGA, and running the computations. The demo includes options for measuring both timings, as described in the README.md file.
As an example, processing a batch of 16384 call calculations with a floating-point kernel breaks down as follows:
Total time (memory transfer time plus calculation time) = 974us
Calculation time (kernel execution time) = 235us
Memory transfer time = 739us
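
The figures above can be turned into throughput numbers with a little arithmetic. The following is a minimal sketch, not part of the demo: the constant names and the `per_second` helper are made up for illustration, and the only inputs are the three timings and the batch size quoted above.

```python
# Timings quoted above for a batch of 16384 call calculations (microseconds).
BATCH_SIZE = 16384
TOTAL_US = 974
KERNEL_US = 235
TRANSFER_US = 739


def per_second(items: int, elapsed_us: float) -> float:
    """Convert an item count and an elapsed time in microseconds to items/s."""
    return items / (elapsed_us * 1e-6)


# The two phases should account for the whole measured time.
assert KERNEL_US + TRANSFER_US == TOTAL_US

end_to_end_rate = per_second(BATCH_SIZE, TOTAL_US)    # includes memory transfers
kernel_only_rate = per_second(BATCH_SIZE, KERNEL_US)  # computation alone
transfer_share = TRANSFER_US / TOTAL_US               # fraction spent moving data

print(f"End-to-end:     {end_to_end_rate / 1e6:.1f} M calculations/s")
print(f"Kernel only:    {kernel_only_rate / 1e6:.1f} M calculations/s")
print(f"Transfer share: {transfer_share:.0%}")
```

With these numbers, memory transfer accounts for roughly 76% of the total time, so the end-to-end rate is about 16.8 million calculations per second, compared with about 69.7 million per second for the kernel alone.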