Host Side Data Generation - 2022.1 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)


The convolutional filter consists of three independent compute units (CUs), each of which processes one color channel of the video image stream. To keep the project simple and focused on VSC, the example does not use an actual video stream. Instead, the host code generates a random image with three color channels and passes it to a software function that contains the VSC-specific code. This results in separate threads for sending data from the host to the device and receiving it back, in addition to the compute() calls.
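The host-side data generation described above can be sketched in standard C++. This is a minimal sketch under stated assumptions, not the example's actual host code: the `YUVImage` struct below is a hypothetical stand-in that only mirrors the `yChannel`/`uChannel`/`vChannel` members used later in this section, and `RandomImage` and `make_random_image()` are illustrative names. Each channel is assumed to be a plane of `width * height` 8-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <random>
#include <vector>

// Hypothetical stand-in for the example's YUVImage type (assumption): the
// original code only shows that it exposes these three channel pointers.
struct YUVImage {
    unsigned char* yChannel;
    unsigned char* uChannel;
    unsigned char* vChannel;
};

// Owns the pixel storage for one planar image with three color channels.
struct RandomImage {
    std::vector<unsigned char> y, u, v;
    // Non-owning view in the layout the host function expects.
    YUVImage view() { return {y.data(), u.data(), v.data()}; }
};

// Fills each channel with uniformly distributed 8-bit values. A fixed seed
// keeps runs reproducible, which helps when comparing results across runs.
RandomImage make_random_image(unsigned short width, unsigned short height,
                              unsigned seed = 42) {
    assert(width > 0 && height > 0);
    const std::size_t n = static_cast<std::size_t>(width) * height;
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> pixel(0, 255);
    RandomImage img{std::vector<unsigned char>(n), std::vector<unsigned char>(n),
                    std::vector<unsigned char>(n)};
    for (auto* ch : {&img.y, &img.u, &img.v})
        for (auto& p : *ch) p = static_cast<unsigned char>(pixel(gen));
    return img;
}
```

A caller could build the source image with `make_random_image(width, height)`, allocate a destination image of the same size, and pass both `view()`s to the host function. The `RandomImage` owner must outlive any use of its view, since `YUVImage` only holds raw pointers.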

The function takes all the filter parameters and the data pointers to the source and destination images as arguments. The code snippet below gives the definition of this function. The essential steps are:

  • Create buffer pool handles (srcBufPool, dstBufPool, and coeffsBufPool) to enable sending and receiving data between the host and the device. The vpp::input and vpp::output attributes indicate whether the buffers are inputs or outputs.
  • Call conv_acc::send_while() with a lambda function. The lambda function allocates buffers on the device, copies host data into the device buffers, and then calls the compute() function (which ultimately runs the hardware-accelerated function). send_while() keeps calling the lambda function as long as it returns true.
  • Call conv_acc::receive_all_in_order() also using a lambda function to receive the processed buffers.
  • Call join() to wait for both the send and receive loops to finish and synchronize everything.
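To make the control flow of these steps concrete, here is a hypothetical model of the send/receive/join pattern written in standard C++ (`std::thread`, a mutex-protected queue, and `std::condition_variable`). It is not the VPP_ACC implementation and does not use its API; `PipelineModel` and its methods are illustrative names only. The model captures two properties the steps rely on: the send lambda is called until it returns false (the job from the final call is still submitted), and the receive lambda sees jobs in submission order.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical, simplified model (NOT the VPP_ACC implementation): each
// "job" is just an integer handle passed from a send thread to a receive thread.
class PipelineModel {
public:
    ~PipelineModel() { join(); }

    // Calls `send` on its own thread until it returns false. Every call,
    // including the last one, submits one job handle (0, 1, 2, ...).
    void send_while(std::function<bool(int)> send) {
        assert(!sender_.joinable() && "send_while() may be called only once");
        sender_ = std::thread([this, send] {
            int handle = 0;
            bool more = true;
            while (more) {
                more = send(handle);  // user code prepares and submits a job
                push(handle++);       // the job from this call is always queued
            }
            close();                  // signal that no more jobs will arrive
        });
    }

    // Calls `recv` once per job, in submission order, on a second thread.
    void receive_all_in_order(std::function<void(int)> recv) {
        assert(!receiver_.joinable() && "receive_all_in_order() may be called only once");
        receiver_ = std::thread([this, recv] {
            int handle = 0;
            while (pop(handle)) recv(handle);
        });
    }

    // Waits for both threads to finish.
    void join() {
        if (sender_.joinable()) sender_.join();
        if (receiver_.joinable()) receiver_.join();
    }

private:
    void push(int h) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(h); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
    }
    // Blocks until a handle is available; returns false once the sender
    // has finished and every queued handle has been consumed.
    bool pop(int& h) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || done_; });
        if (q_.empty()) return false;
        h = q_.front();
        q_.pop();
        return true;
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
    bool done_ = false;
    std::thread sender_, receiver_;
};
```

With a send lambda that returns `++run < numImages`, as in the example, and `numImages = 5`, the model submits five jobs and the receive lambda runs five times, for handles 0 through 4. Because the lambdas run on other threads, any locals they capture by reference must stay alive until `join()` returns.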
Tip: Refer to VPP_ACC Class API for an explanation of the various functions.
```cpp
#include <cstring>                        // std::memcpy
#include "conv_filter_acc_wrapper.hpp"    // assumed to declare conv_acc, YUVImage, FILTER_V_SIZE, FILTER_H_SIZE

int conv_filter_execute_fpga(
        const char           coeffs[FILTER_V_SIZE][FILTER_H_SIZE],
        float                factor,
        short                bias,
        unsigned short       width,
        unsigned short       height,
        unsigned int         numImages,
        YUVImage             srcImage,
        YUVImage             dstImage)
{
    // buffer pools for host <-> device transfers; the attribute sets the direction
    auto srcBufPool = conv_acc::create_bufpool(vpp::input);
    auto dstBufPool = conv_acc::create_bufpool(vpp::output);
    auto coeffsBufPool = conv_acc::create_bufpool(vpp::input);
    int run = 0;
    int dataSizePerChannel = width * height ;
    // sending input: capture by reference so that ++run updates the counter above
    // (a by-copy capture cannot be modified in a non-mutable lambda);
    // join() below keeps these locals alive until the send loop ends
    conv_acc::send_while([&]()->bool {
            unsigned char *  srcBuf = (unsigned char *)conv_acc::alloc_buf(srcBufPool, 3*dataSizePerChannel);
            unsigned char *  dstBuf = (unsigned char *)conv_acc::alloc_buf(dstBufPool, 3*dataSizePerChannel);
            char *        coeffsBuf = (         char *)conv_acc::alloc_buf(coeffsBufPool, FILTER_V_SIZE*FILTER_H_SIZE);
            // initialize all input data before parallel computes
            std::memcpy(coeffsBuf, &coeffs[0][0], FILTER_V_SIZE*FILTER_H_SIZE);
            unsigned char * srcChannel[3] = {srcImage.yChannel, srcImage.uChannel, srcImage.vChannel};
            for (int ch = 0; ch < 3; ch++){
                std::memcpy(srcBuf+ch*dataSizePerChannel, srcChannel[ch], dataSizePerChannel);
            }
            // execute conv_acc<NCU> parallel computes, one per color channel
            for (int ch = 0; ch < 3; ch++){
                conv_acc::compute(coeffsBuf, factor, bias, width, height,
                                  srcBuf + ch*dataSizePerChannel,
                                  dstBuf + ch*dataSizePerChannel);
            }
            // keep sending until numImages images have been submitted
            return (++run < numImages);
        });
    // receive lambda function for receive thread
    conv_acc::receive_all_in_order([=]() {
            int run = conv_acc::get_handle();   // handle of the current job (not used further here)
            unsigned char * dstBuf = (unsigned char *)conv_acc::get_buf(dstBufPool);
            unsigned char * dstChannel[3] = {dstImage.yChannel, dstImage.uChannel, dstImage.vChannel};
            for (int ch = 0; ch < 3; ch++){
                std::memcpy(dstChannel[ch], dstBuf+ch*dataSizePerChannel, dataSizePerChannel);
            }
        });
    // wait for both loops to finish
    conv_acc::join();
    return 0;
}
```
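In this function, the send lambda stores the three channels back to back in one buffer, so channel `ch` starts at byte offset `ch * dataSizePerChannel`, and each compute() call receives a pointer to one plane; the receive lambda reverses the copy. The following standalone sketch isolates that offset arithmetic so it can be checked on the host. `pack_channels()`, `unpack_channels()`, and `channel_offset()` are hypothetical helpers, not part of the example or the VPP_ACC API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Byte offset of channel `ch` (0 = Y, 1 = U, 2 = V) in the packed buffer.
std::size_t channel_offset(int ch, std::size_t dataSizePerChannel) {
    assert(ch >= 0 && ch < 3);
    return static_cast<std::size_t>(ch) * dataSizePerChannel;
}

// Copies three separate channel planes into one contiguous buffer of
// 3 * dataSizePerChannel bytes, mirroring the send lambda.
void pack_channels(unsigned char* buf, unsigned char* const planes[3],
                   std::size_t dataSizePerChannel) {
    for (int ch = 0; ch < 3; ch++)
        std::memcpy(buf + channel_offset(ch, dataSizePerChannel), planes[ch],
                    dataSizePerChannel);
}

// Inverse of pack_channels, mirroring the receive lambda.
void unpack_channels(unsigned char* const planes[3], const unsigned char* buf,
                     std::size_t dataSizePerChannel) {
    for (int ch = 0; ch < 3; ch++)
        std::memcpy(planes[ch], buf + channel_offset(ch, dataSizePerChannel),
                    dataSizePerChannel);
}
```

Because the planes do not overlap, each compute() call in the example works on its own slice of the shared source and destination buffers, so one allocation per image serves all three channels.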