Software Emulation - 2023.2 English

Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393)

Document ID: UG1393
Release Date: 2023-12-13
Version: 2023.2 English

VSC uses a single-source C++ model that can be functionally validated quickly. The software emulation target (-t sw_emu) must be specified during the v++ compile step. For this target, VSC compiles the accelerator and application source files using C-compilation only and does not run a hardware compilation: the Vitis HLS tool is not run to create RTL, and the object files are linked using g++.

The following example code shows an accelerator with two PEs, ldSt and fsk. The ldSt PE writes sz words to the AXI4-Stream L, and the fsk PE simply copies those words back on the feedback stream S. The fsk function body reads one word from L and writes one word to S. The fsk PE is marked free-running and is therefore agnostic to sz.

void compute(…) {
    hls::stream<T> L, S;
    ldSt(…, L, S); // S is feedback
    fsk(L, S);     // fsk is free-running
}
void ldSt(…, hls::stream<T>& L,
             hls::stream<T>& S) {
    for (int i=0; i<sz; i++) {
        L << input[i];
    }
    // Error-1: a bound of (i < sz-1) leaves stream S non-empty
    // Error-2: a bound of (i < sz+1) deadlocks on stream S
    for (int i=0; i<sz; i++) {
        S >> output[i];
    }
}
void fsk(hls::stream<T>& L,
         hls::stream<T>& S) {
    T word;
    L >> word;
    S << word;
}
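
The dataflow above can also be tried outside VSC with plain C++ threads. The following is a minimal sketch, not the VSC implementation: Stream is a hypothetical stand-in for hls::stream, runModel and Result are illustrative names, T is fixed to int, and treating a 500 ms read timeout as a hang is an assumption made only so that the deadlock case terminates. The readCount parameter plays the role of the bound of the second loop in ldSt.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for hls::stream<T>: an unbounded, thread-safe FIFO.
template <typename T>
class Stream {
public:
    void write(const T& v) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(v);
        }
        cv_.notify_one();
    }
    // Waits up to `timeout` for a word; returns false if none arrives.
    bool read(T& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_);
        if (!cv_.wait_for(lk, timeout, [this] { return !q_.empty(); }))
            return false;
        out = q_.front();
        q_.pop();
        return true;
    }
    bool empty() {
        std::lock_guard<std::mutex> lk(m_);
        return q_.empty();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

enum class Result { Ok, NonEmptyStream, Hang };

// Models ldSt (calling thread) and a free-running fsk (worker thread).
// readCount is the bound of ldSt's second loop: sz is correct,
// sz-1 reproduces Error-1, and sz+1 reproduces Error-2.
Result runModel(const std::vector<int>& input, int readCount,
                std::vector<int>& output) {
    assert(readCount >= 0);
    using ms = std::chrono::milliseconds;
    Stream<int> L, S;
    std::atomic<bool> stop{false};
    std::atomic<std::size_t> forwarded{0};

    // fsk: copies one word from L to S per iteration, with no knowledge of sz.
    std::thread fsk([&] {
        int word;
        while (!stop) {
            if (L.read(word, ms(10))) {
                S.write(word);
                ++forwarded;
            }
        }
    });

    // ldSt, first loop: send every input word on L.
    for (int v : input) L.write(v);

    // ldSt, second loop: read readCount words back from the feedback stream S.
    Result result = Result::Ok;
    output.clear();
    for (int i = 0; i < readCount; ++i) {
        int word;
        if (!S.read(word, ms(500))) {  // assumption: a timeout counts as a hang
            result = Result::Hang;
            break;
        }
        output.push_back(word);
    }

    // Let fsk forward all remaining words, then shut it down.
    while (forwarded < input.size()) std::this_thread::sleep_for(ms(1));
    stop = true;
    fsk.join();

    // A word left in S after ldSt has finished corresponds to Error-1.
    if (result == Result::Ok && !S.empty()) result = Result::NonEmptyStream;
    return result;
}
```

With a four-word input, calling runModel with readCount set to 4, 3, and 5 is expected to return Result::Ok, Result::NonEmptyStream, and Result::Hang respectively. Running the two PEs on separate threads is what lets the feedback stream work at all: called sequentially, ldSt would block on S before fsk ever ran.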

Because the C++ source is built entirely with C-compilation, the process is very fast. Additionally, VSC checks the model against certain hardware behavior semantics:

  • Automatic runtime assertions for host-device data movement in the application layer:
    • An input data buffer must not be written after it is synced to the device
    • An output data buffer must not be read before the result is synced from the device
  • The CUs and PEs are executed in parallel to model the hardware semantics. Therefore:
    • Feedback connections, which are non-procedural C++ code, can be functionally validated
    • Free-running PEs can be functionally validated along with regular PEs
  • Certain erroneous hardware behavior can be detected:
    • Error-1: If the second loop in ldSt used "i < sz-1", one word would be left in AXI4-Stream 'S'. The non-empty stream is flagged by an assertion.
    • Error-2: If the second loop in ldSt used "i < sz+1", it would deadlock, because ldSt expects one more word than is written to stream 'S'. This scenario is captured as a hang in this early C-based validation.
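
The two host-device data movement rules in the list above can be illustrated with a small state check. This is a sketch only: InputBuffer, OutputBuffer, SyncViolation, and violates are hypothetical names, not part of the VSC API, the actual transfers are omitted, and an exception is thrown in place of VSC's runtime assertion so that a violation can be observed without aborting the program.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical error type raised when a sync rule is broken.
struct SyncViolation : std::logic_error {
    using std::logic_error::logic_error;
};

// Hypothetical host-side input buffer; not part of the VSC API.
template <typename T>
class InputBuffer {
public:
    explicit InputBuffer(std::size_t n) : data_(n) { assert(n > 0); }
    // Rule 1: no writes once the buffer has been synced to the device.
    void write(std::size_t i, const T& v) {
        if (synced_) throw SyncViolation("input written after sync to device");
        data_.at(i) = v;
    }
    void syncToDevice() { synced_ = true; }  // real transfer omitted
private:
    std::vector<T> data_;
    bool synced_ = false;
};

// Hypothetical host-side output buffer; not part of the VSC API.
template <typename T>
class OutputBuffer {
public:
    explicit OutputBuffer(std::size_t n) : data_(n) { assert(n > 0); }
    // Rule 2: no reads until the result has been synced from the device.
    const T& read(std::size_t i) const {
        if (!synced_) throw SyncViolation("output read before sync from device");
        return data_.at(i);
    }
    void syncFromDevice() { synced_ = true; }  // real transfer omitted
private:
    std::vector<T> data_;
    bool synced_ = false;
};

// Returns true if calling f() trips one of the checks above.
template <typename F>
bool violates(F f) {
    try {
        f();
    } catch (const SyncViolation&) {
        return true;
    }
    return false;
}
```

In this sketch, writing to an InputBuffer before syncToDevice() succeeds and writing afterwards is reported as a violation, while reading an OutputBuffer is a violation until syncFromDevice() has been called.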