Graph Execution Control - 2020.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID

UG1076

Release Date

2020-11-24

Version

2020.2 English

In Versal™ ACAPs with AI Engines, the processing system (PS) can be used to dynamically load, monitor, and control the graphs that are executing on the AI Engine array. Even if the AI Engine graph is loaded once as a single bitstream image, the PS program can be used to monitor the state of the execution and modify the run-time parameters of the graph.

The graph base class provides a number of API methods to control the initialization and execution of the graph that can be used in the main program. See Adaptive Data Flow Graph Specification Reference for more details.

Basic Iterative Graph Execution

The following graph control API illustrates how to use graph APIs to initialize, run, wait, and terminate graphs for a specific number of iterations. A graph object mygraph is declared using a pre-defined graph class called simpleGraph. Then, in the main application, this graph object is initialized and run. The init() method loads the graph to the AI Engine array at prespecified AI Engine tiles. This includes loading the ELF binaries for each AI Engine, configuring the stream switches for routing, and configuring the DMAs for I/O. It leaves the processors in a disabled state. The run() method starts the graph execution by enabling the processors. The run API is where a specific number of iterations of the graph can be run by supplying a positive integer argument at run time. This form is useful for debugging your graph execution.

#include "project.h" 
simpleGraph mygraph; 

int main(void) { 
  mygraph.init(); 
  mygraph.run(3); // run 3 iterations 
  mygraph.wait(); // wait for 3 iterations to finish 
  mygraph.run(10); // run 10 iterations 
  mygraph.end(); // wait for 10 iterations to finish 
  return 0; 
}

Note: The API wait() is used to wait for the first run to finish before starting the second run. wait has the same blocking effect as end except that it allows re-running the graph again without having to re-initialize it. Calling run back-to-back without an intervening wait to finish that run can have an unpredictable effect because the run API modifies the loop bounds of the active processors of the graph.

Finite Execution of Graph

For finite graph execution, the graph state is maintained across the graph.run(n). The AI Engine is not reinitialized and memory contents are not cleared after graph.run(n). In the following code example, after the first run of three invocations, the core-main wrapper code is left in a state where the kernel will start with the pong buffer in the next run (of ten iterations). The ping-pong buffer selector state is left as-is. graph.end() does not clean up the graph state (specifically, does not re-initialize global variables), nor clean up stream switch configurations. It merely exits the core-main. To re-run the graph, you have to reload the PDI/XCLBIN.

#include "project.h" 
simpleGraph mygraph; 

int main(void) { 
  mygraph.init(); 
  mygraph.run(3); // run 3 iterations 
  mygraph.wait(); // wait for 3 iterations to finish 
  mygraph.run(10); // run 10 iterations 
  mygraph.end(); // wait for 10 iterations to finish 
  return 0; 
}

Infinite Graph Execution

The following graph control API illustrates how to run the graph infinitely.

#include "project.h"
simpleGraph mygraph;

int main(void) {
  mygraph.init();  // load the graph
  mygraph.run();   // start the graph
  return 0;
}

A graph object mygraph is declared using a pre-defined graph class called simpleGraph. Then, in the main application, this graph object is initialized and run. The init() method loads the graph to the AI Engine array at prespecified AI Engine tiles. This includes loading the ELF binaries for each AI Engine, configuring the stream switches for routing, and configuring the DMAs for I/O. It leaves the processors in a disabled state. The run() method starts the graph execution by enabling the processors. This graph runs forever because the number of iterations to be run is not provided to the run() method.

Note: graph::run() without an argument runs the AI Engine kernels for a previously specified number of iterations (which is infinity by default if the graph is run without any arguments). If the graph is run with a finite number of iterations, for example, mygraph.run(3);mygraph.run() the second run call will also run for three iterations.

Parallel Graph Execution

Among the above API methods, only the wait() and end() methods are blocking operations that can block the main application indefinitely. Therefore, if you declare multiple graphs at the top level, you need to interleave the APIs suitably to execute the graphs in parallel, as shown.

#include "project.h"
simpleGraph g1, g2, g3;

int main(void) {
  g1.init(); g2.init(); g3.init();
  g1.run(<num-iter>); g2.run(<num-iter>); g3.run(<num-iter>);
  g1.end(); g2.end(); g3.end();
  return 0;
}

Note: Each graph should be started (run) only after it has been initialized (init). Also, to get parallel execution, all the graphs must be started (run) before any graph is waited upon for termination (end).

Timed Execution

In multi-rate graphs, all kernels need not execute for the same number of iterations. In such situations, a timed execution model is more suitable for testing. There are variants of the wait and end APIs with a positive integer that specifies a cycle timeout. This is the number of AI Engine cycles that the API call will block before disabling the processors and returning. The blocking condition does not depend on any graph termination event. The graph can be in an arbitrary state at the expiration of the timeout.

#include "project.h"
simpleGraph mygraph;

int main(void) {
  mygraph.init();
  mygraph.run();
  mygraph.wait(10000);  // wait for 10000 AI Engine cycles
  mygraph.resume();     // continue executing 
  mygraph.end(15000);   // wait for another 15000 cycles and terminate
}

Note: The API resume() is used to resume execution from the point it was stopped after the first timeout. resume only resets the timer and enables the AI Engines. Calling resume after the AI Engine execution has already terminated will have no effect.