Location Constraints - 2020.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Release Date: 2020-11-24
Version: 2020.2 English

Kernel Location Constraints

When building large graphs with multiple subgraphs, it is sometimes useful to control the exact mapping of kernels to AI Engines, either relative to other kernels or in an absolute sense. The AI Engine compiler provides a mechanism to specify location constraints for kernels. When used with the C++ template class specification, this mechanism lets you create a robust, scalable, and predictable mapping of your graph onto the AI Engine array. Constraints also reduce the number of placement choices the mapper must explore, which can considerably speed up mapping. Consider the following graph specification:

#include <adf.h>
#include "kernels.h"   // kernel function declarations (mykernel)
#define NUMCORES (COLS*ROWS)
using namespace adf;

// COLS x ROWS array of independent kernels, each pinned to its own tile
template <int COLS, int ROWS, int STARTCOL, int STARTROW>
class indep_nodes_graph1 : public graph {
 public:
  kernel kr[NUMCORES];
  port<input> datain[NUMCORES];
  port<output> dataout[NUMCORES];

  indep_nodes_graph1() {
    for (int i = 0; i < COLS; i++) {
      for (int j = 0; j < ROWS; j++) {
        int k = i*ROWS + j;
        kr[k] = kernel::create(mykernel);
        source(kr[k]) = "kernels/kernel.cc";
        runtime<ratio>(kr[k]) = 0.9;
        // absolute constraint: kernel k runs on tile (STARTCOL+i, STARTROW+j)
        location<kernel>(kr[k]) = tile(STARTCOL+i, STARTROW+j);
      }
    }
    for (int i = 0; i < NUMCORES; i++) {
      connect<stream, window<64> >(datain[i], kr[i].in[0]);
      connect<window<64>, stream >(kr[i].out[0], dataout[i]);
    }
  }
};

The template parameters define a COLS x ROWS logical array of kernels (NUMCORES = COLS x ROWS) that is placed within a larger logical device, with (STARTCOL, STARTROW) as its origin. Each kernel in the graph is placed on a specific AI Engine by an absolute location constraint that binds it to one processor tile. For example, the following declaration creates a 1 x 2 kernel array starting at offset (3,2), so its two kernels occupy tiles (3,2) and (3,3). When embedded within a 4 x 4 logical device topology, the kernel array is therefore constrained to the top right corner.

indep_nodes_graph1<1,2,3,2> mygraph;
Important: Earlier releases used the location<absolute>(k) function to specify kernel constraints and the proc(x,y) function to specify a processor tile location. These functions are now deprecated. Instead, use location<kernel>(k) to specify kernel constraints and tile(x,y) to identify a specific tile location. See Adaptive Data Flow Graph Specification Reference for more information.

Buffer Location Constraints

The AI Engine compiler tries to allocate buffers for windows, lookup tables, and run-time parameters automatically in the most efficient manner possible. However, you might want to control their placement in memory explicitly. As with the kernels shown previously in this section, buffers inferred on a kernel port can be constrained to specific tiles, banks, or even address offsets using location constraints, as shown in the following example.

#include <adf.h>
#include "kernels.h"   // kernel function declarations (mykernel)
#define NUMCORES (COLS*ROWS)
using namespace adf;

template <int COLS, int ROWS, int STARTCOL, int STARTROW>
class indep_nodes_graph2 : public graph {
 public:
  kernel kr[NUMCORES];
  port<input> datain[NUMCORES];
  port<output> dataout[NUMCORES];

  indep_nodes_graph2() {   // constructor name must match the class name
    for (int i = 0; i < COLS; i++) {
      for (int j = 0; j < ROWS; j++) {
        int k = i*ROWS + j;
        kr[k] = kernel::create(mykernel);
        source(kr[k]) = "kernels/kernel.cc";
        runtime<ratio>(kr[k]) = 0.9;
        location<kernel>(kr[k]) = tile(STARTCOL+i, STARTROW+j);    // kernel location
        location<buffer>(kr[k].in[0]) =
          { address(STARTCOL+i, STARTROW+j, 0x0),
            address(STARTCOL+i, STARTROW+j, 0x2000) };              // double buffer location
        location<stack>(kr[k]) = bank(STARTCOL+i, STARTROW+j, 2);   // stack location
        location<buffer>(kr[k].out[0]) = location<kernel>(kr[k]);   // relative buffer location
      }
    }

    for (int i = 0; i < NUMCORES; i++) {
      connect< stream, window<64> >(datain[i], kr[i].in[0]);
      connect< window<64>, stream >(kr[i].out[0], dataout[i]);
    }
  }
};

In the previous code, the double buffers at port kr[k].in[0] are constrained to specific address offsets (0x0 and 0x2000) in the data memory of tile (STARTCOL+i, STARTROW+j), created using the address(col,row,offset) constructor. The system memory (including the stack and static heap) of the processor that executes kernel instance kr[k] is constrained to a particular bank using the bank(col,row,bankid) constructor. Finally, the buffers connected to port kr[k].out[0] are constrained to the same tile as kernel instance kr[k]. Buffer location constraints are allowed only on window kernel ports.

CAUTION:
Using location constraint constructors and equality relations between them, you can make fine-grained mapping decisions that the compiler must honor. However, it is easy to create constraints that the compiler cannot satisfy. For example, the compiler cannot map two buffers to the same address offset. See Adaptive Data Flow Graph Specification Reference for the complete reference.

Hierarchical Constraints

When creating complex graphs with multiple subgraph classes, or multiple instances of the same subgraph class, you can also apply the location constraints described above to each kernel instance or kernel port instance individually where the subgraph is instantiated, instead of in its definition. In this case, specify the graph-qualified name of the kernel instance or kernel port instance in the constraint, as shown below. The constrained kernels and ports must be declared as public members of the subgraph.

#include <adf.h>
using namespace adf;
// assumes the indep_nodes_graph1 template defined earlier is visible here

class ToplevelGraph : public graph {
 public:
  indep_nodes_graph1<1,2,3,2> mygraph;   // kr[0] on tile(3,2), kr[1] on tile(3,3)
  port<input> datain[2];
  port<output> dataout[2];

  ToplevelGraph() {
    for (int i = 0; i < 2; i++) {
      connect<stream, window<64> >(datain[i], mygraph.datain[i]);
      connect<window<64>, stream >(mygraph.dataout[i], dataout[i]);

      // hierarchical constraints, using graph-qualified names
      location<stack>(mygraph.kr[i]) = bank(3, 2+i, 2);
      location<buffer>(mygraph.kr[i].out[0]) = location<kernel>(mygraph.kr[i]);
    }
  }
};