Multi-Process and Multi-Thread Support for Controlling the AI Engine Graph - 2021.2 English

Versal ACAP AI Engine Programming Environment User Guide (UG1076)

Document ID: UG1076
Locale: English (United States)
Release Date: 2021-12-17
Version: 2021.2 English

Xilinx® XRT APIs provide multi-process support for controlling the AI Engine array and graphs. The APIs support three operating modes for the AI Engine array and graphs.

Exclusive Mode
The process has full access to the AI Engine array or graph. No other process can access it.
Primary Mode
The process has full access to the AI Engine array or graph. Other processes can get non-destructive access to it.
Shared Mode
The process has only non-destructive access to the AI Engine array or graph.

Xilinx XRT provides the following APIs for opening the AI Engine array in three modes.

  • xrtAIEDeviceOpenExclusive (Exclusive mode)
  • xrtAIEDeviceOpen (Primary mode)
  • xrtAIEDeviceOpenShared (Shared mode)
Note: If the application does not call xrtAIEDeviceOpen* to obtain a device handle, XRT by default tries to acquire the primary context when the application accesses the AI Engine array through XRT APIs.

Xilinx XRT also provides the following APIs for opening the graph after the AI Engine array is opened.

  • xrtGraphOpenExclusive (Exclusive mode)
  • xrtGraphOpen (Primary mode)
  • xrtGraphOpenShared (Shared mode)

The following APIs are allowed on the AI Engine array in exclusive and primary modes. They cover loading the xclbin, resetting the array, and transferring GMIO data.

  • xrtDeviceLoadXclbin
  • xrtAIEResetArray
  • xrtAIESyncBO
  • xrtDeviceClose

The following APIs are allowed on the AI Engine array in shared mode. They perform only non-destructive operations.

  • xrtDeviceLoadXclbin: Read metadata from xclbin.
  • xrtDeviceClose

The allowed APIs on the AI Engine graph in exclusive and primary modes include the following.

  • xrtGraphRun
  • xrtGraphWait
  • xrtGraphEnd
  • xrtGraphUpdateRTP
  • xrtGraphReadRTP
  • xrtGraphTimeStamp
  • xrtGraphClose

The allowed APIs on the AI Engine graph in shared mode include the following.

  • xrtGraphUpdateRTP (asynchronous mode)
  • xrtGraphReadRTP
  • xrtGraphTimeStamp
  • xrtGraphClose
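
The two graph API lists above can be read as a lookup from access mode to permitted calls. The following is a minimal sketch of that lookup in plain C++. The `Mode` enum and the `graph_api_allowed` function are hypothetical names used only for illustration and are not part of XRT; the API names are copied from the lists above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative names only: Mode and graph_api_allowed are not XRT symbols.
enum class Mode { Exclusive, Primary, Shared };

// Returns true if the graph API named `api` is in the list given above
// for the access mode `mode`.
bool graph_api_allowed(Mode mode, const std::string& api) {
    // Exclusive and primary modes share the same list.
    static const std::set<std::string> full_access = {
        "xrtGraphRun", "xrtGraphWait", "xrtGraphEnd", "xrtGraphUpdateRTP",
        "xrtGraphReadRTP", "xrtGraphTimeStamp", "xrtGraphClose"};
    // Shared mode: non-destructive calls only.
    static const std::set<std::string> shared_access = {
        "xrtGraphUpdateRTP",  // asynchronous mode only, per the list above
        "xrtGraphReadRTP", "xrtGraphTimeStamp", "xrtGraphClose"};
    const std::set<std::string>& allowed =
        (mode == Mode::Shared) ? shared_access : full_access;
    return allowed.count(api) > 0;
}
```

With this sketch, a shared-mode process may read an RTP or a time stamp, whereas `xrtGraphRun`, `xrtGraphWait`, and `xrtGraphEnd` are reserved for the exclusive or primary owner.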

The following rules apply for the AI Engine array multi-process support.

  • Only one process can open the AI Engine array in exclusive mode. Once the array is opened in exclusive mode, it cannot be opened again in any mode, either by the same process or by another process.
  • Only one process can open the AI Engine array in primary mode. Once the array is opened in primary mode, it cannot be opened again in exclusive or primary mode, but it can still be opened in shared mode.
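
The array rules above can be sketched as a small bookkeeping class. This is an illustrative model in plain C++, not XRT code: `Mode` and `ArrayContexts` are hypothetical names. It also assumes that an exclusive open is refused while any shared context is still open, which the rules above imply but do not state explicitly.

```cpp
#include <cassert>

// Illustrative names only: Mode and ArrayContexts are not XRT symbols.
enum class Mode { Exclusive, Primary, Shared };

// Tracks which contexts are currently open on one AI Engine array.
class ArrayContexts {
public:
    // Returns true if a new context in `mode` may be opened.
    bool try_open(Mode mode) {
        if (exclusive_) return false;  // exclusive blocks every later open
        switch (mode) {
        case Mode::Exclusive:
            // Assumption: exclusive also requires that no shared context is open.
            if (primary_ || shared_ > 0) return false;
            exclusive_ = true;
            return true;
        case Mode::Primary:
            if (primary_) return false;  // only one primary owner
            primary_ = true;
            return true;
        case Mode::Shared:
            ++shared_;                   // any number of shared contexts
            return true;
        }
        return false;
    }

    // Releases a context that was previously opened in `mode`.
    void close(Mode mode) {
        switch (mode) {
        case Mode::Exclusive: assert(exclusive_);  exclusive_ = false; break;
        case Mode::Primary:   assert(primary_);    primary_ = false;   break;
        case Mode::Shared:    assert(shared_ > 0); --shared_;          break;
        }
    }

private:
    bool exclusive_ = false;
    bool primary_ = false;
    int shared_ = 0;
};
```

In this model, a primary open followed by a shared open succeeds, a second primary open fails, and an exclusive open succeeds only after every other context has been closed.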

The following rules apply for the AI Engine graph multi-process support.

  • A graph should not be opened more than once in a process.
  • If an AI Engine graph is opened in exclusive mode, it cannot be opened again in any mode.
  • Multiple processes can open a graph in shared mode, but only one process can open it in primary mode.
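
The graph rules differ from the array rules mainly in the per-process restriction. A minimal sketch that tracks the holder of each context is shown below. `GraphContexts` and the integer process IDs are hypothetical and only illustrate the rules. The sketch treats a second open from the same process as an error and assumes that an exclusive open needs the graph to be completely unopened.

```cpp
#include <cassert>
#include <map>

// Illustrative names only: Mode and GraphContexts are not XRT symbols.
enum class Mode { Exclusive, Primary, Shared };

// Records which process (by an illustrative integer ID) holds one graph
// and in which mode.
class GraphContexts {
public:
    bool try_open(int pid, Mode mode) {
        if (holders_.count(pid) > 0) return false;  // one open per process
        bool has_exclusive = false;
        bool has_primary = false;
        for (const auto& entry : holders_) {
            if (entry.second == Mode::Exclusive) has_exclusive = true;
            if (entry.second == Mode::Primary) has_primary = true;
        }
        if (has_exclusive) return false;
        // Assumption: exclusive requires that no other context is open.
        if (mode == Mode::Exclusive && !holders_.empty()) return false;
        if (mode == Mode::Primary && has_primary) return false;
        holders_[pid] = mode;
        return true;
    }

    void close(int pid) { holders_.erase(pid); }

private:
    std::map<int, Mode> holders_;
};
```

This mirrors the pattern used in the sample code later in this topic, where one process owns a graph in primary mode and a second process opens the same graph in shared mode to read an RTP.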

A graph must be closed before the array is closed. After all open AI Engine arrays and graphs are closed, they can be opened again, and the preceding rules apply again.

Note: When a graph or AI Engine array is closed, only the opened context is closed. It does not destroy hardware in the AI Engine array.
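
One way to keep the close order correct in C++ host code is to tie each close call to a scope guard, because local objects are destroyed in reverse order of declaration. The following is a sketch under that assumption. `ScopedClose` and `close_order_demo` are hypothetical helpers, and the lambdas only record a string where real code would call `xrtGraphClose` and `xrtDeviceClose`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs a cleanup callable when it goes out of scope.
class ScopedClose {
public:
    explicit ScopedClose(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopedClose() { if (fn_) fn_(); }
    ScopedClose(const ScopedClose&) = delete;
    ScopedClose& operator=(const ScopedClose&) = delete;

private:
    std::function<void()> fn_;
};

// Records the order in which the two stand-in handles are closed.
std::vector<std::string> close_order_demo() {
    std::vector<std::string> log;
    {
        // Real code would call xrtDeviceClose(dhdl) in this lambda.
        ScopedClose device([&log] { log.push_back("device"); });
        // Real code would call xrtGraphClose(ghdl) in this lambda.
        ScopedClose graph([&log] { log.push_back("graph"); });
        // ... use the graph here ...
    }  // destructors run in reverse declaration order: graph, then device
    return log;
}
```

Declaring the device guard first and the graph guard second means the graph is closed before the array on every exit path from the scope, including early returns.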

The following figure summarizes the multi-process support in exclusive mode (black: supported, red: not supported, "*": any mode).

Figure 1. Multi-Process Support in Exclusive Mode

The following figure summarizes the multi-process support in primary and shared modes (black: supported, red: not supported).

Figure 2. Multi-Process Support in Primary and Shared Modes

It is recommended that multiple threads use the same model as multiple processes. However, because the AI Engine device handle and graph handle are sharable between threads, it is legal to use the same device handle or graph handle between threads. The host application is responsible for synchronizing the AI Engine array state and graph state between threads, especially when multiple threads are the exclusive or primary owner of the AI Engine array or graph.
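
When threads do share one handle, the synchronization described above can be as simple as a mutex held around every sequence of calls on that handle. The following is a minimal sketch. `SharedHandle` is a hypothetical wrapper, not an XRT class, and it is written generically so that it does not depend on XRT headers.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper: owns a handle and exposes it only while a mutex is held.
template <typename Handle>
class SharedHandle {
public:
    explicit SharedHandle(Handle h) : handle_(h) {}

    // Calls fn(handle) with the mutex held and returns whatever fn returns.
    template <typename Fn>
    auto with_lock(Fn&& fn) -> decltype(fn(std::declval<Handle&>())) {
        std::lock_guard<std::mutex> guard(mutex_);
        return std::forward<Fn>(fn)(handle_);
    }

private:
    std::mutex mutex_;
    Handle handle_;
};
```

In host code, `Handle` would be the graph or device handle type, and the callable passed to `with_lock` would contain the calls that must not interleave with other threads, for example an RTP update followed by a graph run.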

The following sample code uses multiple processes.

#include <stdlib.h>
#include <fstream>
#include <iostream>
#include <string>
#include <unistd.h>
#include <sys/wait.h>
#include "adf/adf_api/XRTConfig.h"
#include "experimental/xrt_kernel.h"

#include "graph.cpp"

//8192 matches 32 iterations of graph::run
#define OUTPUT_SIZE 8192
int value1[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
int value2[16] = {-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16};

using namespace adf;

int run(int argc, char* argv[],int id){
	std::cout<<"Child process "<<id<<" start"<<std::endl;
	
	//TARGET_DEVICE macro needs to be passed from gcc command line
	if(argc != 2) {
		std::cout << "Usage: " << argv[0] <<" <xclbin>" << std::endl;
		return EXIT_FAILURE;
	}
	char* xclbinFilename = argv[1];
	std::string graph_name=std::string("gr[")+std::to_string(id)+"]";
	std::string rtp_inout_name=std::string("gr[")+std::to_string(id)+std::string("].k.inout[0]");
	
	int ret;
	int value_readback[16]={0};
	if(fork()==0){//grandchild process ("child child" in the messages)
		xrtDeviceHandle dhdl2=xrtAIEDeviceOpenShared(0);
		//Check the device handle before using it
		if(!dhdl2){
			std::cout<<"child child device open error"<<std::endl;
			return 1;
		}else{
			std::cout<<"child child device open pass"<<std::endl;
		}
		//In shared mode, loading the xclbin only reads its metadata
		ret=xrtDeviceLoadXclbinFile(dhdl2,xclbinFilename);
		if(ret){
			std::cout<<"child child Xclbin Load fail"<<std::endl;
			return 1;
		}
		xuid_t uuid2;
		ret=xrtDeviceGetXclbinUUID(dhdl2, uuid2);
		if(ret){
			std::cout<<"child child get xclbin uuid error"<<std::endl;
			return 1;
		}else{
			std::cout<<"child child get xclbin uuid pass"<<std::endl;
		}
		auto ghdl2=xrtGraphOpenShared(dhdl2,uuid2,graph_name.data());
		if(!ghdl2){
			std::cout<<"child child graph open error"<<std::endl;
			return 1;
		}else{
			std::cout<<"child child graph open pass"<<std::endl;
		}

		ret=xrtGraphReadRTP(ghdl2, rtp_inout_name.data(), (char*)value_readback, 16*sizeof(int));
		if(ret){
			std::cout<<"child child Graph RTP read fail"<<std::endl;
			return 1;
		}
		std::cout<<"Add value read back are:";
		for(int i=0;i<16;i++){
			std::cout<<value_readback[i]<<",\t";
		}
		std::cout<<std::endl;
		xrtGraphClose(ghdl2);
		xrtDeviceClose(dhdl2);
		std::cout<<"child child process exit"<<std::endl;
		exit(0);
	}

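	//Open the AI Engine array in primary mode
	xrtDeviceHandle dhdl=xrtAIEDeviceOpen(0);
	if(!dhdl){
		std::cout<<"Device open error"<<std::endl;
		return 1;
	}
	ret=xrtDeviceLoadXclbinFile(dhdl,xclbinFilename);
	if(ret){
		std::cout<<"Xclbin Load fail"<<std::endl;
		return 1;
	}
	xuid_t uuid;
	xrtDeviceGetXclbinUUID(dhdl, uuid);

	//Open the graph in primary mode
	auto ghdl=xrtGraphOpen(dhdl,uuid,graph_name.data());
	if(!ghdl){
		std::cout << "Graph Open error" << std::endl;
		return 1;
	}else{
		std::cout << "Graph Open ok" <<std::endl;
	}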
	std::string rtp_in_name=std::string("gr[")+std::to_string(id)+std::string("].k.in[1]");
	ret=xrtGraphUpdateRTP(ghdl,rtp_in_name.data(),(char*)value1,16*sizeof(int));
	if(ret){
		std::cout<<"Graph RTP update fail"<<std::endl;
		return 1;
	}
	xrtGraphRun(ghdl,16);

	xrtGraphWait(ghdl,0);
	std::cout<<"Graph wait done"<<std::endl;
			
	//second run
	ret=xrtGraphUpdateRTP(ghdl,rtp_in_name.data(),(char*)value2,16*sizeof(int));
	if(ret!=0){
		std::cout<<"Graph RTP update fail"<<std::endl;
		return 1;
	}else{
		std::cout<<"Graph RTP update pass"<<std::endl;
	}
	xrtGraphRun(ghdl,16);

	while(wait(NULL)>0){//Wait for child child process
	}

	ret=xrtGraphWait(ghdl,0);
	if(ret){
		std::cout << "Graph wait error" << std::endl;
	}else{
		std::cout<<"Graph done"<<std::endl;
	}
	xrtGraphClose(ghdl);
	xrtDeviceClose(dhdl);
	std::cout<<"Child process:"<<id<<" done"<<std::endl;
	return 0;
}

int main(int argc, char* argv[])
{
	try {
		for(int i=0;i<GRAPH_NUM;i++){
			if(fork()==0){//child
				auto match = run(argc, argv,i);
				std::cout << "TEST child " <<i<< (match ? " FAILED" : " PASSED") << "\n";
				return (match ? EXIT_FAILURE :  EXIT_SUCCESS);
			}else{
				size_t output_size_in_bytes = OUTPUT_SIZE * sizeof(int);
				//TARGET_DEVICE macro needs to be passed from gcc command line
				if(argc != 2) {
					std::cout << "Usage: " << argv[0] <<" <xclbin>" << std::endl;
					return EXIT_FAILURE;
				}
				char* xclbinFilename = argv[1];
				
				int ret;
				// Open xclbin
				auto device = xrt::device(0); //device index=0
				auto uuid = device.load_xclbin(xclbinFilename);
			
				// s2mm & data_generator kernel handle
				std::string s2mm_kernel_name=std::string("s2mm:{s2mm_")+std::to_string(i+1)+std::string("}");
				xrt::kernel s2mm = xrt::kernel(device, uuid, s2mm_kernel_name.data());
				std::string data_generator_kernel_name=std::string("data_generator:{data_generator_")+std::to_string(i+1)+std::string("}");
				xrt::kernel data_generator = xrt::kernel(device, uuid, data_generator_kernel_name.data());
			
				// output memory
				auto out_bo=xrt::bo(device, output_size_in_bytes,s2mm.group_id(0));
				auto host_out=out_bo.map<int*>();
				auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE);//1st run for s2mm has started
				auto data_generator_run = data_generator(nullptr, OUTPUT_SIZE);

				// wait for s2mm done
				std::cout<<"Waiting s2mm to complete"<<std::endl;
				auto state = s2mm_run.wait();
				std::cout << "s2mm "<<" completed with status(" << state << ")"<<std::endl;
				out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
				
				int match = 0;
				int counter=0;
				for (int i = 0; i < OUTPUT_SIZE/2/16; i++) {
					for(int j=0;j<16;j++){
						if(host_out[i*16+j]!=counter+value1[j]){
							std::cout<<"ERROR: num="<<i*16+j<<" out="<<host_out[i*16+j]<<std::endl;
							match=1;
							break;
						}
						counter++;
					}
				}
				for(int i=OUTPUT_SIZE/2/16;i<OUTPUT_SIZE/16;i++){
					for(int j=0;j<16;j++){
						if(host_out[i*16+j]!=counter+value2[j]){
							std::cout<<"ERROR: num="<<i*16+j<<" out="<<host_out[i*16+j]<<std::endl;
							match=1;
							break;
						}
						counter++;
					}
				}

				std::cout << "TEST " <<i<< (match ? " FAILED" : " PASSED") << "\n";
				while(wait(NULL)>0){//Wait for all child process
				}
				std::cout<<"all done"<<std::endl;
				return (match ? EXIT_FAILURE :  EXIT_SUCCESS);
			}
		}
	}
	catch (std::exception const& e) {
		std::cout << "Exception: " << e.what() << "\n";
		std::cout << "FAILED TEST\n";
		return 1;
	}
}