The split I/O memory model is introduced to overcome a limitation of the unique
memory model: it allows data residing in other physical memory buffers to be consumed
by the DPU directly. When dpuCreateTask() is called to create a DPU task from a DPU
kernel compiled with the -split-io-mem option, N2Cube allocates the DPU memory buffer
only for the intermediate feature maps. It is up to you to allocate the physically
contiguous memory buffers for the boundary input tensors and output tensors
individually. The sizes of the input and output memory buffers can be found in the
compiler build log under the fields Input Mem Size and Output Mem Size. You also need
to take care of cache coherence if these memory buffers are cacheable.
The DNNDK sample split_io provides a programming reference for the split I/O memory
model, using the TensorFlow SSD model as the example. The SSD model has one input
tensor, image:0, and two output tensors, ssd300_concat:0 and ssd300_concat_1:0. From
the compiler build log, the size of the DPU input memory buffer (for tensor image:0)
is 270000, and the size of the DPU output memory buffer (covering both output tensors
ssd300_concat:0 and ssd300_concat_1:0) is 218304. dpuAllocMem() is used to allocate
the memory buffers for them. dpuBindInputTensorBaseAddress() and
dpuBindOutputTensorBaseAddress() are then used to bind the input and output memory
buffer addresses to the DPU task before launching its execution. After the input data
is fed into the DPU input memory buffer, dpuSyncMemToDev() is called to flush the
cache lines. When the DPU task completes, dpuSyncDevToMem() is called to invalidate
the cache lines.
dpuAllocMem(), dpuFreeMem(), dpuSyncMemToDev(), and dpuSyncDevToMem() are provided
only to demonstrate the split I/O memory model. They are not expected to be used
directly in a production environment; it is up to you to implement such functionality
yourself to better meet your own requirements.