Split IO Memory Model - 1.4 English

Vitis AI User Guide (UG1414)

Document ID: UG1414
Release Date: 2021-07-22
Version: 1.4 English

The split IO memory model is introduced to remove a limitation of the unique memory model, so that data residing in other physical memory buffers can be consumed by the DPU directly. When dpuCreateTask() is called to create a DPU task from a DPU kernel compiled with the option -split-io-mem, N2Cube allocates DPU memory only for the intermediate feature maps. You must allocate the physically contiguous memory buffers for the boundary input tensors and output tensors yourself. The required sizes of the input and output memory buffers are reported in the compiler build log under the field names Input Mem Size and Output Mem Size. If these buffers are cacheable, you are also responsible for maintaining cache coherence.

The DNNDK sample split_io provides a programming reference for the split IO memory model, using the TensorFlow SSD model. The SSD model has one input tensor, image:0, and two output tensors, ssd300_concat:0 and ssd300_concat_1:0. The compiler build log shows that the DPU input memory buffer (for tensor image:0) is 270000 in size and that the DPU output memory buffer (shared by output tensors ssd300_concat:0 and ssd300_concat_1:0) is 218304 in size. The sample allocates both buffers with dpuAllocMem(), then calls dpuBindInputTensorBaseAddress() and dpuBindOutputTensorBaseAddress() to bind the input and output buffer addresses to the DPU task before launching it. After the input data is written into the DPU input memory buffer, dpuSyncMemToDev() is called to flush the cache lines so that the DPU sees the data. When the DPU task finishes, dpuSyncDevToMem() is called to invalidate the cache lines so that the CPU reads the results the DPU wrote to memory.
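The order of operations described above can be sketched as follows. This is a minimal illustration, not the split_io sample itself: it does not call the DNNDK library, and every name prefixed with Mock or mock_ is a hypothetical stand-in that only models the step named in its comment. The real API signatures are not reproduced here and should be taken from the DNNDK headers. The two buffer sizes are the ones quoted from the compiler build log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Buffer sizes quoted in the text from the compiler build log of the SSD sample.
constexpr std::size_t kInputMemSize = 270000;   // input tensor image:0
constexpr std::size_t kOutputMemSize = 218304;  // both ssd300_concat outputs

// Hypothetical stand-in for a cacheable, physically contiguous buffer.
struct MockBuffer {
    std::vector<std::int8_t> bytes;
    bool cpu_writes_pending = false;     // CPU wrote; device cannot see it until a flush
    bool device_writes_pending = false;  // device wrote; CPU cache is stale until an invalidate
};

// Hypothetical stand-in for a DPU task whose IO buffers are supplied by the user.
struct MockTask {
    MockBuffer* input = nullptr;
    MockBuffer* output = nullptr;
};

// Models the role of dpuAllocMem(): the user, not N2Cube, owns the IO buffers.
MockBuffer mock_alloc(std::size_t size) {
    assert(size > 0);
    MockBuffer buf;
    buf.bytes.assign(size, 0);
    return buf;
}

// Models the role of dpuSyncMemToDev(): flush so the device sees CPU writes.
void mock_sync_mem_to_dev(MockBuffer& buf) { buf.cpu_writes_pending = false; }

// Models the role of dpuSyncDevToMem(): invalidate so the CPU sees device writes.
void mock_sync_dev_to_mem(MockBuffer& buf) { buf.device_writes_pending = false; }

// Models running the task; refuses to run with unbound or unflushed buffers.
bool mock_run(MockTask& task) {
    if (task.input == nullptr || task.output == nullptr) return false;
    if (task.input->cpu_writes_pending) return false;
    task.output->device_writes_pending = true;  // results land in memory, not in the CPU cache
    return true;
}

// Walks through the sequence from the text and records each completed step.
std::vector<std::string> run_split_io_sequence() {
    std::vector<std::string> steps;

    MockBuffer input = mock_alloc(kInputMemSize);
    MockBuffer output = mock_alloc(kOutputMemSize);
    steps.push_back("alloc");

    MockTask task;
    task.input = &input;    // models dpuBindInputTensorBaseAddress()
    task.output = &output;  // models dpuBindOutputTensorBaseAddress()
    steps.push_back("bind");

    input.bytes[0] = 1;     // feed (dummy) input data through the CPU
    input.cpu_writes_pending = true;
    steps.push_back("fill");

    mock_sync_mem_to_dev(input);
    steps.push_back("flush");

    if (!mock_run(task)) return steps;
    steps.push_back("run");

    mock_sync_dev_to_mem(output);
    steps.push_back("invalidate");

    // Only after the invalidate is the output buffer safe for the CPU to read.
    if (!output.device_writes_pending) steps.push_back("read");
    return steps;
}
```

Calling run_split_io_sequence() returns the steps in the order alloc, bind, fill, flush, run, invalidate, read. In this model, skipping the flush makes mock_run() refuse to start, which mirrors why the cache must be flushed before the DPU is launched.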

Note: The four APIs dpuAllocMem(), dpuFreeMem(), dpuSyncMemToDev(), and dpuSyncDevToMem() are provided only to demonstrate the split IO memory model. They are not intended to be used directly in a production environment. You can implement equivalent functionality yourself to better meet your own requirements.