The split IO memory model is introduced to resolve a limitation of the unique memory model, so that data coming from another physical memory buffer can be consumed by the DPU directly.
When dpuCreateTask() creates a DPU task from a DPU kernel compiled with the option
-split-io-mem, N2Cube only allocates the DPU memory
buffer for the intermediate feature maps. It is up to the users to allocate the physically
contiguous memory buffers for the boundary input tensors and output tensors individually.
The sizes of the input and output memory buffers can be found in the compiler
build log under the field names Input Mem Size and Output Mem Size. Users also
need to take care of cache coherence if these memory buffers are cacheable.
The split_io sample provides a programming reference for the split IO
memory model, using the TensorFlow SSD model. The SSD model has one input tensor, image:0,
and two output tensors, one of which is ssd300_concat_1:0. From the compiler build log, you can see
that the size of the DPU input memory buffer (for tensor image:0) is 270000 bytes, and the size of
the DPU output memory buffer (for the output tensors) is 218304 bytes. dpuAllocMem()
is then used to allocate memory buffers for them.
dpuBindInputTensorBaseAddress() and dpuBindOutputTensorBaseAddress() are subsequently used to bind the
input/output memory buffer addresses to the DPU task before launching its execution. After the
input data is fed into the DPU input memory buffer, dpuSyncMemToDev() is
called to flush the cache lines. When the DPU task completes running,
dpuSyncDevToMem() is called to invalidate the cache lines.
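Putting the steps above together, the split IO flow can be sketched roughly as below. This is a sketch, not the literal split_io sample code: the exact signatures of the demonstration helpers dpuAllocMem(), dpuSyncMemToDev(), and dpuSyncDevToMem(), as well as the kernel name "ssd", are assumptions and should be checked against the split_io sample source and the N2Cube API headers. It will only build and run on a target with the DNNDK runtime and a DPU.

```c
/* Sketch of the split IO memory model flow. Helper signatures and the
 * kernel name are ASSUMPTIONS; verify against the split_io sample. */
#include <dnndk/dnndk.h>   /* N2Cube API */
#include <stdint.h>
#include <string.h>

#define INPUT_SIZE  270000  /* Input Mem Size from the compiler build log  */
#define OUTPUT_SIZE 218304  /* Output Mem Size from the compiler build log */

int main(void) {
    dpuOpen();
    DPUKernel *kernel = dpuLoadKernel("ssd");   /* kernel name: assumption */
    DPUTask   *task   = dpuCreateTask(kernel, 0);

    /* Allocate physically contiguous buffers for the boundary tensors.
     * dpuAllocMem() is the demonstration helper; assumed to return both
     * the virtual and the physical address of the buffer. */
    int8_t *in_virt, *in_phy, *out_virt, *out_phy;
    dpuAllocMem(INPUT_SIZE,  &in_virt,  &in_phy);   /* assumed signature */
    dpuAllocMem(OUTPUT_SIZE, &out_virt, &out_phy);  /* assumed signature */

    /* Bind the buffer addresses to the DPU task before running it. */
    dpuBindInputTensorBaseAddress(task, in_virt, in_phy);
    dpuBindOutputTensorBaseAddress(task, out_virt, out_phy);

    /* Feed the preprocessed input (tensor image:0) into the input buffer,
     * then flush the cache lines so the DPU sees up-to-date data. */
    /* memcpy(in_virt, preprocessed_image, INPUT_SIZE); */
    dpuSyncMemToDev(in_virt, 0, INPUT_SIZE);        /* assumed signature */

    dpuRunTask(task);

    /* Invalidate the cache lines before the CPU reads the results. */
    dpuSyncDevToMem(out_virt, 0, OUTPUT_SIZE);      /* assumed signature */
    /* ... post-process the output tensors found in out_virt ... */

    dpuDestroyTask(task);
    dpuDestroyKernel(kernel);
    dpuClose();
    return 0;
}
```

The allocation, flush, and invalidate steps replace what N2Cube does internally under the unique memory model; everything else (open/load/create/run/destroy) follows the standard N2Cube task lifecycle.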
Note: dpuAllocMem(), dpuSyncMemToDev(), and dpuSyncDevToMem() are provided for demonstration purposes only for the split IO memory model. They are not expected to be used directly in your production environment. It is up to you to decide whether to implement such functionalities to better meet your customized requirements.