For each DPU task in this mode, all of its boundary input tensors and output
tensors, together with its intermediate feature maps, stay within one physically
continuous memory buffer, which is allocated automatically when calling
dpuCreateTask() to instantiate one DPU task from one DPU kernel. This DPU
memory buffer can be cached in order to optimize memory access from the Arm CPU side.
Cache flushing and invalidation are handled by N2Cube, so you do not need to take
care of DPU memory management or cache manipulation. This makes it very easy to deploy
models with the unique memory model, which suits most deep learning applications.
The unique memory model demands that the input data, after pre-processing, be copied into the boundary input tensors of the DPU task's memory buffer; only then can the DPU task be launched. This may bring additional overhead, because in some situations the pre-processed Int8 input data already resides in a physically continuous memory buffer that the DPU could access directly. One example is a camera-based deep learning application. The pre-processing of each input image from the camera sensor, such as image scaling, model normalization, and Float32-to-Int8 quantization, can be accelerated by FPGA logic, with the result data written to a physically continuous memory buffer. Under the unique memory model, this data must nevertheless be copied into the DPU input memory buffer again.