Starting with Vitis AI 2.5, PyTorch and TensorFlow 2 models with custom OPs are supported. The basic custom OP workflow is shown below.
Figure 1. Custom Op Workflow
The following are the steps in the workflow:
- Define the unsupported OP as a custom OP that is unknown to XIR, then quantize the model.
- Compile the quantized model.
- Register and implement the custom OP.
- Deploy the model with graph_runner APIs.
Note: To implement an accelerated (PL or AI Engine) function for a custom OP, register it as a CPU OP, but place the PL/AI Engine invocation code inside that CPU OP's implementation.
In step 4, the graph_runner APIs support both C++ and Python. When the graph_runner API is used to deploy a model with custom OPs, the runtime is optimized with zero-copy technology between DPU OPs and CPU OPs: adjacent layers share buffer addresses instead of copying data between them.
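The deployment sequence follows a create runner / fill input buffers / execute / wait pattern. Because a real run needs a compiled .xmodel and DPU hardware, the stub class below only mimics that calling sequence; it is a sketch of the pattern, not the vitis_ai_library implementation, and its buffer handling is simplified to plain Python lists.

```python
# Illustrative stand-in for a graph runner. The method names mirror the
# calling pattern of the graph_runner API (get inputs/outputs, execute
# asynchronously, wait for the job); the internals are purely for demo.
class StubGraphRunner:
    def __init__(self):
        self._inputs = [[0.0] * 4]   # one input tensor buffer
        self._outputs = [[0.0] * 4]  # one output tensor buffer

    def get_inputs(self):
        return self._inputs

    def get_outputs(self):
        return self._outputs

    def execute_async(self, inputs, outputs):
        # A real runner would dispatch DPU subgraphs and registered CPU
        # OPs; here we just copy input to output to exercise the flow.
        outputs[0][:] = inputs[0]
        return 0  # job id

    def wait(self, job_id):
        return 0  # status: ok


def run_once(runner, data):
    """Drive one inference through the runner's calling sequence."""
    inputs = runner.get_inputs()
    outputs = runner.get_outputs()
    inputs[0][:] = data                      # fill the input buffer
    job_id = runner.execute_async(inputs, outputs)
    runner.wait(job_id)                      # block until the job completes
    return outputs[0]
```

With a real runner, only the construction line changes; the fill/execute/wait loop stays the same in both C++ and Python.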
The following model structures support zero copy:

Type | Producer OP(s) | Consumer OP(s) | Zero Copy |
---|---|---|---|
a | Single DPU OP | Single CPU OP | Yes |
b | Single CPU OP | Single DPU OP | Yes |
c | Single CPU OP | Single CPU OP | Yes |
d | Single DPU OP | Multiple CPU OPs | Yes |
e | Multiple CPU OPs and multiple DPU OPs | Single CPU OP | Yes |
Note: Model structure types a-e are shown in the figure below.
Figure 2. Model Structure Types
Note: Whether zero copy applies to other model structures depends on the specific graph.
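What "address sharing between layers without data copying" means can be shown with two views over one buffer. This is only an analogy in plain Python, not the runtime's actual buffer management: the producer OP's output buffer and the consumer OP's input buffer are the same memory, so handing data across the boundary costs nothing.

```python
# Zero-copy analogy: one shared buffer, two views. The "DPU OP" writes
# through its output view; the "CPU OP" reads the same bytes through its
# input view. No data is copied between the two OPs.
shared = bytearray(8)               # buffer registered for both OPs
producer_out = memoryview(shared)   # output view of the upstream OP
consumer_in = memoryview(shared)    # input view of the downstream OP

producer_out[0] = 42                # upstream OP writes its result
# the downstream OP observes the value immediately through its own view,
# because both views reference the same underlying memory
```

By contrast, a non-zero-copy handoff would allocate a second buffer and copy the producer's output into it before the consumer runs.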
The following sections give one example for each framework:
- MNIST model based on TensorFlow 2
- PointPillars model based on PyTorch