Iterative Pruning

Vitis AI User Guide (UG1414), 3.5 English

The pruning algorithm reduces the number of model parameters while minimizing the accuracy loss. The process is iterative, as shown in the following figure: pruning causes an accuracy loss, and fine-tuning the remaining weights through training recovers it. A trained, unpruned model, referred to as the baseline model, serves as the input for the first iteration. This model is pruned and fine-tuned. The fine-tuned model from each iteration then becomes the new baseline and is pruned and fine-tuned again. The process repeats over multiple iterations until the desired sparsity is reached. This iterative approach is required because a model cannot be pruned at a high pruning ratio in a single pass while maintaining accuracy: when too many parameters are pruned at once, the accuracy loss becomes too severe to recover through fine-tuning.

Important: The parameters are progressively reduced at each iteration to improve accuracy during the fine-tuning stage.

By leveraging iterative pruning, you can achieve higher pruning rates without significant loss of model performance.
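The iterative loop described above can be sketched in plain Python with NumPy. This is an illustrative magnitude-pruning sketch, not the Vitis AI optimizer API: the helper names (`prune_smallest`, `iterative_prune`) and the pluggable `fine_tune` callback are assumptions for the example, and a real flow would train the model between pruning steps.

```python
import numpy as np

def prune_smallest(weights, ratio):
    """Zero out the smallest-magnitude fraction `ratio` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * ratio)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def iterative_prune(weights, target_sparsity, steps, fine_tune):
    """Reach `target_sparsity` over several iterations.

    Each iteration prunes a slightly larger fraction of the weights,
    then calls `fine_tune` (a training step in a real flow) so accuracy
    can recover before the next, more aggressive pruning pass.
    """
    model = weights.copy()
    for step in range(1, steps + 1):
        # Progressively raise the pruning ratio toward the target;
        # already-zeroed weights stay pruned because |0| is smallest.
        ratio = target_sparsity * step / steps
        model = prune_smallest(model, ratio)
        model = fine_tune(model)  # recover accuracy on surviving weights
    return model
```

Spreading the pruning over `steps` iterations keeps each accuracy drop small enough for fine-tuning to recover, which is the reason a single aggressive pass fails.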

Figure 1. Iterative Pruning

The four primary stages in iterative pruning are as follows:

1. Perform a sensitivity analysis on the model to determine the optimal pruning strategy.
2. Reduce the number of computations in the input model.
3. Retrain the pruned model to recover accuracy.
4. Generate a dense model with fewer weights.
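The first stage, sensitivity analysis, can be illustrated with a small sketch: prune each layer in isolation at several candidate ratios and record the resulting accuracy drop, so the least sensitive layers can be pruned most aggressively. The function names (`magnitude_prune`, `sensitivity_analysis`) and the toy `evaluate` metric are hypothetical; a real analysis would evaluate the model on a validation set.

```python
import numpy as np

def magnitude_prune(w, ratio):
    """Zero the smallest-magnitude fraction `ratio` of a weight tensor."""
    flat = np.abs(w).ravel()
    k = int(len(flat) * ratio)
    if k == 0:
        return w.copy()
    thr = np.partition(flat, k - 1)[k - 1]
    out = w.copy()
    out[np.abs(out) <= thr] = 0.0
    return out

def sensitivity_analysis(layers, ratios, evaluate):
    """For each layer, measure the metric drop at each pruning ratio.

    `layers` maps layer names to weight tensors; `evaluate` scores the
    whole model (validation accuracy in a real flow). Layers whose score
    drops least can tolerate the highest pruning ratios.
    """
    baseline = evaluate(layers)
    report = {}
    for name, w in layers.items():
        drops = {}
        for r in ratios:
            trial = dict(layers)        # prune one layer at a time
            trial[name] = magnitude_prune(w, r)
            drops[r] = baseline - evaluate(trial)
        report[name] = drops
    return report
```

The resulting report drives the per-layer pruning strategy in stage 2: layers with small drops receive high ratios, sensitive layers are pruned lightly or skipped.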