Neural networks are typically over-parameterized: they contain significant redundancy beyond what is needed to achieve a given accuracy. “Pruning” is the process of eliminating redundant weights while keeping the accuracy loss as low as possible.
The simplest form of pruning, called “fine-grained pruning,” removes individual weights and results in sparse weight matrices. The Vitis AI pruner instead employs “coarse-grained pruning,” which eliminates neurons that do not contribute significantly to the network’s accuracy. For convolutional layers, coarse-grained pruning removes an entire 3D kernel at a time, so it is also called channel pruning.
Pruning always reduces the accuracy of the original model. Retraining (fine-tuning) adjusts the remaining weights to recover as much of the lost accuracy as possible.
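A fine-tuning step can be sketched as a masked weight update: only the surviving weights are trained, while the mask keeps pruned positions at zero (a simplified illustration; the gradient, learning rate, and 50% ratio here are placeholder assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 4))
# 1 where the weight survives pruning, 0 where it was removed.
mask = np.abs(w) >= np.quantile(np.abs(w), 0.5)
w = w * mask                     # pruned weights start at zero

# Hypothetical gradient from one training batch.
grad = rng.normal(size=w.shape)

# Fine-tuning step: update the remaining weights, then re-apply the
# mask so pruned positions stay at zero and sparsity is preserved.
lr = 0.01
w = (w - lr * grad) * mask
```

In practice this loop runs over many batches, letting the remaining weights compensate for the removed ones.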