In coarse-grained pruning, also known as channel pruning, the objective is to prune entire channels rather than individual weights. The result is a computational graph in which one or more convolution kernels have been removed from a given layer. For instance, a convolution layer with 128 channels before pruning might need to compute only 57 channels afterward.
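A common way to decide which channels to remove is to rank each output channel's kernel by a saliency score, such as its L1 norm, and keep only the highest-scoring ones. The sketch below illustrates this on the 128-to-57 example from above; the weight tensor, the `prune_channels` helper, and the L1 criterion are illustrative assumptions, not a specific framework's API.

```python
import numpy as np

# Hypothetical conv weights: 128 output channels, 64 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
weights = rng.standard_normal((128, 64, 3, 3))

def prune_channels(w, keep):
    """Rank output channels by the L1 norm of their kernels; keep the top `keep`."""
    saliency = np.abs(w).sum(axis=(1, 2, 3))      # one score per output channel
    kept = np.sort(np.argsort(saliency)[-keep:])  # indices of surviving channels
    return w[kept], kept

pruned, kept_idx = prune_channels(weights, keep=57)
print(pruned.shape)  # (57, 64, 3, 3)
```

The pruned layer is simply a smaller dense convolution, which is why no special sparse-kernel support is needed at inference time.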
Channel pruning is very friendly to hardware acceleration and can be implemented on any inference architecture. However, the achievable overall pruning ratio is lower than with fine-grained methods, simply because an entire kernel must always be pruned at once.
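One detail worth noting is that removing output channels from one layer also removes the corresponding input channels of the next layer, so both layers remain dense. The sketch below illustrates this propagation under assumed weight shapes; the channel selection here is random purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
conv1 = rng.standard_normal((128, 64, 3, 3))   # layer k: 128 output channels
conv2 = rng.standard_normal((256, 128, 3, 3))  # layer k+1 consumes conv1's outputs

# Suppose 57 of conv1's output channels survive pruning (chosen randomly here).
kept = np.sort(rng.choice(128, size=57, replace=False))

# Dropping conv1's output channels removes the matching input channels of conv2,
# so both layers stay dense -- no sparse kernels for the hardware to handle.
conv1_pruned = conv1[kept]
conv2_pruned = conv2[:, kept]
print(conv1_pruned.shape, conv2_pruned.shape)  # (57, 64, 3, 3) (256, 57, 3, 3)
```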
Coarse-grained pruning always reduces the accuracy of the original model, so retraining (finetuning) is used to adjust the remaining weights and recover accuracy. The technique works well on large models built from conventional convolutions, such as ResNet and VGGNet. However, on depthwise-convolution models such as MobileNet-v2, the accuracy of the pruned model drops dramatically even at low pruning rates.