Iterative Pruning

The method includes two stages: model analysis and pruned model generation. After the model analysis is completed, the analysis result is saved in a file named .vai/xxx.sens. You can use this file to prune a model iteratively, increasing the pruning ratio gradually until the model reaches the target sparsity. Pruning gradually matters because setting too high a pruning ratio in a single step can leave the model unable to recover its performance during retraining.
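A gradual schedule toward the target can be as simple as evenly spaced ratios. The following is a minimal sketch. gradual_ratios is a hypothetical helper, not part of Vitis AI. It assumes that each removal ratio is measured against the original, unpruned model rather than against the previous step.

```python
def gradual_ratios(target, steps):
    """Return `steps` evenly spaced pruning ratios ending at `target`.

    Hypothetical helper: assumes each ratio is relative to the original model.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if steps < 1:
        raise ValueError("steps must be >= 1")
    # Round to hide floating-point noise such as 0.19999999999999998.
    return [round(target * (i + 1) / steps, 4) for i in range(steps)]
```

For a target of 0.6 in three steps, gradual_ratios(0.6, 3) yields [0.2, 0.4, 0.6]. You would prune and retrain at each value in turn.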
- Define an evaluation function. The function must take a model as its first argument and return a score (in the example below, the top-1 accuracy).
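```python
import torch

# AverageMeter and accuracy are assumed to be helper utilities defined in your
# training script (similar to those in the PyTorch ImageNet example).
def eval_fn(model, dataloader):
    top1 = AverageMeter('Acc@1', ':6.2f')
    model.eval()
    with torch.no_grad():
        for images, targets in dataloader:
            images = images.cuda()
            targets = targets.cuda()
            outputs = model(images)
            # Only top-1 accuracy is used as the score.
            acc1, _ = accuracy(outputs, targets, topk=(1, 5))
            top1.update(acc1, images.size(0))
    return top1.avg
```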
- Run model analysis and get a pruned model.
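```python
# Analyze the model once; the result is saved to .vai/xxx.sens.
runner.ana(eval_fn, args=(val_loader,))
# Generate a pruned model at a removal ratio of 0.2.
model = runner.prune(removal_ratio=0.2)
```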
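The AverageMeter and accuracy helpers used in eval_fn are not defined in the snippet. They appear to follow the helpers in the PyTorch ImageNet example. The following dependency-free sketch shows what they compute, under that assumption. It works on plain Python lists instead of tensors, so it illustrates the logic and is not a drop-in replacement.

```python
class AverageMeter:
    """Tracks a running, sample-weighted average (for example, top-1 accuracy)."""

    def __init__(self, name, fmt=":f"):
        self.name = name
        self.fmt = fmt
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        # `val` is a batch average; weight it by the batch size `n`.
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        return ("{name} {avg" + self.fmt + "}").format(name=self.name, avg=self.avg)


def accuracy(scores, targets, topk=(1,)):
    """Return top-k accuracy percentages for one batch.

    `scores` is a list of per-class score lists; `targets` holds class indices.
    """
    results = []
    for k in topk:
        correct = 0
        for row, target in zip(scores, targets):
            # Indices of the k highest-scoring classes for this sample.
            top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
            if target in top:
                correct += 1
        results.append(100.0 * correct / len(targets))
    return results
```

Because the meter weights each batch by its size, the final avg is the accuracy over all samples rather than an unweighted mean of the batch accuracies.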
Analysis needs to run only once for a given model. Because the analysis result determines a single pruned model for each pruning ratio, you can prune the model iteratively without re-running analysis. However, this unique pruned model is derived from the analysis result by an approximate algorithm, so the resulting subnetwork may not be optimal. The one-step pruning method can generate a better subnetwork.
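The iterative workflow is to analyze once and then prune and retrain at increasing ratios. It can be sketched with a stub runner. StubRunner, iterative_prune, and retrain_fn are hypothetical names. The stub only mimics the call pattern of runner.ana and runner.prune from the analysis step. It does not model how retrained weights carry over into the next pruning step.

```python
class StubRunner:
    """Hypothetical stand-in for a pruning runner; NOT the Vitis AI class."""

    def __init__(self, model):
        self.model = model
        self.analysis_runs = 0
        self.sens = None  # plays the role of the saved .vai/xxx.sens result

    def ana(self, eval_fn, args=()):
        # Expensive step: evaluate the model and cache the result.
        self.analysis_runs += 1
        self.sens = {"baseline_score": eval_fn(self.model, *args)}

    def prune(self, removal_ratio):
        if self.sens is None:
            raise RuntimeError("run ana() before prune()")
        # Cheap step: one deterministic pruned model per ratio.
        return {"base": self.model, "removal_ratio": removal_ratio}


def iterative_prune(runner, eval_fn, eval_args, retrain_fn, ratios):
    """Analyze once (if not done yet), then prune and retrain at each ratio."""
    if runner.sens is None:
        runner.ana(eval_fn, args=eval_args)
    models = []
    for ratio in ratios:
        pruned = runner.prune(removal_ratio=ratio)
        models.append(retrain_fn(pruned))
    return models
```

Calling iterative_prune a second time on the same runner reuses the cached analysis. The analysis counter therefore stays at one no matter how many ratios you try.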
One-step Pruning

This method also includes two stages: an adaptive-BN-based search for a pruning strategy, and pruned model generation. After the search, a file named .vai/xxx.search is generated that stores the search result (pruning strategies and their corresponding evaluation scores). You can then obtain the final pruned model in one step.
num_subnet specifies how many candidate subnetworks that satisfy the sparsity requirement are searched. The best subnetwork is selected from these candidates. A higher value makes the search take longer but increases the probability of finding a better subnetwork.
```python
# Adaptive-BN-based search for a pruning strategy.
# 'calibration_fn' is a function for calibrating the statistics of BN layers.
runner.search(gpus=['0'], calibration_fn=calibration_fn, calib_args=(val_loader,),
              eval_fn=eval_fn, eval_args=(val_loader,), num_subnet=1000,
              removal_ratio=0.7)
# Generate the pruned model from the search result.
model = runner.prune(removal_ratio=0.7, index=None)
```
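The candidate-selection idea behind num_subnet can be sketched as a plain random search. This is an illustrative simplification, not the Vitis AI search algorithm. Sparsity is counted here as the fraction of channels removed. The hypothetical score_fn stands in for running calibration_fn and then eval_fn on each candidate.

```python
import random


def random_search(layer_channels, removal_ratio, num_subnet, score_fn, seed=0):
    """Sample `num_subnet` random strategies and keep the best-scoring one.

    A strategy lists how many channels to remove from each layer. Every
    strategy removes the same total, so all candidates meet the sparsity target.
    """
    total = sum(layer_channels)
    to_remove = round(total * removal_ratio)
    if to_remove > total - len(layer_channels):
        raise ValueError("ratio too high: every layer must keep one channel")
    rng = random.Random(seed)
    best_strategy, best_score = None, float("-inf")
    for _ in range(num_subnet):
        removed = [0] * len(layer_channels)
        remaining = to_remove
        while remaining > 0:
            # Assign one removal to a random layer that can still lose a channel.
            i = rng.randrange(len(layer_channels))
            if removed[i] < layer_channels[i] - 1:
                removed[i] += 1
                remaining -= 1
        score = score_fn(removed)  # stands in for calibrate-then-evaluate
        if score > best_score:
            best_strategy, best_score = removed, score
    return best_strategy, best_score
```

With the same seed, the first candidate is identical regardless of num_subnet. A larger num_subnet can therefore only match or improve the best score, at the cost of more evaluations.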
eval_fn is the same as in the iterative pruning method. A calibration_fn that implements adaptive BN is shown in the following example code. Define your own function similarly.
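```python
import torch

def calibration_fn(model, dataloader, number_forward=100):
    # In train mode, BN layers update their running mean and variance on each
    # forward pass; no_grad() skips gradient computation because no weights are trained.
    model.train()
    with torch.no_grad():
        for index, (images, _) in enumerate(dataloader):
            # Run exactly number_forward calibration batches.
            if index >= number_forward:
                break
            images = images.cuda()
            model(images)
```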
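In PyTorch, BatchNorm layers in training mode update their running statistics on every forward pass. That is why calibration_fn calls model.train() inside torch.no_grad(). After pruning, the old statistics no longer match the smaller network's activations. The following dependency-free sketch shows the exponential moving average that PyTorch applies to running_mean (momentum defaults to 0.1). The function name is hypothetical. running_var, which is updated analogously, is omitted.

```python
def recalibrate_running_mean(batches, running_mean=0.0, momentum=0.1):
    """Re-estimate one BN channel's running mean from calibration batches.

    Uses the update PyTorch BatchNorm applies in train mode:
    running = (1 - momentum) * running + momentum * batch_mean
    """
    for batch in batches:
        batch_mean = sum(batch) / len(batch)
        running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    return running_mean
```

Suppose the stale mean is 0.0 and 100 calibration batches have a mean of 2.0. The remaining weight on the stale value is then 0.9**100, roughly 2.7e-5, so the estimate ends up very close to 2.0.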
The one-step pruning method has several advantages over the iterative approach.
- The generated pruned models are more accurate, because every searched candidate subnetwork that meets the requirements is evaluated and the best one is kept.
- The workflow is simpler because you can obtain the final pruned model in one step without iterations.
- Retraining a slim model is faster than retraining a sparse model.
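Consider why a slim model retrains faster. Assume that a sparse model keeps pruned channels as zeros inside full-size tensors and runs on dense kernels. Its per-layer compute does not shrink, whereas a slim model's layers physically lose those channels. The helper names below are hypothetical. The arithmetic counts multiply-accumulates (MACs) for a single convolution.

```python
def conv_macs(in_ch, out_ch, kernel, out_h, out_w):
    """Multiply-accumulates of a dense 2-D convolution (bias ignored)."""
    return in_ch * out_ch * kernel * kernel * out_h * out_w


def sparse_vs_slim(in_ch, out_ch, kernel, out_h, out_w, removal_ratio):
    """Compare per-layer MACs when a fraction of output channels is pruned."""
    kept = out_ch - round(out_ch * removal_ratio)
    dense = conv_macs(in_ch, out_ch, kernel, out_h, out_w)
    # Sparse: pruned filters are zeroed but tensor shapes are unchanged,
    # so dense kernels still execute every multiply-accumulate.
    sparse = dense
    # Slim: pruned filters are physically removed from the layer.
    slim = conv_macs(in_ch, kept, kernel, out_h, out_w)
    return {"dense": dense, "sparse": sparse, "slim": slim}
```

Removing half of a layer's output channels halves its MACs in the slim model. The next layer's input channels shrink too, so the savings compound across the network.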
One-step pruning has two disadvantages: the randomly generated pruning strategies make the results unstable, and the subnetwork search must be performed once for every pruning ratio.