Add batch normalization folding to QAT quantizer #3911
-    def __init__(self, model, config_list, optimizer=None):
+    def __init__(self, model, config_list, optimizer=None, model_inputs=None):
Is model_inputs the same concept as dummy_input in pruning speedup and quantization speedup? If so, I recommend using dummy_input instead of model_inputs for consistency.
Looks good. I only have one question right now. Are there any problems if we want to export the simulated model with the new BN folding feature to a backend execution engine such as TensorRT? For instance, during inference, conv+bn+relu will be fused into a single op by updating the conv's weight/bias parameters with the BN parameters. However, our conv's weights are currently already equal to the fused weights while the BN layer still exists. If the problem actually exists, maybe we can discuss an appropriate method to resolve it.
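For readers following the fusion concern above, here is a minimal sketch of the standard arithmetic for folding BN parameters into a preceding conv's weight and bias. It is written in plain Python on per-channel lists and is not the code from this PR; the function name `fold_bn_into_conv` and the flattened-kernel layout are assumptions made for illustration.

```python
import math

def fold_bn_into_conv(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BN statistics into conv parameters (illustrative helper, hypothetical name).

    weight: one flattened kernel (list of floats) per output channel
    bias, gamma, beta, running_mean, running_var: one float per output channel
    """
    folded_weight, folded_bias = [], []
    for c in range(len(weight)):
        # Per-channel scale applied by BN at inference time.
        scale = gamma[c] / math.sqrt(running_var[c] + eps)
        folded_weight.append([w * scale for w in weight[c]])
        folded_bias.append((bias[c] - running_mean[c]) * scale + beta[c])
    return folded_weight, folded_bias
```

With these folded parameters, the conv alone produces the same output as conv followed by BN. That is why applying a still-present BN layer on top of already-folded weights would apply the BN transform twice.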
You are right. I have added some code logic to restore folded weight/bias in |
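As a rough illustration of what restoring folded weight/bias can look like, the sketch below inverts the usual folding arithmetic so that an exported conv carries its original parameters and the backend can fuse the BN layer itself. This is an assumption about the general approach, not the logic added in this PR; `restore_conv_params` is a hypothetical name, and the inversion requires a nonzero `gamma` for every channel.

```python
import math

def restore_conv_params(folded_weight, folded_bias, gamma, beta,
                        running_mean, running_var, eps=1e-5):
    """Invert BN folding to recover the original conv weight and bias.

    Hypothetical helper for illustration; assumes gamma[c] != 0 for every channel.
    """
    weight, bias = [], []
    for c in range(len(folded_weight)):
        # Same per-channel scale that was used when folding.
        scale = gamma[c] / math.sqrt(running_var[c] + eps)
        weight.append([w / scale for w in folded_weight[c]])
        bias.append((folded_bias[c] - beta[c]) / scale + running_mean[c])
    return weight, bias
```

Dividing by the scale is only safe when no channel has a zero `gamma`. An alternative design is to keep a copy of the unfolded parameters and export those directly.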
Please update the BN folding content in the doc Supported Quantization Algorithms on NNI.
The BN folding content has been added.
-    def fold_bn(self, config, **kwargs):
-        # TODO simulate folded weight
-        pass
+    def fold_bn(self, *inputs, wrapper):
Is this function specific to QAT_Quantizer? Other quantizers may need a different fold_bn function.
This function should also work well for other quantizers (at least for the LSQ quantizer, I think :)). I will make it a common utility function in the PR that enables batch normalization folding for other quantizers.
This PR adds batch normalization folding to the QAT quantizer. The core ideas are described in #3890.