no code implementations • 29 Sep 2021 • Yicun Duan, Xiangjun Peng
However, those techniques do not keep the compressed representation during inference runtime, which incurs significant overheads in terms of both performance and space consumption.
Quantization