1 code implementation • ICLR 2019 • Artemy Kolchinsky, Brendan D. Tracey, Steven Van Kuyk
We demonstrate three caveats when using IB in any situation where $Y$ is a deterministic function of $X$: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of $\beta$; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal.