Generalisation and the Geometry of Class Separability
Recent results in deep learning show that model capacity alone does not adequately explain the generalisation performance observed in practice. We propose that the geometry of the data offers a better explanation of the generalisation achieved in deep learning. In particular, we show that in classification the separability of the data can explain how good generalisation is achieved in high dimensions. Further, we show that the layers within a CNN sequentially increase the linear separability of the data, and that the information these layers retain or discard can help explain why these models generalise.
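As a rough illustration of what "increasing linear separability" can mean, the sketch below scores a representation by the training accuracy of a plain perceptron fitted to it. This is an assumption made for illustration only, not the separability measure used in the paper. The names `linear_separability` and `toy_layer` are hypothetical, and `toy_layer` is a hand-written feature map standing in for a learned CNN layer.

```python
import numpy as np

def linear_separability(features, labels, epochs=1000):
    """Training accuracy of a perceptron on binary labels in {0, 1}.

    A score of 1.0 means a separating hyperplane was found; this is an
    illustrative proxy for separability, not the paper's definition.
    """
    X = np.hstack([features, np.ones((len(features), 1))])  # append a bias column
    y = np.where(labels == 1, 1.0, -1.0)                    # map {0, 1} to {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:   # misclassified or on the boundary
                w += yi * xi         # standard perceptron update
                errors += 1
        if errors == 0:              # a full clean pass: data is separated
            break
    return float(np.mean(y * (X @ w) > 0))

def toy_layer(x):
    """Hypothetical stand-in for a learned layer: appends the product x1 * x2."""
    return np.hstack([x, x[:, :1] * x[:, 1:2]])

# XOR-style data: not linearly separable in the raw input space.
inputs = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
labels = np.array([0, 0, 1, 1])

raw_score = linear_separability(inputs, labels)
mapped_score = linear_separability(toy_layer(inputs), labels)
print(f"raw: {raw_score:.2f}, after toy layer: {mapped_score:.2f}")
```

On this toy data the raw inputs cannot reach a score of 1.0, while the mapped features can. In a real experiment one would presumably apply the same kind of probe to the activations of successive CNN layers and to held-out data.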