Search Results for author: Charles H. Martin

Found 9 papers, 5 papers with code

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

1 code implementation • NeurIPS 2023 • Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang

In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training.

Scheduling
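TempBalance, the method this paper proposes, assigns each layer its own learning rate based on the power-law exponent alpha of that layer's empirical spectral density (ESD). The sketch below illustrates only the layer-wise idea: it uses a crude Hill-type tail estimate and a simple linear alpha-to-LR mapping, whereas the paper fits power laws more carefully (via weightwatcher) and studies several mappings; the helper names here are hypothetical.

```python
import numpy as np
import torch

def esd_alpha(W, tail_frac=0.25):
    """Hill-type MLE of the power-law exponent of a weight matrix's ESD tail."""
    eigs = np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape)
    tail = np.sort(eigs)[-max(int(len(eigs) * tail_frac), 2):]
    return 1.0 + len(tail) / (np.log(tail / tail[0]).sum() + 1e-12)

def temperature_balanced_groups(model, base_lr):
    """One optimizer param group per parameter; 2-D layers get base_lr rescaled
    by their alpha relative to the network mean, everything else keeps base_lr."""
    alphas = {name: esd_alpha(p.detach().cpu().numpy())
              for name, p in model.named_parameters() if p.ndim == 2}
    mean_alpha = float(np.mean(list(alphas.values())))
    return [{"params": [p], "lr": base_lr * alphas.get(name, mean_alpha) / mean_alpha}
            for name, p in model.named_parameters()]

# usage (any torch model):
# opt = torch.optim.SGD(temperature_balanced_groups(model, 0.1), momentum=0.9)
```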

Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

1 code implementation • 6 Feb 2022 • Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney

Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics.

Model Selection
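Because these metrics are functions of the trained weights alone, they can be computed straight from a pretrained checkpoint with no data in sight. A minimal sketch on GPT2 (one of the eight Huggingface families evaluated), assuming the transformers package is installed and substituting a crude Hill-type tail estimate for the paper's proper power-law fits:

```python
import numpy as np
from transformers import GPT2Model  # GPT2 is one of the evaluated HF families

model = GPT2Model.from_pretrained("gpt2")

alphas = []
for name, p in model.named_parameters():
    if p.ndim != 2 or min(p.shape) < 50:   # skip biases, LayerNorms, tiny matrices
        continue
    W = p.detach().cpu().numpy()
    eigs = np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape)
    tail = np.sort(eigs)[-max(len(eigs) // 4, 2):]
    alphas.append(1.0 + len(tail) / (np.log(tail / tail[0]).sum() + 1e-12))

print(f"mean ESD tail exponent over {len(alphas)} weight matrices: {np.mean(alphas):.2f}")
```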

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics

no code implementations • 1 Jun 2021 • Charles H. Martin, Michael W. Mahoney

Our results highlight the subtlety of comparing models when both architectures and hyperparameters are varied; the complementary role of implicit scale versus implicit shape parameters in understanding NN model quality; and the need to go beyond one-size-fits-all metrics based on upper bounds from generalization theory to describe the performance of NN models.

Learning Theory
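The scale-versus-shape distinction is easy to demonstrate numerically: rescaling a weight matrix moves every norm-based (scale) metric but leaves the ESD tail exponent (shape) untouched. A small self-contained illustration of the distinction, not the paper's experiment, using the same Hill-type estimate as the sketches above:

```python
import numpy as np

def scale_and_shape(W):
    """Log spectral norm (a scale metric) and tail exponent alpha (a shape metric)."""
    eigs = np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape)
    tail = np.sort(eigs)[-max(len(eigs) // 4, 2):]
    alpha = 1.0 + len(tail) / (np.log(tail / tail[0]).sum() + 1e-12)
    return np.log10(eigs.max()), alpha

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 200))
for c in (1.0, 10.0):
    log_norm, alpha = scale_and_shape(c * W)
    print(f"x{c:>4}: log spectral norm = {log_norm:5.2f}, alpha = {alpha:.2f}")
# The scale metric shifts by 2*log10(10) = 2; alpha is identical for both.
```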

Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data

1 code implementation • 17 Feb 2020 • Charles H. Martin, Tongsu (Serena) Peng, Michael W. Mahoney

We find that norm-based metrics correlate well with reported test accuracies for well-trained models, but that they often cannot distinguish well-trained from poorly trained models.
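A toy illustration of why norms alone can fail to separate the two regimes: two matrices with identical Frobenius norm can have very different spectral tails. Here heavy-tailed random entries stand in for the correlations of a well-trained layer; this is an idealization for intuition, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 500, 200
W_noise = rng.standard_normal((N, M))        # stand-in for a poorly trained layer
W_ht = rng.standard_t(df=2.5, size=(N, M))   # heavy-tailed stand-in for a well-trained layer
W_ht *= np.linalg.norm(W_noise) / np.linalg.norm(W_ht)  # match Frobenius norms exactly

for name, W in [("noise", W_noise), ("heavy-tailed", W_ht)]:
    eigs = np.linalg.svd(W, compute_uv=False) ** 2 / N
    tail = np.sort(eigs)[-max(len(eigs) // 4, 2):]
    alpha = 1.0 + len(tail) / (np.log(tail / tail[0]).sum() + 1e-12)
    print(f"{name:>12}: ||W||_F = {np.linalg.norm(W):6.1f}, alpha = {alpha:.2f}")
# Same norm, different tail exponent: the shape metric separates what the norm cannot.
```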

Traditional and Heavy Tailed Self Regularization in Neural Network Models

no code implementations • ICLR 2019 • Charles H. Martin, Michael W. Mahoney

Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.

Traditional and Heavy-Tailed Self Regularization in Neural Network Models

2 code implementations • 24 Jan 2019 • Charles H. Martin, Michael W. Mahoney

Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

no code implementations • 24 Jan 2019 • Charles H. Martin, Michael W. Mahoney

In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this question.
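The headline HT-SR metric in this line of work is the weighted alpha: each layer's tail exponent alpha_l weighted by log10 of that layer's largest ESD eigenvalue, averaged over layers. A minimal sketch, with the usual caveat that the authors' weightwatcher tool performs the power-law fitting properly while this uses a crude Hill-type estimate:

```python
import numpy as np

def weighted_alpha(weight_matrices):
    """HT-SR weighted alpha: mean over layers of alpha_l * log10(lambda_max_l)."""
    vals = []
    for W in weight_matrices:
        eigs = np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape)
        tail = np.sort(eigs)[-max(len(eigs) // 4, 2):]
        alpha = 1.0 + len(tail) / (np.log(tail / tail[0]).sum() + 1e-12)
        vals.append(alpha * np.log10(eigs.max()))
    return float(np.mean(vals))

# usage on any list of 2-D layer weights, e.g.:
# score = weighted_alpha([p.detach().cpu().numpy() for p in model.parameters() if p.ndim == 2])
```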

Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning

3 code implementations • 2 Oct 2018 • Charles H. Martin, Michael W. Mahoney

Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.
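The core RMT computation shared by the three preceding entries: form the correlation matrix X = W^T W / N for each layer, compute its eigenvalue spectrum (the ESD), and compare it against the Marchenko-Pastur prediction for an i.i.d. random matrix; eigenvalue mass escaping the MP bulk is the signature of learned correlations and, eventually, heavy-tailed self-regularization. A self-contained sketch:

```python
import numpy as np

def esd(W):
    """Eigenvalues of the correlation matrix X = W^T W / N (the empirical spectral density)."""
    return np.linalg.svd(W, compute_uv=False) ** 2 / max(W.shape)

def mp_bulk_edges(Q, sigma=1.0):
    """Marchenko-Pastur bulk edges for aspect ratio Q = N/M >= 1, element variance sigma^2."""
    return sigma**2 * (1 - 1/np.sqrt(Q))**2, sigma**2 * (1 + 1/np.sqrt(Q))**2

rng = np.random.default_rng(0)
N, M = 1000, 250
W = rng.standard_normal((N, M))        # an "untrained" i.i.d. random layer
lam = esd(W)
lo, hi = mp_bulk_edges(Q=N / M)
outside = np.mean((lam < lo) | (lam > hi))
print(f"MP bulk = [{lo:.2f}, {hi:.2f}]; fraction of ESD outside: {outside:.3f}")
# For a trained layer, a substantial tail of eigenvalues beyond `hi` is the
# signature of (heavy-tailed) self-regularization.
```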

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

no code implementations • ICLR 2018 • Charles H. Martin, Michael W. Mahoney

Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently observed empirical results, including the inability of deep neural networks to avoid overfitting training data, discontinuous learning, and sharp transitions in the generalization properties of learning algorithms.
