1 code implementation • NeurIPS 2023 • Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang
In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training.
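The excerpt only states the temperature analogy; as a loose, hypothetical illustration of acting on it (not the method proposed in the paper), one could assign each layer its own learning rate scaled by a per-layer weight statistic. The toy model, the choice of spectral norm, and the inverse scaling rule below are all assumptions.

```python
# Hypothetical sketch: per-layer ("temperature-like") learning rates in PyTorch.
# The scaling rule below (inverse spectral norm) is an illustrative assumption,
# not the specific scheme proposed in the paper.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

base_lr = 0.1
param_groups = []
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        # Use the largest singular value of the layer's weight as a crude
        # per-layer statistic; "hotter" (larger-norm) layers get a smaller rate.
        sigma_max = torch.linalg.matrix_norm(module.weight, ord=2).item()
        param_groups.append({
            "params": module.parameters(),
            "lr": base_lr / max(sigma_max, 1e-6),
        })

optimizer = torch.optim.SGD(param_groups, lr=base_lr, momentum=0.9)
```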
1 code implementation • 6 Feb 2022 • Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney
Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics.
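As a minimal sketch of the data-free setting described above (and assuming the `transformers` library is available), the snippet below loads one of the Huggingface model families mentioned and collects its dense weight matrices, which are the raw inputs such generalization metrics operate on; the 28 metrics themselves are not reproduced here.

```python
# Minimal sketch: collect the weight matrices of a pretrained Huggingface model,
# the raw input to data-free generalization metrics. Illustrative only.
from transformers import AutoModel

model = AutoModel.from_pretrained("gpt2")  # one of the model families mentioned above

weight_matrices = {
    name: p.detach()
    for name, p in model.named_parameters()
    if p.ndim == 2  # keep dense weight matrices, skip biases and LayerNorm vectors
}

print(f"{len(weight_matrices)} weight matrices found")
for name, w in list(weight_matrices.items())[:3]:
    print(name, tuple(w.shape))
```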
no code implementations • 1 Jun 2021 • Charles H. Martin, Michael W. Mahoney
Our results highlight the subtlety of comparing models when both architectures and hyperparameters are varied; the complementary role of implicit scale versus implicit shape parameters in understanding NN model quality; and the need to go beyond one-size-fits-all metrics based on upper bounds from generalization theory to describe the performance of NN models.
1 code implementation • 17 Feb 2020 • Charles H. Martin, Tongsu Peng, Michael W. Mahoney
We find that norm-based metrics correlate well with reported test accuracies for well-trained models, but they often cannot distinguish well-trained from poorly-trained models.
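As a rough illustration of one such norm-based, data-free metric, the sketch below averages the log Frobenius norm over the dense weight matrices of a pretrained torchvision model; the specific model and the choice of norm are assumptions, not the exact metrics studied in the paper.

```python
# Sketch of a simple norm-based, data-free metric: the average log Frobenius
# norm over a model's dense weight matrices. Assumes torchvision is available;
# the choice of model and norm is illustrative only.
import math
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")

log_norms = []
for name, p in model.named_parameters():
    if p.ndim >= 2:                      # conv kernels and linear weights
        w = p.detach().flatten(1)        # reshape to a 2-D matrix
        log_norms.append(math.log(torch.linalg.matrix_norm(w, ord="fro").item()))

print("average log Frobenius norm:", sum(log_norms) / len(log_norms))
```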
no code implementations • ICLR 2019 • Charles H. Martin, Michael W. Mahoney
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.
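The basic object in this RMT analysis is the empirical spectral density (ESD) of each layer, i.e. the eigenvalues of the correlation matrix X = W^T W / N. A minimal sketch for one layer of a pretrained AlexNet is below; the choice of layer is an assumption.

```python
# Sketch of the basic RMT quantity used in this line of work: the empirical
# spectral density (ESD) of a layer, i.e. the eigenvalues of X = W^T W / N.
# The choice of model and layer is an illustrative assumption.
import numpy as np
from torchvision import models

model = models.alexnet(weights="IMAGENET1K_V1")   # a production-quality pretrained model
W = model.classifier[6].weight.detach().numpy()   # final fully-connected layer

N = max(W.shape)
eigs = np.linalg.svd(W, compute_uv=False) ** 2 / N   # eigenvalues of W^T W / N

print("ESD size:", eigs.size)
print("largest eigenvalue:", eigs.max(), " smallest:", eigs.min())
```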
2 code implementations • 24 Jan 2019 • Charles H. Martin, Michael W. Mahoney
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.
no code implementations • 24 Jan 2019 • Charles H. Martin, Michael W. Mahoney
In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this question.
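HT-SR theory summarizes each layer by the power-law exponent α of the heavy tail of its ESD. The sketch below estimates such an exponent with a simple Hill estimator on the largest eigenvalues; the estimator, the tail fraction, and the random stand-in matrix are illustrative assumptions (the published analyses use a more careful power-law fit).

```python
# Sketch of the HT-SR-style quantity: a power-law tail exponent (alpha) of a
# layer's ESD. A simple Hill estimator over the top eigenvalues is used here
# purely for illustration; the papers use a more careful power-law fit.
import numpy as np

def esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = W^T W / N for a weight matrix W, sorted ascending."""
    N = max(W.shape)
    return np.sort(np.linalg.svd(W, compute_uv=False) ** 2 / N)

def hill_alpha(eigs: np.ndarray, tail_frac: float = 0.1) -> float:
    """Crude tail-exponent estimate from the top `tail_frac` of the eigenvalues."""
    k = max(int(tail_frac * eigs.size), 2)
    tail = eigs[-k:]                      # k largest eigenvalues
    return 1.0 + k / np.sum(np.log(tail / tail[0]))

rng = np.random.default_rng(0)
W = rng.standard_normal((1000, 500))      # stand-in for a trained layer's weights
print("alpha ~", hill_alpha(esd(W)))
```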
3 code implementations • 2 Oct 2018 • Charles H. Martin, Michael W. Mahoney
Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.
no code implementations • ICLR 2018 • Charles H. Martin, Michael W. Mahoney
Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently observed empirical results, including the inability of deep neural networks to avoid overfitting training data, discontinuous learning, and sharp transitions in the generalization properties of learning algorithms.