no code implementations • ACL (RepL4NLP) 2021 • Kevin Huang, Peng Qi, Guangtao Wang, Tengyu Ma, Jing Huang
In this paper, we propose a novel framework E2GRE (Entity and Evidence Guided Relation Extraction) that jointly extracts relations and the underlying evidence sentences by using a large pretrained language model (LM) as the input encoder.
no code implementations • 30 Mar 2024 • Neil Band, Xuechen Li, Tengyu Ma, Tatsunori Hashimoto
Our results demonstrate that long-form generations may be calibrated end-to-end by constructing an objective in the space of the predictions that users make in downstream decision-making.
no code implementations • 20 Feb 2024 • Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT.
no code implementations • 7 Sep 2023 • Xiaohan Cui, Long Ma, Tengyu Ma, JinYuan Liu, Xin Fan, Risheng Liu
In this work, we aim to unlock the potential of combining an enhancer with a detector.
no code implementations • 7 Jul 2023 • Arvind Mahankali, Tatsunori B. Hashimoto, Tengyu Ma
Then, we find that changing the distribution of the covariates and weight vector to a non-isotropic Gaussian distribution has a strong impact on the learned algorithm: the global minimizer of the pre-training loss now implements a single step of $\textit{pre-conditioned}$ GD.
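The learned algorithm can be made concrete with a small sketch (an illustration, not the paper's code): one step of preconditioned gradient descent from the zero initialization on the in-context least-squares loss. The preconditioner `P` here is a placeholder for the covariance-dependent matrix the paper derives.

```python
def one_step_preconditioned_gd(X, y, P):
    """One step of preconditioned GD from w = 0 on 0.5 * ||X w - y||^2.

    The gradient at w = 0 is -X^T y, so the update is w1 = P @ (X^T y);
    the step size is absorbed into the preconditioner P.
    """
    n, d = len(X), len(X[0])
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(d)]
    return [sum(P[j][k] * xty[k] for k in range(d)) for j in range(d)]

X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 3.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
w1 = one_step_preconditioned_gd(X, y, identity)
```

With `P` equal to the identity this reduces to plain (unconditioned) gradient descent, which corresponds to the isotropic case; a non-isotropic covariate distribution changes the optimal `P`.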
no code implementations • 22 Jun 2023 • Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions.
1 code implementation • 26 May 2023 • Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou
Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks.
3 code implementations • 23 May 2023 • Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training.
2 code implementations • NeurIPS 2023 • Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.
no code implementations • 15 May 2023 • Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le
We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").
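The data transformation behind symbol tuning can be sketched in a few lines (an illustration, not the paper's code; the symbol pool and random pairing scheme here are assumptions):

```python
import random

def symbol_tune_examples(examples, symbol_pool, seed=0):
    """Remap each natural-language label to an arbitrary symbol.

    `examples` is a list of (input_text, label) pairs; each distinct label
    is assigned a distinct random symbol drawn from `symbol_pool`, so the
    model must infer the input-label mapping from context rather than rely
    on the label's meaning.
    """
    rng = random.Random(seed)
    labels = sorted({label for _, label in examples})
    symbols = rng.sample(symbol_pool, len(labels))
    mapping = dict(zip(labels, symbols))
    return [(text, mapping[label]) for text, label in examples], mapping

examples = [("great movie", "positive"), ("dull plot", "negative"), ("loved it", "positive")]
remapped, mapping = symbol_tune_examples(examples, ["foo", "bar", "baz"])
```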
no code implementations • 29 Apr 2023 • Kefan Dong, Tengyu Ma
Our key technical novelty is to prove that the degree-$k$ spherical harmonics components of a function drawn from a Gaussian random field cannot be spiky, in that their $L_\infty$/$L_2$ ratios are upper bounded by $O(d \sqrt{\ln k})$ with high probability.
no code implementations • 7 Mar 2023 • Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma
We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.
1 code implementation • NeurIPS 2023 • Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang
To measure whether hashed n-gram features preserve the aspects of the data that are relevant to the target, we define KL reduction, a data metric that measures the proximity between the selected pretraining data and the target on some feature space.
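The hashed n-gram featurization and the KL-reduction comparison can be sketched as follows (a minimal illustration, assuming bigram features, add-one smoothing, and Python's built-in `hash` as the bucketing function; none of these specifics come from the paper):

```python
import math
from collections import Counter

def hashed_ngram_dist(texts, num_buckets=64, n=2):
    """Estimate a distribution over hashed n-gram buckets."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            bucket = hash(" ".join(tokens[i:i + n])) % num_buckets
            counts[bucket] += 1
    total = sum(counts.values())
    # Add-one smoothing keeps the KL divergence below finite.
    return [(counts.get(b, 0) + 1) / (total + num_buckets) for b in range(num_buckets)]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kl_reduction(target, selected, random_pool, num_buckets=64):
    """How much closer the selected pretraining data is to the target
    than randomly selected data is, measured on hashed n-gram features."""
    pt = hashed_ngram_dist(target, num_buckets)
    ps = hashed_ngram_dist(selected, num_buckets)
    pr = hashed_ngram_dist(random_pool, num_buckets)
    return kl(pt, pr) - kl(pt, ps)
```

A positive KL reduction means the selected data is closer to the target distribution (on this feature space) than random data would be.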
no code implementations • 28 Nov 2022 • Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.
no code implementations • 27 Nov 2022 • Jeff Z. HaoChen, Tengyu Ma
Understanding self-supervised learning is important but challenging.
no code implementations • 21 Nov 2022 • Kefan Dong, Tengyu Ma
The question is very challenging because even two-layer neural networks cannot be guaranteed to extrapolate outside the support of the training distribution without further assumptions on the domain shift.
no code implementations • 10 Nov 2022 • Kaiyue Wen, Tengyu Ma, Zhiyuan Li
SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees.
no code implementations • 25 Oct 2022 • Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
Toward understanding this implicit bias, we prove that SGD with standard mini-batch noise implicitly prefers flatter minima in language models, and empirically observe a strong correlation between flatness and downstream performance among models with the same minimal pre-training loss.
no code implementations • 18 Jul 2022 • Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan
We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM.
no code implementations • 16 Jun 2022 • Margalit Glasgow, Colin Wei, Mary Wootters, Tengyu Ma
Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails.
no code implementations • 6 Jun 2022 • Kefan Dong, Tengyu Ma
Past research on interactive decision making problems (bandits, reinforcement learning, etc.)
no code implementations • 22 May 2022 • Haoyuan Cai, Tengyu Ma, Simon Du
In particular, the lower bound implies that our proposed algorithm, Value-Aware Autonomous Exploration, is nearly minimax-optimal when the number of $L$-controllable states grows polynomially with respect to $L$.
1 code implementation • CVPR 2022 • Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, Zhongxuan Luo
Existing low-light image enhancement techniques often struggle to balance visual quality with computational efficiency, and they commonly fail in unknown complex scenarios.
no code implementations • 6 Apr 2022 • Jeff Z. HaoChen, Colin Wei, Ananya Kumar, Tengyu Ma
In particular, a linear classifier trained to separate the representations on the source domain can also predict classes on the target domain accurately, even though the representations of the two domains are far from each other.
no code implementations • 1 Apr 2022 • Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang
We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain.
3 code implementations • 21 Feb 2022 • Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang
However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large.
1 code implementation • NeurIPS 2021 • Garrett Thomas, Yuping Luo, Tengyu Ma
Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-world problems, where suboptimal behaviors may lead to actual negative consequences.
1 code implementation • 9 Dec 2021 • Risheng Liu, Long Ma, Tengyu Ma, Xin Fan, Zhongxuan Luo
To partially address the above issues, we establish Retinex-inspired Unrolling with Architecture Search (RUAS), a general learning framework that not only addresses the low-light enhancement task but also has the flexibility to handle other, more challenging downstream vision applications.
no code implementations • ICLR 2022 • Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine
In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations.
1 code implementation • 22 Nov 2021 • Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu
Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.
1 code implementation • 5 Nov 2021 • Margalit Glasgow, Honglin Yuan, Tengyu Ma
In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows the existing FedAvg upper bound analysis is not improvable.
1 code implementation • ICLR 2022 • Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt.
1 code implementation • ICLR 2022 • Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma
Third, inspired by the theoretical insights, we devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets with several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.
Ranked #9 on Long-tail Learning on CIFAR-10-LT (ρ=100)
no code implementations • 29 Sep 2021 • Ananya Kumar, Aditi Raghunathan, Tengyu Ma, Percy Liang
We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy.
no code implementations • ICLR 2022 • Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, Percy Liang
It is well known that fine-tuning leads to better accuracy in-distribution (ID).
no code implementations • 29 Sep 2021 • Colin Wei, Yining Chen, Tengyu Ma
A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • NeurIPS 2021 • Yuping Luo, Tengyu Ma
This paper explores the possibility of safe RL algorithms with zero training-time safety violations in the challenging setting where we are only given a safe but trivial-reward initial policy without any prior knowledge of the dynamics model and additional offline data.
no code implementations • 28 Jul 2021 • Colin Wei, Yining Chen, Tengyu Ma
A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.
no code implementations • NeurIPS 2021 • Shengjia Zhao, Michael P. Kim, Roshni Sahoo, Tengyu Ma, Stefano Ermon
In this work, we introduce a new notion -- decision calibration -- that requires the predicted distribution and true distribution to be "indistinguishable" to a set of downstream decision-makers.
no code implementations • 18 Jun 2021 • Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
1 code implementation • NeurIPS 2021 • Colin Wei, Sang Michael Xie, Tengyu Ma
The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language.
no code implementations • NeurIPS 2021 • Alex Damian, Tengyu Ma, Jason D. Lee
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.
no code implementations • 9 Jun 2021 • Zichuan Lin, Jing Huang, BoWen Zhou, Xiaodong He, Tengyu Ma
Recent work (Takanobu et al., 2020) proposed system-wise evaluation of dialog systems and found that improvements to individual components (e.g., NLU, policy) in prior work may not necessarily benefit pipeline systems in system-wise evaluation.
1 code implementation • NeurIPS 2021 • Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma
Despite the empirical successes, theoretical foundations are limited -- prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image).
no code implementations • NAACL 2021 • Lingxiao Wang, Kevin Huang, Tengyu Ma, Quanquan Gu, Jing Huang
The core of our algorithm is to introduce a novel variance reduction term to the gradient estimation when performing the task adaptation.
no code implementations • 24 Mar 2021 • Tengyu Ma
Non-convex optimization is ubiquitous in modern machine learning.
no code implementations • 9 Feb 2021 • Haike Xu, Tengyu Ma, Simon S. Du
We further show that for general MDPs, AMB suffers an additional $\frac{|Z_{mul}|}{\Delta_{min}}$ regret, where $Z_{mul}$ is the set of state-action pairs $(s, a)$ for which $a$ is a non-unique optimal action for $s$.
no code implementations • NeurIPS 2021 • Kefan Dong, Jiaqi Yang, Tengyu Ma
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations.
no code implementations • 1 Jan 2021 • Yu Bai, Tengyu Ma, Huan Wang, Caiming Xiong
In this paper, we propose Neural Rank Preserving Transforms (NRPT), a new post-calibration method that adjusts the output probabilities of a trained classifier using a calibrator of higher capacity, while maintaining its prediction accuracy.
1 code implementation • ICLR 2021 • Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang
To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).
no code implementations • 3 Nov 2020 • Hong Liu, Jeff Z. HaoChen, Colin Wei, Tengyu Ma
Recent works found that fine-tuning and joint training -- two popular approaches for transfer learning -- do not always improve accuracy on downstream tasks.
no code implementations • NeurIPS 2020 • Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge
We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.
1 code implementation • 21 Oct 2020 • Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang
In this paper, we propose two novel techniques, adaptive thresholding and localized context pooling, to solve the multi-label and multi-entity problems.
Ranked #6 on Relation Extraction on ReDocRED
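The adaptive-thresholding idea at inference time can be sketched simply (a hedged illustration only: in the actual model the threshold class is learned jointly with the relation classes via a dedicated loss, and the index of the threshold class here is an assumption):

```python
def adaptive_threshold_predict(logits, th_index=0):
    """Adaptive-thresholding inference for multi-label relation extraction.

    A dedicated threshold class TH is scored alongside the relation classes;
    an entity pair is assigned every relation whose logit exceeds the TH
    logit, so the decision threshold adapts per instance instead of being a
    fixed global cutoff.
    """
    th_logit = logits[th_index]
    return [i for i, score in enumerate(logits) if i != th_index and score > th_logit]
```

For example, with logits `[0.5, 1.2, 0.1, 0.9]` and the threshold class at index 0, classes 1 and 3 are predicted; if no relation logit beats the threshold logit, the pair is labeled as having no relation.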
no code implementations • ICLR 2021 • Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma
Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks.
no code implementations • 28 Sep 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang
We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e.g., a pseudocode-to-code translation task where the code must compile.
no code implementations • 27 Aug 2020 • Kevin Huang, Guangtao Wang, Tengyu Ma, Jing Huang
Document-level relation extraction is a challenging task which requires reasoning over multiple sentences in order to predict relations in a document.
Ranked #14 on Relation Extraction on DocRED
no code implementations • 9 Jul 2020 • Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang
We consider the dynamic of gradient descent for learning a two-layer neural network.
1 code implementation • ICLR 2021 • Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma
Real-world large-scale datasets are heteroskedastic and imbalanced -- labels have varying levels of uncertainty and label distributions are long-tailed.
Ranked #11 on Image Classification on WebVision-1000
2 code implementations • 29 Jun 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang
Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).
no code implementations • 25 Jun 2020 • Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang
We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.
no code implementations • ICML 2020 • Shengjia Zhao, Tengyu Ma, Stefano Ermon
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized, i.e., outputting randomized credible intervals.
no code implementations • NeurIPS 2020 • Yining Chen, Colin Wei, Ananya Kumar, Tengyu Ma
In unsupervised domain adaptation, existing theory focuses on situations where the source and target domains are close.
1 code implementation • NeurIPS 2020 • Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma
When the test task distribution is different from the training task distribution, the performance may degrade significantly.
1 code implementation • NeurIPS 2020 • Honglin Yuan, Tengyu Ma
We propose Federated Accelerated Stochastic Gradient Descent (FedAc), a principled acceleration of Federated Averaging (FedAvg, also known as Local SGD) for distributed optimization.
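The FedAvg / Local SGD baseline that FedAc accelerates can be sketched on a toy problem (an illustration under simplifying assumptions: scalar parameters, deterministic gradients, and each worker holding a quadratic loss 0.5*(w - c_i)^2; FedAc itself adds an acceleration scheme not shown here):

```python
def local_sgd(workers, w0, lr=0.1, local_steps=5, rounds=20):
    """FedAvg / Local SGD: each worker runs `local_steps` gradient steps on
    its own local loss, then the server averages the worker iterates.

    Each entry of `workers` is the optimum c_i of worker i's local loss
    0.5*(w - c_i)^2; the global minimizer is the mean of the c_i.
    """
    w = w0
    for _ in range(rounds):
        updates = []
        for c in workers:
            wi = w
            for _ in range(local_steps):
                wi -= lr * (wi - c)  # gradient of 0.5*(wi - c)^2
            updates.append(wi)
        w = sum(updates) / len(updates)
    return w

w = local_sgd([1.0, 3.0], w0=0.0)  # converges toward the mean, 2.0
```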
1 code implementation • 15 Jun 2020 • Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground-truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.
no code implementations • ICML Workshop LifelongML 2020 • Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang
We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.
6 code implementations • NeurIPS 2020 • Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
We also characterize the trade-off between the gain and risk of leaving the support of the batch data.
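The core mechanism for managing this trade-off in MOPO is to penalize the model-predicted reward by an uncertainty estimate, discouraging the policy from leaving the support of the batch data. A minimal sketch (the ensemble-disagreement proxy below is one illustrative choice of uncertainty estimator, not necessarily the paper's):

```python
def penalized_reward(reward, uncertainty, lam=1.0):
    """Uncertainty-penalized reward: the model-based reward is lowered by
    lam times an uncertainty estimate u(s, a), so exploiting the model far
    from the data support becomes unattractive to the policy."""
    return reward - lam * uncertainty

def ensemble_uncertainty(predictions):
    """A simple uncertainty proxy: disagreement across an ensemble of
    dynamics models, measured as the max absolute deviation from the
    ensemble mean prediction."""
    mean = sum(predictions) / len(predictions)
    return max(abs(p - mean) for p in predictions)
```

When the ensemble agrees (uncertainty near zero) the reward is left almost untouched; large disagreement, typical outside the batch data, yields a large penalty.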
no code implementations • ICLR 2020 • Colin Wei, Tengyu Ma
For linear classifiers, the relationship between (normalized) output margin and generalization is captured in a clear and simple bound – a large output margin implies good generalization.
no code implementations • ECCV 2020 • Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma
We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set.
Ranked #35 on Image Classification on mini WebVision 1.0
no code implementations • ICLR 2021 • Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma
Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size.
1 code implementation • ICML 2020 • Colin Wei, Sham Kakade, Tengyu Ma
This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent.
2 code implementations • ICML 2020 • Ananya Kumar, Tengyu Ma, Percy Liang
Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces.
no code implementations • 8 Feb 2020 • Tengyu Ma, Joel Michelson, James Ainooson, Deepayan Sanyal, Xiaohan Wang, Maithilee Kunda
For the problem of 3D object recognition, researchers using deep learning methods have developed several very different input representations, including "multi-view" snapshots taken from discrete viewpoints around an object, as well as "spherical" representations consisting of a dense map of essentially ray-traced samples of the object from all directions.
1 code implementation • ICML 2020 • Kefan Dong, Yuping Luo, Tengyu Ma
We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.
1 code implementation • 9 Oct 2019 • Colin Wei, Tengyu Ma
Unfortunately, for deep models, this relationship is less clear: existing analyses of the output margin give complicated bounds which sometimes depend exponentially on depth.
1 code implementation • 25 Sep 2019 • Kefan Dong, Yuping Luo, Tengyu Ma
We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.
3 code implementations • NeurIPS 2019 • Ananya Kumar, Percy Liang, Tengyu Ma
In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators.
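For reference, the "commonly used plugin estimator" of ECE that this work improves on is the standard binned estimator; a sketch of that baseline (the equal-width binning and bin count are conventional choices, not specifics from the paper):

```python
def plugin_ece(confidences, corrects, num_bins=10):
    """Plugin (binned) estimator of expected calibration error: the average
    of |accuracy - confidence| over equal-width confidence bins, weighted by
    bin size. This is the baseline estimator; the paper argues it can be
    inaccurate and proposes a better one."""
    bins = [[] for _ in range(num_bins)]
    for conf, correct in zip(confidences, corrects):
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, correct))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece
```

A perfectly calibrated predictor (confidence matches empirical accuracy in every bin) gets ECE 0; a predictor that says 0.95 but is right every time gets ECE 0.05.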
1 code implementation • ICLR 2020 • Yuping Luo, Huazhe Xu, Tengyu Ma
Imitation learning, followed by reinforcement learning algorithms, is a promising paradigm to solve complex control tasks sample-efficiently.
no code implementations • 11 Jul 2019 • Nicholas C. Landolfi, Garrett Thomas, Tengyu Ma
We then adapt the dynamical model with samples from this policy in the real environment.
2 code implementations • NeurIPS 2019 • Yuanzhi Li, Colin Wei, Tengyu Ma
This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with small initial learning rate, but ignored by the model with large learning rate until after annealing.
7 code implementations • NeurIPS 2019 • Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma
Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes.
Ranked #4 on Long-tail learning with class descriptors on CUB-LT
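The label-distribution-aware margin idea above assigns larger margins to rarer classes, with the margin for class $j$ proportional to $n_j^{-1/4}$. A small sketch (the `max_margin` scaling constant is a tunable assumption, as is the exact way the margin enters the loss):

```python
def ldam_margins(class_counts, max_margin=0.5):
    """Class-dependent margins: margin_j is proportional to n_j^(-1/4),
    rescaled so the rarest class gets `max_margin`. Rare classes thus
    receive larger margins, improving generalization on them."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * m for m in raw]

def margin_adjusted_logits(logits, label, margins):
    """Subtract the true class's margin from its logit before softmax
    cross-entropy, which enforces the class-dependent margin in training."""
    out = list(logits)
    out[label] -= margins[label]
    return out
```

For example, with counts `[10000, 16]` the raw margins are proportional to `0.1` and `0.5`, so the rare class gets a margin five times larger than the frequent one.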
no code implementations • 12 May 2019 • Shi Dong, Tengyu Ma, Benjamin Van Roy
Specifically, we establish that, when the set of feasible actions is identical to the set of possible coefficient vectors, the Bayesian regret of Thompson sampling is $\tilde{O}(d\sqrt{T})$.
1 code implementation • NeurIPS 2019 • Colin Wei, Tengyu Ma
For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers.
no code implementations • ICLR 2019 • Xingyu Zhou, Tengyu Ma, Huahong Zhang
This paper, in contrast, discusses the origin of adversarial examples from a more underlying knowledge representation point of view.
no code implementations • ICLR 2019 • Colin Wei, Jason Lee, Qiang Liu, Tengyu Ma
We establish: 1) for multi-layer feedforward ReLU networks, the global minimizer of a weakly-regularized cross-entropy loss has the maximum normalized margin among all networks, 2) as a result, increasing the over-parametrization improves the normalized margin and generalization error bounds for deep networks.
no code implementations • ICLR 2019 • Jiaming Song, Tengyu Ma, Michael Auli, Yann Dauphin
Memorization in over-parameterized neural networks can severely hurt generalization in the presence of mislabeled examples.
7 code implementations • ICLR 2019 • Hongyi Zhang, Yann N. Dauphin, Tengyu Ma
Normalization layers are a staple in state-of-the-art deep neural network architectures.
Ranked #9 on Image Classification on SVHN
no code implementations • NeurIPS 2019 • Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma
We prove that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.
2 code implementations • ICLR 2019 • Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma
Model-based reinforcement learning (RL) is considered to be a promising approach to reduce the sample complexity that hinders model-free RL.
no code implementations • ICLR 2019 • Yu Bai, Tengyu Ma, Andrej Risteski
Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence or the Wasserstein distance, indicating that the lack of diversity in GANs may be caused by the sub-optimality in optimization instead of statistical inefficiency.
no code implementations • 15 Jun 2018 • Xiaohan Wang, Tengyu Ma, James Ainooson, Seunghwan Cha, Xiaotian Wang, Azhar Molla, Maithilee Kunda
In object recognition research, many commonly used datasets (e.g., ImageNet and similar) contain relatively sparse distributions of object instances and views, e.g., one might see a thousand different pictures of a thousand different giraffes, mostly taken from a few conventionally photographed angles.
1 code implementation • ACL 2018 • Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora
Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.
Ranked #3 on Sentiment Analysis on MPQA
no code implementations • 26 Dec 2017 • Yuanzhi Li, Tengyu Ma, Hongyang Zhang
We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.
no code implementations • ICLR 2018 • Rong Ge, Jason D. Lee, Tengyu Ma
All global minima of $G$ correspond to the ground truth parameters.
no code implementations • NeurIPS 2017 • Rong Ge, Tengyu Ma
The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.
1 code implementation • ICML 2017 • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang
We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.
no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora
We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.
no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski
Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.
no code implementations • 14 Nov 2016 • Moritz Hardt, Tengyu Ma
An emerging design principle in deep learning is that each layer of a deep artificial neural network should be able to easily express the identity transformation.
1 code implementation • 3 Nov 2016 • Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, Tengyu Ma
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples.
no code implementations • 6 Oct 2016 • Tengyu Ma, Jonathan Shi, David Steurer
We give new algorithms based on the sum-of-squares method for tensor decomposition.
no code implementations • NeurIPS 2016 • Elad Hazan, Tengyu Ma
We give a novel formal theoretical framework for unsupervised learning with two distinctive characteristics.
no code implementations • 16 Sep 2016 • Moritz Hardt, Tengyu Ma, Benjamin Recht
We prove that stochastic gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system.
no code implementations • 27 May 2016 • Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra
But designing provable algorithms for inference has proven to be more challenging.
no code implementations • NeurIPS 2016 • Rong Ge, Jason D. Lee, Tengyu Ma
Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.
1 code implementation • TACL 2018 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.
no code implementations • 18 Nov 2015 • Sanjeev Arora, Yingyu Liang, Tengyu Ma
Under this assumption -- which is experimentally tested on real-life nets like AlexNet -- it is formally proved that the feedforward net is a correct inference method for recovering the hidden layer.
no code implementations • 27 Jul 2015 • Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang
We also prove a lower bound for the number of rounds of communication for a broad class of distributed first-order methods including the proposed algorithms in this paper.
no code implementations • NeurIPS 2015 • Tengyu Ma, Avi Wigderson
It was also known that this quadratic gap cannot be improved by the most basic semi-definite (SDP, aka spectral) relaxation, which is equivalent to a degree-2 SoS algorithm.
no code implementations • 24 Jun 2015 • Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff
We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions.
no code implementations • 21 Apr 2015 • Rong Ge, Tengyu Ma
We also give a polynomial time algorithm for certifying the injective norm of random low rank tensors.
no code implementations • 2 Mar 2015 • Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra
Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.
4 code implementations • TACL 2016 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.
no code implementations • NeurIPS 2014 • Ankit Garg, Tengyu Ma, Huy L. Nguyen
We conjecture that the tradeoff between communication and squared loss demonstrated by this protocol is essentially optimal up to a logarithmic factor.
no code implementations • 3 Jan 2014 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma
In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).
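The generative model defined above can be written down directly (a sketch of the data model only, not of the paper's recovery algorithm; the choice of +/-1 nonzero coefficients is an illustrative assumption):

```python
import random

def sample_sparse_coding(A, sparsity, rng):
    """Draw one sample y = A x from the sparse-coding model.

    x is a random `sparsity`-sparse vector in R^m with +/-1 nonzeros, and
    A is an n x m dictionary given as a list of n rows. In the overcomplete
    case m > n, so recovering A and x from samples y is nontrivial.
    """
    n, m = len(A), len(A[0])
    support = rng.sample(range(m), sparsity)
    x = [0.0] * m
    for j in support:
        x[j] = rng.choice([-1.0, 1.0])
    y = [sum(A[i][j] * x[j] for j in support) for i in range(n)]
    return y, x

rng = random.Random(0)
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # a 2 x 3 (overcomplete) dictionary
y, x = sample_sparse_coding(A, sparsity=1, rng=rng)
```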
no code implementations • 23 Oct 2013 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma
The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.