no code implementations • ACL (RepL4NLP) 2021 • Kevin Huang, Peng Qi, Guangtao Wang, Tengyu Ma, Jing Huang
In this paper, we propose a novel framework E2GRE (Entity and Evidence Guided Relation Extraction) that jointly extracts relations and the underlying evidence sentences by using a large pretrained language model (LM) as the input encoder.
no code implementations • 30 Mar 2024 • Neil Band, Xuechen Li, Tengyu Ma, Tatsunori Hashimoto
Our results demonstrate that long-form generations may be calibrated end-to-end by constructing an objective in the space of the predictions that users make in downstream decision-making.
no code implementations • 20 Feb 2024 • Zhiyuan Li, Hong Liu, Denny Zhou, Tengyu Ma
Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT.
no code implementations • 7 Sep 2023 • Xiaohan Cui, Long Ma, Tengyu Ma, JinYuan Liu, Xin Fan, Risheng Liu
In this work, we aim to unlock the potential of combining an enhancer with a detector.
no code implementations • 7 Jul 2023 • Arvind Mahankali, Tatsunori B. Hashimoto, Tengyu Ma
Then, we find that changing the distribution of the covariates and weight vector to a non-isotropic Gaussian distribution has a strong impact on the learned algorithm: the global minimizer of the pre-training loss now implements a single step of $\textit{pre-conditioned}$ GD.
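The learned algorithm can be made concrete with a small sketch (an illustration, not the paper's code): one step of preconditioned gradient descent from the zero initialization on the in-context least-squares loss. The preconditioner `P` here is a placeholder for the covariance-dependent matrix the paper derives.

```python
def one_step_preconditioned_gd(X, y, P):
    """One step of preconditioned GD from w = 0 on 0.5 * ||X w - y||^2.

    The gradient at w = 0 is -X^T y, so the update is w1 = P @ (X^T y);
    the step size is absorbed into the preconditioner P.
    """
    n, d = len(X), len(X[0])
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(d)]
    return [sum(P[j][k] * xty[k] for k in range(d)) for j in range(d)]

X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 3.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
w1 = one_step_preconditioned_gd(X, y, identity)
```

With `P` equal to the identity this reduces to plain (unconditioned) gradient descent, which corresponds to the isotropic case; a non-isotropic covariate distribution changes the optimal `P`.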
no code implementations • 22 Jun 2023 • Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank Reddi, Tengyu Ma, Stefanie Jegelka
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions.
1 code implementation • 26 May 2023 • Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou
Our approach consists of two phases: 1) tool making: an LLM acts as the tool maker that crafts tools for a set of tasks.
3 code implementations • 23 May 2023 • Hong Liu, Zhiyuan Li, David Hall, Percy Liang, Tengyu Ma
Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction in the time and cost of training.
2 code implementations • NeurIPS 2023 • Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.
no code implementations • 15 May 2023 • Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le
We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").
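The data transformation behind symbol tuning can be sketched in a few lines (an illustration, not the paper's code; the symbol pool and random pairing scheme here are assumptions):

```python
import random

def symbol_tune_examples(examples, symbol_pool, seed=0):
    """Remap each natural-language label to an arbitrary symbol.

    `examples` is a list of (input_text, label) pairs; each distinct label
    is assigned a distinct random symbol drawn from `symbol_pool`, so the
    model must infer the input-label mapping from context rather than rely
    on the label's meaning.
    """
    rng = random.Random(seed)
    labels = sorted({label for _, label in examples})
    symbols = rng.sample(symbol_pool, len(labels))
    mapping = dict(zip(labels, symbols))
    return [(text, mapping[label]) for text, label in examples], mapping

examples = [("great movie", "positive"), ("dull plot", "negative"), ("loved it", "positive")]
remapped, mapping = symbol_tune_examples(examples, ["foo", "bar", "baz"])
```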
no code implementations • 29 Apr 2023 • Kefan Dong, Tengyu Ma
Our key technical novelty is to prove that the degree-$k$ spherical harmonics components of a function drawn from a Gaussian random field cannot be spiky, in that their $L_\infty$/$L_2$ ratios are upper bounded by $O(d \sqrt{\ln k})$ with high probability.
no code implementations • 7 Mar 2023 • Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma
We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.
1 code implementation • NeurIPS 2023 • Sang Michael Xie, Shibani Santurkar, Tengyu Ma, Percy Liang
To measure whether hashed n-gram features preserve the aspects of the data that are relevant to the target, we define KL reduction, a data metric that measures the proximity between the selected pretraining data and the target on some feature space.
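The hashed n-gram featurization and the KL-reduction comparison can be sketched as follows (a minimal illustration, assuming bigram features, add-one smoothing, and Python's built-in `hash` as the bucketing function; none of these specifics come from the paper):

```python
import math
from collections import Counter

def hashed_ngram_dist(texts, num_buckets=64, n=2):
    """Estimate a distribution over hashed n-gram buckets."""
    counts = Counter()
    for text in texts:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            bucket = hash(" ".join(tokens[i:i + n])) % num_buckets
            counts[bucket] += 1
    total = sum(counts.values())
    # Add-one smoothing keeps the KL divergence below finite.
    return [(counts.get(b, 0) + 1) / (total + num_buckets) for b in range(num_buckets)]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kl_reduction(target, selected, random_pool, num_buckets=64):
    """How much closer the selected pretraining data is to the target
    than randomly selected data is, measured on hashed n-gram features."""
    pt = hashed_ngram_dist(target, num_buckets)
    ps = hashed_ngram_dist(selected, num_buckets)
    pr = hashed_ngram_dist(random_pool, num_buckets)
    return kl(pt, pr) - kl(pt, ps)
```

A positive KL reduction means the selected data is closer to the target distribution (on this feature space) than random data would be.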
no code implementations • 28 Nov 2022 • Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.
no code implementations • 27 Nov 2022 • Jeff Z. HaoChen, Tengyu Ma
Understanding self-supervised learning is important but challenging.
no code implementations • 21 Nov 2022 • Kefan Dong, Tengyu Ma
The question is very challenging because even two-layer neural networks cannot be guaranteed to extrapolate outside the support of the training distribution without further assumptions on the domain shift.
no code implementations • 10 Nov 2022 • Kaiyue Wen, Tengyu Ma, Zhiyuan Li
SAM intends to penalize a notion of sharpness of the model but implements a computationally efficient variant; moreover, a third notion of sharpness was used for proving generalization guarantees.
no code implementations • 25 Oct 2022 • Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
Toward understanding this implicit bias, we prove that SGD with standard mini-batch noise implicitly prefers flatter minima in language models, and empirically observe a strong correlation between flatness and downstream performance among models with the same minimal pre-training loss.
no code implementations • 18 Jul 2022 • Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan
We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM.
no code implementations • 16 Jun 2022 • Margalit Glasgow, Colin Wei, Mary Wootters, Tengyu Ma
Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails.
no code implementations • 6 Jun 2022 • Kefan Dong, Tengyu Ma
Past research on interactive decision making problems (bandits, reinforcement learning, etc.)
no code implementations • 22 May 2022 • Haoyuan Cai, Tengyu Ma, Simon Du
In particular, the lower bound implies that our proposed algorithm, Value-Aware Autonomous Exploration, is nearly minimax-optimal when the number of $L$-controllable states grows polynomially with respect to $L$.
1 code implementation • CVPR 2022 • Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, Zhongxuan Luo
Existing low-light image enhancement techniques often struggle to balance visual quality with computational efficiency, and they commonly fail in unknown complex scenarios.
no code implementations • 6 Apr 2022 • Jeff Z. HaoChen, Colin Wei, Ananya Kumar, Tengyu Ma
In particular, a linear classifier trained to separate the representations on the source domain can also predict classes on the target domain accurately, even though the representations of the two domains are far from each other.
no code implementations • 1 Apr 2022 • Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang
We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain.
3 code implementations • 21 Feb 2022 • Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang
However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large.
1 code implementation • NeurIPS 2021 • Garrett Thomas, Yuping Luo, Tengyu Ma
Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-world problems, where suboptimal behaviors may lead to actual negative consequences.
1 code implementation • 9 Dec 2021 • Risheng Liu, Long Ma, Tengyu Ma, Xin Fan, Zhongxuan Luo
To partially address the above issues, we establish Retinex-inspired Unrolling with Architecture Search (RUAS), a general learning framework that not only addresses the low-light enhancement task but also has the flexibility to handle other, more challenging downstream vision applications.
no code implementations • ICLR 2022 • Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine
In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations.
1 code implementation • 22 Nov 2021 • Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu
Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.
1 code implementation • 5 Nov 2021 • Margalit Glasgow, Honglin Yuan, Tengyu Ma
In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows the existing FedAvg upper bound analysis is not improvable.
1 code implementation • ICLR 2022 • Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma
At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt.
1 code implementation • ICLR 2022 • Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma
Third, inspired by the theoretical insights, we devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets with several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.
Ranked #9 on Long-tail Learning on CIFAR-10-LT (ρ=100)
no code implementations • 29 Sep 2021 • Ananya Kumar, Aditi Raghunathan, Tengyu Ma, Percy Liang
We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy.
no code implementations • ICLR 2022 • Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, Percy Liang
It is well known that fine-tuning leads to better accuracy in-distribution (ID).
no code implementations • 29 Sep 2021 • Colin Wei, Yining Chen, Tengyu Ma
A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • NeurIPS 2021 • Yuping Luo, Tengyu Ma
This paper explores the possibility of safe RL algorithms with zero training-time safety violations in the challenging setting where we are only given a safe but trivial-reward initial policy without any prior knowledge of the dynamics model and additional offline data.
no code implementations • 28 Jul 2021 • Colin Wei, Yining Chen, Tengyu Ma
A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.
no code implementations • NeurIPS 2021 • Shengjia Zhao, Michael P. Kim, Roshni Sahoo, Tengyu Ma, Stefano Ermon
In this work, we introduce a new notion -- decision calibration -- that requires the predicted distribution and true distribution to be "indistinguishable" to a set of downstream decision-makers.
no code implementations • 18 Jun 2021 • Yining Chen, Elan Rosenfeld, Mark Sellke, Tengyu Ma, Andrej Risteski
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
1 code implementation • NeurIPS 2021 • Colin Wei, Sang Michael Xie, Tengyu Ma
The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language.
no code implementations • NeurIPS 2021 • Alex Damian, Tengyu Ma, Jason D. Lee
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.
no code implementations • 9 Jun 2021 • Zichuan Lin, Jing Huang, BoWen Zhou, Xiaodong He, Tengyu Ma
Recent work (Takanobu et al., 2020) proposed system-wise evaluation of dialog systems and found that improvements to individual components (e.g., NLU, policy) in prior work may not necessarily benefit pipeline systems in system-wise evaluation.
1 code implementation • NeurIPS 2021 • Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma
Despite the empirical successes, theoretical foundations are limited -- prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image).
no code implementations • NAACL 2021 • Lingxiao Wang, Kevin Huang, Tengyu Ma, Quanquan Gu, Jing Huang
The core of our algorithm is to introduce a novel variance reduction term to the gradient estimation when performing the task adaptation.
no code implementations • 24 Mar 2021 • Tengyu Ma
Non-convex optimization is ubiquitous in modern machine learning.
no code implementations • 9 Feb 2021 • Haike Xu, Tengyu Ma, Simon S. Du
We further show that for general MDPs, AMB suffers an additional $\frac{|Z_{mul}|}{\Delta_{min}}$ regret, where $Z_{mul}$ is the set of state-action pairs $(s, a)$ for which $a$ is a non-unique optimal action for $s$.
no code implementations • NeurIPS 2021 • Kefan Dong, Jiaqi Yang, Tengyu Ma
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations.
no code implementations • 1 Jan 2021 • Yu Bai, Tengyu Ma, Huan Wang, Caiming Xiong
In this paper, we propose Neural Rank Preserving Transforms (NRPT), a new post-calibration method that adjusts the output probabilities of a trained classifier using a calibrator of higher capacity, while maintaining its prediction accuracy.
1 code implementation • ICLR 2021 • Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, Percy Liang
To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training).
no code implementations • 3 Nov 2020 • Hong Liu, Jeff Z. HaoChen, Colin Wei, Tengyu Ma
Recent works found that fine-tuning and joint training -- two popular approaches for transfer learning -- do not always improve accuracy on downstream tasks.
no code implementations • NeurIPS 2020 • Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge
We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$.
1 code implementation • 21 Oct 2020 • Wenxuan Zhou, Kevin Huang, Tengyu Ma, Jing Huang
In this paper, we propose two novel techniques, adaptive thresholding and localized context pooling, to solve the multi-label and multi-entity problems.
Ranked #6 on Relation Extraction on ReDocRED
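The adaptive-thresholding idea at inference time can be sketched simply (a hedged illustration only: in the actual model the threshold class is learned jointly with the relation classes via a dedicated loss, and the index of the threshold class here is an assumption):

```python
def adaptive_threshold_predict(logits, th_index=0):
    """Adaptive-thresholding inference for multi-label relation extraction.

    A dedicated threshold class TH is scored alongside the relation classes;
    an entity pair is assigned every relation whose logit exceeds the TH
    logit, so the decision threshold adapts per instance instead of being a
    fixed global cutoff.
    """
    th_logit = logits[th_index]
    return [i for i, score in enumerate(logits) if i != th_index and score > th_logit]
```

For example, with logits `[0.5, 1.2, 0.1, 0.9]` and the threshold class at index 0, classes 1 and 3 are predicted; if no relation logit beats the threshold logit, the pair is labeled as having no relation.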
no code implementations • ICLR 2021 • Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma
Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks.
no code implementations • 28 Sep 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang
We focus on prediction problems with high-dimensional outputs that are subject to output validity constraints, e.g., a pseudocode-to-code translation task where the code must compile.
no code implementations • 27 Aug 2020 • Kevin Huang, Guangtao Wang, Tengyu Ma, Jing Huang
Document-level relation extraction is a challenging task which requires reasoning over multiple sentences in order to predict relations in a document.
Ranked #14 on Relation Extraction on DocRED
no code implementations • 9 Jul 2020 • Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang
We consider the dynamic of gradient descent for learning a two-layer neural network.
1 code implementation • ICLR 2021 • Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma
Real-world large-scale datasets are heteroskedastic and imbalanced -- labels have varying levels of uncertainty and label distributions are long-tailed.
Ranked #11 on Image Classification on WebVision-1000
2 code implementations • 29 Jun 2020 • Sang Michael Xie, Tengyu Ma, Percy Liang
Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).
no code implementations • 25 Jun 2020 • Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang
We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.
no code implementations • ICML 2020 • Shengjia Zhao, Tengyu Ma, Stefano Ermon
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized, i.e., outputting randomized credible intervals.
no code implementations • NeurIPS 2020 • Yining Chen, Colin Wei, Ananya Kumar, Tengyu Ma
In unsupervised domain adaptation, existing theory focuses on situations where the source and target domains are close.
1 code implementation • NeurIPS 2020 • Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma
When the test task distribution is different from the training task distribution, the performance may degrade significantly.
1 code implementation • NeurIPS 2020 • Honglin Yuan, Tengyu Ma
We propose Federated Accelerated Stochastic Gradient Descent (FedAc), a principled acceleration of Federated Averaging (FedAvg, also known as Local SGD) for distributed optimization.
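The FedAvg / Local SGD baseline that FedAc accelerates can be sketched on a toy problem (an illustration under simplifying assumptions: scalar parameters, deterministic gradients, and each worker holding a quadratic loss 0.5*(w - c_i)^2; FedAc itself adds an acceleration scheme not shown here):

```python
def local_sgd(workers, w0, lr=0.1, local_steps=5, rounds=20):
    """FedAvg / Local SGD: each worker runs `local_steps` gradient steps on
    its own local loss, then the server averages the worker iterates.

    Each entry of `workers` is the optimum c_i of worker i's local loss
    0.5*(w - c_i)^2; the global minimizer is the mean of the c_i.
    """
    w = w0
    for _ in range(rounds):
        updates = []
        for c in workers:
            wi = w
            for _ in range(local_steps):
                wi -= lr * (wi - c)  # gradient of 0.5*(wi - c)^2
            updates.append(wi)
        w = sum(updates) / len(updates)
    return w

w = local_sgd([1.0, 3.0], w0=0.0)  # converges toward the mean, 2.0
```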
1 code implementation • 15 Jun 2020 • Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground-truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.
no code implementations • ICML Workshop LifelongML 2020 • Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang
We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.
6 code implementations • NeurIPS 2020 • Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
We also characterize the trade-off between the gain and risk of leaving the support of the batch data.
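The core mechanism for managing this trade-off in MOPO is to penalize the model-predicted reward by an uncertainty estimate, discouraging the policy from leaving the support of the batch data. A minimal sketch (the ensemble-disagreement proxy below is one illustrative choice of uncertainty estimator, not necessarily the paper's):

```python
def penalized_reward(reward, uncertainty, lam=1.0):
    """Uncertainty-penalized reward: the model-based reward is lowered by
    lam times an uncertainty estimate u(s, a), so exploiting the model far
    from the data support becomes unattractive to the policy."""
    return reward - lam * uncertainty

def ensemble_uncertainty(predictions):
    """A simple uncertainty proxy: disagreement across an ensemble of
    dynamics models, measured as the max absolute deviation from the
    ensemble mean prediction."""
    mean = sum(predictions) / len(predictions)
    return max(abs(p - mean) for p in predictions)
```

When the ensemble agrees (uncertainty near zero) the reward is left almost untouched; large disagreement, typical outside the batch data, yields a large penalty.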
no code implementations • ICLR 2020 • Colin Wei, Tengyu Ma
For linear classifiers, the relationship between (normalized) output margin and generalization is captured in a clear and simple bound – a large output margin implies good generalization.
no code implementations • ECCV 2020 • Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma
We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set.
Ranked #35 on Image Classification on mini WebVision 1.0
no code implementations • ICLR 2021 • Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma
Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size.
1 code implementation • ICML 2020 • Colin Wei, Sham Kakade, Tengyu Ma
This implicit regularization effect is analogous to the effect of stochasticity in small mini-batch stochastic gradient descent.
2 code implementations • ICML 2020 • Ananya Kumar, Tengyu Ma, Percy Liang
Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces.
no code implementations • 8 Feb 2020 • Tengyu Ma, Joel Michelson, James Ainooson, Deepayan Sanyal, Xiaohan Wang, Maithilee Kunda
For the problem of 3D object recognition, researchers using deep learning methods have developed several very different input representations, including "multi-view" snapshots taken from discrete viewpoints around an object, as well as "spherical" representations consisting of a dense map of essentially ray-traced samples of the object from all directions.
1 code implementation • ICML 2020 • Kefan Dong, Yuping Luo, Tengyu Ma
We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.
1 code implementation • 9 Oct 2019 • Colin Wei, Tengyu Ma
Unfortunately, for deep models, this relationship is less clear: existing analyses of the output margin give complicated bounds which sometimes depend exponentially on depth.
1 code implementation • 25 Sep 2019 • Kefan Dong, Yuping Luo, Tengyu Ma
We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics.
3 code implementations • NeurIPS 2019 • Ananya Kumar, Percy Liang, Tengyu Ma
In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators.
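For reference, the "commonly used plugin estimator" of ECE that this work improves on is the standard binned estimator; a sketch of that baseline (the equal-width binning and bin count are conventional choices, not specifics from the paper):

```python
def plugin_ece(confidences, corrects, num_bins=10):
    """Plugin (binned) estimator of expected calibration error: the average
    of |accuracy - confidence| over equal-width confidence bins, weighted by
    bin size. This is the baseline estimator; the paper argues it can be
    inaccurate and proposes a better one."""
    bins = [[] for _ in range(num_bins)]
    for conf, correct in zip(confidences, corrects):
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, correct))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece
```

A perfectly calibrated predictor (confidence matches empirical accuracy in every bin) gets ECE 0; a predictor that says 0.95 but is right every time gets ECE 0.05.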
1 code implementation • ICLR 2020 • Yuping Luo, Huazhe Xu, Tengyu Ma
Imitation learning, followed by reinforcement learning algorithms, is a promising paradigm to solve complex control tasks sample-efficiently.
no code implementations • 11 Jul 2019 • Nicholas C. Landolfi, Garrett Thomas, Tengyu Ma
We then adapt the dynamical model with samples from this policy in the real environment.
2 code implementations • NeurIPS 2019 • Yuanzhi Li, Colin Wei, Tengyu Ma
This concept translates to a larger-scale setting: we demonstrate that one can add a small patch to CIFAR-10 images that is immediately memorizable by a model with small initial learning rate, but ignored by the model with large learning rate until after annealing.
7 code implementations • NeurIPS 2019 • Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma
Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-imbalance but the testing criterion requires good generalization on less frequent classes.
Ranked #4 on Long-tail learning with class descriptors on CUB-LT
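The label-distribution-aware margin idea above assigns larger margins to rarer classes, with the margin for class $j$ proportional to $n_j^{-1/4}$. A small sketch (the `max_margin` scaling constant is a tunable assumption, as is the exact way the margin enters the loss):

```python
def ldam_margins(class_counts, max_margin=0.5):
    """Class-dependent margins: margin_j is proportional to n_j^(-1/4),
    rescaled so the rarest class gets `max_margin`. Rare classes thus
    receive larger margins, improving generalization on them."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * m for m in raw]

def margin_adjusted_logits(logits, label, margins):
    """Subtract the true class's margin from its logit before softmax
    cross-entropy, which enforces the class-dependent margin in training."""
    out = list(logits)
    out[label] -= margins[label]
    return out
```

For example, with counts `[10000, 16]` the raw margins are proportional to `0.1` and `0.5`, so the rare class gets a margin five times larger than the frequent one.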
no code implementations • 12 May 2019 • Shi Dong, Tengyu Ma, Benjamin Van Roy
Specifically, we establish that, when the set of feasible actions is identical to the set of possible coefficient vectors, the Bayesian regret of Thompson sampling is $\tilde{O}(d\sqrt{T})$.
1 code implementation • NeurIPS 2019 • Colin Wei, Tengyu Ma
For feedforward neural nets as well as RNNs, we obtain tighter Rademacher complexity bounds by considering additional data-dependent properties of the network: the norms of the hidden layers of the network, and the norms of the Jacobians of each layer with respect to all previous layers.
no code implementations • ICLR 2019 • Xingyu Zhou, Tengyu Ma, Huahong Zhang
This paper, in contrast, discusses the origin of adversarial examples from a more underlying knowledge representation point of view.
no code implementations • ICLR 2019 • Colin Wei, Jason Lee, Qiang Liu, Tengyu Ma
We establish: 1) for multi-layer feedforward ReLU networks, the global minimizer of a weakly-regularized cross-entropy loss has the maximum normalized margin among all networks, 2) as a result, increasing the over-parametrization improves the normalized margin and generalization error bounds for deep networks.
no code implementations • ICLR 2019 • Jiaming Song, Tengyu Ma, Michael Auli, Yann Dauphin
Memorization in over-parameterized neural networks can severely hurt generalization in the presence of mislabeled examples.
7 code implementations • ICLR 2019 • Hongyi Zhang, Yann N. Dauphin, Tengyu Ma
Normalization layers are a staple in state-of-the-art deep neural network architectures.
Ranked #9 on Image Classification on SVHN
no code implementations • NeurIPS 2019 • Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma
We prove that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.
2 code implementations • ICLR 2019 • Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma
Model-based reinforcement learning (RL) is considered to be a promising approach to reduce the sample complexity that hinders model-free RL.
no code implementations • ICLR 2019 • Yu Bai, Tengyu Ma, Andrej Risteski
Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence or the Wasserstein distance, indicating that the lack of diversity in GANs may be caused by the sub-optimality in optimization instead of statistical inefficiency.
no code implementations • 15 Jun 2018 • Xiaohan Wang, Tengyu Ma, James Ainooson, Seunghwan Cha, Xiaotian Wang, Azhar Molla, Maithilee Kunda
In object recognition research, many commonly used datasets (e.g., ImageNet and similar) contain relatively sparse distributions of object instances and views, e.g., one might see a thousand different pictures of a thousand different giraffes, mostly taken from a few conventionally photographed angles.
1 code implementation • ACL 2018 • Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora
Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.
Ranked #3 on Sentiment Analysis on MPQA
no code implementations • 26 Dec 2017 • Yuanzhi Li, Tengyu Ma, Hongyang Zhang
We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.
no code implementations • ICLR 2018 • Rong Ge, Jason D. Lee, Tengyu Ma
All global minima of $G$ correspond to the ground truth parameters.
no code implementations • NeurIPS 2017 • Rong Ge, Tengyu Ma
The landscape of many objective functions in learning has been conjectured to have the geometric property that "all local optima are (approximately) global optima", and thus they can be solved efficiently by local search algorithms.
1 code implementation • ICML 2017 • Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang
We show that training of a generative adversarial network (GAN) may not have good generalization properties; e.g., training may appear successful but the trained distribution may be far from the target distribution in standard metrics.
no code implementations • 22 Feb 2017 • Holden Lee, Rong Ge, Tengyu Ma, Andrej Risteski, Sanjeev Arora
We take a first cut at explaining the expressivity of multilayer nets by giving a sufficient criterion for a function to be approximable by a neural network with $n$ hidden layers.
no code implementations • 28 Dec 2016 • Sanjeev Arora, Rong Ge, Tengyu Ma, Andrej Risteski
Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables.
no code implementations • 14 Nov 2016 • Moritz Hardt, Tengyu Ma
An emerging design principle in deep learning is that each layer of a deep artificial neural network should be able to easily express the identity transformation.
1 code implementation • 3 Nov 2016 • Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, Tengyu Ma
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples.
no code implementations • 6 Oct 2016 • Tengyu Ma, Jonathan Shi, David Steurer
We give new algorithms based on the sum-of-squares method for tensor decomposition.
no code implementations • NeurIPS 2016 • Elad Hazan, Tengyu Ma
We give a novel formal theoretical framework for unsupervised learning with two distinctive characteristics.
no code implementations • 16 Sep 2016 • Moritz Hardt, Tengyu Ma, Benjamin Recht
We prove that stochastic gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system.
no code implementations • 27 May 2016 • Sanjeev Arora, Rong Ge, Frederic Koehler, Tengyu Ma, Ankur Moitra
But designing provable algorithms for inference has proven to be more challenging.
no code implementations • NeurIPS 2016 • Rong Ge, Jason D. Lee, Tengyu Ma
Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems.
1 code implementation • TACL 2018 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 "discourse atoms" that gives a succinct description of which other words co-occur with that word sense.
no code implementations • 18 Nov 2015 • Sanjeev Arora, Yingyu Liang, Tengyu Ma
Under this assumption -- which is experimentally tested on real-life nets like AlexNet -- it is formally proved that the feedforward net is a correct inference method for recovering the hidden layer.
no code implementations • 27 Jul 2015 • Jason D. Lee, Qihang Lin, Tengyu Ma, Tianbao Yang
We also prove a lower bound for the number of rounds of communication for a broad class of distributed first-order methods including the proposed algorithms in this paper.
no code implementations • NeurIPS 2015 • Tengyu Ma, Avi Wigderson
It was also known that this quadratic gap cannot be improved by the most basic semi-definite (SDP, aka spectral) relaxation, which is equivalent to a degree-2 SoS algorithm.
no code implementations • 24 Jun 2015 • Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, David P. Woodruff
We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions.
no code implementations • 21 Apr 2015 • Rong Ge, Tengyu Ma
We also give a polynomial time algorithm for certifying the injective norm of random low rank tensors.
no code implementations • 2 Mar 2015 • Sanjeev Arora, Rong Ge, Tengyu Ma, Ankur Moitra
Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.
4 code implementations • TACL 2016 • Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods.
no code implementations • NeurIPS 2014 • Ankit Garg, Tengyu Ma, Huy L. Nguyen
We conjecture that the tradeoff between communication and squared loss demonstrated by this protocol is essentially optimal up to a logarithmic factor.
no code implementations • 3 Jan 2014 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma
In dictionary learning, also known as sparse coding, the algorithm is given samples of the form $y = Ax$ where $x\in \mathbb{R}^m$ is an unknown random sparse vector and $A$ is an unknown dictionary matrix in $\mathbb{R}^{n\times m}$ (usually $m > n$, which is the overcomplete case).
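The generative model defined above can be written down directly (a sketch of the data model only, not of the paper's recovery algorithm; the choice of +/-1 nonzero coefficients is an illustrative assumption):

```python
import random

def sample_sparse_coding(A, sparsity, rng):
    """Draw one sample y = A x from the sparse-coding model.

    x is a random `sparsity`-sparse vector in R^m with +/-1 nonzeros, and
    A is an n x m dictionary given as a list of n rows. In the overcomplete
    case m > n, so recovering A and x from samples y is nontrivial.
    """
    n, m = len(A), len(A[0])
    support = rng.sample(range(m), sparsity)
    x = [0.0] * m
    for j in support:
        x[j] = rng.choice([-1.0, 1.0])
    y = [sum(A[i][j] * x[j] for j in support) for i in range(n)]
    return y, x

rng = random.Random(0)
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # a 2 x 3 (overcomplete) dictionary
y, x = sample_sparse_coding(A, sparsity=1, rng=rng)
```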
no code implementations • 23 Oct 2013 • Sanjeev Arora, Aditya Bhaskara, Rong Ge, Tengyu Ma
The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.