1 code implementation • 16 Oct 2023 • Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit Dhillon
We propose the decoupled softmax loss, a simple modification of the InfoNCE loss that overcomes the limitations of existing contrastive losses.
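The paper defines its loss precisely; as an illustration only, the sketch below contrasts a standard InfoNCE loss with one common "decoupled" variant that removes the positive pair's term from the softmax denominator. The function names and this particular formulation are assumptions for exposition, not the paper's definition.

```python
import numpy as np

def infonce_loss(sim, pos_idx, tau=0.1):
    # sim: (n_queries, n_docs) similarity matrix; pos_idx[i] is the
    # index of the positive document for query i.
    logits = sim / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    pos = exp[np.arange(len(pos_idx)), pos_idx]
    return float(np.mean(-np.log(pos / exp.sum(axis=1))))

def decoupled_softmax_loss(sim, pos_idx, tau=0.1):
    # Illustrative "decoupled" variant: the positive logit is excluded
    # from the denominator, so positives and negatives are handled
    # separately rather than competing inside one softmax.
    logits = sim / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    pos = exp[np.arange(len(pos_idx)), pos_idx]
    neg_sum = exp.sum(axis=1) - pos
    return float(np.mean(-np.log(pos / neg_sum)))
```

Because the decoupled denominator omits the positive term, its loss is always strictly below the InfoNCE value on the same similarities.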
no code implementations • 13 Oct 2023 • Ramnath Kumar, Anshul Mittal, Nilesh Gupta, Aditya Kusupati, Inderjit Dhillon, Prateek Jain
Such techniques use a two-stage process: (a) contrastive learning to train a dual encoder to embed both the query and documents and (b) approximate nearest neighbor search (ANNS) for finding similar documents for a given query.
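The two-stage pipeline can be sketched as follows. The toy `encode` function (a deterministic bag-of-words hash) is a stand-in assumption for a trained dual encoder, and exact inner-product search stands in for an ANNS index.

```python
import numpy as np

def encode(texts, dim=16):
    # Toy stand-in for a trained dual encoder: hash each token into a
    # bag-of-words bucket and L2-normalize. A real system would embed
    # text with a contrastively trained neural encoder.
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.split():
            vecs[i, sum(ord(c) for c in tok) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)

docs = ["deep learning for retrieval",
        "kernel ridge regression",
        "gossip protocols"]
index = encode(docs)              # stage (a): embed the corpus once

def search(query, k=1):
    # Stage (b): nearest-neighbor search by inner product. Exact here;
    # at industrial scale this is replaced by an ANNS structure.
    scores = index @ encode([query])[0]
    return np.argsort(-scores)[:k]
```

A query like `search("kernel regression")` ranks the second document first because it shares the most token buckets with the query.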
2 code implementations • 11 Oct 2023 • Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, KaiFeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain
Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
no code implementations • 3 Aug 2022 • Samarth Gupta, Daniel N. Hill, Lexing Ying, Inderjit Dhillon
Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model.
no code implementations • 1 Jun 2022 • Anish Acharya, Sujay Sanghavi, Li Jing, Bhargav Bhushanam, Dhruv Choudhary, Michael Rabbat, Inderjit Dhillon
We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive or negative).
no code implementations • 21 Feb 2022 • Haoya Li, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon
Entropy regularized Markov decision processes have been widely used in reinforcement learning.
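A minimal sketch of what entropy regularization does in this setting: the max in the Bellman backup becomes a soft (log-sum-exp) backup, and the greedy policy becomes a softmax policy. The two-state dynamics `P`, rewards `R`, and temperature `tau` below are illustrative assumptions, not from any paper in this list.

```python
import numpy as np

# Soft (entropy-regularized) value iteration on a tiny 2-state, 2-action MDP.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # P[s, a, s'] transition probabilities
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[1.0, 0.0],                 # R[s, a] rewards
              [0.0, 2.0]])
gamma, tau = 0.9, 0.5                     # discount factor, entropy temperature

V = np.zeros(2)
for _ in range(500):
    Q = R + gamma * P @ V                           # Q[s, a]
    V = tau * np.log(np.exp(Q / tau).sum(axis=1))   # soft (log-sum-exp) backup

# The optimal regularized policy is a softmax over Q-values.
pi = np.exp(Q / tau) / np.exp(Q / tau).sum(axis=1, keepdims=True)
```

As `tau -> 0` the soft backup recovers the standard max and the softmax policy becomes greedy.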
1 code implementation • NAACL 2022 • Yuanhao Xiong, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit Dhillon
To learn the semantic embeddings of instances and labels with raw text, we propose to pre-train Transformer-based encoders with self-supervised contrastive losses.
Multi-Label Text Classification +2
no code implementations • NeurIPS 2021 • Pei-Hung Chen, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
In addition to compressing standard models, our method can also be used on distilled BERT models to further improve the compression rate.
no code implementations • 5 Oct 2021 • Haoya Li, Samarth Gupta, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon
This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization.
1 code implementation • 4 Jun 2021 • Philip A. Etter, Kai Zhong, Hsiang-Fu Yu, Lexing Ying, Inderjit Dhillon
In industrial applications, these models operate at extreme scales, where every bit of performance is critical.
1 code implementation • 15 Feb 2021 • Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\mathcal{F}|T)})$, where $A$ is the total number of arms and $\mathcal{F}$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step.
Computational Efficiency, Extreme Multi-Label Classification +2
no code implementations • 28 Nov 2020 • Devvrit, Minhao Cheng, Cho-Jui Hsieh, Inderjit Dhillon
Several previous attempts tackled this problem by ensembling soft-label predictions, but these have been shown to be vulnerable to the latest attack methods.
1 code implementation • 20 Nov 2020 • Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon
In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information.
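A minimal sketch of the idea, assuming a 4-node ring with a hand-picked doubly stochastic mixing matrix `W` and a crude uniform quantizer standing in for message compression. It only illustrates that extra gossip rounds shrink disagreement between nodes; it is not the paper's algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ring of 4 nodes; W is a doubly stochastic gossip (mixing) matrix.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

def quantize(x, levels=8):
    # Crude uniform quantizer standing in for message compression.
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((x - lo) / step) * step

def gossip(x, rounds):
    # Each round, nodes exchange *compressed* values and mix with W.
    for _ in range(rounds):
        x = W @ quantize(x)
    return x

x0 = rng.normal(size=4)         # each node's local value (e.g., a gradient coordinate)
err1 = np.std(gossip(x0, 1))    # disagreement after 1 gossip step
err5 = np.std(gossip(x0, 5))    # after 5 gossip steps between gradient updates
```

Running several gossip rounds drives the nodes much closer to consensus (`err5 < err1`), at the cost of more communication, which the quantizer partly offsets.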
no code implementations • ICML 2020 • Yanyao Shen, Hsiang-Fu Yu, Sujay Sanghavi, Inderjit Dhillon
Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes.
1 code implementation • ICML 2020 • Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
The main reason is that position information among input units is not inherently encoded, i.e., the models are permutation equivariant; this is why all existing models are accompanied by a sinusoidal encoding/embedding layer at the input.
Ranked #5 on Semantic Textual Similarity on MRPC
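The sinusoidal encoding layer referred to above is the standard Transformer construction (even dimensions get sines, odd dimensions get cosines, with geometrically spaced frequencies); a minimal NumPy version, assuming an even `d_model`:

```python
import numpy as np

def sinusoidal_encoding(n_positions, d_model):
    # Standard Transformer sinusoidal position encoding:
    # pe[p, 2i]   = sin(p / 10000^(2i / d_model))
    # pe[p, 2i+1] = cos(p / 10000^(2i / d_model))
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

Each position gets a distinct vector, which is what breaks the permutation symmetry of the attention layers.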
no code implementations • 17 Feb 2020 • Minhao Cheng, Qi Lei, Pin-Yu Chen, Inderjit Dhillon, Cho-Jui Hsieh
Adversarial training has become one of the most effective methods for improving robustness of neural networks.
1 code implementation • NeurIPS 2019 • Rajat Sen, Hsiang-Fu Yu, Inderjit Dhillon
Forecasting high-dimensional time series plays a crucial role in many applications such as demand forecasting and financial predictions.
2 code implementations • 7 May 2019 • Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon
However, naively applying deep transformer models to the XMC problem leads to sub-optimal performance due to the large output space and the label sparsity issue.
Extreme Multi-Label Classification, General Classification +4
no code implementations • 1 Nov 2018 • Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon
Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, outperforming alternative methods such as fixed-point quantization and offline word embedding compression.
no code implementations • 5 Aug 2016 • Rashish Tandon, Si Si, Pradeep Ravikumar, Inderjit Dhillon
In this paper, we investigate a divide and conquer approach to Kernel Ridge Regression (KRR).
no code implementations • 19 Feb 2016 • Prateek Jain, Nikhil Rao, Inderjit Dhillon
Several learning applications require solving high-dimensional regression problems where the relevant features belong to a small number of (overlapping) groups.
no code implementations • 4 Sep 2015 • Arnaud Vandaele, Nicolas Gillis, Qi Lei, Kai Zhong, Inderjit Dhillon
Given a symmetric nonnegative matrix $A$, symmetric nonnegative matrix factorization (symNMF) is the problem of finding a nonnegative matrix $H$, usually with far fewer columns than $A$, such that $A \approx HH^T$.
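A minimal sketch of one way to compute such a factorization, assuming a simple multiplicative-style update for $\min_{H \ge 0} \|A - HH^T\|_F^2$; the paper itself develops more efficient coordinate-descent algorithms, so this is only a baseline illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def symnmf(A, r, iters=500, beta=0.5, eps=1e-12):
    # Multiplicative-style update for min_{H >= 0} ||A - H H^T||_F^2.
    # Each step scales H by a nonnegative factor, so H stays nonnegative.
    n = A.shape[0]
    H = rng.random((n, r)) + 0.1
    for _ in range(iters):
        num = A @ H                      # "pull" toward fitting A
        den = H @ (H.T @ H) + eps        # current reconstruction term
        H = H * (1.0 - beta + beta * num / den)
    return H

# A symmetric nonnegative matrix with an exact rank-3 symNMF factorization.
G = rng.random((6, 3))
A = G @ G.T
H = symnmf(A, 3)
```

On this synthetic input the relative reconstruction error drops well below the error of the random initialization.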
1 code implementation • 1 Dec 2013 • Hyokun Yun, Hsiang-Fu Yu, Cho-Jui Hsieh, S. V. N. Vishwanathan, Inderjit Dhillon
One of the key features of NOMAD is that the ownership of a variable is asynchronously transferred between processors in a decentralized fashion.
Distributed, Parallel, and Cluster Computing