Search Results for author: Amrit Singh Bedi

Found 40 papers, 3 papers with code

Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

no code implementations • 3 May 2024 • Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations.

PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

no code implementations • 20 Apr 2024 • Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers.

Hierarchical Reinforcement Learning, Reinforcement Learning

Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic

no code implementations • 18 Mar 2024 • Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time (a measure of how long a Markov chain under a fixed policy needs to reach its stationary distribution) poses a significant challenge for the global convergence of policy gradient methods.

Policy Gradient Methods

Right Place, Right Time! Towards ObjectNav for Non-Stationary Goals

no code implementations • 14 Mar 2024 • Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha

We address this concern by inferring results for two object-placement cases: one in which the placed objects follow a routine or path, and one in which they are placed at random.

Object Visual Grounding

On the Safety Concerns of Deploying LLMs/VLMs in Robotics: Highlighting the Risks and Vulnerabilities

no code implementations • 15 Feb 2024 • Xiyang Wu, Ruiqi Xian, Tianrui Guan, Jing Liang, Souradip Chakraborty, Fuxiao Liu, Brian Sadler, Dinesh Manocha, Amrit Singh Bedi

However, such integration can introduce significant vulnerabilities: the language models make the system susceptible to adversarial attacks, potentially leading to catastrophic consequences.

Language Modelling

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

no code implementations • 14 Feb 2024 • Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a single reward model derived from preference data.

Fairness, Reinforcement Learning

Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

no code implementations • 5 Feb 2024 • Xingpeng Sun, Haoming Meng, Souradip Chakraborty, Amrit Singh Bedi, Aniket Bera

While LLMs excel at processing the text of these human conversations, they struggle with the nuances of verbal instructions in scenarios such as social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems.

Decision Making, Language Modelling

REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback

no code implementations • 22 Dec 2023 • Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi

Current methods to mitigate this misalignment work by learning reward functions from human preferences; however, they inadvertently introduce a risk of reward overoptimization.

Bilevel Optimization, Continuous Control

Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

no code implementations • 23 Oct 2023 • Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

In parallel with the development of detection frameworks, however, researchers have also concentrated on designing strategies to evade detection, i.e., on the impossibilities of AI-generated text detection.

Misinformation, Text Detection

RealFM: A Realistic Mechanism to Incentivize Federated Participation and Contribution

1 code implementation • 20 Oct 2023 • Marco Bornstein, Amrit Singh Bedi, Anit Kumar Sahu, Furqan Khan, Furong Huang

On real-world data, RealFM improves device and server utility by over three orders of magnitude, and data contribution by over four orders of magnitude, compared to baselines.

PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

no code implementations • 3 Aug 2023 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback.

Bilevel Optimization, Procedure Learning

On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

no code implementations • 18 Jun 2023 • Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

To achieve that, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L).

Decision Making

iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

1 code implementation • 9 Jun 2023 • Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha

Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations.

Autonomous Vehicles, Collision Avoidance

Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

no code implementations • 9 Jun 2023 • Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

Trajectory length is a crucial hyperparameter in reinforcement learning (RL) algorithms and contributes significantly to sample inefficiency in robotics applications.

Policy Gradient Methods, Reinforcement Learning

On the Possibilities of AI-Generated Text Detection

no code implementations • 10 Apr 2023 • Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang

Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications.

Text Detection

RE-MOVE: An Adaptive Policy Design for Robotic Navigation Tasks in Dynamic Environments via Language-Based Feedback

no code implementations • 14 Mar 2023 • Souradip Chakraborty, Kasun Weerakoon, Prithvi Poddar, Mohamed Elnoor, Priya Narayanan, Carl Busart, Pratap Tokekar, Amrit Singh Bedi, Dinesh Manocha

Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures.

Continuous Control, Zero-Shot Learning

STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

no code implementations • 28 Jan 2023 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha

Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse.

Model-based Reinforcement Learning, Reinforcement Learning

Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

no code implementations • 28 Jan 2023 • Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection.

Reinforcement Learning (RL)

SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

1 code implementation • 25 Oct 2022 • Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, Furong Huang

Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption for slow clients, an assumption that other asynchronous decentralized FL algorithms require.

Federated Learning, Image Classification

DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments

no code implementations • 7 Sep 2022 • Aakriti Agrawal, Senthil Hariharan, Amrit Singh Bedi, Dinesh Manocha

At the higher level, we solve the task allocation by formulating it in terms of Markov Decision Processes and choosing the appropriate rewards to minimize the Total Travel Delay (TTD).

Reinforcement Learning (RL)

FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

no code implementations • 22 Jun 2022 • Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program.

Federated Learning

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

no code implementations • 12 Jun 2022 • Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) to achieve zero constraint violation while achieving state-of-the-art convergence results for the objective value function.

Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

no code implementations • 12 Jun 2022 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar, Dinesh Manocha

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.

Continuous Control, OpenAI Gym

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

no code implementations • 2 Jun 2022 • Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to settings where the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows without bound over time.

Continuous Control, Model-based Reinforcement Learning

On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

no code implementations • 28 Jan 2022 • Amrit Singh Bedi, Souradip Chakraborty, Anjaly Parayil, Brian Sadler, Pratap Tokekar, Alec Koppel

Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space.

Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

no code implementations • 22 Oct 2021 • Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, Vaneet Aggarwal

We establish that the sample complexity to obtain near-globally optimal solutions matches tight dependencies on the cardinality of the state and action spaces, and exhibits classical scalings with respect to the network in accordance with multi-agent optimization.

Multi-agent Reinforcement Learning, Reinforcement Learning (RL)

Projection-Free Algorithm for Stochastic Bi-level Optimization

no code implementations • 22 Oct 2021 • Zeeshan Akhtar, Amrit Singh Bedi, Srujan Teja Thomdapu, Ketan Rajawat

The proposed Stochastic Compositional Frank-Wolfe (SCFW) algorithm is shown to achieve a sample complexity of $\mathcal{O}(\epsilon^{-2})$ for convex objectives and $\mathcal{O}(\epsilon^{-3})$ for non-convex objectives, on par with the state-of-the-art sample complexities for projection-free algorithms solving single-level problems.

Denoising, Matrix Completion

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

To achieve that, we advocate the use of a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve an $\epsilon$-optimal cumulative reward with zero constraint violations.

Decision Making, Reinforcement Learning

Wasserstein-Splitting Gaussian Process Regression for Heterogeneous Online Bayesian Inference

no code implementations • 26 Jul 2021 • Michael E. Kepler, Alec Koppel, Amrit Singh Bedi, Daniel J. Stilwell

Gaussian processes (GPs) are a well-known nonparametric Bayesian inference technique, but they suffer from scalability problems for large sample sizes, and their performance can degrade for non-stationary or spatially heterogeneous data.

Bayesian Inference, Gaussian Processes

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

no code implementations • 15 Jun 2021 • Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

To close this gap, we take a step toward persistent exploration in continuous space through policy parametrizations defined by heavier-tailed distributions, characterized by a tail-index parameter $\alpha$, which increase the likelihood of jumping in state space.

Continuous Control, Decision Making

MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

no code implementations • 29 May 2021 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".

Multi-agent Reinforcement Learning

Conservative Stochastic Optimization with Expectation Constraints

no code implementations • 13 Aug 2020 • Zeeshan Akhtar, Amrit Singh Bedi, Ketan Rajawat

In this work, we propose the FW-CSOA algorithm, which is not only projection-free but also achieves zero constraint violation with an $\mathcal{O}\left(T^{-1/4}\right)$ decay of the optimality gap.

Matrix Completion, Stochastic Optimization

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

no code implementations • NeurIPS 2020 • Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang

Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.

Reinforcement Learning (RL)

Efficient Large-Scale Gaussian Process Bandits by Believing only Informative Actions

no code implementations • L4DC 2020 • Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Alec Koppel

Experimentally, we observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms on various hyperparameter tuning tasks, suggesting the merits of managing the complexity of GPs in bandit settings.

Bayesian Optimization

Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding

no code implementations • 23 Mar 2020 • Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M. Sadler, Alec Koppel

Doing so permits us to precisely characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions depending on the compression parameter $\epsilon$ for both discrete and continuous action sets.

Bayesian Optimization, Decision Making

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

no code implementations • 27 Feb 2020 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.

Reinforcement Learning (RL)

Optimally Compressed Nonparametric Online Learning

no code implementations • 25 Sep 2019 • Alec Koppel, Amrit Singh Bedi, Ketan Rajawat, Brian M. Sadler

Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models.

Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony

no code implementations • 12 Sep 2019 • Amrit Singh Bedi, Alec Koppel, Ketan Rajawat, Brian M. Sadler

Prior works control dynamic regret growth only for linear models.

Meta-Learning

Adaptive Kernel Learning in Heterogeneous Networks

no code implementations • 1 Aug 2019 • Hrusikesha Pradhan, Amrit Singh Bedi, Alec Koppel, Ketan Rajawat

We consider learning in decentralized heterogeneous networks: agents seek to minimize a convex functional that aggregates data across the network, while only having access to their local data streams.

Online Learning over Dynamic Graphs via Distributed Proximal Gradient Algorithm

no code implementations • 16 May 2019 • Rishabh Dixit, Amrit Singh Bedi, Ketan Rajawat

The empirical performance of the proposed algorithm is tested on the distributed dynamic sparse recovery problem, where it is shown to incur a dynamic regret that is close to that of the centralized algorithm.
