no code implementations • 18 Apr 2024 • Marius Memmel, Andrew Wagenmaker, Chuning Zhu, Patrick Yin, Dieter Fox, Abhishek Gupta
In this work, we propose a learning system that can leverage a small amount of real-world data to autonomously refine a simulation model and then plan an accurate control strategy that can be deployed in the real world.
no code implementations • 13 Dec 2023 • Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson
In this work, we address the challenges of reducing bias and improving accuracy in data-scarce environments, where the cost of collecting labeled data prohibits the use of large, labeled datasets.
no code implementations • 24 Apr 2023 • Andrew Wagenmaker, Dylan J. Foster
We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance.
no code implementations • 9 Nov 2022 • Andrew Wagenmaker, Aldo Pacchiano
Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy?
no code implementations • 6 Jul 2022 • Andrew Wagenmaker, Kevin Jamieson
While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning.
no code implementations • 22 Jun 2022 • Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson
To our knowledge, our results are the first on best-arm identification in linear bandits with safety constraints.
no code implementations • 26 Jan 2022 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson
We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\widetilde{\mathcal{O}}(d^2 H^5/\epsilon^2)$.
no code implementations • 7 Dec 2021 • Andrew Wagenmaker, Yifang Chen, Max Simchowitz, Simon S. Du, Kevin Jamieson
Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some measure of the performance of the optimal policy on a given instance -- is a core question in sequential decision-making.
1 code implementation • 23 Nov 2021 • Zhenlin Wang, Andrew Wagenmaker, Kevin Jamieson
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real world, safety constraints often must be met while learning.
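The unconstrained version of the problem this paper extends can be illustrated with a standard successive-elimination sketch: repeatedly sample all surviving arms and discard any arm whose confidence interval falls below the current leader's. The arm means, noise level, and confidence radius below are illustrative, not taken from the paper, and the safety constraints that are its actual contribution are omitted.

```python
import math
import random

def successive_elimination(means, delta=0.05, seed=0):
    """Identify the best arm: sample every surviving arm each round and
    eliminate arms whose upper confidence bound drops below the leader's
    lower confidence bound."""
    rng = random.Random(seed)
    k = len(means)
    active = set(range(k))
    counts = [0] * k
    sums = [0.0] * k
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:
            sums[a] += means[a] + rng.gauss(0, 0.1)  # noisy reward pull
            counts[a] += 1
        # anytime Hoeffding-style confidence radius
        rad = math.sqrt(2 * math.log(4 * k * t * t / delta) / t)
        best = max(active, key=lambda a: sums[a] / counts[a])
        active = {a for a in active
                  if sums[a] / counts[a] + rad >= sums[best] / counts[best] - rad}
    return active.pop()

# Arm 2 has the highest mean, so it should be the one identified.
print(successive_elimination([0.1, 0.5, 0.9, 0.3]))
```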
no code implementations • 5 Aug 2021 • Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
We show this is not possible -- there exists a fundamental tradeoff between achieving low regret and identifying an $\epsilon$-optimal policy at the instance-optimal rate.
no code implementations • 10 Feb 2021 • Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson
Along the way, we establish that certainty equivalence decision making is instance- and task-optimal, and obtain the first algorithm for the linear quadratic regulator problem which is instance-optimal.
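Certainty equivalence means fitting a model to data and then controlling as if the estimate were exact. A minimal sketch of the control half for the linear quadratic regulator, with hypothetical dynamics standing in for a fitted model (`A_hat`, `B_hat`); the Riccati equation is solved by simple fixed-point iteration rather than any method from the paper:

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Solve the discrete-time Riccati equation by value iteration and
    return the feedback gain K, so the control law is u_t = -K x_t."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Hypothetical estimated dynamics x_{t+1} = A x_t + B u_t (a double integrator)
A_hat = np.array([[1.0, 0.1], [0.0, 1.0]])
B_hat = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K = lqr_gain(A_hat, B_hat, Q, R)
# Certainty-equivalence control stabilizes the model: spectral radius < 1
rho = max(abs(np.linalg.eigvals(A_hat - B_hat @ K)))
print(rho < 1.0)
```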
no code implementations • 1 Nov 2020 • Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson
In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits.
no code implementations • 2 Feb 2020 • Andrew Wagenmaker, Kevin Jamieson
We propose an algorithm to actively estimate the parameters of a linear dynamical system.
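A passive baseline for this estimation problem is ordinary least squares on a trajectory driven by random inputs; the paper's contribution is choosing inputs *actively*, which this sketch deliberately omits. The system matrices, noise level, and horizon below are made up for illustration.

```python
import numpy as np

def estimate_dynamics(A, B, T=2000, noise=0.01, seed=0):
    """Roll out x_{t+1} = A x_t + B u_t + w_t under random excitation,
    then recover (A, B) jointly by ordinary least squares."""
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], B.shape[1]
    X, Z = [], []  # next states, regressors [x; u]
    x = np.zeros(n)
    for _ in range(T):
        u = rng.normal(size=m)                      # random (not active) input
        x_next = A @ x + B @ u + noise * rng.normal(size=n)
        Z.append(np.concatenate([x, u]))
        X.append(x_next)
        x = x_next
    Theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X), rcond=None)
    Theta = Theta.T                                 # rows: state; cols: [x; u]
    return Theta[:, :n], Theta[:, n:]               # (A_hat, B_hat)

A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
A_hat, B_hat = estimate_dynamics(A, B)
print(np.allclose(A_hat, A, atol=1e-2), np.allclose(B_hat, B, atol=1e-2))
```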