Search Results for author: Botao Hao

Found 27 papers, 3 papers with code

Efficient Exploration for LLMs

no code implementations • 1 Feb 2024 • Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models.

Efficient Exploration Thompson Sampling

Paper
Add Code

Efficient Online Learning with Offline Datasets for Infinite Horizon MDPs: A Bayesian Approach

no code implementations • 17 Oct 2023 • Dengwang Tang, Rahul Jain, Botao Hao, Zheng Wen

In this paper, we study the problem of efficient online reinforcement learning in the infinite horizon setting when there is an offline dataset to start with.

Imitation Learning

Paper
Add Code

Sequential Best-Arm Identification with Application to Brain-Computer Interface

no code implementations • 17 May 2023 • Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li

A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system.

EEG ERP +2

Paper
Add Code

Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale

no code implementations • 20 Mar 2023 • Botao Hao, Rahul Jain, Dengwang Tang, Zheng Wen

We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses the offline dataset, and information about the expert's behavioral policy used to generate the offline dataset.

Imitation Learning reinforcement-learning +1

Paper
Add Code

Leveraging Demonstrations to Improve Online Learning: Quality Matters

no code implementations • 7 Feb 2023 • Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level.

Thompson Sampling

Paper
Add Code

Sample Efficient Deep Reinforcement Learning via Local Planning

no code implementations • 29 Jan 2023 • Dong Yin, Sridhar Thiagarajan, Nevena Lazic, Nived Rajaraman, Botao Hao, Csaba Szepesvari

One useful property of simulators is that it is typically easy to reset the environment to a previously observed state.

Montezuma's Revenge reinforcement-learning +1

Paper
Add Code

Regret Bounds for Information-Directed Reinforcement Learning

no code implementations • 9 Jun 2022 • Botao Hao, Tor Lattimore

Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL).

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Contextual Information-Directed Sampling

no code implementations • 22 May 2022 • Botao Hao, Tor Lattimore, Chao Qin

Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm.

Multi-Armed Bandits Reinforcement Learning (RL)

Paper
Add Code

Interacting Contour Stochastic Gradient Langevin Dynamics

1 code implementation • ICLR 2022 • Wei Deng, Siqi Liang, Botao Hao, Guang Lin, Faming Liang

We propose an interacting contour stochastic gradient Langevin dynamics (ICSGLD) sampler, an embarrassingly parallel multiple-chain contour stochastic gradient Langevin dynamics (CSGLD) sampler with efficient interactions.

Paper
Code

The Neural Testbed: Evaluating Joint Predictions

1 code implementation • 9 Oct 2021 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

Predictive distributions quantify uncertainties ignored by point estimates.

188

Paper
Code

Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?

no code implementations • 29 Sep 2021 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Dieterich Lawson, Brendan O'Donoghue, Botao Hao, Benjamin Van Roy

This paper introduces \textit{The Neural Testbed}, which provides tools for the systematic evaluation of agents that generate such predictions.

Uncertainty Quantification

Paper
Add Code

Efficient Local Planning with Linear Function Approximation

no code implementations • 12 Aug 2021 • Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space.

Paper
Add Code

Bandit Phase Retrieval

no code implementations • NeurIPS 2021 • Tor Lattimore, Botao Hao

We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, \theta_\star\rangle^2$ where $\theta_\star \in \mathbb R^d$ is an unknown parameter vector.

Retrieval

Paper
Add Code

Information Directed Sampling for Sparse Linear Bandits

no code implementations • NeurIPS 2021 • Botao Hao, Tor Lattimore, Wei Deng

Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure.

Decision Making

Paper
Add Code

Optimization Issues in KL-Constrained Approximate Policy Iteration

no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.

Paper
Add Code

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

no code implementations • 6 Feb 2021 • Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical property is less understood.

Off-policy evaluation

Paper
Add Code

Online Sparse Reinforcement Learning

no code implementations • 8 Nov 2020 • Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

High-Dimensional Sparse Linear Bandits

no code implementations • NeurIPS 2020 • Botao Hao, Tor Lattimore, Mengdi Wang

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising.

Vocal Bursts Intensity Prediction

Paper
Add Code

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

no code implementations • 8 Nov 2020 • Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

To evaluate a new target policy, we analyze a Lasso fitted Q-evaluation method and establish a finite-sample error bound that has no polynomial dependence on the ambient dimension.

feature selection Model Selection +2

Paper
Add Code

Stochastic Low-rank Tensor Bandits for Multi-dimensional Online Decision Making

no code implementations • 31 Jul 2020 • Jie zhou, Botao Hao, Zheng Wen, Jingfei Zhang, Will Wei Sun

We consider two settings, tensor bandits without context and tensor bandits with context.

Decision Making Marketing

Paper
Add Code

Residual Bootstrap Exploration for Bandit Algorithms

no code implementations • 19 Feb 2020 • Chi-Hua Wang, Yang Yu, Botao Hao, Guang Cheng

In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (\texttt{ReBoot}).

Computational Efficiency Multi-Armed Bandits +1

Paper
Add Code

Adaptive Approximate Policy Iteration

1 code implementation • 8 Feb 2020 • Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation.

Paper
Code

Adaptive Exploration in Linear Contextual Bandit

no code implementations • 15 Oct 2019 • Botao Hao, Tor Lattimore, Csaba Szepesvari

Contextual bandits serve as a fundamental model for many sequential decision making tasks.

Decision Making Multi-Armed Bandits

Paper
Add Code

Bootstrapping Upper Confidence Bound

no code implementations • NeurIPS 2019 • Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback.

Decision Making Multi-Armed Bandits

Paper
Add Code

Sparse Tensor Additive Regression

no code implementations • 31 Mar 2019 • Botao Hao, Boxiang Wang, Pengyuan Wang, Jingfei Zhang, Jian Yang, Will Wei Sun

Tensors are becoming prevalent in modern applications such as medical imaging and digital marketing.

Click-Through Rate Prediction Marketing +1

Paper
Add Code

Sparse and Low-rank Tensor Estimation via Cubic Sketchings

no code implementations • 29 Jan 2018 • Botao Hao, Anru Zhang, Guang Cheng

In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings.

regression Tensor Decomposition

Paper
Add Code

Simultaneous Clustering and Estimation of Heterogeneous Graphical Models

no code implementations • 28 Nov 2016 • Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng

We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations.

Clustering Sparse Learning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.