no code implementations • ICML 2020 • Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
We experiment with three structured bandit problems: cascading bandits, online learning to rank in the position-based model, and rank-1 bandits.
1 code implementation • 24 May 2024 • Matej Cief, Branislav Kveton, Michal Kompan
In this paper, we study the problem of estimator selection and hyper-parameter tuning in off-policy evaluation.
no code implementations • 22 Apr 2024 • Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
Learning of preference models from human feedback has been central to recent advances in artificial intelligence.
no code implementations • 12 Apr 2024 • Subhojyoti Mukherjee, Anusha Lalitha, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
no code implementations • 17 Jan 2024 • Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher
Since Adam was introduced, several novel adaptive optimizers for deep learning have been proposed.
no code implementations • 22 Dec 2023 • Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras, Branislav Kveton
The unique capabilities of Large Language Models (LLMs), such as natural language text generation, position them as strong candidates for providing explanations for recommendations.
1 code implementation • 30 Oct 2023 • Ziqian Lin, Hao Ding, Nghia Trong Hoang, Branislav Kveton, Anoop Deoras, Hao Wang
In particular, we propose a generic recommender that captures universal interaction patterns by training on user-item interaction data extracted from different domains, and can then be quickly adapted to improve few-shot learning performance in unseen new domains with limited data.
no code implementations • 28 Oct 2023 • Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu
The pessimistic estimator can be optimized by policy gradients and performs well in all of our experiments.
no code implementations • 23 Oct 2023 • Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton
We propose CODE, a bandit algorithm based on a Constrained Optimal DEsign, that is interpretable and maximally reduces the uncertainty.
no code implementations • 15 Jun 2023 • Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi
In a multi-armed bandit, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively.
no code implementations • 13 Jun 2023 • Anusha Lalitha, Kousha Kalantari, Yifei Ma, Anoop Deoras, Branislav Kveton
Our algorithms rely on non-uniform budget allocations among the arms where the arms with higher reward variances are pulled more often than those with lower variances.
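As a minimal sketch of such a non-uniform budget allocation (a simple variance-proportional rule for illustration, not necessarily the paper's exact allocation):

```python
import numpy as np

def variance_proportional_allocation(variances, budget):
    """Split a pull budget across arms in proportion to their reward variances.

    Arms with noisier rewards need more samples to estimate their means to the
    same accuracy, so they receive a larger share of the budget.
    (Illustrative allocation rule, not the paper's algorithm.)
    """
    variances = np.asarray(variances, dtype=float)
    shares = variances / variances.sum()
    pulls = np.floor(shares * budget).astype(int)
    pulls[np.argmax(shares)] += budget - pulls.sum()  # hand out the remainder
    return pulls

# Example: arm 0 is 4x noisier than arm 1, so it gets ~4x the pulls.
print(variance_proportional_allocation([4.0, 1.0], 100))
```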
no code implementations • 16 Mar 2023 • Aadirupa Saha, Branislav Kveton
We lay foundations for the Bayesian setting, which incorporates prior knowledge.
no code implementations • 3 Feb 2023 • Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song
Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty.
no code implementations • 1 Feb 2023 • Sanath Kumar Krishnamurthy, Shrey Modi, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Anshuka Rangi
We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step $h$ in dynamic programming (DP) algorithms.
no code implementations • 12 Jan 2023 • Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum
In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems.
no code implementations • 9 Dec 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
no code implementations • 15 Nov 2022 • Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton
We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget.
no code implementations • 26 Oct 2022 • Rong Zhu, Branislav Kveton
Our experiments show that RoLinTS is as statistically efficient as the classic methods when the misspecification is low, more robust when the misspecification is high, and significantly more computationally efficient than its naive implementation.
no code implementations • 27 Sep 2022 • Behnam Rahdari, Branislav Kveton, Peter Brusilovsky
Our analytical results show that the user can examine more items in the carousel click model than in a single ranked list, due to the structured way of browsing.
no code implementations • 8 Jun 2022 • Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton
We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them.
no code implementations • 6 Jun 2022 • Matej Cief, Branislav Kveton, Michal Kompan
Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy.
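A common starting point for off-policy learning and evaluation is the inverse propensity score (IPS) estimator; the following is a minimal sketch with made-up logged data, not the paper's method:

```python
import numpy as np

def ips_value(target_probs, logging_probs, rewards):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    Each logged reward is reweighted by how much more (or less) likely the
    target policy is to take the logged action than the logging policy was.
    Unbiased when the logging propensities are correct and nonzero wherever
    the target policy has support.
    """
    weights = np.asarray(target_probs) / np.asarray(logging_probs)
    return float(np.mean(weights * np.asarray(rewards)))

# Logged data: propensities of the logging policy, the target policy's
# probabilities for the same (context, action) pairs, and observed rewards.
logging_probs = [0.5, 0.25, 0.25, 0.5]
target_probs  = [1.0, 0.0, 0.0, 1.0]   # target policy is deterministic
rewards       = [1.0, 0.0, 1.0, 1.0]
print(ips_value(target_probs, logging_probs, rewards))  # -> 1.0
```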
1 code implementation • 30 May 2022 • Imad Aouali, Branislav Kveton, Sumeet Katariya
The regret bound has two terms, one for learning the action parameters and the other for learning the shared effect parameters.
no code implementations • 26 Feb 2022 • Runzhe Wan, Branislav Kveton, Rui Song
High-quality data plays a central role in ensuring the accuracy of policy evaluation.
1 code implementation • 25 Feb 2022 • MohammadJavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya
The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over $m$ bandit tasks with horizon $n$ is merely $\tilde{O}(m / \sqrt{n})$.
no code implementations • 3 Feb 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.
no code implementations • 24 Jan 2022 • Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier
This problem has been studied extensively in the setting of known objective functions.
no code implementations • 12 Nov 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh
We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit.
no code implementations • 8 Nov 2021 • Ruihao Zhu, Branislav Kveton
Specifically, our goal is to develop a logging policy that efficiently explores different actions to elicit information while achieving competitive reward with a baseline production policy.
no code implementations • 16 Sep 2021 • Muhammad Jehangir Amjad, Christophe Diot, Dimitris Konomis, Branislav Kveton, Augustin Soule, Xiaolong Yang
We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget.
no code implementations • NeurIPS 2021 • Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári
We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.
no code implementations • 23 Jun 2021 • Rong Zhu, Branislav Kveton
It is well known that side information, such as the prior distribution of arm means in Thompson sampling, can improve the statistical efficiency of the bandit algorithm.
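As a minimal illustration of how a prior enters Thompson sampling, here is a standard Beta-Bernoulli sketch (not this paper's estimator); the arm means and horizon are made up:

```python
import numpy as np

def thompson_step(alpha, beta, rng):
    """One round of Beta-Bernoulli Thompson sampling.

    Sample a mean for each arm from its Beta posterior and pull the argmax.
    The prior (initial alpha, beta) is the side information referred to
    above: a good prior concentrates exploration early.
    """
    samples = rng.beta(alpha, beta)
    return int(np.argmax(samples))

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.7])
alpha = np.ones(2)   # Beta(1, 1) = uniform prior on each arm's mean
beta = np.ones(2)
for _ in range(500):
    arm = thompson_step(alpha, beta, rng)
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward       # posterior update on success
    beta[arm] += 1 - reward    # ... and on failure
print(alpha / (alpha + beta))  # posterior means; arm 1 should look best
```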
no code implementations • 10 Jun 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier
We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.
no code implementations • 9 Jun 2021 • Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh
We analyze our algorithm in linear and generalized linear models (GLMs), and propose a practical implementation based on a G-optimal design.
no code implementations • 7 Mar 2021 • Nan Wang, Branislav Kveton, Maryam Karimzadehgan
We propose a bandit algorithm that explores purely by randomizing its past observations.
no code implementations • 11 Feb 2021 • Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-Wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari
Efficient exploration in bandits is a fundamental online learning problem.
no code implementations • NeurIPS 2020 • Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer
Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution $\mathcal{P}$. In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$. Our approach is a form of meta-learning and exploits properties of $\mathcal{P}$ without making strong assumptions about its form.
no code implementations • 1 Dec 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier
The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.
no code implementations • 9 Jul 2020 • Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel
We propose a novel framework for structured bandits, which we call an influence diagram bandit.
no code implementations • 15 Jun 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed
This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.
no code implementations • NeurIPS 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
no code implementations • 9 Jun 2020 • Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier
Most bandit policies are designed either to minimize regret in any problem instance, making very few assumptions about the underlying environment, or to minimize it in a Bayesian sense, assuming a prior distribution over environment parameters.
1 code implementation • 4 Jun 2020 • Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton
We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations.
1 code implementation • 11 Oct 2019 • Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton
We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and exploitation.
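A minimal sketch of the idea behind $\tt RandUCB$: a UCB-style index whose confidence width is scaled by a shared random multiplier, so exploration is randomized as in TS. The multiplier grid and its distribution below are illustrative choices, not the paper's:

```python
import numpy as np

def rand_ucb_index(means, pulls, t, rng):
    """Randomized UCB indices in the spirit of RandUCB.

    A standard UCB index is mean + width; here the width is multiplied by a
    random scaling shared across arms, so near-ties are broken
    stochastically. (Illustrative grid and distribution, not the paper's.)
    """
    widths = np.sqrt(2.0 * np.log(t) / pulls)
    z = rng.choice(np.linspace(0.0, 2.0, 11))  # random confidence scaling
    return means + z * widths

rng = np.random.default_rng(1)
# Two arms with equal pull counts: same width, so the better empirical
# mean always has the larger index, whatever z is drawn.
idx = rand_ucb_index(np.array([0.5, 0.4]), np.array([10.0, 10.0]), t=100, rng=rng)
```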
no code implementations • 21 Jun 2019 • Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier
The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.
no code implementations • 20 Apr 2019 • Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian
We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret.
no code implementations • 4 Apr 2019 • Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari
In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the average regret over problem instances sampled from a known distribution.
no code implementations • 21 Mar 2019 • Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier
We evaluate our algorithms empirically and show that they are practical.
no code implementations • 26 Feb 2019 • Branislav Kveton, Csaba Szepesvari, Mohammad Ghavamzadeh, Craig Boutilier
Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.
no code implementations • 13 Nov 2018 • Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore
Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards.
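A simplified sketch of that sampling step; the number and placement of pseudo rewards here are illustrative, not the paper's exact scheme:

```python
import numpy as np

def boot_mean(history, rng):
    """Mean reward in a non-parametric bootstrap sample of an arm's history.

    The history is padded with pseudo rewards of 0 and 1 (one pair per real
    observation here; the paper uses a tunable number), which keeps the
    bootstrap distribution from collapsing when the history is short.
    """
    padded = list(history) + [0.0, 1.0] * len(history)
    sample = rng.choice(padded, size=len(padded), replace=True)
    return float(np.mean(sample))

rng = np.random.default_rng(0)
# The agent pulls the arm whose bootstrap mean is highest this round.
histories = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
arm = int(np.argmax([boot_mean(h, rng) for h in histories]))
```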
no code implementations • 1 Nov 2018 • Prakhar Gupta, Gaurush Hiranandani, Harvineet Singh, Branislav Kveton, Zheng Wen, Iftikhar Ahamath Burhanuddin
We assume that the user examines the list of recommended items until attracted by some item, clicks it, and does not examine the rest of the items.
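That examination assumption is the cascade model; a minimal simulation sketch with made-up attraction probabilities:

```python
import numpy as np

def cascade_click(attractions, ranked_list, rng):
    """Simulate the cascade model: the user scans the list top-down, clicks
    the first attractive item, and stops. Returns the clicked position,
    or -1 if no item was clicked."""
    for pos, item in enumerate(ranked_list):
        if rng.random() < attractions[item]:
            return pos
    return -1

rng = np.random.default_rng(0)
pos = cascade_click([1.0, 1.0], [0, 1], rng)  # item 0 always attracts -> position 0
```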
no code implementations • 15 Jun 2018 • Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi
In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists.
no code implementations • NeurIPS 2018 • Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari
Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user.
no code implementations • 3 Jun 2018 • Sumeet Katariya, Branislav Kveton, Zheng Wen, Vamsi K. Potluru
In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, that is, one significantly worse than the default production action.
no code implementations • 24 May 2018 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, Yasin Abbasi-Yadkori
We investigate the use of bootstrapping in the bandit setting.
no code implementations • 27 Apr 2018 • Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen
We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.
no code implementations • 11 Feb 2018 • Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie
Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly pulling arms with unknown reward distributions.
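For concreteness, the classic UCB1 index policy for this setting can be sketched as follows (a standard baseline, not the algorithm proposed in this paper; the arm means are made up):

```python
import numpy as np

def ucb1(true_means, horizon, seed=0):
    """Minimal UCB1 on Bernoulli arms: pull the arm with the highest
    optimistic estimate  mean + sqrt(2 ln t / pulls)."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    pulls = np.zeros(k)
    sums = np.zeros(k)
    for t in range(1, horizon + 1):
        if t <= k:                      # pull each arm once to initialize
            arm = t - 1
        else:
            ucb = sums / pulls + np.sqrt(2.0 * np.log(t) / pulls)
            arm = int(np.argmax(ucb))
        reward = float(rng.random() < true_means[arm])
        pulls[arm] += 1
        sums[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.8], horizon=1000)  # the better arm is pulled far more often
```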
no code implementations • 13 Dec 2017 • Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan
Many problems in computer vision and recommender systems involve low-rank matrices.
no code implementations • 21 Sep 2017 • Tong Yu, Branislav Kveton, Zheng Wen, Hung Bui, Ole J. Mengshoel
We study the problem of learning a latent variable model from a stream of data.
no code implementations • 19 Mar 2017 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen
The probability that a user will click a search result depends both on its relevance and its position on the results page.
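That factorization of click probability into relevance and position is the position-based model; a minimal sketch with illustrative numbers:

```python
def pbm_click_prob(relevance, examination, item, position):
    """Position-based model: P(click) factors into the item's relevance
    and a position-dependent examination probability."""
    return relevance[item] * examination[position]

relevance = [0.8, 0.5]         # item attractiveness (made-up values)
examination = [1.0, 0.6, 0.3]  # users rarely look past the top positions
print(pbm_click_prob(relevance, examination, 1, 0))  # -> 0.5
print(pbm_click_prob(relevance, examination, 0, 2))  # 0.8 * 0.3, much lower
```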
no code implementations • ICML 2017 • Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh, Branislav Kveton, Csaba Szepesvari, Zheng Wen
In this work, we propose BatchRank, the first online learning to rank algorithm for a broad class of click models.
no code implementations • ICML 2017 • Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks Lakshmanan, Mark Schmidt
We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of "seed" users to expose the product to.
no code implementations • 25 Jan 2017 • Shi Zong, Branislav Kveton, Shlomo Berkovsky, Azin Ashkan, Nikos Vlassis, Zheng Wen
To the best of our knowledge, this is the first large-scale causal study of the impact of weather on TV watching patterns.
no code implementations • 10 Aug 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen
The main challenge of the problem is that the individual values of the row and column are unobserved.
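A minimal sketch of that rank-1 feedback structure, with hypothetical factor values for illustration:

```python
import numpy as np

def rank1_reward(u, v, i, j, rng):
    """Rank-1 bandit feedback: pulling (row i, column j) yields a Bernoulli
    reward with mean u[i] * v[j]. The factors u and v are never observed
    directly -- only their product, which is the challenge noted above."""
    return float(rng.random() < u[i] * v[j])

rng = np.random.default_rng(0)
r = rank1_reward([1.0, 0.5], [1.0, 0.2], 0, 0, rng)  # mean 1.0 -> reward 1.0
```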
1 code implementation • NeurIPS 2017 • Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani
Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.
1 code implementation • 17 Mar 2016 • Shi Zong, Hao Ni, Kenny Sung, Nan Rosemary Ke, Zheng Wen, Branislav Kveton
In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend $K$ most attractive items from a large set of $L$ candidate items.
1 code implementation • 9 Feb 2016 • Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen
This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.
no code implementations • 9 Feb 2016 • Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun
Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables.
no code implementations • NeurIPS 2015 • Jaya Kawale, Hung H. Bui, Branislav Kveton, Long Tran-Thanh, Sanjay Chawla
Matrix factorization (MF) collaborative filtering is an effective and widely used method in recommendation systems.
no code implementations • NeurIPS 2015 • Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari
The agent observes the index of the first chosen item whose weight is zero.
no code implementations • 10 Feb 2015 • Branislav Kveton, Csaba Szepesvari, Zheng Wen, Azin Ashkan
We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits.
no code implementations • 13 Nov 2014 • Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, Zheng Wen
The need for diversification of recommendation lists manifests in a number of recommender systems use cases.
no code implementations • 3 Oct 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
no code implementations • 28 Jun 2014 • Zheng Wen, Branislav Kveton, Azin Ashkan
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff.
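A minimal sketch of one interaction round under the semi-bandit feedback just described; the Gaussian noise, the mean weights, and the chosen subset are illustrative:

```python
import numpy as np

def semibandit_step(chosen, weight_means, rng):
    """One round of a stochastic combinatorial semi-bandit: the agent picks
    a feasible subset of ground items, observes a stochastic weight for
    each chosen item (semi-bandit feedback), and earns their sum."""
    weights = rng.normal(weight_means[chosen], 0.1)
    return weights, float(weights.sum())

rng = np.random.default_rng(0)
means = np.array([0.9, 0.1, 0.5, 0.4])
chosen = np.array([0, 2])  # e.g. a size-2 subset satisfying the constraints
weights, payoff = semibandit_step(chosen, means, rng)
```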
no code implementations • 30 May 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko
Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.
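A classic instance of such a greedy method is Kruskal's algorithm for the minimum spanning tree; a self-contained sketch:

```python
def kruskal_mst(n, edges):
    """Greedy minimum spanning tree (Kruskal): repeatedly add the cheapest
    edge that does not create a cycle. Greedy is optimal here because
    spanning trees form a matroid."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, picked = 0.0, []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:             # adding this edge keeps the forest acyclic
            parent[ru] = rv
            total += w
            picked.append((u, v))
    return total, picked

# Triangle with weights 1, 2, 3: the MST keeps the two cheapest edges.
cost, tree = kruskal_mst(3, [(1.0, 0, 1), (2.0, 1, 2), (3.0, 0, 2)])
print(cost)  # -> 3.0
```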
no code implementations • 20 Mar 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, Brian Eriksson
The objective in these problems is to learn how to maximize a modular function on a matroid.
no code implementations • NeurIPS 2013 • Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan
Maximization of submodular functions has wide applications in machine learning and artificial intelligence.