no code implementations • 22 Mar 2024 • Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making.
no code implementations • 29 Feb 2024 • Kate Donahue, Nicole Immorlica, Meena Jagadeesan, Brendan Lucier, Aleksandrs Slivkins
To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective.
no code implementations • 20 Feb 2024 • Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur
We study "incentivized exploration" (IE) in social learning problems where the principal (a recommendation algorithm) can leverage information asymmetry to incentivize sequentially-arriving agents to take exploratory actions.
no code implementations • 13 Dec 2023 • Seyed A. Esmaeili, Suho Shin, Aleksandrs Slivkins
We identify a class of MAB algorithms, which we call performance-incentivizing, that satisfy a collection of properties, and we show that these algorithms lead to mechanisms that incentivize top-level performance at equilibrium and are robust under any strategy profile.
no code implementations • 29 Nov 2023 • Keegan Harris, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins
After a fixed number of queries, the sender commits to a messaging policy and the receiver takes the action that maximizes her expected utility given the message she receives.
no code implementations • 13 Jun 2023 • Lequn Wang, Akshay Krishnamurthy, Aleksandrs Slivkins
We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions.
no code implementations • 15 Feb 2023 • Kiarash Banihashem, Mohammadtaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins
We study social learning dynamics motivated by reviews on online platforms.
no code implementations • 30 Jan 2023 • Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang
We study a game between autobidding algorithms that compete in an online advertising platform.
no code implementations • 14 Nov 2022 • Aleksandrs Slivkins, Karthik Abinav Sankararaman, Dylan J. Foster
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
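As a schematic illustration of this setting (notation introduced here for exposition, not taken from the paper): over $T$ rounds the algorithm seeks to maximize total reward $\sum_{t=1}^{T} r_t$ subject to $A \sum_{t=1}^{T} c_t \le b$, where $c_t \in [0,1]^d$ is the vector of resources consumed in round $t$, and the matrix $A$ and vector $b$ encode the linear constraints on total consumption.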
no code implementations • 1 Jun 2022 • Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, Zhiwei Steven Wu
The users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
no code implementations • 27 May 2022 • Ian Ball, James Bono, Justin Grana, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins
We develop a model of content filtering as a game between the filter and the content consumer, where the latter incurs information costs for examining the content.
no code implementations • NeurIPS 2021 • Karthik Abinav Sankararaman, Aleksandrs Slivkins
Third, we provide a "generalreduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits.
no code implementations • 28 Oct 2021 • Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Amit Sharma, Aleksandrs Slivkins
We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies.
no code implementations • 28 Feb 2021 • Max Simchowitz, Aleksandrs Slivkins
How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$?
no code implementations • 20 Jul 2020 • Guy Aridor, Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu
Users arrive one by one and choose between the two firms, so that each firm makes progress on its bandit problem only if it is chosen.
no code implementations • 22 Jun 2020 • Chara Podimata, Aleksandrs Slivkins
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
1 code implementation • NeurIPS 2020 • Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.
1 code implementation • NeurIPS 2020 • Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We propose an algorithm for tabular episodic reinforcement learning with constraints.
no code implementations • 19 May 2020 • Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future.
no code implementations • 3 Feb 2020 • Mark Sellke, Aleksandrs Slivkins
The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected.
no code implementations • 1 Feb 2020 • Karthik Abinav Sankararaman, Aleksandrs Slivkins
Third, we provide a general "reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits.
no code implementations • 20 Nov 2019 • Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits.
1 code implementation • 15 Apr 2019 • Aleksandrs Slivkins
This book provides a more introductory, textbook-like treatment of the subject.
no code implementations • 19 Feb 2019 • Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu
We consider Bayesian Exploration: a simple model in which the recommendation system (the "principal") controls the information flow to the users (the "agents") and strives to incentivize exploration via information asymmetry.
no code implementations • 14 Feb 2019 • Guy Aridor, Kevin Liu, Aleksandrs Slivkins, Zhiwei Steven Wu
We empirically study the interplay between exploration and competition.
no code implementations • 5 Feb 2019 • Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
We study contextual bandit learning with an abstract policy class and continuous action space.
no code implementations • 28 Nov 2018 • Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins
We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work.
no code implementations • 14 Nov 2018 • Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu
We propose and design recommendation systems that incentivize efficient exploration.
no code implementations • 1 Jun 2018 • Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu
Returning to group-level effects, we show that under the same conditions, negative group externalities essentially vanish under the greedy algorithm.
no code implementations • 23 May 2017 • Karthik Abinav Sankararaman, Aleksandrs Slivkins
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits.
no code implementations • 27 Feb 2017 • Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu
Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information.
no code implementations • 19 Jul 2016 • Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, Zhiwei Steven Wu
We are able to apply this technique to the setting of unit-demand buyers, despite the fact that in that setting the goods are not divisible and the natural fractional relaxation of a unit-demand valuation is not strongly concave.
no code implementations • 24 Feb 2016 • Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, Zhiwei Steven Wu
As a key technical tool, we introduce the concept of explorable actions, the actions which some incentive-compatible policy can recommend with non-zero probability.
no code implementations • 23 Feb 2015 • Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi
The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space.
no code implementations • 1 Nov 2014 • Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford, Aleksandrs Slivkins
In this paper we investigate how to devise better stopping rules given such quality scores.
no code implementations • 12 May 2014 • Chien-Ju Ho, Aleksandrs Slivkins, Jennifer Wortman Vaughan
In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks.
no code implementations • 27 Feb 2014 • Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins
We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.
no code implementations • 4 Dec 2013 • Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal
In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.
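As a minimal illustration of the Lipschitz bandit setting (a simple uniform-discretization baseline, not the adaptive algorithms studied in the paper; the arm space, grid width, and reward curve below are illustrative assumptions): discretize the metric space into a fixed grid of arms and run standard UCB1 on the resulting finite problem.

```python
import numpy as np

# Uniform discretization of the arm space [0, 1] followed by UCB1 on the grid.
# A baseline sketch only; adaptive algorithms would refine the grid near the optimum.
def lipschitz_bandit_ucb(reward_fn, horizon=10_000, n_arms=50, seed=0):
    rng = np.random.default_rng(seed)
    arms = np.linspace(0.0, 1.0, n_arms)         # fixed grid over the metric space
    counts, means = np.zeros(n_arms), np.zeros(n_arms)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            i = t - 1                            # play each arm once
        else:
            i = int(np.argmax(means + np.sqrt(2.0 * np.log(t) / counts)))
        r = float(np.clip(reward_fn(arms[i]) + 0.1 * rng.standard_normal(), 0.0, 1.0))
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # running average of observed payoffs
        total += r
    return total

# Example: a 1-Lipschitz expected-payoff curve peaked at x = 0.7.
print(lipschitz_bandit_ucb(lambda x: 0.9 - abs(x - 0.7)))
```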
no code implementations • 1 Jun 2013 • Aleksandrs Slivkins
We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities).
no code implementations • 11 May 2013 • Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins
As one example of a concrete application, we consider the problem of dynamic posted pricing with limited supply and obtain the first algorithm whose regret, with respect to the optimal dynamic policy, is sublinear in the supply.
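A minimal sketch of posted pricing with limited supply (a plain UCB-over-prices baseline, not the algorithm from the paper; the price grid and buyer-value distribution are illustrative assumptions): offer a posted price each round, learn the expected revenue at each price, and stop when the supply runs out.

```python
import numpy as np

# Posted pricing with limited supply: UCB over a fixed price grid, halting
# once the inventory is exhausted. A baseline sketch only.
def posted_pricing(horizon=5_000, supply=500, seed=0):
    rng = np.random.default_rng(seed)
    prices = np.linspace(0.1, 1.0, 10)
    k = len(prices)
    counts, mean_rev = np.zeros(k), np.zeros(k)
    revenue, remaining = 0.0, supply
    for t in range(1, horizon + 1):
        if remaining == 0:
            break                                # out of supply: no further sales
        if t <= k:
            i = t - 1                            # offer each price once
        else:
            i = int(np.argmax(mean_rev + np.sqrt(2.0 * np.log(t) / counts)))
        value = rng.uniform()                    # illustrative buyer value, unknown to the seller
        sale = value >= prices[i]
        r = prices[i] if sale else 0.0
        counts[i] += 1
        mean_rev[i] += (r - mean_rev[i]) / counts[i]
        revenue += r
        remaining -= int(sale)
    return revenue

print(posted_pricing())
```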
no code implementations • 13 Feb 2013 • Ittai Abraham, Omar Alonso, Vasilis Kandylas, Aleksandrs Slivkins
This model is related to, but technically different from the well-known multi-armed bandit problem.
no code implementations • NeurIPS 2011 • Aleksandrs Slivkins
For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm.
no code implementations • 20 Aug 2011 • Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg, Aleksandrs Slivkins
The performance guarantee for the same mechanism can be improved to $O(\sqrt{k} \log n)$, with a distribution-dependent constant, if $k/n$ is sufficiently small.
no code implementations • NeurIPS 2009 • Umar Syed, Aleksandrs Slivkins, Nina Mishra
Search engines today present results that are often oblivious to recent shifts in intent.
no code implementations • 23 Jul 2009 • Aleksandrs Slivkins
A particularly simple way to represent similarity information in the contextual bandit setting is via a "similarity distance" between the context-arm pairs which gives an upper bound on the difference between the respective expected payoffs.
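In symbols (notation introduced here for exposition): writing $\mu(x, a)$ for the expected payoff of arm $a$ in context $x$, the similarity distance $\mathcal{D}$ satisfies $|\mu(x, a) - \mu(x', a')| \le \mathcal{D}\big((x, a), (x', a')\big)$ for all context-arm pairs $(x, a)$ and $(x', a')$.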
no code implementations • 12 Dec 2008 • Moshe Babaioff, Yogeshwer Sharma, Aleksandrs Slivkins
We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful.
2 code implementations • 29 Sep 2008 • Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal
In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.