no code implementations • 22 Mar 2024 • Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making.
no code implementations • 29 Feb 2024 • Kate Donahue, Nicole Immorlica, Meena Jagadeesan, Brendan Lucier, Aleksandrs Slivkins
To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective.
no code implementations • 20 Feb 2024 • Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur
We study "incentivized exploration" (IE) in social learning problems where the principal (a recommendation algorithm) can leverage information asymmetry to incentivize sequentially-arriving agents to take exploratory actions.
no code implementations • 13 Dec 2023 • Seyed A. Esmaeili, Suho Shin, Aleksandrs Slivkins
We identify a class of MAB algorithms, which we call performance-incentivizing, that satisfy a collection of properties, and we show that these algorithms lead to mechanisms that incentivize top-level performance at equilibrium and are robust under any strategy profile.
no code implementations • 29 Nov 2023 • Keegan Harris, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins
After a fixed number of queries, the sender commits to a messaging policy and the receiver takes the action that maximizes her expected utility given the message she receives.
no code implementations • 13 Jun 2023 • Lequn Wang, Akshay Krishnamurthy, Aleksandrs Slivkins
We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions.
no code implementations • 15 Feb 2023 • Kiarash Banihashem, Mohammadtaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins
We study social learning dynamics motivated by reviews on online platforms.
no code implementations • 30 Jan 2023 • Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang
We study a game between autobidding algorithms that compete in an online advertising platform.
no code implementations • 14 Nov 2022 • Aleksandrs Slivkins, Karthik Abinav Sankararaman, Dylan J. Foster
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
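As a schematic illustration of this setting (notation introduced here for exposition, not taken from the paper): over $T$ rounds the algorithm seeks to maximize total reward $\sum_{t=1}^{T} r_t$ subject to $A \sum_{t=1}^{T} c_t \le b$, where $c_t \in [0,1]^d$ is the vector of resources consumed in round $t$, and the matrix $A$ and vector $b$ encode the linear constraints on total consumption.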
no code implementations • 1 Jun 2022 • Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, Zhiwei Steven Wu
The users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
no code implementations • 27 May 2022 • Ian Ball, James Bono, Justin Grana, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins
We develop a model of content filtering as a game between the filter and the content consumer, where the latter incurs information costs for examining the content.
no code implementations • NeurIPS 2021 • Karthik Abinav Sankararaman, Aleksandrs Slivkins
Third, we provide a "generalreduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits.
no code implementations • 28 Oct 2021 • Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Amit Sharma, Aleksandrs Slivkins
We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies.
no code implementations • 28 Feb 2021 • Max Simchowitz, Aleksandrs Slivkins
How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$?
no code implementations • 20 Jul 2020 • Guy Aridor, Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu
Users arrive one by one and choose between the two firms, so that each firm makes progress on its bandit problem only if it is chosen.
no code implementations • 22 Jun 2020 • Chara Podimata, Aleksandrs Slivkins
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
1 code implementation • NeurIPS 2020 • Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.
1 code implementation • NeurIPS 2020 • Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We propose an algorithm for tabular episodic reinforcement learning with constraints.
no code implementations • 19 May 2020 • Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future.
no code implementations • 3 Feb 2020 • Mark Sellke, Aleksandrs Slivkins
The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected.
no code implementations • 1 Feb 2020 • Karthik Abinav Sankararaman, Aleksandrs Slivkins
Third, we provide a general "reduction" from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits.
no code implementations • 20 Nov 2019 • Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits.
1 code implementation • 15 Apr 2019 • Aleksandrs Slivkins
This book provides a more introductory, textbook-like treatment of the subject.
no code implementations • 19 Feb 2019 • Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu
We consider Bayesian Exploration: a simple model in which the recommendation system (the "principal") controls the information flow to the users (the "agents") and strives to incentivize exploration via information asymmetry.
no code implementations • 14 Feb 2019 • Guy Aridor, Kevin Liu, Aleksandrs Slivkins, Zhiwei Steven Wu
We empirically study the interplay between exploration and competition.
no code implementations • 5 Feb 2019 • Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
We study contextual bandit learning with an abstract policy class and continuous action space.
no code implementations • 28 Nov 2018 • Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins
We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work.
no code implementations • 14 Nov 2018 • Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu
We propose and design recommendation systems that incentivize efficient exploration.
no code implementations • 1 Jun 2018 • Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu
Returning to group-level effects, we show that under the same conditions, negative group externalities essentially vanish under the greedy algorithm.
no code implementations • 23 May 2017 • Karthik Abinav Sankararaman, Aleksandrs Slivkins
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits.
no code implementations • 27 Feb 2017 • Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu
Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information.
no code implementations • 19 Jul 2016 • Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, Zhiwei Steven Wu
We are able to apply this technique to the setting of unit-demand buyers, despite the fact that in that setting the goods are not divisible and the natural fractional relaxation of a unit-demand valuation is not strongly concave.
no code implementations • 24 Feb 2016 • Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, Zhiwei Steven Wu
As a key technical tool, we introduce the concept of explorable actions, the actions which some incentive-compatible policy can recommend with non-zero probability.
no code implementations • 23 Feb 2015 • Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi
The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space.
no code implementations • 1 Nov 2014 • Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford, Aleksandrs Slivkins
In this paper we investigate how to devise better stopping rules given such quality scores.
no code implementations • 12 May 2014 • Chien-Ju Ho, Aleksandrs Slivkins, Jennifer Wortman Vaughan
In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks.
no code implementations • 27 Feb 2014 • Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins
We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.
no code implementations • 4 Dec 2013 • Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal
In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.
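As a minimal illustration of the Lipschitz bandit setting (a simple uniform-discretization baseline, not the adaptive algorithms studied in the paper; the arm space, grid width, and reward curve below are illustrative assumptions): discretize the metric space into a fixed grid of arms and run standard UCB1 on the resulting finite problem.

```python
import numpy as np

# Uniform discretization of the arm space [0, 1] followed by UCB1 on the grid.
# A baseline sketch only; adaptive algorithms would refine the grid near the optimum.
def lipschitz_bandit_ucb(reward_fn, horizon=10_000, n_arms=50, seed=0):
    rng = np.random.default_rng(seed)
    arms = np.linspace(0.0, 1.0, n_arms)         # fixed grid over the metric space
    counts, means = np.zeros(n_arms), np.zeros(n_arms)
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            i = t - 1                            # play each arm once
        else:
            i = int(np.argmax(means + np.sqrt(2.0 * np.log(t) / counts)))
        r = float(np.clip(reward_fn(arms[i]) + 0.1 * rng.standard_normal(), 0.0, 1.0))
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # running average of observed payoffs
        total += r
    return total

# Example: a 1-Lipschitz expected-payoff curve peaked at x = 0.7.
print(lipschitz_bandit_ucb(lambda x: 0.9 - abs(x - 0.7)))
```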
no code implementations • 1 Jun 2013 • Aleksandrs Slivkins
We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities).
no code implementations • 11 May 2013 • Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins
As one example of a concrete application, we consider the problem of dynamic posted pricing with limited supply and obtain the first algorithm whose regret, with respect to the optimal dynamic policy, is sublinear in the supply.
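A minimal sketch of posted pricing with limited supply (a plain UCB-over-prices baseline, not the algorithm from the paper; the price grid and buyer-value distribution are illustrative assumptions): offer a posted price each round, learn the expected revenue at each price, and stop when the supply runs out.

```python
import numpy as np

# Posted pricing with limited supply: UCB over a fixed price grid, halting
# once the inventory is exhausted. A baseline sketch only.
def posted_pricing(horizon=5_000, supply=500, seed=0):
    rng = np.random.default_rng(seed)
    prices = np.linspace(0.1, 1.0, 10)
    k = len(prices)
    counts, mean_rev = np.zeros(k), np.zeros(k)
    revenue, remaining = 0.0, supply
    for t in range(1, horizon + 1):
        if remaining == 0:
            break                                # out of supply: no further sales
        if t <= k:
            i = t - 1                            # offer each price once
        else:
            i = int(np.argmax(mean_rev + np.sqrt(2.0 * np.log(t) / counts)))
        value = rng.uniform()                    # illustrative buyer value, unknown to the seller
        sale = value >= prices[i]
        r = prices[i] if sale else 0.0
        counts[i] += 1
        mean_rev[i] += (r - mean_rev[i]) / counts[i]
        revenue += r
        remaining -= int(sale)
    return revenue

print(posted_pricing())
```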
no code implementations • 13 Feb 2013 • Ittai Abraham, Omar Alonso, Vasilis Kandylas, Aleksandrs Slivkins
This model is related to, but technically different from the well-known multi-armed bandit problem.
no code implementations • NeurIPS 2011 • Aleksandrs Slivkins
For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm.
no code implementations • 20 Aug 2011 • Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg, Aleksandrs Slivkins
The performance guarantee for the same mechanism can be improved to $O(\sqrt{k} \log n)$, with a distribution-dependent constant, if $k/n$ is sufficiently small.
no code implementations • NeurIPS 2009 • Umar Syed, Aleksandrs Slivkins, Nina Mishra
Search engines today present results that are often oblivious to recent shifts in intent.
no code implementations • 23 Jul 2009 • Aleksandrs Slivkins
A particularly simple way to represent similarity information in the contextual bandit setting is via a "similarity distance" between the context-arm pairs which gives an upper bound on the difference between the respective expected payoffs.
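In symbols (notation introduced here for exposition): writing $\mu(x, a)$ for the expected payoff of arm $a$ in context $x$, the similarity distance $\mathcal{D}$ satisfies $|\mu(x, a) - \mu(x', a')| \le \mathcal{D}\big((x, a), (x', a')\big)$ for all context-arm pairs $(x, a)$ and $(x', a')$.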
no code implementations • 12 Dec 2008 • Moshe Babaioff, Yogeshwer Sharma, Aleksandrs Slivkins
We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful.
2 code implementations • 29 Sep 2008 • Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal
In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric.