Alex Slivkins: publications

The Agentic Economy (May'25)
David M. Rothschild, Markus Mobius, Jake M. Hofman, Eleanor W. Dillon, Daniel G. Goldstein, Nicole Immorlica, Sonia Jaffe, Brendan Lucier, Aleksandrs Slivkins, Matthew Vogel
To appear in Communications of the ACM. We explore the implications of an "agentic economy", powered by generative AI, where assistant agents act on behalf of consumers and service agents represent businesses. These include the potential tension between agentic "walled gardens" and an open web of agents.
Generative AI as Economic Agents (Position Paper)
Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins
SIGecom Exchanges, June 2024. Traditionally, AI has been modeled within economics as a technology that impacts payoffs by reducing costs or refining information for human agents. Our position is that, in light of recent advances in generative AI, it is increasingly useful to model AI itself as an economic agent.
Exploration and Persuasion (a teachable survey), 2021.
Invited chapter in Online and Matching-Based Markets, Cambridge Univ. Press, 2023. How to incentivize self-interested agents to explore when they prefer to exploit? Incentivized Exploration (IE) addresses this issue via strategic communication between the platform and the agents. IE combines exploration from machine learning and persuasion from economics; we briefly introduce both, using IE as a common lens.
Incentivizing exploration via information asymmetry (informal article).
XRDS: Crossroads, The ACM Magazine for Students, Vol. 24(1), Fall 2017. As self-interested individuals make decisions over time, they utilize information revealed by others in the past and produce information that may help others in the future. So how can we incentivize exploration for the sake of the common good?
Multi-World Testing: A System for Experimentation, Learning, and Decision-Making (2015-2016).
Alekh Agarwal, Sarah Bird, Markus Cozowicz, Miro Dudik, Luong Hoang, John Langford, Lihong Li, Dan Melamed, Gal Oshri, Siddhartha Sen, Alex Slivkins. (The MWT project) Multi-World Testing (MWT) is a methodology for principled and efficient experimentation, learning, and decision-making. It is plausibly applicable to most services that interact with customers; in many scenarios, it is exponentially more efficient than the traditional A/B testing. The underlying research area is known as "contextual bandits" and "counterfactual evaluation".
Online Decision Making in Crowdsourcing Markets: Theoretical Challenges
Aleksandrs Slivkins and Jennifer Wortman Vaughan
SIGecom Exchanges, Dec 2013. In crowdsourcing markets, task requesters and the platform itself make repeated decisions about which tasks to assign to each worker at which price. Designing algorithms for making these decisions is a rich, emerging problem space. We survey this problem space, point out significant modeling difficulties, and identify directions to make progress.
Crowdsourcing Gold-HIT Creation at Scale: Challenges and Adaptive Exploration Approaches
I. Abraham, O. Alonso, V. Kandylas, R. Patel, S. Shelford, A. Slivkins, H. Wu
CrowdScale 2013: Workshop on Crowdsourcing at Scale Gold HITs --- Human Intelligence Tasks with known answers --- are commonly used to measure worker performance and data quality in industrial applications of crowdsourcing. We suggest adaptive exploration as a promising approach for automated, scalable Gold HIT creation. We substantiate this with initial experiments in a stylized model.

Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets (Oct'25)
G. Bansal, W. Hua, Z. Huang, A. Fourney, A. Swearngin, W. Epperson, T. Payne, J. Hofman, B. Lucier, C. Singh, M. Mobius, A. Nambi, A. Yadav, K. Gao, D. Rothschild, A. Slivkins, D. Goldstein, H. Mozannar, N. Immorlica, M. Murad, M. Vogel, S. Kambhampati, E. Horvitz, S. Amershi
- Blog post on Microsoft Research Blog.
A simulated environment for a "two-sided agentic marketplace", where "assistant agents" and "service agents" are LLM agents that represent, resp., consumers and competing businesses.
Should You Use Your Large Language Model to Explore or Exploit? (rev. Sep'25)
Keegan Harris and Aleksandrs Slivkins We evaluate the ability of current LLMs to help an agent facing exploration-exploitation tradeoff. On various contextual bandit tasks, we find that LLMs struggle to exploit (despite some in-context mitigations), but do help to explore. Namely, LLMs can fruitfully suggest suitable "candidate actions" to explore in large action spaces with inherent semantics.
Exploration and Incentivizing Participation in Randomized Trials (rev. Jan'26) [poster]
Yingkai Li and Aleksandrs Slivkins We incentivize patients' participation in a randomized controlled trial (RCT) by leveraging information asymmetry between the trial and the patients. We obtain an optimal solution in terms of the statistical performance of the trial, as expressed by an estimation error.
Algorithmic Persuasion Through Simulation (rev. Feb'25) [poster]
Keegan Harris, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins We study a Bayesian persuasion problem where an informed sender wants to persuade a receiver to take an action, such as purchasing a product. Motivated by customer surveys, user studies, and recent advances in generative AI, we allow the sender to learn more about the receiver by querying an oracle that simulates the receiver's behavior.
Bandit Social Learning: Exploration under Myopic Behavior (rev. Apr'25)
Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins
- Preliminary version (without results for >2 arms) in NeurIPS 2023 [poster],
  titled "Bandit Social Learning under Myopic Behavior".
- Best Poster Award at 15th Annual Machine Learning Symposum, New York, 2024.
Agents collectively face a simple multi-armed bandit problem. Each agent acts myopically, in a(ny) way consistent with confidence intervals. We derive stark exploration failures, and provide matching upper bounds on regret by analyzing "moderately optimistic" agents. In particular, we obtain the first general results on failure of the greedy algorithm in multi-armed bandits.
Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics
Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang (rev. Nov'24)
- Preliminary version in COLT 2024: Conf. on Learning Theory.
We design a bidding algorithm for repeated auctions with budget and ROI constraints. The algorithm attains vanishing regret for a range of auctions, and state-of-art welfare-like guarantees if used by all bidders. Importantly, we do not require convergence of the dynamics.
Budget Pacing in Repeated Auctions: Regret and Efficiency without Convergence (rev. Dec'25)
Jason Gaitonde, Yingkai Li, Bar Light, Brendan Lucier, and Aleksandrs Slivkins
Revise and resubmit in Operations Research.
- Preliminary version (without numerical experiments) published in ITCS 2023 (Conf. on Innovations in Theoretical Computer Science) [poster]
We study bidding algorithms for repeated auctions with budgets. We prove that a natural bidding algorithm attains good regret bounds for a wide range of auctions, and simultaneously attains state-of-art welfare-like guarantees if all bidders use this algorithm. Importantly, we do not require convergence of the dynamics, thereby avoiding strong assumptions.
Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning (rev. Jun'25) Chara Podimata and Aleksandrs Slivkins
- Preliminary version in COLT 2021 (focused on Lipschitz bandits and titled "Adaptive Discretization for Adversarial Lipschitz Bandits").
Adaptive discretization is a central theme in "Lipschitz bandits": bandits with similarity info on actions. We provide the first adaptive discretization algorithm for adversarial rewards, and derive instance-dependent regret bounds (recovering regret bounds for stochastic rewards). We apply this machinery to several scenarios in dynamic pricing and dynamic auction optimization.
Incentivizing Exploration with Selective Data Disclosure (rev. Nov'24) [poster]
Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, and Zhiwei Steven Wu
- Deep revision of a paper in EC 2020, with some additional results.
- Early version titled "Incentivizing Exploration with Unbiased Histories".
- Workshop: Learning in Presence of Strategic Behavior at EC 2019.
We incentivize exploration in recommendation systems under weaker assumptions of rationality and trust. Our algorithm constructs a partial order on the users, published ahead of time, and provides each user with feedback from all preceding users in the order (and no other info). We achieve near-optimal learning performance for a range of behavioral models.

Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression (rev. Nov'24)
Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster
JMLR: Journal of Machine Learning Research, 25(394):1−37, 2024.
- Preliminary version in COLT 2023: Conf. on Learning Theory.
- The journal version features an important improvement in the main result.
We consider a generalization of contextual bandits with knapsacks (CBwK) in which the algorithm consumes and/or replenishes resources subject to packing and/or covering constraints. We provide the first algorithm for this problem (or CBwK) that is based on regression oracles, and the first vanishing-regret guarantees that extend beyond the stochastic environment.
Competing Bandits: The Perils of Exploration Under Competition (rev. Oct'24)
Guy Aridor, Yishay Mansour, Aleksandrs Slivkins, and Zhiwei Steven Wu
ACM TEAC: ACM Trans. on Economics and Computation, Vol. 13(1), 2025.
A merged, heavily revised version of papers in ITCS 2018 and EC 2019. Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We study interplay between exploration and competition---how such systems balance the exploration for learning and the competition for users.
Exploration and Incentives in Reinforcement Learning [poster]
Max Simchowitz and Aleksandrs Slivkins
Operations Research, vol. 72(3), 2024. How do you incentivize self-interested agents to explore when they prefer to exploit? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. Agents control the choice of policies, whereas an algorithm can only recommend them. However, the algorithm can incentivize the agents to explore via information asymmetry.
The first work to consider mechanism design in a (stateful) RL setting.

NeurIPS'21 workshops: Learning in Presence of Strategic Behavior (spotlight talk) and Learning and Decision-Making with Strategic Feedback.

Corruption-Robust Exploration in Episodic Reinforcement Learning
Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins and Wen Sun
Mathematics of Operations Research, 50(2), 2024.
Preliminary version in COLT 2021: Conf. on Learning Theory.
- [30-min talk by Alex: video | slides] [60-min talk by Thodoris: video | slides]
We study episodic RL where an adversary can "corrupt" some episodes, and achieve performance guarantees that adapt to the (unknown) number of corrupted episodes. Along the way, we provide a general framework for RL algorithms that depart from "optimism", and achieve the first RL algorithm for bandit feedback and non-IID transitions.
Greedy Algorithm almost Dominates in Smoothed Contextual Bandits (2018-2021)
Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaighan and Zhiwei Steven Wu
SICOMP: SIAM J. on Computing, 52(2), 2023.
Preliminary version (along with some other results) appeared in COLT 2018. We consider the greedy algorithm in linear contextual bandits. We prove that it is optimal, in a very strong sense, if the problem instance is sufficiently "smoothed".
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [poster]
Mark Sellke and Aleksandrs Slivkins
Operations Research, 71(5):1706-1732, 2023.
Preliminary version in EC 2021: ACM Symp. on Economics and Computation. We study bandit algorithms that recommend actions to self-interested agents. We focus on the price of incentives: the loss in performance, broadly construed, due incentive-compatibility. We prove that Thompson sampling is incentive-compatible if initialized with enough samples of each arm. Next, we investigate the sample complexity questions that the problem reduces to: (i) how many samples are needed, and (ii) how many rounds does it take to collect them.
Bayesian Exploration: Incentivizing Exploration in Bayesian Games [slides]
Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis and Steven Wu
Operations Research, 70(2): 1105-1127, 2022.
Deep revision of a paper in EC 2016: ACM Symp. on Economics and Computation. At each time step, multiple agents arrive, play a fixed Bayesian game, and leave forever. Agents' decisions reveal info that can help future agents, creating a tradeoff between exploration, exploitation, and agents' incentives. We design a social planner which learns over time and coordinates the agents towards socially desirable outcomes.
Adversarial Bandits with Knapsacks [poster]
Nicole Immorlica, Karthik A. Sankararaman, Aleksandrs Slivkins and Rob Schapire
J. of the ACM, 69 (6): 1-47, 2022.
Deep revision of a paper in FOCS 2019: IEEE Symp. on Foundations of Computer Science. Bandits with Knapsacks is a broad class of explore-exploit problems with knapsack-style resource utilization constraints. While all prior work is for the stochastic version, we target the adversarial version and obtain an optimal solution. We build on a new, game-theoretic interpretation (and a simpler algorithm) for the stochastic version.
Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting [poster]
Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
JMLR: J. of Machine Learning Research, 21(137):1-45, 2020.
A deep revision of a paper in COLT 2019: Conf. on Learning Theory. We study contextual bandits with an abstract policy class and continuous action space. We obtain two algorithms: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both algorithms "zoom in" on better regions of the actions space, with improved performance on "benign" problems.
Bayesian Incentive-Compatible Bandit Exploration (slides)
Yishay Mansour, Aleksandrs Slivkins and Vasilis Syrgkanis
Operations Research, 68(4): 1132-1161, 2020.
A deep revision of the paper in EC 2015. We design bandit algorithms that recommend actions to self-interested agents (who then decide which actions to take). By means of carefully designed information disclosure, we incentivize the agents to balance exploration and exploitation so as to maximize social welfare.
Bandits and Experts in Metric Spaces
Robert Kleinberg, Aleksandrs Slivkins and Eli Upfal.
A merged and heavily revised version of papers in STOC'08 and SODA'10.
J. of the ACM, Volume 66, Issue 4, May 2019. We introduce the 'Lipschitz bandits problem': a stochastic bandit problem, possibly with a very large set of arms, such that the expected payoffs obey a Lipschitz condition with respect to a given metric space. The goal is to minimize regret as a function of time, both in the worst case and for 'nice' problem instances.
Multidimensional Dynamic Pricing for Welfare Maximization
Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, and Steven Wu
Special Issue of ACM TEAC for EC 2017: 8(1), 2020. We solve a dynamic pricing problem with d divisible goods. Buyers have IID valuations over purchased bundles. We optimize welfare (including seller's production costs) in #rounds polynomial in d and the accuracy parameter. Crucially, we make assumptions (concavity and Holder-continuity) on buyers' valuations, rather than on the aggregate response to prices.
Bandits with knapsacks
Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins
J. of the ACM, Vol. 65 Issue 3, March 2018.
(Preliminary version in FOCS 2013.) We define a broad class of explore-exploit problems with knapsack-style resource utilization constraints, subsuming dynamic pricing, dynamic procurement, pay-per-click ad allocation, and many other problems. Our algorithms achieve optimal regret w.r.t. the optimal dynamic policy.
Truthful Mechanisms with Implicit Payment Computation
Moshe Babaioff, Robert Kleinberg and Aleksandrs Slivkins
J. of the ACM, Vol. 62, Issue 2, May 2015.
(Preliminary version in EC 2010 and EC 2013)
The latest revision (Nov'15) reflects some minor bug fixes compared to the JACM version. We show that payment computation does not present any obstacle in designing truthful mechanisms, even when we can only call the allocation rule once. Applying this to multi-armed bandits (MAB), we design truthful MAB mechanisms for stochastic payoffs. More generally, we open up a problem of designing monotone MAB allocation rules.
Adaptive Contract Design for Crowdsourcing Markets:
Bandit Algorithms for Repeated Principal-Agent Problems
Chien-Ju Ho, Aleksandrs Slivkins and Jennifer Wortman Vaughan.
JAIR: J. of Artificial Intelligence Research, Vol. 54, 2015. (Special Track on Human Computation)
(Preliminary version in EC 2014) We consider a repeated version of the principal-agent model in which the principal can revise the contract over time, and the agent can strategically choose the (unobservable) effort level. We treat this as a multi-armed bandit problem, and design an algorithm that adaptively refines the partition of the action space without relying on Lipschitz assumptions.
Selection and Influence in Cultural Dynamics
David Kempe, Jon Kleinberg, Sigal Oren and Aleksandrs Slivkins
Network Science, vol. 4(1), 2016.
(Preliminary version in EC 2013) Influence and selection -- tendency to, resp., become similar to one's friends and to interact with similar people -- work in opposite directions: resp., towards homogeneity and fragmentation. We analyze the societal outcomes when both forces are in effect. We consider a natural class of models from the work in political opinion formation, cultural diversity, and language evolution.
Low-distortion Inference of Latent Similarities from a Multiplex Social Network
Ittai Abraham, Shiri Chechik, David Kempe and Aleksandrs Slivkins
SICOMP: SIAM J. on Computing, Vol. 44(3), 2015.
(Preliminary version in SODA 2013.) The observed social network is a noisy signal about the latent "social space": the ways in which individuals are (dis)similar to one another. We present near-linear time algorithms which, under some standard models, can infer the social space with provable guarantees.
Dynamic pricing with limited supply
Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg and Aleksandrs Slivkins
Special issue for EC 2012: ACM Trans. on Economics and Computation, 3(1): 4 (2015). We consider dynamic pricing with limited supply and unknown demand distribution. We extend multi-armed bandit techniques to the limited supply setting, and obtain optimal regret rates.
Contextual bandits with similarity information
JMLR: J. of Machine Learning Research, 15(Jul):2533-2568, 2014.
(Preliminary version in COLT 2011.) We study contextual bandits with a known metric over context-arm pairs which upper-bounds the local change in rewards. The main algorithmic idea is to adapt the partitions of the metric space to frequent context arrivals and high-payoff regions. Our framework also handles slowly changing payoffs and variable sets of arms.
Ranked bandits in metric spaces:
learning optimally diverse rankings over large document collections
Aleksandrs Slivkins, Filip Radlinski and Sreenivas Gollapudi
JMLR: J. of Machine Learning Research, 14(Feb):399-436, 2013.
(Preliminary version in ICML 2010.) We present a learning-to-rank framework for web search that incorporates similarity and correlation between documents and thus, unlike prior work, scales to large document collections.
Triangulation and Embedding using Small Sets of Beacons
Jon Kleinberg, Aleksandrs Slivkins and Tom Wexler.
J. of the ACM, 56(6), Sept 2009.
Significantly revised merge of papers from FOCS 2004 and SODA 2005. We consider metric embeddings and triangulation-based distance estimation in a distributed framework with low load on the participating nodes. Our results provide theoretical insight into the empirical success of several recent Internet-related projects.
Characterizing Truthful Multi-Armed Bandit Mechanisms
Moshe Babaioff, Yogeshwer Sharma and Aleksandrs Slivkins
SICOMP: SIAM J. on Computing , Vol. 43, No. 1, pp. 194-230, 2014
(Preliminary version in EC 2009.) We consider a natural strategic version of multi-armed bandits (MAB), motivated by pay-per-click auctions. We show that requiring an MAB algorithm to be incentive-compatible has striking consequences both for structure and regret.
Metric Embeddings with Relaxed Guarantees
T-H. Hubert Chan, Kedar Dhamdhere, Anupam Gupta, Jon Kleinberg and A. Slivkins
SIAM J. on Computing, 38(6): 2303-2329, March 2009.
Preliminary version in FOCS 2005. Given any x, any metric admits a low-dim embedding into L_p, p>=1 with disortion D(x) = O(log 1/x) on all but an x-fraction of edges. Moreover, any decomposable metric (e.g. any doubling metric) admits a low-dim embedding such that D(x) = O(log 1/x)^{1/p} for all x.
Distance Estimation and Object Location via Rings of Neighbors
Special issue of "Distributed Computing" for PODC 2005: Vol. 19, No. 4. (March 2007). We approach several problems on distance estimation and object location with a unified technique called ''rings of neighbors''. Using this technique on metrics of low doubling dimension, we obtain significant improvements for low-stretch routing schemes, searchable small-world networks, distance labeling, and triangulation-based distance estimation.
Network Failure Detection and Graph Connectivity
Jon Kleinberg, Mark Sandler and Aleksandrs Slivkins.
SIAM J. on Computing, 38(4): 1330-1346, Aug 2008.
(Preliminary version in SODA 2004.) We detect network partitions -- with strong provable guarantees -- using a small set of 'agents' placed randomly on nodes of the network. We parameterize our guarantees by edge- and node-connectivity of the underlying graph.
Parameterized Tractability of Edge-Disjoint Paths on DAGs
SIAM J. on Discrete Math, 24(1): 146-157, Feb 2010.
(Preliminary version in ESA 2003.) We resolve a long-standing open question about k-edge-disjoint paths: we show that this problem is W[1]-hard on DAGs, hence unlikely to admit running time f(k)*poly(n). However, such running time can be achieved if the input+demands graph is almost Eulerian.
Interleaving Schemes on Circulant Graphs with Two Offsets
Aleksandrs Slivkins and Shuki Bruck.
Discrete Mathematics 309(13): 4384-4398, July 2009.
Undergraduate research project (1999-2000), tech report (2002). We construct optimal interleaving schemes on infinite circulant graphs with two offsets. Interleaving is used for error-correcting on a bursty noisy channel.

Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure
Aleksandrs Slivkins, Yunzong Xu and Shiliang Zuo
NeurIPS 2025: Conf. on Neural Information Processing Systems [poster] We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret, in bandit problems with a known finite reward structure. Our results extend to contextual bandits and reinforcement learning. We pinpoint a partial identifiability property of the problem instance as necessary and sufficient for the "asymptotic success".
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents
Seyed A. Esmaeili, Suho Shin, Aleksandrs Slivkins
AAAI 2025: Annual AAAI Conf. on Artificial Intelligence We consider a variant of the stochastic bandit problem, where the arms are strategic agents who can improve their rewards (via costly effort) or absorb them. The principal wishes to maximize cumulative reward by incentivizing high performance, both at equilibrium and in the worst case.
Can large language models explore in-context? [poster]
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
NeurIPS 2024: Conf. on Neural Information Processing Systems.
Workshop: Adaptive Learning in Complex Environments (2024). Can LLMs explore "natively", without training interventions? We deploy them as agents in simple bandit environments, with environment description & interaction history given in the prompt. We experiment with GPT-3.5, GPT-4, and Llama2 , using many prompt designs. We find that the models do not robustly engage in exploration without substantial interventions.
Impact of Decentralized Learning on Player Utilities in Stackelberg Games
Kate Donahue, Nicole Immorlica, Meena Jagadeesan, Brendan Lucier, Aleksandrs Slivkins
ICML 2024: Intl. Conf. on Machine Learning.
Workshop: ESIF-AIML 2024. Two learning agents (e.g., a chatbot and a user) repeatedly interact and adapt to one another. Standard regret benchmark (Stackelberg equilibrium) results in worst-case linear regret for at least one player. We construct a relaxed benchmark and develop algorithms that achieve sublinear regret against this benchmark for both players.
Incentivized Exploration via Filtered Posterior Sampling [poster].
Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur
EC 2024: ACM Symp. on Economics and Computation.
Workshops: MIW'24, INFORMS RMP 2024, ESIF-AIML 2024. We study "incentivized exploration" (IE): the principal leverages info asymmetry to incentivize agents to explore. We propose "filtered posterior sampling", an extension of a well-known bandit algorithm, as a general-purpose solution for IE. We expand the scope of IE in several key dimensions, while also recovering existing results as special cases.
Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics [poster]
Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang (rev. Nov'24)
COLT 2024: Conf. on Learning Theory.
Workshops: MIW'23, Online Ads @EC'24. We design a bidding algorithm for repeated auctions with budget and ROI constraints. The algorithm attains vanishing regret for a range of auctions, and state-of-art welfare-like guarantees if used by all bidders. Importantly, we do not require convergence of the dynamics.
Strategic Budget Selection in a Competitive Autobidding World [poster].
Yiding Feng, Brendan Lucier, Aleksandrs Slivkins
STOC 2024: Annual ACM Symp. on Theory of Computing
Workshops: MIW'24, INFORMS RMP 2024, INFORMS MSOM 2024, Online Ads @EC'24. We study the "metagame" between advertisers (with complex preferences) who strategically adjust (simple) instructions to their respective autobidders. We prove that the price of anarchy, in terms of liquid welfare, is at most 2 for pure Nash, and at most 4 for mixed Nash.
Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits [poster].
Lequn Wang, Akshay Krishnamurthy, Aleksandrs Slivkins
AISTATS 2024: Intl. Conf. on Artificial Intelligence and Statistics The first oracle-efficient algorithm for "pessimistic" policy optimization in contextual bandits. Both discrete and continuous actions, both theory and (extensive) experiments.
Content Filtering with Inattentive Information Consumers
Ian Ball, James Bono, Justin Grana, Nicole Immorlica, Brendan Lucier, and Alex Slivkins
AAAI 2024: Annual AAAI Conf. on Artificial Intelligence We study content filtering from game-theoretic perspective: we initiate the study of strategic interactions between the filter and the consumer when the latter has deliberation costs.
Bandit Social Learning under Myopic Behavior [poster]
Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins
NeurIPS 2023: Conf. on Neural Information Processing Systems.
- Best Poster Award at 15th Annual Machine Learning Symposum, New York, 2024
Agents collectively face a simple multi-armed bandit problem. Each agent acts myopically, in a(ny) way consistent with confidence intervals. We derive stark exploration failures, and provide matching upper bounds on regret by analyzing "moderately optimistic" agents. In particular, we obtain the first general results on failure of the greedy algorithm in multi-armed bandits.
Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression (rev. Nov'24) [slides]
Aleksandrs Slivkins, Karthik Abinav Sankararaman, Dylan J. Foster
COLT 2023: Conf. on Learning Theory. We consider a generalization of contextual bandits with knapsacks (CBwK) in which the algorithm consumes and/or replenishes resources subject to packing and/or covering constraints. We provide the first algorithm for this problem (or CBwK) that is based on regression oracles, and the first vanishing-regret guarantees that extend beyond the stochastic environment.
Budget Pacing in Repeated Auctions: Regret and Efficiency without Convergence (rev. Dec'25)
Jason Gaitonde, Yingkai Li, Bar Light, Brendan Lucier, and Aleksandrs Slivkins
ITCS 2023: Innovations in Theoretical Computer Science [poster] We study bidding algorithms for repeated auctions with budgets. We prove that a natural bidding algorithm attains good regret bounds for a wide range of auctions, and simultaneously attains state-of-art welfare-like guarantees if all bidders use this algorithm. Importantly, we do not require convergence of the dynamics, thereby avoiding strong assumptions.
Incentivizing Combinatorial Bandit Exploration
Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, and Zhiwei Steven Wu
NeurIPS 2022: Conf. on Neural Information Processing Systems We study "incentivized exploration": how to incentivize self-interested users to explore when they prefer to exploit? We extend it to bandit problems with large, structured action sets and highly correlated beliefs. Specifically, we focus on combinatorial semi-bandits.
Truthful Online Scheduling of Cloud Workloads under Uncertainty
Moshe Babaioff, Ronny Lempel, Brendan Lucier, Ishai Menache, A. Slivkins, Sam Chiu-Wai Wong
TheWebConf 2022: The ACM Web Conf. A cloud computing platform receives statements of work (SoWs) over time. SoWs describe future jobs whose arrival times and durations are probabilistic, and utility declines with completion time. We design a mechanism for pricing, scheduling, and eviction of jobs that optimizes for social welfare and incentivizes truthful reporting of SoWs.
Bandits with Knapsacks beyond the Worst-Case Analysis [poster]
Karthik Abinav Sankararaman and Aleksandrs Slivkins
NeurIPS 2021: Conf. on Neural Information Processing Systems. Bandits with Knapsacks is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds are well-understood, we present three results that go beyond the worst-case perspective: a full characterization for instance-dependent regret bounds, an improved result on simple regret, and a general "reduction" from BwK to bandits.
Sayer: Using Implicit Feedback to Optimize System Policies
Mathias Lecuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Aleksandrs Slivkins, Amit Sharma
SoCC 2021: ACM Symp. on Cloud Computing. System policies for resource allocation often reveal implicit feedback about counterfactuals. We develop a methodology, called Sayer, that leverages such implicit feedback (as well as randomized exploration) to evaluate and train new system policies. We successfully apply Sayer to two production scenarios in Microsoft Azure.
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [poster]
Mark Sellke and Aleksandrs Slivkins
EC 2021: ACM Symp. on Economics and Computation.
- EC'21 workshop: Operations of People-Centric Systems
- NeurIPS'21 workshop: Learning and Decision-Making with Strategic Feedback.
We study bandit algorithms that recommend actions to self-interested agents. We focus on the price of incentives: the loss in performance, broadly construed, due incentive-compatibility. We prove that Thompson sampling is incentive-compatible if initialized with enough samples of each arm. Next, we investigate the sample complexity questions that the problem reduces to: (i) how many samples are needed, and (ii) how many rounds does it take to collect them.
Corruption-Robust Exploration in Episodic Reinforcement Learning
Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins and Wen Sun
COLT 2021: Conf. on Learning Theory.
- [30-min talk by Alex: video | slides] [60-min talk by Thodoris: video | slides]
We study episodic RL where an adversary can "corrupt" some episodes, and achieve performance guarantees that adapt to the (unknown) number of corrupted episodes. Along the way, we provide a general framework for RL algorithms that depart from "optimism", and achieve the first RL algorithm for bandit feedback and non-IID transitions.
Adaptive Discretization for Adversarial Lipschitz Bandits
Chara Podimata and Aleksandrs Slivkins
COLT 2021: Conf. on Learning Theory. Adaptive discretization is a central theme in "Lipschitz bandits": bandits with similarity info on actions. We provide the first adaptive discretization for adversarial rewards, and derive instance-dependent regret bounds. We recover the worst-case optimal regret bound for adversarial rewards, and the instance-dependent regret bound for stochastic rewards.
Beating Greedy For Approximating Reserve Prices in Multi-Unit VCG Auctions
Mahsa Derakhshan, David M. Pennock, and Aleksandrs Slivkins
SODA 2021: ACM-SIAM Symp. on Discrete Algorithms We study the problem of computing personalized reserve prices for "eager" VCG auctions (from a dataset of previously submitted bids), focusing on unit-demand, correlated buyers in multi-unit auctions. We provide a polynomial-time algorithm with substantially improved approximation ratio, compared to the "greedy" algorithm from prior work.
Efficient Contextual Bandits with Continuous Actions
M. Majzoubi, C. Zhang, R. Chari, A. Krishnamurthy, J. Langford and A. Slivkins
NeurIPS 2020: Conf. on Neural Information Processing Systems We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.
Constrained episodic reinforcement learning in concave-convex and knapsack settings
K. Brantley, M. Dudik, T. Lykouris, S. Miryoosefi, M. Simchowitz, A. Slivkins, and W. Sun
NeurIPS 2020: Conf. on Neural Information Processing Systems We propose an algorithm for (several versions of) tabular episodic RL with constraints. We provide a modular analysis with strong theoretical guarantees. Our algorithm significantly outperforms the existing approaches in experiments.
Incentivizing Exploration with Selective Data Disclosure (rev. Nov'24) [poster]
Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, and Zhiwei Steven Wu
EC 2020: ACM Symp. on Economics and Computation.
- Early version titled "Incentivizing Exploration with Unbiased Histories".
- Workshop: Learning in Presence of Strategic Behavior at EC 2019.
We incentivize exploration in recommendation systems under weaker assumptions of rationality and trust. Our algorithm constructs a partial order on the users, published ahead of time, and provides each user with feedback from all preceding users in the order (and no other info). We achieve near-optimal learning performance for a range of behavioral models.
Adversarial Bandits with Knapsacks [poster]
Nicole Immorlica, Karthik A. Sankararaman, Aleksandrs Slivkins and Rob Schapire
FOCS 2019: IEEE Symp. on Foundations of Computer Science.
- INFORMS market design workshop at EC 2019.
Bandits with Knapsacks is a broad class of explore-exploit problems with knapsack-style resource utilization constraints. While all prior work is for the stochastic version, we target the adversarial version and obtain an optimal solution. We build on a new, game-theoretic interpretation (and a simpler algorithm) for the stochastic version.
The Perils of Exploration under Competition: A Computational Modeling Approach [poster]
Guy Aridor, Kevin Liu, Aleksandrs Slivkins, Zhiwei Steven Wu
EC 2019: ACM Symp. on Economics and Computation
- Workshop: Learning in Presence of Strategic Behavior at EC 2019.
Many online platforms learn from interactions with users, and explore: make potentially suboptimal choices for the sake of acquiring new information. We study the interplay between exploration and competition for users. We run extensive numerical experiments in a stylized duopoly model, asking whether/when competition incentivizes better algorithms for exploration.
Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting [poster]
Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
COLT 2019: Conf. on Learning Theory. We study contextual bandits with an abstract policy class and continuous action space. We obtain two algorithms: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both algorithms "zoom in" on better regions of the actions space, with improved performance on "benign" problems.
Bayesian Exploration with Heterogenous Agents [blog post]
Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins and Zhiwei Steven Wu
The Web Conference 2019 (with oral presentation) We incentivize exploration for heterogeneous users. We design near-optimal personalized recommendation policies for several versions of the model, depending on whether and when the user types are reported to the principal. We also investigate how the model choice and the user diversity affect the set of "explorable" actions.
The Externalities of Exploration and How Data Diversity Helps Exploitation
Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaighan and Zhiwei Steven Wu
COLT 2018: Conf. on Learning Theory. We investigate whether a "minority" may have incentives to leave a contextual bandit algorithm in scenarios like personalized content recommendations, and interpret this phenomenon as a type of unfairness. We also prove that the greedy algorithm is optimal, in a very strong sense, if the problem instance is sufficiently "smoothed".
Combinatorial Semi-Bandits with Knapsacks
Karthik Abinav Sankararaman and Aleksandrs Slivkins
AISTATS 2018: Intl. Conf. on AI and Statistics (oral presentation, top 5% of submissions). We solve a common generalization of "combinatorial semi-bandits" and "bandits with knapsacks": "arms" are subsets of "atoms" which yield rewards and consume resources.
Competing Bandits: Learning under Competition
Yishay Mansour, Aleksandrs Slivkins, and Zhiwei Steven Wu
ITCS 2018: Conf. on Innovations in Theoretical Computer Science Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We study the interplay between exploration and competition---how such systems balance the exploration for learning and the competition for users.
Harvesting Randomness to Optimize Distributed Systems
Mathias Lecuyer, Joshua Lockerman, Lamont Nelson, Sid Sen, Amit Sharma, and Alex Slivkins
HotNets 2017: ACM Workshop on Hot Topics in Networks Randomized decisions in cloud systems is a powerful resource for offline optimization. We show how to collect data from existing systems, without modifying them, to evaluate new policies, without deploying them.
Multidimensional Dynamic Pricing for Welfare Maximization
Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, and Steven Wu
EC 2017: ACM Symp. on Economics and Computation (invited to the Special Issue). We solve a dynamic pricing problem with d divisible goods. Buyers have IID valuations over purchased bundles. We optimize welfare (including seller's production costs) in #rounds polynomial in d and the accuracy parameter. Crucially, we make assumptions (concavity and Holder-continuity) on buyers' valuations, rather than on the aggregate response to prices.
A Polynomial Time Algorithm For Spatio-Temporal Security Games
Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi HajiAghayi, and Aleksandrs Slivkins.
EC 2017: ACM Symp. on Economics and Computation We study a practically important class of security games with targets and “patrols” moving on a real line. We compute the Nash equilibrium in time polynomial in the input size, and only polylogarithmic in the number of possible patrol locations (M). Prior work made substantial assumptions, e.g., a constant number of rounds, and had running times polynomial in M.
Bayesian Exploration: Incentivizing Exploration in Bayesian Games (slides)
Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis and Steven Wu
EC 2016: ACM Symp. on Economics and Computation. At each time step, multiple agents arrive, play a fixed Bayesian game, and leave forever. Agents' decisions reveal info that can help future agents, creating a tradeoff between exploration, exploitation, and agents' incentives. We design a social planner which learns over time and coordinates the agents towards socially desirable outcomes.
How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels
Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford and Aleksandrs Slivkins
SIGIR 2016: Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval.
Bayesian Incentive-Compatible Bandit Exploration (slides)
Yishay Mansour, Aleksandrs Slivkins and Vasilis Syrgkanis
EC 2015: ACM Symp. on Economics and Computation We design bandit algorithms that recommend actions to self-interested agents (who then decide which actions to take). By means of carefully designed information disclosure, we incentivize the agents to balance exploration and exploitation so as to maximize social welfare.
Contextual Dueling Bandits
Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins and Masrour Zoghi
COLT 2015: Conf. on Learning Theory. We extend "dueling bandits" (where feedback is limited to pairwise comparisons between arms) to incorporate contexts (as in "contextual bandits"). We propose a natural new solution concept, rooted in game theory, and present algorithms for approximately learning this concept.
Incentivizing High Quality Crowdwork
Chien-Ju Ho, Aleksandrs Slivkins, Siddharth Suri, and Jennifer Wortman Vaughan
WWW 2015: 24th Intl. World Wide Web Conference. Nominee for Best Paper Award.
A talk at CODE@MIT 2015: Conf. on Digital Experimentation @MIT.
Short version: SIGecom Exchanges, Dec 2015. We study causal effects of performance-based payments (PBPs) on the quality of crowdwork, via randomized behavioral experiments on Amazon Mechanical Turk. We shed light on when, where, and why PBPs help improve quality.
Adaptive Contract Design for Crowdsourcing Markets:
Bandit Algorithms for Repeated Principal-Agent Problems
Chien-Ju Ho, Aleksandrs Slivkins and Jennifer Wortman Vaughan.
EC 2014: ACM Symp. on Economics and Computation We consider a repeated version of the principal-agent model in which the principal can revise the contract over time, and the agent can strategically choose the (unobservable) effort level. We treat this as a multi-armed bandit problem, and design an algorithm that adaptively refines the partition of the action space without relying on Lipschitz assumptions.
One Practical Algorithm for Both Stochastic and Adversarial Bandits
Yevgeny Seldin and Aleksandrs Slivkins
ICML 2014: Intl. Conf. on Machine Learning. We present a bandit algorithm that achieves near-optimal performance in both stochastic and adversarial regimes without prior knowledge about the environment. Our algorithm is both rigorous and practical; it is based on a new control lever that we reveal in the EXP3 algorithm.
Robust Multi-objective Learning with Mentor Feedback
Alekh Agarwal, Ashwinkumar Badanidiyuru, Miroslav Dudik, Robert E. Schapire, Aleksandrs Slivkins.
COLT 2014: Conf. on Learning Theory. We study decision-making with multiple objectives. During the training phase, we observe the actions of an outside agent (“mentor”). In the test phase, our goal is to maximally improve upon the mentor’s (unobserved) actions across all objectives. We present an algorithm with near-optimal regret compared with the best possible improvement.
Resourceful Contextual Bandits
Ashwinkumar Badanidiyuru, John Langford and Aleksandrs Slivkins
COLT 2014: Conf. on Learning Theory. Contextual bandits with resource constraints: we consider very general settings for both contextual bandits (arbitrary policy sets) and bandits with resource constraints (bandits with knapsacks), and obtain a regret guarantee with near-optimal statistical properties.
Bandits with Knapsacks
Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins
FOCS 2013: IEEE Symp. on Foundations of Computer Science.
- Workshop on Resource-Efficient Machine Learning (invited talk), at NIPS 2013.
- Bandits with Knapsacks: Dynamic procurement for crowdsourcing
  A shorter version, focused on applications to dynamic procurement and crowdsourcing.
  3rd Workshop on Social Computing & User Generated Content, at EC 2013.
  Workshop on Crowdsourcing: Theory, Algorithms & Applications, at NIPS 2013.
We define a broad class of explore-exploit problems with knapsack-style resource utilization constraints, subsuming dynamic pricing, dynamic procurement, pay-per-click ad allocation, and many other problems. Our algorithms achieve optimal regret w.r.t. the optimal dynamic policy.
Multi-parameter Mechanisms with Implicit Payment Computation
Moshe Babaioff, Robert Kleinberg and Aleksandrs Slivkins
EC 2013: ACM Symp. on Electronic Commerce We show that payment computation does not present any obstacle in designing truthful mechanisms, even for multi-parameter domains, and even when we can only call the allocation rule once. Then we study a prominent example for a multi-parameter setting in which an allocation rule can only be called once, which arises in sponsored search auctions.
Selection and Influence in Cultural Dynamics
David Kempe, Jon Kleinberg, Sigal Oren and Aleksandrs Slivkins
EC 2013: ACM Symp. on Electronic Commerce Influence and selection -- tendency to, resp., become similar to one's friends and to interact with similar people -- work in opposite directions: resp., towards homogeneity and fragmentation. We analyze the societal outcomes when both forces are in effect. We consider a natural class of models from the work in political opinion formation, cultural diversity, and language evolution.
Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem [poster]
Ittai Abraham, Omar Alonso, Vasilis Kandylas and Aleksandrs Slivkins
COLT 2013: Conf. on Learning Theory.
- ICML 2012 workshop on Human Computation & Crowdsourcing.
We propose a simple model for adaptive quality control in crowdsourced multiple-choice tasks. We present several algorithms for this problem, and support them with analysis and simulations.
Low-distortion Inference of Latent Similarities from a Multiplex Social Network
Ittai Abraham, Shiri Chechik, David Kempe and Aleksandrs Slivkins
SODA 2013: ACM-SIAM Symp. on Discrete Algorithms
A short version has appeared in
- ICML 2012 workshop on Mining and Learning in Graphs.
The observed social network is a noisy signal about the latent "social space": the ways in which individuals are (dis)similar to one another. We present near-linear time algorithms which, under some standard models, can infer the social space with provable guarantees.
The best of both worlds: stochastic and adversarial bandits.
Sébastien Bubeck and Aleksandrs Slivkins
COLT 2012: Conf. on Learning Theory. We present a new bandit algorithm whose regret is optimal both for adversarial rewards and for stochastic rewards, achieving, resp., square-root regret and polylog regret. Adversarial rewards and stochastic rewards are the two main settings for (non-Bayesian) multi-armed bandits; prior work treats them separately, and does not attempt to jointly optimize for both.
Dynamic pricing with limited supply
Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg and Aleksandrs Slivkins
EC 2012: ACM Symp. on Electronic Commerce
- Workshops: Markets, Mechanisms & Multi-Agent Models, New Challenges for Explore & Exploit (both at ICML'12), Bayesian Mechanism Design (at EC'11).
We consider dynamic pricing with limited supply and unknown demand distribution. We extend multi-armed bandit techniques to the limited supply setting, and obtain optimal regret rates.
Multi-armed bandits on implicit metric spaces
NIPS 2011: Conf. on Neural Information Processing Systems. Suppose an MAB algorithm is given a tree-based classification of arms. This tree implicitly defines a "similarity distance" between arms, but the numeric distances are not revealed to the algorithm. Our algorithm (almost) matches the best known guarantees for the setting (Lipschitz MAB) in which the distances are revealed.
Contextual bandits with similarity information
COLT 2011: Conf. on Learning Theory.
JMLR: J. of Machine Learning Research, 15(Jul):2533-2568, 2014. We study contextual bandits with a known metric over context-arm pairs which upper-bounds the local change in rewards. The main algorithmic idea is to adapt the partitions of the metric space to frequent context arrivals and high-payoff regions. Our framework also handles slowly changing payoffs and variable sets of arms.
Ranked bandits in metric spaces:
learning optimally diverse rankings over large document collections
Aleksandrs Slivkins, Filip Radlinski and Sreenivas Gollapudi
ICML 2010: Intl. Conf. on Machine Learning.
Preliminary version: NIPS 2009 Ranking Workshop. We present a learning-to-rank framework for web search that incorporates similarity and correlation between documents and thus, unlike prior work, scales to large document collections.
Truthful Mechanisms with Implicit Payment Computation
Moshe Babaioff, Robert Kleinberg and Aleksandrs Slivkins
EC 2010: ACM Symp. on Electronic Commerce (Best paper award)
- Open Question at COLT 2011: Monotone bandit allocations
We show that payment computation essentially does not present any obstacle in designing truthful mechanisms for single-parameter domains, even when we can only call the allocation rule once. Applying this to multi-armed bandits (MAB), we design truthful MAB mechanisms for stochastic payoffs. More generally, we open up a problem of designing monotone MAB allocation rules.
Sharp Dichotomies for Regret Minimization in Metric Spaces
Robert Kleinberg and Aleksandrs Slivkins
SODA 2010: ACM-SIAM Symp. on Discrete Algorithms
The original full version is superseded by this version (revised & merged with the STOC'08 paper). We further study multi-armed bandits in metric spaces, focusing on the connections between online learning and metric topology. The main result is that the worst-case regret is either O(log t) or at least sqrt{t}, depending (essentially) on whether the metric space is countable.
Adapting to the Shifting Intent of Search Queries
Umar Syed, Aleksandrs Slivkins and Nina Mishra
NIPS'09: Annual Conf. on Neural Information Processing Systems Query intent may shift over time. A classifier can use the available signals to predict a shift in intent. Then a bandit algorithm can be used to find the new relevant results. We present a meta-algorithm that combines such classifier with a bandit algorithm in a feedback loop, with favorable regret guarantees.
Characterizing Truthful Multi-Armed Bandit Mechanisms
Moshe Babaioff, Yogeshwer Sharma and Aleksandrs Slivkins
EC 2009: ACM Symp. on Electronic Commerce We consider a natural strategic version of multi-armed bandits (MAB), motivated by pay-per-click auctions. We show that requiring an MAB algorithm to be incentive-compatible has striking consequences both for structure and regret.
Adapting to a Changing Environment: the Brownian Restless Bandits (bug-fix Mar'23)
Aleksandrs Slivkins and Eli Upfal.
COLT 2008: Conf. on Learning Theory. We study a version of the stochastic multi-armed bandit problem in which the expected reward of each arm evolves stochastically and gradually in time, following an independent Brownian motion or a similar process. Our benchmark is a hypothetical policy that chooses the best arm in each round.
Multi-armed Bandits in Metric Spaces
Robert Kleinberg, Aleksandrs Slivkins and Eli Upfal.
STOC 2008: ACM Symp. on Theory of Computing
The original full version is superseded by this version (revised & merged with the SODA'10 paper). We introduce the 'Lipschitz bandits problem': a stochastic bandits problem, possibly with a very large set of arms, such that the expected payoffs obey a Lipschitz condition with respect to a given metric space. The goal is to minimize regret as a function of time, both in the worst case and for 'nice' problem instances.
Towards Fast Decentralized Construction of Locality-Aware Overlay Networks
PODC 2007: ACM Symp. on Principles of Distributed Computing [slides] We provide fast (polylog-time) distributed constructions for various locality-aware (low-stretch) distributed data structures, such as distance labeling schemes, name-independent routing schemes, and multicast trees.
Oscillations with TCP-like Flow Control in Networks of Queues
Matthew Andrews and Aleksandrs Slivkins
INFOCOM 2006 IEEE Conf. on Computer Communications For a wide range of TCP-like fluid-based congestion control models, we construct a network of sessions and (almost) FIFO routers such that starting from a certain initial state, the system returns to the same state eventually. Contrasting the prior work, in our example the total sending rate of all sessions that come through any given router never exceeds its capacity.
Metric Embeddings with Relaxed Guarantees
T-H. Hubert Chan, Kedar Dhamdhere, Anupam Gupta, Jon Kleinberg and A. Slivkins
FOCS 2005: IEEE Symp. on Foundations of Computer Science [slides]
- the FOCS'05 version is merged with (I.Abraham, Y.Bartal, O.Neiman).
Given any x, any metric admits a low-dim embedding into L_p, p>=1 with disortion D(x) = O(log 1/x) on all but an x-fraction of edges. Moreover, any decomposable metric (e.g. any doubling metric) admits a low-dim embedding such that D(x) = O(log 1/x)^{1/p} for all x.
Meridian: A Lightweight Network Location Service without Virtual Coordinates
ACM SIGCOMM 2005
Bernard Wong, Aleksandrs Slivkins and Emin G. Sirer. Meridian is a scalable overlay network for performing locality-aware node selection. The project features a live PlanetLab deployement.
Distance Estimation and Object Location via Rings of Neighbors
PODC 2005: ACM Symp. on Principles of Distributed Computing [slides]
- Best Student Paper Award (eligibility: at least one student author)
We approach several problems on distance estimation and object location with a unified technique called ''rings of neighbors''. Using this technique on metrics of low doubling dimension, we obtain significant improvements for low-stretch routing schemes, searchable small-world networks, distance labeling, and triangulation-based distance estimation.
Distributed Approaches to Triangulation and Embedding
SODA 2005: ACM-SIAM Symp. on Discrete Algorithms
[recommended version: merged journal version of the FOCS'04 paper] Following up on the FOCS'04 paper, we consider metric embeddings and triangulation-based distance estimation in a distributed framework with low load on all participating nodes.
Triangulation and Embedding using Small Sets of Beacons
Jon Kleinberg, Aleksandrs Slivkins and Tom Wexler.
FOCS 2004: The IEEE Symp. on Foundations of Computer Science
- The journal version includes results from the SODA'05 paper.
We consider metric embeddings and triangulation-based distance estimation in a distributed framework where nodes measure distances only to a small set of beacons. Our results provide theoretical insight into the empirical success of several recent Internet-related projects.
Network Failure Detection and Graph Connectivity
Jon Kleinberg, Mark Sandler and Aleksandrs Slivkins.
SODA 2004: The ACM-SIAM Symp. on Discrete Algorithms [slides] We detect network partitions -- with strong provable guarantees -- using a small set of 'agents' placed randomly on nodes of the network. We parameterize our guarantees by edge- and node-connectivity of the underlying graph.
Parameterized Tractability of Edge-Disjoint Paths on DAGs
ESA 2003: The European Symp. on Algorithms [slides] We resolve a long-standing open question about k-edge-disjoint paths: we show that this problem is W[1]-hard on DAGs, hence unlikely to admit running time f(k)*poly(n). However, such running time can be achieved if the input+demands graph is almost Eulerian.

Making Contextual Decisions with Low Technical Debt (2017)
Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, and Aleksandrs Slivkins Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Contextual bandit algorithms can be very effective in these settings, but applying them in practice is fraught with technical debt. We create the first general system for contextual bandit learning, called the Decision Service.
Dynamic Ad Allocation: Bandits with Budgets (2013) This brief note is on dynamic allocation of pay-per-click ads with advertisers' budgets. We define and analyze a natural extension of UCB1 to per-arm budgets.
Approximate Matching for Peer-to-Peer Overlays with Cubit (2008).
Bernard Wong, Aleksandrs Slivkins and Emin G. Sirer Cubit is a system that provides fully decentralized approximate keyword search capabilities to a peer-to-peer network. You can use Cubit to find a movie, song or artist even if you misspell the title or the name.

Should You Use Your Large Language Model to Explore or Exploit? (rev. Sep'25)
Keegan Harris and Aleksandrs Slivkins We evaluate the ability of current LLMs to help an agent facing exploration-exploitation tradeoff. On various contextual bandit tasks, we find that LLMs struggle to exploit (despite some in-context mitigations), but do help to explore. Namely, LLMs can fruitfully suggest suitable "candidate actions" to explore in large action spaces with inherent semantics.
Can large language models explore in-context? [poster]
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
NeurIPS 2024: Conf. on Neural Information Processing Systems.
Workshop: Adaptive Learning in Complex Environments (2024). Can LLMs explore "natively", without training interventions? We deploy them as agents in simple bandit environments, with environment description & interaction history given in the prompt. We experiment with GPT-3.5, GPT-4, and Llama2 , using many prompt designs. We find that the models do not robustly engage in exploration without substantial interventions.
Impact of Decentralized Learning on Player Utilities in Stackelberg Games
Kate Donahue, Nicole Immorlica, Meena Jagadeesan, Brendan Lucier, Aleksandrs Slivkins
ICML 2024: Intl. Conf. on Machine Learning.
Workshop: ESIF-AIML 2024. Two learning agents (e.g., a chatbot and a user) repeatedly interact and adapt to one another. Standard regret benchmark (Stackelberg equilibrium) results in worst-case linear regret for at least one player. We construct a relaxed benchmark and develop algorithms that achieve sublinear regret against this benchmark for both players.

Incentivized Exploration (IE): How do you incentivize self-interested agents to explore, for the sake of common good, when they prefer to exploit? We design bandit algorithms that recommend actions to self-interested agents, who then decide which actions to take. Information asymmetry incentivizes the agents to follow recommendations.

Surveys

Exploration and Persuasion (a teachable survey), 2021.
Invited chapter in Online and Matching-Based Markets, Cambridge Univ. Press, 2023. IE combines exploration from machine learning and persuasion from economics; we briefly introduce both, using IE as a common lens, and proceed to survey the basics of IE.
Literature review on "exploration and incentives": see Chapter 11 in my bandits book.
Incentivizing exploration via information asymmetry (informal article).
XRDS: Crossroads, The ACM Magazine for Students, Vol. 24(1), Fall 2017.

Research Papers

Bayesian Incentive-Compatible Bandit Exploration (slides)
Yishay Mansour, Aleksandrs Slivkins and Vasilis Syrgkanis
Operations Research, 68(4): 1132-1161, 2020.
A deep revision of the paper in EC 2015. IE for K-armed bandits. We achieve optimal regret, both per-instance and in the worst case. Further, we construct a general reduction from bandits to IE.
Bayesian Exploration: Incentivizing Exploration in Bayesian Games (2016) (slides)
Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis and Steven Wu
Operations Research, 70(2): 1105-1127, 2022.
Deep revision of a paper in EC 2016: ACM Symp. on Economics and Computation. At each time step, multiple agents arrive, play a fixed Bayesian game, and leave forever. Agents' decisions reveal info that can help future agents, creating a tradeoff between exploration, exploitation, and agents' incentives. We design a social planner which learns over time and coordinates the agents towards socially desirable outcomes.
Bayesian Exploration with Heterogenous Agents
Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins and Zhiwei Steven Wu
WWW 2019: The Web Conference We extend IE to heterogeneous users with public or private types. We also investigate how the model choice and the user diversity affect the set of "explorable" actions.
Incentivizing Exploration with Selective Data Disclosure (rev. Nov'24) [poster]
Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, and Zhiwei Steven Wu
- Deep revision of a paper in EC 2020, with some additional results.
- Early version titled "Incentivizing Exploration with Unbiased Histories".
- Workshop: Learning in Presence of Strategic Behavior at EC 2019.
IE in recommendation systems under weaker assumptions of rationality and trust. Our algorithm constructs a partial order on the users, published ahead of time, and provides each user with feedback from all preceding users in the order (and no other info). We achieve near-optimal learning performance for a range of behavioral models.
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [poster]
Mark Sellke and Aleksandrs Slivkins
Operations Research, 71(5):1706-1732, 2023.
Preliminary version in EC 2021: ACM Symp. on Economics and Computation. IE for K-armed bandits. We focus on the "price of incentives": the loss in performance, broadly construed, due Bayesian incentive-compatibility (BIC). We prove that Thompson sampling is BIC if initialized with enough samples of each arm. Next, we investigate the key sample complexity question: how many rounds does it take to collect one sample of each arm.
Exploration and Incentives in Reinforcement Learning [poster] (2021-2023)
Max Simchowitz and Aleksandrs Slivkins
Operations Research, vol. 72(3), 2024. IE for complex exploration problems, where each agent faces the same (but unknown) MDP. Agents control the choice of policies, whereas an algorithm can only recommend them. However, the algorithm can incentivize the agents to explore via information asymmetry.
Incentivizing Combinatorial Bandit Exploration
Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, and Zhiwei Steven Wu
NeurIPS 2022: Conf. on Neural Information Processing Systems. IE for bandit problems with large, structured action sets and highly correlated beliefs. Specifically, we focus on combinatorial semi-bandits.
Incentivized Exploration via Filtered Posterior Sampling [poster].
Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur
EC 2024: ACM Symp. on Economics and Computation.
Workshops: MIW'24, INFORMS RMP 2024, ESIF-AIML 2024. We propose "filtered posterior sampling", an extension of a well-known bandit algorithm, as a general-purpose solution for IE. We expand the scope of IE in several key dimensions, while also recovering existing results as special cases.
Exploration and Incentivizing Participation in Randomized Trials (rev. Jan'26) [poster]
Yingkai Li and Aleksandrs Slivkins We incentivize patients' participation in a randomized controlled trial (RCT) by leveraging information asymmetry between the trial and the patients. We obtain an optimal solution in terms of the statistical performance of the trial, as expressed by an estimation error. On a technical level, this is IE with a different goal: near-uniform exploration.

Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure
Aleksandrs Slivkins, Yunzong Xu and Shiliang Zuo.
NeurIPS 2025: Conf. on Neural Information Processing Systems. We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret, in bandit problems with a known finite reward structure. Our results extend to contextual bandits and reinforcement learning. We pinpoint a partial identifiability property of the problem instance as necessary and sufficient for the "asymptotic success".
Bandit Social Learning: Exploration under Myopic Behavior (rev. Apr'25) [poster]
Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins
Preliminary version in NeurIPS 2023, titled "Bandit Social Learning under Myopic Behavior"
- Best Poster Award at 15th Annual Machine Learning Symposum, New York, 2024
Agents collectively face a simple multi-armed bandit problem. Each agent acts myopically, in a(ny) way consistent with confidence intervals. We derive stark exploration failures, and provide matching upper bounds on regret by analyzing "moderately optimistic" agents. In particular, we obtain the first general results on failure of the greedy algorithm in multi-armed bandits.
Incentivizing Exploration with Selective Data Disclosure (rev. Nov'24) [poster]
Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, and Zhiwei Steven Wu
- Deep revision of a paper in EC 2020, with some additional results.
- Early version titled "Incentivizing Exploration with Unbiased Histories".
- Workshop: Learning in Presence of Strategic Behavior at EC 2019.
Incentivized Exploration in recommendation systems under weaker assumptions of rationality and trust. Our algorithm constructs a partial order on the users, published ahead of time, and provides each user with feedback from all preceding users in the order (and no other info). Pur differently, we construct a social network and the users engage in "bandit social learning" on this network. We achieve near-optimal learning performance for a range of behavioral models.
Greedy Algorithm almost Dominates in Smoothed Contextual Bandits (2018-2021)
Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaighan and Zhiwei Steven Wu
SICOMP: SIAM J. on Computing, 52(2), 2023.
Preliminary version (along with some other results) appeared in COLT 2018. We consider the greedy algorithm in linear contextual bandits. We prove that it is optimal, in a very strong sense, if the problem instance is sufficiently "smoothed".

Competing Bandits: Learning under Competition
Yishay Mansour, Aleksandrs Slivkins, and Zhiwei Steven Wu
ITCS 2018: Conf. on Innovations in Theoretical Computer Science
The Perils of Exploration under Competition: A Computational Modeling Approach [poster]
Guy Aridor, Kevin Liu, Aleksandrs Slivkins, Zhiwei Steven Wu
EC 2019: ACM Symp. on Economics and Computation
Competing Bandits: The Perils of Exploration Under Competition (rev. Oct'24)
Guy Aridor, Yishay Mansour, Aleksandrs Slivkins, and Zhiwei Steven Wu
A merged, heavily revised version of papers in ITCS 2018 and EC 2019.
ACM TEAC: ACM Trans. on Economics and Computation, Vol. 13(1), 2025. Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We study interplay between exploration and competition---how such systems balance the exploration for learning and the competition for users.

Characterizing Truthful Multi-Armed Bandit Mechanisms
Moshe Babaioff, Yogeshwer Sharma and Aleksandrs Slivkins
EC 2009: ACM Symp. on Electronic Commerce
SICOMP: SIAM J. on Computing, Vol. 43, No. 1, pp. 194-230, 2014
We consider a natural strategic version of the MAB problem motivated by pay-per-click auctions. We show that requiring an MAB algorithm to be incentive-compatible has striking consequences both for structure and regret.
Truthful Mechanisms with Implicit Payment Computation
Moshe Babaioff, Robert Kleinberg and Aleksandrs Slivkins
J. of the ACM, Vol. 62, Issue 2, May 2015.
EC 2010: ACM Symp. on Electronic Commerce (Best paper award)
- Partially subsumes the paper in EC 2013.
- Open Question at COLT 2011: Monotone bandit allocations
- The latest revision (Nov'15) reflects some minor bug fixes in the proof of Lemma 7.10.
We show that payment computation does not present any obstacle in designing truthful mechanisms for single-parameter domains, even when we can only call the allocation rule once. Applying this to multi-armed bandits (MAB), we design truthful MAB mechanisms for stochastic payoffs. More generally, we open up a problem of designing monotone MAB allocation rules.
Multi-parameter Mechanisms with Implicit Payment Computation
Moshe Babaioff, Robert Kleinberg and Aleksandrs Slivkins
EC 2013: ACM Symp. on Electronic Commerce We show that payment computation essentially does not present any obstacle in designing truthful mechanisms, even for multi-parameter domains, and even when we can only call the allocation rule once. Then we study a prominent example for a multi-parameter setting in which an allocation rule can only be called once, which arises in sponsored search auctions.

Budget Pacing in Repeated Auctions: Regret and Efficiency without Convergence (rev. Dec'25)
Jason Gaitonde, Yingkai Li, Bar Light, Brendan Lucier, and Aleksandrs Slivkins
ITCS 2023: Innovations in Theoretical Computer Science [poster] We study bidding algorithms for repeated auctions with budgets. We prove that a natural bidding algorithm attains good regret bounds for a wide range of auctions, and simultaneously attains state-of-art welfare-like guarantees if all bidders use this algorithm. Importantly, we do not require convergence of the dynamics, thereby avoiding strong assumptions.
Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics
Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang (rev. Nov'24)
COLT 2024: Conf. on Learning Theory.
Workshops: MIW'23, Online Ads @EC'24. We design a bidding algorithm for repeated auctions with budget and ROI constraints. The algorithm attains vanishing regret for a range of auctions, and state-of-art welfare-like guarantees if used by all bidders. Importantly, we do not require convergence of the dynamics.
Strategic Budget Selection in a Competitive Autobidding World [poster].
Yiding Feng, Brendan Lucier, Aleksandrs Slivkins
STOC 2024: Annual ACM Symp. on Theory of Computing
Workshops: MIW'24, INFORMS RMP 2024, INFORMS MSOM 2024, Online Ads @EC'24. We study the "metagame" between advertisers (with complex preferences) who strategically adjust (simple) instructions to their respective autobidders. We prove that the price of anarchy, in terms of liquid welfare, is at most 2 for pure Nash, and at most 4 for mixed Nash.

Dynamic pricing with limited supply
Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg and Aleksandrs Slivkins
EC 2012: ACM Symp. on Electronic Commerce
Special issue for EC 2012: ACM Trans. on Economics and Computation, 3(1): 4 (2015). We consider dynamic pricing with limited supply and unknown demand distribution. We extend multi-armed bandit techniques to the limited supply setting, and obtain optimal regret rates.
Bandits with Knapsacks
Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins
FOCS 2013: IEEE Symp. on Foundations of Computer Science.
J. of the ACM, Vol. 65 Issue 3, March 2018. We define a broad class of explore-exploit problems with knapsack-style resource utilization constraints, subsuming dynamic pricing, dynamic procurement, pay-per-click ad allocation, and many other problems. Our algorithms achieve optimal regret w.r.t. the optimal dynamic policy.
Dynamic Ad Allocation: Bandits with Budgets (2013) This brief note is on dynamic allocation of pay-per-click ads with advertisers' budgets. We define and analyze a natural extension of UCB1 to per-arm budgets.
Resourceful Contextual Bandits
Ashwinkumar Badanidiyuru, John Langford and Aleksandrs Slivkins
COLT 2014: Conf. on Learning Theory. Contextual bandits with resource constraints: we consider very general settings for both contextual bandits (arbitrary policy sets) and bandits with resource constraints (bandits with knapsacks), and obtain a regret guarantee with near-optimal statistical properties.
Combinatorial Semi-Bandits with Knapsacks
Karthik Abinav Sankararaman and Aleksandrs Slivkins
AISTATS 2018: Intl. Conf. on AI and Statistics (oral presentation, top 5% of submissions). We solve a common generalization of "combinatorial semi-bandits" and "bandits with knapsacks": "arms" are subsets of "atoms" which yield rewards and consume resources.
Adversarial Bandits with Knapsacks [poster]
Nicole Immorlica, Karthik A. Sankararaman, Aleksandrs Slivkins and Rob Schapire
J. of the ACM, 69 (6): 1-47, 2022.
Deep revision of a paper in FOCS 2019: IEEE Symp. on Foundations of Computer Science. Bandits with Knapsacks is a broad class of explore-exploit problems with knapsack-style resource utilization constraints. While all prior work is for the stochastic version, we target the adversarial version and obtain an optimal solution. We build on a new, game-theoretic interpretation (and a simpler algorithm) for the stochastic version.
Constrained episodic reinforcement learning in concave-convex and knapsack settings
K. Brantley, M. Dudik, T. Lykouris, S. Miryoosefi, M. Simchowitz, A. Slivkins, and W. Sun
NeurIPS 2020: Conf. on Neural Information Processing Systems We propose an algorithm for (several versions of) tabular episodic RL with constraints. We provide a modular analysis with strong theoretical guarantees. Our algorithm significantly outperforms the existing approaches in experiments.
Bandits with Knapsacks beyond the Worst-Case Analysis [poster]
Karthik Abinav Sankararaman and Aleksandrs Slivkins
NeurIPS 2021: Conf. on Neural Information Processing Systems. Bandits with Knapsacks is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds are well-understood, we present three results that go beyond the worst-case perspective: a full characterization for instance-dependent regret bounds, an improved result on simple regret, and a general "reduction" from BwK to bandits.
Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression (rev. Nov'24)
Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster
JMLR: Journal of Machine Learning Research, 25(394):1−37, 2024.
- Preliminary version in COLT 2023: Conf. on Learning Theory.
- The journal version features an important improvement in the main result.
We consider a generalization of contextual bandits with knapsacks (CBwK) in which the algorithm consumes and/or replenishes resources subject to packing and/or covering constraints. We provide the first algorithm for this problem (or CBwK) that is based on regression oracles, and the first vanishing-regret guarantees that extend beyond the stochastic environment.

Bandits and Experts in Metric Spaces
Robert Kleinberg, Aleksandrs Slivkins and Eli Upfal.
A merged and heavily revised version of papers in STOC'08 and SODA'10.
J. of the ACM, Volume 66, Issue 4, May 2019. We introduce the 'Lipschitz bandits problem': a stochastic bandit problem, possibly with a very large set of arms, such that the expected payoffs obey a Lipschitz condition with respect to a given metric space. The goal is to minimize regret as a function of time, both in the worst case and for 'nice' problem instances.
Ranked bandits in metric spaces:
learning optimally diverse rankings over large document collections
Aleksandrs Slivkins, Filip Radlinski and Sreenivas Gollapudi
ICML 2010: Intl. Conf. on Machine Learning.
JMLR: J. of Machine Learning Research, 14(Feb):399-436, 2013.
We present a learning-to-rank framework for web search that incorporates similarity and correlation between documents and thus, unlike prior work, scales to large document collections.
Contextual bandits with similarity information
COLT 2011: Conf. on Learning Theory.
JMLR: J. of Machine Learning Research, 15(Jul):2533-2568, 2014.
We study contextual bandits with a known metric over context-arm pairs which upper-bounds the local change in rewards. The main algorithmic idea is to adapt the partitions of the metric space to frequent context arrivals and high-payoff regions. Our framework also handles slowly changing payoffs and variable sets of arms.
Multi-armed bandits on implicit metric spaces
NIPS 2011: Conf. on Neural Information Processing Systems. Suppose an MAB algorithm is given a tree-based classification of arms. This tree implicitly defines a "similarity distance" between arms, but the numeric distances are not revealed to the algorithm. Our algorithm (almost) matches the best known guarantees for the setting (Lipschitz MAB) in which the distances are revealed.
Adaptive Contract Design for Crowdsourcing Markets:
Bandit Algorithms for Repeated Principal-Agent Problems
Chien-Ju Ho, Aleksandrs Slivkins and Jennifer Wortman Vaughan.
EC 2014: ACM Symp. on Economics and Computation
JAIR (J. of Artificial Intelligence Research), Vol. 54, 2015. We consider a repeated version of the principal-agent model in which the principal can revise the contract over time, and the agent can strategically choose the (unobservable) effort level. We treat this as a multi-armed bandit problem, and design an algorithm that adaptively refines the partition of the action space without relying on Lipschitz assumptions.
Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting [poster]
Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
COLT 2019: Conf. on Learning Theory.
JMLR: J. of Machine Learning Research, 21(137):1-45, 2020. We study contextual bandits with an abstract policy class and continuous action space. We obtain two algorithms: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both algorithms "zoom in" on better regions of the actions space, with improved performance on "benign" problems.
Efficient Contextual Bandits with Continuous Actions
M. Majzoubi, C. Zhang, R. Chari, A. Krishnamurthy, J. Langford and A. Slivkins
NeurIPS 2020: Conf. on Neural Information Processing Systems We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.
Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning (rev. Jun'25) Chara Podimata and Aleksandrs Slivkins
- Preliminary version in COLT 2021 (focused on Lipschitz bandits and titled "Adaptive Discretization for Adversarial Lipschitz Bandits").
Adaptive discretization is a central theme in "Lipschitz bandits": bandits with similarity info on actions. We provide the first adaptive discretization algorithm for adversarial rewards, and derive instance-dependent regret bounds (recovering regret bounds for stochastic rewards). We apply this machinery to several scenarios in dynamic pricing and dynamic auction optimization.

Bandit Social Learning: Exploration under Myopic Behavior (rev. Apr'25) [poster]
Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins
Preliminary version in NeurIPS 2023: Conf. on Neural Information Processing Systems.
- Best Poster Award at 15th Annual Machine Learning Symposum, New York, 2024
Agents collectively face a simple multi-armed bandit problem. Each agent acts myopically, in a(ny) way consistent with confidence intervals. We derive stark exploration failures, and provide matching upper bounds on regret by analyzing "moderately optimistic" agents. In particular, we obtain the first general results on failure of the greedy algorithm in multi-armed bandits.
Low-distortion Inference of Latent Similarities from a Multiplex Social Network
Ittai Abraham, Shiri Chechik, David Kempe and Aleksandrs Slivkins
SODA 2013: ACM-SIAM Symp. on Discrete Algorithms
SICOMP: SIAM J. on Computing, Vol. 44(3), 2015. The observed social network is a noisy signal about the latent "social space": the ways in which individuals are (dis)similar to one another. We present near-linear time algorithms which, under some standard models, can infer the social space with provable guarantees.
Selection and Influence in Cultural Dynamics
David Kempe, Jon Kleinberg, Sigal Oren and Aleksandrs Slivkins
EC 2013: ACM Symp. on Electronic Commerce
Network Science, vol. 4(1), 2016. Influence and selection -- tendency to, resp., become similar to one's friends and to interact with similar people -- work in opposite directions: resp., towards homogeneity and fragmentation. We analyze the societal outcomes when both forces are in effect. We consider a natural class of models from the work in political opinion formation, cultural diversity, and language evolution.

Surveys and position papers

Online Decision Making in Crowdsourcing Markets: Theoretical Challenges
Aleksandrs Slivkins and Jennifer Wortman Vaughan
SIGecom Exchanges, Dec 2013. In crowdsourcing markets, task requesters and the platform itself make repeated decisions about which tasks to assign to each worker at which price. Designing algorithms for making these decisions is a rich, emerging problem space. We survey this problem space, point out significant modeling difficulties, and identify directions to make progress.
Crowdsourcing Gold-HIT Creation at Scale: Challenges and Adaptive Exploration Approaches
Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford, A. Slivkins, Hai Wu
CrowdScale 2013: Workshop on Crowdsourcing at Scale Gold HITs --- Human Intelligence Tasks with known answers --- are commonly used to measure worker performance and data quality in industrial applications of crowdsourcing. We suggest adaptive exploration as a promising approach for automated, scalable Gold HIT creation. We substantiate this with initial experiments in a stylized model.

Conference and journal publications

How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels
Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford and Aleksandrs Slivkins
SIGIR 2016: Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval.
Incentivizing High Quality Crowdwork
Chien-Ju Ho, Aleksandrs Slivkins, Siddharth Suri, and Jennifer Wortman Vaughan
WWW 2015: 24th Intl. World Wide Web Conference. Nominee for Best Paper Award.
A talk at CODE@MIT 2015: Conf. on Digital Experimentation @MIT.
Short version: SIGecom Exchanges, Dec 2015. We study causal effects of performance-based payments (PBPs) on the quality of crowdwork, via randomized behavioral experiments on Amazon Mechanical Turk. We shed light on when, where, and why PBPs help improve quality.
Adaptive Contract Design for Crowdsourcing Markets:
Bandit Algorithms for Repeated Principal-Agent Problems
Chien-Ju Ho, Aleksandrs Slivkins and Jennifer Wortman Vaughan.
EC 2014: ACM Symp. on Economics and Computation
JAIR: J. of Artificial Intelligence Research, Vol. 54, 2015. (Special Track on Human Computation) We consider a repeated version of the principal-agent model in which the principal can revise the contract over time, and the agent can strategically choose the (unobservable) effort level. We treat this as a multi-armed bandit problem, and design an algorithm that adaptively refines the partition of the action space without relying on Lipschitz assumptions.
Bandits with Knapsacks: Dynamic procurement for crowdsourcing
Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins
A version of the FOCS'13 paper, focused on applications to crowdsourcing markets.
3rd Workshop on Social Computing and User Generated Content, at EC 2013.
Workshop on Crowdsourcing: Theory, Algorithms & Applications, at NIPS 2013.
Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem [poster]
Ittai Abraham, Omar Alonso, Vasilis Kandylas and Aleksandrs Slivkins
COLT 2013: Conf. on Learning Theory. We propose a simple model for adaptive quality control in crowdsourced multiple-choice tasks. We present several algorithms for this problem, and support them with analysis and simulations.

Corruption-Robust Exploration in Episodic Reinforcement Learning
Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins and Wen Sun
COLT 2021: Conf. on Learning Theory.
Mathematics of Operations Research, 50(2), 2024.
- [30-min talk by Alex: video | slides] [60-min talk by Thodoris: video | slides]
We study episodic RL where an adversary can "corrupt" some episodes, and achieve performance guarantees that adapt to the (unknown) number of corrupted episodes. Along the way, we provide a general framework for RL algorithms that depart from "optimism", and achieve the first RL algorithm for bandit feedback and non-IID transitions.
One Practical Algorithm for Both Stochastic and Adversarial Bandits
Yevgeny Seldin and Aleksandrs Slivkins
ICML 2014: Intl. Conf. on Machine Learning. We present a bandit algorithm that achieves near-optimal performance in both stochastic and adversarial regimes without prior knowledge about the environment. Our algorithm is both rigorous and practical; it is based on a new control lever that we reveal in the EXP3 algorithm.
The best of both worlds: stochastic and adversarial bandits.
Sébastien Bubeck and Aleksandrs Slivkins
COLT 2012: Conf. on Learning Theory. We present a new bandit algorithm whose regret is optimal both for adversarial rewards and for stochastic rewards, achieving, resp., square-root regret and polylog regret. Adversarial rewards and stochastic rewards are the two main settings for (non-Bayesian) multi-armed bandits; prior work treats them separately, and does not attempt to jointly optimize for both.
Contextual bandits with similarity information
COLT 2011: Conf. on Learning Theory.
JMLR: J. of Machine Learning Research, 15(Jul):2533-2568, 2014.
We study contextual bandits with a known metric over context-arm pairs which upper-bounds the local change in rewards. The main algorithmic idea is to adapt the partitions of the metric space to frequent context arrivals and high-payoff regions. Our framework also handles slowly changing payoffs and variable sets of arms.
Adapting to the Shifting Intent of Search Queries
Umar Syed, Aleksandrs Slivkins and Nina Mishra
NIPS'09: Annual Conf. on Neural Information Processing Systems
Query intent may shift over time. A classifier can use the available signals to predict a shift in intent. Then a bandit algorithm can be used to find the new relevant results. We present a meta-algorithm that combines such classifier with a bandit algorithm in a feedback loop, with favorable regret guarantees.
Adapting to a Changing Environment: the Brownian Restless Bandits
Aleksandrs Slivkins and Eli Upfal.
COLT 2008:Conf. on Learning Theory.
We study a version of the stochastic MAB problem in which the expected reward of each arm evolves stochastically and gradually in time, following an independent Brownian motion or a similar process. Our benchmark is a hypothetical policy that chooses the best arm in each round.

Systems

Multi-World Testing: A System for Experimentation, Learning, And Decision-Making (rev. Jul'16)
Alekh Agarwal, Sarah Bird, Markus Cozowicz, Miro Dudik, John Langford, Lihong Li, Luong Hoang, Dan Melamed, Sid Sen, Robert Schapire, Alex Slivkins. (The MWT project) Multi-World Testing (MWT) is a methodology for principled and efficient experimentation, learning, and decision-making. It is plausibly applicable to most services that interact with customers; in many scenarios, it is exponentially more efficient than the traditional A/B testing. The underlying research area is known as "contextual bandits" and "counterfactual evaluation".
Making Contextual Decisions with Low Technical Debt (2017)
Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, and Aleksandrs Slivkins Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Contextual bandit algorithms can be very effective in these settings, but applying them in practice is fraught with technical debt. We create the first general system for contextual bandit learning, called the Decision Service.
Harvesting Randomness to Optimize Distributed Systems
Mathias Lecuyer, Joshua Lockerman, Lamont Nelson, Sid Sen, Amit Sharma, and Alex Slivkins
HotNets 2017: ACM Workshop on Hot Topics in Networks Randomized decisions in cloud systems is a powerful resource for offline optimization. We show how to collect data from existing systems, without modifying them, to evaluate new policies, without deploying them.
Sayer: Using Implicit Feedback to Optimize System Policies
Mathias Lecuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Aleksandrs Slivkins, Amit Sharma
SoCC 2021: ACM Symp. on Cloud Computing. System policies for resource allocation often reveal implicit feedback about counterfactuals. We develop a methodology, called Sayer, that leverages such implicit feedback (as well as randomized exploration) to evaluate and train new system policies. We successfully apply Sayer to two production scenarios in Microsoft Azure.

Algorithms

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting (2019) [poster]
Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
COLT 2019: Conf. on Learning Theory.
JMLR: J. of Machine Learning Research, 21(137):1-45, 2020. We study contextual bandits with an abstract policy class and continuous action space. We obtain two algorithms: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both algorithms "zoom in" on better regions of the actions space, with improved performance on "benign" problems.
Contextual Dueling Bandits
Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins and Masrour Zoghi
COLT 2015: Conf. on Learning Theory. We extend "dueling bandits" (where feedback is limited to pairwise comparisons between arms) to incorporate contexts (as in "contextual bandits"). We propose a natural new solution concept, rooted in game theory, and present algorithms for approximately learning this concept.
Resourceful Contextual Bandits
Ashwinkumar Badanidiyuru, John Langford and Aleksandrs Slivkins
COLT 2014: Conf. on Learning Theory. Contextual bandits with resource constraints: we consider very general settings for both contextual bandits (arbitrary policy sets) and bandits with resource constraints (bandits with knapsacks), and obtain a regret guarantee with near-optimal statistical properties.

Network Failure Detection and Graph Connectivity
Jon Kleinberg, Mark Sandler and Aleksandrs Slivkins.
SIAM J. on Computing, 38(4): 1330-1346, Aug 2008.
SODA 2004: The ACM-SIAM Symp. on Discrete Algorithms [slides]
We detect network partitions -- with strong provable guarantees -- using a small set of 'agents' placed randomly on nodes of the network. We parameterize our guarantees by edge- and node-connectivity of the underlying graph.
Triangulation and Embedding using Small Sets of Beacons
Jon Kleinberg, Aleksandrs Slivkins and Tom Wexler.
J. of the ACM, 56(6), Sept 2009.
FOCS 2004: The IEEE Symp. on Foundations of Computer Science
- The journal version subsumes the paper
  "Distributed Approaches to Triangulation and Embedding" (Slivkins, SODA 2005).
- For a fuller story see Chapter 3 of my thesis.
We consider metric embeddings and triangulation-based distance estimation in a distributed framework with low load on the participating nodes. Our results provide theoretical insight into the empirical success of several recent Internet-related projects.
Metric Embeddings with Relaxed Guarantees
T-H.H. Chan, K. Dhamdhere, A. Gupta, J. Kleinberg and A. Slivkins
SIAM J. on Computing, 38(6): 2303-2329, March 2009.
FOCS 2005: IEEE Symp. on Foundations of Computer Science [slides]
- The FOCS'05 version is merged with (I.Abraham, Y.Bartal, O.Neiman).
Given any x, any metric admits a low-dim embedding into L_p, p>=1 with disortion D(x) = O(log 1/x) on all but an x-fraction of edges. Moreover, any decomposable metric (e.g. any doubling metric) admits a low-dim embedding such that D(x) = O(log 1/x)^{1/p} for all x.
Distance Estimation and Object Location via Rings of Neighbors
PODC 2005: ACM Symp. on Principles of Distributed Computing [slides]
- Best Student Paper Award (eligibility: at least one student author)
Special issue of "Distributed Computing" Vol. 19, No. 4 (March 2007).
We approach several problems on distance estimation and object location with a unified technique called ''rings of neighbors''. Using this technique on metrics of low doubling dimension, we obtain significant improvements for low-stretch routing schemes, searchable small-world networks, distance labeling, and triangulation-based distance estimation.
Meridian: A Lightweight Network Location Service without Virtual Coordinates [project]
ACM SIGCOMM 2005
Bernard Wong, Aleksandrs Slivkins and Emin G. Sirer.
Meridian is a scalable overlay network for performing locality-aware node selection. The project featured a large-scale PlanetLab deployment.
Towards Fast Decentralized Construction of Locality-Aware Overlay Networks
PODC 2007: ACM Symp. on Principles of Distributed Computing [slides]
We provide fast (polylog-time) distributed constructions for various locality-aware (low-stretch) distributed data structures, such as distance labeling schemes, name-independent routing schemes, and multicast trees.
Oscillations with TCP-like Flow Control in Networks of Queues
Matthew Andrews and Aleksandrs Slivkins
INFOCOM 2006 IEEE Conf. on Computer Communications
For a wide range of TCP-like fluid-based congestion control models, we construct a network of sessions and (almost) FIFO routers such that starting from a certain initial state, the system returns to the same state eventually. Contrasting the prior work, in our example the total sending rate of all sessions that come through any given router never exceeds its capacity.
Approximate Matching for Peer-to-Peer Overlays with Cubit (2009)
Bernard Wong, Aleksandrs Slivkins and Emin G. Sirer.
Cubit is a system that provides fully decentralized approximate keyword search capabilities to a peer-to-peer network. You can use Cubit to find a movie, song or artist even if you misspell the title or the name.

Aleksandrs Slivkins: publications

By Topic / Research Program

Surveys and position papers

Conference and journal publications

Systems

Algorithms