no code implementations • 7 May 2024 • Omar Besbes, Yash Kanoria, Akshit Kumar
To study these questions, we introduce a model of repeated user consumption in which, at each interaction, users select between an outside option and the best option from a recommendation set.
1 code implementation • 20 Jun 2023 • Matias Alvo, Daniel Russo, Yash Kanoria
The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment-as is common with generic policy gradient methods.
no code implementations • 15 Mar 2016 • Ramesh Johari, Vijay Kamble, Yash Kanoria
We introduce a benchmark model with heterogeneous "workers" (demand) and a limited supply of "jobs" that arrive over time.