no code implementations • 2 Mar 2023 • Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour
To the best of our knowledge, our algorithm is the first efficient rate optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.
no code implementations • 27 Nov 2022 • Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour
To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.
no code implementations • 22 Jul 2022 • Orin Levy, Yishay Mansour
For the latter, our algorithm obtains regret bound of $\widetilde{O}( (H+{1}/{p_{min}})H|S|^{3/2}\sqrt{|A|T\log(\max\{|\mathcal{G}|,|\mathcal{P}|\}/\delta)})$ with probability $1-\delta$, where $\mathcal{P}$ and $\mathcal{G}$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes.
no code implementations • 2 Mar 2022 • Orin Levy, Yishay Mansour
We study learning contextual MDPs using a function approximation for both the rewards and the dynamics.