no code implementations • 24 Oct 2023 • Reda Alami, Mohammed Mahfoud, Mastane Achab
In a typical stochastic multi-armed bandit problem, the objective is to maximize the expected sum of rewards over a time horizon $T$.
no code implementations • 1 Apr 2023 • Reda Alami, Mohammed Mahfoud, Eric Moulines
We consider the problem of learning in a non-stationary reinforcement learning (RL) environment, where the setting can be fully described by a piecewise stationary discrete-time Markov decision process (MDP).
no code implementations • 11 Feb 2023 • Mohamed El Amine Seddik, Mohammed Mahfoud, Merouane Debbah
Relying on recently developed random tensor tools, this paper deals precisely with the non-orthogonal case by deriving an asymptotic analysis of a parameterized deflation procedure performed on an order-three and rank-two spiked tensor.