no code implementations • 31 Jan 2022 • Luca Pasqualini, Gianluca Amato, Marco Fantozzi, Rosa Gini, Alessandro Marchetti, Carlo Metta, Francesco Morandin, Maurizio Parton
In recent years, DeepMind's AlphaZero algorithm has become the state of the art for efficiently tackling perfect-information, two-player, zero-sum games with a win/lose outcome.
1 code implementation • 17 Sep 2021 • Enrico Meloni, Matteo Tiezzi, Luca Pasqualini, Marco Gori, Stefano Melacci
In the last few years, the scientific community has shown a remarkable and growing interest in 3D Virtual Environments for training and testing Machine Learning-based models in realistic virtual worlds.
1 code implementation • 2 Mar 2021 • Fabio Saggese, Luca Pasqualini, Marco Moretti, Andrea Abrardo
Assuming that each eMBB codeword can tolerate a limited amount of puncturing, beyond which it is in outage, we show that the policy devised by the DRL agent never violates the latency requirement of URLLC traffic and, at the same time, keeps the number of eMBB codewords in outage at minimal levels compared with other state-of-the-art schemes.
no code implementations • 8 Feb 2021 • Andrea Zugarini, Luca Pasqualini, Stefano Melacci, Marco Maggini
Writers, poets, and singers usually do not create their compositions in a single breath.
1 code implementation • 31 Oct 2020 • Luca Pasqualini, Maurizio Parton
This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence and the observation at each time step is the last sequence of bits appended to that state.
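To make the partially observable formulation concrete, here is a minimal sketch of such an environment. It is not the authors' implementation. The class name `PRNGSequenceEnv`, the fixed `block_size` of bits appended per action, the all-zero initial observation, and the pluggable `score_fn` that rewards only the completed period are all assumptions made for illustration. The agent only ever sees the last block it appended, while the environment keeps the full sequence as its hidden state.

```python
import random
from typing import Callable, List, Tuple


class PRNGSequenceEnv:
    """Hypothetical sketch of the partially observable MDP described above.

    Full (hidden) state: every bit appended so far, i.e. the period being built.
    Observation: only the last block of bits appended at the current step.
    The episode ends once the sequence reaches `period_length` bits.
    """

    def __init__(self, period_length: int, block_size: int,
                 score_fn: Callable[[List[int]], float]) -> None:
        if period_length % block_size != 0:
            raise ValueError("period_length must be a multiple of block_size")
        self.period_length = period_length
        self.block_size = block_size
        self.score_fn = score_fn  # assumed: scores a complete bit sequence
        self.state: List[int] = []

    def reset(self) -> Tuple[int, ...]:
        self.state = []
        # Arbitrary choice for this sketch: start from an all-zero observation.
        return tuple([0] * self.block_size)

    def step(self, action: Tuple[int, ...]) -> Tuple[Tuple[int, ...], float, bool]:
        if len(action) != self.block_size or any(b not in (0, 1) for b in action):
            raise ValueError("action must be a tuple of block_size bits")
        self.state.extend(action)  # append the chosen block to the hidden state
        done = len(self.state) >= self.period_length
        # Sparse reward: only the completed period is evaluated.
        reward = self.score_fn(self.state) if done else 0.0
        # The agent observes just the block it appended, not the whole period.
        return tuple(action), reward, done


if __name__ == "__main__":
    def balance_score(bits: List[int]) -> float:
        # Toy stand-in for a real test suite: 1.0 when zeros and ones balance.
        half = len(bits) / 2
        return 1.0 - abs(sum(bits) - half) / half

    rng = random.Random(0)
    env = PRNGSequenceEnv(period_length=16, block_size=4, score_fn=balance_score)
    obs, done = env.reset(), False
    while not done:
        # Random policy, purely to exercise the environment loop.
        action = tuple(rng.randint(0, 1) for _ in range(env.block_size))
        obs, reward, done = env.step(action)
    print("period:", env.state, "reward:", reward)
```

Rewarding only the finished period mirrors the idea that quality is a property of the whole sequence. A learned policy would replace the random one and map each observed block to the next block to append.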
1 code implementation • 16 Jul 2020 • Enrico Meloni, Luca Pasqualini, Matteo Tiezzi, Marco Gori, Stefano Melacci
Recently, Machine Learning researchers, Computer Vision scientists, engineers, and others have shown a growing interest in 3D simulators as a means to artificially create experimental settings that closely resemble the real world.
2 code implementations • 15 Dec 2019 • Luca Pasqualini, Maurizio Parton
In this context, N is the length of the period of the generated sequence, and the policy is iteratively improved using the average value of an appropriate test suite run over that period.
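The reward described here, the average value of a test suite run over the length-N period, can be sketched as a mean of p-values. The snippet does not say which tests are used. As an assumption, this sketch uses two tests from NIST SP 800-22, the frequency (monobit) test and the runs test, as stand-ins, and the names `monobit_p_value`, `runs_p_value`, and `suite_score` are hypothetical. NIST recommends much longer sequences (at least 100 bits) than the tiny examples below, which are only for illustration.

```python
import math
from typing import Callable, List, Sequence


def monobit_p_value(bits: Sequence[int]) -> float:
    """Frequency (monobit) test as described in NIST SP 800-22."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty sequence")
    s = sum(2 * b - 1 for b in bits)  # map 0 -> -1, 1 -> +1, then sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def runs_p_value(bits: Sequence[int]) -> float:
    """Runs test as described in NIST SP 800-22."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty sequence")
    pi = sum(bits) / n  # proportion of ones
    # Prerequisite: if the proportion of ones is too far from 1/2, the test
    # is not applicable and the sequence is treated as failing (p = 0).
    if pi in (0.0, 1.0) or abs(pi - 0.5) >= 2 / math.sqrt(n):
        return 0.0
    # A run is a maximal block of identical bits; count boundaries + 1.
    v_obs = 1 + sum(1 for k in range(n - 1) if bits[k] != bits[k + 1])
    num = abs(v_obs - 2 * n * pi * (1 - pi))
    den = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(num / den)


# Assumed test suite; a real setup would use many more tests.
TESTS: List[Callable[[Sequence[int]], float]] = [monobit_p_value, runs_p_value]


def suite_score(period: Sequence[int]) -> float:
    """Average p-value over the assumed suite, usable as the episode reward."""
    return sum(test(period) for test in TESTS) / len(TESTS)
```

For instance, the period `[1, 1, 0, 0] * 4` is perfectly balanced and has exactly the expected number of runs, so both tests return 1.0. By contrast, `[1, 0] * 8` passes the monobit test but has far too many runs and is heavily penalized by the runs test. Averaging p-values gives the policy a dense, bounded signal in [0, 1] for each completed period.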