31 Dec 2013 • Steve N'Guyen, Clément Moulin-Frier, Jacques Droulez
Schematically, two main approaches have been followed: either the agent learns by trial and error which option is the correct one to choose in a given situation, or the agent already has some knowledge of the possible consequences of its decisions, this knowledge generally being expressed as a conditional probability distribution.