Q-Learning

Q-Learning is an off-policy temporal difference control algorithm:

$$Q\left(S_{t}, A_{t}\right) \leftarrow Q\left(S_{t}, A_{t}\right) + \alpha\left[R_{t+1} + \gamma\max_{a}Q\left(S_{t+1}, a\right) - Q\left(S_{t}, A_{t}\right)\right] $$

The learned action-value function $Q$ directly approximates $q_{*}$, the optimal action-value function, independent of the policy being followed.

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition

Papers

Paper	Code	Results	Date	Stars

Tasks

Task	Papers	Share
Reinforcement Learning (RL)	205	34.34%
Decision Making	48	8.04%
Multi-agent Reinforcement Learning	27	4.52%
Offline RL	26	4.36%
Management	26	4.36%
Atari Games	17	2.85%
OpenAI Gym	13	2.18%
Continuous Control	11	1.84%
Autonomous Driving	10	1.68%

Usage Over Time

This feature is experimental; we are continuously improving our matching algorithm.

Components

Component	Type	Add Remove
🤖 No Components Found	You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories

Add Remove

Off-Policy TD Control