1 code implementation • 29 Sep 2020 • Vihang P. Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, Sepp Hochreiter
For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks.
General Reinforcement Learning • Multiple Sequence Alignment • +1
no code implementations • 25 Sep 2019 • Leila Arras, Jose A. Arjona-Medina, Michael Widrich, Grégoire Montavon, Michael Gillhofer, Klaus-Robert Müller, Sepp Hochreiter, Wojciech Samek
While neural networks have acted as a strong unifying force in the design of modern AI systems, the neural network architectures themselves remain highly heterogeneous due to the variety of tasks to be solved.
2 code implementations • NeurIPS 2019 • Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter
In MDPs, the Q-values equal the expected immediate reward plus the expected future rewards.
Ranked #9 on Atari Games on Atari 2600 Bowling
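The Q-value statement in the abstract excerpt above (expected immediate reward plus expected future rewards) can be illustrated with a minimal Python sketch. The three-state episodic MDP, its transition table `P`, the fixed `policy`, and the function names are hypothetical and chosen only for illustration; none of them come from the paper or its code.

```python
# Hypothetical 3-state episodic MDP (all numbers are illustrative assumptions).
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 1, 0.0)], 1: [(0.5, 1, 0.0), (0.5, 2, 1.0)]},
    1: {0: [(1.0, 2, 2.0)], 1: [(1.0, 2, 0.0)]},
    2: {},  # terminal state: no actions, value 0
}
policy = {0: 0, 1: 0}  # action the fixed policy takes in each non-terminal state


def state_value(state, gamma=1.0):
    """V(s) under the fixed policy; terminal states have value 0."""
    if not P[state]:
        return 0.0
    return q_value(state, policy[state], gamma)


def q_value(state, action, gamma=1.0):
    """Q(s, a) = E[immediate reward] + gamma * E[V(next state)]."""
    outcomes = P[state][action]
    immediate = sum(p * r for p, _, r in outcomes)
    future = sum(p * state_value(s2, gamma) for p, s2, _ in outcomes)
    return immediate + gamma * future


if __name__ == "__main__":
    for s in (0, 1):
        for a in (0, 1):
            print(f"Q({s}, {a}) = {q_value(s, a)}")
```

The recursion terminates because this toy MDP has no cycles. An MDP with loops would need an iterative method such as value iteration instead.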
2 code implementations • 1 Jun 2016 • Michael Treml, Jose A. Arjona-Medina, Thomas Unterthiner, Rupesh Durgesh, Felix Friedmann, Peter Schuberth, Andreas Mayr, Martin Heusel, Markus Hofmarcher, Michael Widrich, Bernhard Nessler, Sepp Hochreiter
We propose a novel deep network architecture for image segmentation that keeps the high accuracy while being efficient enough for embedded devices.