no code implementations • 15 Feb 2022 • Haoyi You, Beichen Yu, Haiming Jin, Zhaoxing Yang, Jiahui Sun
Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric.
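As a rough illustration of how a preference-weighted metric can generalize max-min robustness, the sketch below ranks per-environment returns from worst to best and takes a weighted average over the ranks. This is an assumption made for illustration, not the paper's actual UOR definition; the function name `weighted_robustness` and the `rank_weights` parameter are hypothetical.

```python
from typing import Sequence


def weighted_robustness(returns: Sequence[float], rank_weights: Sequence[float]) -> float:
    """Weighted average of per-environment returns, weighted by rank (worst first).

    Hypothetical sketch: rank_weights[0] applies to the worst-performing
    environment, rank_weights[-1] to the best-performing one.
    """
    if len(returns) != len(rank_weights):
        raise ValueError("returns and rank_weights must have the same length")
    if any(w < 0 for w in rank_weights):
        raise ValueError("rank_weights must be non-negative")
    total = sum(rank_weights)
    if total <= 0:
        raise ValueError("rank_weights must sum to a positive value")
    ordered = sorted(returns)  # ascending: worst environment first
    return sum(w * r for w, r in zip(rank_weights, ordered)) / total


returns = [12.0, 3.0, 8.0]  # made-up returns of one policy in three environments
worst_case = weighted_robustness(returns, [1.0, 0.0, 0.0])   # all weight on the worst environment
average = weighted_robustness(returns, [1.0, 1.0, 1.0])      # uniform weights
preference = weighted_robustness(returns, [0.5, 0.3, 0.2])   # a user who emphasizes bad cases
```

Putting all weight on the lowest-ranked environment recovers the max-min objective (the worst-case return), while uniform weights give the plain average; intermediate weightings express a user's preference between the two.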
4 code implementations • 10 Nov 2021 • Zhaoxing Yang, Rong Ding, Haiming Jin, Yifei Wei, Haoyi You, Guiyun Fan, Xiaoying Gan, Xinbing Wang
In addition, with such modularization, the training algorithm of DeCOM separates the original constrained optimization into an unconstrained optimization on reward and a constraint satisfaction problem on costs.
Multi-agent Reinforcement Learning • reinforcement-learning