17 Dec 2023 • Longchao Da, Porter Jenkins, Trevor Schwantes, Jeffrey Dotson, Hua Wei
In this paper, we present Probabilistic Offline Policy Ranking (POPR), a framework for offline policy ranking (OPR) problems. POPR leverages expert data to characterize the probability that a candidate policy behaves like the experts, and approximates the full posterior distribution of each policy's performance to support ranking.
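The final ranking step can be illustrated with a minimal sketch. It assumes that posterior samples of each candidate policy's performance are already available. How POPR derives them from expert data is not reproduced here. The sketch ranks policies by posterior mean and also reports a Monte Carlo estimate of the probability that each policy is the best. The function `rank_policies`, its dictionary input format, and the choice of posterior mean as the ranking criterion are hypothetical illustrations, not the paper's implementation.

```python
import random
import statistics


def rank_policies(posterior_samples):
    """Rank candidate policies from posterior samples of their performance.

    Hypothetical input format: a dict mapping policy name -> list of sampled
    performance values, with every list the same length. Returns a list of
    (name, posterior_mean, prob_best) tuples sorted by posterior mean, highest first.
    """
    names = list(posterior_samples)
    n = len(next(iter(posterior_samples.values())))
    best_counts = {name: 0 for name in names}
    for i in range(n):
        # Treat the i-th sample of every policy as one joint draw and record
        # which policy has the highest sampled performance in that draw.
        winner = max(names, key=lambda name: posterior_samples[name][i])
        best_counts[winner] += 1
    summary = [
        (name, statistics.fmean(posterior_samples[name]), best_counts[name] / n)
        for name in names
    ]
    # Rank by posterior mean; prob_best conveys how uncertain that ranking is.
    return sorted(summary, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Synthetic posterior samples for three hypothetical policies.
    rng = random.Random(0)
    samples = {
        "policy_a": [rng.gauss(1.0, 0.5) for _ in range(1000)],
        "policy_b": [rng.gauss(1.2, 0.8) for _ in range(1000)],
        "policy_c": [rng.gauss(0.5, 0.2) for _ in range(1000)],
    }
    for name, mean, p_best in rank_policies(samples):
        print(f"{name}: mean={mean:.3f}, P(best)={p_best:.3f}")
```

Keeping the whole distribution, rather than a single point estimate, lets the ranking report how likely it is that the top-ranked policy is actually the best.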