no code implementations • 6 Dec 2023 • Aaron J. Snoswell, Lucinda Nelson, Hao Xue, Flora D. Salim, Nicolas Suzor, Jean Burgess
Generic 'toxicity' classifiers continue to be used for evaluating the potential for harm in natural language generation, despite mounting evidence of their shortcomings.
no code implementations • 3 Jun 2021 • Aaron J. Snoswell, Surya P. N. Singh, Nan Ye
Multiple-Intent Inverse Reinforcement Learning (MI-IRL) seeks to find a reward function ensemble to rationalize demonstrations of different but unlabelled intents.
1 code implementation • 1 Dec 2020 • Aaron J. Snoswell, Surya P. N. Singh, Nan Ye
This improves the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model.