no code implementations • 5 May 2024 • Libing Yang, Yang Li, Long Chen
In this paper, we introduce ClothPPO, a framework that uses an actor-critic policy gradient algorithm to enhance a pre-trained model whose very large action space (on the order of 10^6 actions) is aligned with the observation, for the task of unfolding clothes.
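
To give a feel for an observation-aligned action space of this size, the sketch below decodes a flat action index into a gripper rotation and a pixel location, and evaluates a clipped policy-gradient surrogate for one sample. It is only an illustration under stated assumptions. The abstract does not give the discretisation or the loss: the shape `(4, 500, 500)`, the names `decode_action` and `clipped_surrogate`, and the PPO-style clipping (suggested only by the framework's name) are all hypothetical.

```python
import math

# Hypothetical discretisation (not from the paper):
# 4 gripper rotations x 500 x 500 pixels = 10^6 discrete actions.
ACTION_SHAPE = (4, 500, 500)


def decode_action(flat_index, shape=ACTION_SHAPE):
    """Map a flat action index to (rotation, row, col) on the observation grid."""
    n_rot, height, width = shape
    if not 0 <= flat_index < n_rot * height * width:
        raise ValueError("flat_index out of range")
    # The rotation is the slowest-varying axis, the column the fastest.
    rotation, pixel = divmod(flat_index, height * width)
    row, col = divmod(pixel, width)
    return rotation, row, col


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for one sample (assumed form of the objective).

    logp_new / logp_old are the log-probabilities of the chosen action under
    the current and the data-collecting policy; advantage would come from
    the critic.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

Because every action index corresponds to a pixel in the observation, the actor can output one score per pixel and rotation, and a softmax over the flattened map gives a policy over all 10^6 actions without enumerating them by hand.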