no code implementations • ICML 2020 • Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar
We propose a new family of bandit algorithms, formulated in a general way based on the Biased Maximum Likelihood Estimation (BMLE) method that originally appeared in the adaptive control literature.
no code implementations • 27 Mar 2024 • He-Hao Liao, Yan-Tsung Peng, Wen-Tao Chu, Ping-Chun Hsieh, Chung-Chi Tsai
This work aims to recover rain-degraded images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain).
no code implementations • 19 Mar 2024 • Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng
(iii) To generate more realistic behavior, RallyNet leverages Geometric Brownian Motion (GBM) to model the interactions between players by introducing a valuable inductive bias for learning player behaviors.
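As a minimal illustration of the Geometric Brownian Motion component mentioned above (this is a generic GBM path simulator, not RallyNet's implementation; the function name and parameter values are illustrative), one can sample a GBM path via its exact log-space solution:

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, dt, n_steps, rng=None):
    """Simulate one Geometric Brownian Motion path
    dS_t = mu * S_t dt + sigma * S_t dW_t
    using the exact solution S_t = S_0 * exp((mu - sigma^2/2) t + sigma W_t)."""
    rng = np.random.default_rng(rng)
    # Brownian increments dW ~ N(0, dt)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * dW
    # Prepend 0 so the path starts exactly at s0
    return s0 * np.exp(np.concatenate([[0.0], np.cumsum(log_increments)]))

path = simulate_gbm(s0=1.0, mu=0.05, sigma=0.2, dt=0.01, n_steps=100, rng=0)
```

Because GBM paths stay strictly positive and exhibit correlated drift, they provide a natural prior over smoothly evolving behavioral trajectories, which is the kind of inductive bias the snippet describes.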
no code implementations • 19 Dec 2023 • Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, I-Chen Wu
Our findings highlight the $O(1/\sqrt{T})$ min-iterate convergence rate specifically in the context of neural function approximation.
no code implementations • 18 Oct 2023 • Yen-ju Chen, Nai-Chieh Huang, Ping-Chun Hsieh
In response to this gap, we adapt Nesterov's celebrated accelerated gradient (NAG) method to policy optimization in RL, termed Accelerated Policy Gradient (APG).
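As a hedged sketch of the underlying idea (this is not the paper's APG algorithm; the toy one-state problem, step size, and function names are assumptions), Nesterov's accelerated gradient ascent can be applied to the parameters of a softmax policy as follows:

```python
import numpy as np

def softmax(x):
    z = x - x.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nag_policy_ascent(rewards, n_iters=500, lr=0.1):
    """Nesterov's accelerated gradient ascent on the expected reward
    J(theta) = softmax(theta) . rewards of a one-state softmax policy."""
    theta = np.zeros_like(rewards, dtype=float)
    prev = theta.copy()
    for t in range(1, n_iters + 1):
        # Nesterov look-ahead point
        y = theta + (t - 1) / (t + 2) * (theta - prev)
        pi = softmax(y)
        # Softmax policy gradient: dJ/dtheta_a = pi_a * (r_a - J)
        grad = pi * (rewards - pi @ rewards)
        prev = theta
        theta = y + lr * grad  # ascent step from the look-ahead point
    return softmax(theta)

pi = nag_policy_ascent(np.array([1.0, 0.5, 0.0]))
```

The look-ahead extrapolation `theta + momentum * (theta - prev)` is what distinguishes NAG from vanilla gradient ascent and is the mechanism APG transfers to the policy-gradient setting.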
no code implementations • 17 Oct 2023 • Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar
We consider infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping.
no code implementations • 27 Sep 2023 • Kuo-Hao Ho, Ping-Chun Hsieh, Chiu-Chou Lin, You-Ren Luo, Feng-Jian Wang, I-Chen Wu
In this paper, we propose a new approach called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL) for training a human-like agent with competitive strength.
no code implementations • 10 Dec 2022 • Hsin-En Su, Yen-ju Chen, Ping-Chun Hsieh, Xi Liu
In this paper, we rethink off-policy learning via Coordinate Ascent Policy Optimization (CAPO), an off-policy actor-critic algorithm that decouples policy improvement from the state distribution of the behavior policy without using the policy gradient.
no code implementations • 6 Dec 2022 • Wei Hung, Bo-Kai Huang, Ping-Chun Hsieh, Xi Liu
Many real-world continuous control problems involve weighing trade-offs among multiple objectives; multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies under different preferences over objectives.
no code implementations • 27 Sep 2022 • Yung-Han Ho, Chia-Hao Kao, Wen-Hsiao Peng, Ping-Chun Hsieh
Recently, a dual-critic design has been proposed to update the actor by alternating between the rate and distortion critics.
no code implementations • 8 Mar 2022 • Yu-Heng Hung, Ping-Chun Hsieh
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs.
no code implementations • 26 Oct 2021 • Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, Hsuan-Yu Yao, Kai-Chun Hu, Liang-Chun Ouyang, I-Chen Wu
Policy optimization is a fundamental principle for designing reinforcement learning algorithms; one example is the proximal policy optimization algorithm with a clipped surrogate objective (PPO-Clip), which has been widely used in deep reinforcement learning due to its simplicity and effectiveness.
1 code implementation • NeurIPS 2021 • Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I-Hong Hou, Srinivas Shakkottai
This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices.
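Independent of how the indices themselves are learned, a Whittle index policy acts by activating the arms whose current states have the largest index values. The following minimal sketch (function name and tie-breaking convention are illustrative, not NeurWIN's code) shows that selection rule:

```python
import numpy as np

def whittle_index_policy(indices, budget):
    """Given per-arm index values, activate the `budget` arms with the
    largest indices (ties broken in favor of lower arm ids)."""
    order = np.argsort(-np.asarray(indices), kind="stable")
    active = np.zeros(len(indices), dtype=bool)
    active[order[:budget]] = True
    return active

# Four arms, an activation budget of two: arms 1 and 3 have the top indices.
act = whittle_index_policy([0.2, 0.9, 0.4, 0.9], budget=2)
```

Replacing the hand-computed index values with the output of a learned index network recovers the overall structure the snippet describes: the network maps each arm's state to a scalar index, and the policy reduces to this top-`budget` selection.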
no code implementations • NeurIPS 2021 • Bing-Jing Hsieh, Ping-Chun Hsieh, Xi Liu
While combining DQN with an existing few-shot learning method is a natural idea, we find that such a direct combination performs poorly due to severe overfitting, which is particularly critical in BO given the need for a versatile sampling policy.
no code implementations • 22 Feb 2021 • Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints.
no code implementations • NeurIPS Workshop ICBINB 2020 • Kai-Chun Hu, Ping-Chun Hsieh, Ting Han Wei, I-Chen Wu
Deep policy gradient is one of the major frameworks in reinforcement learning, and it has been shown to improve parameterized policies across various tasks and environments.
no code implementations • 8 Oct 2020 • Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit as well as generalized linear bandit problems.
no code implementations • 27 Jan 2020 • Xi Liu, Li Li, Ping-Chun Hsieh, Muhe Xie, Yong Ge, Rui Chen
With the explosive growth of online products and content, recommendation techniques have been considered as an effective tool to overcome information overload, improve user experience, and boost business revenue.
no code implementations • 2 Jul 2019 • Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar
To choose the bias-growth rate $\alpha(t)$ in RBMLE, we reveal the nontrivial interplay between $\alpha(t)$ and the regret bound, which applies to both Exponential Family bandits and sub-Gaussian/Exponential family bandits.
no code implementations • 14 Nov 2018 • Xi Liu, Ping-Chun Hsieh, Nick Duffield, Rui Chen, Muhe Xie, Xidao Wen
Thus, adapting existing methods to the streaming environment faces non-trivial technical challenges.
1 code implementation • 29 Oct 2018 • Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar
Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection.