no code implementations • ICML 2020 • Xi Liu, Ping-Chun Hsieh, Yu Heng Hung, Anirban Bhattacharya, P. Kumar
We propose a new family of bandit algorithms, formulated in a general way based on the Biased Maximum Likelihood Estimation (BMLE) method that originally appeared in the adaptive control literature.
no code implementations • 27 Mar 2024 • He-Hao Liao, Yan-Tsung Peng, Wen-Tao Chu, Ping-Chun Hsieh, Chung-Chi Tsai
This work aims to recover rain-degraded images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain).
no code implementations • 19 Mar 2024 • Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng
(iii) To generate more realistic behavior, RallyNet leverages Geometric Brownian Motion (GBM) to model the interactions between players by introducing a valuable inductive bias for learning player behaviors.
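As a minimal illustration of the Geometric Brownian Motion component mentioned above (this is a generic GBM path simulator, not RallyNet's implementation; the function name and parameter values are illustrative), one can sample a GBM path via its exact log-space solution:

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, dt, n_steps, rng=None):
    """Simulate one Geometric Brownian Motion path
    dS_t = mu * S_t dt + sigma * S_t dW_t
    using the exact solution S_t = S_0 * exp((mu - sigma^2/2) t + sigma W_t)."""
    rng = np.random.default_rng(rng)
    # Brownian increments dW ~ N(0, dt)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * dW
    # Prepend 0 so the path starts exactly at s0
    return s0 * np.exp(np.concatenate([[0.0], np.cumsum(log_increments)]))

path = simulate_gbm(s0=1.0, mu=0.05, sigma=0.2, dt=0.01, n_steps=100, rng=0)
```

Because GBM paths stay strictly positive and exhibit correlated drift, they provide a natural prior over smoothly evolving behavioral trajectories, which is the kind of inductive bias the snippet describes.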
no code implementations • 19 Dec 2023 • Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, I-Chen Wu
Our findings highlight the $O(1/\sqrt{T})$ min-iterate convergence rate specifically in the context of neural function approximation.
no code implementations • 18 Oct 2023 • Yen-ju Chen, Nai-Chieh Huang, Ping-Chun Hsieh
In response to this gap, we adapt Nesterov's celebrated accelerated gradient (NAG) method to policy optimization in RL, termed Accelerated Policy Gradient (APG).
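As a hedged sketch of the underlying idea (this is not the paper's APG algorithm; the toy one-state problem, step size, and function names are assumptions), Nesterov's accelerated gradient ascent can be applied to the parameters of a softmax policy as follows:

```python
import numpy as np

def softmax(x):
    z = x - x.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nag_policy_ascent(rewards, n_iters=500, lr=0.1):
    """Nesterov's accelerated gradient ascent on the expected reward
    J(theta) = softmax(theta) . rewards of a one-state softmax policy."""
    theta = np.zeros_like(rewards, dtype=float)
    prev = theta.copy()
    for t in range(1, n_iters + 1):
        # Nesterov look-ahead point
        y = theta + (t - 1) / (t + 2) * (theta - prev)
        pi = softmax(y)
        # Softmax policy gradient: dJ/dtheta_a = pi_a * (r_a - J)
        grad = pi * (rewards - pi @ rewards)
        prev = theta
        theta = y + lr * grad  # ascent step from the look-ahead point
    return softmax(theta)

pi = nag_policy_ascent(np.array([1.0, 0.5, 0.0]))
```

The look-ahead extrapolation `theta + momentum * (theta - prev)` is what distinguishes NAG from vanilla gradient ascent and is the mechanism APG transfers to the policy-gradient setting.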
no code implementations • 17 Oct 2023 • Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar
We consider infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping.
no code implementations • 27 Sep 2023 • Kuo-Hao Ho, Ping-Chun Hsieh, Chiu-Chou Lin, You-Ren Luo, Feng-Jian Wang, I-Chen Wu
In this paper, we propose a new approach called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL) for training a human-like agent with competitive strength.
no code implementations • 10 Dec 2022 • Hsin-En Su, Yen-ju Chen, Ping-Chun Hsieh, Xi Liu
In this paper, we rethink off-policy learning via Coordinate Ascent Policy Optimization (CAPO), an off-policy actor-critic algorithm that decouples policy improvement from the state distribution of the behavior policy without using the policy gradient.
no code implementations • 6 Dec 2022 • Wei Hung, Bo-Kai Huang, Ping-Chun Hsieh, Xi Liu
Many real-world continuous control problems involve weighing trade-offs among multiple objectives; multi-objective reinforcement learning (MORL) serves as a generic framework for learning control policies under different preferences over objectives.
no code implementations • 27 Sep 2022 • Yung-Han Ho, Chia-Hao Kao, Wen-Hsiao Peng, Ping-Chun Hsieh
Recently, a dual-critic design has been proposed to update the actor by alternating between the rate and distortion critics.
no code implementations • 8 Mar 2022 • Yu-Heng Hung, Ping-Chun Hsieh
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs.
no code implementations • 26 Oct 2021 • Nai-Chieh Huang, Ping-Chun Hsieh, Kuo-Hao Ho, Hsuan-Yu Yao, Kai-Chun Hu, Liang-Chun Ouyang, I-Chen Wu
Policy optimization is a fundamental principle for designing reinforcement learning algorithms; one example is the proximal policy optimization algorithm with a clipped surrogate objective (PPO-Clip), which has been widely used in deep reinforcement learning due to its simplicity and effectiveness.
1 code implementation • NeurIPS 2021 • Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I-Hong Hou, Srinivas Shakkottai
This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices.
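Independent of how the indices themselves are learned, a Whittle index policy acts by activating the arms whose current states have the largest index values. The following minimal sketch (function name and tie-breaking convention are illustrative, not NeurWIN's code) shows that selection rule:

```python
import numpy as np

def whittle_index_policy(indices, budget):
    """Given per-arm index values, activate the `budget` arms with the
    largest indices (ties broken in favor of lower arm ids)."""
    order = np.argsort(-np.asarray(indices), kind="stable")
    active = np.zeros(len(indices), dtype=bool)
    active[order[:budget]] = True
    return active

# Four arms, an activation budget of two: arms 1 and 3 have the top indices.
act = whittle_index_policy([0.2, 0.9, 0.4, 0.9], budget=2)
```

Replacing the hand-computed index values with the output of a learned index network recovers the overall structure the snippet describes: the network maps each arm's state to a scalar index, and the policy reduces to this top-`budget` selection.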
no code implementations • NeurIPS 2021 • Bing-Jing Hsieh, Ping-Chun Hsieh, Xi Liu
While combining DQN with an existing few-shot learning method is a natural idea, we find that such a direct combination performs poorly due to severe overfitting, which is particularly critical in BO given the need for a versatile sampling policy.
no code implementations • 22 Feb 2021 • Jyun-Li Lin, Wei Hung, Shang-Hsuan Yang, Ping-Chun Hsieh, Xi Liu
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications, such as scheduling in networked systems with resource constraints and control of a robot with kinematic constraints.
no code implementations • NeurIPS Workshop ICBINB 2020 • Kai-Chun Hu, Ping-Chun Hsieh, Ting Han Wei, I-Chen Wu
Deep policy gradient is one of the major frameworks in reinforcement learning, and it has been shown to improve parameterized policies across various tasks and environments.
no code implementations • 8 Oct 2020 • Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit as well as generalized linear bandit problems.
no code implementations • 27 Jan 2020 • Xi Liu, Li Li, Ping-Chun Hsieh, Muhe Xie, Yong Ge, Rui Chen
With the explosive growth of online products and content, recommendation techniques have been considered as an effective tool to overcome information overload, improve user experience, and boost business revenue.
no code implementations • 2 Jul 2019 • Xi Liu, Ping-Chun Hsieh, Anirban Bhattacharya, P. R. Kumar
To choose the bias-growth rate $\alpha(t)$ in RBMLE, we reveal the nontrivial interplay between $\alpha(t)$ and the regret bound, which applies to both Exponential Family bandits and sub-Gaussian/Exponential family bandits.
no code implementations • 14 Nov 2018 • Xi Liu, Ping-Chun Hsieh, Nick Duffield, Rui Chen, Muhe Xie, Xidao Wen
Thus, adapting existing methods to the streaming environment faces non-trivial technical challenges.
1 code implementation • 29 Oct 2018 • Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar
Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection.