publications | Yipu Chen 陈逸璞

* denotes equal contribution

2026

REFINE-DP: Diffusion Policy Fine-tuning for Humanoid Loco-manipulation via Reinforcement Learning

Zhaoyuan Gu*, Yipu Chen*, Zimeng Chai*, Alfred Cueva, Thong Nguyen, Yifan Wu, Huishu Xue, Minji Kim, Isaac Legene, Fukang Liu, Matthew Kim, Ayan Barula, Yongxin Chen, and Ye Zhao

arXiv preprint, Mar 2026

Humanoid loco-manipulation requires coordinating high-level motion plans with stable, low-level whole-body execution under complex robot-environment dynamics and long-horizon tasks. While diffusion policies (DPs) show promise for learning from demonstrations, deploying them on humanoids poses critical challenges: the motion planner trained offline is decoupled from the low-level controller, leading to poor command tracking, compounding distribution shift, and task failures. The common approach of scaling demonstration data is prohibitively expensive for high-dimensional humanoid systems. To address these challenges, we present REFINE-DP (REinforcement learning FINE-tuning of Diffusion Policy), a hierarchical framework that jointly optimizes a DP high-level planner and an RL-based low-level loco-manipulation controller. The DP is fine-tuned via a PPO-based diffusion policy gradient to improve task success rate, while the controller is simultaneously updated to accurately track the planner’s evolving command distribution, reducing the distributional mismatch that degrades motion quality. We validate REFINE-DP on a humanoid robot performing loco-manipulation tasks, including door traversal and long-horizon object transport. REFINE-DP achieves a success rate above 90% in simulation, even in out-of-distribution cases not seen in the pre-training data, and enables smooth autonomous task execution in real-world dynamic environments. Our proposed method substantially outperforms pre-trained DP baselines and demonstrates that RL fine-tuning is key to reliable humanoid loco-manipulation.

2025

Geometry-Aware Demonstration Augmentation for Scalable Robotic Manipulation

Michael Sha, Yipu Chen, Minghao Guo, Yunsheng Tian, Chuang Gan, and Wojciech Matusik

In Workshop on Learning Meets Model-Based Methods for Contact-Rich Manipulation, ICRA, May 2025

Generalizing robot behaviours across object geometries remains a core challenge in manipulation. We propose a framework for geometry-aware demonstration augmentation that enables robust policy learning under shape variation. Starting from a single input mesh, our fully automatic geometry-augmentation pipeline produces a rich spectrum of shape variants by applying controlled stretching, compression, and local bulging while strictly preserving contact surfaces and support regions, so each instance remains structurally sound and manipulation-ready. Crucially, the augmentation process intrinsically guarantees a dense point-wise correspondence between the original and deformed geometries, allowing direct transfer of demonstration trajectories to every new shape. This yields a scalable foundation for extensive, grounded datasets of augmented demonstrations without additional human effort. We validate the pipeline on a diverse suite of household and industrial objects, generating varied shape augmentations and replaying pick-and-place demonstrations.

Robust-Locomotion-By-Logic: Perturbation-Resilient Bipedal Locomotion via Signal Temporal Logic Guided Model Predictive Control

Zhaoyuan Gu, Yuntian Zhao, Yipu Chen, Rongming Guo, Jennifer K. Leestma, Gregory S. Sawicki, and Ye Zhao

IEEE Transactions on Robotics, May 2025

This study introduces a robust planning framework based on model predictive control (MPC) enhanced with signal temporal logic (STL) specifications. It is the first study to apply STL-guided trajectory optimization to bipedal locomotion, and it is specifically designed to handle both translational and orientational perturbations. Existing recovery strategies often struggle to reason about complex task logic and to evaluate locomotion robustness systematically, making them susceptible to failures caused by inappropriate recovery strategies or lack of robustness. To address these issues, we design an analytical stability metric for bipedal locomotion and quantify this metric using STL specifications, which guide the generation of recovery trajectories that maximize the robustness degree. To enable safe and computationally efficient crossed-leg maneuvers, we design data-driven self-leg-collision constraints that are 1000 times faster than the traditional inverse-kinematics-based approach. Our framework outperforms a state-of-the-art locomotion controller, a standard MPC without STL, and a linear-temporal-logic-based planner in a high-fidelity dynamic simulation, especially in scenarios involving crossed-leg maneuvers. In addition, the Cassie bipedal robot achieves robust performance under horizontal and orientational perturbations, such as those observed in ship motions. These environments are validated in simulations and deployed on hardware. Furthermore, our proposed method demonstrates versatility on stepping stones and terrain-agnostic features on inclined terrains.

2024

Genesis: A Universal and Generative Physics Engine for Robotics and Beyond

Genesis Authors

Dec 2024

Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies

Yipu Chen*, Haotian Xue*, and Yongxin Chen

In The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Dec 2024

Diffusion models (DMs) have emerged as a promising approach for behavior cloning (BC). Diffusion policies (DP) based on DMs have elevated BC performance to new heights, demonstrating robust efficacy across diverse tasks, coupled with their inherent flexibility and ease of implementation. Despite the increasing adoption of DP as a foundation for policy generation, the critical issue of safety remains largely unexplored. While previous attack attempts have targeted deep policy networks, DP uses a diffusion model as the policy network, and its chained structure and injected randomness make previous attack methods ineffective against it. In this paper, we undertake a comprehensive examination of DP safety concerns by introducing adversarial scenarios, encompassing offline and online attacks, and global and patch-based attacks. We propose DP-Attacker, a suite of algorithms that can craft effective adversarial attacks across all aforementioned scenarios. We conduct attacks on pre-trained diffusion policies across various manipulation tasks. Through extensive experiments, we demonstrate that DP-Attacker can significantly decrease the performance of DP in all scenarios. Particularly in offline scenarios, DP-Attacker can generate highly transferable perturbations applicable to all frames. Furthermore, we illustrate the creation of adversarial physical patches that, when applied to the environment, effectively deceive the model. Video results are available at https://sites.google.com/view/diffusion-policy-attacker.

Walking-by-Logic: Signal Temporal Logic-Guided Model Predictive Control for Bipedal Locomotion Resilient to External Perturbations

Zhaoyuan Gu, Rongming Guo, William Yates, Yipu Chen, Yuntian Zhao, and Ye Zhao

In 2024 IEEE International Conference on Robotics and Automation (ICRA), May 2024

This study proposes a novel planning framework based on a model predictive control formulation that incorporates signal temporal logic (STL) specifications for task completion guarantees and robustness quantification. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion push recovery, where the robot experiences unexpected disturbances. Existing recovery strategies often struggle with complex task logic reasoning and locomotion robustness evaluation, making them susceptible to failures caused by inappropriate recovery strategies or insufficient robustness. To address this issue, the STL-guided framework generates optimal and safe recovery trajectories that simultaneously satisfy the task specification and maximize the locomotion robustness. Our framework outperforms a state-of-the-art locomotion controller in a high-fidelity dynamic simulation, especially in scenarios involving crossed-leg maneuvers. Furthermore, it demonstrates versatility in tasks such as locomotion on stepping stones, where the robot must select from a set of disjointed footholds to maneuver successfully.