Jingwen Gu

I am a senior undergraduate student at Cornell University. My research spans RLHF, RLVR, and robotics. I have been fortunate to work with Prof. Wen Sun, Prof. Abhishek Gupta, and Prof. Timur Dogan. My ultimate research objective is to develop reinforcement learning paradigms that enable agents to think, feel, and act in interesting ways.

Email | Google Scholar | GitHub | CV

News

September 2025: Paper ReasonFlux-PRM was accepted to NeurIPS 2025!
August 2025: Presented VHM at Building Simulation 2025!
June 2025: Paper on CoT Self-Correction was accepted to the PUT workshop at ICML 2025!
June 2025: Started my research internship at WEIRD Lab, University of Washington, advised by Prof. Abhishek Gupta!
May 2025: Paper VHM was accepted to Building Simulation 2025!

Publications

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Jingwen Gu* (equal contribution), Yiting He*, Zhishuai Liu*, Pan Xu

Under Review

[Paper]

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Jiaru Zou*, Ling Yang*, Jingwen Gu* (equal contribution), Jiahao Qiu, Ke Shen, Jingrui He, Mengdi Wang

NeurIPS 2025

[Paper] | [Code]

Learning to Self-Correct through Chain-of-Thought Verification

Bradley Guo, Jingwen Gu, Jin Peng Zhou, Wen Sun

ICML 2025, 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)

[Paper]

Orchestrating LLMs with Different Personalizations

Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

arXiv preprint, 2024

[Paper]

Virtual Horizon Method: Fast shading calculations for UBEM using lidar data rasterization

Jingwen Gu, Timur Dogan

Building Simulation, 2025

[Paper]

Projects

KV-Cache Management with Reinforcement Learning

Course project for CS4756: Robot Learning, advised by Prof. Sanjiban Choudhury. Devised a method that trains an RL policy to intelligently compress the KV-cache of a transformer LLM during inference, enabling near-constant space usage for LLM deployment.

[Report] | [Code]
