
Yunhao Gou (苟耘豪)

Ph.D. Candidate @ HKUST

Email  /  CV  /  GitHub  /  Google Scholar
About Me

I am currently a Ph.D. candidate in the CSE department of the Hong Kong University of Science and Technology (HKUST), jointly supervised by Prof. James T. Kwok and Prof. Yu Zhang. Previously, I was an undergraduate student majoring in Software Engineering at the University of Electronic Science and Technology of China (UESTC).

My current research interests include:

News
  • [2025.05] One paper (MoTE) accepted by ACL 2025! See you in Vienna!
  • [2025.02] One paper (EMOVA) accepted by CVPR 2025! See you in Nashville!
  • [2024.07] One paper (ECSO) accepted by ECCV 2024! See you in Milano!
  • [2024.03] Code and checkpoints of MoCLE and ECSO have been released. Feel free to try them out!
  • [2024.03] Our work ECSO, the first work that makes MLLMs safe without either training or any external models, is on arXiv!
  • [2023.12] Our work MoCLE was covered by QbitAI.
  • [2023.12] Our work MoCLE, the first MLLM with MoE architecture for instruction customization and generalization, is on arXiv!
  • [2023.02] One paper accepted by CVPR 2023!
  • [2022.09] Joined HKUST CSE for Ph.D. study.
  • [2022.07] One paper accepted by ECCV 2022!
  • [2022.01] Joined ByteDance AI Lab as a research intern.
  • [2021.07] One paper accepted by CIKM 2021!
Selected Publications

Full publication list on Google Scholar. (* denotes equal contribution)


Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Xin Jin, Zhenguo Li, James T. Kwok, Yu Zhang

Scaling reasoning MLLMs by adopting any advanced LLM reasoner at inference time!

arXiv preprint, 2025.

[PDF] [Demo] [Code]

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

Zhili Liu*, Yunhao Gou*, Kai Chen*, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

Annual Meeting of the Association for Computational Linguistics (ACL), 2025.

[PDF]

Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning

Yunhao Gou*, Hansi Yang*, Zhili Liu, Kai Chen, Yihan Zeng, Lanqing Hong, Zhenguo Li, Qun Liu, James T. Kwok, Yu Zhang.

arXiv preprint, 2025.

[PDF]

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen*, Yunhao Gou*, Runhui Huang*, Zhili Liu*, Daxin Tan* and 26 other authors

Fully open-source omni-modal LLMs with SoTA vision-language and speech abilities!

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2025

[PDF] [Webpage] [Talk] [Talk (Chinese)] [WeChat Post] [Code]

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

1) Makes MLLMs safe without either training or any external models!

2) Serves as a free data engine for MLLM alignment on its own!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page]

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Yunhao Gou*, Zhili Liu*, Kai Chen*, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

First MLLM with MoE for instruction customization and generalization!

arXiv preprint, 2023.

[PDF] [Project page] [WeChat Post] [Talk]

Leveraging per Image-Token Consistency for Vision-Language Pre-training

Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang.

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.

[PDF]

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny.

European Conference on Computer Vision (ECCV), 2022.

[PDF] [Project page]

Region Semantically Aligned Network for Zero-Shot Learning

Ziyang Wang*, Yunhao Gou*, Jingjing Li, Yu Zhang, Yang Yang

International Conference on Information & Knowledge Management (CIKM), 2021.

[PDF]
Academic Services
Reviewer:
  • Conference: NeurIPS 2025, EMNLP 2025 (ARR May), ACL 2025 (ARR Feb), ICML 2025, ECCV 2022, AAAI 2024.
Talks
  • [TechBeat Online] Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning. [Recording]
Experiences
National University of Singapore
July 2019 - Aug. 2019
International exchange student
King Abdullah University of Science and Technology
July 2021 - Dec. 2021
Visiting student, working with Prof. Mohamed Elhoseiny
ByteDance AI Lab
Jan. 2022 - Mar. 2023
Research Intern, working with Tom Ko
Huawei Noah’s Ark Lab (AI Theory group)
Oct. 2023 - Present
Research Intern, working with Lanqing Hong
Selected Awards

Research Travel Grant HKUST

2023

Postgraduate Scholarship HKUST

2022

Overseas Visiting Student Stipend of UESTC

2019

National Scholarship

2019