Yunhao Gou's Homepage

Email / CV / GitHub / Google Scholar

About Me

I am currently a Ph.D. candidate in the CSE department of the Hong Kong University of Science and Technology (HKUST), jointly supervised by Prof. James T. Kwok and Prof. Yu Zhang. I am also an intern researcher at Tongyi Lab (Qwen-VL). Previously, I was an undergraduate student majoring in Software Engineering at the University of Electronic Science and Technology of China (UESTC).

My current research interests include:

Post-training of MLLMs/LLMs: Corrupted but not Broken, EMOVA, ECSO, MoTE
New paradigms of MLLMs: RAPID, MoCLE
Vision and Language Representation Learning: EPIC, HGR-Net, RSAN

News

[2026.03] One paper (MoCLE) accepted by TIP 2026!
[2026.01] One paper (RAPID) accepted by ICLR 2026! See you in Rio de Janeiro, Brazil!
[2025.08] One paper (Corrupted but not Broken) accepted by EMNLP 2025 (Main) as an oral presentation! See you in Suzhou!
[2025.05] One paper (MoTE) accepted by ACL 2025! See you in Vienna!
[2025.02] One paper (EMOVA) accepted by CVPR 2025! See you in Nashville!
[2025.02] One paper (ECSO) accepted by ECCV 2024! See you in Milano!
[2024.03] Code and checkpoints of MoCLE and ECSO have been released. Feel free to try them out!
[2024.03] Our work ECSO, the first work that makes MLLMs safe with neither training nor any external models, is on arXiv!
[2023.12] Our work MoCLE was reported by QbitAI.
[2023.12] Our work MoCLE, the first MLLM with MoE architecture for instruction customization and generalization, is on arXiv!
[2023.02] One paper accepted by CVPR 2023!
[2022.09] Joined HKUST CSE for Ph.D. study.
[2022.07] One paper accepted by ECCV 2022!
[2022.01] Joined ByteDance AI Lab as an intern researcher.
[2021.07] One paper accepted by CIKM 2021!

Selected Publications

Full publication list on Google Scholar. (* denotes equal contribution)

Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Xin Jin, Zhenguo Li, James T. Kwok, Yu Zhang

Scaling reasoning MLLMs by adopting any advanced LLM reasoner at inference time!

International Conference on Learning Representations (ICLR), 2026

[PDF] [Code]

Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning

Yunhao Gou*, Hansi Yang*, Zhili Liu, Kai Chen, Yihan Zeng, Lanqing Hong, Zhenguo Li, Qun Liu, James T. Kwok, Yu Zhang.

Conference on Empirical Methods in Natural Language Processing (EMNLP Main, Oral), 2025.

[PDF]

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

1) Make MLLMs safe with neither training nor any external models!

2) Free data engine for MLLM alignment on its own!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page]

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

Zhili Liu*, Yunhao Gou*, Kai Chen*, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

Annual Meeting of the Association for Computational Linguistics (ACL), 2025.

[PDF]

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Yunhao Gou*, Zhili Liu*, Kai Chen*, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

First MLLM with MoE for instruction customization and generalization!

IEEE Transactions on Image Processing (TIP), 2026.

[PDF] [Project page] [Wechat Post] [Talk]

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen*, Yunhao Gou*, Runhui Huang*, Zhili Liu*, Daxin Tan* and 26 other authors

Fully open-sourced Omni-modal LLMs with SoTA vision-language and speech abilities!

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2025

[PDF] [Webpage] [Talk] [Talk (Chinese)] [Wechat Post] [Code]

Leveraging per Image-Token Consistency for Vision-Language Pre-training

Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang.

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.

[PDF]

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny.

European Conference on Computer Vision (ECCV), 2022.

[PDF] [Project page]

Region semantically aligned network for zero-shot learning

Ziyang Wang*, Yunhao Gou*, Jingjing Li, Yu Zhang, Yang Yang

International Conference on Information & Knowledge Management (CIKM), 2021.

[PDF]

Academic Services

Reviewer:

Conference: NeurIPS 2025, EMNLP 2025 (ARR May), ACL 2025 (ARR Feb), ICML 2025, ECCV 2022, AAAI 2024.

Talks

[TechBeat Online] Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning. [Recording]

Experiences

National University of Singapore

July 2019 - Aug. 2019

International exchange student

King Abdullah University of Science and Technology

July 2021 - Dec. 2021

Visiting student, working with Prof. Mohamed Elhoseiny

ByteDance AI Lab

Jan. 2022 - Mar. 2023

Research Intern, working with Tom Ko

Tongyi Lab (Qwen-VL group)

Jan. 2026 - Present

Research Intern, working with Shuai Bai

Selected Awards

Research Travel Grant HKUST

2023

Postgraduate Scholarship HKUST

2022

Overseas Visiting Student Stipend of UESTC

2019

National Scholarship

2019

Yunhao Gou (苟耘豪)

Ph.D. Candidate @ HKUST