Jiahan (Han) Zhang
🎓 Master's student in Computer Science
🏛️ Johns Hopkins University
About Me
I am a Master’s student in Computer Science at Johns Hopkins University. I have been fortunate to collaborate with Prof. Lei Feng at Southeast University, and with Prof. Alan Yuille and Ph.D. candidate Jieneng Chen at Johns Hopkins University.
At present, I focus on integrating richer real-world knowledge into generative models. Earlier, I worked on adversarial robustness and weakly supervised learning for large multimodal models, which laid the groundwork for my current research.
Research Interests
My research focuses on scalable world models and generative models for embodied agents. I am especially interested in:
- How should we evaluate the effectiveness and robustness of world models for embodied agents?
  Unlike entertainment applications that emphasize visual appearance, which metrics best capture real utility for embodied agents: visual fidelity, physical accuracy, embodied task performance, or something else?
- How can we incorporate more real-world knowledge (physics, semantics, dynamics) into generative models?
  Current video generation models are trained on diverse web-scale videos that encode real-world knowledge in visual form. Are these visual priors sufficient for physically accurate modeling? If not, how can models learn stronger physical priors from web data, and how can we support this process?
- How do we transform a video generation model into a unified, scalable world model?
  How can we leverage web-scale video data and video generation models to build a unified, scalable world model for diverse tasks (e.g., motion generation, robotic policy learning, and 3D reconstruction)?
If you share these interests, please feel free to contact me by email!
Selected Publications & Manuscripts
* denotes equal contribution
2025
World-in-World: World Models in a Closed-Loop World
Jiahan Zhang*, Muqing Jiang*, Nanru Dai, Taiming Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal M. Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, Jieneng Chen†
Under review 2025
By grounding evaluation in embodied task success rather than visual metrics, World-in-World provides a principled yardstick and a comprehensive framework for assessing the real-world utility of generative world models in embodied settings.
Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models
Under review 2025
We propose Proxy Targeted Attack (PTA), enabling adversarial examples to generalize to semantically similar targets while remaining on-manifold to evade anomaly detection, revealing a new vulnerability in large multimodal models.
2024
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
ICML 2024 · Oral (top 1.4%)
Candidate Pseudolabel Learning (CPL) fine-tunes VLMs with limited labeled data using candidate label sets and partial-label losses, achieving consistent gains over hard pseudolabeling across nine datasets and three learning paradigms.
All Publications & Manuscripts
- Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
ICML 2024 · Oral (top 1.4%)
- Influence-Based Fair Selection for Sample-Discriminative Backdoor Attack
AAAI 2025 · Oral
- World-in-World: World Models in a Closed-Loop World
Jiahan Zhang*, Muqing Jiang*, Nanru Dai, Taiming Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal M. Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, Jieneng Chen† (2025)
Under review
- Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models
Under review
- EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory
Jiahao Wang, Luoxin Ye, Taiming Lu, Junfei Xiao, Jiahan Zhang, Yuxiang Guo, Xijun Liu, Rama Chellappa, Cheng Peng, Alan Yuille, Jieneng Chen† (2025)
Under review
Contact
jhanzhang01@gmail.com · GitHub · LinkedIn · Google Scholar
© Jiahan (Han) Zhang