Dongzhi Jiang

I'm a PhD student in the Multimedia Lab at CUHK, supervised by Prof. Hongsheng Li.

Previously, I obtained my bachelor's degree from the Computer Science Department at Harbin Institute of Technology, Shenzhen, where I was supervised by Prof. Jingyong Su.

Email  /  Google Scholar  /  GitHub


Research

I am interested in AIGC. Currently, I am focusing on text-to-image diffusion models and multimodal large language models (MLLMs).

💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li
arXiv, 2024
arXiv / website / GitHub

A fine-tuning strategy that addresses text-to-image misalignment through image-to-text concept matching. The training data consists only of text prompts.

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang*, Dongzhi Jiang*, Yichi Zhang*, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li
arXiv, 2024
arXiv / website / dataset / GitHub

We find that current benchmarks incorporate excessive visual content within their textual questions, which may help MLLMs deduce answers without truly interpreting the input diagrams.

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Zhuofan Zong*, Dongzhi Jiang*, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu
ICCV, 2023
arXiv / GitHub

We design a plug-and-play approach to enhance the temporal modeling capability of BEV detectors with no additional inference cost.


The source code of this website is adapted from Jon Barron's website.