Zhao Zhang (张 钊)

I'm currently a Vision-Language Researcher at ByteDance, focusing on multimodal LLMs and their applications. I completed my Master's degree at Nankai University under the supervision of Ming-Ming Cheng. Please feel free to contact me at (📮: zzhang🥳mail🔅nankai🔅edu🔅cn).

Recent News

  • [03/2025] 🌳 We're seeking interns passionate about Graphic Design to collaborate on impactful research projects. Please feel free to contact me via email.
  • [05/2025] Our Unified-MLLM for image layer decomposition was accepted by ICML 2025. The technical report will be released soon.
  • [01/2025] RelationLMM was accepted by TPAMI 2025.
  • [12/2024] Our GRES paper was selected as an "Excellent Science & Technology Academic Paper" in the 2024 Shenzhen 4th Excellent Science & Technology Academic Paper Selection.
  • [04/2024] 🔥 Graphist was accepted by AAAI 2025. We have unleashed the potential of MLLM in graphic design.
  • [02/2024] One paper was accepted by CVPR 2024.
  • [09/2023] One paper was accepted by NeurIPS 2023.
  • [08/2023] We released Link-Context Learning for MLLMs as well as an interesting dataset ISEKAI.
  • [07/2023] GRES was accepted by ICCV 2023.
  • [06/2023] 🔥 We released Shikra, an awesome MLLM for Referential Dialogue.
  • [01/2023] One paper was accepted by TPAMI.
  • [08/2022] One paper was accepted by ECCV 2022.
  • [07/2022] Two papers were accepted by ACM MM 2022.
  • [07/2022] I am working as a vision-language researcher at SenseTime Research.
  • [06/2022] One paper was accepted by CVMJ.
  • [03/2022] One paper was accepted by CVPR 2022 as an oral presentation.
  • [03/2022] I have served as a reviewer for ECCV.
  • [11/2021] I have served as a reviewer for CVPR.
  • [04/2021] I have served as a reviewer for ACM MM.
  • [02/2021] I have served as a reviewer for ICCV.
  • [01/2021] I have served as a reviewer for IEEE TIP.
  • [12/2020] I am working as an intern at Tencent Youtu Lab.
  • [12/2020] One paper was accepted by TIP 2021.
  • [11/2020] I have served as a reviewer for IEEE TCSVT.
  • [08/2020] I have served as a reviewer for IEEE Transactions on Medical Imaging (TMI).
  • [07/2020] One paper was accepted by ECCV 2020.
  • [05/2020] One paper was accepted by TNNLS 2020.
  • [04/2020] One paper was accepted by CVPR 2020.
  • [09/2019] I have joined the Media Computing Lab under the supervision of Prof. Ming-Ming Cheng!
  • [06/2019] I graduated from Yangzhou University and received my bachelor's degree.
  • [02/2018] One paper was accepted by ICASSP 2018.
  • [07/2017] Two papers were accepted by ICONIP 2017, one of which is an oral paper.

Experiences

  • Expert Researcher in Vision & Language
    2023 - Now
    Intelligent Creation
    ByteDance
  • Researcher in Vision & Language
    2022 - 2023
    Smart City Group (SCG)
    SenseTime
  • Internship in Computer Vision
    2020 - 2021
    Youtu Lab
    CSIG, Tencent
  • M.S. in Computer Science
    2019 - 2022
    Media Computing Lab (supervised by Prof. Ming-Ming Cheng)
    School of Computer Science, Nankai University
  • B.S. in Computer Science
    2015 - 2019
    College of Innovation and Entrepreneurship (Elite College)
    School of Information Engineering, Yangzhou University

Publications

    Decomposition of Graphic Design with Large Multimodal Model
    Hui Nie, Zhao Zhang, Yutao Cheng, Maoke Yang, Gonglei Shi, Qingsong Xie, Jie Shao, Xinglong Wu
    ICML 2025   [Repo Coming Soon]

    Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens
    Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang
    arXiv   [PDF] [Project] [bib]

    RelationLMM: Large Multimodal Model as Open and Versatile Visual Relationship Generalist
    Chi Xie, Shuang Liang, Jie Li, Zhao Zhang, Feng Zhu, Rui Zhao
    TPAMI 2025   [Paper] [bib]

    Graphic Design with Large Multimodal Model
    Yutao Cheng*, Zhao Zhang*, Maoke Yang*, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao
    AAAI 2025   [PDF] [Project] [bib]

    Link-Context Learning for Multimodal LLMs
    Yan Tai, Weichen Fan, Zhao Zhang, Ziwei Liu
    CVPR 2024   [PDF] [Code] [bib]

    Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic
    Keqin Chen, Zhao Zhang*, Weili Zeng, Richong Zhang, Feng Zhu, Rui Zhao
    arXiv   [PDF] [Code] [bib]

    Described Object Detection: Liberating Object Detection with Flexible Expressions
    Chi Xie*, Zhao Zhang*, Yixuan Wu, Feng Zhu, Rui Zhao, Shuang Liang
    NeurIPS 2023   [PDF] [Code] [bib]

    Advancing Referring Expression Segmentation Beyond Single Image
    Yixuan Wu*, Zhao Zhang*, Chi Xie, Feng Zhu, Rui Zhao
    ICCV 2023   [PDF] [Code] [bib]

    User-Oriented Interactive Style Transfer
    Zheng Lin, Zhao Zhang, Kang-Rui Zhang, Bo Ren, Ming-Ming Cheng
    CVMJ 2025   [PDF] [Code] [中译版] [bib]

    Image Harmonization by Matching Regional References
    Ziyue Zhu, Zhao Zhang, Zheng Lin, Ruiqi Wu, Chunle Guo
    arXiv   [PDF] [Code] [中译版] [bib]

    Co-Salient Object Detection with Co-Representation Purification
    Ziyue Zhu*, Zhao Zhang*, Zheng Lin, Xing Sun, Ming-Ming Cheng
    TPAMI 2023   [PDF] [Code] [中译版] [bib]

    PAC-Net: Highlight Your Video via History Preference Modeling
    Hang Wang, Penghao Zhou, Chong Zhou, Zhao Zhang, Xing Sun
    ECCV 2022   [PDF] [bib]

    Multi-Mode Interactive Image Segmentation
    Zheng Lin, Zhao Zhang*, Ling-Hao Han, Shao-Ping Lu
    ACM MM 2022   [PDF] [Code] [中译版]

    KnifeCut: Refining Thin Part Segmentation with Cutting Lines
    Zheng Lin, Zheng-Peng Duan, Zhao Zhang, Chunle Guo, Ming-Ming Cheng
    ACM MM 2022 (Oral)   [PDF] [Code] [中译版]

    Sequential Interactive Image Segmentation
    Zheng Lin, Zhao Zhang, Zi-Yue Zhu, Deng-Ping Fan, Xia-Lei Liu
    CVMJ 2022   [PDF] [Code] [中译版]

    FocusCut: Diving into a Focus View in Interactive Segmentation
    Zheng Lin, Zheng-Peng Duan, Zhao Zhang, Chun-Le Guo, Ming-Ming Cheng
    CVPR 2022 (Oral)   [PDF] [Code] [中译版] [bib]

    Bilateral Attention Network for RGB-D Salient Object Detection
    Zhao Zhang, Zheng Lin, Jun Xu, Wenda Jin, Shao-Ping Lu, Deng-Ping Fan
    TIP 2021   [PDF] [Code] [bib]

    Rethinking RGB-D Salient Object Detection: Models, Datasets, and Large-Scale Benchmarks
    Deng-Ping Fan, Zheng Lin, Zhao Zhang, Menglong Zhu, Ming-Ming Cheng
    TNNLS 2020   [PDF] [Code] [Project] [bib]

    Interactive Image Segmentation with First Click Attention
    Zheng Lin, Zhao Zhang, Lin-Zhuo Chen, Ming-Ming Cheng, Shao-Ping Lu
    CVPR 2020   [PDF] [Code] [Project] [bib]

    Low Resolution Face Recognition and Reconstruction via Deep Canonical Correlation Analysis
    Zhao Zhang, Yun-Hao Yuan, Xiao-bo Shen, Yun Li
    ICASSP 2018   [PDF] [bib]

    Face Hallucination and Recognition Using Kernel Canonical Correlation Analysis
    Zhao Zhang, Yun-Hao Yuan, Yun Li, Bin Li, Ji-Peng Qiang
    ICONIP 2017 (Oral)   [PDF] [Slides] [bib]

    Supervised Deep Canonical Correlation Analysis for Multiview Feature Learning
    Yan Liu, Yun Li, Yun-Hao Yuan, Ji-Peng Qiang, Min Ruan, Zhao Zhang
    ICONIP 2017   [PDF] [bib]


Services

  • Reviewer for TPAMI, TIP, TMI, TMM, TCSVT, CVPR, ICCV, ECCV, NeurIPS, EMNLP, ACM MM, etc.