Zhao (Joey) Zhang (张 钊)

I'm currently a Senior Research Scientist at Canva, focusing on Image Generation and Multimodal LLMs. I completed my Master's degree at Nankai University under the supervision of Prof. Ming-Ming Cheng. Please feel free to contact me at 📮: zzhang🥳mail🔅nankai🔅edu🔅cn


Recent News

  • [🔝/2026] We're seeking interns at Canva passionate about Image Generation / MLLMs to collaborate on impactful research projects. Please feel free to contact me via email.
  • [03/2026] Magic Layers is now live on Canva, turning flat or AI-generated images into fully editable, multi-layer designs.
  • [02/2026] Two papers were accepted by CVPR 2026. See you in Denver, Colorado!
  • [06/2025] We released CreatiPoster, an AI-driven graphic design generation system for multi-layer, editable compositions with strong visual appeal.
  • [05/2025] Our unified MLLM for image layer decomposition was accepted by ICML 2025. The technical report will be released soon.
  • [01/2025] RelationLMM was accepted by TPAMI-25.
  • [12/2024] GRES was selected as an Excellent Science & Technology Academic Paper in the 4th Shenzhen Excellent Science & Technology Academic Paper Selection (2024).
  • [04/2024] 🔥 Graphist was accepted by AAAI 2025. We have unleashed the potential of MLLMs in graphic design.
  • [02/2024] One paper was accepted by CVPR 2024.
  • [09/2023] D-Cube was accepted by NeurIPS 2023.
  • [08/2023] We released Link-Context Learning for MLLMs as well as an interesting dataset ISEKAI.
  • [07/2023] GRES was accepted by ICCV 2023.
  • [06/2023] 🔥 We released Shikra, an awesome MLLM for Referential Dialogue.
  • [01/2023] One paper was accepted by TPAMI.
  • [08/2022] One paper was accepted by ECCV 2022.
  • [07/2022] Two papers were accepted by ACM MM 2022.
  • [07/2022] I joined SenseTime Research as a vision-language researcher.
  • [06/2022] One paper was accepted by CVMJ.
  • [03/2022] One paper was accepted by CVPR 2022 as oral presentation.
  • [12/2020] I started an internship at Tencent Youtu Lab.
  • [12/2020] One paper was accepted by TIP 2021.
  • [07/2020] One paper was accepted by ECCV 2020.
  • [05/2020] One paper was accepted by TNNLS 2020.
  • [04/2020] One paper was accepted by CVPR 2020.
  • [09/2019] I joined the Media Computing Lab under the supervision of Prof. Ming-Ming Cheng!
  • [06/2019] I graduated from Yangzhou University with a bachelor's degree.

Experiences

  • Senior Research Scientist
    2025 - Now
    Canva Original Research Exploration (CORE)
    Canva
  • Expert Researcher in Vision & Language
    2023 - 2025
    Intelligent Creation
    ByteDance
  • Researcher in Vision & Language
    2022 - 2023
    Smart City Group (SCG)
    SenseTime
  • Internship in Computer Vision
    2020 - 2021
    Youtu Lab
    CSIG, Tencent
  • M.S. in Computer Science
    2019 - 2022
    Media Computing Lab (supervised by Prof. Ming-Ming Cheng)
    School of Computer Science, Nankai University
  • B.S. in Computer Science
    2015 - 2019
    College of Innovation and Entrepreneurship (Elite College)
    School of Information Engineering, Yangzhou University

Applications

Magic Layers

Now live on Canva.

Magic Layers turns flat or AI-generated images into fully editable, multi-layer designs, enabling users to modify text, objects, and layouts without regenerating the whole image.

Related paper: Masked Region Transformer, CVPR 2026.

Layout to Design

Available on Canva.

Layout to Design generates complete and visually appealing designs from user-provided text, assets, and an initial layout, helping users quickly create polished posters, product displays, and promotional content.

Related papers: CreatiPoster and CreatiDesign.


Selected Publications

    Masked Region Transformer for Layered Image
    Generation and Editing at Scale

    Zhicong Tang, Jingye Chen, Zhao Zhang, Mohan Zhou, Yuchi Liu
    Yifan Pu, Yalong Bai, Ethan Smith, Yuhui Yuan

    CVPR 2026   [Paper]

    Layton: Latent Consistency Tokenizer
    for 1024-pixel Image Reconstruction and Generation by 256 Tokens

    Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang
    CVPR 2026   [PDF] [Project] [bib]

    CreatiPoster: Towards Editable and Controllable
    Multi-Layer Graphic Design Generation

    Zhao Zhang, Yutao Cheng, Dexiang Hong, Maoke Yang
    Gonglei Shi, Lei Ma, Hui Zhang, Jie Shao, Xinglong Wu

    arXiv 2025   [Repo] [Paper]

    CreatiDesign: A Unified Multi-Conditional Diffusion Transformer
    for Creative Graphic Design

    Hui Zhang, Dexiang Hong, Maoke Yang, Yutao Cheng, Zhao Zhang
    Jie Shao, Xinglong Wu, Zuxuan Wu, and Yu-Gang Jiang

    ICLR 2026   [Repo] [Project page] [Paper] [bib]

    Decomposition of Graphic Design with Unified Multimodal Model
    Hui Nie, Zhao Zhang, Yutao Cheng, Maoke Yang, Gonglei Shi, Qingsong Xie, Jie Shao, Xinglong Wu
    ICML 2025   [Repo Coming Soon]

    RelationLMM: Large Multimodal Model
    as Open and Versatile Visual Relationship Generalist

    Chi Xie, Shuang Liang, Jie Li, Zhao Zhang, Feng Zhu, Rui Zhao
    TPAMI 2025   [Paper] [bib]

    Graphic Design with Large Multimodal Model
    Yutao Cheng*, Zhao Zhang*, Maoke Yang*, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao
    AAAI 2025   [PDF] [Project] [bib]

    Link-Context Learning for Multimodal LLMs
    Yan Tai, Weichen Fan, Zhao Zhang, Ziwei Liu
    CVPR 2024   [PDF] [Code] [bib]

    Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic
    Keqin Chen, Zhao Zhang*, Weili Zeng, Richong Zhang, Feng Zhu, Rui Zhao
    arXiv 2023   [PDF] [Code] [bib]

    Described Object Detection: Liberating Object Detection with Flexible Expressions
    Chi Xie*, Zhao Zhang*, Yixuan Wu, Feng Zhu, Rui Zhao, Shuang Liang
    NeurIPS 2023   [PDF] [Code] [bib]

    Advancing Referring Expression Segmentation Beyond Single Image
    Yixuan Wu*, Zhao Zhang*, Chi Xie, Feng Zhu, Rui Zhao
    ICCV 2023   [PDF] [Code] [bib]

    User-Oriented Interactive Style Transfer
    Zheng Lin, Zhao Zhang, Kang-Rui Zhang, Bo Ren, Ming-Ming Cheng
    CVMJ 2025   [PDF] [Code] [Chinese version] [bib]

    Image Harmonization by Matching Regional References
    Ziyue Zhu, Zhao Zhang, Zheng Lin, Ruiqi Wu, Chunle Guo
    arXiv   [PDF] [Code] [Chinese version] [bib]

    Co-Salient Object Detection with Co-Representation Purification
    Ziyue Zhu*, Zhao Zhang*, Zheng Lin, Xing Sun, Ming-Ming Cheng
    TPAMI 2023   [PDF] [Code] [Chinese version] [bib]

    PAC-Net: Highlight Your Video via History Preference Modeling
    Hang Wang, Penghao Zhou, Chong Zhou, Zhao Zhang, Xing Sun
    ECCV 2022   [PDF] [bib]

    Multi-Mode Interactive Image Segmentation
    Zheng Lin, Zhao Zhang*, Ling-Hao Han, Shao-Ping Lu
    ACM MM 2022   [PDF] [Code] [Chinese version]

    KnifeCut: Refining Thin Part Segmentation with Cutting Lines
    Zheng Lin, Zheng-Peng Duan, Zhao Zhang, Chunle Guo, Ming-Ming Cheng
    ACM MM 2022 (Oral)   [PDF] [Code] [Chinese version]

    Sequential Interactive Image Segmentation
    Zheng Lin, Zhao Zhang, Zi-Yue Zhu, Deng-Ping Fan, Xia-Lei Liu
    CVMJ 2022   [PDF] [Code] [Chinese version]

    FocusCut: Diving into a Focus View in Interactive Segmentation
    Zheng Lin, Zheng-Peng Duan, Zhao Zhang, Chun-Le Guo, Ming-Ming Cheng
    CVPR 2022 (Oral)   [PDF] [Code] [Chinese version] [bib]

    Bilateral Attention Network for RGB-D Salient Object Detection
    Zhao Zhang, Zheng Lin, Jun Xu, Wenda Jin, Shao-Ping Lu, and Deng-Ping Fan
    TIP 2021   [PDF] [Code] [bib]

    Rethinking RGB-D Salient Object Detection: Models, Datasets, and Large-Scale Benchmarks
    Deng-Ping Fan, Zheng Lin, Zhao Zhang, Menglong Zhu, Ming-Ming Cheng
    TNNLS 2020   [PDF] [Code] [Project] [bib]

    Interactive Image Segmentation with First Click Attention
    Zheng Lin, Zhao Zhang, Lin-Zhuo Chen, Ming-Ming Cheng, Shao-Ping Lu
    CVPR 2020   [PDF] [Code] [Project] [bib]


Services

  • Reviewer for TPAMI, TIP, TMI, TMM, TCSVT, CVPR, ICCV, ECCV, NeurIPS, EMNLP, ACM MM, etc.