T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Kaiyi Huang1 Kaiyue Sun1 Enze Xie2 Zhenguo Li2 Xihui Liu1

1 The University of Hong Kong 2 Huawei Noah's Ark Lab

T2I-CompBench Statistics Properties.

Abstract

Although recent text-to-image generation models produce strikingly high-quality images, they often fail to compose objects with different attributes and relationships into a complex, coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed for compositional text-to-image generation, and explore the potential of multimodal LLMs for evaluation. We also propose an improved baseline, Generative mOdel finetuning with Reward-driven Sample selection (GORS), to boost the compositional generation abilities of pretrained text-to-image models. Extensive experiments benchmark previous methods on T2I-CompBench and validate the effectiveness of the proposed evaluation metrics and the GORS approach.

 

Introduction

 

Evaluation


BLIP-VQA for attribute binding evaluation, UniDet for spatial relationship evaluation, and MiniGPT4-CoT as a potential unified metric.

 

Method


GORS for Compositional Text-to-image Generation.

 

Evaluation Results

 

Qualitative Comparison


 

Bibtex


    @article{huang2023t2icompbench,
      title={{T2I-CompBench}: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation},
      author={Huang, Kaiyi and Sun, Kaiyue and Xie, Enze and Li, Zhenguo and Liu, Xihui},
      journal={Advances in Neural Information Processing Systems},
      volume={36},
      pages={78723--78747},
      year={2023}
    }

    @article{huang2025t2icompbench++,
      title={{T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-Image Generation}},
      author={Huang, Kaiyi and Duan, Chengqi and Sun, Kaiyue and Xie, Enze and Li, Zhenguo and Liu, Xihui},
      journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
      year={2025},
      number={01},
      issn={1939-3539},
      pages={1--17},
      url={https://doi.ieeecomputersociety.org/10.1109/TPAMI.2025.3531907},
      publisher={IEEE Computer Society},
      address={Los Alamitos, CA, USA},
      month=jan
    }