T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Kaiyue Sun1 Kaiyi Huang1 Xian Liu2 Yue Wu3 Zihan Xu1 Zhenguo Li3 Xihui Liu1

1 The University of Hong Kong 2 The Chinese University of Hong Kong 3 Huawei Noah's Ark Lab

T2V-CompBench Prompt Suite.

 

Overview:

Introduction

 

 

Evaluation Metrics


MLLM-based evaluation metrics for consistent and dynamic attribute binding, action binding and object interactions.
Detection-based evaluation metrics for spatial relationships and generative numeracy.
Tracking-based evaluation metrics for motion binding.
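
To make the detection-based and tracking-based ideas concrete, here is a minimal illustrative sketch, not the benchmark's actual implementation. It assumes an external detector or tracker has already produced bounding boxes; the helper names (`center`, `spatial_relation`, `motion_direction`), the box format, and the `min_shift` threshold are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels,
# using image coordinates where y grows downward.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def spatial_relation(box_a: Box, box_b: Box) -> str:
    """Coarse 2D relation of object A relative to object B in one frame.

    Hypothetical rule: compare box centers and report the dominant axis.
    """
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "left" if dx < 0 else "right"
    return "above" if dy < 0 else "below"


def motion_direction(track: List[Box], min_shift: float = 1.0) -> str:
    """Net direction of a tracked object between the first and last frame.

    `track` holds one box per frame for the same object. `min_shift` is an
    assumed pixel threshold below which the object counts as static.
    """
    (x0, y0), (x1, y1) = center(track[0]), center(track[-1])
    dx, dy = x1 - x0, y1 - y0
    if max(abs(dx), abs(dy)) < min_shift:
        return "static"
    if abs(dx) >= abs(dy):
        return "left" if dx < 0 else "right"
    return "up" if dy < 0 else "down"
```

A per-video score could then be derived by checking whether the predicted relation or direction matches the one stated in the prompt; how the benchmark aggregates such checks is not specified on this page.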

 

Evaluation Results

 

Radar chart benchmarking open-source T2V models.

T2V-CompBench evaluation results with proposed metrics for 20 T2V generation models (13 open-source models and 7 commercial models).
Bold marks the best overall score, red marks the best score among the 7 commercial models, and yellow marks the best score among the 13 open-source models.


 

Bibtex


@misc{sun2024t2vcompbenchcomprehensivebenchmarkcompositional,
      title={T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation},
      author={Kaiyue Sun and Kaiyi Huang and Xian Liu and Yue Wu and Zihan Xu and Zhenguo Li and Xihui Liu},
      year={2024},
      eprint={2407.14505},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.14505},
}