🌍 OpenEvents V1: A Real-World News-Centric Dataset for Vision–Language Research
OpenEvents V1 is a large-scale, event-driven dataset built to bridge the gap between visual content and real-world news understanding. Collected from over a decade of reporting by two major global outlets — CNN and The Guardian — this dataset captures the dynamic intersection of images, events, and storytelling. It aims to foster research in context-aware image understanding, cross-modal retrieval, and news-grounded visual reasoning.
📰 What’s Inside?
- 200,000+ news articles with 400,000+ images
A rich and diverse database spanning 2011–2022, covering politics, climate, technology, culture, sports, and more.
- 30,000+ annotated image–event caption pairs
Expertly curated and split into training, public test, and private test sets for benchmarking and experimentation.
📦 Support for two tasks:
- Event-Based Image Retrieval: Given a description, retrieve related images from the news database.
- Event-Enriched Image Captioning: Given an image and its related article, generate a caption enriched with real event details.
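To make the retrieval task concrete, here is a minimal toy sketch of ranking images against a text description. It is not the dataset's official baseline or evaluation protocol: the image IDs, the example texts, and the `retrieve` helper are all hypothetical, and the bag-of-words cosine similarity merely stands in for whatever vision–language model a real system would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, image_texts, top_k=3):
    """Rank image IDs by similarity between the query and each image's text."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(text))), image_id)
        for image_id, text in image_texts.items()
    ]
    # Sort by descending score; break ties by image ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical toy database: image ID -> text of the article it appeared in.
IMAGE_TEXTS = {
    "img_001": "Flood waters rise in the city after days of heavy rain",
    "img_002": "The team wins the football cup final in extra time",
    "img_003": "Leaders meet at the climate summit to discuss emissions",
}

ranking = retrieve("heavy rain causes a flood in the city", IMAGE_TEXTS, top_k=2)
print(ranking)
```

The captioning task runs in the opposite direction: the input is an image plus its article, and the output is text, so a comparable sketch would need a generative model rather than a ranking function.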
🚀 Support for EVENTA Grand Challenge @ ACM Multimedia 2025
OpenEvents V1 powers two brand-new tasks in the EVENTA 2025 Grand Challenge.
We provide multiple versions of the OpenEvents V1 dataset to accommodate different computational resources. You can choose and download the version that best fits your setup.
📦 Access the dataset here: Google Drive – OpenEvents V1
🔒 License: OpenEvents V1 is published under the CC-BY-NC-SA 4.0 License. The dataset is made available strictly for research and academic purposes only. You must not use this work for commercial purposes. If you alter or build upon this dataset, you have to distribute the resulting work only under the same license. If you are interested in commercial usage, please contact us.
📖 If you use OpenEvents V1 in your work, please cite our paper(s) appropriately. Your citation helps support future updates and research efforts.
TBA