🌍 OpenEvents V1: A Real-World News-Centric Dataset for Vision–Language Research
OpenEvents V1 is a large-scale, event-driven dataset built to bridge the gap between visual content and real-world news understanding. Collected from over a decade of reporting by two major global outlets — CNN and The Guardian — this dataset captures the dynamic intersection of images, events, and storytelling. It aims to foster research in context-aware image understanding, cross-modal retrieval, and news-grounded visual reasoning.
📰 What’s Inside?
- 200,000+ news articles with 400,000+ images
  A rich and diverse database spanning 2011–2022, covering politics, climate, technology, culture, sports, and more.
- 30,000+ annotated image–event caption pairs
  Expertly curated and split into training, public test, and private test sets for benchmarking and experimentation.
🚀 Support for EVENTA Grand Challenge @ ACM Multimedia 2025
OpenEvents V1 powers two brand-new tasks in the EVENTA 2025 Grand Challenge:
📸 Event-Enriched Image Captioning
Can a model generate a news-savvy caption?
Given an image, retrieve related articles from the news database and generate a caption enriched with real event details and context.
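A minimal retrieval-then-caption sketch is shown below. It uses off-the-shelf CLIP embeddings (via `sentence-transformers`) to rank articles against a query image; the article fields, file path, and model choice are illustrative assumptions, not part of the OpenEvents V1 release or the official EVENTA baseline.

```python
# Sketch: retrieve candidate articles for an image, then pass the top hit
# to any captioning model as event context. Paths and field names are
# hypothetical placeholders.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # joint image/text embedding space

# Hypothetical article store: dicts with "title" and "body" fields.
articles = [
    {"title": "Glasgow hosts COP26 climate summit", "body": "World leaders met..."},
    {"title": "Champions League final preview", "body": "Two clubs prepare..."},
]
article_emb = model.encode([a["title"] for a in articles], convert_to_tensor=True)

query_img = Image.open("query.jpg")  # placeholder path
img_emb = model.encode(query_img, convert_to_tensor=True)

# Rank articles by cosine similarity to the image; keep the top match.
hits = util.semantic_search(img_emb, article_emb, top_k=1)[0]
context = articles[hits[0]["corpus_id"]]
print(f"Top article for caption grounding: {context['title']}")
# A captioning model would now condition on both the image and `context`.
```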

🔍 Event-Based Image Retrieval
Can a model visualize the news?
Given a caption that describes a real-world event, retrieve the most relevant image(s) from the news database.
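The sketch below shows one possible baseline for this direction: embed the image pool and the query caption in the same CLIP space and rank by cosine similarity. The image folder and caption are placeholders, and this is an assumed approach, not the challenge's reference method.

```python
# Sketch: text-to-image retrieval with CLIP embeddings.
from pathlib import Path
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

image_paths = sorted(Path("news_images").glob("*.jpg"))  # hypothetical folder
image_emb = model.encode([Image.open(p) for p in image_paths], convert_to_tensor=True)

caption = "Protesters gather outside the summit venue demanding climate action."
text_emb = model.encode(caption, convert_to_tensor=True)

# Cosine-similarity ranking; report the five closest images.
for hit in util.semantic_search(text_emb, image_emb, top_k=5)[0]:
    print(image_paths[hit["corpus_id"]], round(hit["score"], 3))
```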

We provide multiple versions of the OpenEvents V1 dataset to accommodate different computational resources. You can choose and download the version that best fits your setup.
📦 Access the dataset here: Google Drive – OpenEvents V1
🔒 Usage Notice: The dataset is made available strictly for research and academic purposes. Commercial use is not permitted.
📖 If you use OpenEvents V1 in your work, please cite our paper(s) appropriately. Your citation helps support future updates and research efforts.
TBA