🌍 OpenEvents V1: A Real-World News-Centric Dataset for Vision–Language Research
OpenEvents V1 is a large-scale, event-driven dataset built to bridge the gap between visual content and real-world news understanding. Collected from over a decade of reporting by two major global outlets — CNN and The Guardian — this dataset captures the dynamic intersection of images, events, and storytelling. It aims to foster research in context-aware image understanding, cross-modal retrieval, and news-grounded visual reasoning.
📰 What’s Inside?
- 200,000+ news articles with 400,000+ images
  A rich and diverse database spanning 2011–2022, covering politics, climate, technology, culture, sports, and more.
- 30,000+ annotated image–event caption pairs
  Expertly curated and split into training, public test, and private test sets for benchmarking and experimentation.
🚀 Support for EVENTA Grand Challenge @ ACM Multimedia 2025
OpenEvents V1 powers two brand-new tasks in the EVENTA 2025 Grand Challenge:
📸 Event-Enriched Image Captioning
Can a model generate a news-savvy caption?
Given an image, retrieve related articles from the news database and generate a caption enriched with real event details and context.
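A minimal retrieval-then-caption sketch is shown below. It uses off-the-shelf CLIP embeddings (via `sentence-transformers`) to rank articles against a query image; the article fields, file path, and model choice are illustrative assumptions, not part of the OpenEvents V1 release or the official EVENTA baseline.

```python
# Sketch: retrieve candidate articles for an image, then pass the top hit
# to any captioning model as event context. Paths and field names are
# hypothetical placeholders.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # joint image/text embedding space

# Hypothetical article store: dicts with "title" and "body" fields.
articles = [
    {"title": "Glasgow hosts COP26 climate summit", "body": "World leaders met..."},
    {"title": "Champions League final preview", "body": "Two clubs prepare..."},
]
article_emb = model.encode([a["title"] for a in articles], convert_to_tensor=True)

query_img = Image.open("query.jpg")  # placeholder path
img_emb = model.encode(query_img, convert_to_tensor=True)

# Rank articles by cosine similarity to the image; keep the top match.
hits = util.semantic_search(img_emb, article_emb, top_k=1)[0]
context = articles[hits[0]["corpus_id"]]
print(f"Top article for caption grounding: {context['title']}")
# A captioning model would now condition on both the image and `context`.
```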

🔍 Event-Based Image Retrieval
Can a model visualize the news?
Given a caption that describes a real-world event, retrieve the most relevant image(s) from the news database.
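The sketch below shows one possible baseline for this direction: embed the image pool and the query caption in the same CLIP space and rank by cosine similarity. The image folder and caption are placeholders, and this is an assumed approach, not the challenge's reference method.

```python
# Sketch: text-to-image retrieval with CLIP embeddings.
from pathlib import Path
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

image_paths = sorted(Path("news_images").glob("*.jpg"))  # hypothetical folder
image_emb = model.encode([Image.open(p) for p in image_paths], convert_to_tensor=True)

caption = "Protesters gather outside the summit venue demanding climate action."
text_emb = model.encode(caption, convert_to_tensor=True)

# Cosine-similarity ranking; report the five closest images.
for hit in util.semantic_search(text_emb, image_emb, top_k=5)[0]:
    print(image_paths[hit["corpus_id"]], round(hit["score"], 3))
```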

We provide multiple versions of the OpenEvents V1 dataset to accommodate different computational resources. You can choose and download the version that best fits your setup.
📦 Access the dataset here: Google Drive – OpenEvents V1
🔒 Usage Notice: The dataset is made available strictly for research and academic purposes. Commercial use is not permitted.
📖 If you use OpenEvents V1 in your work, please cite our paper(s) appropriately. Your citation helps support future updates and research efforts.
TBA