OpenEvents V1 Dataset

🌍 OpenEvents V1: A Real-World News-Centric Dataset for Vision–Language Research

OpenEvents V1 is a large-scale, event-driven dataset built to bridge the gap between visual content and real-world news understanding. Collected from over a decade of reporting by two major global outlets — CNN and The Guardian — this dataset captures the dynamic intersection of images, events, and storytelling. It aims to foster research in context-aware image understanding, cross-modal retrieval, and news-grounded visual reasoning.

📰 What’s Inside?

  • 200,000+ news articles with 400,000+ images
    A rich and diverse database spanning 2011–2022, covering politics, climate, technology, culture, sports, and more.
  • 30,000+ annotated image–event caption pairs
    Expertly curated and split into training, public test, and private test sets for benchmarking and experimentation.

📦 Support for two tasks:

  • Event-Based Image Retrieval: Given a description, retrieve related images from the news database.
  • Event-Enriched Image Captioning: Given an image and its related article, generate a caption enriched with real event details.
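
As a rough illustration of the retrieval task, the sketch below ranks candidate images by cosine similarity between a query embedding and image embeddings. It is not the challenge baseline or the official evaluation protocol. The embedding values, the image ids, and the function names `cosine` and `retrieve` are all hypothetical. In practice the vectors would come from a vision–language encoder of your choice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_vec, image_vecs, k=3):
    """Return the ids of the k images most similar to the query, best match first."""
    ranked = sorted(
        image_vecs,
        key=lambda image_id: cosine(query_vec, image_vecs[image_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy, made-up embeddings standing in for encoder outputs (assumption, not dataset content).
image_vecs = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.9, 0.0],
    "img_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]  # hypothetical embedding of the text description

top = retrieve(query_vec, image_vecs, k=2)
print(top)
```

For a database of this size (400,000+ images), a brute-force loop like this would likely be replaced by batched matrix operations or an approximate nearest-neighbour index. The ranking logic stays the same.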

🚀 Support for EVENTA Grand Challenge @ ACM Multimedia 2025

OpenEvents V1 powers two brand-new tasks in the EVENTA 2025 Grand Challenge.

Download

We provide multiple versions of the OpenEvents V1 dataset to accommodate different computational resources. You can choose and download the version that best fits your setup.

📦 Access the dataset here: Google Drive – OpenEvents V1

🔒 License: OpenEvents V1 is published under the CC-BY-NC-SA 4.0 License. The dataset is made available strictly for research and academic purposes and must not be used commercially. If you alter or build upon this dataset, you must distribute the resulting work under the same license. If you are interested in commercial usage, please contact us.

📖 If you use OpenEvents V1 in your work, please cite our paper(s) appropriately. Your citation helps support future updates and research efforts.

Publications

TBA

Authors
Trung-Nghia Le, University of Science, Vietnam
Minh-Triet Tran, University of Science, Vietnam
Tam Nguyen, University of Dayton, US
Phuc-Tan Nguyen, University of Science, Vietnam
Trong-Hieu Nguyen, University of Science, Vietnam
Thien-Phuc Tran, University of Science, Vietnam
Minh-Quang Nguyen, University of Science, Vietnam

Contact: ltnghia@fit.hcmus.edu.vn