OpenEvents V1 Dataset

🌍 OpenEvents V1: A Real-World News-Centric Dataset for Vision–Language Research

OpenEvents V1 is a large-scale, event-driven dataset built to bridge the gap between visual content and real-world news understanding. Collected from over a decade of reporting by two major global outlets — CNN and The Guardian — this dataset captures the dynamic intersection of images, events, and storytelling. It aims to foster research in context-aware image understanding, cross-modal retrieval, and news-grounded visual reasoning.

📰 What’s Inside?

  • 200,000+ news articles with 400,000+ images
    A rich and diverse database spanning 2011–2022, covering politics, climate, technology, culture, sports, and more.
  • 30,000+ annotated image–event caption pairs
    Expertly curated and split into training, public test, and private test sets for benchmarking and experimentation.

🚀 Support for EVENTA Grand Challenge @ ACM Multimedia 2025

OpenEvents V1 powers two brand-new tasks in the EVENTA 2025 Grand Challenge:

📸 Event-Enriched Image Captioning

Can a model generate a news-savvy caption?
Given an image, retrieve related articles from the news database and generate a caption enriched with real event details and context.

🔍 Event-Based Image Retrieval

Can a model visualize the news?
Given a caption that describes a real-world event, retrieve the most relevant image(s) from the news database.
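Both tasks depend on a ranking step. For retrieval, the system scores database images against a caption. For captioning, the same idea runs in reverse: the system scores articles against an image. The sketch below shows that ranking step under stated assumptions and is not the official challenge baseline. It assumes captions and images have already been mapped into a shared embedding space by some encoder; the dataset does not prescribe one, and a CLIP-style model is only one possible choice. The helper names `cosine` and `rank_images`, the image IDs, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(
    caption_vec: Sequence[float],
    image_vecs: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k (hypothetical) image IDs most similar to the caption embedding."""
    scored = [(image_id, cosine(caption_vec, vec)) for image_id, vec in image_vecs.items()]
    # Highest similarity first; ties broken by image ID so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy usage with made-up 3-dimensional embeddings (real encoders output far larger vectors).
images = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]
top = rank_images(query, images, k=2)
print([image_id for image_id, _ in top])  # ['img_a', 'img_c']
```

Cosine similarity is used here because it ignores vector magnitude and compares only direction, which is a common convention for embedding-based retrieval. Over a database of 400,000+ images, a practical system would likely replace this linear scan with an approximate nearest-neighbour index.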

Download

We provide multiple versions of the OpenEvents V1 dataset to accommodate different computational resources. You can choose and download the version that best fits your setup.

📦 Access the dataset here: Google Drive – OpenEvents V1

🔒 Usage Notice: The dataset is made available for research and academic purposes only. Commercial use is not permitted.

📖 If you use OpenEvents V1 in your work, please cite our paper(s) appropriately. Your citation helps support future updates and research efforts.

Publications

TBA

Authors
Trung-Nghia Le
University of Science, Vietnam
Phuc-Tan Nguyen
University of Science, Vietnam
Trong-Hieu Nguyen
University of Science, Vietnam
Thien-Phuc Tran
University of Science, Vietnam
Minh-Quang Nguyen
University of Science, Vietnam

Contact: ltnghia@fit.hcmus.edu.vn