COVID-19 & Election

Introduced by Bilal et al. in "Evaluation of Thematic Coherence in Microblogs".

These datasets were used in the paper "Evaluation of Thematic Coherence in Microblogs" (ACL 2021). The data is structured as follows: each file represents one cluster of tweets and contains the tweet IDs, the journalists' annotations for quality evaluation and issue identification, and the scores from the evaluation metrics. A set of 50 clusters, split equally between the COVID-19 and Election domains (25 each), was annotated by all 3 annotators and therefore carries 3 labels.
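The cluster-per-file layout can be modelled as a small record type. This is a minimal sketch under stated assumptions: the class and field names, the encoding of the 3-point quality scale as the integers 1 to 3, and the use of None when no issue is recorded are all hypothetical, since the on-disk file format is not described here. It shows how the shared clusters (those carrying 3 labels) could be picked out and counted per domain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    # One journalist's judgement of a cluster (hypothetical field names).
    annotator: str
    quality: int            # 3-point coherence scale, ASSUMED encoded as 1..3
    issue: Optional[str]    # "Intruded", "Chained", "Random", or None (assumption)


@dataclass
class Cluster:
    # One file = one cluster of tweets (field names are hypothetical).
    cluster_id: str
    domain: str             # "COVID-19" or "Election"
    tweet_ids: List[str]
    annotations: List[Annotation]
    metric_scores: Dict[str, float] = field(default_factory=dict)

    def is_shared(self) -> bool:
        # Shared clusters were labelled by all 3 annotators.
        return len(self.annotations) == 3


def shared_per_domain(clusters: List[Cluster]) -> Dict[str, int]:
    # Count the 3-label (shared) clusters in each domain.
    counts: Dict[str, int] = {}
    for c in clusters:
        if c.is_shared():
            counts[c.domain] = counts.get(c.domain, 0) + 1
    return counts
```

On the full dataset, and if the assumptions above hold, `shared_per_domain` would be expected to report 25 shared clusters for each domain.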

Each cluster of tweets is rated for thematic coherence quality on a 3-point scale and labelled for issue identification (Intruded, Chained or Random). For details of the annotation scheme, see the complete annotation guidelines (available at https://doi.org/10.6084/m9.figshare.14703471) or the paper.
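For the clusters that carry 3 labels, a user may want a single label per cluster. One common option, shown here as a hedged sketch rather than the procedure used in the paper, is a strict majority vote: a label is kept only if at least 2 of the 3 annotators chose it, otherwise the cluster is left unresolved (None). The function names `majority_label` and `check_issue` are hypothetical; only the three issue categories come from the annotation scheme.

```python
from collections import Counter
from typing import List, Optional, TypeVar

T = TypeVar("T")

# Issue categories from the annotation scheme.
VALID_ISSUES = {"Intruded", "Chained", "Random"}


def majority_label(labels: List[T]) -> Optional[T]:
    """Return the label chosen by a strict majority of annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    # With 3 annotators this means at least 2 agree.
    return label if count > len(labels) / 2 else None


def check_issue(label: str) -> str:
    """Reject issue labels outside the three defined categories."""
    if label not in VALID_ISSUES:
        raise ValueError(f"unknown issue label: {label!r}")
    return label
```

The same function works for both label types: `majority_label([3, 3, 2])` gives 3 for quality, while `majority_label([1, 2, 3])` gives None because no two annotators agree.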

Potential uses for these datasets include evaluating thematic coherence and supporting research in topic modelling and text summarisation.

License

CC-BY license

Modalities

Texts

Languages

English
