COVID-19 & Election

Introduced by Bilal et al. in Evaluation of Thematic Coherence in Microblogs

These datasets were used in the paper 'Evaluation of Thematic Coherence in Microblogs' (ACL, 2021). The data is structured as follows: each file represents one cluster of tweets and contains the tweet IDs, the journalist annotations for quality evaluation and issue identification, and the metric evaluation scores. Note that a set of 50 clusters, split equally between the COVID-19 and Election domains, was labelled by all 3 annotators, so each of these clusters carries 3 labels.

Each cluster of tweets is evaluated for its thematic coherence quality (3-point scale) and for its issue identification (Intruded, Chained or Random). For more information about the annotation scheme, please refer to the complete annotation guidelines (available at https://doi.org/10.6084/m9.figshare.14703471) or the paper.
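As an illustration of how the per-cluster annotations described above might be handled in code, the following minimal Python sketch models one cluster and reduces the 3 labels of a shared cluster to a single label by majority vote. The class names, field names, the 1-to-3 integer encoding of the quality scale and the use of majority voting are all assumptions made for this example; they are not taken from the released files or from the paper, and the tweet IDs are placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

# Issue labels named in the dataset description.
ISSUE_LABELS = {"Intruded", "Chained", "Random"}


@dataclass
class ClusterAnnotation:
    """One annotator's judgement of a cluster (hypothetical field names)."""
    quality: int                 # assumed encoding of the 3-point scale as 1..3
    issue: Optional[str] = None  # one of ISSUE_LABELS, or None if not recorded

    def __post_init__(self):
        if self.quality not in (1, 2, 3):
            raise ValueError("quality must be 1, 2 or 3")
        if self.issue is not None and self.issue not in ISSUE_LABELS:
            raise ValueError(f"unknown issue label: {self.issue}")


@dataclass
class Cluster:
    """A cluster of tweets with one or more annotations (hypothetical layout)."""
    tweet_ids: List[str]
    annotations: List[ClusterAnnotation] = field(default_factory=list)


def majority(values):
    """Return the value held by a strict majority of voters, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


# A shared cluster with 3 annotations (placeholder IDs and labels).
shared = Cluster(
    tweet_ids=["id-1", "id-2", "id-3"],
    annotations=[
        ClusterAnnotation(quality=2, issue="Chained"),
        ClusterAnnotation(quality=2, issue="Intruded"),
        ClusterAnnotation(quality=3, issue="Chained"),
    ],
)

agreed_quality = majority([a.quality for a in shared.annotations])
agreed_issue = majority([a.issue for a in shared.annotations])
```

Returning None when no label reaches a strict majority keeps three-way disagreements visible rather than silently picking one annotator's label.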

Potential uses of these datasets include thematic coherence evaluation, topic modelling and text summarisation.

License


  • CC-BY license
