COVID-19 Tweets: A dataset containing more than 600k tweets on the novel CoronaVirus

DOI: 10.5281/zenodo.4024177 (Zenodo), MaRDI QID: Q6725161

Publication date: 11 September 2020

Copyright license: Creative Commons Attribution 4.0 International

This dataset contains 653 996 tweets related to the Coronavirus topic, identified by hashtags such as #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The crawl started on 27 February 2020 and ended on 25 March 2020, a period of four weeks. The tweets were posted by 390 458 users from 133 different countries and were written in 61 languages. English is the most used language with almost 400k tweets, followed by Spanish with around 80k tweets.

The data is stored as a CSV file in which each line represents a tweet. The CSV file provides the following fields:

Author: the user who posted the tweet
Recipient: the name of the user replied to in the case of a reply; otherwise the same value as the Author field
Tweet: the full content of the tweet
Hashtags: the list of hashtags present in the tweet
Language: the language of the tweet
Relationship: the type of the tweet (a retweet, a reply, a tweet with a mention, etc.)
Location: the country of the author of the tweet, which is not always available
Date: the publication date of the tweet
Source: the device or platform used to send the tweet

The dataset can also be used to construct a social graph, since it includes the relations Replies to, Retweet, MentionsInRetweet and Mentions.
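As an illustration of the social-graph use mentioned above, the following minimal sketch counts directed edges between users with the Python standard library. The column names come from the field list in the description. Everything else is an assumption: the exact spelling of the values in the Relationship column, the reading of Recipient as the target user for every relation type, and the sample rows, which are made up solely to show the expected shape of the file.

```python
import csv
import io
from collections import Counter

# Relation labels as listed in the dataset description. Their exact spelling
# inside the CSV's Relationship column is an assumption.
GRAPH_RELATIONS = {"Replies to", "Retweet", "MentionsInRetweet", "Mentions"}


def read_edges(csv_file):
    """Count directed (author, recipient, relationship) edges in a tweet CSV."""
    edges = Counter()
    for row in csv.DictReader(csv_file):
        author = row["Author"]
        recipient = row["Recipient"]
        relation = row["Relationship"]
        # Rows whose Recipient repeats the Author carry no edge, and rows with
        # an unlisted relation type are ignored.
        if author == recipient or relation not in GRAPH_RELATIONS:
            continue
        edges[(author, recipient, relation)] += 1
    return edges


# Made-up sample rows (not taken from the dataset), header as described above.
SAMPLE_CSV = (
    "Author,Recipient,Tweet,Hashtags,Language,Relationship,Location,Date,Source\n"
    "alice,alice,Stay safe #COVID19,COVID19,en,Tweet,France,2020-03-01,Web\n"
    "bob,alice,@alice thanks,COVID19,en,Replies to,Spain,2020-03-01,Android\n"
    "carol,alice,RT @alice Stay safe #COVID19,COVID19,en,Retweet,,2020-03-02,iPhone\n"
    "bob,alice,@alice agreed,COVID19,en,Replies to,Spain,2020-03-02,Android\n"
)

edges = read_edges(io.StringIO(SAMPLE_CSV))
for (source, target, relation), weight in sorted(edges.items()):
    print(source, "->", target, relation, weight)
```

To run this on the real file, pass an open file handle instead of the `io.StringIO` sample, opened with `newline=""` as the `csv` module recommends. The file name and encoding are not stated in the description and would need to be checked against the download. The resulting weighted edge list can then be loaded into any graph library.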
