Dataset of discussion threads from Meneame




DOI: 10.5281/zenodo.2536218, Zenodo: 2536218, MaRDI QID: Q6696319, FDO: Q6696319

Dataset published in the Zenodo repository.

Pablo Aragón, Andreas Kaltenbrunner, Vicenç Gómez

Publication date: 14 January 2019

Copyright license: Creative Commons Attribution 4.0 International



Dataset from our ICWSM 2017 paper. When using this resource, please use the following citation:

Aragón P., Gómez V., Kaltenbrunner A. (2017) To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion, ICWSM-17 - 11th International AAAI Conference on Web and Social Media, Montreal, Canada.

@inproceedings{aragon2017ICWSM,
  author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
  title = {To Thread or Not to Thread: The Impact of Conversation Threading on Online Discussion},
  booktitle = {ICWSM-17 - 11th International AAAI Conference on Web and Social Media},
  publisher = {The AAAI Press},
  location = {Montreal, Canada},
  year = 2017
}

More info about this dataset can also be found at:

Aragón P., Gómez V., Kaltenbrunner A. (2017) Detecting Platform Effects in Online Discussions, Policy & Internet, 9, 2017.

@article{aragon2017PI,
  author = {Arag\'on, Pablo and G\'omez, Vicen\c{c} and Kaltenbrunner, Andreas},
  title = {Detecting Platform Effects in Online Discussions},
  journal = {Policy \& Internet},
  volume = {9},
  number = {4},
  pages = {420-443},
  doi = {10.1002/poi3.158},
  url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/poi3.158},
  eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/poi3.158},
  year = {2017}
}

Crawling process

We built a crawling process that collected all the stories on the front page of Meneame from 2011 to 2015 (both years included). We then ran a second crawling process to collect every comment from the discussion thread of each story. From both crawling processes, we obtained 72,005 stories and 5,385,324 comments.

Two issues were taken into account when the crawler was designed. First, the machine-readable robots.txt file on Meneame does not disallow this process. Second, the footer of the Meneame website indicates the licenses of the code, graphics and content of the website. The license for content is Attribution 3.0 Spain (CC BY 3.0 ES), which allows us to release this dataset.
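The robots.txt check mentioned above can be sketched with Python's standard library. This is only an illustration and not the crawler used for the dataset: the robots.txt rules, the user agent name and the example.org URLs below are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# this is NOT the real Meneame robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and URLs.
story_allowed = is_allowed(ROBOTS_TXT, "example-crawler",
                           "https://example.org/story/some-slug")
admin_allowed = is_allowed(ROBOTS_TXT, "example-crawler",
                           "https://example.org/admin/panel")
```

With these made-up rules, the story URL is permitted and the admin URL is not. A real crawler would load the live file with set_url() and read() instead of a hard-coded string.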
Fields

Every discussion thread is stored in a JSON file named with the URL slug of the corresponding story in Meneame, located in a yyyy-mm-dd folder. The JSON file is an array of elements with the following fields:

id (string): ID of the story/comment.
sent (timestamp): Date of the story/comment as yyyy-MM-ddThh:mm:ssZ.
message (string): Text of the story/comment.
user (string): Username of the author of the story/comment.
karma (number): Karma score of the comment when the crawling was performed.
comments_count (number): Number of comments in reply to the story/comment.
votes (number): Number of votes for the story/comment.
thread (string): URL of the thread.
thread_id (string): Sequential order of arrival in the thread (0 if story, ≥1 if comment).
depth (string): Depth within the thread (0 if story, ≥1 if comment).
url (string): URL of the specific story/comment.
title (string): Title, only available for stories.
published (string): Date when published on the front page, only available for stories.
tags (string): Tags, only available for stories.
clics (string): Number of clicks, only available for stories.
users (string): Number of user votes, only available for stories.
anonymous (string): Number of anonymous votes, only available for stories.
negatives (string): Number of negative votes, only available for stories.
in_reply_to_id (string): ID of the parent story/comment, only available for comments.
in_reply_to_user (string): Author of the parent story/comment, only available for comments.
in_reply_to_thread_id (string): Sequential order of arrival in the thread of the parent story/comment, only available for comments.

Acknowledgment

This work is supported by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Programme (MDM-2015-0502).
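As a rough illustration of how the fields described under "Fields" fit together, the following Python sketch separates the story from its comments and rebuilds the reply structure from in_reply_to_id. The sample thread is invented for this example (the IDs, usernames and title are hypothetical, and most fields are omitted), and the helper names split_thread and reply_tree are not part of the dataset.

```python
import json
from collections import defaultdict

# Hypothetical miniature thread using the field names listed under "Fields".
# The values are invented for illustration and do not come from the dataset.
SAMPLE_THREAD = json.loads("""
[
  {"id": "s1", "user": "alice", "thread_id": "0", "depth": "0",
   "title": "Example story"},
  {"id": "c1", "user": "bob", "thread_id": "1", "depth": "1",
   "in_reply_to_id": "s1"},
  {"id": "c2", "user": "carol", "thread_id": "2", "depth": "2",
   "in_reply_to_id": "c1"},
  {"id": "c3", "user": "alice", "thread_id": "3", "depth": "1",
   "in_reply_to_id": "s1"}
]
""")


def split_thread(elements):
    """Return (story, comments); the story is the element with thread_id 0."""
    # thread_id is documented as a string, so convert before comparing.
    story = next(e for e in elements if int(e["thread_id"]) == 0)
    comments = [e for e in elements if int(e["thread_id"]) != 0]
    return story, comments


def reply_tree(elements):
    """Map each parent ID to the IDs of its direct replies, in arrival order."""
    children = defaultdict(list)
    for element in sorted(elements, key=lambda e: int(e["thread_id"])):
        parent = element.get("in_reply_to_id")  # absent for the story
        if parent is not None:
            children[parent].append(element["id"])
    return dict(children)


story, comments = split_thread(SAMPLE_THREAD)
tree = reply_tree(SAMPLE_THREAD)
max_depth = max(int(e["depth"]) for e in SAMPLE_THREAD)
```

To apply the same functions to a real thread, the array can be loaded with json.load from a file inside one of the yyyy-mm-dd folders; the exact file name and extension depend on how the archive is unpacked and are not assumed here.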






