DOI: 10.5281/zenodo.7729546
Zenodo: 7729546
MaRDI QID: Q6717872
FDO: Q6717872
Dataset published at Zenodo repository.
Fabio Massacci, Francesco Ciclosi, Silvia Vidor
Publication date: 16 March 2023
Copyright license: Creative Commons Attribution 4.0 International
The dataset consists of three privacy policy corpora (in English and Italian) composed of 81 unique privacy policy texts spanning the period 2018-2021. The first corpus is the original English-language corpus used in the study by Tang et al. [2]. The other two are cross-language corpora built from the first: the source corpus, in English, and the replication corpus, in Italian, which is the language of a potential replication study.

The policies were collected from the Alexa top 10 website rankings for Italy and the U.S., and from the Play Store rankings of the most profitable games category for Italy and the U.S. We manually analyzed the Alexa top 10 Italy websites as of November 2021. Analogously, we analyzed selected apps that, in the same period, ranked highest in the most profitable games category of the Play Store for Italy. All the privacy policies are ANSI-encoded text files and have been manually read and verified.

The dataset is a starting point for building comparable cross-language corpora of privacy policies, which in turn help replicate studies in different languages. Details on the methodology can be found in the accompanying paper.

The available files are as follows:
policies-texts.zip: a directory of text files with the policy texts. File names are the SHA1 hashes of the policy text.
policy-metadata.csv: a CSV file with the metadata for each privacy policy.

This dataset is the original dataset used in publication [1]. The original English U.S. corpus is described in publication [2].

[1] F. Ciclosi, S. Vidor, and F. Massacci. Building cross-language corpora for human understanding of privacy policies. Workshop on Digital Sovereignty in Cyber Security: New Challenges in Future Vision. Communications in Computer and Information Science. Springer International Publishing, 2023. In press.
[2] J. Tang, H. Shoemaker, A. Lerner, and E. Birrell. Defining Privacy: How Users Interpret Technical Terms in Privacy Policies. Proceedings on Privacy Enhancing Technologies, 3:70–94, 2021.
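Because the policy files are named after the SHA1 hash of their text, the archive can be checked for integrity after download. The following is a minimal sketch, not part of the dataset: it assumes that each hash was computed over the raw bytes of the file as stored and that the hash is the file name without its extension. The helper names check_policy_names and read_metadata are hypothetical, and no column names are assumed for policy-metadata.csv.

```python
import csv
import hashlib
import io
import zipfile
from pathlib import PurePosixPath


def sha1_hex(data: bytes) -> str:
    """Return the hexadecimal SHA1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def check_policy_names(zip_source) -> dict[str, bool]:
    """Map each archived file to True if its base name matches its SHA1.

    Assumption: the hash was taken over the raw bytes of the file as
    stored, and the file name (minus extension) is the hex digest.
    """
    results = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            stem = PurePosixPath(info.filename).stem
            results[info.filename] = stem.lower() == sha1_hex(archive.read(info))
    return results


def read_metadata(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

A call such as check_policy_names("policies-texts.zip") returns a mapping from archive member to a boolean. A mismatch does not necessarily indicate corruption: it may only mean that the hash was computed over a different representation of the text (for example, after decoding or whitespace normalization), which this page does not specify.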
This page was built for dataset: Cross-language corpora of privacy policies