Cross-language corpora of privacy policies (Q6717872)

From MaRDI portal





Dataset published in the Zenodo repository.

      Statements

      The dataset consists of three privacy policy corpora (in English and Italian) comprising 81 unique privacy policy texts that span the period 2018-2021. The first corpus is the original English-language corpus used in the study by Tang et al. [2]. The other two are cross-language corpora built from the first: a source corpus in English and a replication corpus in Italian, the language of a potential replication study.

      The policies were collected from the Alexa top 10 website rankings for Italy and the U.S., and from the Play Store rankings in the most profitable games category for Italy and the U.S. We manually analyzed the Alexa top 10 Italy websites as of November 2021. Analogously, we analyzed selected apps that, in the same period, ranked highest in the most profitable games category of the Play Store for Italy. All the privacy policies are ANSI-encoded text files and have been manually read and verified.

      The dataset is helpful as a starting point for building comparable cross-language privacy policy corpora, which in turn help replicate studies in different languages. Details on the methodology can be found in the accompanying paper.

      The available files are as follows:
      policies-texts.zip -- a directory of text files with the policy texts. File names are the SHA1 hashes of the policy text.
      policy-metadata.csv -- a CSV file with the metadata for each privacy policy.

      This dataset is the original dataset used in the publication [1]. The original English U.S. corpus is described in the publication [2].

      [1] F. Ciclosi, S. Vidor and F. Massacci. Building cross-language corpora for human understanding of privacy policies. Workshop on Digital Sovereignty in Cyber Security: New Challenges in Future Vision. Communications in Computer and Information Science. Springer International Publishing, 2023, In press.
      [2] J. Tang, H. Shoemaker, A. Lerner, and E. Birrell. Defining Privacy: How Users Interpret Technical Terms in Privacy Policies. Proceedings on Privacy Enhancing Technologies, 3:70-94, 2021.
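
      Since the policy files are named after the SHA1 hash of the policy text, the archive can be checked for consistency after download. The following is a minimal sketch, not part of the dataset itself. It assumes that the hash was computed over the raw bytes of each file, that the file name without its extension is the hexadecimal digest, and that "ANSI" means the Windows-1252 code page (cp1252). The function names check_policy_hashes and read_policy are hypothetical.

```python
import hashlib
import zipfile
from pathlib import PurePosixPath


def check_policy_hashes(zip_path):
    """Split archive members into those whose name matches their SHA1 and those that do not.

    Assumption: the hash covers the raw file bytes and the file name
    (extension removed) is the hexadecimal digest.
    """
    matching, mismatching = [], []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            raw = archive.read(info)
            digest = hashlib.sha1(raw).hexdigest()
            stem = PurePosixPath(info.filename).stem.lower()
            if stem == digest:
                matching.append(info.filename)
            else:
                mismatching.append(info.filename)
    return matching, mismatching


def read_policy(zip_path, member, encoding="cp1252"):
    """Decode one policy text; cp1252 is an assumed reading of 'ANSI-encoded'."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member).decode(encoding)
```

      If many files end up in the mismatching list, the hash was probably computed differently (for example over decoded or normalized text), and the assumptions above would need adjusting.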
      16 March 2023
      1.0

      Identifiers
