okcupid_stem (Q6037887)

From MaRDI portal
OpenML dataset with id 45067

    Statements

    User profile data for San Francisco OkCupid users, published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics Education, 23(2).]. The curated dataset was downloaded from [https://github.com/rudeboybert/JSE_OkCupid]. The original dataset was created with a Python script that pulled the data from public profiles on www.okcupid.com on 06/30/2012. It includes people (n = 59946) within a 25-mile radius of San Francisco who had been online within the previous year (since 06/30/2011) and had at least one profile picture. Permission to use this data was obtained by the author of the original paper from OkCupid president and co-founder Christian Rudder, under the condition that the dataset remains public. As the target, the variable 'job' was collapsed into three categories: 'stem', 'non_stem', and 'student'. STEM jobs were defined as 'job' %in% c('computer / hardware / software', 'science / tech / engineering'). Observations with 'job' %in% c('unemployed', 'retired', 'rather not say') or with missing values in 'job' were removed. The factor labels of the variable 'speaks' had to be changed to integers to work around a bug that prevented the variable from being uploaded as a nominal feature. The original dataset also included ten open-text variables, 'essay0' to 'essay9', which were removed from the dataset uploaded here. The dataset further includes the date/time variable 'last_online' (ignored by default), which could be used to construct additional features. Using OkCupid data to predict STEM jobs was inspired by Max Kuhn and Kjell Johnson's book 'Feature Engineering and Selection: A Practical Approach for Predictive Models' [https://bookdown.org/max/FES/].
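The target construction described above can be illustrated with a minimal Python sketch. It is not the uploaders' actual preprocessing code (the description quotes R syntax), and several details are assumptions: the function names `collapse_job` and `days_since_last_online` are hypothetical, the raw 'job' value for students is assumed to be the literal string 'student', and 'last_online' is assumed to be formatted as 'YYYY-MM-DD-HH-MM'.

```python
from datetime import datetime
from typing import Optional

# Job values named in the dataset description.
STEM_JOBS = {"computer / hardware / software", "science / tech / engineering"}
DROPPED_JOBS = {"unemployed", "retired", "rather not say"}

# Scrape date stated in the description (06/30/2012).
SCRAPE_DATE = datetime(2012, 6, 30)


def collapse_job(job: Optional[str]) -> Optional[str]:
    """Map a raw 'job' value to 'stem', 'non_stem' or 'student'.

    Returns None for rows the description says were removed
    (missing, 'unemployed', 'retired', 'rather not say').
    """
    if job is None or job.strip() == "" or job in DROPPED_JOBS:
        return None
    if job in STEM_JOBS:
        return "stem"
    if job == "student":  # assumption: raw label for students
        return "student"
    return "non_stem"


def days_since_last_online(last_online: str) -> int:
    """Example derived feature: whole days between last login and the scrape.

    Assumes the timestamp format 'YYYY-MM-DD-HH-MM' (not confirmed here).
    """
    seen = datetime.strptime(last_online, "%Y-%m-%d-%H-%M")
    return (SCRAPE_DATE - seen).days
```

Rows for which `collapse_job` returns None would be dropped before modelling; the remaining labels correspond to the three classes of the target variable.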
    Yoontae Hwang, Youngbin Lee, Yongjae Lee
    01-27-2023
    27 January 2023
    class
    9c2f6a59736eecfdf05fd64c566be1c5
    1
    3
    14
    26,677
    0
    11