Experimental Data Set for the study "Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy"

DOI: 10.5281/zenodo.3886816, Zenodo: 3886816, MaRDI QID: Q6690431, FDO: Q6690431

Dataset published in the Zenodo repository.

Benjamin Doerr, Quentin Renau, Johann Dreo, Carola Doerr

Publication date: 9 June 2020

Copyright license: Creative Commons Attribution 4.0 International

These are the feature values used in the study "Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy". The dataset gathers the values of all cheap features available in the R package flacco, computed in dimension $d=5$ with 5 sampling strategies: Random: the classical Mersenne Twister algorithm; Randu: a random number generator that is notoriously bad; LHS: a centered Latin Hypercube Design; iLHS: an improved Latin Hypercube Design; Sobol: points extracted from a Sobol low-discrepancy sequence.

The csv file features_summury_dim_5_ppsn.csv contains 100 values for every feature, whereas features_summury_dim_5_ppsn_median.csv contains, for every feature, the median of these 100 values. The folder PPSN_feature_plots contains the histograms of feature values on the 24 COCO functions for 3 sampling strategies: Random, LHS and Sobol. The Python file sampling_ppsn.py is the code used to generate the sample points from which the feature values are computed. The file stats50_knn_dt.csv provides the raw median and IQR (interquartile range) data behind the heatmaps and boxplots in the paper. Finally, the files results_classif_knn100.csv (resp. dt) provide the accuracy of 100 classifications for every setting.
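To make the differences between the sampling strategies concrete, here is a minimal sketch of three of them (Random, Randu and centered LHS) using only the Python standard library. It is not the content of sampling_ppsn.py: the function names, the seeds and the choice of the unit hypercube as sampling domain are assumptions made for illustration, and iLHS and Sobol are left out because they need a more involved construction or an external library.

```python
import random


def mersenne_twister_sample(n, d, seed=0):
    """Return n points in [0, 1)^d drawn with Python's built-in generator.

    Python's `random` module is based on the Mersenne Twister.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(n)]


def randu_sample(n, d, seed=1):
    """Return n points in (0, 1)^d drawn with the RANDU generator.

    RANDU is the linear congruential generator x_{k+1} = 65539 * x_k mod 2^31.
    It requires an odd seed; consecutive outputs are strongly correlated.
    """
    if seed % 2 == 0:
        raise ValueError("RANDU needs an odd seed")
    x = seed
    points = []
    for _ in range(n):
        point = []
        for _ in range(d):
            x = (65539 * x) % 2**31
            point.append(x / 2**31)
        points.append(point)
    return points


def centered_lhs_sample(n, d, seed=0):
    """Return n points in [0, 1)^d forming a centered Latin Hypercube Design.

    Each axis is cut into n equal cells and every cell centre is used
    exactly once per axis; the pairing across axes is a random permutation.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        centres = [(i + 0.5) / n for i in range(n)]
        rng.shuffle(centres)
        columns.append(centres)
    return [[columns[j][i] for j in range(d)] for i in range(n)]


if __name__ == "__main__":
    # Illustrative sample size only; the study's actual sizes are not restated here.
    for sampler in (mersenne_twister_sample, randu_sample, centered_lhs_sample):
        points = sampler(4, 5)
        print(sampler.__name__, points[0])
```

Every function returns a list of n points with d coordinates each, so the output of one sampler can be swapped for another before the features are computed, which is the comparison the dataset supports.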
