nomao (Q6034064)

From MaRDI portal
OpenML dataset with id 1486

    Statements

**Author**: Nomao Labs

**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Nomao)

**Please cite**: Laurent Candillier and Vincent Lemaire. Design and Analysis of the Nomao Challenge - Active Learning in the Real-World. In: Proceedings of the ALRA : Active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK.

1. Data set title:
Nomao Data Set

2. Abstract:
Nomao collects data about places (name, phone, localization...) from many sources. Deduplication consists in detecting what data refer to the same place. Instances in the dataset compare 2 spots.

3. Data Set Characteristics:

- Univariate
- Area: Computer
- Attribute Characteristics: Real
- Associated Tasks: Classification
- Missing Values?: Yes

4. Source:

(a) Original owner of database (name / phone / snail address / email address)
Nomao / 00 33 5 62 48 33 90 / 1 avenue Jean Rieux, 31500 Toulouse / challenge '@' nomao.com
(b) Donor of database (name / phone / snail address / email address)
Laurent Candillier / - / 1 avenue Jean Rieux, 31500 Toulouse / laurent '@' nomao.com

5. Data Set Information:

The dataset has been enriched during the Nomao Challenge: organized along with the ALRA workshop (Active Learning in Real-world Applications): held at the ECML-PKDD 2012 conference.

5.1. Number of Instances

34,465 instances, mix of continuous and nominal, labeled by human expert.

First 29,104 instances have been labeled with "human prior".
See the corresponding article described in section "3. Past Usage" for more details.

Next 917 instances have been labeled using the active learning method called "marg".
Next 964 instances refer to the active method called "wmarg".
Next 995 instances refer to the active method called "wmarg5".
Next 1,985 instances refer to the active method called "rand" (random selection).

Last instances have been labeled during the corresponding challenge.
More details can be found in http://www.nomao.com/labs/challenge
Next 163 instances refer to the active method called "baseline".
Next 167 instances refer to the active method called "nomao".
And last 170 instances refer to the active method called "tsun".

5.2. Number of Attributes

120 attributes: 89 continuous, 31 nominal (including the attributes 'label' and 'id').

The features are separated by comma.

5.3. Attribute Information:

Missing data are allowed, represented by question marks '?'.

Labels are +1 if the concerned spots must be merged, -1 if they do not refer to the same entity.

1 id: name is composed of the names of the spots that are compared, separated by a sharp (#).
2 clean_name_intersect_min: continuous.
3 clean_name_intersect_max: continuous.
4 clean_name_levenshtein_sim: continuous.
5 clean_name_trigram_sim: continuous.
6 clean_name_levenshtein_term: continuous.
7 clean_name_trigram_term: continuous.
8 clean_name_including: n,s,m.
9 clean_name_equality: n,s,m.
10 city_intersect_min: continuous.
11 city_intersect_max: continuous.
12 city_levenshtein_sim: continuous.
13 city_trigram_sim: continuous.
14 city_levenshtein_term: continuous.
15 city_trigram_term: continuous.
16 city_including: n,s,m.
17 city_equality: n,s,m.
18 zip_intersect_min: continuous.
19 zip_intersect_max: continuous.
20 zip_levenshtein_sim: continuous.
21 zip_trigram_sim: continuous.
22 zip_levenshtein_term: continuous.
23 zip_trigram_term: continuous.
24 zip_including: n,s,m.
25 zip_equality: n,s,m.
26 street_intersect_min: continuous.
27 street_intersect_max: continuous.
28 street_levenshtein_sim: continuous.
29 street_trigram_sim: continuous.
30 street_levenshtein_term: continuous.
31 street_trigram_term: continuous.
32 street_including: n,s,m.
33 street_equality: n,s,m.
34 website_intersect_min: continuous.
35 website_intersect_max: continuous.
36 website_levenshtein_sim: continuous.
37 website_trigram_sim: continuous.
38 website_levenshtein_term: continuous.
39 website_trigram_term: continuous.
40 website_including: n,s,m.
41 website_equality: n,s,m.
42 countryname_intersect_min: continuous.
43 countryname_intersect_max: continuous.
44 countryname_levenshtein_sim: continuous.
45 countryname_trigram_sim: continuous.
46 countryname_levenshtein_term: continuous.
47 countryname_trigram_term: continuous.
48 countryname_including: n,s,m.
49 countryname_equality: n,s,m.
50 geocoderlocalityname_intersect_min: continuous.
51 geocoderlocalityname_intersect_max: continuous.
52 geocoderlocalityname_levenshtein_sim: continuous.
53 geocoderlocalityname_trigram_sim: continuous.
54 geocoderlocalityname_levenshtein_term: continuous.
55 geocoderlocalityname_trigram_term: continuous.
56 geocoderlocalityname_including: n,s,m.
57 geocoderlocalityname_equality: n,s,m.
58 geocoderinputaddress_intersect_min: continuous.
59 geocoderinputaddress_intersect_max: continuous.
60 geocoderinputaddress_levenshtein_sim: continuous.
61 geocoderinputaddress_trigram_sim: continuous.
62 geocoderinputaddress_levenshtein_term: continuous.
63 geocoderinputaddress_trigram_term: continuous.
64 geocoderinputaddress_including: n,s,m.
65 geocoderinputaddress_equality: n,s,m.
66 geocoderoutputaddress_intersect_min: continuous.
67 geocoderoutputaddress_intersect_max: continuous.
68 geocoderoutputaddress_levenshtein_sim: continuous.
69 geocoderoutputaddress_trigram_sim: continuous.
70 geocoderoutputaddress_levenshtein_term: continuous.
71 geocoderoutputaddress_trigram_term: continuous.
72 geocoderoutputaddress_including: n,s,m.
73 geocoderoutputaddress_equality: n,s,m.
74 geocoderpostalcodenumber_intersect_min: continuous.
75 geocoderpostalcodenumber_intersect_max: continuous.
76 geocoderpostalcodenumber_levenshtein_sim: continuous.
77 geocoderpostalcodenumber_trigram_sim: continuous.
78 geocoderpostalcodenumber_levenshtein_term: continuous.
79 geocoderpostalcodenumber_trigram_term: continuous.
80 geocoderpostalcodenumber_including: n,s,m.
81 geocoderpostalcodenumber_equality: n,s,m.
82 geocodercountrynamecode_intersect_min: continuous.
83 geocodercountrynamecode_intersect_max: continuous.
84 geocodercountrynamecode_levenshtein_sim: continuous.
85 geocodercountrynamecode_trigram_sim: continuous.
86 geocodercountrynamecode_levenshtein_term: continuous.
87 geocodercountrynamecode_trigram_term: continuous.
88 geocodercountrynamecode_including: n,s,m.
89 geocodercountrynamecode_equality: n,s,m.
90 phone_diff: continuous.
91 phone_levenshtein: continuous.
92 phone_trigram: continuous.
93 phone_equality: n,s,m.
94 fax_diff: continuous.
95 fax_levenshtein: continuous.
96 fax_trigram: continuous.
97 fax_equality: n,s,m.
98 street_number_diff: continuous.
99 street_number_levenshtein: continuous.
100 street_number_trigram: continuous.
101 street_number_equality: n,s,m.
102 geocode_coordinates_long_diff: continuous.
103 geocode_coordinates_long_levenshtein: continuous.
104 geocode_coordinates_long_trigram: continuous.
105 geocode_coordinates_long_equality: n,s,m.
106 geocode_coordinates_lat_diff: continuous.
107 geocode_coordinates_lat_levenshtein: continuous.
108 geocode_coordinates_lat_trigram: continuous.
109 geocode_coordinates_lat_equality: n,s,m.
110 coordinates_long_diff: continuous.
111 coordinates_long_levenshtein: continuous.
112 coordinates_long_trigram: continuous.
113 coordinates_long_equality: n,s,m.
114 coordinates_lat_diff: continuous.
115 coordinates_lat_levenshtein: continuous.
116 coordinates_lat_trigram: continuous.
117 coordinates_lat_equality: n,s,m.
118 geocode_coordinates_diff: continuous.
119 coordinates_diff: continuous.
120 label: +1,-1.

Relevant Papers: Laurent Candillier and Vincent Lemaire. Design and Analysis of the Nomao Challenge - Active Learning in the Real-World. In: Proceedings of the ALRA : Active Learning in Real-world Applications, Workshop ECML-PKDD 2012, Friday, September 28, 2012, Bristol, UK.
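The instance counts in section 5.1 describe consecutive blocks of rows, one per labeling method. A minimal Python sketch of how those blocks could be turned into row ranges follows. It assumes the rows are stored in the order the description gives and uses 0-based, half-open ranges; the names `SEGMENTS` and `segment_ranges` are illustrative and not part of the dataset. A particular copy of the data may not preserve this ordering, so the ranges should be checked before use.

```python
from itertools import accumulate

# (labeling method, number of instances), in the order given in section 5.1.
SEGMENTS = [
    ("human prior", 29_104),
    ("marg", 917),
    ("wmarg", 964),
    ("wmarg5", 995),
    ("rand", 1_985),
    ("baseline", 163),
    ("nomao", 167),
    ("tsun", 170),
]


def segment_ranges(segments):
    """Return {method: (start, stop)} as 0-based, half-open row ranges."""
    stops = list(accumulate(size for _, size in segments))
    starts = [0] + stops[:-1]
    return {
        name: (start, stop)
        for (name, _), start, stop in zip(segments, starts, stops)
    }


RANGES = segment_ranges(SEGMENTS)
TOTAL = sum(size for _, size in SEGMENTS)

for name, (start, stop) in RANGES.items():
    print(f"{name:12s} rows [{start}, {stop})  {stop - start} instances")
print("total instances:", TOTAL)
```

The per-method counts add up to the 34,465 instances stated in section 5.1, which is a quick consistency check on the description itself.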
    25 May 2015
    Class
    8fc1ac73fbe5236892e166f9f24d7221
    3
    2
    119
    34,465
    0
    89
    30
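The attribute list in the description follows a regular naming scheme, so the stated totals (120 attributes: 89 continuous, 31 nominal including 'label' and 'id') can be cross-checked by regenerating the names. The sketch below does that in Python. The grouping into `TEXT_FIELDS` and `NUMERIC_FIELDS` and the function name `build_schema` are illustrative labels chosen here, not terms from the dataset. Dropping 'id' would leave 119 attributes, 30 of them nominal, which appears consistent with the values 119, 89 and 30 listed above if those are read as feature counts without 'id'.

```python
# Fields compared with six continuous similarity scores plus two n,s,m flags.
TEXT_FIELDS = [
    "clean_name", "city", "zip", "street", "website", "countryname",
    "geocoderlocalityname", "geocoderinputaddress", "geocoderoutputaddress",
    "geocoderpostalcodenumber", "geocodercountrynamecode",
]
# Fields compared with three continuous scores plus one n,s,m flag.
NUMERIC_FIELDS = [
    "phone", "fax", "street_number",
    "geocode_coordinates_long", "geocode_coordinates_lat",
    "coordinates_long", "coordinates_lat",
]
TEXT_CONTINUOUS = [
    "intersect_min", "intersect_max", "levenshtein_sim",
    "trigram_sim", "levenshtein_term", "trigram_term",
]
TEXT_NOMINAL = ["including", "equality"]
NUMERIC_CONTINUOUS = ["diff", "levenshtein", "trigram"]


def build_schema():
    """Return [(attribute_name, kind)] in the order of the attribute list."""
    schema = [("id", "nominal")]
    for field in TEXT_FIELDS:
        schema += [(f"{field}_{s}", "continuous") for s in TEXT_CONTINUOUS]
        schema += [(f"{field}_{s}", "nominal") for s in TEXT_NOMINAL]
    for field in NUMERIC_FIELDS:
        schema += [(f"{field}_{s}", "continuous") for s in NUMERIC_CONTINUOUS]
        schema.append((f"{field}_equality", "nominal"))
    schema += [
        ("geocode_coordinates_diff", "continuous"),
        ("coordinates_diff", "continuous"),
        ("label", "nominal"),
    ]
    return schema


SCHEMA = build_schema()
N_CONTINUOUS = sum(kind == "continuous" for _, kind in SCHEMA)
N_NOMINAL = sum(kind == "nominal" for _, kind in SCHEMA)
print(len(SCHEMA), N_CONTINUOUS, N_NOMINAL)
```

Indexing `SCHEMA[k - 1]` gives the name and kind of attribute number `k` in the list, for example `SCHEMA[89]` for attribute 90.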