Data for "How do ecologists estimate occupancy in practice?" by Goldstein et al.

From MaRDI portal
Dataset:6683482



DOI: 10.5281/zenodo.10080469
Zenodo: 10080469
MaRDI QID: Q6683482
FDO: Q6683482

Dataset published at Zenodo repository.

Abigail G. Keller, Benjamin R. Goldstein, Mitchell W. Serota, Felipe Montealegre-Mora, Amy van Scoyoc, Kendall L. Calhoun, Kristin J. Barker, Perry de Valpine, Chelsea Andreozzi, Phoebe Parker-Shames

Publication date: 21 December 2023

Copyright license: Creative Commons Attribution 4.0 International



Data for the review of occupancy estimation methods by Goldstein et al. Please see the accompanying manuscript for the full methodology; we will link to it when the manuscript is published.

This upload contains three datasets: "binary_scores_phase1.csv", "binary_scores_phase2.csv" and "modsel_results.csv". Each is a .csv file containing data from a survey of occupancy estimation practices. Each row represents one peer-reviewed paper, and each column represents a characteristic of the paper. Most columns are TRUE/FALSE, indicating whether or not the paper satisfied the relevant criterion.

One additional raw resource is provided: the .zip file "all_papers_2022-04-07.zip" contains 6 .xls files giving the full set of papers returned by the original Web of Science search. These are unmodified from the initial search.

All datasets contain the column:
ID - A unique ID for each paper; in most cases, a DOI. When Web of Science returned an invalid DOI, the ID is set to the paper title instead.

All papers in Phase 2 are also in Phase 1, and all model selection papers are in both Phase 1 and Phase 2, so the ID column can be used to join datasets. In both binary-score datasets, if all options in a category are FALSE or if a field is NA, the review team may have been unable to determine what choices the authors made.

binary_scores_phase1.csv gives the results of Phase 1 of the review. It contains the following columns:
coll_newdata - Did the authors analyze newly collected data?
coll_existing - Did the authors analyze existing, previously published data?
coll_longterm - Did the authors analyze data produced by a long-term monitoring program?
coll_particip - Did the authors analyze participatory science data?
eco_frshwtr - Was the study system a freshwater ecosystem?
eco_marine - Was the study system a marine ecosystem?
eco_terra - Was the study system a terrestrial ecosystem?
region_USA - Were the data collected in the USA?
region_NoAm - Were the data collected in North America?
region_CenAm - Were the data collected in Central America?
region_SoAm - Were the data collected in South America?
region_Africa - Were the data collected in Africa?
region_Eur - Were the data collected in Europe?
region_Asia - Were the data collected in Asia?
region_Oceania - Were the data collected in Oceania?
framework_MLE - Did the authors estimate models in a maximum likelihood framework?
framework_ML - Did the authors estimate models in a machine learning framework?
framework_Bayes - Did the authors estimate models in a Bayesian framework?
gof_AUC - Did the authors use area under the curve to evaluate their models?
gof_CV - Did the authors use cross-validation to evaluate their models?
gof_PPC - Did the authors use posterior predictive checks to evaluate their models?
gof_parboot - Did the authors use parametric bootstrapping to evaluate their models?
gof_bayespv - Did the authors use Bayesian p-values to evaluate their models?
gof_any - Did the authors conduct any model checking?
taxon_mammal - Were some or all of the study species mammals?
taxon_bird - Were some or all of the study species birds?
taxon_herp - Were some or all of the study species herptiles (reptiles and amphibians)?
taxon_fish - Were some or all of the study species fish?
taxon_arthro - Were some or all of the study species arthropods?
taxon_othinv - Were some or all of the study species non-arthropod invertebrates?
soft_unmarked - Did the authors estimate models using the software unmarked?
soft_PRESENCE - Did the authors estimate models using PRESENCE-family software?
soft_MARK - Did the authors estimate models using MARK-family software?
soft_JAGS - Did the authors estimate models using JAGS-family software?
soft_lme4 - Did the authors estimate models using the software lme4?
soft_MaxEnt - Did the authors estimate models using the software MaxEnt?
soft_baseR - Did the authors estimate models using custom models written in base R?
soft_NR - Did the authors fail to clearly report what software they used to estimate models?
soft_other - Did the authors estimate models using some other software?
nspecies - How many species did the authors study?
nspec_1 - Did the authors analyze data on exactly 1 species?
nspec_2 - Did the authors analyze data on exactly 2 species?
nspec_3_5 - Did the authors analyze data on 3-5 species?
nspec_6_10 - Did the authors analyze data on 6-10 species?
nspec_11_20 - Did the authors analyze data on 11-20 species?
nspec_20plus - Did the authors analyze data on more than 20 species?
compare_avg - Did the authors conduct model averaging?
compare_sel - Did the authors conduct model selection?
compare_other - Did the authors compare multiple models without averaging or selecting between them?
modsel_AICc - Did the authors use AICc in model selection or averaging?
modsel_AIC - Did the authors use AIC in model selection or averaging?
modsel_other - Did the authors use another information criterion in model selection or averaging?
dattype_DND - Were some or all of the data collected as detection-nondetection data?
dattype_count - Were some or all of the data collected as count data?
dattype_PO - Were some or all of the data collected as presence-only data?
dattype_other - Were some or all of the data collected in some other form?
survey_visual - Were some or all of the data collected using in-person visual surveys?
survey_audio - Were some or all of the data collected using in-person audio surveys?
survey_camtrap - Were some or all of the data collected using camera trap surveys?
survey_capture - Were some or all of the data collected using animal capture surveys?
survey_sign - Were some or all of the data collected using sign surveys?
survey_passaudio - Were some or all of the data collected using passive acoustic surveys?
survey_DNA - Were some or all of the data collected using eDNA surveys?
survey_other - Were some or all of the data collected using some other survey protocol?
modtype_SSOM - Did the authors analyze data with an SSOM?
modtype_DynOcc - Did the authors analyze data with a dynamic occupancy model?
modtype_GLM - Did the authors analyze data with a GLM?
modtype_cooccur - Did the authors analyze data with a multispecies co-occurrence model?
modtype_community - Did the authors analyze data with a multispecies community model?
modtype_MaxEnt - Did the authors analyze data with a MaxEnt model?
modtype_other - Did the authors analyze data with another model?
affil_acad - Did any of the authors have an academic affiliation?
affil_govt - Did any of the authors have a government affiliation?
affil_other - Did any of the authors have a private or NGO affiliation?

binary_scores_phase2.csv gives the results of Phase 2 of the review. It contains the following columns:
code_avail - Did we determine that the authors published their model fitting code?
data_avail - Did we determine that the authors published their data?
nsite - At how many sites were data collected?
nsite_lt20 - Were data collected at 20 or fewer sites?
nsite_21_50 - Were data collected at 21-50 sites?
nsite_51_100 - Were data collected at 51-100 sites?
nsite_101p - Were data collected at more than 100 sites?
time_pds - Over how many primary time periods (e.g. sampling seasons) were data collected?
time_pds_1 - Were data collected during a single time period?
time_pds_2_3 - Were data collected during 2-3 time periods?
time_pds_4p - Were data collected during 4 or more time periods?
nrepl - Roughly how many replicate surveys were collected per site?
nrepl_1 - Was only one replicate survey conducted per site?
nrepl_2_3 - Were 2-3 replicate surveys conducted per site?
nrepl_4_5 - Were 4-5 replicate surveys conducted per site?
nrepl_6p - Were 6 or more replicate surveys conducted per site?
window - What sampling window was used to discretize continuous-time sampling? (Continuous-time studies only; otherwise NA.)
det_window_lt_day - Was a sampling unit of less than one day used to discretize sampling?
det_window_1day - Was a sampling unit of one day used to discretize sampling?
det_window_2_6day - Was a sampling unit of 2-6 days used to discretize sampling?
det_window_7_14day - Was a sampling unit of 7-14 days used to discretize sampling?
det_window_15p_day - Was a sampling unit of 15 days or more used to discretize sampling?
homerange_is_bigger - Did the authors describe their survey area per site as bigger than the target species' home range?
homerange_is_smaller - Did the authors describe their survey area per site as smaller than the target species' home range?
informative_priors - Did the authors use informative priors in a Bayesian analysis?
time_in_mod_as_covar - Did the authors include primary time periods in the model as a covariate?
time_in_mod_separate_model - Did the authors use separate models to analyze data collected during different primary time periods?
time_in_mod_other - Did the authors include primary time periods in the model in some other way?
ncovar_det - How many covariates were included in the best model's detection submodel?
ncovar_det_zero - Did the detection submodel include 0 covariates?
ncovar_det_1_3 - Did the detection submodel include 1-3 covariates?
ncovar_det_4p - Did the detection submodel include 4 or more covariates?
ncovar_occ - How many covariates were included in the best model's occupancy submodel?
ncovar_occ_zero - Did the occupancy submodel include 0 covariates?
ncovar_occ_1_3 - Did the occupancy submodel include 1-3 covariates?
ncovar_occ_4p - Did the occupancy submodel include 4 or more covariates?
covars_both_mods - Were any variables considered in both submodels?
model_ranefs - Did the authors include any random effects?
model_expl_spatial - Did the model have an explicit spatial component?
motivating_q_range - Was range estimation a main goal motivating the study?
motivating_q_drivers - Was identifying drivers of occupancy a main goal motivating the study?
motivating_q_trends - Was identifying trends in occupancy a main goal motivating the study?
motivating_q_predict - Was predicting occupancy under new conditions a main goal motivating the study?
motivating_q_theoretical - Was advancing ecological theory a main goal motivating the study?
motivating_q_field_method - Was evaluation of a field method a main goal motivating the study?
motivating_q_model_method - Was evaluation of a modeling method a main goal motivating the study?
context_conservation - Did the authors contextualize their study as relevant to conservation?
context_management - Did the authors contextualize their study as relevant to wildlife management?
context_natural_hist - Did the authors contextualize their study as relevant to studying the natural history of the target species?
context_methodology - Did the authors contextualize their study as advancing methodology?
detdensity - Did the authors mention the possibility that detection and animal density were confounded?
interp_top_mod_as_biol - Did the authors interpret model selection results as evidence for a biological process?
nmod_reported_best - Did the authors report only the best model from a model selection workflow?
nmod_reported_some - Did the authors report multiple model results from a model selection workflow?
nmod_reported_all - Did the authors report all model results from a model selection workflow?
priors_reported - Did the authors report their priors in a Bayesian workflow?
violation_nonindependence - Do the authors acknowledge violating the assumption of independent data?
violation_movement - Do the authors acknowledge violating the assumption of no animal movement?
violation_demography - Do the authors acknowledge violating the assumption of no demographic change?
violation_det_heterogeneity - Do the authors acknowledge violating the assumption of no unmodeled heterogeneity in detection?
violation_yes_other - Do the authors acknowledge violating another assumption?
violation_explicit_no - Do the authors state that all assumptions were met?
hypotheses_all - Do the authors provide hypotheses for the effect of all covariates?
hypotheses_some - Do the authors provide hypotheses for the effect of some covariates?
interpret_det_literal - Do the authors interpret detection literally?
interpret_det_biol - Do the authors interpret detection as confounded with biology?
interpret_vars_significant - Do the authors interpret some variables as significant based on p-values?
interpret_vars_credible - Do the authors interpret some variables as meaningful based on Bayesian credible intervals?

modsel_results.csv describes the model selection choices made by 64 Phase 2 papers that conducted model selection. In addition to ID, it contains two columns:
Submodel approach - Did the authors use a separate-by-submodel approach to model selection, conduct model selection on only one submodel, or perform variable selection on both submodels simultaneously?
Candidate set approach - Did the authors conduct model selection among a set of a priori candidate models, or did they select between arbitrary models based on combinations of all covariates?
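
The ID column described above is the join key across the three files. The following is a minimal sketch using only the Python standard library, assuming the .csv files spell logical values the way R writes them (TRUE, FALSE, NA); the toy rows and the helper names parse_logical, load_by_id and inner_join are hypothetical and are not part of the dataset.

```python
import csv
import io

# Hypothetical toy rows standing in for binary_scores_phase1.csv and
# binary_scores_phase2.csv; the real files have many more columns.
PHASE1_CSV = """ID,framework_Bayes,soft_JAGS
10.1000/abc,TRUE,TRUE
10.1000/def,FALSE,FALSE
Title used when the DOI was invalid,FALSE,NA
"""

PHASE2_CSV = """ID,code_avail,nsite
10.1000/abc,TRUE,45
"""


def parse_logical(value):
    """Map R-style TRUE/FALSE to bool; NA, blanks and anything else to None."""
    return {"TRUE": True, "FALSE": False}.get(value.strip().upper())


def load_by_id(text):
    """Read CSV text into a dict of rows keyed by the ID column."""
    return {row["ID"]: row for row in csv.DictReader(io.StringIO(text))}


def inner_join(left, right):
    """Keep IDs present in both tables, merging their columns."""
    return {pid: {**left[pid], **right[pid]} for pid in left if pid in right}


phase1 = load_by_id(PHASE1_CSV)
phase2 = load_by_id(PHASE2_CSV)
both = inner_join(phase1, phase2)
print(len(phase1), len(both))  # 3 1
print(parse_logical(both["10.1000/abc"]["code_avail"]))  # True
```

With the real files, pass `open("binary_scores_phase1.csv", newline="").read()` (and likewise for Phase 2) to load_by_id. Because every Phase 2 paper is also in Phase 1, the inner join should keep every Phase 2 row; the csv module also handles title-based IDs that contain quoted commas.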

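The note that an all-FALSE category or an NA field may mean the choice could not be determined can be turned into a simple screening step. This sketch assumes that a "category" is the set of columns sharing a prefix such as taxon_ or soft_; that grouping, the helper names, and the toy rows are our own illustration, not something the data files define.

```python
def category_columns(header, prefix, exclude=()):
    """Columns sharing a prefix such as 'taxon_' (our assumed notion of a category)."""
    return [c for c in header if c.startswith(prefix) and c not in exclude]


def possibly_undetermined(row, columns):
    """True if any option is NA/blank or every option is FALSE."""
    if not columns:
        raise ValueError("no columns given for this category")
    values = [row.get(c, "").strip().upper() for c in columns]
    if any(v not in ("TRUE", "FALSE") for v in values):
        return True
    return all(v == "FALSE" for v in values)


header = ["ID", "taxon_mammal", "taxon_bird", "taxon_herp", "gof_any"]
taxon_cols = category_columns(header, "taxon_")

rows = [  # hypothetical toy rows, not taken from the dataset
    {"ID": "a", "taxon_mammal": "TRUE", "taxon_bird": "FALSE", "taxon_herp": "FALSE"},
    {"ID": "b", "taxon_mammal": "FALSE", "taxon_bird": "FALSE", "taxon_herp": "FALSE"},
    {"ID": "c", "taxon_mammal": "NA", "taxon_bird": "TRUE", "taxon_herp": "FALSE"},
]
flagged = [r["ID"] for r in rows if possibly_undetermined(r, taxon_cols)]
print(flagged)  # ['b', 'c']
```

For the gof_ columns, pass `exclude=("gof_any",)`, since gof_any summarizes the others. Treat flagged rows as candidates for a closer look rather than as confirmed gaps: for gof_, all FALSE can simply mean the authors did no model checking.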

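Several count columns (nspecies, nsite, time_pds, nrepl) come with binned TRUE/FALSE companions. A small lookup table makes it possible to check that the two agree. This sketch encodes the bin edges as we read them from the descriptions; note that nsite_lt20 is described as "20 or fewer" sites, so 20 is treated as inclusive. The BINS table and helper names are hypothetical.

```python
import bisect

# (inclusive upper bound, column) pairs plus an overflow column, as we read
# the column descriptions; verify against the files before relying on them.
BINS = {
    "nspecies": ([(1, "nspec_1"), (2, "nspec_2"), (5, "nspec_3_5"),
                  (10, "nspec_6_10"), (20, "nspec_11_20")], "nspec_20plus"),
    "nsite": ([(20, "nsite_lt20"), (50, "nsite_21_50"), (100, "nsite_51_100")],
              "nsite_101p"),
    "time_pds": ([(1, "time_pds_1"), (3, "time_pds_2_3")], "time_pds_4p"),
    "nrepl": ([(1, "nrepl_1"), (3, "nrepl_2_3"), (5, "nrepl_4_5")], "nrepl_6p"),
}


def bin_column(variable, value):
    """Return the binned column that should be TRUE for a raw count."""
    bins, overflow = BINS[variable]
    uppers = [upper for upper, _ in bins]
    i = bisect.bisect_left(uppers, value)  # first bin whose upper bound >= value
    return bins[i][1] if i < len(bins) else overflow


def bins_consistent(row, variable):
    """Check that exactly the expected binned column is TRUE; None if the count is NA."""
    raw = row.get(variable, "").strip()
    if raw in ("", "NA"):
        return None
    expected = bin_column(variable, float(raw))
    bins, overflow = BINS[variable]
    names = [name for _, name in bins] + [overflow]
    return all((row.get(n) == "TRUE") == (n == expected) for n in names)


print(bin_column("nsite", 20), bin_column("nsite", 21))  # nsite_lt20 nsite_21_50
print(bin_column("nspecies", 21))  # nspec_20plus
```

Since nrepl is described as a rough count, the lookup accepts fractional values (2.5 falls in nrepl_2_3). The ncovar_det and ncovar_occ bins could be added in the same way, e.g. `([(0, "ncovar_det_zero"), (3, "ncovar_det_1_3")], "ncovar_det_4p")`.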

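Summaries such as "the share of papers for which identifying drivers of occupancy was a main goal" depend on how NA cells are handled. The sketch below leaves NA and blank cells out of the denominator. That is one defensible choice given that NA may mean the reviewers could not tell, but it is our assumption; counting NA as FALSE would give smaller shares. The function name and toy rows are hypothetical.

```python
def share_true(rows, column):
    """Fraction of rows with TRUE in `column`, leaving NA/blank cells out of the denominator."""
    values = [row.get(column, "").strip().upper() for row in rows]
    known = [v for v in values if v in ("TRUE", "FALSE")]
    if not known:
        return None
    return sum(v == "TRUE" for v in known) / len(known)


rows = [  # hypothetical toy rows, not taken from the dataset
    {"motivating_q_drivers": "TRUE", "motivating_q_trends": "FALSE"},
    {"motivating_q_drivers": "FALSE", "motivating_q_trends": "TRUE"},
    {"motivating_q_drivers": "NA", "motivating_q_trends": "FALSE"},
    {"motivating_q_drivers": "TRUE", "motivating_q_trends": "FALSE"},
]
for col in ("motivating_q_drivers", "motivating_q_trends"):
    print(col, round(share_true(rows, col), 3))
```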

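The two modsel_results.csv columns can be cross-tabulated to see how submodel handling and candidate-set construction co-occur. In this sketch the header follows the column names listed above, but the category labels in the toy rows are invented for illustration; the real file may spell them differently.

```python
import csv
import io
from collections import Counter

# Hypothetical toy rows: the header follows the column names given above, but
# the category labels in the real file may be spelled differently.
MODSEL_CSV = """ID,Submodel approach,Candidate set approach
10.1000/abc,separate-by-submodel,a priori
10.1000/def,one submodel only,all combinations
10.1000/ghi,separate-by-submodel,a priori
"""

rows = list(csv.DictReader(io.StringIO(MODSEL_CSV)))
# Count each (submodel approach, candidate set approach) combination.
crosstab = Counter(
    (row["Submodel approach"], row["Candidate set approach"]) for row in rows
)
for (submodel, candidates), n in sorted(crosstab.items()):
    print(f"{submodel:<22}{candidates:<18}{n}")
```

Because every model selection paper is also in Phase 2, these rows can be joined to binary_scores_phase2.csv on ID to relate the approach to, for example, nmod_reported_best.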


