Processed Datasets - Imputation in Well Log Data: A Benchmark
DOI: 10.5281/zenodo.10987946 | Zenodo: 10987946 | MaRDI QID: Q6701681 | FDO: Q6701681
Dataset published at Zenodo repository.
Jessica Sena, Jackson Faria, André Korenchendler, Lucas Perez, Francisco Neves, Alexei Manso Correa Machado, Vinícius R. Riffel, Pedro H. T. Gama, Matheus C. A. Sobreira
Publication date: 16 April 2024
Copyright license: Creative Commons Attribution 4.0 International
Imputation of well log data is a common task in the field. However, a quick review of the literature reveals a lack of standardization in how methods for the problem are evaluated. The goal of the benchmark is to introduce a standard evaluation protocol that can be applied to any imputation method for well log data. In the proposed benchmark, three public datasets are used:

Geolink: The Geolink Dataset is a public dataset of wells in the Norwegian offshore. The data is provided by the company of the same name, GEOLINK, and follows the NOLD 2.0 license. This dataset contains a total of 223 wells. It also has lithology labels for the wells, with a total of 36 lithology classes.

Taranaki Basin: The Taranaki Basin Dataset is a curated set of wells and a convenient option for experimentation, especially due to its ease of access and use. This collection, under the CDLA-Sharing-1.0 license, contains well logs extracted from the New Zealand Petroleum Minerals Online Exploration Database and Petlab. There are a total of 407 wells, of which 289 are onshore and 118 are offshore exploration and production wells.

Teapot Dome: The Teapot Dome dataset is provided by the Rocky Mountain Oilfield Testing Center (RMOTC) and the US Department of Energy. It contains different types of data related to the Teapot Dome oil field, such as 2D and 3D seismic data, well logs, and GIS data. The data is licensed under the Creative Commons 4.0 license. In total, the dataset has 1,179 wells with available logs. The number of available logs varies across wells: only 91 wells have the gamma ray, bulk density, and neutron porosity logs, and only three wells have the complete basic suite.

Here you can download all three datasets, already preprocessed for use with our implementation.

File Description: There are six files for each fold of each dataset.
datasetname_fold_k_well_log_metadata_train.json: JSON file with general information about the slices in the training partition of fold k. Contains the total number of slices and the number of slices per well.
datasetname_fold_k_well_log_metadata_val.json: JSON file with general information about the slices in the validation partition of fold k. Contains the total number of slices and the number of slices per well.
datasetname_fold_k_well_log_slices_train.npy: .npy (NumPy) file, ready to be loaded, with the already processed training slices of fold k. When loaded, it should have shape (total_slices, 256, number_of_logs).
datasetname_fold_k_well_log_slices_val.npy: .npy (NumPy) file, ready to be loaded, with the already processed validation slices of fold k.
datasetname_fold_k_well_log_slices_meta_train.json: JSON file with per-slice information for all slices in the training partition of fold k. For each slice, seven values are provided. The first three are, in order: the origin well name, the starting position of the slice in that well, and the end position of the slice in that well. The last four can be ignored (they hold other information that was not used).
datasetname_fold_k_well_log_slices_meta_val.json: JSON file with per-slice information for all slices in the validation partition of fold k.
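As an illustration of the naming scheme above, the following is a minimal Python sketch of how the three files of one fold partition might be loaded. It is not part of the authors' implementation. The helper names (`fold_file_paths`, `load_fold`, `slice_origin`) are hypothetical, the concrete values substituted for `datasetname` and `k` are placeholders, and the sketch assumes that the slices-meta JSON is a list with one seven-item entry per slice, which the description suggests but does not state.

```python
import json
from pathlib import Path

import numpy as np


def fold_file_paths(data_dir, dataset_name, fold, split):
    """Build the three file paths for one dataset / fold / split.

    Follows the pattern datasetname_fold_k_well_log_*_{train,val}.
    The exact strings used for dataset_name and fold are assumptions.
    """
    if split not in ("train", "val"):
        raise ValueError("split must be 'train' or 'val'")
    prefix = f"{dataset_name}_fold_{fold}_well_log"
    base = Path(data_dir)
    return {
        "metadata": base / f"{prefix}_metadata_{split}.json",
        "slices": base / f"{prefix}_slices_{split}.npy",
        "slices_meta": base / f"{prefix}_slices_meta_{split}.json",
    }


def load_fold(data_dir, dataset_name, fold, split):
    """Load the slice array and both JSON files for one fold partition."""
    paths = fold_file_paths(data_dir, dataset_name, fold, split)
    # Per the description, the training array has shape
    # (total_slices, 256, number_of_logs).
    slices = np.load(paths["slices"])
    with open(paths["metadata"]) as f:
        metadata = json.load(f)
    with open(paths["slices_meta"]) as f:
        slices_meta = json.load(f)
    return slices, metadata, slices_meta


def slice_origin(entry):
    """Return (well name, start position, end position) for one slice.

    Assumption: each slices-meta entry is a sequence whose first three
    items are these values; the remaining four are ignored.
    """
    well_name, start, end = entry[:3]
    return well_name, start, end
```

With the files in a local directory, a call such as `load_fold("data", "datasetname", 1, "train")` would return the slice array together with the two parsed JSON objects, and `slices.shape` can then be compared against the expected (total_slices, 256, number_of_logs) layout.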