Data sets and machine learning models for: Predicting critical properties and acentric factor of fluids using multi-task machine learning




DOI: 10.5281/zenodo.8072892
Zenodo: 8072892
MaRDI QID: Q6693416
FDO: Q6693416

Dataset published in the Zenodo repository.

Yunsie Chung, Sayandeep Biswas, William H. Green, Haoyang Wu, Josephine Ramirez

Publication date: 6 April 2023

Copyright license: Creative Commons Attribution 4.0 International



The experimental data sets, data splits, additional features, QM calculations, model predictions, and final machine learning models for the manuscript "Predicting Critical Properties and Acentric Factor of Fluids Using Multi-Task Machine Learning".

Citation should refer directly to the manuscript: Biswas, S.; Chung, Y.; Ramirez, J.; Wu, H.; Green, W. H. Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning. Journal of Chemical Information and Modeling 2023, 63 (15), 4574-4588. DOI: 10.1021/acs.jcim.3c00546

To use the machine learning models, please refer to the sample files and instructions at https://github.com/yunsiechung/chemprop/tree/crit_prop. Detailed information can be found in the README.md file.

Details on the properties considered

The data set includes the following 8 properties:

- Tc: critical temperature, in K
- Pc: critical pressure, in bar
- rhoc: critical density, in mol/L
- omega: acentric factor, unitless
- Tb: boiling point, in K
- Tm: melting point, in K
- dHvap: enthalpy of vaporization at boiling point, in kJ/mol
- dHfus: enthalpy of fusion at melting point, in kJ/mol

Details on the files

1. Data sets under CritProp_v1.1.0:

- all_data: includes the data sets used in this work. Every data point is listed for each chemical compound together with its data source. The details of the data sources can be found in the README.md file. The distribution of the data set is included in each folder.
  - estimated_data_for_pretraining: contains the estimated data from the Yaws handbook that are used to pre-train our machine learning (ML) model.
  - experimental_data: contains the experimental data (references 1 - 15) used to fine-tune our final ML model.
- additional_features: includes the additional features tested for the ML model. The Abraham features are generated for all data (references 1 - 15), while the acsf, qm, and rdkit features are generated only for the data from references 1 - 9.
  - abraham: Abraham solute parameters (E, S, A, B, L). Molecular features.
  - acsf: ACSF (atom-centered symmetry functions). Atomic features that are converted from the 3D coordinates of the compound.
  - qm_atom: QM (quantum chemical) atomic feature.
  - qm_mol: QM molecular feature.
  - rdkit: Selected RDKit 2D molecular features.
- data_splits_and_model_predictions: contains the training and test sets used to evaluate the model. It also contains the predicted values from our final ML model for each test set.
  - random and scaffold splits: training and test sets that include the data from references 1 - 9.
  - external test set: a test set that includes the data from only references 10 - 15.

2. Machine learning (ML) model files:

- CritProp_ML_model_files_with_abraham_feat.zip: contains the Chemprop ML model files that are trained using Abraham features as additional molecular features. This gives the best results.
- CritProp_ML_model_files_without_additional_feat.zip: contains the Chemprop ML model files that are trained without any additional features. This gives the second best results.

To use these ML models, please refer to the sample files and instructions at https://github.com/yunsiechung/chemprop/tree/crit_prop

3. QM (quantum chemical) calculations:

- QM_calculations.zip: contains the results of the QM calculations that are performed to compute QM features.
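
The description ties both the data splits and the additional features to the reference number of each data source: references 1 - 9 feed the random and scaffold splits and carry all feature sets, while references 10 - 15 form the external test set and carry only the Abraham features. The following is a minimal sketch of that bookkeeping, not code from the dataset or from Chemprop. The function names, the returned labels, and the assumption that each record carries an integer reference number are all hypothetical; the actual file layout is documented in README.md.

```python
from typing import List

# Reference ranges as stated in the dataset description.
ALL_REFS = range(1, 16)        # references 1 - 15 (experimental data)
INTERNAL_REFS = range(1, 10)   # references 1 - 9: random and scaffold splits
EXTERNAL_REFS = range(10, 16)  # references 10 - 15: external test set only

# Feature folders and the references they were generated for.
# Folder names follow the description; the mapping structure is illustrative.
FEATURE_COVERAGE = {
    "abraham": ALL_REFS,
    "acsf": INTERNAL_REFS,
    "qm_atom": INTERNAL_REFS,
    "qm_mol": INTERNAL_REFS,
    "rdkit": INTERNAL_REFS,
}


def split_pool(ref: int) -> str:
    """Return which evaluation pool a data source belongs to (hypothetical labels)."""
    if ref in INTERNAL_REFS:
        return "random_or_scaffold"
    if ref in EXTERNAL_REFS:
        return "external_test"
    raise ValueError(f"unknown reference number: {ref}")


def available_features(ref: int) -> List[str]:
    """List the additional feature sets generated for a given data source."""
    if ref not in ALL_REFS:
        raise ValueError(f"unknown reference number: {ref}")
    return [name for name, refs in FEATURE_COVERAGE.items() if ref in refs]
```

For example, `available_features(12)` returns only `["abraham"]`, which is consistent with the Abraham-feature model being the one that can be evaluated on the external test set with additional features.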






