Portal/rdm/examples/weather prediction

From MaRDI portal

Revision as of 14:29, 16 May 2024

Example: RDM in a hypothetical project on weather prediction

Project description

This project plan is a blueprint for how data can be managed in a weather prediction project that uses publicly available data sources. It aims to ensure data quality, accessibility, and potential contributions to future research.

1. Data description

Existing Data Reuse: Leverage publicly available historical weather data from a source such as the National Oceanic and Atmospheric Administration (NOAA) or the European Centre for Medium-Range Weather Forecasts (ECMWF).

Data Types: Measurement data (temperature, precipitation, humidity, wind speed/direction, pressure).

Data Processing: The following algorithms will be applied.

Moving Average Filter: This algorithm can smooth out short-term fluctuations in the data, potentially revealing underlying trends relevant to weather prediction [1].

Standard Deviation Filter: This can identify outliers that deviate significantly from the average, potentially indicating errors or unusual weather events [2].

Regularised online forecaster: This algorithm is based on regularised prediction schemes that return non-parametric prediction rules [3] in (possibly) infinite-dimensional spaces.

Feature Engineering: Create new variables relevant to prediction (e.g., temperature difference from previous day, dew point).

Data Volume: Public weather datasets can be quite large, depending on the chosen timeframe and spatial resolution. We may need to download and process data in chunks.
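
The moving average filter, the standard deviation filter, and the feature engineering step described above could be sketched in pandas as follows. This is only an illustration under stated assumptions: the column names `temp_c` and `humidity_pct`, the window size, the threshold `k`, and the Magnus coefficients used for the dew point are all hypothetical choices, not part of the project plan. The outlier rule is a plain global z-score, which is one possible reading of "standard deviation filter".

```python
import numpy as np
import pandas as pd


def moving_average(series: pd.Series, window: int = 3) -> pd.Series:
    """Centered moving average; the edges use whatever values are available."""
    return series.rolling(window=window, center=True, min_periods=1).mean()


def flag_outliers(series: pd.Series, k: float = 3.0) -> pd.Series:
    """Mark values lying more than k sample standard deviations from the mean."""
    z = (series - series.mean()) / series.std()
    return z.abs() > k


def dew_point_celsius(temp_c: pd.Series, rel_humidity: pd.Series) -> pd.Series:
    """Magnus approximation of the dew point (coefficients are an assumption)."""
    a, b = 17.62, 243.12
    gamma = np.log(rel_humidity / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with smoothed, flagged, and engineered columns."""
    out = df.copy()
    out["temp_smooth"] = moving_average(out["temp_c"])
    # k=2 because a very short sample cannot produce large z-scores.
    out["temp_outlier"] = flag_outliers(out["temp_c"], k=2.0)
    # Assumes one row per consecutive day, so diff() is "minus previous day".
    out["temp_diff_prev_day"] = out["temp_c"].diff()
    out["dew_point_c"] = dew_point_celsius(out["temp_c"], out["humidity_pct"])
    return out


# Hypothetical daily observations, for illustration only.
daily = pd.DataFrame(
    {
        "temp_c": [10.0, 11.0, 10.5, 11.5, 35.0, 10.0, 11.0, 10.5],
        "humidity_pct": [100.0, 80.0, 75.0, 70.0, 60.0, 85.0, 80.0, 78.0],
    },
    index=pd.date_range("2024-01-01", periods=8, freq="D"),
)
features = add_features(daily)
```

In this toy series only the 35.0 reading is flagged. On real data a rolling mean and rolling standard deviation may be more appropriate than global ones, because seasonal variation would otherwise dominate the spread.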

2. Documentation and data quality

Metadata: Record metadata such as the source agency, data format (e.g., CSV), time period covered, spatial resolution (e.g., zip code, city), and any data processing steps applied. Quality Control: Perform data quality checks for outliers, inconsistencies, and missing values. Statistical methods such as q-q plots and analysis of the interquartile range (IQR) can identify potential anomalies. Software Tools: The Python programming language with the libraries Pandas, NumPy, and scikit-learn will be used for data analysis and model building.
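
A minimal sketch of how the metadata record and the quality checks could be combined, assuming pandas and the conventional 1.5 × IQR fences. The field names, the example values in `metadata`, and the column names are hypothetical placeholders. The q-q plots mentioned above are not covered here.

```python
import json

import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; missing values are not flagged."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def quality_report(df: pd.DataFrame) -> dict:
    """Count rows, missing values, and IQR outliers per numeric column."""
    numeric = df.select_dtypes(include="number")
    return {
        "rows": int(len(df)),
        "missing_per_column": {c: int(n) for c, n in df.isna().sum().items()},
        "iqr_outliers_per_column": {
            c: int(iqr_outliers(numeric[c]).sum()) for c in numeric.columns
        },
    }


# Hypothetical metadata values, for illustration only.
metadata = {
    "source_agency": "NOAA",
    "data_format": "CSV",
    "time_period": "2024-01-01/2024-01-08",
    "spatial_resolution": "city",
    "processing_steps": ["moving average", "outlier flagging"],
}

# Hypothetical observations with one missing and one suspicious temperature.
obs = pd.DataFrame(
    {
        "temp_c": [10.0, 11.0, 10.5, None, 35.0, 10.0, 11.0, 10.5],
        "pressure_hpa": [1013.0, 1012.0, 1014.0, 1011.0, 1013.0, 1012.0, 1015.0, 1013.0],
    }
)
report = quality_report(obs)

# Store metadata and quality summary together, next to the data files.
record = json.dumps({"metadata": metadata, "quality": report}, indent=2)
```

Keeping the quality summary in the same JSON record as the metadata means that anyone reusing the data sees both its provenance and its known defects in one place.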

3. Storage and technical archiving of the project

Storage: The downloaded data will be stored in a secure, version-controlled repository such as Zenodo. Processed data (cleaned, engineered) will be saved as separate files for clarity. Data Security: Access to the repository will be restricted to project members. The results of the experiments will be made openly accessible in the repository. Downloaded data may be compressed to save storage space.
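
Compressing a downloaded file could look like the following sketch, which uses only the Python standard library. Recording a SHA-256 checksum of the uncompressed file is an additional assumption, not something the plan prescribes. It lets us verify later that the archived copy restores to the original bytes. The file name and contents are hypothetical.

```python
import gzip
import hashlib
import shutil
import tempfile
from pathlib import Path


def compress_with_checksum(src: Path) -> tuple[Path, str]:
    """Write src.gz next to src and return it with the SHA-256 of the raw file."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    return dst, digest


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical raw download, for illustration only.
    raw = Path(tmp) / "raw_weather.csv"
    raw.write_text("date,temp_c\n" + "2024-01-01,10.0\n" * 100)

    archive, sha256 = compress_with_checksum(raw)

    # Decompress again and compare against the recorded checksum.
    with gzip.open(archive, "rb") as fh:
        restored = fh.read()
    roundtrip_ok = hashlib.sha256(restored).hexdigest() == sha256
    smaller = archive.stat().st_size < raw.stat().st_size
```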

4. Legal obligations and conditions

Data Source: Comply with the chosen source’s data usage policies and attribution requirements. Publication: Datasets are often publicly available, but publications should acknowledge the source and take into account any specific licenses associated with the data. Copyright: Public weather data is typically not copyrighted, but it’s important to check the source’s specific terms.

5. Data exchange and long-term data accessibility

Data Sharing: Consider sharing our processed data (cleaned, potentially with additional features) along with the code used for analysis in a public repository (e.g., GitHub, with a GitHub release archived on Zenodo). Retention: The raw data and processed data will be retained for at least ten years to facilitate potential model improvements or future research; this retention period is supported, for example, by the zenodo.org repository. Accessibility: Shared data and code will be accompanied by clear documentation explaining data format, processing steps, and model details to ensure usability by others.

6. Responsibilities and resources

Data Acquisition Person: One team member will be responsible for downloading data from the chosen source, managing data quality checks, and ensuring compliance with data usage policies. Statistical Modeling Person: Another team member will be responsible for data analysis, feature engineering, model development, and evaluation. Resources: Time is required for data acquisition, cleaning, analysis, and model development. Computational resources for statistical analysis might require additional allocation, depending on data volume and model complexity. Data Curation: After project completion, designated team members will be responsible for uploading processed data and code to the chosen repositories, maintaining access for the defined retention period, and, where needed, updating the documentation to reflect the final model selection.

If we need to publish additional data, we will follow the MaRDI guidelines on data management and on choosing an appropriate repository.

Notes

  1. Alerskans, E. and Kaas, E. (2021). Local temperature forecasts based on statistical post-processing of numerical weather prediction data. Meteorological Applications, 28(4):e2006. https://doi.org/10.1002/met.2006
  2. Grönquist, P., Yao, C., Ben-Nun, T., Dryden, N., Dueben, P., Li, S., and Hoefler, T. (2021). Deep learning for post-processing ensemble weather forecasts. Philosophical Transactions of the Royal Society A, 379(2194):20200092. https://doi.org/10.1098/rsta.2020.0092
  3. Jézéquel, R., Gaillard, P., and Rudi, A. (2019). Efficient online learning with kernels for adversarial large scale problems. In Advances in Neural Information Processing Systems, pages 9427–9436. https://proceedings.neurips.cc/paper/2019/hash/faad95253aee7437871781018bdf3309-Abstract.html
Author

This example was authored by Oleksandr Zadorozhnyi from Technische Universität München. If you have questions, please do not hesitate to get in touch with us at the MaRDI helpdesk.