Example: RDM in a hypothetical project on weather prediction
Project description
This project plan demonstrates a blueprint for how data can be managed in a weather prediction project that uses publicly available data sources. It is designed to ensure data quality and accessibility, and to enable potential contributions to future research.
1. Data description
Existing Data Reuse: Leverage publicly available historical weather data from a source like the National Oceanic and Atmospheric Administration (NOAA) or the European Centre for Medium-Range Weather Forecasts (ECMWF). Data Types: Measurement data (temperature, precipitation, humidity, wind speed/direction, pressure). Data Processing: The following algorithms will be used. Moving Average Filter: This algorithm can smooth out short-term fluctuations in the data, potentially revealing underlying trends relevant to weather prediction [1]. Standard Deviation Filter: This can identify outliers that deviate significantly from the average, potentially indicating errors or unusual weather events [2]. Regularised Online Forecaster: This algorithm is based on regularised prediction schemes that return non-parametric prediction rules [3] in (possibly) infinite-dimensional spaces. Feature Engineering: Create new variables relevant to prediction (e.g., temperature difference from the previous day, dew point). Data Volume: Public weather datasets can be quite large, depending on the chosen timeframe and spatial resolution. We may need to download and process the data in chunks.
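As a minimal sketch of the moving average filter, the standard deviation filter, and the feature engineering step, the following pandas snippet shows how they could be applied to a daily temperature series. The file name, column names, and window sizes are illustrative assumptions, not part of any specific NOAA or ECMWF product.

```python
import pandas as pd

# Illustrative sketch: the file name, the columns ("date", "temperature"),
# and the window sizes are assumptions; adapt them to the actual export used.
df = pd.read_csv("weather.csv", parse_dates=["date"]).set_index("date")

# Moving average filter: smooth short-term fluctuations to expose trends.
df["temp_smoothed"] = df["temperature"].rolling(window=7, center=True).mean()

# Standard deviation filter: flag values more than 3 rolling standard
# deviations away from the rolling mean as potential outliers or unusual events.
rolling_mean = df["temperature"].rolling(window=30).mean()
rolling_std = df["temperature"].rolling(window=30).std()
df["is_outlier"] = (df["temperature"] - rolling_mean).abs() > 3 * rolling_std

# Feature engineering: temperature difference from the previous day.
df["temp_diff_prev_day"] = df["temperature"].diff(periods=1)
```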
2. Documentation and data quality
Metadata: Record metadata such as the source agency, data format (e.g., CSV), time period covered, spatial resolution (e.g., zip code, city), and any data processing steps applied. Quality Control: Perform data quality checks for outliers, inconsistencies, and missing values. Statistical methods like Q-Q plots and analysis of the interquartile range (IQR) can identify potential anomalies. Software Tools: The Python programming language, with the Pandas, NumPy, and scikit-learn libraries, will be used for data analysis and model building.
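A short sketch of these quality-control checks, assuming SciPy and Matplotlib are available in addition to the listed libraries; the file and column names are placeholders.

```python
import pandas as pd
from scipy import stats          # assumed available alongside the listed libraries
import matplotlib.pyplot as plt  # assumed available for the Q-Q plot

def iqr_anomalies(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as potential anomalies."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

df = pd.read_csv("weather.csv")  # placeholder file name
flags = iqr_anomalies(df["temperature"])
print(f"{flags.sum()} potential anomalies, "
      f"{df['temperature'].isna().sum()} missing values")

# Q-Q plot against a normal distribution to inspect distributional outliers.
stats.probplot(df["temperature"].dropna(), dist="norm", plot=plt)
plt.savefig("qq_temperature.png")
```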
3. Storage and technical archiving of the project
Storage: The downloaded data will be stored in a secure, version-controlled repository such as Zenodo. Processed data (cleaned, with engineered features) will be saved as separate files for clarity. Data Security: Access to the repository will be restricted to project members. The results of the experiments will be made openly accessible in the repository. Downloaded data might be compressed to save storage space.
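As a small illustration of the compression step, pandas can write processed tables directly to compressed files; the file names below are placeholders.

```python
import pandas as pd

df = pd.read_csv("weather_raw.csv")  # placeholder file name

# Write the processed table as a gzip-compressed CSV to save storage space;
# pandas infers the compression from the ".gz" suffix.
df.to_csv("weather_processed.csv.gz", index=False)

# Reading it back works the same way, with compression inferred automatically.
df_check = pd.read_csv("weather_processed.csv.gz")
```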
4. Legal obligations and conditions
Data Source: Comply with the chosen source’s data usage policies and attribution requirements. Publication: Datasets are often publicly available, but publications should acknowledge the source and take into account any specific licenses associated with the data. Copyright: Public weather data is typically not copyrighted, but it’s important to check the source’s specific terms.
5. Data exchange and long-term data accessibility
Data Sharing: Consider sharing our processed data (cleaned, potentially with additional features) along with the code used for analysis in a public repository (GitHub, or a GitHub release archived on Zenodo). Retention: The raw and processed data will be retained for at least ten years to facilitate potential model improvements or future research; this retention period is supported, for example, by the zenodo.org repository. Accessibility: Shared data and code will be accompanied by clear documentation explaining the data format, processing steps, and model details to ensure usability by others.
6. Responsibilities and resources
Data Acquisition Person: One team member will be responsible for downloading data from the chosen source, managing data quality checks, and ensuring compliance with data usage policies. Statistical Modeling Person: Another team member will be responsible for data analysis, feature engineering, model development, and evaluation. Resources: Time for data acquisition, cleaning, analysis, and model development. Computational resources for statistical analysis might require additional allocation, depending on data volume and model complexity. Data Curation: After project completion, designated team members will be responsible for uploading processed data and code to chosen repositories, maintaining access for the defined retention period, and potentially updating documentation based on final model selection.
In case we need to publish additional data, we will follow the MaRDI guidelines on data management and how to choose an appropriate repository.
References
1. Alerskans, E. and Kaas, E. (2021). Local temperature forecasts based on statistical post-processing of numerical weather prediction data. Meteorological Applications, 28(4):e2006.
2. Grönquist, P., Yao, C., Ben-Nun, T., Dryden, N., Dueben, P., Li, S., and Hoefler, T. (2021). Deep learning for post-processing ensemble weather forecasts. Philosophical Transactions of the Royal Society A, 379(2194):20200092.
3. Jézéquel, R., Gaillard, P., and Rudi, A. (2019). Efficient online learning with kernels for adversarial large scale problems. In Advances in Neural Information Processing Systems, pages 9427–9436.
Author
This example was authored by Oleksandr Zadorozhnyi from Technische Universität München. If you have questions, please do not hesitate to get in touch with us at the MaRDI helpdesk.