Portal/TA1/guidelines/authors: Difference between revisions
Latest revision as of 14:40, 6 December 2023
Guidelines for authors of computer experiments in computer algebra
This is still work in progress, so the information may be incomplete. Feedback is very welcome.
Quick start
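If your publication relies on computer experiments or calculations, you need to:
- Publish your data
- Publish your code
- Write down all software versions
- Document your hardware setup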
Motivation
The main motivation is to make communication of computer experiments more efficient:
- It should be easy for other mathematicians to reproduce the results of computer experiments. The hard part should always be the mathematics.
- Computer experiments have become an integral part of publications, even in pure mathematics. Thus it is necessary to ensure high quality such that this part of a paper may also be refereed.
- High-quality documentation ensures that a computer experiment may be reproduced 100 years from now, even if that means reimplementing it in new software.
Publish your data
For a reader of your work to be able to replicate your experiment, you need to give them access to the full workflow. Hence they will need the input data or parameters that you used for your computation, as well as the output data. If there were intermediate steps, the corresponding intermediate datasets should also be recorded.
As an example, suppose you run your algorithm on different inputs and record the runtimes, then produce a plot of these runtimes to compare your algorithm's performance with an existing implementation. It is not enough for your paper to contain only the plot. To rerun your experiment, the exact input data is necessary. Even standard datasets may differ depending on their version or on exactly where you downloaded them. When someone reruns the experiment, the runtimes may of course differ, so also record the output data in case they want to verify correctness instead.
If your datasets are large, consider uploading them to a service like Zenodo (https://zenodo.org/).
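One way to make "the exact input data" checkable is to publish a checksum manifest alongside the datasets. The following is only a minimal sketch, not a prescribed workflow: the function names `sha256_of_file`, `build_manifest` and `write_manifest` are hypothetical, and it assumes all datasets live below a single directory.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map every file below data_dir (by relative path) to its size and checksum."""
    manifest = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(data_dir).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of_file(path),
            }
    return manifest


def write_manifest(data_dir: Path, manifest_path: Path) -> None:
    """Write the manifest as JSON (hypothetical helper; keep the file outside data_dir)."""
    manifest_path.write_text(
        json.dumps(build_manifest(data_dir), indent=2, sort_keys=True)
    )
```

A reader who has downloaded the data can rebuild the manifest and compare it with the published one. Any difference indicates a different version of the dataset.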
Publish your code
Ideally, someone who wants to rerun your experiment would only need to:
- Download your code
- Install the software your code depends on
- Run a script from your code that reruns the experiments.
Some setups are more complicated than others, but your code is where the complexity is directly under your control. Hence you should put in effort to make this part easy for the reader.
Ask yourself:
- Is it clear how the reader can get my code?
- Is the code easy to run? Does it come with appropriate documentation? Think of a time when it was easy for you to rerun someone's code versus a time when it was hard.
- Is the code easy to read in case the reader wants to modify it? Adhering to the principles of clean code is a good idea.
Put yourself in the position of a referee. The time you want to spend is limited, so rerunning your experiments should not require more human attention than necessary. On the other hand, having the computer run in the background for a while is reasonable.
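The "single script that reruns the experiments" could look roughly like the sketch below. Everything in it is an assumption made for illustration: `my_algorithm` is a placeholder (here it simply sorts), inputs are assumed to be text files of whitespace-separated integers, and the names `read_input` and `run_all` are hypothetical.

```python
import csv
import time
from pathlib import Path


def my_algorithm(values):
    """Placeholder for the computation under study (here: sorting)."""
    return sorted(values)


def read_input(path: Path):
    """Read whitespace-separated integers from one input file."""
    return [int(token) for token in path.read_text().split()]


def run_all(input_dir: Path, output_csv: Path) -> int:
    """Run my_algorithm on every *.txt input; record output and runtime."""
    rows = []
    for path in sorted(input_dir.glob("*.txt")):
        values = read_input(path)
        start = time.perf_counter()
        result = my_algorithm(values)
        elapsed = time.perf_counter() - start
        rows.append([path.name, " ".join(map(str, result)), f"{elapsed:.6f}"])
    # Write outputs and runtimes together so both can be published.
    with output_csv.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["input", "output", "seconds"])
        writer.writerows(rows)
    return len(rows)
```

A referee would then call something like `run_all(Path("data/input"), Path("results.csv"))` once and let it run unattended. The resulting file holds both the outputs (for checking correctness) and the runtimes (for the benchmark plot).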
Write down all software versions
Software can change its behavior between versions. Commands that previously existed might be removed or renamed. The input and output data types of a function may change. If this affects your code, the best-case scenario is an error message, but there may be more subtle changes that make your calculations wrong when run with new versions of the underlying software. Hence it is important to write down which software you are using and in which version.
The bare minimum is to record:
- Operating system with version
- Software used with version
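Recording this by hand is error-prone, so it can help to let the experiment script write the versions itself. The sketch below assumes the experiment is driven from Python and only covers the operating system, the Python interpreter and installed Python packages. Other software, such as a computer algebra system, would have to be queried by its own means. The names `package_version` and `software_report` are hypothetical.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a Python package, if any."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def software_report(packages):
    """Collect operating system, interpreter and package versions."""
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }
```

Storing the result of `software_report([...])` next to the output data ties every result to the software versions that produced it.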
Document your hardware setup
Mostly this will be relevant when measuring benchmarks, like memory usage or runtime. However, depending on the underlying architecture, there can also be subtle changes that influence your results.
Hence you should record:
- Type of machine if applicable
- CPU information
- Memory information
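Part of this information can be collected automatically. The following sketch uses only the Python standard library; the names `total_memory_bytes` and `hardware_report` are hypothetical. It assumes that the `os.sysconf` names `SC_PAGE_SIZE` and `SC_PHYS_PAGES` are available, which is typically the case on Linux but not on every platform, so the memory entry falls back to `None`. Details such as the exact CPU model may still need to be added by hand.

```python
import os
import platform


def total_memory_bytes():
    """Total physical memory in bytes, or None if it cannot be queried this way."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. on Windows) or the names are unsupported.
        return None


def hardware_report():
    """Collect basic machine, CPU and memory information."""
    return {
        "machine": platform.machine(),      # architecture, e.g. "x86_64"
        "processor": platform.processor(),  # may be an empty string
        "logical_cpus": os.cpu_count(),
        "memory_bytes": total_memory_bytes(),
    }
```

Saving the output of `hardware_report()` together with the benchmark results lets readers judge how comparable their own measurements are.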