Latest revision as of 15:40, 6 December 2023

Guidelines for authors of computer experiments in computer algebra

This is still work in progress, so the information may be incomplete. Feedback is very welcome.

Quick start

If your publication relies on computer experiments or calculations, you need to:

Publish your data
Publish your code
Write down all software versions
Document your hardware setup

Motivation

The main motivation is to make communication of computer experiments more efficient:

It should be easy for other mathematicians to reproduce the results of computer experiments. The hard part should always be the mathematics.
Computer experiments have become an integral part of publications, even in pure mathematics. Thus it is necessary to ensure that this part of a paper is of high enough quality to be refereed as well.
High-quality documentation ensures that a computer experiment may be reproduced 100 years from now, even if that means reimplementing it in new software.

Publish your data

For a reader of your work to be able to replicate your experiment, you need to give them access to the full workflow. Hence they will need the input data or parameters that you used for your computation and the output data. If there were intermediate steps then the corresponding intermediate datasets should also be recorded.

As an example, think of running your algorithm on different inputs and recording the runtimes. You then produce a plot of these runtimes to compare your algorithm's performance with an existing implementation. It is not enough for your paper to contain only the plot. To rerun your experiment, the exact input data is necessary. Even standard datasets may differ depending on their version or on where exactly you downloaded them from. Of course, the runtimes a reader measures may differ from yours. It is still better to record the output data as well, so that someone can verify correctness instead.
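One way to pin down exactly which files were used is to publish a checksum manifest alongside the data. The following is a minimal sketch, not a prescribed tool: the function names and the idea of a single data directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file below data_dir (relative POSIX path) to its size and checksum."""
    manifest = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(data_dir).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest
```

Calling `build_manifest(Path("data"))` on a hypothetical `data` directory and saving the result, for example as JSON, lets a reader check that the input, intermediate and output files they downloaded are byte-for-byte the ones used in the paper.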

If your datasets are large, consider uploading them to a service like Zenodo.

Publish your code

Ideally, someone who wants to rerun your experiment would only need to:

Download your code
Install the software your code depends on
Run a script from your code that reruns the experiments

Some setups are more complicated than others, but your code is where the complexity is directly under your control. Hence you should put in effort to make this part easy for the reader.
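A central rerun script can be quite small. The sketch below assumes the experiment is split into Python scripts that can be run one after another. The file names in `STEPS` and the function `run_all` are hypothetical placeholders, not part of any existing tool.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical experiment scripts, listed in the order they should run.
STEPS = [
    "experiments/generate_inputs.py",
    "experiments/run_benchmarks.py",
    "experiments/make_plots.py",
]


def run_all(steps, log_dir: Path) -> None:
    """Run every step with the current interpreter and stop at the first failure."""
    log_dir.mkdir(parents=True, exist_ok=True)
    for step in steps:
        log_file = log_dir / (Path(step).stem + ".log")
        print(f"running {step}, logging to {log_file}")
        with log_file.open("w") as log:
            # check=True raises CalledProcessError if the step exits with a non-zero status.
            subprocess.run(
                [sys.executable, step],
                stdout=log,
                stderr=subprocess.STDOUT,
                check=True,
            )
```

Ending the script with a call such as `run_all(STEPS, Path("logs"))` gives the reader a single command to start. Everything after that runs unattended, and the log files show where a failure occurred.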

Ask yourself:

Is it clear how the reader can get my code?
Is the code easy to run? Does it come with appropriate documentation? Think of a time when it was easy for you to rerun someone else's code, and of a time when it was hard.
Is the code easy to read in case the reader wants to modify it? Adhering to the principles of clean code is a good idea.

Put yourself in the position of a referee. The time you want to spend is limited, so rerunning your experiments should not require more human attention than necessary. On the other hand, having the computer run in the background is reasonable.

Write down all software versions

Software can change its behavior between versions. Commands that previously existed might go away or be renamed. The input and output data types of a function may change. If this affects your code, the best-case scenario is an error message, but there may be more subtle changes that make your calculations wrong when run with new versions of the underlying software. Hence it is important to write down which software you are using and in which version.

The bare minimum is to record:

Operating system with version
Software used with version
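If part of the experiment is driven from Python, this record can be generated automatically. The following is a minimal sketch using only the standard library. The function name `software_environment` is an assumption, and the list of packages to query has to be supplied by you.

```python
import platform
from importlib import metadata


def software_environment(packages):
    """Collect OS, Python and package versions into a JSON-serialisable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "operating_system": platform.platform(),
        "python": platform.python_version(),
        "packages": versions,
    }
```

The result can be written to a file next to the output data. For software that is not a Python package, such as a standalone computer algebra system, record the version that the system itself reports.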

Document your hardware setup

This is mostly relevant when running benchmarks that measure, for example, memory usage or runtime. However, depending on the underlying architecture, there can also be subtle differences that influence your results.

Hence you should record:

Type of machine if applicable
CPU information
Memory information
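Some of this information can be collected programmatically. The sketch below uses only the Python standard library and is an illustration under stated assumptions: the function name `hardware_summary` is hypothetical, the memory query relies on `os.sysconf` names that are only available on some Unix-like systems, and the reported processor string may be empty or uninformative on some platforms.

```python
import os
import platform


def hardware_summary():
    """Collect basic machine, CPU and memory information using only the standard library."""
    summary = {
        "machine": platform.machine(),      # architecture name reported by the OS
        "processor": platform.processor(),  # may be an empty string on some systems
        "logical_cpus": os.cpu_count(),
        "memory_bytes": None,
    }
    # os.sysconf exists only on Unix-like systems, and these two names are not
    # guaranteed to be defined everywhere, so a failure leaves memory_bytes as None.
    try:
        summary["memory_bytes"] = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        pass
    return summary
```

A summary like this is only a starting point. For benchmarks, it is worth adding the exact CPU model and the type of machine by hand if the automatic values are incomplete.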

@@ Line 4: / Line 4: @@
 Feedback is very welcome.
-== Motivation ==
+== Quick start==
-The main motivation is to make communication of computer experiments more efficient:
+If your publication relies on computer experiments or calculations, you need to
-* It should be easy for other mathematicians to reproduce the results of computer experiments. The hard part should always be the mathematics.
-* Computer experiments have become an integral part of publications, even in pure mathematics. Thus it is necessary to ensure high quality such that this part of a paper may also be refereed.
-* High quality documentation ensures that a computer experiment may be reproduced 100 years from now, even if it means reimplementation in new software.
-== Environment ==
+* [[Portal/T1/guidelines/authors#Publish your data|Publish your data]]
-This section contains pointers what needs to be recorded for other people to be able to understand and reproduce the experiment.
+*[[Portal/T1/guidelines/authors#Publish your code|Publish your code]]
+*[[Portal/T1/guidelines/authors#Write down all software versions|Write down all software versions]]
+* [[Portal/T1/guidelines/authors#Document your hardware setup|Document your hardware setup]]
-=== Hardware ===
+==Motivation==
-* Type of machine if applicable
+The main motivation is to make communication of computer experiments more efficient:
-* CPU information
+*It should be easy for other mathematicians to reproduce the results of computer experiments. The hard part should always be the mathematics.
-* Memory information
+*Computer experiments have become an integral part of publications, even in pure mathematics. Thus it is necessary to ensure high quality such that this part of a paper may also be refereed.
+*High quality documentation ensures that a computer experiment may be reproduced 100 years from now, even if it means reimplementation in new software.
+==Publish your data==
+For a reader of your work to be able to replicate your experiment, you need to give them access to the full workflow. Hence they will need the input data or parameters that you used for your computation and the output data. If there were intermediate steps then the corresponding intermediate datasets should also be recorded.
+As an example think of running your algorithm on different inputs and recording the runtimes. Now you produce a nice plot of these runtimes to compare your algorithms performance with some existing implementation. It is not enough that your paper only contains the plot. To rerun your experiment the exact input data is necessary. Even if these are standard datasets they may differ depending on their version or where you exactly downloaded it from. Of course the runtimes may be different. But it is better to also record the output data if someone wanted to verify correctness instead.
+If your datasets are large consider uploading them to a service like [https://zenodo.org/ zenodo].
+==Publish your code==
+Ideally if someone wanted to rerun your experiment they would only
+* Download your code
+* Install the software your code depends on
+* Run a script from your code that reruns the experiments.
+Some setups are more complicated than others, but your code is where the complexity is directly under your control. Hence you should put in effort to make this part easy for the reader.
+Ask yourself:
+* Is it clear how the reader can get my code?
+* Is the code easy to run? Does it come with appropriate documentation? Think of a time where it was easy for you to rerun someones code vs when it was hard.
+* Is the code easy to read in case the reader wants to modify it? Adhering to the principles of clean code is a good idea.
+Put yourself in the position of a referee. The time you want to spend is limited, so rerunning your experiments should not require more human attention than necessary. On the other hand having the computer run in the background is reasonable.
+==Write down all software versions ==
+Software can change its behavior between versions. Commands that previously existed might go away or have their name changed. The in- and output data types of a function may change. If this influences your code the best case scenario is to receive an error message, but there may be more subtle changes that result in your calculations being wrong when run with new versions of the underlying software. Hence it is important to write down which software you are using and which version.
-=== Software ===
+The bare minimum is to record:
-* Operating system with version
+*Operating system with version
 * Software used with version
+==Document your hardware setup==
+Mostly this will be relevant when measuring benchmarks, like memory usage or runtime. However, depending on the underlying architecture, there can also be subtle changes that influence your results.
-== Experiment details ==
+Hence you should record
-* Collect all your code
+* Type of machine if applicable
-* Make sure your code is easy to read, adhering to the principles of clean code is a good idea
+*CPU information
-* It may take some time for a reader to understand your code. It should not take time to make it run. Provide a central script that reruns your experiment.
+*Memory information
-* Provide any in- and output data from your experiment. If it is large consider uploading it to a service like [https://zenodo.org/ zenodo].
-* If your environment is complicated to set up, consider providing a Docker container or a Vagrant virtual machine.

Portal/TA1/guidelines/authors: Difference between revisions