Portal/TA1/guidelines/authors: Difference between revisions

From MaRDI portal
LKastner (talk | contribs)
Rewrite to parallelize with postcard

Revision as of 10:37, 24 August 2022

Guidelines for authors of computer experiments in computer algebra

This is still work in progress, so the information may be incomplete. Feedback is very welcome.

Quick start

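If your publication relies on computer experiments or calculations, you need to

  • Publish your data
  • Publish your code
  • Write down all software versions
  • Document your hardware setup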

Motivation

The main motivation is to make communication of computer experiments more efficient:

  • It should be easy for other mathematicians to reproduce the results of computer experiments. The hard part should always be the mathematics.
  • Computer experiments have become an integral part of publications, even in pure mathematics. Thus it is necessary to ensure high quality such that this part of a paper may also be refereed.
  • High-quality documentation ensures that a computer experiment can still be reproduced 100 years from now, even if that means reimplementing it in new software.

Publish your data

For a reader of your work to be able to replicate your experiment, you need to give them access to the full workflow. Hence they will need the input data or parameters that you used for your computation, as well as the output data. If there were intermediate steps, the corresponding intermediate datasets should also be recorded.

As an example, think of running your algorithm on different inputs and recording the runtimes. You then produce a plot of these runtimes to compare your algorithm's performance with some existing implementation. It is not enough for your paper to contain only the plot. To rerun your experiment, the exact input data is necessary. Even standard datasets may differ depending on their version or on where exactly you downloaded them. The runtimes will of course vary from machine to machine, so you should also record the output data, which lets someone verify correctness instead.

If your datasets are large, consider uploading them to a service like Zenodo (https://zenodo.org/).
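
One way to pin down "the exact input data" is to publish a checksum manifest next to the datasets, so that a reader can check that their download matches yours. The following is a minimal sketch in Python, not a prescribed workflow; the function names and the manifest layout are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map every file below data_dir (by relative path) to its size and checksum."""
    manifest = {}
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        manifest[path.relative_to(data_dir).as_posix()] = {
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        }
    return manifest


def write_manifest(data_dir: Path, target: Path) -> dict:
    """Write the manifest as JSON (hypothetical layout) and return it."""
    manifest = build_manifest(data_dir)
    target.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
    return manifest
```

Calling `write_manifest(Path("data"), Path("MANIFEST.json"))` records one entry per file. A reader can rebuild the manifest from their own copy and compare the two.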

Publish your code

Ideally, someone who wants to rerun your experiment would only need to

  • Download your code
  • Install the software your code depends on
  • Run a script from your code that reruns the experiments.

Some setups are more complicated than others, but your code is where the complexity is directly under your control. Hence you should put in effort to make this part easy for the reader.
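
The "script that reruns the experiments" could look like the sketch below. It is only an illustration under stated assumptions: `example_algorithm` is a hypothetical placeholder for your actual computation, and the JSON layout of the results file is one possible choice. The point is that a single entry point runs every input and stores both outputs and runtimes.

```python
import json
import time
from pathlib import Path


def example_algorithm(n: int) -> int:
    """Hypothetical placeholder for the real computation: sum of squares below n."""
    return sum(k * k for k in range(n))


def run_experiments(algorithm, inputs, results_file: Path) -> list:
    """Run algorithm on each (name, value) pair; store output and runtime as JSON."""
    records = []
    for name, value in inputs:
        start = time.perf_counter()
        output = algorithm(value)
        elapsed = time.perf_counter() - start
        # The output is stored next to the runtime, so that correctness can be
        # checked even on machines where the timings differ.
        records.append({"input": name, "output": output, "seconds": elapsed})
    results_file.write_text(json.dumps(records, indent=2) + "\n")
    return records
```

A reader would then call, for example, `run_experiments(example_algorithm, [("n=10", 10), ("n=1000", 1000)], Path("results.json"))` and compare the recorded outputs with the published ones.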

Ask yourself:

  • Is it clear how the reader can get my code?
  • Is the code easy to run? Does it come with appropriate documentation? Think of a time when it was easy for you to rerun someone's code, and of a time when it was hard.
  • Is the code easy to read in case the reader wants to modify it? Adhering to the principles of clean code is a good idea.

Put yourself in the position of a referee. The time a referee can spend is limited, so rerunning your experiments should not require more human attention than necessary. On the other hand, having the computer run in the background is reasonable.

Write down all software versions

Software can change its behavior between versions. Commands that previously existed might go away or be renamed. The input and output data types of a function may change. If this affects your code, the best-case scenario is that you receive an error message, but there may be more subtle changes that make your calculations wrong when run with newer versions of the underlying software. Hence it is important to write down which software you are using, and in which version.

The bare minimum is to record:

  • Operating system with version
  • Software used with version
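
If parts of your experiment run in Python, this minimum can be recorded automatically. The sketch below uses only the standard library; the function name `software_report` and the shape of the report are assumptions for illustration. Other systems need their own equivalent.

```python
import json
import platform
from importlib import metadata


def software_report(packages) -> dict:
    """Collect operating system, Python version and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "python": platform.python_version(),
        "packages": versions,
    }


def dump_report(packages) -> str:
    """Return the report as JSON text, ready to be stored next to the results."""
    return json.dumps(software_report(packages), indent=2, sort_keys=True)
```

Pass the names of the packages your code imports, for example `dump_report(["numpy", "sympy"])`, and commit the resulting text together with your results.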

Document your hardware setup

This is mostly relevant for benchmarks that measure quantities such as memory usage or runtime. However, depending on the underlying architecture, there can also be subtle differences that influence your results.

Hence you should record:

  • Type of machine if applicable
  • CPU information
  • Memory information
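
Part of this information can be collected by a script. The following sketch relies only on the Python standard library. It assumes that physical memory can be queried via `os.sysconf`, which is not available on every platform, so the function returns `None` in that case; the names `hardware_report` and `total_memory_bytes` are hypothetical.

```python
import os
import platform


def total_memory_bytes():
    """Total physical memory in bytes, or None if it cannot be determined here."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these configuration names are missing on some systems.
        return None


def hardware_report() -> dict:
    """Collect basic machine, CPU and memory information."""
    return {
        "machine": platform.machine(),      # architecture string, e.g. x86_64
        "processor": platform.processor(),  # may be an empty string
        "logical_cpus": os.cpu_count(),
        "memory_bytes": total_memory_bytes(),
    }
```

The exact CPU model is often not available this way, so it may still have to be written down by hand.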