Guidelines for peer reviewing software
As TA1, one of our tasks is to improve the peer-review process for mathematical papers with a software component. Below is a list of criteria one could consider when doing such a review. This list consists mostly of suggestions: only through sufficient experimentation and collaboration with journals, editors, conferences and researchers will we be able to determine standards that are not too taxing on authors, yet do increase the reliability of the code and the ability of future researchers to reuse published software.
Importance of the code in the publication
Every paper is different. Sometimes code plays a major role in determining whether the results are correct; in other papers the code is only used for a few small examples. It is therefore good to indicate how much of the paper depends on software.
- The paper only uses a small bit of code for simple computations
- The results of the paper depend heavily on computations
- The paper develops new algorithms and the code is part of the publication
- The data output is part of the publication
Availability of the code
It should be possible for someone 100 years from now to still find, reuse and perhaps improve the software that was written. It is therefore important to check whether the code is available somewhere and whether a good storage solution is being used.
- Code unavailable: No link to the code is provided and the code cannot be found on the author's website either.
- Code is available on the website of the author.
- Code is available on a long-term storage solution (e.g. Zenodo, or a GitHub repository archived on Zenodo).
- Code is available on the website of the journal.
A checklist of which files are included in the publication:
- Notebook (Jupyter, IPython, Sage notebooks,etc.)
- Source Code
- Example files and documentation
- Computed data
- Code that verifies computed data
- Is there a license for the code? If so, which one?
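As an illustration of "code that verifies computed data": a reviewer might expect a small, independent check to ship alongside the data files. A minimal sketch in Python (the dataset and its claimed property are hypothetical):

```python
# Hypothetical example: a paper publishes a data file of integers
# claimed to be prime; this script re-checks the claim independently
# of the code that generated the data.

def is_prime(n: int) -> bool:
    """Trial division; sufficient for small published datasets."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def verify_dataset(values):
    """Return the entries that fail verification (empty list = all good)."""
    return [n for n in values if not is_prime(n)]

print(verify_dataset([2, 3, 5, 7, 11]))  # → []
print(verify_dataset([2, 9, 11]))        # → [9]
```

The point is that the verifier is simpler and logically independent of the generator, so a bug is unlikely to appear in both.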
Reproducibility of the code and ease of installation
It should be as easy as possible for someone to reuse the code. It is therefore good to check how easy it is to install the software and repeat the performed experiments/tests. The ideal solution is some kind of Dockerfile or virtual machine with which one can exactly reproduce the conditions the authors used for their experiments, even 100 years later when all the programming languages used are outdated.
- Specifications of hardware used to run the code in a reasonable amount of time
- Installation instructions available?
- Dockerfile or virtual machine included?
- Specification of the datasets used (including links, versions)
- Specification of dependencies on algorithms developed by others
- Hardware and software environment used
- Instructions for repeating the computations performed to obtain the results in the paper.
- Documentation and examples
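A Dockerfile along the lines of the following sketch pins the environment so the computations can be rerun under the same conditions (the file names and entry-point script here are hypothetical, assuming a Python-based project):

```dockerfile
# Hypothetical sketch: pin every version so the environment can be
# rebuilt exactly, long after the original release.
FROM python:3.11-slim
WORKDIR /experiment
# requirements.txt should pin exact versions, e.g. "numpy==1.26.4"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# A single command reruns all computations reported in the paper
CMD ["python", "run_experiments.py"]
```

The key design choice is that nothing is left implicit: the base image, dependency versions and the command to reproduce the results are all recorded in one file.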
Correctness and reliability
There will always be bugs in software. It is therefore good to introduce rigorous testing and comparisons with other systems to check that the code performs as intended.
- Does the code give the results listed in the paper?
- How does the code behave on examples other than the ones listed in the paper?
- Computing the data/performing the same experiments using different software packages increases the reliability of the data.
- Comparing the output of a complicated algorithm on a range of "easier" cases with the output of a slower but less error-prone algorithm increases reliability.
- Were any methods used to test if the computed output was correct? (e.g. if the inverse B of a matrix A was computed, one could test if AB is the identity matrix)
- Is the code still being maintained? How often is it updated?
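The matrix-inverse check mentioned above can be sketched as follows. The matrices are illustrative, and a small pure-Python multiply is used so the sketch has no external dependencies; in practice one would use the paper's own computer algebra system or numerical library:

```python
# Sanity check from the text: if B is the claimed inverse of A,
# then A·B should be the identity matrix (up to rounding error).

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_identity(M, tol=1e-9):
    """Check that M equals the identity matrix within tolerance tol."""
    n = len(M)
    return all(abs(M[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

A = [[4.0, 7.0], [2.0, 6.0]]
B = [[0.6, -0.7], [-0.2, 0.4]]    # claimed inverse of A
print(is_identity(matmul(A, B)))  # → True
```

Such a check is much cheaper than computing the inverse itself, which is what makes it a good independent test.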
Readability of the code
It should be possible for someone else to read the code and figure out what it does.
- Indentation and formatting consistent
- Naming of variables is consistent, meaningful and distinctive.
- Program has a clear structure and is split into functions and files
- Code is annotated
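A small before/after sketch of what these readability criteria mean in practice (both functions are hypothetical and compute the same thing):

```python
# Harder to review: opaque name, no documentation, silent failure
# (dividing by zero) on empty input.
def f(x):
    return sum(i * i for i in x) / len(x)

# Easier to review: descriptive name, docstring, explicit failure mode.
def mean_of_squares(values):
    """Return the arithmetic mean of the squares of `values`.

    Raises ValueError on empty input instead of dividing by zero.
    """
    if not values:
        raise ValueError("mean_of_squares: empty input")
    return sum(v * v for v in values) / len(values)

print(mean_of_squares([1, 2, 3]))  # → 4.666666666666667
```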