Project:ZbMATH documentation

From MaRDI portal
Revision as of 14:42, 12 March 2025 by Larissa (talk | contribs)

Getting started

To get started, get the portal setup running on your local machine or your VM. For this, follow the instructions here.

Data Model

These are the properties that each item type from zbMATH can have; an item does not necessarily have all of them.

Publication
  • label: the publication's title
  • description: scientific article; zbMATH DE number de_number
  • instance of: scholarly article
  • title: the publication's title
  • author: links to Person items of the author(s)
  • published in: link to journal item
  • publication date
  • zbMATH keywords: list of zbMATH keywords
  • full work available at URL: link to publication
  • cites work: other publication items that it references
  • review text: text of the publication's review from zbMATH
  • reviewer: link to Person item that has written the review
  • zbMATH OpenDocument ID
  • zbMATH DE number
  • Mathematics Subject Classification ID
  • DOI
  • arXiv ID
Author
  • label: the author's name
  • description: None
  • instance of: human
  • zbMATH author ID
Journal
  • label: the journal's name
  • description: scientific journal
  • instance of: scientific journal
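
As a reading aid, the three item types above can be sketched as Python dataclasses. This is only an illustration of the property lists: the class and field names are hypothetical, are not taken from the importer code, and only a subset of the Publication properties is shown. The fallback description for a publication without a DE number is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Author:
    label: str                              # the author's name
    zbmath_author_id: Optional[str] = None  # optional, like every property
    instance_of: str = "human"
    description: Optional[str] = None       # authors have no description


@dataclass
class Journal:
    label: str                              # the journal's name
    description: str = "scientific journal"
    instance_of: str = "scientific journal"


@dataclass
class Publication:
    title: str
    de_number: Optional[str] = None
    authors: List[Author] = field(default_factory=list)
    published_in: Optional[Journal] = None
    publication_date: Optional[str] = None
    keywords: List[str] = field(default_factory=list)
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    instance_of: str = "scholarly article"

    @property
    def label(self) -> str:
        # The label is the publication's title.
        return self.title

    @property
    def description(self) -> str:
        # Assumption: without a DE number, only the generic part is used.
        if self.de_number is None:
            return "scientific article"
        return f"scientific article; zbMATH DE number {self.de_number}"
```

For example, `Publication(title="Some title", de_number="1234567").description` evaluates to `scientific article; zbMATH DE number 1234567`.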

Pulling the data dump

Next, we need to pull a data dump from zbMATH. It will be a full dump because, as of now, there is no API parameter for getting only the changes made after a certain date.

  1. Make sure that your local portal is running. For pointers on how to do that, see the Getting started section.
  2. Go to the docker_importer/config directory and copy import_config.config.template to import_config.config. In the config file, change output_directory to the directory inside the container that the dump should be downloaded to. Then copy the config file to the container using sudo docker cp import_config.config mardi-importer:/config. You might want to create a synced data directory between your container and your local files using these instructions. Alternatively, copy the dump over afterwards using docker cp, but then it will take up twice the space.
  3. Enter the importer container using sudo docker exec -it mardi-importer bash
  4. Edit /mardi_importer/mardi_importer/scripts/import.py to say importer.import_all(pull=True, push=False) in the zbmath section.
  5. Check that in /mardi_importer/mardi_importer/zbmath/ZBMathSource.py, none of the lines in the pull() function are commented out. If the script then runs without errors, both pulling the data dump and processing it are done by the end of the run.
  6. Go to /mardi_importer/mardi_importer/scripts and execute python3 import.py --mode ZBMath --conf_path /config/import_config.config
  7. If the dump process fails, look at the last date that was completely pulled and edit the config file's start after parameter accordingly. The restarted run will create a new file that you will need to merge with the earlier one afterwards. In that case, comment out the line in ZBMathSource.py that also processes the dump, because you want to merge first.
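
The shell commands from steps 2 and 6 can also be collected in a small Python helper, which may be handy when the pull has to be repeated. This is a minimal sketch, not part of the importer: the function names are hypothetical, and step 6 is run through docker exec with a working directory (-w) instead of from an interactive shell inside the container. Steps 4 and 5 (editing import.py and checking ZBMathSource.py) still have to be done by hand.

```python
import shlex
import subprocess

CONTAINER = "mardi-importer"
CONFIG_IN_CONTAINER = "/config/import_config.config"
SCRIPTS_DIR = "/mardi_importer/mardi_importer/scripts"


def copy_config_command(local_config="import_config.config"):
    """Step 2: copy the edited config file into the container."""
    return ["sudo", "docker", "cp", local_config, f"{CONTAINER}:/config"]


def import_command():
    """Step 6: run import.py from the scripts directory in the container."""
    return [
        "sudo", "docker", "exec", "-w", SCRIPTS_DIR, CONTAINER,
        "python3", "import.py",
        "--mode", "ZBMath",
        "--conf_path", CONFIG_IN_CONTAINER,
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    shown = []
    for command in commands:
        line = shlex.join(command)
        print(line)
        shown.append(line)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)
    return shown


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run([copy_config_command(), import_command()])
```

Calling run(..., dry_run=False) executes the commands and requires the portal containers to be up, as described in step 1.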

Processing the data dump

A raw data dump that has been pulled has to be processed into a form that the following scripts can handle. If the above step ran without errors and without having to be restarted, this has already been done. Otherwise:

  1. Make sure that your local portal is running. For pointers on how to do that, see the Getting started section.
  2. Go to the docker_importer/config directory and copy the import_config.config.template to import_config.config or edit it directly inside the container if it is already there. Change output_directory to the directory inside the container you want the processed dump to be written to. Also change raw_dump_path to the path of the raw dump file. If the config file is not already in the container, copy it there using sudo docker cp import_config.config mardi-importer:/config.
  3. Enter the importer container using sudo docker exec -it mardi-importer bash
  4. Edit /mardi_importer/mardi_importer/scripts/import.py to say importer.import_all(pull=True, push=False) in the zbmath section.
  5. Check that in /mardi_importer/mardi_importer/zbmath/ZBMathSource.py in the pull() function, the first line that downloads the dump is commented out.
  6. Go to /mardi_importer/mardi_importer/scripts and execute python3 import.py --mode ZBMath --conf_path /config/import_config.config

Moving the dump to the correct place

The raw dump and the processed dump should be backed up in mardi02 under data/zbmath/. Create a folder there that reflects when the dump was done.
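
A minimal sketch of such a backup step, assuming the backup location is reachable as a normal path and that the folder is named after the ISO date of the dump. Both points are assumptions, as are the function and file names; the page only asks for a folder name that reflects when the dump was done.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_dumps(dump_files, backup_root, dump_date):
    """Copy dump files into <backup_root>/<YYYY-MM-DD>/ and return the copies."""
    # Assumption: the folder is named after the ISO date of the dump.
    target = Path(backup_root) / dump_date.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for dump_file in dump_files:
        # copy2 also preserves timestamps, which helps when comparing dumps.
        copied.append(Path(shutil.copy2(dump_file, target)))
    return copied


if __name__ == "__main__":
    # Hypothetical file names; replace them with the actual dump files.
    backup_dumps(
        ["raw_dump.txt", "processed_dump.txt"],
        "data/zbmath",
        date.today(),
    )
```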

Comparing to last dump

Programming work in progress

Handling the output files

Programming work in progress

Uploading Test Run

Depends on the previous 2 sections; to be written once they are completed.

Real Upload

Depends on the previous 3 sections; to be written once they are completed.

Creating profile pages

When the upload is done, profile pages need to be created for publications and authors. This is documented here.