Project:ZbMATH documentation
Getting started
To get started, get the portal setup running on your local machine or VM. For this, follow the instructions here.
Data Model
These are the properties that each item type from zbMATH can have; an item does not necessarily have all of them.
Publication
- label: the publication's title
- description: scientific article; zbMATH DE number de_number
- instance of: scholarly article
- title: the publication's title
- author: links to Person items of the author(s)
- published in: link to journal item
- publication date
- zbMATH keywords: list of zbMATH keywords
- full work available at URL: link to publication
- cites work: other publication items that it references
- review text: text of the publication's review from zbMATH
- reviewer: link to Person item that has written the review
- zbMATH OpenDocument ID
- zbMATH DE number
- Mathematics Subject Classification ID
- DOI
- arXiv ID
Author
- label: The author's name
- description: None
- instance of: human
- zbMATH author ID
Journal
- label: The journal's name
- description: scientific journal
- instance of: scientific journal
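The three item types above can be pictured as plain Python records. This is only an illustrative sketch: the class and field names are made up for this page and are not the importer's actual classes, and treating keywords and Mathematics Subject Classification IDs as lists is an assumption. Every property except the label is optional, mirroring the note that items need not have all properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Journal:
    label: str  # the journal's name
    description: str = "scientific journal"
    instance_of: str = "scientific journal"


@dataclass
class Author:
    label: str  # the author's name
    description: Optional[str] = None  # authors get no description
    instance_of: str = "human"
    zbmath_author_id: Optional[str] = None


@dataclass
class Publication:
    title: str
    de_number: Optional[str] = None  # zbMATH DE number
    instance_of: str = "scholarly article"
    authors: List[Author] = field(default_factory=list)
    published_in: Optional[Journal] = None
    publication_date: Optional[str] = None
    keywords: List[str] = field(default_factory=list)  # zbMATH keywords
    full_work_url: Optional[str] = None
    cites_work: List["Publication"] = field(default_factory=list)
    review_text: Optional[str] = None
    reviewer: Optional[Author] = None
    open_document_id: Optional[str] = None  # zbMATH OpenDocument ID
    msc_ids: List[str] = field(default_factory=list)  # assumed to be a list
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None

    @property
    def label(self) -> str:
        # The label is the publication's title.
        return self.title

    @property
    def description(self) -> str:
        # "scientific article; zbMATH DE number <de_number>"
        if self.de_number is None:
            return "scientific article"
        return f"scientific article; zbMATH DE number {self.de_number}"
```

Author, journal, reviewer and cited-work fields hold references to other records, which corresponds to the "links to ... items" wording in the lists above.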
Pulling the data dump
Next, we need to pull a data dump from zbMATH. It will be a full dump because, as of now, there is no API parameter for getting only the changes made after a certain date.
- Make sure that your local portal is running. For pointers on how to do that, see the Getting started section.
- Go to the docker_importer/config directory and copy import_config.config.template to import_config.config. Change output_directory in the config file to the directory inside the container that you want the dump to be downloaded to. Then copy the config file to the container using
sudo docker cp import_config.config mardi-importer:/config
You might want to create a synced data directory between your container and your local files using these instructions (or copy the dump over afterwards using docker cp, but then it will take twice the space).
- Enter the importer container using
sudo docker exec -it mardi-importer bash
- Edit /mardi_importer/mardi_importer/scripts/import.py so that the zbmath section says
importer.import_all(pull=True, push=False)
- Check that none of the lines in the pull() function in /mardi_importer/mardi_importer/zbmath/ZBMathSource.py are commented out. If the script then runs without errors, both pulling the data dump and processing it are done once it finishes.
- Go to /mardi_importer/mardi_importer/scripts and execute
python3 import.py --mode ZBMath --conf_path /config/import_config.config
- If the dump process fails, look at the last date that was completely pulled and edit the config file's start after parameter accordingly. The rerun creates a new file that you will need to merge with the earlier one afterwards. In that case, comment out the line in ZBMathSource.py that also processes the dump, because you want to merge first.
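If the pull had to be restarted, the partial files need to be merged before processing. The following is a minimal sketch of such a merge, not part of the importer. It assumes that every partial dump is a line-oriented text file whose first line is the same header row; this page does not state the dump format, so check that against your files first. The function name merge_dump_parts is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def merge_dump_parts(parts: Iterable[Path], merged: Path) -> int:
    """Concatenate partial dump files into one file with a single header.

    ASSUMPTION: each part is a text file whose first line is an identical
    header row. Returns the number of data lines written.
    """
    written = 0
    header = None
    with merged.open("w", encoding="utf-8") as out:
        for part in parts:
            with part.open("r", encoding="utf-8") as src:
                first = src.readline().rstrip("\n")
                if header is None:
                    # Keep the header of the first part only.
                    header = first
                    out.write(header + "\n")
                elif first != header:
                    raise ValueError(f"{part} has a different header line")
                for line in src:
                    # Make sure a missing final newline does not fuse rows.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```

Pass the parts in the order in which they were pulled, so that the merged file stays in chronological order. The sketch does not remove duplicate rows; if the restart date overlaps with the previous run, deduplicate separately.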
Processing the data dump
A raw data dump that has been pulled has to be processed into a form that the following scripts can handle. If the above step ran without restarts or errors, this has already been done. Otherwise:
- Make sure that your local portal is running. For pointers on how to do that, see the Getting started section.
- Go to the docker_importer/config directory and copy import_config.config.template to import_config.config, or edit the config file directly inside the container if it is already there. Change output_directory to the directory inside the container that you want the processed dump to be written to. Also change raw_dump_path to the path of the raw dump file. If the config file is not already in the container, copy it there using
sudo docker cp import_config.config mardi-importer:/config
- Enter the importer container using
sudo docker exec -it mardi-importer bash
- Edit /mardi_importer/mardi_importer/scripts/import.py so that the zbmath section says
importer.import_all(pull=True, push=False)
- Check that in the pull() function in /mardi_importer/mardi_importer/zbmath/ZBMathSource.py, the first line, which downloads the dump, is commented out.
- Go to /mardi_importer/mardi_importer/scripts and execute
python3 import.py --mode ZBMath --conf_path /config/import_config.config
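Before starting the processing run, it can help to verify the two paths that were entered in the config file. The sketch below is a hypothetical pre-flight helper and not part of the importer. It takes the values of raw_dump_path and output_directory as plain arguments because the config file syntax is not described here.

```python
import os
from pathlib import Path
from typing import List


def check_processing_paths(raw_dump_path: str, output_directory: str) -> List[str]:
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    raw = Path(raw_dump_path)
    out = Path(output_directory)

    # The raw dump must exist and contain data.
    if not raw.is_file():
        problems.append(f"raw dump not found: {raw}")
    elif raw.stat().st_size == 0:
        problems.append(f"raw dump is empty: {raw}")

    # The output directory must exist and be writable.
    if not out.is_dir():
        problems.append(f"output directory does not exist: {out}")
    elif not os.access(out, os.W_OK):
        problems.append(f"output directory is not writable: {out}")

    return problems
```

Run it inside the importer container, since both paths refer to locations inside the container.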
Moving the dump to the correct place
The raw dump and the processed dump should be backed up in mardi02 under data/zbmath/. Create a folder there that reflects when the dump was done.
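The backup step can be sketched as follows. This is an illustration under assumptions, not an existing script: it assumes that the backup location is reachable as a local or mounted path, and it names the folder after the dump date in ISO format (YYYY-MM-DD), which is only one possible way to reflect when the dump was done. The function name back_up_dumps is hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def back_up_dumps(
    raw_dump: Path,
    processed_dump: Path,
    backup_root: Path,
    dump_date: Optional[date] = None,
) -> Path:
    """Copy both dump files into a new dated folder below backup_root."""
    dump_date = dump_date or date.today()
    # ASSUMPTION: ISO date as folder name, e.g. <backup_root>/2024-01-31
    target = backup_root / dump_date.isoformat()
    # Fail instead of silently overwriting an existing backup.
    target.mkdir(parents=True, exist_ok=False)
    for dump_file in (raw_dump, processed_dump):
        shutil.copy2(dump_file, target / dump_file.name)
    return target
```

Pass the date on which the dump was pulled as dump_date if the backup is made on a later day.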
Comparing to last dump
Programming work in progress
Handling the output files
Programming work in progress
Uploading Test Run
This section depends on the previous two sections and will be written once they are completed.
Real Upload
This section depends on the previous three sections and will be written once they are completed.
Creating profile pages
When the upload is done, profile pages need to be created for publications and authors. This is documented here.