Project:ImporterDocumentation
From MaRDI portal
Latest revision as of 12:50, 20 April 2022
Classes
ADataSource
Abstract base class for reading data from external sources.
Methods:
- write_data_dump(self)
- process_data(self)
ZBMathSource(ADataSource)
Class for reading data from the zbMath OAI API using the ListRecords request.
Methods:
- __init__(self, out_dir, tags, from_date=None, until_date=None, raw_dump_path=None)
  - parameters:
    - out_dir: output directory for the data dump and the processed data
    - tags: tags to look for in the XML
    - from_date: earliest publication date; default is None
    - until_date: latest publication date; default is None
    - raw_dump_path: path to an existing data dump; required if a dump has already been created and only process_data is to be called
- write_data_dump(self)
  - overrides the abstract method
  - uses Sickle to query the zbMath API with the oai_zb_preview metadata prefix and writes a complete raw data dump
- process_data(self)
  - overrides the abstract method
  - reads the data dump and writes a file with the processed data in CSV format
  - processes each record from the zbMath API response separately to reduce memory requirements
  - if a record has no information for one of the tags author, document_title, language, keywords, publication_year or serial, its DOI is queried against the Crossref API using the habanero package to retrieve the missing value; if nothing is found, the value is set to None
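The record-by-record processing with a fallback lookup might look roughly like the following. It is a minimal sketch using only the standard library and is not the class's real implementation: the function names, the record element name, the doi column, and the lookup callable are all assumptions. In particular, lookup(doi, tag) is a hypothetical stand-in for the Crossref query that the real class performs with habanero.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tag names listed in the documentation above.
TAGS = ["author", "document_title", "language",
        "keywords", "publication_year", "serial"]


def local_name(tag):
    """Strip an XML namespace prefix: '{uri}author' becomes 'author'."""
    return tag.rsplit("}", 1)[-1]


def process_dump(xml_source, csv_target, tags=TAGS, lookup=None):
    """Stream record elements from xml_source and write one CSV row each.

    lookup is a hypothetical callable(doi, tag) returning a string or None;
    it is only called for values that are missing from the record.
    """
    writer = csv.DictWriter(csv_target, fieldnames=["doi"] + list(tags))
    writer.writeheader()
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "record":
            continue
        # Collect the text of every descendant element, keyed by local name.
        values = {local_name(child.tag): (child.text or "").strip()
                  for child in elem.iter() if child is not elem}
        doi = values.get("doi") or None
        row = {"doi": doi}
        for tag in tags:
            value = values.get(tag) or None
            if value is None and doi is not None and lookup is not None:
                value = lookup(doi, tag)
            row[tag] = value  # None is written as an empty CSV field
        writer.writerow(row)
        elem.clear()  # free the record's subtree before parsing the next one
```

Parsing with iterparse and clearing each record after it is written keeps only one record in memory at a time, which is the point of processing records separately.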
Data sources
zbMath
- website: https://zbmath.org/
- OAI: https://oai.zbmath.org/
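For orientation, harvesting from the OAI interface with Sickle could be sketched as follows. Several things here are assumptions rather than documented behaviour: the helper names are hypothetical, the exact endpoint URL to pass in should be taken from the zbMath OAI documentation, and from_date and until_date are assumed to map onto the OAI-PMH from and until arguments. Sickle is imported lazily, so the parameter helper can be used without it being installed.

```python
def build_list_records_params(from_date=None, until_date=None,
                              metadata_prefix="oai_zb_preview"):
    """Build the keyword arguments for an OAI-PMH ListRecords request."""
    params = {"metadataPrefix": metadata_prefix}
    if from_date is not None:
        # 'from' is a Python keyword, so it has to be passed via a dict.
        params["from"] = from_date
    if until_date is not None:
        params["until"] = until_date
    return params


def write_raw_dump(endpoint, dump_path, from_date=None, until_date=None):
    """Harvest all records and write their raw XML to dump_path.

    The output is a plain concatenation of record elements, one per line,
    not a single well-formed XML document.
    """
    from sickle import Sickle  # third-party package, imported lazily

    records = Sickle(endpoint).ListRecords(
        **build_list_records_params(from_date, until_date))
    with open(dump_path, "w", encoding="utf-8") as dump:
        for record in records:
            dump.write(record.raw + "\n")
```

Sickle follows OAI-PMH resumption tokens while the result is iterated, so the loop walks the complete result set without extra paging code.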