Project:ImporterDocumentation
From MaRDI portal
Revision as of 12:36, 20 April 2022
Classes
ADataSource
Abstract base class for reading data from external sources.
Methods:
- write_data_dump(self)
- process_data(self)
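As an illustration, the two-method interface above could be expressed with Python's `abc` module roughly as follows. This is a minimal sketch, not the project's actual code: the docstrings and the `InMemorySource` subclass are hypothetical and exist only to show how a concrete source would fill in both methods.

```python
from abc import ABC, abstractmethod


class ADataSource(ABC):
    """Abstract base class for reading data from external sources."""

    @abstractmethod
    def write_data_dump(self):
        """Fetch raw data from the external source and store it."""

    @abstractmethod
    def process_data(self):
        """Read the stored dump and produce the processed data."""


class InMemorySource(ADataSource):
    """Hypothetical subclass, for illustration only."""

    def __init__(self):
        self.dump = None
        self.processed = None

    def write_data_dump(self):
        # Stand-in for a real download: keep some raw strings in memory.
        self.dump = ["  a  ", " b"]

    def process_data(self):
        # Stand-in for real processing: strip surrounding whitespace.
        self.processed = [item.strip() for item in self.dump]
```

Because both methods are marked abstract, instantiating `ADataSource` directly raises a `TypeError`; only subclasses that implement both methods can be created.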
ZBMathSource(ADataSource)
Class for reading data from the ZBMath API.
Methods:
- __init__(self, out_dir, tags, from_date=None, until_date=None, raw_dump_path=None)
  - parameters:
    - out_dir: output directory for the data dump and the processed data
    - tags: tags to look for in the XML
    - from_date: earliest publication date; default is None
    - until_date: latest publication date; default is None
    - raw_dump_path: path to an existing data dump; required if a dump has already been created and only process_data should be called
- write_data_dump(self)
  - overrides the abstract method
  - uses Sickle to query the ZBMath API and retrieve a complete data dump with the oai_zb_preview metadata prefix
- process_data(self)
  - overrides the abstract method
  - reads the data dump and writes the processed data to a file in CSV format
  - processes each record from the ZBMath API response separately to reduce memory requirements
  - if a record has no information for one of the tags author, document_title, language, keywords, publication_year or serial, the DOI is queried to retrieve it; if nothing is found, the value is set to None
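The record-by-record processing with a DOI fallback described for process_data might look roughly like the sketch below. It uses only the standard library and rests on several assumptions that are not taken from the importer itself: each record arrives as a small XML string without namespaces, the DOI sits in a `doi` element, and `lookup_by_doi`, `record_to_row` and `process_records` are hypothetical helper names. The real DOI query is replaced by a placeholder that finds nothing.

```python
import csv
import io
import xml.etree.ElementTree as ET

TAGS = ["author", "document_title", "language",
        "keywords", "publication_year", "serial"]


def lookup_by_doi(doi, tag):
    """Hypothetical placeholder for the DOI query; finds nothing here."""
    return None


def record_to_row(record_xml, tags, doi_lookup=lookup_by_doi):
    """Extract one value per tag from a single record's XML string."""
    root = ET.fromstring(record_xml)
    doi = root.findtext(".//doi")  # assumed element name
    row = {}
    for tag in tags:
        value = root.findtext(f".//{tag}")
        if not value and doi:
            # Tag missing or empty: fall back to the DOI lookup.
            value = doi_lookup(doi, tag)
        # Anything still missing is stored as None.
        row[tag] = value or None
    return row


def process_records(records, tags, out_file):
    """Write one CSV row per record, handling one record at a time."""
    writer = csv.DictWriter(out_file, fieldnames=tags)
    writer.writeheader()
    for record_xml in records:
        # Only the current record is held in memory.
        writer.writerow(record_to_row(record_xml, tags))
```

Passing the records as an iterator or generator, rather than a list, keeps memory use bounded by the size of a single record. Note that `csv.DictWriter` writes None values as empty fields.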