Project:ImporterDocumentation: Difference between revisions

Revision as of 12:42, 20 April 2022

Abstract base class for reading data from external sources.

Methods:

Class for reading data from the zbMath API using the listRecords endpoint.

Methods:

__init__(self, out_dir, tags, from_date=None, until_date=None, raw_dump_path=None)
- parameters:
  - out_dir: output directory for data dump and processed data
  - tags: tags to look for in the XML of each record
  - from_date: earliest publication date; default is None
  - until_date: latest publication date; default is None
  - raw_dump_path: path to an existing raw data dump; required if the dump has already been created and only process_data is to be called; default is None
write_data_dump(self)
- overrides abstract method
- uses sickle to query the zbMath API and retrieve a complete data dump with the oai_zb_preview metadata prefix
process_data(self)
- overrides abstract method
- reads the data dump and writes the processed data to a file in CSV format
- processes each record from the zbMath API response separately to reduce memory requirements
- if a record has no value for one of the tags author, document_title, language, keywords, publication_year or serial, its DOI is queried against the Crossref API using the habanero package to retrieve the missing information; if nothing is found, the value is set to None
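
The interplay of the two overridden methods and the DOI fallback can be illustrated with a minimal, standard-library-only sketch. It is not the importer's actual code: the class names Importer and ToyImporter, the helper fill_missing, and the lookup_by_doi parameter are hypothetical, the records are plain dictionaries rather than zbMath XML, and a dictionary lookup stands in for the Crossref query that the real importer performs through habanero.

```python
import csv
import io
from abc import ABC, abstractmethod

# Tags named in the text above as eligible for the DOI-based fallback.
FALLBACK_TAGS = ("author", "document_title", "language",
                 "keywords", "publication_year", "serial")


class Importer(ABC):  # hypothetical name for the abstract base class
    @abstractmethod
    def write_data_dump(self):
        """Fetch raw data from the external source."""

    @abstractmethod
    def process_data(self):
        """Turn the raw dump into processed output."""


class ToyImporter(Importer):  # hypothetical stand-in, not the zbMath importer
    def __init__(self, raw_records, tags, lookup_by_doi, out):
        self.raw_records = raw_records      # stand-in for the API response
        self.tags = tags                    # columns of the processed output
        self.lookup_by_doi = lookup_by_doi  # stand-in for a Crossref query
        self.out = out                      # any writable text stream
        self.dump = None

    def write_data_dump(self):
        # The real importer queries the API here; the toy only copies.
        self.dump = list(self.raw_records)

    def fill_missing(self, record):
        """Return one output row, filling empty fallback tags via the DOI."""
        row = {tag: record.get(tag) for tag in self.tags}
        missing = [t for t in FALLBACK_TAGS if t in row and row[t] is None]
        if missing and record.get("doi"):
            found = self.lookup_by_doi(record["doi"]) or {}
            for tag in missing:
                row[tag] = found.get(tag)  # stays None if nothing is found
        return row

    def process_data(self):
        writer = csv.DictWriter(self.out, fieldnames=self.tags)
        writer.writeheader()
        for record in self.dump:  # one record at a time
            writer.writerow(self.fill_missing(record))


# Usage with made-up records and a made-up DOI lookup table.
records = [
    {"doi": "10.1000/a", "document_title": "Known title", "language": None},
    {"doi": None, "document_title": None, "language": "en"},
]
fake_crossref = {"10.1000/a": {"language": "en"}}
out = io.StringIO()
importer = ToyImporter(records, ["doi", "document_title", "language"],
                       fake_crossref.get, out)
importer.write_data_dump()
importer.process_data()
```

Passing the lookup as a parameter keeps the sketch runnable offline; in a real setting that callable would wrap the Crossref request. Note that csv.DictWriter writes None as an empty field, so values that remain None appear as empty cells in the CSV output.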

@@ Line 28: / Line 28: @@
 ** reads data dump and outputs a file with the processed data in csv format
 ** processes each record from zbMath API response separately to reduce memory requirements
-** where there is no information for the tags ''author, document_title, language, keywords, publication_year'' or ''serial'', the doi is queried to retrieve this information; if nothing is found, the value is set to None
+** where there is no information for the tags ''author, document_title, language, keywords, publication_year'' or ''serial'', the doi is queried with the Crossref API using the habanero package to retrieve this information; if nothing is found, the value is set to None
 == Data sources ==