Project:ImporterDocumentation: Difference between revisions

Latest revision as of 13:50, 20 April 2022

Classes

ADataSource

Abstract base class for reading data from external sources.

Methods:

write_data_dump(self)
process_data(self)
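
The two-method contract above could be expressed with Python's abc module roughly as follows. This is only a sketch: the original implementation is not shown here, and ListSource is a hypothetical subclass invented purely for illustration.

```python
from abc import ABC, abstractmethod


class ADataSource(ABC):
    """Abstract base class for reading data from external sources."""

    @abstractmethod
    def write_data_dump(self):
        """Fetch the raw data from the source and store it."""

    @abstractmethod
    def process_data(self):
        """Turn the stored raw dump into processed output."""


class ListSource(ADataSource):
    """Hypothetical source backed by an in-memory list (illustration only)."""

    def __init__(self, items):
        self.items = list(items)
        self.dump = None
        self.processed = None

    def write_data_dump(self):
        # A real source would query an API and write a file here.
        self.dump = list(self.items)

    def process_data(self):
        if self.dump is None:
            raise RuntimeError("write_data_dump must run before process_data")
        self.processed = [item.strip().lower() for item in self.dump]
```

Because both methods are marked abstract, ADataSource itself cannot be instantiated; only subclasses that override both methods can.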

ZBMathSource(ADataSource)

Class for reading data from the zbMath OAI-PMH API using the ListRecords request.

Methods:

__init__(self, out_dir, tags, from_date=None, until_date=None, raw_dump_path=None)
- parameters:
  - out_dir: output directory for the raw data dump and the processed data
  - tags: names of the XML tags to extract from each record
  - from_date: earliest publication date; default is None
  - until_date: latest publication date; default is None
  - raw_dump_path: path to an existing raw data dump; required when a dump has already been created and only process_data is to be called
write_data_dump(self)
- overrides abstract method
- uses the sickle package to query the zbMath API with the oai_zb_preview metadata prefix and writes a complete raw data dump
process_data(self)
- overrides abstract method
- reads the data dump and writes the processed data to a file in CSV format
- processes each record from the zbMath API response separately to reduce memory requirements
- if a record has no value for one of the tags author, document_title, language, keywords, publication_year or serial, its DOI is looked up through the Crossref API (using the habanero package) to retrieve the missing information; if nothing is found, the value is set to None
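
The record-by-record processing and the DOI fallback described for process_data might look roughly like the sketch below. It is not the project's code. It assumes that each entry in the dump is wrapped in a record element, that the requested tags (including a doi tag) appear as child elements, and that lookup_doi is a caller-supplied function returning a dict of tag values. In the real class that role would be played by a Crossref query through habanero; the function name, the names process_dump and local_name, and the "; " separator are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tags for which a DOI lookup is attempted when the record has no value.
FALLBACK_TAGS = ("author", "document_title", "language",
                 "keywords", "publication_year", "serial")


def local_name(tag):
    """Strip an XML namespace such as '{http://...}record' down to 'record'."""
    return tag.rsplit("}", 1)[-1]


def process_dump(dump, out, tags, lookup_doi):
    """Stream records from `dump` and write one CSV row per record to `out`."""
    writer = csv.DictWriter(out, fieldnames=tags)
    writer.writeheader()
    # iterparse yields elements as they are completed, so only one record
    # has to be held in memory at a time.
    for _event, elem in ET.iterparse(dump, events=("end",)):
        if local_name(elem.tag) != "record":
            continue
        row = dict.fromkeys(tags)  # every requested tag starts as None
        for child in elem.iter():
            name = local_name(child.tag)
            text = (child.text or "").strip()
            if name in row and text:
                # Repeated tags (e.g. several authors) are joined with "; ".
                row[name] = text if row[name] is None else row[name] + "; " + text
        missing = [t for t in FALLBACK_TAGS if t in row and row[t] is None]
        if missing and row.get("doi"):
            found = lookup_doi(row["doi"]) or {}
            for t in missing:
                row[t] = found.get(t)  # stays None if the lookup has no value
        writer.writerow(row)  # the csv module writes None as an empty field
        elem.clear()  # release the record's children before moving on
```

Passing the lookup in as a function keeps the parsing logic testable without network access; a stub such as `lambda doi: {"author": "..."}` is enough to exercise the fallback path.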

Data sources

zbMath

website: https://zbmath.org/
OAI: https://oai.zbmath.org/
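
For orientation, an OAI-PMH ListRecords request is an ordinary HTTP GET whose query string carries the verb, the metadata prefix and optional date bounds. The sketch below only builds such a URL. It assumes that from_date and until_date are passed on as the OAI-PMH from and until arguments, and it leaves the exact endpoint path under https://oai.zbmath.org/ to the caller, since that path is not stated here. list_records_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_zb_preview",
                     from_date=None, until_date=None):
    """Build the URL of an OAI-PMH ListRecords request (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # The date bounds are optional in OAI-PMH and omitted when not given.
    if from_date is not None:
        params["from"] = from_date
    if until_date is not None:
        params["until"] = until_date
    return f"{base_url}?{urlencode(params)}"
```

A harvesting client such as sickle issues an equivalent request and additionally follows the resumption tokens the server returns, which is what makes a complete dump possible.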
