Answers to specific technical questions: Difference between revisions

From MaRDI portal
Alvaro (talk | contribs)
Alvaro (talk | contribs)
Line 114: Line 114:


- This is a fork with additional Readme.md infos regarding the MaRDI Portal, it is a mediawiki extension which is already referenced by source in the docker-wikibase, in case there are some future MaRDI customizations for the Wikibase importer? Is it correct?   
- This is a fork with additional Readme.md infos regarding the MaRDI Portal, it is a mediawiki extension which is already referenced by source in the docker-wikibase, in case there are some future MaRDI customizations for the Wikibase importer? Is it correct?   
* this fork ads an extra parameter to the original WiibaseImport extension. see WikibaseImport/EADME


- Are the wikibase-importer calls are already implemented in docker-importer ? If not, is it the idea to call the maintenance scripts over the python cli interface from here ?: https://github.com/MaRDI4NFDI/docker-importer/blob/main/src/importer/Importer.py   
- Are the wikibase-importer calls are already implemented in docker-importer ? If not, is it the idea to call the maintenance scripts over the python cli interface from here ?: https://github.com/MaRDI4NFDI/docker-importer/blob/main/src/importer/Importer.py   
* Importing entities from Wikidata is implemented. Everything else is implemented in the Jupyter notebooks linked from https://github.com/MaRDI4NFDI/docker-importer/blob/main/doc/activity.drawio.svg You would have to copy this code into the docker-imported container and test it.


...  smth like: <code>python >> php extensions/WikibaseImport/maintenance/importEntities.php  {some cmd from python, e.g. --entity P31 --do-not-recurse}</code>   
...  smth like: <code>python >> php extensions/WikibaseImport/maintenance/importEntities.php  {some cmd from python, e.g. --entity P31 --do-not-recurse}</code>   


... and then just to capture the script exit codes by python ?   
... and then just to capture the script exit codes by python ?   
* perhaps, but the original idea of docker-importer is to run import.sh from a cronjob


- Is the configuration set here https://github.com/MaRDI4NFDI/WikibaseImport/blob/master_mardi/extension.json ? So in example if referencing to another wikibase than wikidata, the urls would be changed ?   
- Is the configuration set here https://github.com/MaRDI4NFDI/WikibaseImport/blob/master_mardi/extension.json ? So in example if referencing to another wikibase than wikidata, the urls would be changed ?   
* That would cause naming conflicts between entities in the different wikibases. To do that, you would have to change WikibaseImporter to store the original wikibase source in the mediawiki database. See https://github.com/MaRDI4NFDI/portal-examples/blob/main/Import%20from%20zbMath/WB_wikidata_properties.ipynb for details


- In case you know, what is the behaviour of <code>php maintenance/importEntities.php --all-properties</code> when calling it twice on different states of the wikibase to import (example calling it for wikidata once a year) would this overwrite the previous properties in the portal-database, is it usable for syncing a wikibase ?   
- In case you know, what is the behaviour of <code>php maintenance/importEntities.php --all-properties</code> when calling it twice on different states of the wikibase to import (example calling it for wikidata once a year) would this overwrite the previous properties in the portal-database, is it usable for syncing a wikibase ?   
* Entities already imported will be ignored


- ~Discussion: Usage example, in case you know, if using sparql like this https://w.wiki/Sjx (which is referring to wikidata items). Assuming the related wikidata properties have been imported to wikibase-portal and the query is done in mardi-portal-docker-query-service could there some namespace defined for the imported data, to prevent ID collisions ?   
- ~Discussion: Usage example, in case you know, if using sparql like this https://w.wiki/Sjx (which is referring to wikidata items). Assuming the related wikidata properties have been imported to wikibase-portal and the query is done in mardi-portal-docker-query-service could there some namespace defined for the imported data, to prevent ID collisions ?   
* I not sure, but you probably should define a namespace if you want to query 2 wikibases at the same time, if that's what you want to do


- Discussion: Is wikibase import fork necessary to realize federated queries ? Probably not, cause items could usually imported through http specification in sparql queries.    
- Discussion: Is wikibase import fork necessary to realize federated queries ? Probably not, cause items could usually imported through http specification in sparql queries.
* I think you should use namespaces for that


==Docker-Wikibase :==
==Docker-Wikibase :==

Revision as of 19:42, 16 March 2022

Sammelseite für offene Fragen (English / Deutsch).


Specific questions

Portal-Compose:

These are questions related to Readme.md in the project:

1. submodules init not necessary anymore? so it can be moved to Docker-Wikibase or completely removed from Readme

  • Thanks, removed from README

2. Add volume in dev-extensions for extension in readme.md in develop-locally section (Johannes)

3. readme.md for ci: why is it required to set default test-passwords in ci as the readme suggests, some tests seem to fail locally with default password ? Point out that the CI steps are defined in main.yml. Also point out that CI builds is triggered from GitHub itself and that the trigger usually on commit, and that this can be defined in the github environment.

  • "some tests seem to fail locally with default password" See question 5.
  • "CI steps are defined in main.yml": added that
  • "CI builds is triggered from GitHub": added that
  • "this can be defined in the github environment": that's already there isn't it?

4. Test locally: run_tests.sh. Add additional note in readme for this: Run compose-up locally then execute script

  • Thanks, added to README

5. Admin password doesn't change locally (from default password) although all containers deleted before and new password defined in env variable, so the local test for this also fails.

  • The short answer is: since the password is stored in the database the first time you do docker-compose up, you would have to delete the volumes (not necessarily the containers), e.g. `docker volume prune`
  • The long answer is: I have yet to document how to change passwords on another wiki page.

6. Deploy on the MaRDI server: notes seem ok, but incomplete, how is deployment done?

  • Please ask physikerwelt

7. Documentation for traefik missing.

  • Please ask physikerwelt or dajuno

8. Add hint in documentation of portal-compose on the linked repositories which create the custom portal containers

  • Added that

Portal-Examples:

- Check PR

  • Did that

- WB_wikidata_properties.ipynb recheck with creds. See in PR.

  • Did that

- Other scripts seem ok

- Reminder:, nochmal alle Scripte hier kurz durchgehen im Meeting. Question for discussion, should scripts for constant data updates running in cli be stored as ipynb format?

  • "should scripts for constant data updates running in cli be stored as ipynb?": No, these are prototypes. The real thing is in docker-importer.

docker-importer:

- The docker-importer is a docker-container which has functionalities for data-import (e.g. from swMATH, zbMATH), and can trigger the import of data in a wikibase container in the same docker-composition cyclically. The import scripts in the importer are written in python and examples for these scripts can be seen as Jupyter-Notebooks in the repository Portal-Examples. " Is this description correct ? If so, could you add it to the README.md ?

  • Thanks, added

- If the data-import will go live, the data-importer will be located in portal-compose files. The current state is WIP and therefore a custom compose with wikibase is provided. Is it correct ?

  • Yes

- https://github.com/MaRDI4NFDI/docker-importer/blob/main/doc/activity.drawio.svg shows that data is read from wikibase database, but not stored somewhere If this is correct, what are the plans and the current state on updating the fetched data in wikibase?

  • Please show me tomorrow

- The shellscript import.sh is meant to trigger the import process as soon as the crontab is activated ? The script import.sh is meant to invoke the pythonscript src/import.py and this starts the import mechanisms. The exact functionality of import.py is defined by the cli-parameters on invoke? Why is import.sh not calling the python script ?

  • That's the idea, but the import script is not finished yet.

- Could the docker importer already be located in portal-compose and with a flag defining that the cron-pattern is deactivated and an external shellscript which triggers the import manually in a next-step ?

  • No, the script is not finished.

Testing-Concept:

- Tests in Selenium are defined like this: https://github.com/MaRDI4NFDI/portal-compose/blob/main/test/MathExtensionsTest.py Would it make sense for the 'System-Testing' just to have a selenium-ci container running which tests against the url deployed portal.mardi4nfdi.de. This could be triggered with a shellscript similar like the local unit-tests after the deployment (and not by github CI). Some cases of course cant be validated by an 'extrernally' running script, but it can be sufficient as smoke tests to check basic functionality in the deployed portal. Have a flag which tags external testcases and an url switch should make it possible to reuse the already written testcases.

  • You could do that. You would have to add the container to the docker-compose that is deployed. However, some tests could write data, delete data etc. so look out.


docker-backup :

- What is the xml files backup of wikipages is it ? Are the wikipages themselves also in the msql-dumps ?

  • Backups are redundant, however there are some important differences regarding page revisions. See backup documentation linked below.

- In case you know, where are the private pages backed up?

  • All pages in the database are backed-up, no matter what namespace.

- Could you rename images/files naming inconsistency in the docker-backup repo (is it files or all images?) i.e in line 46 in backup.sh?

- Is there any content of wikibase/mediawiki not considered for backup currently ?

  • .env, pull-script, traefik settings, perhaps graphana settings, logs

- Discussion: Would it make sense to add a prune functionality for backups, so that some backups which are older then keep-days are kept ?

  • that's already there. docker-backup/README

- Discussion: Would it also make sense to prune/delete obsolete logs from the other containers from this container ? (since they seem to require lots of space)

  • I would use logrotate for that, but it is afaik not activated at the moment

- Next Steps, Discussion: Mail or Notifications in Grafana/Monitoring on backups ? Also observing the size of backup and logging folders with the monitoring and maximum possible space with monitoring or maybe as mail content ?

  • There's a commented-out section in the backup script for that, but since we don't have a mail server, it's not working. I would not recommend running your own mail server, since that's one of the things hackers are looking for.

~~js: eventually check backup folders on mardi01

MarDI Wikibase Import Fork :

- This is a fork with additional Readme.md infos regarding the MaRDI Portal, it is a mediawiki extension which is already referenced by source in the docker-wikibase, in case there are some future MaRDI customizations for the Wikibase importer? Is it correct?

  • this fork ads an extra parameter to the original WiibaseImport extension. see WikibaseImport/EADME

- Are the wikibase-importer calls are already implemented in docker-importer ? If not, is it the idea to call the maintenance scripts over the python cli interface from here ?: https://github.com/MaRDI4NFDI/docker-importer/blob/main/src/importer/Importer.py

... smth like: python >> php extensions/WikibaseImport/maintenance/importEntities.php {some cmd from python, e.g. --entity P31 --do-not-recurse}

... and then just to capture the script exit codes by python ?

  • perhaps, but the original idea of docker-importer is to run import.sh from a cronjob

- Is the configuration set here https://github.com/MaRDI4NFDI/WikibaseImport/blob/master_mardi/extension.json ? So in example if referencing to another wikibase than wikidata, the urls would be changed ?

- In case you know, what is the behaviour of php maintenance/importEntities.php --all-properties when calling it twice on different states of the wikibase to import (example calling it for wikidata once a year) would this overwrite the previous properties in the portal-database, is it usable for syncing a wikibase ?

  • Entities already imported will be ignored

- ~Discussion: Usage example, in case you know, if using sparql like this https://w.wiki/Sjx (which is referring to wikidata items). Assuming the related wikidata properties have been imported to wikibase-portal and the query is done in mardi-portal-docker-query-service could there some namespace defined for the imported data, to prevent ID collisions ?

  • I not sure, but you probably should define a namespace if you want to query 2 wikibases at the same time, if that's what you want to do

- Discussion: Is wikibase import fork necessary to realize federated queries ? Probably not, cause items could usually imported through http specification in sparql queries.

  • I think you should use namespaces for that

Docker-Wikibase :

- In the repository root there are lot's of shell-scripts, could you add a table with a short description for each shellscript in Readme.md, which mentions on what occasion each shellscript is called and what it does (in very short)?

- To the fetcher/collector/composer structure of the Dockerfile: The docker-wikibase build is realised by multiple containers. At first an ubuntu-container is created which has git and curl which downloads mediawiki-extensions from the specified source repositories (fetcher). Then git-artifacts (.git folders) are removed. The collector is a mediawiki-docker-container, the downloaded repositories from the fetcher are now placed in the extensions folder of mediawiki in the collectors filesystem. The composer copies the mediawiki-data (including the custom extensions) !and calls composer install. Composer install launches the specific installation steps of the extensions which are usually defined in the composer.json files! ???. Finally the container-image for mardi-wikibase is created on base of mediawiki, prerequisited packages are installed, then the mediawiki-content is created from the mediawiki-files (which include the extensions) from composer, additional data and configuration is copied to the mardi-wikibase, endpoints are created. Several templates for settings are copied to the final container-image, the actual settings when using the container are defined in portal-compose, in the files here.

- Conceptual questions for the beforementioned structure in Dockerfile:

-- Why is the extra composer step necessary, why is the final Wikibase image not directly created from the collector  ?

-- What are the multiple LocalSettings.php templates for ?

- Dev-Dockerfile documentation missing: The dev-image is a specific docker-wikibase build which installs some additional extensions to it, to enable development with the docker-wikibase container, it enables debugging the container by xdebug-capable IDE's. The CI automatically triggers a standard build, as well as a specifically tagged debugging build.

docker-quickstatements :

- Similar question to Docker-Wikibase, why is composer step necessary in Dockerfile?

- quickstatements/public_html seems to be the storage for patches of the original docker-quickstatements, if correct could create a hint in README.md?

- If the mentioned steps are already possible, could you finish documentation todos in readme?

- Discussion (for Future): State and Fixes for Quickstaments

Future MaRDI Steps:

- Suggestions and Discussion on Future MaRDI Steps which not already have been explicitly mentioned in the other questions.

  • Perhaps I would do that on another page

Overall:

- Could you re-check everybody has admin-access for all repos in MaRDI4NFDI ?

  • Done

- Also for Traefik dashboard https://traefik.portal.mardi4nfdi.de if this relates to aot

  • Please ask physikerwelt or dajuno



See also: Technical_introduction

General stuff

Backup

Backup and restore
How to configure automatically and manually backup the data, and restore a backup.

Testing

Testing concept
General guidelines about testing the MaRDI Portal.
Selenium, see also Deployment
Selenium container documentation

How to import data in the portal (in development)

Import process overview
UML action diagram showing import process from zwMath.
Import properties from Wikidata
How to to import items and properties from Wikidata into the Portal.
Read data from zbMath
There are millions of references to papers in the zbMath database. We just need (for now) those related to the list of mathematical software that has been imported into the MaRDI-Portal.
Populate the portal using the Wikibase API
Import the data read from zbMath into the data structure setup from properties imported from Wikidata.

Where to ask for help when new wikibase features are needed

WBSG projects and WBSG Rhizome Loomio (requires login)
The Wikibase Stakeholder Group coordinates a variety of projects for the broader Wikibase ecosystem.

Passwords

How to change passwords
Change wiki passwords and database passwords.