Answers to specific technical questions
Latest revision as of 11:08, 30 August 2022
Collection page for open questions (English / German).
Specific questions
Grafana:
- Are all current versions of the JSON models of the customized dashboards used in Grafana tracked in repositories? Some are mentioned in the README.
  - No, neither dashboards nor alerts are version-controlled (although a JSON file is provided for backup in ./grafana). Importantly, they are not backed up!
- If JSON models of default dashboards are imported, where are their sources listed? Most are mentioned in the README of portal-compose; how was the Watchtower dashboard created?
  - The README is currently the only source.
  - The Watchtower dashboard was (likely) adapted from https://grafana.com/grafana/dashboards/15849-watchtower/
- Would you suggest placing the dashboards on GitHub and importing them automatically during the setup (of ~wikibase/portal-compose) at some stage?
  - Yes, that would be wise. Edits made to dashboards in Grafana should also be tracked somehow; I don't know whether such a Grafana-GitHub integration exists. Grafana keeps a version history for dashboards, which might be useful here (e.g. via backups of the corresponding Grafana volumes). A simple workaround could be to require users who edit Grafana dashboards to download the corresponding JSON file and commit the diff/patch to the git repo. Maybe Grafana can automate this?
  - Don't forget the alert rules; they should also be backed up, at the very least.
  - I've had a quick look at automatic retrieval/import of Grafana dashboards and alerts; the "solutions" seemed cumbersome.
- https://github.com/MaRDI4NFDI/docker-alpine-ext is used to set up Grafana and Prometheus in docker-compose. Could the purpose(s) of this container be stated in its README?
  - Go for it if you think it helps. The image is not related to either tool, though. It's simply Alpine Linux with gettext, just as the README says.
- How does the data from the file created by docker-backup (backup_full.prom) get to the Grafana dashboard? Could you describe where it is stored, how it is read/parsed, etc.?
  - backup.sh of docker-backup creates the .prom file on every backup run (https://github.com/MaRDI4NFDI/docker-backup/blob/6c5694fd0b576048be83982a8bd4eaeaffb06384/backup.sh#L117-L146). It is read by the Prometheus node_exporter (specifically, its textfile collector), cf. https://github.com/MaRDI4NFDI/portal-compose/blob/504b42a5379fc5a92617cf47153e873983625247/docker-compose.yml#L403. Grafana uses Prometheus as a data source and can therefore access all metrics collected by Prometheus.
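For readers unfamiliar with the textfile collector: node_exporter reads every file ending in `.prom` from the directory passed via `--collector.textfile.directory` and exposes its contents as metrics, which Prometheus scrapes and Grafana then queries. The Python sketch below shows what writing such a file involves. It is not the logic of backup.sh, and the metric names in it are hypothetical placeholders; the real names are whatever backup.sh writes in the linked lines. Writing to a temporary file and renaming it avoids node_exporter ever reading a half-written file.

```python
import os
import tempfile
import time


def render_gauges(metrics):
    """Render {name: (help_text, value)} in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_prom_file(directory, filename, content):
    """Atomically replace directory/filename so node_exporter never sees a partial file."""
    # The temporary file does not end in .prom, so the textfile collector ignores it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; node_exporter may run as another user
    target = os.path.join(directory, filename)
    os.replace(tmp_path, target)  # atomic rename within the same filesystem
    return target


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    example = {
        "backup_last_run_timestamp_seconds": ("Unix time of the last backup run", time.time()),
        "backup_last_run_success": ("1 if the last backup run succeeded", 1),
    }
    print(render_gauges(example), end="")
```

The same mechanism could also carry the backup and log folder sizes discussed under docker-backup, without needing a mail server.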
David: Bottom line, I think it would be very useful to devise a good strategy to back up and/or version-control the dashboards and alerts. There are many options with varying degrees of complexity and automation. Some research is definitely required.
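One possible building block for such a strategy, sketched under assumptions: a small Python script that pulls every dashboard through Grafana's HTTP API (`/api/search?type=dash-db` to list dashboards, `/api/dashboards/uid/<uid>` to fetch one) and writes each as pretty-printed JSON that can be committed to git. The `GRAFANA_URL` and `GRAFANA_TOKEN` environment variables are assumptions (a token with read access should be enough), and alert rules are not covered here.

```python
import json
import os
import re
import urllib.request

# Assumptions: Grafana URL and a token with read access, supplied via environment variables.
GRAFANA_URL = os.environ.get("GRAFANA_URL", "http://localhost:3000").rstrip("/")
GRAFANA_TOKEN = os.environ.get("GRAFANA_TOKEN", "")


def api_get(path):
    """GET a Grafana HTTP API path and decode the JSON response."""
    request = urllib.request.Request(
        GRAFANA_URL + path, headers={"Authorization": f"Bearer {GRAFANA_TOKEN}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def normalize(api_response):
    """Keep only the dashboard model and drop fields that differ per save or per instance."""
    dashboard = dict(api_response["dashboard"])
    dashboard.pop("id", None)       # database id, differs between Grafana instances
    dashboard.pop("version", None)  # incremented on every save, would pollute git diffs
    return dashboard


def filename_for(dashboard):
    """Stable file name built from a slug of the title plus the dashboard uid."""
    slug = re.sub(r"[^a-z0-9]+", "-", dashboard["title"].lower()).strip("-")
    return f"{slug}-{dashboard['uid']}.json"


def export_all(out_dir):
    """Write every dashboard as sorted, indented JSON; return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for hit in api_get("/api/search?type=dash-db"):
        dashboard = normalize(api_get(f"/api/dashboards/uid/{hit['uid']}"))
        path = os.path.join(out_dir, filename_for(dashboard))
        with open(path, "w") as fh:
            json.dump(dashboard, fh, indent=2, sort_keys=True)
            fh.write("\n")
        written.append(path)
    return written
```

Running `export_all("grafana/dashboards")` periodically (e.g. from cron) and committing the result would give a reviewable history; the exported files could later be loaded back through Grafana's file-based dashboard provisioning. Sorting keys and dropping `id`/`version` keeps the diffs limited to actual changes.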
Portal-Compose:
These questions relate to the project's README.md:
1. Is the submodule init no longer necessary? If so, it can be moved to Docker-Wikibase or removed from the README entirely.
  - Thanks, removed from the README.
2. Add the volume in dev-extensions for an extension to the "develop locally" section of README.md (Johannes).
3. README.md on CI: why is it required to set default test passwords in CI, as the README suggests? Some tests seem to fail locally with the default password. Point out that the CI steps are defined in main.yml. Also point out that CI builds are triggered by GitHub itself, that the trigger is usually a commit, and that this can be configured in the GitHub environment.
  - Clarified in the meeting; to be addressed in a PR.
  - "some tests seem to fail locally with default password": see question 5.
  - "CI steps are defined in main.yml": added that.
  - "CI builds is triggered from GitHub": added that.
  - "this can be defined in the github environment": that's already there, isn't it?
4. Testing locally with run_tests.sh: add a note to the README: run docker-compose up locally, then execute the script.
  - Thanks, added to the README.
5. The admin password does not change locally (from the default password), even though all containers were deleted beforehand and a new password was defined in the environment variable, so the local test for this also fails.
  - The short answer is: since the password is stored in the database the first time you run docker-compose up, you have to delete the volumes (not necessarily the containers), e.g. with `docker volume prune`.
  - The long answer is: I have yet to document how to change passwords on another wiki page.
  - Meeting: delete the volumes locally if this error occurs.
6. Deploying on the MaRDI server: the notes seem OK but incomplete; how is deployment done?
  - Please ask physikerwelt.
  - https://github.com/MaRDI4NFDI/portal-compose/issues/147
7. Documentation for Traefik is missing.
  - Please ask physikerwelt or dajuno.
  - https://github.com/MaRDI4NFDI/portal-compose/issues/148
8. Add a hint to the portal-compose documentation about the linked repositories that build the custom portal containers.
  - Added that.
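Questions 4 and 5 together describe a clean local test run: reset the volumes, start the composition, then run run_tests.sh. The Python sketch below spells out that sequence. Two hedges: `docker-compose down --volumes` removes only this project's containers plus the named volumes declared in its compose file and anonymous volumes, which is narrower than `docker volume prune` (all unused volumes on the host); and as of Docker Engine 23.0, `docker volume prune` removes only anonymous volumes unless `--all` is given, so on newer Docker it may not reset the stored password. The location `./run_tests.sh` (repository root) is an assumption.

```python
import subprocess

COMPOSE = ["docker-compose"]  # assumption: use ["docker", "compose"] with Compose v2


def local_test_commands(reset_volumes=True):
    """Return the command sequence for a clean local test run."""
    commands = []
    if reset_volumes:
        # Removes this project's containers plus its named and anonymous volumes,
        # so the database (and the stored admin password) is re-initialised.
        commands.append(COMPOSE + ["down", "--volumes"])
    commands.append(COMPOSE + ["up", "-d"])
    # Assumption: run_tests.sh sits in the repository root. The services may need
    # some time to become ready after "up -d" before the tests can pass.
    commands.append(["./run_tests.sh"])
    return commands


def run(commands, dry_run=False):
    """Echo each command and, unless dry_run, execute it, stopping at the first failure."""
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(local_test_commands(), dry_run=True)
```

Setting `reset_volumes=False` gives the faster run when the password has not changed.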
Portal-Examples:
- Check the PR.
  - Did that.
- Recheck WB_wikidata_properties.ipynb with credentials; see the PR.
  - Did that.
- The other scripts seem OK.
- Reminder: briefly go through all the scripts here again in the meeting. Question for discussion: should scripts for constant data updates running on the CLI be stored in ipynb format?
  - "should scripts for constant data updates running in cli be stored as ipynb?": No, these are prototypes. The real thing is in docker-importer.
- The important scripts and how they relate to each other are linked in https://github.com/MaRDI4NFDI/docker-importer/blob/main/doc/activity.drawio.svg
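Since the notebooks are prototypes and the real code belongs in docker-importer, porting one usually starts by pulling out its code cells. The standard tool for that is `jupyter nbconvert --to script <notebook>.ipynb`; the stdlib-only Python sketch below shows what that amounts to, based on the nbformat 4 JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`).

```python
import json


def notebook_code(notebook_text):
    """Concatenate the code cells of an nbformat-4 notebook into one script string."""
    notebook = json.loads(notebook_text)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines or a single string
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"
```

Notebook magics such as `%pip install ...` are copied verbatim and would need manual cleanup before the code can run inside the importer.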
docker-importer:
- "The docker-importer is a Docker container that provides data-import functionality (e.g. from swMATH, zbMATH) and can periodically trigger the import of data into a Wikibase container in the same Docker Compose setup. The import scripts in the importer are written in Python, and examples of these scripts can be found as Jupyter notebooks in the Portal-Examples repository." Is this description correct? If so, could you add it to the README.md?
  - Thanks, added.
- Once the data import goes live, the docker-importer will be included in the portal-compose files. It is currently a work in progress, which is why a custom compose file with Wikibase is provided. Is that correct?
  - Yes.
- https://github.com/MaRDI4NFDI/docker-importer/blob/main/doc/activity.drawio.svg shows that data is read from the Wikibase database but not stored anywhere. If this is correct, what are the plans and what is the current state of writing the fetched data to Wikibase?
  - Please show me tomorrow. Clarified in the meeting: the data goes to MySQL, see "Import data using wikibase API".
- Is the shell script import.sh meant to trigger the import process as soon as the crontab is activated? Is import.sh meant to invoke the Python script src/import.py, which then starts the import? Is the exact behavior of import.py defined by the CLI parameters it is invoked with? Why is import.sh not calling the Python script?
  - That's the idea, but the import script is not finished yet.
- As a next step, could the docker-importer already be placed in portal-compose, with a flag that deactivates the cron schedule and an external shell script that triggers the import manually?
  - No, the script is not finished.
  - Testing can be done locally with the compose file.
  - Development process: write a test in Python, then use it to check the functionality implemented in Python.
  - Development can be done against a running MediaWiki: the Python scripts are mounted from external folders and can be modified at runtime.
  - The notebooks can serve as blueprints for further development of the import in the importer.
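The intended split (cron runs import.sh, which calls src/import.py, whose behavior is selected by CLI parameters) could look roughly like the Python sketch below. Every option name here (`--source`, `--limit`, `--dry-run`) is hypothetical and not taken from docker-importer; the point is that `main()` returns an exit status, so import.sh and cron can detect a failed import.

```python
import argparse


def build_parser():
    """CLI for a hypothetical src/import.py; all option names are placeholders."""
    parser = argparse.ArgumentParser(prog="import.py", description="Run a MaRDI data import.")
    parser.add_argument("--source", required=True, choices=["zbmath", "swmath", "wikidata"],
                        help="which data source to import (hypothetical choices)")
    parser.add_argument("--limit", type=int, default=None,
                        help="stop after this many records (useful for testing)")
    parser.add_argument("--dry-run", action="store_true",
                        help="fetch and transform, but do not write to Wikibase")
    return parser


def main(argv=None):
    """Parse arguments and dispatch; the return value becomes the process exit status."""
    args = build_parser().parse_args(argv)
    # A real importer would dispatch to the source-specific import code here.
    print(f"import from {args.source}: limit={args.limit}, dry_run={args.dry_run}")
    return 0


# Entry point, so that cron/import.sh can check the exit status:
# if __name__ == "__main__":
#     raise SystemExit(main())
```

import.sh would then only need a line such as `python3 src/import.py --source zbmath`, and the "deactivate cron, trigger manually" idea becomes a matter of calling the same command by hand.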
Testing-Concept:
- Tests in Selenium are defined like this: https://github.com/ConceptMaRDI4NFDI/portal-compose/blob/main/test/MathExtensionsTest.py Would it make sense for the 'System-Testing' just to have a selenium-ci container running which tests against the url deployed portal.mardi4nfdi.de. This could be triggered with a shellscript similar like the local unit-tests after the deployment (and not by github CI). Some cases of course cant be validated by an 'extrernally' running script, but it can be sufficient as smoke tests to check basic functionality in the deployed portal. Have a flag which tags external testcases and an url switch should make it possible to reuse the already written testcases.
- You could do that. You would have to add the container to the docker-compose that is deployed. However, some tests could write data, delete data etc. so look out.
- At the moment, these tests are triggered after the deployment on Mardi01 (the deployment script triggers them with a specific smoke-test/system-testing flag).
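A sketch of the "flag that tags external test cases plus a URL switch" idea, using only `unittest`. The environment variables `PORTAL_URL` and `SMOKE_ONLY`, the `@external` decorator and the page path are hypothetical, not the existing portal-compose test setup. Tests that write or delete data are simply left untagged, so they never run against the production portal.

```python
import os
import unittest
from typing import Callable

# Hypothetical switches: PORTAL_URL selects the portal under test,
# SMOKE_ONLY=1 restricts the run to tests tagged with @external.
DEFAULT_URL = "http://localhost:8080"


def base_url() -> str:
    """Base URL of the portal under test, without a trailing slash."""
    return os.environ.get("PORTAL_URL", DEFAULT_URL).rstrip("/")


def external(test: Callable) -> Callable:
    """Tag a test as read-only, i.e. safe to run against the deployed portal."""
    test.external = True
    return test


def select_tests(case: type, smoke_only: bool) -> unittest.TestSuite:
    """Collect the tests of a TestCase class, optionally only the @external ones."""
    suite = unittest.TestSuite()
    for name in unittest.TestLoader().getTestCaseNames(case):
        if not smoke_only or getattr(getattr(case, name), "external", False):
            suite.addTest(case(name))
    return suite


class PortalTests(unittest.TestCase):
    @external
    def test_main_page_url(self):
        # A real test would open this (hypothetical) page with Selenium and check it.
        self.assertTrue(f"{base_url()}/wiki/Main_Page".startswith("http"))

    def test_create_and_delete_page(self):
        # Writes and deletes data, so it is NOT tagged @external.
        self.skipTest("placeholder for a test that modifies data")


if __name__ == "__main__":
    smoke_only = os.environ.get("SMOKE_ONLY") == "1"
    unittest.TextTestRunner(verbosity=2).run(select_tests(PortalTests, smoke_only))
```

The deployment script would then run the same test file with `PORTAL_URL=https://portal.mardi4nfdi.de SMOKE_ONLY=1`, while local runs keep the default URL and execute all tests.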
docker-backup:
- What is the XML file backup of the wiki pages? Are the wiki pages themselves also in the MySQL dumps?
- Backups are redundant; however, there are some important differences regarding page revisions. See the backup documentation linked below.
- In case you know, where are the private pages backed up?
- All pages in the database are backed-up, no matter what namespace.
- Could you fix the images/files naming inconsistency in the docker-backup repo (is it files or all images?), e.g. in line 46 of backup.sh?
- Done, see pull request https://github.com/MaRDI4NFDI/docker-backup/pull/10
- Is there any content of wikibase/mediawiki that is currently not considered for backup?
- .env, pull script, Traefik settings, perhaps Grafana settings, logs
- Discussion: Would it make sense to add a prune functionality for backups, so that some backups older than keep-days are still kept?
- That's already there, see docker-backup/README.
- As I read the README, it deletes ALL backups older than keep-days; the idea of the discussion is to still keep some of them (e.g. one out of 30) while saving space.
- That would be an extension of the current prune.
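A minimal sketch of the "keep one out of 30" extension, assuming each backup can be mapped to a date (e.g. parsed from its file name). The function and its parameters are hypothetical and not part of docker-backup; it only decides which backups to keep, and deleting the rest would remain the job of the existing prune step.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set


def backups_to_keep(backups: Iterable[date], today: date, keep_days: int,
                    bucket_days: int = 30) -> Set[date]:
    """Return the backup dates to keep; everything else may be deleted.

    Backups newer than keep_days are all kept. Of the older ones, only the oldest
    backup in each bucket_days-long window is kept. Windows are anchored to
    absolute dates (date.toordinal), not to today, so a backup that is kept once
    stays kept on later runs.
    """
    cutoff = today - timedelta(days=keep_days)
    keep: Set[date] = set()
    oldest_in_bucket: Dict[int, date] = {}
    for backup in backups:
        if backup >= cutoff:
            keep.add(backup)
            continue
        bucket = backup.toordinal() // bucket_days
        if bucket not in oldest_in_bucket or backup < oldest_in_bucket[bucket]:
            oldest_in_bucket[bucket] = backup
    keep.update(oldest_in_bucket.values())
    return keep
```

Anchoring the windows to absolute dates is the main design choice: if they were computed relative to today, the window boundaries would shift every day and a backup kept yesterday could be deleted today.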
- Discussion: Would it also make sense to prune/delete obsolete logs of the other containers from this container (since they seem to require lots of space)?
- I would use logrotate for that, but as far as I know it is not activated at the moment.
- Next steps, discussion: Mail or notifications in Grafana/Prometheus/monitoring for backups? Also, could the size of the backup and logging folders and the maximum available space be observed with the monitoring, or maybe reported as mail content?
- There's a commented-out section in the backup script for that, but since we don't have a mail server, it's not working. I would not recommend running your own mail server, since that's one of the things hackers are looking for.
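Since there is no mail server, one alternative is to expose the folder sizes to the existing Prometheus/Grafana monitoring and alert there. This sketch assumes the Prometheus node_exporter textfile collector is enabled, which is not confirmed for the portal; the metric names and all paths are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Dict


def dir_size_bytes(path: Path) -> int:
    """Sum of the sizes of all regular files below path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if file_path.is_file() and not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def render_metrics(dirs: Dict[str, Path]) -> str:
    """Render used and free bytes per directory in the Prometheus text format."""
    lines = [
        "# HELP mardi_dir_size_bytes Bytes used by files below the directory.",
        "# TYPE mardi_dir_size_bytes gauge",
    ]
    for label, path in sorted(dirs.items()):
        lines.append(f'mardi_dir_size_bytes{{dir="{label}"}} {dir_size_bytes(path)}')
    lines += [
        "# HELP mardi_fs_free_bytes Free bytes on the filesystem holding the directory.",
        "# TYPE mardi_fs_free_bytes gauge",
    ]
    for label, path in sorted(dirs.items()):
        lines.append(f'mardi_fs_free_bytes{{dir="{label}"}} {shutil.disk_usage(path).free}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual backup and log folders on the host.
    metrics = render_metrics({"backup": Path("/data/backup"), "logs": Path("/var/log")})
    target = Path("/var/lib/node_exporter/textfile/mardi_dirs.prom")
    tmp = target.with_suffix(".tmp")
    tmp.write_text(metrics)
    tmp.replace(target)  # rename, so the exporter never reads a half-written file
```

Run from cron, this gives Grafana two gauges per folder, on which an alert rule (e.g. free space below a threshold) can be defined instead of sending mail.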
~~js: eventually check backup folders on mardi01
Notes:
- Encoding problem in restoring formerly backed up xml pages https://github.com/MaRDI4NFDI/portal-compose/issues/149
MaRDI WikibaseImport fork:
- This is a fork with additional README.md information regarding the MaRDI Portal. It is a MediaWiki extension that is already referenced by source in docker-wikibase, in case there are future MaRDI customizations for the Wikibase importer. Is that correct?
- This fork adds an extra parameter to the original WikibaseImport extension, see WikibaseImport/README.
- What's the extra parameter? I hope I haven't missed it in the README.
- Are the WikibaseImport calls already implemented in docker-importer? If not, is the idea to call the maintenance scripts via the Python CLI interface from here (https://github.com/MaRDI4NFDI/docker-importer/blob/main/src/importer/Importer.py)?
- Importing entities from Wikidata is implemented. Everything else is implemented in the Jupyter notebooks; you would have to copy this code into the docker-importer container and test it. The notebooks are linked from https://github.com/MaRDI4NFDI/docker-importer/blob/main/doc/activity.drawio.svg
... something like: Python invokes php extensions/WikibaseImport/maintenance/importEntities.php with arguments built in Python (e.g. --entity P31 --do-not-recurse)
... and then just captures the script's exit codes in Python?
- https://github.com/MaRDI4NFDI/docker-importer/blob/main/src/wikidata/EntityCreator.py it's already done like this here.
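The pattern from the question (Python builds the arguments, runs the PHP maintenance script and checks the exit code) could look roughly like this with `subprocess.run`. This is a sketch, not the code in EntityCreator.py; the script path and the flags `--entity` and `--do-not-recurse` are taken from the question above, and the MediaWiki root `/var/www/html` in the usage line is an assumption.

```python
import subprocess
from typing import List, Sequence

# Path as quoted in the question; relative to the MediaWiki root directory.
IMPORT_SCRIPT = "extensions/WikibaseImport/maintenance/importEntities.php"


def import_command(entity: str, recurse: bool = False) -> List[str]:
    """Build the php call for one entity, e.g. --entity P31 --do-not-recurse."""
    cmd = ["php", IMPORT_SCRIPT, "--entity", entity]
    if not recurse:
        cmd.append("--do-not-recurse")
    return cmd


def run_and_report(cmd: Sequence[str], cwd: str = ".") -> int:
    """Run a command, print its stderr on failure and return its exit code."""
    result = subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"{' '.join(cmd)} failed with exit code {result.returncode}")
        print(result.stderr)
    return result.returncode
```

Inside the wikibase container this would be called as `run_and_report(import_command("P31"), cwd="/var/www/html")`; a non-zero return value can then stop or retry the import.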
- Is the configuration set here (https://github.com/MaRDI4NFDI/WikibaseImport/blob/master_mardi/extension.json)? For example, if referencing a wikibase other than Wikidata, would the URLs be changed there?
- That would cause naming conflicts between entities in the different wikibases. To do that, you would have to change WikibaseImporter to store the original wikibase source in the mediawiki database. See https://github.com/MaRDI4NFDI/portal-examples/blob/main/Import%20from%20zbMath/WB_wikidata_properties.ipynb for details
- This would have to be checked
- In case you know, what is the behaviour of php maintenance/importEntities.php --all-properties when it is called twice on different states of the wikibase to import from (e.g. calling it for Wikidata once a year)? Would this overwrite the previous properties in the portal database, i.e. is it usable for syncing a wikibase?
- Entities already imported will be ignored
- Discussion: Usage example, in case you know: when using SPARQL like https://w.wiki/Sjx (which refers to Wikidata items), assuming the related Wikidata properties have been imported into the wikibase-portal and the query is run in mardi-portal-docker-query-service, could a namespace be defined for the imported data to prevent ID collisions?
- I'm not sure, but you probably should define a namespace if you want to query two wikibases at the same time, if that's what you want to do.
- Discussion: Is the WikibaseImport fork necessary to realize federated queries? Probably not, since items can usually be retrieved by specifying the remote endpoint URL in the SPARQL query.
- I think you should use namespaces for that
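A sketch of what "using namespaces" means for a federated query: each wikibase gets its own PREFIX, so `mardi:Q1` and `wd:Q1` expand to different IRIs, and the Wikidata part is fetched through a SPARQL 1.1 `SERVICE` clause. The portal IRIs and the linking property `P2` are hypothetical, and whether the portal's query service is allowed to call out to Wikidata is not checked here.

```python
from typing import Dict

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

PREFIXES: Dict[str, str] = {
    "wd": "http://www.wikidata.org/entity/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    # Assumption: concept URIs of the portal; check the portal's configuration.
    "mardi": "https://portal.mardi4nfdi.de/entity/",
    "mardit": "https://portal.mardi4nfdi.de/prop/direct/",
}


def prefix_block(prefixes: Dict[str, str]) -> str:
    return "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())


def federated_label_query(link_property: str, lang: str = "en") -> str:
    """Portal items linked to Wikidata items via link_property, labels from Wikidata.

    mardi:Q1 and wd:Q1 expand to different IRIs, so equal ids cannot collide.
    """
    return f"""{prefix_block(PREFIXES)}
SELECT ?item ?wdItem ?wdLabel WHERE {{
  ?item mardit:{link_property} ?wdItem .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wdItem rdfs:label ?wdLabel .
    FILTER(LANG(?wdLabel) = "{lang}")
  }}
}}"""
```

Here `federated_label_query("P2")` assumes a hypothetical portal property P2 whose values are Wikidata item IRIs.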
Note:
- If importing from multiple Wikibases with colliding IDs, the importEntities script of WikibaseImport has to be modified to include prefixes or namespaces to distinguish the data.
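The two points above (already imported entities are ignored; colliding IDs need a source prefix) can be illustrated with a small in-memory mapping keyed by source and original ID. This is a hypothetical sketch of the idea, not how WikibaseImport stores its mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ImportMapping:
    """Hypothetical in-memory mapping (source wikibase, original id) -> local id.

    Including the source in the key keeps, e.g., P31 from Wikidata and P31
    from another wikibase apart; an entity that is already mapped is skipped.
    """
    local_ids: Dict[Tuple[str, str], str] = field(default_factory=dict)
    next_number: Dict[str, int] = field(default_factory=lambda: {"P": 1, "Q": 1})

    def import_entity(self, source: str, original_id: str) -> Tuple[str, bool]:
        """Return (local id, created); created is False if it was imported before."""
        key = (source, original_id)
        if key in self.local_ids:
            return self.local_ids[key], False
        kind = original_id[:1]
        if kind not in self.next_number:
            raise ValueError(f"unsupported entity id: {original_id}")
        local_id = f"{kind}{self.next_number[kind]}"
        self.next_number[kind] += 1
        self.local_ids[key] = local_id
        return local_id, True
```

Calling `import_entity("wikidata", "P31")` twice returns the same local ID with `created=False` the second time, while `import_entity("other-wikibase", "P31")` gets a new local property.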
Docker-Wikibase:
- In the repository root there are lots of shell scripts. Could you add a table to README.md with a short description of each shell script, mentioning on what occasion it is called and (very briefly) what it does?
All scripts are used in the Dockerfile. Scripts in the order they are used in the Dockerfile:
- clone-extension.sh
- clone an extension from github with the correct branch
- wait-for-it.sh
- wait for ports in other containers to be actually available (not the same as waiting that the container has started)
- entrypoint.sh
- Entrypoint of the container; installs the wiki using the maintenance/install scripts
- extra-install.sh
- creates the Elasticsearch index and the OAuth settings for QuickStatements
- extra-entrypoint-run-first.sh
- creates the Elasticsearch index, after calling wait-for-it
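wait-for-it.sh is a shell script; the idea behind it (poll until a TCP connection to another container's port succeeds, which can be later than the moment the container has started) can be sketched in Python as follows. The function name and the usage values are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds, False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the service is not accepting connections yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_port("mysql", 3306, timeout=60)` (hypothetical service name) would block until the database accepts connections, or give up after one minute.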
- On the fetcher/collector/composer structure of the Dockerfile: the docker-wikibase build is realised as a multi-stage build. First, an Ubuntu stage with git and curl downloads the MediaWiki extensions from the specified source repositories (fetcher). Then the git artifacts (.git folders) are removed. The collector is a MediaWiki Docker image; the repositories downloaded by the fetcher are placed in the extensions folder of MediaWiki in the collector's filesystem. The composer stage copies the MediaWiki data (including the custom extensions) and calls composer install, which installs the dependencies (and runs any install scripts) that the extensions declare in their composer.json files. Finally, the container image for mardi-wikibase is created on the basis of MediaWiki: prerequisite packages are installed, the MediaWiki content is created from the MediaWiki files (which include the extensions) from the composer stage, additional data and configuration are copied to mardi-wikibase, and the endpoints are created. Several settings templates are copied to the final container image; the actual settings used when running the container are defined in portal-compose, in the files here.
- I think that's correct
- Conceptual questions on the aforementioned structure of the Dockerfile:
-- Why is the extra composer step necessary? Why is the final Wikibase image not created directly from the collector?
- Just to save space. Docker adds a new layer for each command. By doing it like this, you get a much smaller image.
-- What are the multiple LocalSettings.php templates for?
- (1) LocalSettings.php.template is the original LocalSettings from the official MediaWiki.
- (2) LocalSettings.php.wikibase-bundle.template is the original LocalSettings from the Wikibase Docker bundle.
- (3) LocalSettings.php.mardi.template activates the extensions required by the MaRDI portal.
Templates (2) and (3) are concatenated into the final LocalSettings in the shared folder by entrypoint.sh.
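entrypoint.sh does this in shell; the Python sketch below only illustrates the idea of concatenating templates in a fixed order. The order matters because in PHP a later assignment to the same `$wg...` variable overrides an earlier one. The function name is hypothetical, and so is the target path in the usage note.

```python
from pathlib import Path
from typing import Sequence


def concat_settings(templates: Sequence[Path], target: Path) -> None:
    """Write the templates into target one after another, in the given order.

    Similar to `cat a b > target`, but makes sure every template ends with a
    newline. Later templates win for settings that are assigned twice.
    """
    with target.open("w", encoding="utf-8") as out:
        for template in templates:
            text = template.read_text(encoding="utf-8")
            out.write(text if text.endswith("\n") else text + "\n")
```

For example, `concat_settings([Path("LocalSettings.php.wikibase-bundle.template"), Path("LocalSettings.php.mardi.template")], Path("/shared/LocalSettings.php"))`, where `/shared` is a placeholder for the shared folder.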
- Dev-Dockerfile documentation missing: the dev image is a specific docker-wikibase build that installs some additional extensions to enable development with the docker-wikibase container; it allows debugging the container with Xdebug-capable IDEs. The CI automatically triggers a standard build as well as a specifically tagged debugging build.
- I'm not using this, but you could; good idea.
The order of the settings can be set here; it would be good practice to define an order, e.g. LocalSettings.d first, then extensions and others:
- https://github.com/MaRDI4NFDI/docker-wikibase/blob/main/LocalSettings.php.mardi.template
docker-quickstatements:
- Similar question as for docker-wikibase: why is the composer step necessary in the Dockerfile?
- To save space by producing a smaller image (same as above)
- quickstatements/public_html seems to be the storage for patches of the original docker-quickstatements. If that is correct, could you add a hint to README.md?
- Did that, although very briefly.
- If the mentioned steps are already possible, could you finish the documentation TODOs in the README?
- Technically, the next steps would be possible, but the project management does not think we could do that in the context of the current funding application.
- Discussion (for the future): state of and fixes for QuickStatements
- Same thing, no funding
Alternative: an OpenRefine extension that does the same; advantage: OpenRefine is in active development.
Future MaRDI Steps:
- Suggestions and discussion on future MaRDI steps that have not already been explicitly mentioned in the other questions.
Overall:
- Could you re-check that everybody has admin access to all repos in MaRDI4NFDI?
- Done
- Also for the Traefik dashboard https://traefik.portal.mardi4nfdi.de if this relates to aot
- Please ask physikerwelt or dajuno
See also: Technical_introduction
General stuff
Backup
- Backup and restore
- How to configure automatic and manual backups of the data, and how to restore a backup.
Testing
- Testing concept
- General guidelines about testing the MaRDI Portal.
- Selenium, see also Deployment
- Selenium container documentation
How to import data into the portal (in development)
- Import process overview
- UML activity diagram showing the import process from zbMath.
- Import properties from Wikidata
- How to import items and properties from Wikidata into the Portal.
- Read data from zbMath
- There are millions of references to papers in the zbMath database. We just need (for now) those related to the list of mathematical software that has been imported into the MaRDI-Portal.
- Populate the portal using the Wikibase API
- Import the data read from zbMath into the data structure set up from the properties imported from Wikidata.
Where to ask for help when new wikibase features are needed
- WBSG projects and WBSG Rhizome Loomio (requires login)
- The Wikibase Stakeholder Group coordinates a variety of projects for the broader Wikibase ecosystem.