Project:Debug WDQS: Difference between revisions
From MaRDI portal
Latest revision as of 10:00, 14 March 2025
The first step to fix bugs related to WDQS is to understand how WDQS and Wikibase interact.
WDQS architecture
- Whenever a Wikibase item is created, all its information (the statements) is saved as a JSON object in the MediaWiki table that stores page information.
- Because all the statements are packed into a single JSON object inside a MySQL database, SQL queries cannot be used to search items by particular statements.
- To make this kind of query possible (e.g. which items have a value equal to X for property Y), all the statements are copied into another database that is not MySQL based: a Blazegraph graph database, which accepts queries in SPARQL.
- Having two databases (MySQL and Blazegraph) that contain the same information requires a mechanism to keep them synchronized. This mechanism runs in the docker-wdqs-updater-1 container. It calls the MediaWiki RecentChanges API endpoint to check for recent changes in Item and Property pages (pages in the Item: and Property: namespaces), processes these changes, and pushes them into Blazegraph in RDF format.
- Keep in mind that the MediaWiki MySQL table is the source of truth. When debugging an error related to WDQS, first make sure that the information is shown correctly on the Item/Property page.
- The information shown in the profile pages (e.g. the Person:, Publication:, ... namespaces) is often retrieved using SPARQL queries, which means that it is read from Blazegraph, not from MySQL.
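To see the kind of data the updater works from, it can help to call the RecentChanges API by hand. The following is a minimal sketch, not the updater's actual code: the API path and the namespace IDs (120 for Item, 122 for Property, the usual Wikibase defaults) are assumptions that should be checked against the portal configuration, and `recent_changes_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the MediaWiki API is served at this path on the portal.
API_URL = "https://portal.mardi4nfdi.de/w/api.php"
# Assumption: default Wikibase namespace IDs (Item = 120, Property = 122).
ENTITY_NAMESPACES = (120, 122)


def recent_changes_url(since, limit=50):
    """Build a RecentChanges request limited to Item and Property pages.

    `since` is an ISO 8601 timestamp such as "2025-03-14T00:00:00Z".
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": "|".join(str(ns) for ns in ENTITY_NAMESPACES),
        "rcprop": "title|ids|timestamp",
        "rcstart": since,
        "rcdir": "newer",  # list changes from `since` forward in time
        "rclimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(recent_changes_url("2025-03-14T00:00:00Z"))
```

Opening the printed URL in a browser shows which Item and Property pages changed after the given timestamp, which is the set of pages the updater would be expected to copy into Blazegraph.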
WDQS containers
The entire WDQS service is based on four Docker containers:
- WDQS backend (docker-wdqs-1): Main database container. It runs the Blazegraph instance and contains all the data.
- WDQS frontend (mardi-wdqs-frontend): Simple frontend application to write SPARQL queries and send them to the Blazegraph database. Available at the Query Service UI (https://query.portal.mardi4nfdi.de).
- WDQS updater (docker-wdqs-updater-1): Container running the updater process. It queries the MediaWiki RecentChanges API and inserts the changes into the Blazegraph database.
- WDQS proxy (docker-wdqs-proxy-1): The WDQS backend accepts POST requests to insert data. The WDQS proxy sits between the frontend and the backend and only allows GET requests (read-only). It also makes an API endpoint available to query the database.
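Because the proxy only lets GET requests through, a script that queries the database has to send its SPARQL as a URL parameter. The sketch below builds such a request with the Python standard library. The endpoint path `/sparql` is an assumption, `sparql_get_request` is a hypothetical helper, and `P1`/`Q1` are placeholder IDs rather than real portal entities.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the proxy exposes its SPARQL endpoint at this path.
SPARQL_ENDPOINT = "https://query.portal.mardi4nfdi.de/sparql"


def sparql_get_request(query):
    """Return a read-only GET request for `query` (the request is not sent)."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder IDs: which items have the value Q1 for property P1?
    query = "SELECT ?item WHERE { ?item wdt:P1 wd:Q1 } LIMIT 10"
    request = sparql_get_request(query)
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the query; the sketch stops before that so it can be inspected without network access.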
Debug errors
- The profile page does not show information that I see when I visit the Item page.
  Try following the debugging steps listed below in order.
- I have a SPARQL query that does not return the same results I see when I visit a given Item: page.
  Try following the debugging steps listed below in order.
- The Query UI is not loading or it returns an error when I run a SPARQL query.
  Follow step 2.
Debugging steps
- Step 1: Check whether you can see the information related to the affected item in the Query Service UI (https://query.portal.mardi4nfdi.de). Use a query like:
    DESCRIBE wd:Q100
  If the Query Service is unresponsive or you get an error, proceed to step 2. Otherwise, you can jump to step 3.
- Step 2: If the Query Service UI does not load or an error is returned when executing the query, one of the WDQS containers is experiencing problems, most probably docker-wdqs-1. The first recommendation is to restart the four containers. Start by restarting docker-wdqs-1 and then continue with the other three WDQS containers. After the containers have restarted, send the query again in the UI and check whether the problem persists. If it does, check the container logs in detail. To fix the problem, it might be necessary to tweak some of the configuration variables that are passed to one of the containers. The configuration parameters are documented in the Wikidata Query Service User Manual (https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual); we pass them to the containers as environment variables in the docker-compose.yml file (https://github.com/MaRDI4NFDI/portal-compose/blob/5501bb527c53cca75d0a3dca03fd585732997473/docker-compose.yml#L218).
- Step 3: If a SPARQL query returns results but they are incomplete, the Blazegraph backend and query engine are working properly, but the information has not been fully copied from MySQL to the Blazegraph database. This indicates that the WDQS updater container was not running at some point or during some time interval. In this case, it is necessary to resynchronize either specific items or all the items changed since a given point in time. Follow the instructions in Project:RerunUpdate for that.
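The steps above can be partly scripted. The following is a rough sketch under stated assumptions, not an official tool: `restart_commands` only encodes the order from step 2 (backend first; the order of the other three containers is an arbitrary choice here), and `missing_properties` is a hypothetical helper for step 3 that compares the property IDs in an item's Wikibase entity JSON (as returned, for example, by Special:EntityData) against the property IDs found for the same item in Blazegraph.

```python
# Restart order from step 2: backend first, then the other three containers.
# Assumption: the relative order of the last three does not matter.
WDQS_CONTAINERS = [
    "docker-wdqs-1",
    "docker-wdqs-updater-1",
    "docker-wdqs-proxy-1",
    "mardi-wdqs-frontend",
]


def restart_commands(containers=WDQS_CONTAINERS):
    """Return one `docker restart` argument list per container, in order."""
    return [["docker", "restart", name] for name in containers]


def missing_properties(entity_json, entity_id, blazegraph_properties):
    """Return property IDs that the Item page has but Blazegraph lacks.

    `entity_json` follows the Wikibase entity JSON layout:
    {"entities": {entity_id: {"claims": {property_id: [...]}}}}.
    `blazegraph_properties` holds the property IDs seen in the SPARQL
    results for the same item (collecting them is left out of this sketch).
    """
    claims = entity_json["entities"][entity_id].get("claims", {})
    return sorted(set(claims) - set(blazegraph_properties))


if __name__ == "__main__":
    for command in restart_commands():
        # Pass each list to subprocess.run(...) to actually restart.
        print(" ".join(command))

    # Hypothetical data: the Item page has P1 and P2, Blazegraph only P1.
    page = {"entities": {"Q100": {"claims": {"P1": [], "P2": []}}}}
    print(missing_properties(page, "Q100", {"P1"}))
```

A non-empty result from `missing_properties` suggests the item was changed while the updater was not running, which is the situation step 3 describes.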