= MariaDB Configuration =


== Overview ==


The portal uses MariaDB as its primary database, deployed and managed through the [https://github.com/mariadb-operator/mariadb-operator mariadb-operator] in Kubernetes. This setup provides high availability through Galera clustering with automatic replication between a primary write node and multiple read-only replicas.


'''Key Components:'''
* '''Primary Service''': Write operations and main database node
* '''Secondary Service''': Read-only replicas for load distribution
* '''Galera Clustering''': Synchronous replication between nodes
* '''MediaWiki Integration''': Load-balanced database connections


== Prerequisites ==
 
Before working with the MariaDB configuration, ensure you have:
 
* <code>kubectl</code> access to the staging/production cluster
* Access to the [https://github.com/MaRDI4NFDI/portal-k8s portal-k8s] repository
* S3 credentials for backup operations (if needed)
* Basic understanding of Kubernetes Custom Resource Definitions (CRDs)
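
A quick sanity check for the first two items (a sketch; adjust the namespace, and note that the exact CRD names depend on the installed operator version):

<syntaxhighlight lang=bash>
# Confirm kubectl access to the target namespace
kubectl auth can-i get pods -n <namespace>

# Confirm the mariadb-operator CRDs are installed (CRD group name varies by operator version)
kubectl get crds | grep -i mariadb
</syntaxhighlight>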
 
== Initial Setup ==
 
=== Database Deployment ===
 
The MariaDB instance is deployed using Custom Resource Definitions (CRDs) that allow declarative management of database resources through YAML files. The mariadb-operator handles the underlying complexity of cluster management and replication.
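
For orientation, a minimal sketch of such a declarative definition is shown below. Field names follow the mariadb-operator's <code>MariaDB</code> resource, but the apiVersion and exact fields depend on the operator version; the authoritative definition is the <code>mariadb.yaml</code> template in the portal-k8s repository, and all names and sizes here are placeholders.

<syntaxhighlight lang=yaml>
# Illustrative sketch only -- the real definition lives in charts/mariadb/templates/mariadb.yaml
apiVersion: k8s.mariadb.com/v1alpha1   # assumption: recent operator releases; older ones use a different API group
kind: MariaDB
metadata:
  name: mariadb
spec:
  rootPasswordSecretKeyRef:            # keeps the root user recoverable (see Replication Management)
    name: mariadb-credentials          # placeholder secret name
    key: root-password
  replicas: 3                          # number of cluster nodes (primary plus read replicas)
  galera:
    enabled: true                      # synchronous Galera clustering
  storage:
    size: 10Gi                         # placeholder volume size
</syntaxhighlight>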
 
=== Service Configuration ===
 
External access to the database is configured through LoadBalancer services in [https://github.com/MaRDI4NFDI/portal-k8s/blob/main/charts/mariadb/templates/mariadb.yaml mariadb.yaml]:


<syntaxhighlight lang=yaml>
primaryService:
  type: LoadBalancer

secondaryService:
  type: LoadBalancer
</syntaxhighlight>
 
This definition assigns external IP addresses to both the primary and secondary services.
 
'''Service Roles:'''
* '''primaryService''': Handles write operations to the main database node
* '''secondaryService''': Distributes read operations across read-only replicas
 
=== Retrieving Service IPs ===
 
After deployment, obtain the assigned external IP addresses:
 
<syntaxhighlight lang=bash>
kubectl get svc -n <namespace>
</syntaxhighlight>
 
Look for services with <code>TYPE=LoadBalancer</code> and note their <code>EXTERNAL-IP</code> values. These IPs are required for MediaWiki database connectivity configuration.
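
If you prefer to script this step, the external IP can also be read directly from the service object. The service name below is illustrative and should be replaced with the actual name shown by <code>kubectl get svc</code>:

<syntaxhighlight lang=bash>
# List only LoadBalancer services
kubectl get svc -n <namespace> | grep LoadBalancer

# Read the external IP of a single service (replace the service name with the actual one)
kubectl get svc <release>-mariadb-primary -n <namespace> \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
</syntaxhighlight>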
 
=== Verification Steps ===
 
Confirm the deployment is healthy:
 
<syntaxhighlight lang=bash>
# Check MariaDB resource status
kubectl get mariadb -n <namespace>
kubectl describe mariadb -n <namespace>
 
# Verify all pods are running
kubectl get pods -n <namespace> -l app.kubernetes.io/name=mariadb
 
# Test database connectivity
kubectl exec -n <namespace> mariadb-0 -- mariadb -u root -p -e "SELECT 1;"
</syntaxhighlight>
 
== Replication Management ==
 
=== How Galera Replication Works ===
 
The database uses Galera clustering for synchronous replication. Galera provides:
* Multi-master synchronous replication
* Automatic node failure detection and recovery
* Consistent data across all nodes
 
=== Known Limitations ===
 
'''Critical Issue: User Privilege Replication'''
 
Galera has a significant limitation documented in the [https://github.com/mariadb-operator/mariadb-operator/blob/main/docs/BACKUP.md#galera-backup-limitations backup limitations]: the <code>mysql.global_priv</code> table (which stores users and grants) is '''not replicated''' between nodes.
 
'''Impact:'''
* User accounts may not exist on all nodes after pod restarts
* Can cause authentication failures during cluster recovery
* Particularly affects the root user, causing startup issues
 
'''Mitigation Strategy:'''
# Always include <code>rootPasswordSecretKeyRef</code> in the mariadb.yaml definition
# Use User and Grant CRDs for additional database users (implemented for <code>sqluser</code>)
# Monitor cluster health after any pod restarts
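
As an illustration of point 2, a <code>User</code> resource managed by the operator looks roughly like the sketch below. The actual definitions live in the portal-k8s chart; the apiVersion and field names follow the mariadb-operator CRDs and may differ between versions, and the secret name is a placeholder.

<syntaxhighlight lang=yaml>
# Illustrative sketch only -- the real definition lives in the portal-k8s chart
apiVersion: k8s.mariadb.com/v1alpha1   # assumption: depends on the installed operator version
kind: User
metadata:
  name: sqluser
spec:
  mariaDbRef:
    name: mariadb                      # the MariaDB resource this user belongs to
  passwordSecretKeyRef:
    name: mariadb-credentials          # placeholder secret name
    key: sql-password
  host: "%"
</syntaxhighlight>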
 
=== Monitoring Replication Health ===
 
Check replication status across all nodes:
 
<syntaxhighlight lang=bash>
# Access MariaDB on the primary pod
kubectl exec -it -n <namespace> (production|staging)-mariadb-0 -- bash
mariadb -u root -p
</syntaxhighlight>

<syntaxhighlight lang=sql>
-- Check Galera replication status
SHOW STATUS LIKE 'wsrep%';
SHOW STATUS LIKE 'wsrep_local_state_comment';
</syntaxhighlight>

Expected output should show <code>Synced</code> for healthy nodes.
 
== Troubleshooting ==
 
=== Common Issues and Solutions ===
 
==== Root User Replication Problem ====
 
'''Symptoms:'''
* Pods fail to start after a restart (or keep restarting)
* Authentication errors in pod logs
 
'''Root Cause:'''
The root user is not replicated between Galera nodes due to the <code>mysql.global_priv</code> table limitation.
 
'''Solution:'''
Manually recreate the root user on the primary node to trigger replication:
 
{{Warning|This involves dropping and recreating the root user. Ensure you have the correct root password before proceeding.}}
 
<syntaxhighlight lang=sql>
-- Connect to the primary pod (e.g., staging-mariadb-0)
-- Replace <root_password> with the actual password from your secret
 
DROP USER IF EXISTS 'root'@'localhost';
DROP USER IF EXISTS 'root'@'%';
CREATE USER 'root'@'localhost' IDENTIFIED BY '<root_password>';
CREATE USER 'root'@'%' IDENTIFIED BY '<root_password>';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
</syntaxhighlight>


=== Debugging Steps ===


Follow this systematic approach for unidentified database issues:


'''1. Check MariaDB Resource Status'''
<syntaxhighlight lang=bash>
kubectl get mariadb -n <namespace>
kubectl describe mariadb -n <namespace>
</syntaxhighlight>


'''2. Examine Pod Status'''
<syntaxhighlight lang=bash>
kubectl get pods -n <namespace> -l app.kubernetes.io/name=mariadb
kubectl logs -n <namespace> mariadb-0 --tail=100
</syntaxhighlight>
 
'''3. Verify Galera Cluster Health'''
<syntaxhighlight lang=bash>
kubectl exec -n <namespace> mariadb-0 -- mariadb -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
</syntaxhighlight>
 
== MediaWiki Integration ==
 
=== MediaWiki Configuration ===


MediaWiki requires specific database privileges for replication to function correctly. The required privileges are defined in [https://github.com/MaRDI4NFDI/portal-k8s/blob/main/charts/mariadb/templates/grant.yaml grant.yaml]:


* <code>SLAVE MONITOR</code>: Allows MediaWiki to check replication lag
* <code>BINLOG MONITOR</code>: Enables binary log monitoring for replication status
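
For reference, a <code>Grant</code> resource expressing these privileges looks roughly like the sketch below; the authoritative version is grant.yaml in the chart, and the apiVersion, resource name, and field names are assumptions that depend on the operator release.

<syntaxhighlight lang=yaml>
# Illustrative sketch only -- see charts/mariadb/templates/grant.yaml for the real definition
apiVersion: k8s.mariadb.com/v1alpha1   # assumption: depends on the installed operator version
kind: Grant
metadata:
  name: sqluser-replication-grant      # placeholder name
spec:
  mariaDbRef:
    name: mariadb
  username: sqluser
  privileges:
    - "SLAVE MONITOR"
    - "BINLOG MONITOR"
  database: "*"
  table: "*"
</syntaxhighlight>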


=== Database Connection Configuration ===


Configure MediaWiki to use both primary and secondary services in <code>LocalSettings.php</code>, substituting <code>primaryService-IP</code> and <code>secondaryService-IP</code> with the external IPs retrieved earlier:


<syntaxhighlight lang=php>
$wgLBFactoryConf = array(
    'class' => 'LBFactoryMulti',

    'sectionsByDB' => array(
        'my_wiki' => 's1',
    ),

    'sectionLoads' => array(
        's1' => array(
            '<primaryService-IP>'   => 0,    // Primary (write) node
            '<secondaryService-IP>' => 50,   // Read replica with load weight
        ),
    ),

    'serverTemplate' => array(
        'dbname'     => $wgDBname,
        'user'       => $wgDBuser,
        'password'   => $wgDBpassword,
        'type'       => 'mysql',
        'flags'      => DBO_DEFAULT,
        'max lag'    => 30,              // Maximum acceptable replication lag
    ),
);
</syntaxhighlight>


'''Load Balancing Explanation:'''
* <code>0</code> for primary: Receives no general read traffic; it handles all writes and any queries that must run on the primary
* <code>50</code> for secondary: Receives the read load; the values are relative weights, so higher numbers mean a larger share of reads
 
== Backup and Recovery ==
 
=== Automated Backup Process ===


Backups are automatically scheduled every 24 hours, with retention of 7 daily copies. The mariadb-operator manages backup creation and storage in S3, as configured in [https://github.com/MaRDI4NFDI/portal-k8s/blob/main/charts/mariadb/templates/backup.yaml backup.yaml].
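
The scheduled backup is expressed as a <code>Backup</code> resource; the sketch below only shows its general shape. The field names follow the mariadb-operator CRDs and may differ by version, the bucket, secret names, and schedule are placeholders, and the authoritative definition is backup.yaml in the chart.

<syntaxhighlight lang=yaml>
# Illustrative sketch only -- see charts/mariadb/templates/backup.yaml for the real definition
apiVersion: k8s.mariadb.com/v1alpha1   # assumption: depends on the installed operator version
kind: Backup
metadata:
  name: mariadb-backup
spec:
  mariaDbRef:
    name: mariadb
  schedule:
    cron: "0 3 * * *"                  # once every 24 hours
  maxRetention: 168h                   # roughly 7 daily copies
  storage:
    s3:
      bucket: <bucket>
      endpoint: hsm-test-09.zib.de:9001
      prefix: production               # or "staging", matching the naming convention below
      accessKeyIdSecretKeyRef:
        name: s3-credentials           # placeholder secret name
        key: access-key
      secretAccessKeySecretKeyRef:
        name: s3-credentials
        key: secret-key
</syntaxhighlight>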


'''Backup Naming Convention:'''
* Production backups: <code>production-*</code>
* Staging backups: <code>staging-*</code>


=== Backup Verification ===


List available backups in S3:


<syntaxhighlight lang=bash>
s3cmd --host=hsm-test-09.zib.de:9001 \
    --host-bucket=hsm-test-09.zib.de:9001 \
    --region=us-east-1 \
    --access_key=<access_key> \
    --secret_key=<secret_key> \
    ls s3:// --human-readable-sizes
</syntaxhighlight>


=== Database Restore Procedures ===
 
'''Prerequisites:'''
* Database cluster must be running and healthy
* Sufficient disk space for restore operation
* Backup file must be accessible


==== Step 1: Copy Backup to Primary Pod ====


<syntaxhighlight lang=bash>
# Copy backup to the primary pod (only pod with write permissions)
kubectl cp ./portal_db_backup.gz <namespace>/mariadb-0:/var/lib/mysql/
</syntaxhighlight>


==== Step 2: Prepare Backup File ====


'''For Large Backups (Recommended):''' Decompressing a large dump in place can fail with out-of-memory errors, so stream the compressed file into a new file instead:
<syntaxhighlight lang=bash>
# Connect to primary pod
kubectl exec -it -n <namespace> mariadb-0 -- bash

# Navigate to MySQL data directory
cd /var/lib/mysql

# Stream decompression to avoid memory issues
gunzip -c portal_db_backup.gz > portal_backup
</syntaxhighlight>


'''For Small Backups:'''
<syntaxhighlight lang=bash>
# Direct decompression
gzip -d portal_db_backup.gz
</syntaxhighlight>


==== Step 3: Load Backup into Database ====
 
<syntaxhighlight lang=bash>
# Load backup with memory optimization
mariadb --max_allowed_packet=1G -u root -p my_wiki < portal_backup
</syntaxhighlight>

The <code>--max_allowed_packet</code> value can be tweaked to avoid memory errors that could crash the pod.


'''Troubleshooting Import Issues:'''
 
If you encounter <code>ERROR at line 1: Unknown command '\-'</code>:
<syntaxhighlight lang=bash>
# Remove problematic first line
sed -i '1d' portal_backup

# Retry import
mariadb --max_allowed_packet=1G -u root -p my_wiki < portal_backup
</syntaxhighlight>


==== Step 4: Monitor Disk Space During Restore ====
 
{{Warning|Monitor available disk space during restore operations.}}


Binary log files (<code>prod-mariadb-bin.000001</code>, <code>prod-mariadb-bin.000002</code>, etc.) are generated during replication and can quickly fill available space.
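
A quick way to keep an eye on the data volume and the log files while the import runs:

<syntaxhighlight lang=bash>
# Check free space on the volume backing /var/lib/mysql
kubectl exec -n <namespace> mariadb-0 -- df -h /var/lib/mysql

# List the binary log files and their sizes
kubectl exec -n <namespace> mariadb-0 -- sh -c "ls -lh /var/lib/mysql/prod-mariadb-bin.*"
</syntaxhighlight>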


'''Check Replication Status:'''
<syntaxhighlight lang=sql>
-- Run on replica nodes to check log processing status
SHOW ALL SLAVES STATUS\G
</syntaxhighlight>


'''Identify Safe-to-Delete Logs:'''
Look for <code>Master_Log_File</code> and <code>Relay_Master_Log_File</code> values. Any binary log files with numbers below these can be safely purged.


'''Purge Old Binary Logs:'''
<syntaxhighlight lang=sql>
-- Replace XX with the appropriate log number
PURGE BINARY LOGS TO 'prod-mariadb-bin.0000XX';
</syntaxhighlight>

==== Step 5: Verify Restore Success ====
<syntaxhighlight lang=bash>
# Check database integrity
kubectl exec -n <namespace> mariadb-0 -- mariadb -u root -p -e "
USE my_wiki;
SHOW TABLES;
SELECT COUNT(*) FROM <key_table>;
"
# Verify replication is working
kubectl exec -n <namespace> mariadb-1 -- mariadb -u root -p -e "
SHOW SLAVE STATUS\G
"
</syntaxhighlight>

== Maintenance ==

=== Regular Tasks ===
* Check backup completion status (see the sketch below)
* Review pod resource usage
* Verify backup restore procedures in staging
* Clean up old binary log files if accumulating
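
For the first task, the operator exposes backup runs as Kubernetes resources, so a check along these lines can be used (a sketch; the resource and CRD names depend on the operator version):

<syntaxhighlight lang=bash>
# Backup custom resources created by the operator
kubectl get backups -n <namespace>

# Underlying CronJobs/Jobs that execute the scheduled backups
kubectl get cronjobs,jobs -n <namespace>
</syntaxhighlight>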

=== Performance Monitoring ===

Key metrics to monitor:
<syntaxhighlight lang=bash>
# Check replication lag
kubectl exec -n <namespace> mariadb-0 -- mariadb -u root -p -e "
SHOW STATUS LIKE 'wsrep_flow_control_paused';
SHOW STATUS LIKE 'wsrep_local_recv_queue';
"
# Monitor connection usage
kubectl exec -n <namespace> mariadb-0 -- mariadb -u root -p -e "
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';
"
</syntaxhighlight>

=== Log Management ===

Galera and MariaDB generate extensive logs. Regular cleanup prevents disk space issues:
<syntaxhighlight lang=bash>
# Check current log sizes
kubectl exec -n <namespace> mariadb-0 -- sh -c "du -sh /var/lib/mysql/prod-mariadb-bin.*"
# Automated cleanup (run carefully)
kubectl exec -n <namespace> mariadb-0 -- mariadb -u root -p -e "
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);
"
</syntaxhighlight>
