Project:MariaDB-K8s
MariaDB Configuration
Overview
The portal uses MariaDB as its primary database, deployed and managed through the mariadb-operator in Kubernetes. This setup provides high availability through Galera clustering with automatic replication between a primary write node and multiple read-only replicas.
Key Components:
- Primary Service: Write operations and main database node
- Secondary Service: Read-only replicas for load distribution
- Galera Clustering: Synchronous replication between nodes
- MediaWiki Integration: Load-balanced database connections
Prerequisites
Before working with the MariaDB configuration, ensure you have:
- kubectl access to the staging/production cluster
- Access to the portal-k8s repository
- S3 credentials for backup operations (if needed)
- Basic understanding of Kubernetes Custom Resource Definitions (CRDs)
Initial Setup
Database Deployment
The MariaDB instance is deployed using Custom Resource Definitions (CRDs) that allow declarative management of database resources through YAML files. The mariadb-operator handles the underlying complexity of cluster management and replication.
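As a rough sketch (the file name mariadb.yaml and the namespace placeholder follow the conventions used elsewhere on this page), applying and inspecting the declarative definition looks like this:
# Confirm the mariadb-operator CRDs are installed in the cluster
kubectl get crds | grep mariadb
# Apply (or update) the declarative database definition
kubectl apply -f mariadb.yaml -n <namespace>
# Inspect the resulting MariaDB resource
kubectl get mariadb -n <namespace>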
Service Configuration
External access to the database is configured through LoadBalancer services in mariadb.yaml:
primaryService:
  type: LoadBalancer
secondaryService:
  type: LoadBalancer
This definition assigns external IP addresses to both the primary and secondary services.
Service Roles:
- primaryService: Handles write operations to the main database node
- secondaryService: Distributes read operations across read-only replicas
Retrieving Service IPs
After deployment, obtain the assigned external IP addresses:
kubectl get svc -n <namespace>
Look for services with TYPE=LoadBalancer and note their EXTERNAL-IP values. These IPs are required for MediaWiki database connectivity configuration.
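To pull the IPs directly, a sketch (the <name>-primary and <name>-secondary service names are assumptions based on the primary/secondary naming above; adjust to what kubectl get svc actually shows):
# Read the external IPs straight from the service status
kubectl get svc -n <namespace> <name>-primary -o jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'
kubectl get svc -n <namespace> <name>-secondary -o jsonpath='{.status.loadBalancer.ingress[0].ip}{"\n"}'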
Verification Steps
Confirm the deployment is healthy:
# Check MariaDB resource status
kubectl get mariadb -n <namespace>
kubectl describe mariadb -n <namespace>
# Verify all pods are running
kubectl get pods -n <namespace> -l app.kubernetes.io/name=mariadb
# Test database connectivity
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "SELECT 1;"
Replication Management
How Galera Replication Works
The database uses Galera clustering for synchronous replication. Galera provides:
- Multi-master synchronous replication
- Automatic node failure detection and recovery
- Consistent data across all nodes
Known Limitations
Critical Issue: User Privilege Replication
Galera has a significant limitation documented in the backup limitations: the mysql.global_priv table (which stores users and grants) is not replicated between nodes.
Impact:
- User accounts may not exist on all nodes after pod restarts
- Can cause authentication failures during cluster recovery
- Particularly affects the root user, causing startup issues
Mitigation Strategy:
- Always include rootPasswordSecretKeyRef in the mariadb.yaml definition
- Use User and Grant CRDs for additional database users (implemented for sqluser); see the quick check below
- Monitor cluster health after any pod restarts
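A quick way to confirm the declarative users and grants exist as cluster objects, as a sketch (the users.k8s.mariadb.com and grants.k8s.mariadb.com resource names assume the upstream mariadb-operator CRD group; adjust if your operator version differs):
# List the User and Grant resources managed by the operator
kubectl get users.k8s.mariadb.com -n <namespace>
kubectl get grants.k8s.mariadb.com -n <namespace>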
Monitoring Replication Health
Check replication status across all nodes:
# Access MariaDB
kubectl exec -it -n <namespace> (production|staging)-mariadb-0 -- bash
mariadb -u root -p
# Check status
SHOW STATUS LIKE 'wsrep%';
SHOW STATUS LIKE 'wsrep_local_state_comment';
Expected output should show Synced for healthy nodes.
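To check the synced state on every node in one pass, a minimal sketch (assumes three pods with the ordinal naming used above; substitute production or staging for <prefix> and adjust the pod count to your deployment):
# Query wsrep_local_state_comment on each pod; every healthy node should report Synced
for i in 0 1 2; do
  kubectl exec -it -n <namespace> <prefix>-mariadb-$i -- \
    mariadb -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
done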
Troubleshooting
Common Issues and Solutions
Root User Replication Problem
Symptoms:
- Pods fail to start after a restart (or keep restarting)
- Authentication errors in pod logs
Root Cause:
The root user is not replicated between Galera nodes due to the mysql.global_priv table limitation.
Solution: Manually recreate the root user on the primary node to trigger replication:
Warning: This involves dropping and recreating the root user. Ensure you have the correct root password before proceeding.
-- Connect to the primary pod (e.g., staging-mariadb-0)
-- Replace <root_password> with the actual password from your secret
DROP USER IF EXISTS 'root'@'localhost';
DROP USER IF EXISTS 'root'@'%';
CREATE USER 'root'@'localhost' IDENTIFIED BY '<root_password>';
CREATE USER 'root'@'%' IDENTIFIED BY '<root_password>';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
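Afterwards, it is worth confirming the recreated user has reached a replica, for example (a sketch; the pod name assumes the ordinal naming used on this page):
# The recreation on the primary triggers replication, so the entry should now exist on replicas
kubectl exec -it -n <namespace> mariadb-1 -- mariadb -u root -p -e "SELECT user, host FROM mysql.global_priv WHERE user='root';"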
Debugging Steps
Follow this systematic approach for unidentified database issues:
1. Check MariaDB Resource Status
kubectl get mariadb -n <namespace>
kubectl describe mariadb -n <namespace>
2. Examine Pod Status
kubectl get pods -n <namespace> -l app.kubernetes.io/name=mariadb
kubectl logs -n <namespace> mariadb-0 --tail=100
3. Verify Galera Cluster Health
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
MediaWiki Integration
MediaWiki Configuration
MediaWiki requires specific database privileges to monitor replication status correctly. The required privileges are defined in grant.yaml:
- SLAVE MONITOR: Allows MediaWiki to check replication lag
- BINLOG MONITOR: Enables binary log monitoring for replication status
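To verify these grants are in place for the MediaWiki database user, a sketch (the sqluser account is the one mentioned earlier on this page; adjust the user and host if yours differ):
# Show the effective grants for the MediaWiki user, including SLAVE MONITOR and BINLOG MONITOR
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "SHOW GRANTS FOR 'sqluser'@'%';"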
Database Connection Configuration
Configure MediaWiki to use both primary and secondary services in LocalSettings.php:
$wgLBFactoryConf = array(
    'class' => 'LBFactoryMulti',
    'sectionsByDB' => array(
        'my_wiki' => 's1',
    ),
    'sectionLoads' => array(
        's1' => array(
            '<primaryService-IP>' => 0,    // Primary (write) node
            '<secondaryService-IP>' => 50, // Read replica with load weight
        ),
    ),
    'serverTemplate' => array(
        'dbname' => $wgDBname,
        'user' => $wgDBuser,
        'password' => $wgDBpassword,
        'type' => 'mysql',
        'flags' => DBO_DEFAULT,
        'max lag' => 30, // Maximum acceptable replication lag
    ),
);
Load Balancing Explanation:
- 0 for primary: Handles writes and can serve reads
- 50 for secondary: Load weight for read operations (weights are relative; higher numbers receive a larger share of reads)
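To confirm the replica stays under the 'max lag' => 30 threshold configured above, a hedged sketch (the pod name assumes the ordinal naming used elsewhere on this page):
# Watch Seconds_Behind_Master on the replica; MediaWiki stops sending reads to it once the value exceeds 'max lag'
kubectl exec -it -n <namespace> mariadb-1 -- mariadb -u root -p -e "SHOW SLAVE STATUS\G"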
Backup and Recovery
Automated Backup Process
Backups are automatically scheduled every 24 hours, with retention of 7 daily copies. The mariadb-operator manages backup creation and storage in S3, as configured in backup.yaml.
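To see the state of the scheduled backups as Kubernetes objects, a sketch (the backups.k8s.mariadb.com resource name assumes the upstream mariadb-operator CRD group; adjust if your operator version differs):
# List Backup resources and inspect a specific one for completion status
kubectl get backups.k8s.mariadb.com -n <namespace>
kubectl describe backups.k8s.mariadb.com <backup-name> -n <namespace>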
Backup Naming Convention:
- Production backups: production-*
- Staging backups: staging-*
Backup Verification
List available backups in S3:
s3cmd --host=hsm-test-09.zib.de:9001 \
--host-bucket=hsm-test-09.zib.de:9001 \
--region=us-east-1 \
--access_key=<access_key> \
--secret_key=<secret_key> \
ls s3:// --human-readable-sizes
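If a backup has to be pulled locally before a restore, a hedged sketch reusing the same host and credential flags (the bucket and object names are placeholders):
s3cmd --host=hsm-test-09.zib.de:9001 \
  --host-bucket=hsm-test-09.zib.de:9001 \
  --region=us-east-1 \
  --access_key=<access_key> \
  --secret_key=<secret_key> \
  get s3://<bucket>/<backup_object> ./portal_db_backup.gz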
Database Restore Procedures
Prerequisites:
- Database cluster must be running and healthy
- Sufficient disk space for restore operation
- Backup file must be accessible
Step 1: Copy Backup to Primary Pod
# Copy backup to the primary pod (only pod with write permissions)
kubectl cp ./portal_db_backup.gz <namespace>/mariadb-0:/var/lib/mysql/
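It can help to confirm the copy completed before decompressing, for example:
# Compare the local and in-pod sizes of the copied dump
ls -lh ./portal_db_backup.gz
kubectl exec -n <namespace> mariadb-0 -- ls -lh /var/lib/mysql/portal_db_backup.gz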
Step 2: Prepare Backup File
For Large Backups (Recommended):
# Connect to primary pod
kubectl exec -it -n <namespace> mariadb-0 -- bash
# Navigate to MySQL data directory
cd /var/lib/mysql
# Stream decompression to avoid memory issues
gunzip -c portal_db_backup.gz > portal_backup
For Small Backups:
# Direct decompression
gzip -d portal_db_backup.gz
Step 3: Load Backup into Database
# Load backup with memory optimization
mariadb --max_allowed_packet=1G -u root -p my_wiki < portal_backup
Troubleshooting Import Issues:
If you encounter ERROR at line 1: Unknown command '\-':
# Remove problematic first line
sed -i '1d' portal_backup
# Retry import
mariadb --max_allowed_packet=1G -u root -p my_wiki < portal_backup
Step 4: Monitor Disk Space During Restore
Warning: Monitor available disk space during restore operations.
Binary log files (prod-mariadb-bin.000001, prod-mariadb-bin.000002, etc.) are generated during replication and can quickly fill available space.
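A quick way to keep an eye on free space while the restore runs and the binary logs grow (a sketch):
# Watch free space in the data directory; repeat during long restores
kubectl exec -n <namespace> mariadb-0 -- df -h /var/lib/mysql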
Check Replication Status:
-- Run on replica nodes to check log processing status
SHOW ALL SLAVES STATUS\G
Identify Safe-to-Delete Logs:
Look for Master_Log_File and Relay_Master_Log_File values. Any binary log files with numbers below these can be safely purged.
Purge Old Binary Logs:
-- Replace XX with the appropriate log number
PURGE BINARY LOGS TO 'prod-mariadb-bin.0000XX';
Step 5: Verify Restore Success
# Check database integrity
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
USE my_wiki;
SHOW TABLES;
SELECT COUNT(*) FROM <key_table>;
"
# Verify replication is working
kubectl exec -it -n <namespace> mariadb-1 -- mariadb -u root -p -e "
SHOW SLAVE STATUS\G
"
Maintenance
Regular Tasks
- Check backup completion status
- Review pod resource usage
- Verify backup restore procedures in staging
- Clean up old binary log files if accumulating
Performance Monitoring
Key metrics to monitor:
# Check replication lag
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
SHOW STATUS LIKE 'wsrep_flow_control_paused';
SHOW STATUS LIKE 'wsrep_local_recv_queue';
"
# Monitor connection usage
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';
"
Log Management
Galera and MariaDB generate extensive logs. Regular cleanup prevents disk space issues:
# Check current log sizes
kubectl exec -n <namespace> mariadb-0 -- sh -c 'du -sh /var/lib/mysql/prod-mariadb-bin.*'
# Automated cleanup (run carefully)
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);
"