Project:MariaDB-K8s

From MaRDI portal

MariaDB Configuration

Overview

The portal uses MariaDB as its primary database, deployed and managed through the mariadb-operator in Kubernetes. This setup provides high availability through Galera clustering with automatic replication between a primary write node and multiple read-only replicas.

Key Components:

  • Primary Service: Write operations and main database node
  • Secondary Service: Read-only replicas for load distribution
  • Galera Clustering: Synchronous replication between nodes
  • MediaWiki Integration: Load-balanced database connections

Prerequisites

Before working with the MariaDB configuration, ensure you have:

  • kubectl access to the staging/production cluster
  • Access to the portal-k8s repository
  • S3 credentials for backup operations (if needed)
  • Basic understanding of Kubernetes Custom Resource Definitions (CRDs)

Initial Setup

Database Deployment

The MariaDB instance is deployed using Custom Resource Definitions (CRDs) that allow declarative management of database resources through YAML files. The mariadb-operator handles the underlying complexity of cluster management and replication.
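A minimal sketch of what such a declarative definition can look like (field names follow the mariadb-operator MariaDB CRD; the concrete values here are illustrative, not the portal's actual mariadb.yaml):

```yaml
# Illustrative MariaDB CRD sketch -- replica count, secret names, and
# storage size are assumptions, not the portal's real configuration.
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb
spec:
  rootPasswordSecretKeyRef:
    name: mariadb-root
    key: password
  replicas: 3
  galera:
    enabled: true          # synchronous multi-master replication
  storage:
    size: 10Gi
  primaryService:
    type: LoadBalancer
  secondaryService:
    type: LoadBalancer
```

Applying such a manifest is enough for the operator to bootstrap the Galera cluster, provision storage, and expose the services.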

Service Configuration

External access to the database is configured through LoadBalancer services in mariadb.yaml:

primaryService:
  type: LoadBalancer

secondaryService:
  type: LoadBalancer

This configuration requests an external IP address for both the primary and secondary services from the cluster's load-balancer provider.

Service Roles:

  • primaryService: Handles write operations to the main database node
  • secondaryService: Distributes read operations across read-only replicas

Retrieving Service IPs

After deployment, obtain the assigned external IP addresses:

kubectl get svc -n <namespace>

Look for services with TYPE=LoadBalancer and note their EXTERNAL-IP values. These IPs are required for MediaWiki database connectivity configuration.
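The EXTERNAL-IP values can also be extracted programmatically from the tabular kubectl output; a small sketch (the sample output below is illustrative, with placeholder names and IPs):

```python
# Parse `kubectl get svc` tabular output and collect the EXTERNAL-IP of
# every LoadBalancer service. The sample text is made up for illustration;
# in practice, feed in the real kubectl output.
sample = """\
NAME                TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)
mariadb-primary     LoadBalancer   10.0.12.3      203.0.113.10   3306/TCP
mariadb-secondary   LoadBalancer   10.0.12.4      203.0.113.11   3306/TCP
mariadb-internal    ClusterIP      10.0.12.5      <none>         3306/TCP
"""

def external_ips(kubectl_output: str) -> dict[str, str]:
    """Map service name -> EXTERNAL-IP for LoadBalancer services."""
    ips = {}
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 4 and cols[1] == "LoadBalancer":
            ips[cols[0]] = cols[3]
    return ips

print(external_ips(sample))
```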

Verification Steps

Confirm the deployment is healthy:

# Check MariaDB resource status
kubectl get mariadb -n <namespace>
kubectl describe mariadb -n <namespace>

# Verify all pods are running
kubectl get pods -n <namespace> -l app.kubernetes.io/name=mariadb

# Test database connectivity
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "SELECT 1;"

Replication Management

How Galera Replication Works

The database uses Galera clustering for synchronous replication. Galera provides:

  • Multi-master synchronous replication
  • Automatic node failure detection and recovery
  • Consistent data across all nodes

Known Limitations

Critical Issue: User Privilege Replication

Galera has a significant limitation, documented in the mariadb-operator's known backup limitations: the mysql.global_priv table (which stores users and grants) is not replicated between nodes.

Impact:

  • User accounts may not exist on all nodes after pod restarts
  • Can cause authentication failures during cluster recovery
  • Particularly affects the root user, causing startup issues

Mitigation Strategy:

  1. Always include rootPasswordSecretKeyRef in the mariadb.yaml definition
  2. Use User and Grant CRDs for additional database users (implemented for sqluser)
  3. Monitor cluster health after any pod restarts
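For point 2, the operator-managed users can be declared alongside the database itself; a sketch of a User CRD (field names follow the mariadb-operator User CRD, the secret name is an assumption, and the portal's actual definition for sqluser lives in the portal-k8s repository):

```yaml
# Illustrative User CRD sketch -- the operator recreates this user on the
# cluster itself, sidestepping the mysql.global_priv replication gap.
# Secret name/key are assumptions.
apiVersion: k8s.mariadb.com/v1alpha1
kind: User
metadata:
  name: sqluser
spec:
  mariaDbRef:
    name: mariadb
  passwordSecretKeyRef:
    name: sqluser-password
    key: password
  host: "%"
```

A matching Grant CRD (see the MediaWiki Integration section) assigns the user its privileges.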

Monitoring Replication Health

Check replication status across all nodes:

# Open a shell on the node and connect to MariaDB
kubectl exec -it -n <namespace> (production|staging)-mariadb-0 -- bash
mariadb -u root -p

# At the MariaDB prompt, check replication status
SHOW STATUS LIKE 'wsrep%';
SHOW STATUS LIKE 'wsrep_local_state_comment';

Healthy nodes report wsrep_local_state_comment as Synced.

Troubleshooting

Common Issues and Solutions

Root User Replication Problem

Symptoms:

  • Pods fail to start after a restart, or keep restarting
  • Authentication errors in pod logs

Root Cause: The root user is not replicated between Galera nodes due to the mysql.global_priv table limitation.

Solution: Manually recreate the root user on the primary node to trigger replication:

-- Connect to the primary pod (e.g., staging-mariadb-0)
-- Replace <root_password> with the actual password from your secret

DROP USER IF EXISTS 'root'@'localhost';
DROP USER IF EXISTS 'root'@'%';
CREATE USER 'root'@'localhost' IDENTIFIED BY '<root_password>';
CREATE USER 'root'@'%' IDENTIFIED BY '<root_password>';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

Debugging Steps

Follow this systematic approach when a database issue has no obvious cause:

1. Check MariaDB Resource Status

kubectl get mariadb -n <namespace>
kubectl describe mariadb -n <namespace>

2. Examine Pod Status

kubectl get pods -n <namespace> -l app.kubernetes.io/name=mariadb
kubectl logs -n <namespace> mariadb-0 --tail=100

3. Verify Galera Cluster Health

kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

MediaWiki Integration

MediaWiki Configuration

MediaWiki requires specific database privileges for replication to function correctly. The required privileges are defined in grant.yaml:

  • SLAVE MONITOR: Allows MediaWiki to check replication lag
  • BINLOG MONITOR: Enables binary log monitoring for replication status
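Expressed as a mariadb-operator Grant CRD, such an entry could look like this (a sketch: field names follow the Grant CRD, but the resource name is illustrative and the real definition is in grant.yaml):

```yaml
# Illustrative Grant CRD sketch for the replication-monitoring privileges.
# The metadata name is an assumption; see grant.yaml for the real resource.
apiVersion: k8s.mariadb.com/v1alpha1
kind: Grant
metadata:
  name: mediawiki-replication-grant
spec:
  mariaDbRef:
    name: mariadb
  privileges:
    - "SLAVE MONITOR"    # lets MediaWiki check replication lag
    - "BINLOG MONITOR"   # lets MediaWiki read binary log status
  database: "*"
  table: "*"
  username: sqluser
```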

Database Connection Configuration

Configure MediaWiki to use both primary and secondary services in LocalSettings.php:

$wgLBFactoryConf = array(
   'class' => 'LBFactoryMulti',
   
   'sectionsByDB' => array(
       'my_wiki' => 's1', 
   ),
   
   'sectionLoads' => array(
       's1' => array(
           '<primaryService-IP>'   => 0,    // Primary (write) node
           '<secondaryService-IP>' => 50,   // Read replica with load weight
       ),
   ),
   
   'serverTemplate' => array(
       'dbname'     => $wgDBname,
       'user'       => $wgDBuser,
       'password'   => $wgDBpassword,
       'type'       => 'mysql',
       'flags'      => DBO_DEFAULT,
       'max lag'    => 30,              // Maximum acceptable replication lag
   ),
);

Load Balancing Explanation:

  • 0 for primary: Receives all writes; a load of 0 keeps general read traffic off the primary
  • 50 for secondary: Loads are relative weights, not percentages, so with a single nonzero entry the secondary serves all reads (higher numbers = more load)
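The weighting behaves like weighted random selection over the nonzero entries; a toy sketch of the idea (not MediaWiki's actual LBFactoryMulti implementation):

```python
import random

# Section loads as configured above: the primary has weight 0, so it is
# never picked for general reads; the secondary takes all of them.
section_loads = {
    "<primaryService-IP>": 0,
    "<secondaryService-IP>": 50,
}

def pick_read_server(loads: dict[str, int]) -> str:
    """Weighted random choice among servers with a nonzero load."""
    candidates = {host: w for host, w in loads.items() if w > 0}
    hosts = list(candidates)
    weights = list(candidates.values())
    return random.choices(hosts, weights=weights, k=1)[0]

# With a single nonzero weight, every read lands on the secondary.
print(pick_read_server(section_loads))
```

Adding a second replica with weight 50 would split reads evenly between the two replicas while still keeping them off the primary.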

Backup and Recovery

Automated Backup Process

Backups are automatically scheduled every 24 hours, with retention of 7 daily copies. The mariadb-operator manages backup creation and storage in S3, as configured in backup.yaml.
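The schedule and retention are declared on a Backup CRD; a sketch of the shape such a definition takes (field names follow the mariadb-operator Backup CRD, while the cron expression, bucket, and secret names are assumptions -- see the actual backup.yaml for the real values):

```yaml
# Illustrative Backup CRD sketch -- cron time, bucket, and secret names
# are assumptions, not the portal's real backup.yaml.
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb
  schedule:
    cron: "0 3 * * *"      # once every 24 hours
  maxRetention: 168h       # keep 7 days of daily copies
  storage:
    s3:
      bucket: backups
      endpoint: hsm-test-09.zib.de:9001
      accessKeyIdSecretKeyRef:
        name: backup-s3
        key: access-key
      secretAccessKeySecretKeyRef:
        name: backup-s3
        key: secret-key
```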

Backup Naming Convention:

  • Production backups: production-*
  • Staging backups: staging-*

Backup Verification

List available backups in S3:

s3cmd --host=hsm-test-09.zib.de:9001 \
     --host-bucket=hsm-test-09.zib.de:9001 \
     --region=us-east-1 \
     --access_key=<access_key> \
     --secret_key=<secret_key> \
     ls s3:// --human-readable-sizes

Database Restore Procedures

Prerequisites:

  • Database cluster must be running and healthy
  • Sufficient disk space for restore operation
  • Backup file must be accessible

Step 1: Copy Backup to Primary Pod

# Copy backup to the primary pod (only pod with write permissions)
kubectl cp ./portal_db_backup.gz <namespace>/mariadb-0:/var/lib/mysql/

Step 2: Prepare Backup File

For Large Backups (Recommended):

# Connect to primary pod
kubectl exec -it -n <namespace> mariadb-0 -- bash

# Navigate to MySQL data directory
cd /var/lib/mysql

# Stream decompression to avoid memory issues
gunzip -c portal_db_backup.gz > portal_backup

For Small Backups:

# Direct decompression
gzip -d portal_db_backup.gz

Step 3: Load Backup into Database

# Load backup with memory optimization
mariadb --max_allowed_packet=1G -u root -p my_wiki < portal_backup

Troubleshooting Import Issues:

If you encounter ERROR at line 1: Unknown command '\-':

# Remove problematic first line
sed -i '1d' portal_backup
# Retry import
mariadb --max_allowed_packet=1G -u root -p my_wiki < portal_backup

Step 4: Monitor Disk Space During Restore

Binary log files (prod-mariadb-bin.000001, prod-mariadb-bin.000002, etc.) are generated during replication and can quickly fill available space.

Check Replication Status:

-- Run on replica nodes to check log processing status
SHOW ALL SLAVES STATUS\G

Identify Safe-to-Delete Logs: Look for Master_Log_File and Relay_Master_Log_File values. Any binary log files with numbers below these can be safely purged.
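The numeric comparison can be sketched as follows (a toy illustration of the rule, not a tool used by the portal; in practice take the minimum Master_Log_File / Relay_Master_Log_File across all replicas as the oldest log still in use):

```python
# Given the oldest binlog still referenced by any replica, every file
# with a strictly lower sequence number is safe to purge.

def purgeable_logs(all_logs: list[str], oldest_in_use: str) -> list[str]:
    """Return binlog files numbered strictly below the oldest one in use."""
    def seq(name: str) -> int:
        # e.g. 'prod-mariadb-bin.000002' -> 2
        return int(name.rsplit(".", 1)[1])
    return [log for log in all_logs if seq(log) < seq(oldest_in_use)]

logs = [f"prod-mariadb-bin.{n:06d}" for n in range(1, 6)]
print(purgeable_logs(logs, "prod-mariadb-bin.000004"))
```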

Purge Old Binary Logs:

-- Replace XX with the appropriate log number
PURGE BINARY LOGS TO 'prod-mariadb-bin.0000XX';

Step 5: Verify Restore Success

# Check database integrity
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
 USE my_wiki;
 SHOW TABLES;
 SELECT COUNT(*) FROM <key_table>;
"

# Verify replication is working
kubectl exec -it -n <namespace> mariadb-1 -- mariadb -u root -p -e "
 SHOW SLAVE STATUS\G
"

Maintenance

Regular Tasks

  • Check backup completion status
  • Review pod resource usage
  • Verify backup restore procedures in staging
  • Clean up old binary log files if accumulating

Performance Monitoring

Key metrics to monitor:

# Check replication lag
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
 SHOW STATUS LIKE 'wsrep_flow_control_paused';
 SHOW STATUS LIKE 'wsrep_local_recv_queue';
"

# Monitor connection usage
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
 SHOW STATUS LIKE 'Threads_connected';
 SHOW STATUS LIKE 'Max_used_connections';
"

Log Management

Galera and MariaDB generate extensive logs. Regular cleanup prevents disk space issues:

# Check current log sizes
kubectl exec -n <namespace> mariadb-0 -- sh -c "du -sh /var/lib/mysql/prod-mariadb-bin.*"

# Automated cleanup (run carefully)
kubectl exec -it -n <namespace> mariadb-0 -- mariadb -u root -p -e "
 PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);
"