Project:MariaDB-K8s
MariaDB configuration
Set up the DB
The main database of the portal is based on MariaDB. Our MariaDB instance is deployed an managed using mariadb-operator, which allows to use Custom Resource Definitions (CRDs) in Kubernetes to declaratively manage our database. In simple terms, this allows us to define all our DB resources (databases, uses, grants...) using YAML files.
The MariaDB operator takes care of the DB replication to spin up a main node and several other read replicas.
The following definition in mariadb.yaml assigns an external IP to access the DB from outside the cluster.
primaryService:
type: LoadBalancer
secondaryService:
type: LoadBalancer
primaryService represents the write node whereas secondaryService refers to the read-only replicas. With
kubectl get svc
it is possible to see which specific IP was assigned to these services. These IPs are necessary to set up the connection between the MediaWiki container and the DB.
MediaWiki configuration
MediaWiki requires its SQL user to have the SLAVE MONITOR and BINLOG MONITOR privileges in order for replication to work. These privileges are thus defined in grant.yaml.
The connection between MediaWiki and the different DB nodes can be specified with the following parameter in LocalSettings.php, substituting primaryService-IP and secondaryService-IP by the right values:
$wgLBFactoryConf = array(
'class' => 'LBFactoryMulti',
'sectionsByDB' => array(
'my_wiki' => 's1',
),
'sectionLoads' => array(
's1' => array(
'<primaryService-IP>' => 0,
'<secondaryService-IP>' => 50,
),
),
'serverTemplate' => array(
'dbname' => $wgDBname,
'user' => $wgDBuser,
'password' => $wgDBpassword,
'type' => 'mysql',
'flags' => DBO_DEFAULT,
'max lag' => 30,
),
);
Backup
- Backups for the database are scheduled every 24h. We simultaneously keep 7 copies for the last 7 days.
- MariaDB-operator takes care of creating the backup and pushing them into an S3 bucket, as defined in backup.yaml.
- Production backups are prefixed with
production, staging backups use the prefixstaging. - The files in the S3 bucket can examined using s3cmd:
s3cmd --host=hsm-test-09.zib.de:9001 --host-bucket=hsm-test-09.zib.de:9001 --region=us-east-1 --access_key=<access_key> --secret_key=<secret_key> ls s3:// --human-readable-sizes
Restore a database to the cluster
To restore or load a backup file once the DB is running.
1) Copy the backup file inside the main DB pod. The main pod is the only one with write rights. The file can be copied into the volume attached to the mariadb-0 pod, which will have enough space for it. This is mounted to /var/lib/mysql:
kubectl cp ./portal_db_backup.gz (production|staging)/mariadb-0:/var/lib/mysql
2) Unzip and load the file into the database:
Unzipping directly will probably fail with out of memory errors when the backup file is big.
cd /var/lib/mysql
gzip -d portal_db_backup.gz
Therefore it might be better to stream the compressed file into a new file:
gunzip -c portal_db_backup.gz > portal_backup
3) The backup can then be loaded into the database using:
mariadb --max_allowed_packet=1G -u root -p my_wiki < portal_backup
The --max_allowed_packet can be tweaked to avoid memory errors that could crash the pod.
If this command results in the error
ERROR at line 1: Unknown command '\-'.
it might be fixed by just editing the first line in the file with
sed -i '1d' portal_backup
4) During the restoration of the database, it is important to monitor the available disk space. Several log files, numbered from prod-mariadb-bin.000001 onwards, will be generated during replication and can quickly fill up the disk.
To check whether these files can already be deleted, one should access the replicas and run
SHOW ALL SLAVES STATUS\G
to check whether the log file has already been processed. For this, check the both current Master_Log_File or Relay_Master_Log_File. Any log files below the number mentioned in these two fields can be already deleted with:
PURGE BINARY LOGS TO 'prod-mariadb-bin.0000XX';