Project:S3-K8s
MediaWiki S3 Image Storage with Proxy
This documentation describes how to configure MediaWiki to store images in an S3 bucket that's only accessible within the ZIB VPN, using a public proxy service to make images accessible to external users.
Architecture Overview
The solution consists of three main components:
- MediaWiki with AWS Extension: Stores uploaded images directly in S3 instead of on the local filesystem
- S3 Bucket: Private storage accessible only within ZIB's VPN
- S3 Proxy Service: Public-facing service that authenticates with S3 and serves images to external users
User Upload → MediaWiki (AWS Extension) → Private S3 Bucket
↓
External User → S3 Proxy Service → Private S3 Bucket
Benefits
- Scalable Storage: Images are stored in S3 instead of on the local filesystem
- Automatic Backup: S3 provides built-in redundancy and backup
- Stateless MediaWiki: MediaWiki replicas can be created or destroyed without data loss
- Public Access: External users can view images even though the S3 bucket is private
- Security: The S3 bucket remains private within the institution's network
MediaWiki Configuration
Install AWS Extension
Follow the documentation on GitHub for the AWS extension.
Add the configuration in LocalSettings.php. Our specific configuration currently includes:
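wfLoadExtension( 'AWS' );
// Credentials are read from the environment rather than hard-coded.
$wgAWSCredentials = [
    'key' => getenv('S3_IMAGES_KEY'),
    'secret' => getenv('S3_IMAGES_SECRET'),
    'token' => false
];
// Custom S3 endpoint; path-style URLs put the bucket name in the path, not the hostname.
$s3endpoint = getenv('S3_ENDPOINT');
$wgFileBackends['s3']['endpoint'] = 'https://' . $s3endpoint;
$wgFileBackends['s3']['use_path_style_endpoint'] = true;
$wgAWSRegion = 'default';
$wgAWSBucketName = 'mardi-portal';
// Public hostname of the S3 proxy; MediaWiki builds image URLs from it.
$wgAWSBucketDomain = 'images.' . getenv('WIKIBASE_HOST');
// Per-environment prefix inside the bucket (must start with "/").
$wgAWSBucketTopSubdirectory = "/" . getenv('S3_ENVIRONMENT');
// Hashed directory layout, e.g. a/ab/File.png for regular files.
$wgAWSRepoHashLevels = '2';
$wgAWSRepoDeletedHashLevels = '3';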
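With two hash levels and a top subdirectory, uploaded files end up under a hashed prefix inside the bucket. The following Python sketch shows how the expected object key for a regular (non-deleted) file could be computed when checking the bucket by hand. It assumes MediaWiki's standard hashed layout (directories taken from the MD5 of the file name), that the extension places the top subdirectory before the hashed directories, and that the name is already normalized with a capitalized first letter. The function name `expected_s3_key` and the `staging` value are hypothetical.

```python
import hashlib


def expected_s3_key(filename: str, top_subdir: str = "", hash_levels: int = 2) -> str:
    """Return the object key expected for a regular file under a hashed layout.

    Assumption: directories are the first 1..hash_levels hex characters of the
    MD5 of the file name, as in MediaWiki's standard hashed upload layout.
    """
    # MediaWiki stores file names with underscores instead of spaces.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Level 1 -> "a", level 2 -> "a/ab", level 3 -> "a/ab/abc".
    hashed_dirs = [digest[: i + 1] for i in range(hash_levels)]
    prefix = top_subdir.strip("/")
    return "/".join(([prefix] if prefix else []) + hashed_dirs + [name])


if __name__ == "__main__":
    # "staging" stands in for the value of S3_ENVIRONMENT (hypothetical).
    print(expected_s3_key("Example.jpg", "/staging"))
```

The resulting key can be compared with the output of the `s3cmd ls` check in the Verification section.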
S3 Proxy Service
The S3 proxy service acts as a public gateway to our private S3 bucket, handling authentication and serving images with appropriate headers and caching. The public repository for the S3 proxy can be found here.
Key Features
- Content Type Detection: Automatically determines MIME types based on file extensions
- Caching Headers: Sets appropriate cache headers for better performance
- Conditional Requests: Supports ETag and Last-Modified headers for efficient caching
- Security Headers: Adds security headers for different content types
- Health Checks: Provides health check endpoint for Kubernetes monitoring
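The features listed above can be illustrated with a small Python sketch. This is not the proxy's actual code. It only shows one way to derive the content type from the file extension, attach caching and security headers, and answer a conditional request with 304 when the client's `If-None-Match` value equals the object's ETag. The function name `build_response`, the default `max_age`, and the specific header values are assumptions.

```python
import mimetypes


def build_response(key, etag, if_none_match=None, max_age=86400):
    """Return (status, headers) for an object key, honouring If-None-Match.

    Simplification: the ETag comparison is an exact string match; a real
    implementation would also handle lists and weak validators.
    """
    # Guess the MIME type from the file extension; fall back to a generic type.
    content_type, _ = mimetypes.guess_type(key)
    headers = {
        "Content-Type": content_type or "application/octet-stream",
        "ETag": etag,
        "Cache-Control": f"public, max-age={max_age}",
        # Ask browsers not to second-guess the declared content type.
        "X-Content-Type-Options": "nosniff",
    }
    if if_none_match is not None and if_none_match == etag:
        # The client's cached copy is still valid, so no body is needed.
        return 304, headers
    return 200, headers


if __name__ == "__main__":
    print(build_response("a/a9/Example.jpg", '"abc123"'))
    print(build_response("a/a9/Example.jpg", '"abc123"', if_none_match='"abc123"'))
```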
Environment Variables
The proxy service requires these environment variables:
S3_REGION=region
S3_ENDPOINT=https://s3-endpoint
S3_BUCKET_NAME=bucket-name
S3_ACCESS_KEY_ID=access-key
S3_SECRET_ACCESS_KEY=secret-key
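A missing variable is easier to diagnose if it is reported at startup rather than at the first S3 request. The following Python sketch checks the five variables listed above. The helper name `missing_vars` is hypothetical, and whether the proxy itself performs such a check is not stated here.

```python
import os

# The variables listed in this section.
REQUIRED_VARS = (
    "S3_REGION",
    "S3_ENDPOINT",
    "S3_BUCKET_NAME",
    "S3_ACCESS_KEY_ID",
    "S3_SECRET_ACCESS_KEY",
)


def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```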
Kubernetes Deployment
The proxy's container image is deployed using a Helm chart defined in our Kubernetes repository.
The environment variables listed above must also be stored as secrets in the cluster.
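Kubernetes stores the values in a Secret's `data` field base64-encoded. As a rough illustration, the Python sketch below builds such a manifest as JSON, which `kubectl apply -f` also accepts. The secret name `s3-proxy-credentials` and the placeholder values are hypothetical, and how the Helm chart references the secret is not covered here.

```python
import base64
import json


def secret_manifest(name, values):
    """Build a Kubernetes Secret manifest (as a dict) from plain-text values."""
    # Secret "data" values must be base64-encoded strings.
    data = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": data,
    }


if __name__ == "__main__":
    # Placeholder values; replace with the real credentials before applying.
    manifest = secret_manifest(
        "s3-proxy-credentials",  # hypothetical secret name
        {"S3_ACCESS_KEY_ID": "access-key", "S3_SECRET_ACCESS_KEY": "secret-key"},
    )
    print(json.dumps(manifest, indent=2))
```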
Configuration Values
The deployment can be customized by modifying values.yaml:
image:
  repository: ghcr.io/mardi4nfdi/s3-proxy
  tag: main
  pullPolicy: Always
replicas: 2
servicePort: 80
containerPort: 8000
ingress:
  host: "images.your-domain.com"
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
Verification
Test S3 Proxy
# Check health endpoint
curl https://images.your-domain.com/health
# Test image access
curl -I https://images.your-domain.com/path/to/image.jpg
Test MediaWiki Integration
- Check that thumbnails are shown under Special:ListFiles
- Upload an image through the MediaWiki interface at Special:Upload
- Verify the image appears correctly on wiki pages
- Check that the image URL points to the proxy domain
- Confirm the image is stored in the S3 bucket using s3cmd:
s3cmd --host=<endpoint> --host-bucket=<endpoint> --region=<region> --access_key=<access_key> --secret_key=<secret_key> ls s3://your-bucket-name/
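The check that image URLs point to the proxy domain can also be scripted. The Python sketch below compares the hostname of an image URL (for example, one copied from a rendered wiki page) with the expected proxy host. The helper name `served_by_proxy` is hypothetical, and `images.your-domain.com` is the placeholder host used elsewhere on this page.

```python
from urllib.parse import urlparse


def served_by_proxy(image_url, proxy_host):
    """Return True if image_url's hostname equals the proxy host."""
    host = urlparse(image_url).hostname
    # Relative URLs have no hostname and therefore do not point at the proxy.
    return host is not None and host.lower() == proxy_host.lower()


if __name__ == "__main__":
    url = "https://images.your-domain.com/path/to/image.jpg"
    print(served_by_proxy(url, "images.your-domain.com"))
```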
Monitoring
The S3 proxy includes health check endpoints and structured logging. Monitor the following metrics in Grafana:
- Response times and error rates
- S3 API call success/failure rates
- Cache hit/miss ratios
- Resource usage (CPU/memory)