Backing up to S3
I don't think I need to tell you how important it is to back up your production database. That said, a replica set is NOT what I mean by "backing up". A replica set is always in sync, so if something bad happens to your data, the bad data is synced to the replicas too. What you need is a scheduled backup for the worst-case scenario. And what could be better suited to store a database dump than S3? So here we're going to learn how to back up a MongoDB database to S3, but you can apply the same approach to back up any other database (MySQL, Postgres, ...).
This article is aimed at people who host their database themselves: people who rent servers, install MongoDB on them, and maintain the setup on their own. There are also managed MongoDB offerings, such as Atlas, that take care of these concerns for you.
The backup script
Let's start with the final script and then take it apart piece by piece.
```bash
#!/bin/bash

# Set up necessary variables
backup_name=backup_`date +%Y-%m-%d-%H%M`
backup_path=~/s3-backups/$backup_name
log_path=~/s3-backups/$backup_name.log
s3_location=s3://my-backups/$backup_name

# Dump the database
mongodump --out $backup_path &> $log_path

# Upload to S3
aws s3 cp $backup_path $s3_location --recursive &>> $log_path

# Send parts of the logs by email to check if everything went well
grep -hnr "done dumping" $log_path | mail -s "Backup Status: Dumped Collections" youremail@example.com
aws s3 ls $s3_location --recursive | wc -l | mail -s "Backup Status: Upload" youremail@example.com

# Cleanup
rm -rf $backup_path
```
So what we're doing here is:

- Set up the variables we need
- Run `mongodump`. If you're using another database system, this will be a different command (see the sketch after this list)
- Upload the dump to S3
- Send an email to verify that everything worked (optional)
- Remove the dump, so you don't run out of disk space
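For illustration, here's roughly what the dump and upload steps might look like for Postgres instead of MongoDB. This is just a sketch; the database name `mydb` is an assumption, and since `pg_dump` writes a single file, the upload doesn't need `--recursive`:

```bash
#!/bin/bash
backup_name=backup_`date +%Y-%m-%d-%H%M`
backup_path=~/s3-backups/$backup_name.sql
log_path=~/s3-backups/$backup_name.log
s3_location=s3://my-backups/$backup_name.sql

# Dump a single Postgres database (assumed name: mydb) to one SQL file
pg_dump mydb > $backup_path 2> $log_path

# Upload the single file to S3
aws s3 cp $backup_path $s3_location &>> $log_path
```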
The steps needed to make the script work
To make this work, we still have a few missing parts. You will need to:
- Create an S3 bucket "my-backups".
- Create a Lifecycle Rule for your bucket (optional but recommended). You can read more about creating lifecycle rules at: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html. I've created a rule that archives backups to AWS Glacier Deep Archive after one week and permanently erases them after two months.
- In AWS IAM, create a new policy and attach it to a new user created just for these backups; you'll need that user's Access Key and Secret in a moment (see the sketch after this list):
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::my-backups" ] }, { "Effect": "Allow", "Action": [ "s3:PutObject" ], "Resource": [ "arn:aws:s3:::my-backups/*" ] } ] }
- Install the AWS CLI. On Ubuntu you can run `sudo apt-get install awscli` to do so. Then run `aws configure` to grant the script access to AWS. Configure it with the Access Key and Secret you obtained for the user created previously.
- Install the email client (optional). I've described how this works in a separate article here: Sending Emails from Ubuntu
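If you prefer the command line over the AWS console for the IAM steps, a sketch like this should do it. The user name `mongodb-backup` and the file `policy.json` (containing the policy above) are assumptions for this example:

```bash
# Create a dedicated backup user and attach the policy above to it
aws iam create-user --user-name mongodb-backup
aws iam put-user-policy --user-name mongodb-backup \
  --policy-name s3-backup-policy \
  --policy-document file://policy.json

# Generate the Access Key and Secret to plug into `aws configure`
aws iam create-access-key --user-name mongodb-backup
```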
You can now check if everything is working correctly by running your script: `./backup-s3.sh` (after you've run `chmod +x ./backup-s3.sh`).
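To verify the run, you can inspect the log file and list what actually arrived in the bucket:

```bash
# Check the mongodump and upload logs
cat ~/s3-backups/backup_*.log

# List everything that was uploaded to the bucket
aws s3 ls s3://my-backups/ --recursive
```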
The cron job
Now there is one missing piece of the puzzle: your script needs to be scheduled! This is where a cron job comes in handy. You can set up a new cron job by running `crontab -e`. Then insert the following:
```
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
0 * * * * ./backup-s3.sh
```
The first line is important; without it, the cron job can't find the aws CLI. This entry runs the backup once an hour. You can adapt it to your needs.
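For example, to back up once a day at 6 am, or once a week on Sunday at 3 am, the schedule line would look like this instead:

```
# Every day at 06:00
0 6 * * * ./backup-s3.sh

# Every Sunday at 03:00
0 3 * * 0 ./backup-s3.sh
```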
Well, that's it! Now you have backups, and you'll also be informed about their status. Of course, a status update once an hour might get a bit annoying. You can change the backup script so that it only sends the mails once a day:
if [[ "$backup_name" == *"0600" ]]; then # send the mail fi
This would only send the logs generated at 6 am, since the hour and minute are part of the backup name.
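Put together, the mail section of the script might then look like this; the two mail commands are the ones from the script above:

```bash
# Only send the status mails for the 6 am run
if [[ "$backup_name" == *"0600" ]]; then
  grep -hnr "done dumping" $log_path | mail -s "Backup Status: Dumped Collections" youremail@example.com
  aws s3 ls $s3_location --recursive | wc -l | mail -s "Backup Status: Upload" youremail@example.com
fi
```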
Final Notes
You should run this script on a replica server and not on the primary database server; this keeps the load off the critical machine. See for example this Stack Overflow discussion.
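If you can't run the script directly on a secondary, `mongodump` can also be pointed at one remotely. The replica set name `rs0` and the host name in this sketch are assumptions:

```bash
# Dump from a secondary instead of the primary (rs0 and the host are placeholders)
mongodump --host "rs0/secondary1.example.com:27017" \
  --readPreference=secondary \
  --out $backup_path &> $log_path
```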