Originally published at: https://roots.io/guides/backing-up-trellis-sites-to-an-s3-bucket/
Add backup shell script
Create site/scripts/backup-to-s3.sh with the following contents:
#!/bin/bash
eval $(cat ../.env | sed 's/^/export /')
export AWS_CONFIG_FILE="/home/web/.aws/config"
SITE="${DB_USER//_/.}"
ENVIRONMENT="$WP_ENV"
TIMESTAMP=`env TZ=America/Denver date +%Y-%m-%d-%H%M`
ARCHIVE_PATH=/tmp/$SITE-$ENVIRONMENT-$TIMESTAMP
ARCHIVE_FILENAME=$SITE-$ENVIRONMENT-$TIMESTAMP.tar.gz
mkdir -p $ARCHIVE_PATH &&
cd /srv/www/$SITE/current && wp db export $ARCHIVE_PATH/db.sql &&
rsync -kavzP --exclude web/wp/ --exclude web/wp-config.php /srv/www/$SITE/current/web $ARCHIVE_PATH &&
rsync -kavzP /srv/www/$SITE/shared/uploads $ARCHIVE_PATH/web/app &&
tar…
Thanks, @ben. You rock.
Parameters
I believe that the trellis/group_vars/all/vault.yml parameters should be:
aws_access_key_id: xxxxxxx
aws_secret_access_key: "xxxxxxx"
This way the dstil aws-cli template will be able to grab them:
vendor/roles/aws-cli/aws_cli_config.js:
[default]
output = {{ aws_output_format }}
region = {{ aws_region }}
aws_access_key_id = {{ aws_access_key_id }}
aws_secret_access_key = {{ aws_secret_access_key }}
(I’m not sure if region matters, and not sure where the first two parameters would belong; perhaps as group_vars/production/wordpress_sites.yml env variables?)
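For what it's worth, one guess for where those first two could live (not something from the guide; group_vars values override role defaults, so any group_vars file should work) is something like:
# group_vars/all/main.yml (hypothetical placement)
aws_output_format: json
aws_region: us-east-1  # whichever region your backup bucket lives in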
Shell Script
The name of the script referenced in the cron job is backup-to-s3.sh, not backup.sh.
Can we note that the s3 bucket referenced in the third line of the script must be modified to match an S3 bucket that:
- Has been created manually by the user, either via the AWS interface or some other tool like awscli or s3cmd (see the example command after this list).
- Has a name that is unique among all S3 buckets, since they all share the same namespace.
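For example, creating the bucket with awscli might look like this (the bucket name and region here are placeholders):
aws s3 mb s3://your-unique-namespace-site-backups --region us-east-1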
It would also be nice to remind folks that if their server is already provisioned and they want to save time they can just run the modified tasks:
ansible-playbook server.yml -e env=production --tags "wordpress-setup, aws-cli"
Two more steps:
- Add one parameter to group_vars/all/users.yml: aws_cli_user: web
  Otherwise, by default, the aws-cli credentials are set up for admin-user, while the cron script (backup-to-s3.sh) is owned and run by web:www-data.
- Permissions on backup-to-s3.sh need to be 755 (rwxr-xr-x, i.e. chmod +x backup-to-s3.sh), not 644, or ./backup-to-s3.sh won't work.
The following task, also added to wordpress-setup/tasks/main.yml:
- name: Update 'backup-to-s3.sh' permissions
  file:
    path: "{{ www_root }}/{{ item.key }}/{{ item.value.current_path | default('current') }}/scripts/backup-to-s3.sh"
    owner: "{{ web_user }}"
    group: "{{ web_group }}"
    mode: 0755
  with_dict: "{{ wordpress_sites }}"
Troubleshooting
On the production server:
- Read the output from the ansible provisioning process
- Confirm that the credentials from group_vars/all/vault.yml exist in /home/web/.aws/config
- As user web, run aws s3 ls s3://your-unique-namespace-site-backups
- As user web, run aws s3 cp some_arbitrary_file s3://your-unique-namespace-site-backups
- As user web, manually run bash /srv/www/example.com/current/scripts/backup-to-s3.sh
- Confirm that the file /etc/cron.d/backup-nightly-example_com exists and contains the following (see also the log check after this list):
0 12 * * * web cd /srv/www/example.com/current/scripts && ./backup-to-s3.sh > /dev/null 2>&1
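If the cron entry looks correct but backups still aren't showing up, checking the cron log can help narrow things down (assuming a stock Ubuntu box where cron logs to syslog):
grep CRON /var/log/syslog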
To debug the script, run ./backup-to-s3.sh from its directory:
cd /srv/www/my-site.com/current/scripts && ./backup-to-s3.sh
instead of
bash /srv/www/my-site.com/current/scripts/backup-to-s3.sh
Otherwise line 2 throws an error:
$ bash /srv/www/my-site.com/current/scripts/backup-to-s3.sh
cat: ../.env: No such file or directory
/srv/www/my-site.com/current/scripts/backup-to-s3.sh: line 10: cd: /srv/www//current: No such file or directory
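One workaround (my own tweak, not part of the original guide) is to have the script cd into its own directory before reading .env, so it can be run from anywhere:
#!/bin/bash
# change to the directory containing this script so ../.env resolves no matter where it's called from
cd "$(dirname "$0")"
eval $(cat ../.env | sed 's/^/export /')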
Also, I suggest adding this to the guide:
In vendor/roles/aws-cli/defaults/main.yml, change
aws_access_key_id: 'YOUR_ACCESS_KEY_ID'
aws_secret_access_key: 'YOUR_SECRET_ACCESS_KEY'
to
aws_access_key_id: '{{ vault_aws_access_key_id }}'
aws_secret_access_key: '{{ vault_aws_secret_access_key }}'
I believe these lines won’t make a difference as they are default values that get replaced by the values exported to the environment by the bash script:
export AWS_CONFIG_FILE="/home/web/.aws/config"
Thanks for sharing your findings; they're very helpful!
Something I ran across: when following step 2, the permissions do get updated. However, on a new deploy the file permissions get written as 664; the file modification only occurs on a server provision.
This means that the permissions-update task for the backup file should also be triggered during the deploy playbook, if I understand it correctly.
I then added the following task in roles/deploy/hooks/build-after.yml to make sure the file was set to 0755:
- name: Update 'day-backup-to-s3.sh' permissions
  file:
    path: "{{ deploy_helper.new_release_path }}/scripts/day-backup-to-s3.sh"
    mode: 0755
  with_dict: "{{ wordpress_sites }}"
And from then on, after each deploy the update worked as expected.
NOTE: requirements.yml is now galaxy.yml, so:
ansible-galaxy install -r galaxy.yml
If provisioning a server for the first time, the above task that sets permissions on the backup-to-s3.sh file fails. This is because the script hasn't been created yet via a deploy.
A simple option is to provision the server and deploy at least once without this task, then re-provision with the backup task in place. Another option is to have the Ansible task change permissions only if the file exists:
- name: Register if backup-to-s3.sh exists
  stat:
    path: "{{ www_root }}/{{ item.key }}/{{ item.value.current_path | default('current') }}/scripts/backup-to-s3.sh"
  register: stat_result
  with_dict: "{{ wordpress_sites }}"

- set_fact:
    files_stat: "{{ dict(my_keys|zip(my_stats)) }}"
  vars:
    my_keys: "{{ stat_result.results|map(attribute='item.key')|list }}"
    my_stats: "{{ stat_result.results|map(attribute='stat.exists')|list }}"

- name: Update 'backup-to-s3.sh' permissions
  file:
    path: "{{ www_root }}/{{ item.key }}/{{ item.value.current_path | default('current') }}/scripts/backup-to-s3.sh"
    owner: "{{ web_user }}"
    group: "{{ web_group }}"
    mode: 0755
  with_dict: "{{ wordpress_sites }}"
  when: files_stat[item.key]
What is happening above: in the first task, references to item (item.key, etc.) are pulled from the contents of the wordpress_sites dictionary defined in group_vars/[env]/wordpress_sites.yml, referenced via the with_dict parameter.
We use Ansible's stat module to register the path, and one of the results of that registration is whether or not the item exists. stat_result.results will then contain a list, which set_fact uses to create a new dictionary mapping each item.key to its associated stat.exists state (True or False). It does this by creating two lists (my_keys and my_stats) and zipping them together into a dictionary, assigned to the files_stat variable, which is available to subsequent tasks.
So the final task, "Update 'backup-to-s3.sh' permissions", checks the True or False status of files_stat[item.key] in its when condition and is skipped if the file doesn't exist.
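For example, with a single site defined in wordpress_sites, the resulting fact would look something like this (hypothetical values):
files_stat:
  example.com: true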
I will also note here, probably for my own future reference, that:
- debug:
    var: files_stat_or_any_other_key_from_wordpress_sites_dict
  with_dict: "{{ wordpress_sites }}"
is super useful in developing and debugging.
Thank you again, Stack Overflow.
Now that I'm updating an account to use a different AWS key, I'm finding that it does seem to be necessary to update vendor/roles/aws-cli/templates/aws_cli_credentials.yml.
In my case, I have the credentials within a dictionary, so I need to access them with dot notation:
aws_access_key_id = {{ vault_wordpress_env_defaults.aws_access_key_id }}
aws_secret_access_key = {{ vault_wordpress_env_defaults.aws_secret_access_key }}
Where the contents of group_vars/all/vault.yml look like:
vault_wordpress_env_defaults:
  delicious_brains_username: "blabla"
  delicious_brains_password: "bla bla"
  aws_access_key_id: blablabla
  aws_secret_access_key: "bla bla bla"
Additionally, the Ansible Galaxy aws-cli role has now been deprecated by its maintainer, so going forward you may need to use a fork or some other solution.
Having a new issue with this:
When the script tries to run, it fails with:
An error occurred (AccessDenied) when calling the CreateMultipartUpload operation
However, I can run the generated command successfully:
web@example: export AWS_CONFIG_FILE="/home/web/.aws/config"
web@example: /usr/local/bin/aws s3 cp /tmp/example.com-production-2022-05-09-1818.tar.gz s3://example-site-backups/
This quickly copies the file. The script doesn’t:
#!/bin/bash
eval $(cat ../.env | sed 's/^/export /')
export AWS_CONFIG_FILE="/home/web/.aws/config"
SITE="${DB_USER//_/.}"
ENVIRONMENT="$WP_ENV"
TIMESTAMP=`env TZ=America/New_York date +%Y-%m-%d-%H%M`
ARCHIVE_PATH=/tmp/$SITE-$ENVIRONMENT-$TIMESTAMP
ARCHIVE_FILENAME=$SITE-$ENVIRONMENT-$TIMESTAMP.tar.gz
mkdir -p $ARCHIVE_PATH
cd /srv/www/$SITE/current && wp db export $ARCHIVE_PATH/db.sql &&
rsync -kavzP --exclude web/wp/ --exclude web/wp-config.php /srv/www/$SITE/current/web $ARCHIVE_PATH &&
rsync -kavzP /srv/www/$SITE/shared/uploads $ARCHIVE_PATH/web/app &&
tar -C $ARCHIVE_PATH -czf /tmp/$ARCHIVE_FILENAME . &&
/usr/local/bin/aws s3 cp /tmp/$ARCHIVE_FILENAME s3://example-site-backups/ &&
rm -rf $ARCHIVE_PATH &&
rm /tmp/$ARCHIVE_FILENAME
Any ideas?
UPDATE: the credentials in the current/.env file, generated from group_vars/all/vault.yml, were incorrect.
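(For anyone hitting something similar: since the script exports everything in .env, any AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY values there take precedence over the config file. A quick sanity check, assuming you run it from the scripts directory, is to export the same environment and ask AWS who you are:)
eval $(cat ../.env | sed 's/^/export /')
aws sts get-caller-identity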
Curious if others are still utilizing this backup method, and:
- If not, what are you using?
- Any suggestions for dynamically cleaning up old daily backups?
For #2, take a look at S3 lifecycle rules for purging old backups.
Thanks, Ben.
For anyone else these tips might help: you can do it manually via the web dashboard, which is pretty immediate. I also played around a bit with doing it via aws-cli.
aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json
Where lifecycle.json might look like:
{
  "Rules": [
    {
      "Expiration": {
        "Days": 180,
        "ExpiredObjectDeleteMarker": true
      },
      "ID": "Delete files after 180 days",
      "Filter": {
        "And": {
          "Prefix": "test"
        }
      },
      "Status": "Enabled"
    }
  ]
}
(Of course there are more sophisticated options, like migrating to long-term storage, using tags, etc.)
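For instance, a rule transitioning objects to Glacier after 30 days (rather than deleting them) might look something like this; the day count and storage class here are arbitrary:
{
  "Rules": [
    {
      "ID": "Archive files after 30 days",
      "Filter": {},
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ]
    }
  ]
}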
Get the config with
aws s3api get-bucket-lifecycle --bucket my-bucket
Not to be confused with a similar aws cli command set, s3control, as opposed to s3api.
If you need to retrieve your account number:
aws sts get-caller-identity --query 'Account' --output text
@ben I can’t seem to get the original post up. The referenced URL keeps forwarding to this one.
@mZoo guides are currently in a weird state. They're being temporarily redirected while we do some work on docs, and due to how they were posted to Discourse from the WordPress site, they're functioning oddly on here.
You'll have to use archive.org at the moment to reach the original content.
Here's an updated script that does a few things slightly differently:
- Sends you an email if the backup fails
- Removes the archive files even if the backup fails, so they don't clog up storage
- Sets the storage class to Glacier Instant Retrieval, which is a less expensive storage class
#!/bin/bash
eval $(cat ../.env | sed 's/^/export /')
export AWS_CONFIG_FILE="/home/web/.aws/config"
SITE="${DB_USER//_/.}"
ENVIRONMENT="$WP_ENV"
TIMESTAMP=`env TZ=America/Denver date +%Y-%m-%d-%H%M`
ARCHIVE_PATH=/tmp/$SITE-$ENVIRONMENT-$TIMESTAMP
ARCHIVE_FILENAME=$SITE-$ENVIRONMENT-$TIMESTAMP.tar.gz
RESPONSE=$(mkdir -p $ARCHIVE_PATH &&
cd /srv/www/$SITE/current && wp db export $ARCHIVE_PATH/db.sql &&
rsync -kavzP --exclude web/wp/ --exclude web/wp-config.php /srv/www/$SITE/current/web $ARCHIVE_PATH &&
rsync -kavzP /srv/www/$SITE/shared/uploads $ARCHIVE_PATH/web/app &&
tar -C $ARCHIVE_PATH -czf /tmp/$ARCHIVE_FILENAME . &&
/usr/local/bin/aws s3 cp --storage-class GLACIER_IR /tmp/$ARCHIVE_FILENAME s3://some-site-backups/$ARCHIVE_FILENAME 2>&1)
if [ $? -gt 0 ] ; then
/usr/sbin/sendmail -i -t << MESSAGE_END
From: wordpress@example.com
To: admin@mystudio.com
Subject: There was an error running the backup script
Here's what happened:
$RESPONSE
-Your friend, shell.
MESSAGE_END
fi
rm -rf $ARCHIVE_PATH &&
rm /tmp/$ARCHIVE_FILENAME
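To confirm the storage class that actually got applied to an uploaded archive, something like this should work (the bucket matches the script above; the key is a placeholder for one of your archive filenames):
aws s3api head-object --bucket some-site-backups --key your-archive-filename.tar.gz --query StorageClass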