Failure in Test Acme Challenges task when re-adding a site

Hi there!

I have a problem when reprovisioning a site. To make it clearer, I will first describe what I want to achieve. At the moment we have a staging server that I provisioned with Trellis. We want to use this server as a preview server for our customers' websites, so we need to be able to add, delete, undelete and destroy sites. All sites use Let's Encrypt.

Adding a new site is straightforward:

  1. Set up Let's Encrypt
    $ ansible-playbook server.yml -e "env= site=" --tags letsencrypt
  2. Set up WordPress
    $ ansible-playbook server.yml -e "env= site=" --tags wordpress
  3. Deploy
    $ ansible-playbook deploy.yml -e "env= site="

To delete, undelete and destroy a site I wrote new Ansible playbooks.

The delete playbook takes all files for that site (the www folder, nginx confs, SSL certs and keys), makes a db dump, moves everything to a designated “delete folder” and then restarts nginx.

The destroy playbook then permanently deletes those files. These two playbooks work fine.

But I also want an undelete playbook that undoes the changes made by the delete playbook, and this is where it gets complicated. To re-add the site I do the following:

  1. Set up Let's Encrypt
    $ ansible-playbook server.yml -e "env= site=" --tags letsencrypt

  2. Set up WordPress
    $ ansible-playbook server.yml -e "env= site=" --tags wordpress

  3. Undelete site
    $ ansible-playbook undelete.yml -e "env= site="
    -> imports the db dump and moves the site back to /srv/www/

But it fails at step 1, on the letsencrypt task “Test Acme Challenges”, for the site I want to re-add. The task “Notify of challenge failures” then tells me that it cannot access the challenge file for that domain. Restarting nginx no longer works either. It says: “nginx: configuration file /etc/nginx/nginx.conf test failed”, “Process: 32077 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=1/FAILURE)” and “Failed to start A high performance web server and a reverse proxy server.”

So I am pretty sure that my delete playbook does not work properly. Since I cannot upload the playbook/role here, I will describe exactly what the role does:

  1. Create database dump and delete database
  2. Remove .conf files for that site from …nginx/sites-available and remove symlink from …nginx/sites-enabled
  3. Remove cron job file from …cron.d folder
  4. Remove certificates and keys for that site from …nginx/ssl/letsencrypt
  5. Restart nginx

On Monday (in three days) I have to present my solution for our dedicated web server at my office, so I would really appreciate it if someone could help me out here.

Thanks in advance!

  • Philipp

I don’t know the exact issue here, but for a LE challenge to work, the following has to be true:

  1. DNS exists for that hostname and points to your server
  2. Path to the site exists on the server (/srv/www/site_name/current)

So I’m assuming that your delete/undelete process isn’t completing that 2nd requirement?
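
The two requirements above can be checked by hand before running the letsencrypt tag. Here is a minimal sketch, assuming a Linux server where `getent` is available. The function names and the `WEB_ROOT` variable are made up for illustration and are not part of Trellis. Note that the DNS check only confirms that the hostname resolves, not that it points to this particular server.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks; names are illustrative, not part of Trellis.
# WEB_ROOT is an assumed variable that defaults to the path mentioned above.
WEB_ROOT="${WEB_ROOT:-/srv/www}"

# Requirement 1: succeeds if the hostname resolves at all.
# It does not verify that the record points to this server.
dns_exists() {
  getent hosts "$1" > /dev/null
}

# Requirement 2: succeeds if the site's "current" directory exists.
site_path_exists() {
  [ -d "$WEB_ROOT/$1/current" ]
}

# Run both checks and report which one fails.
check_site() {
  local host="$1" site="$2"
  dns_exists "$host" || { echo "no DNS record found for $host"; return 1; }
  site_path_exists "$site" || { echo "missing $WEB_ROOT/$site/current"; return 1; }
  echo "ok: $host"
}
```

Example call, with a placeholder hostname and site name: `check_site example.com example.com`.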

Thanks for your reply.
I investigated a bit further and it seems to be a Let's Encrypt (Diffie-Hellman) group problem. I am a total newbie when it comes to SSL, but what I found out is that after I delete a site and its SSL certs/keys, I need to generate a new Diffie-Hellman group and also regenerate the SSL keys for all the sites, and then it works again. I still have to investigate a bit more to get a better understanding of what exactly is going on.

Just FYI …
I adapted my playbooks. I guess the problem was that if you delete a website and then run the letsencrypt task, it fails because of the requirements @swalkinshaw mentioned. That is why I now have a site-enable, a site-disable and a site-destroy playbook. For the latter you need to delete the entry for that site in wordpress_sites.yml and vault.yml. The enabling and disabling playbooks just set or delete the symlink in sites-enabled. That seems to work…
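
For anyone who wants to try the same idea by hand, here is a minimal sketch of what enabling and disabling boils down to, written as plain shell rather than Ansible tasks. The function names, the `NGINX_DIR` variable and the `example.com.conf` file name are assumptions for illustration and are not taken from the actual playbooks.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the actual playbooks.
# NGINX_DIR is an assumed variable; it defaults to /etc/nginx.
NGINX_DIR="${NGINX_DIR:-/etc/nginx}"

# Enable a site: link its config from sites-available into sites-enabled.
site_enable() {
  local conf="$1"
  ln -sfn "$NGINX_DIR/sites-available/$conf" "$NGINX_DIR/sites-enabled/$conf"
}

# Disable a site: remove only the symlink.
# The config, certificates and site files stay in place.
site_disable() {
  local conf="$1"
  rm -f "$NGINX_DIR/sites-enabled/$conf"
}

# Test the configuration before reloading, so that a broken config
# is caught before nginx is asked to pick it up.
reload_nginx() {
  nginx -t && systemctl reload nginx
}
```

Run as root, disabling a site would then be `site_disable example.com.conf && reload_nginx`. Because the certificates and the files under /srv/www are left untouched, the letsencrypt task still finds everything it needs for the remaining sites.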