LetsEncrypt Acme Challenge error

I’m trying to provision an existing (http) server with LetsEncrypt. I changed my /group_vars/production/wordpress_sites.yml as follows:

    ssl:
      enabled: true
      provider: letsencrypt
    env:
      wp_home: https://mydomain.com
      wp_siteurl: https://mydomain.com/wp

The error is as follows:

TASK [letsencrypt : Test Acme Challenges] **************************************
failed: [mydomain] => (item={'value': {u'theme_slug': u'sage', u'multisite': {u'enabled': False, u'subdomains': False}, u'branch': u'master', u'cache': {u'duration': u'30s', u'enabled': False}, u'repo': u'git@github.com:username/mydomain.com.git', u'ssl': {u'enabled': True, u'provider': u'letsencrypt'}, u'local_path': u'../mydomain.com', u'env': {u'wp_env': u'production', u'disable_wp_cron': True, u'wp_siteurl': u'https://mydomain.com/wp', u'db_name': u'mydomain', u'db_user': u'mydomain', u'wp_home': u'https://mydomain.com'}, u'site_hosts': [u'mydomain.com']}, 'key': u'mydomain.com'}) => {"changed": false, "failed": true, "failed_hosts": ["www.mydomain.com", "mydomain.com"], "item": {"key": "mydomain.com", "value": {"branch": "master", "cache": {"duration": "30s", "enabled": false}, "env": {"db_name": "mydomain", "db_user": "mydomain", "disable_wp_cron": true, "wp_env": "production", "wp_home": "https://mydomain.com", "wp_siteurl": "https://mydomain.com/wp"}, "local_path": "../mydomain.com", "multisite": {"enabled": false, "subdomains": false}, "repo": "git@github.com:username/mydomain.com.git", "site_hosts": ["mydomain.com"], "ssl": {"enabled": true, "provider": "letsencrypt"}, "theme_slug": "sage"}}, "rc": 1}
...ignoring

TASK [letsencrypt : Notify of challenge failures] ******************************
fatal: [mydomain]: FAILED! => {"failed": true, "msg": "ERROR! 'dict object' has no attribute 'failed_hosts'"}

PLAY RECAP *********************************************************************
mydomain                      : ok=92   changed=3    unreachable=0    failed=1   

My assumption is that this has to do with the fact that the server has already been provisioned. Any help would be much appreciated. Thanks for your efforts in getting LetsEncrypt integrated with Trellis—it’s a fantastic enhancement to the toolset.

Unfortunately, we neglected to test enabling Let’s Encrypt on an already provisioned server, which I think is the problem here.

We create some Nginx site configs for Let’s Encrypt “challenges”. I’m guessing these conflict with your existing WP site Nginx conf. If you want to do a little manual work and can stand a few minutes of downtime, you can remove the symlink for your WP site conf in /etc/nginx/sites-enabled.

Then re-run the provisioning process and it may all work.

Actually, one easier thing you could try is swapping the order of these two roles: https://github.com/roots/trellis/blob/76cad9dacc5f4ed84701feeef163c724658be95d/server.yml#L31-L32

Putting letsencrypt last could work too.
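
For anyone trying the manual route, the symlink removal could be sketched roughly as below, run on the server. This is only a sketch: the conf file name (`mydomain.com.conf`) is an assumption, so check what is actually in /etc/nginx/sites-enabled first, and the `run` helper with its `DRY_RUN` switch is a hypothetical convenience for previewing the commands, not part of Trellis.

```shell
# Sketch: disable the existing WP site conf so the Let's Encrypt challenge
# conf can take over. Run on the server, not on your local machine.
# ASSUMPTION: the symlink is named after the site, e.g. mydomain.com.conf.

# Hypothetical helper: print each command, and skip running it when DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

disable_site_conf() {
  conf="/etc/nginx/sites-enabled/$1"
  # rm removes only the symlink; the file in sites-available is untouched.
  run sudo rm "$conf" &&
    run sudo nginx -t &&            # confirm the remaining config still parses
    run sudo service nginx reload   # the site is down from here until re-provisioned
}
```

Calling `DRY_RUN=1 disable_site_conf mydomain.com.conf` only prints the three commands. Without `DRY_RUN` it runs them, after which you would re-run the provisioning from your local machine.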

Thanks, @swalkinshaw. I can confirm that removing the symlink allowed the LE challenge config to be created and executed successfully.

Thanks !!

Everything works like a charm after installing the Trellis update.

Hello,

I am having the same issue after a fresh pull of Trellis on an already provisioned server. I was able to provision a new server with Let’s Encrypt SSL without issue. On the already provisioned server, I’m trying to add SSL to resolvable domains that point to the server.

I have tried running the server playbook with tags, changing the order of the roles, removing symlinks from the server, and none of it has worked. I always get the same error.

failed: [<ip>] => (item=ssltest.com) => {"changed": false, "failed": true, "failed_hosts": ["l<domain>"], "item": "ssltest.com", "rc": 1}

Could not access the challenge file for the hosts/domains: domain.com. Let's Encrypt requires every domain/host be publicly accessible. Make sure that a valid DNS record exists for domain.com and that they point to this server's IP. If you don't want these domains in your SSL certificate, then remove them from 'site_hosts'. See https://roots.io/trellis/docs/ssl for more details. failed: <ip>] => (item=ssltest.com) => {"failed": true, "item": "ssltest.com"}

nginx: [emerg] BIO_new_file("/etc/nginx/ssl/letsencrypt/ssltest.com- bundled.cert") failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/etc/nginx/ssl/letsencrypt/ssltest.com-bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file) nginx: configuration file /etc/nginx/nginx.conf test failed fatal: [<ip>]: FAILED! => {"changed": false, "cmd": ["nginx", "-t"], "delta": "0:00:00.042807", "end": "2016-04-26 20:14:15.923493", "failed": true, "rc": 1, "start": "2016-04-26 20:14:15.880686", "stderr": "nginx: [emerg] BIO_new_file(\"/etc/nginx/ssl/letsencrypt/ssltest.com-bundled.cert\") failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/etc/nginx/ssl/letsencrypt/ssltest.com-bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)\nnginx: configuration file /etc/nginx/nginx.conf test failed", "stdout": "", "stdout_lines": [], "warnings": []}

I would prefer to not have to provision and move over all of my sites to a new droplet if possible.

Any ideas?

Thanks.

@graphbb Could you try this?

  • Set ssl enabled: false
  • Run ansible-playbook server.yml -e env=<environment> --tags wordpress
    This will create a fresh Nginx conf, which this time shouldn’t be trying to load the apparently missing file /etc/nginx/ssl/letsencrypt/ssltest.com-bundled.cert
  • Set ssl back to enabled: true
  • Run ansible-playbook server.yml -e env=<environment> --tags letsencrypt
    This time the Nginx stuff shouldn’t choke on the missing cert file
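
The two provisioning runs in the steps above could be wrapped in a small helper like this, run from the Trellis directory. It is a sketch rather than anything Trellis ships: the `provision` and `run` functions and the `DRY_RUN` switch are hypothetical names, and the `ssl` edits in wordpress_sites.yml still have to be made by hand between the two runs.

```shell
# Sketch of the two-pass provisioning for an already provisioned server.
# The ansible-playbook invocation itself is the one quoted in the steps above.

# Hypothetical helper: print each command, and skip running it when DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# $1 = environment (e.g. production), $2 = tag (wordpress or letsencrypt)
provision() {
  run ansible-playbook server.yml -e "env=$1" --tags "$2"
}
```

With `enabled: false` under `ssl`, run `provision production wordpress`. Then set it back to `enabled: true` and run `provision production letsencrypt`. Prefixing either call with `DRY_RUN=1` just prints the command.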

If that doesn’t work, could you post your relevant wordpress_sites list?

Thank you @fullyint !!

That was it, I needed the ssl disabled for the wordpress-setup task!

Probably would be good to toss in a note about it for existing sites moving to Let’s Encrypt in the docs. :wink:

I have been loving using Trellis/Bedrock for my workflow and free SSL makes it that much better. Thanks guys!

I’m having a similar issue, and the steps above aren’t helping.

I first cloned Trellis on March 14 of this year. Today I updated it by generating a patch covering everything from the latest commit as of March 14 up to the current latest commit, and applying that patch to my repo. I’m pretty confident I resolved all conflicts correctly, though it’s possible I screwed something up.

I’m trying to enable Let’s Encrypt on an existing site. Whether I remove the Nginx site configuration symlink or not, and whether I disable SSL first to run the wordpress tag tasks before enabling it again to run the letsencrypt tag tasks, I still always hit the challenge failures error.

I’m using the bedrock-site-protect role on this particular environment to add HTTP basic auth. Could that be affecting anything?

When I disable SSL and run the wordpress tasks, the operation is successful. Then I switch SSL back on and run the letsencrypt tasks. It’s all successful as far as “Create test Acme Challenge file”. Then it tries to “Test Acme Challenges”, and from this point I get the following:

TASK [letsencrypt : Test Acme Challenges] **************************************
failed: [staging.PRIMARY.DOMAIN] (item={'value': {u'repo_subtree_path': u'site', u'multisite': {u'enabled': False}, u'cache': {u'enabled': False}, u'repo': u'git@MY.GIT.HOST:MY/REPO.git', u'ssl': {u'enabled': True, u'provider': u'letsencrypt'}, u'local_path': u'../site', u'branch': u'master', u'site_hosts': [u'staging.PRIMARY.DOMAIN', u'staging.SECONDARY.DOMAIN']}, 'key': u'PRIMARY.DOMAIN'}) => {"changed": false, "failed": true, "failed_hosts": ["staging.PRIMARY.DOMAIN", "staging.SECONDARY.DOMAIN"], "item": {"key": "PRIMARY.DOMAIN", "value": {"branch": "master", "cache": {"enabled": false}, "local_path": "../site", "multisite": {"enabled": false}, "repo": "git@MY.GIT.HOST:MY/REPO.git", "repo_subtree_path": "site", "site_hosts": ["staging.PRIMARY.DOMAIN", "staging.SECONDARY.DOMAIN"], "ssl": {"enabled": true, "provider": "letsencrypt"}}}, "rc": 1}
...ignoring

TASK [letsencrypt : Notify of challenge failures] ******************************
failed: [staging.PRIMARY.DOMAIN] (item={u'changed': False, '_ansible_no_log': False, 'failed': True, '_ansible_item_result': True, 'item': {'value': {u'repo_subtree_path': u'site', u'multisite': {u'enabled': False}, u'cache': {u'enabled': False}, u'repo': u'git@MY.GIT.HOST:MY/REPO.git', u'ssl': {u'enabled': True, u'provider': u'letsencrypt'}, u'local_path': u'../site', u'branch': u'master', u'site_hosts': [u'staging.PRIMARY.DOMAIN', u'staging.SECONDARY.DOMAIN']}, 'key': u'PRIMARY.DOMAIN'}, u'rc': 1, 'invocation': {'module_name': u'test_challenges', u'module_args': {u'path': u'.well-known/acme-challenge', u'hosts': [u'staging.PRIMARY.DOMAIN', u'staging.SECONDARY.DOMAIN'], u'file': u'ping.txt'}}, u'failed_hosts': [u'staging.PRIMARY.DOMAIN', u'staging.SECONDARY.DOMAIN']}) => {"failed": true, "item": {"changed": false, "failed": true, "failed_hosts": ["staging.PRIMARY.DOMAIN", "staging.SECONDARY.DOMAIN"], "invocation": {"module_args": {"file": "ping.txt", "hosts": ["staging.PRIMARY.DOMAIN", "staging.SECONDARY.DOMAIN"], "path": ".well-known/acme-challenge"}, "module_name": "test_challenges"}, "item": {"key": "PRIMARY.DOMAIN", "value": {"branch": "master", "cache": {"enabled": false}, "local_path": "../site", "multisite": {"enabled": false}, "repo": "git@MY.GIT.HOST:MY/REPO.git", "repo_subtree_path": "site", "site_hosts": ["staging.PRIMARY.DOMAIN", "staging.SECONDARY.DOMAIN"], "ssl": {"enabled": true, "provider": "letsencrypt"}}}, "rc": 1}, "msg": "Could not access the challenge file for the hosts/domains: staging.PRIMARY.DOMAIN, staging.SECONDARY.DOMAIN. Let's Encrypt requires every domain/host be publicly accessible. Make sure that a valid DNS record exists for staging.PRIMARY.DOMAIN, staging.SECONDARY.DOMAIN and that they point to this server's IP. If you don't want these domains in your SSL certificate, then remove them from `site_hosts`. See https://roots.io/trellis/docs/ssl for more details.\n"}

RUNNING HANDLER [common : reload nginx] ****************************************
included: /home/ME/PROJECT-PATH/code/trellis/roles/common/tasks/reload_nginx.yml for staging.PRIMARY.DOMAIN

RUNNING HANDLER [common : command] *********************************************
System info:
  Ansible 2.0.2.0; Linux
  Trellis at "Add connection-related cli options to ping command"
---------------------------------------------------
nginx: [emerg] BIO_new_file("/etc/nginx/ssl/letsencrypt/PRIMARY.DOMAIN-
bundled.cert") failed (SSL: error:02001002:system library:fopen:No such file
or directory:fopen('/etc/nginx/ssl/letsencrypt/PRIMARY.DOMAIN-
bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)
nginx: configuration file /etc/nginx/nginx.conf test failed
fatal: [staging.PRIMARY.DOMAIN]: FAILED! => {"changed": false, "cmd": ["nginx", "-t"], "delta": "0:00:00.007275", "end": "2016-05-05 20:18:46.940078", "failed": true, "rc": 1, "start": "2016-05-05 20:18:46.932803", "stderr": "nginx: [emerg] BIO_new_file(\"/etc/nginx/ssl/letsencrypt/PRIMARY.DOMAIN-bundled.cert\") failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/etc/nginx/ssl/letsencrypt/PRIMARY.DOMAIN-bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)\nnginx: configuration file /etc/nginx/nginx.conf test failed", "stdout": "", "stdout_lines": [], "warnings": []}

RUNNING HANDLER [common : service] *********************************************
skipping: [staging.PRIMARY.DOMAIN]
	to retry, use: --limit @server.retry

PLAY RECAP *********************************************************************
staging.PRIMARY.DOMAIN      : ok=29   changed=2    unreachable=0    failed=2

This says at one point to make sure that valid DNS records exist for these domains – they do.

It then says a file /etc/nginx/ssl/letsencrypt/* doesn’t exist; I can confirm this: that letsencrypt directory exists but is empty.

My wordpress_sites.yml file looks like this:

wordpress_sites:
  PRIMARY_DOMAIN:
    site_hosts:
      - staging.PRIMARY.DOMAIN
      - staging.SECONDARY.DOMAIN
    local_path: ../site # path targeting local Bedrock site directory (relative to Ansible root)
    repo: git@MY.GIT.HOST:MY/REPO.git
    repo_subtree_path: site # relative path to your Bedrock/WP directory in your repo
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: false
    htpasswd:
      name: WEBUSER
      password: WEBPASS

Any ideas? Any more information I can provide?

So can your site be accessed without HTTP basic authentication? If it’s password protected, there’s no way that Let’s Encrypt can access your site to verify its challenge code.

Steps I just tried:

  1. Starting state: SSL disabled
  2. Comment out the htpasswd section, thus disabling HTTP basic auth
  3. Run the provision command with --tags wordpress
  4. Deploy (since I’m not sure if this is necessary or not)
  5. Verify the site works (yes, and no HTTP basic challenge)
  6. Enable SSL
  7. Run the provision command with --tags letsencrypt

…and yes, this time it succeeded.

I could have sworn I tried this earlier, but I guess I must be mistaken.

So the problem now is how to get HTTP basic auth and SSL both working at once.

I just tried

  1. Re-enabling htpasswd
  2. Changing the order of tasks in deploy.yml so that the bedrock-site-protect one is last (it was before the letsencrypt one)
  3. Run the provision command (no --tags, so everything happens)

It completed successfully. But I wonder if it would have failed if I had run the --tags wordpress and --tags letsencrypt batches separately.

Trying it now, and without changing any configuration, running with --tags wordpress was fine. Running with --tags letsencrypt was then also fine. So everything seems to be working now.

Is there a way to test (early) the renewal process? I’m concerned that may end up failing.

If your site has htpasswd enabled, then the renewal process will fail.

Found this: https://github.com/letsencrypt/letsencrypt/issues/1744. You could potentially do a similar solution with Nginx?
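
This doesn’t trigger a renewal, but one way to keep an eye on whether renewals are happening is to check the certificate’s expiry on the server. A rough sketch follows. The path pattern is taken from the Nginx error messages earlier in this thread; the function names are hypothetical, and it assumes the site’s own certificate is the first one in the bundled file.

```shell
# Sketch: check how long the current Let's Encrypt cert is still valid.
# Run on the server; reading /etc/nginx/ssl may require sudo.

# $1 = site key as used in the cert file name (e.g. ssltest.com)
cert_path() {
  printf '/etc/nginx/ssl/letsencrypt/%s-bundled.cert\n' "$1"
}

# $1 = site key, $2 = number of days
# Exit status 0: the cert is still valid $2 days from now.
# Exit status 1: the cert expires within $2 days, so a renewal is due or failing.
cert_valid_for_days() {
  openssl x509 -noout -checkend "$(( $2 * 86400 ))" -in "$(cert_path "$1")"
}
```

For example, `cert_valid_for_days ssltest.com 30 || echo "renewal needed soon"` could go in a cron job or a monitoring check.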

I take it, then, that Let’s Encrypt does the domain check again when renewing; is that correct?

Okay, I understand the possible solutions, and I’m sure I could figure out a way to make the nginx configuration not require authentication for that particular route. But I don’t know how I’d go about adding that to the Trellis configuration so that it’d still exist and work if I run the ansible scripts again or provision a new server.

A push in the right direction would be greatly appreciated.

Yes, Let’s Encrypt does an HTTP request to your website to make sure it can verify a specific file exists.

As to how to modify things, maybe look into a custom include: https://roots.io/trellis/docs/nginx-includes/

Or you’d have to modify the base WP conf in roles/wordpress-setup/templates

Please see https://github.com/louim/bedrock-site-protect/issues/4 – the author of the HTTP auth plugin thinks it’s more likely a Trellis issue.

Case in point: even with the HTTP basic auth enabled, I can reach the /.well-known/acme-challenge/ping.txt file via HTTP without authenticating. And it seems that renewals do in fact work just fine.
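
For anyone who wants to repeat that check, something like the following should work from any machine. The path and file name come from the `module_args` in the log output above; the function names are hypothetical, and it assumes the test file created by the “Create test Acme Challenge file” task is still present on the server.

```shell
# Sketch: request the Acme challenge test file without credentials and
# report the HTTP status code per host.

# Build the URL that the "Test Acme Challenges" task appears to request.
challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/ping.txt\n' "$1"
}

# 200 = reachable without auth; 401 = basic auth is in the way;
# 404 = the challenge location is not being served (or the file is gone).
check_challenges() {
  for host in "$@"; do
    code="$(curl -s -o /dev/null -w '%{http_code}' "$(challenge_url "$host")")"
    echo "$host $code"
  done
}
```

Usage would be along the lines of `check_challenges staging.PRIMARY.DOMAIN staging.SECONDARY.DOMAIN`, with every name listed in `site_hosts`.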

I’m still not able to explain why this is happening, partly because I haven’t deployed a live domain with the new Let’s Encrypt setup :sweat: . I’ll try to replicate the error from a clean setup this week-end.

The basic auth is applied to the site server block, and the LE server block is separate from that one. @swalkinshaw unless I’m mistaken, LE only needs to hit the /.well-known/acme-challenge/* path, right?

Oh yeah, good points! I was just thinking in terms of authentication for a domain as a whole.

But yes, Let’s Encrypt only needs to be able to request /.well-known/acme-challenge/*.

Just to reiterate my situation:

I started with a non-SSL site, provisioned on an AWS server with near-stock Trellis settings. I later added HTTP auth.

I then later wanted to add SSL with Let’s Encrypt. I first added the Let’s Encrypt task to the deploy.yml file before the HTTP auth task, but I’m not sure if this makes a difference. I ran the full provision command, which failed.

I later tried all the suggestions above, and you can see my results. I don’t know the exact orders of what I tried.

I’m also having this issue. I’ve spun up new nodes 3 times now for testing and I’m getting inconsistent errors each time.

None of these suggestions worked for me.