Could not access the challenge file for the hosts/domains: www.example.com

mZoo · November 9, 2020, 4:30pm

I see that this is not an undocumented problem (here, here) and have tried:

Set SSL to false in group_vars/env/wordpress-sites.yml
trellis provision --tags wordpress production
Set SSL to true in group_vars/env/wordpress-sites.yml
trellis provision --tags letsencrypt production
Manually edit /etc/nginx/sites-available/example.com.conf:

From:

  location / {
    return 301 http://example.com$request_uri;
  }

to:

  location / {
    return 301 http://$host$request_uri;
  }

Run: sudo service nginx reload

I also tried cycling nginx with:

  location / {
    return 301 http://www.example.com$request_uri;
  }

Still the same error.

Does this look like the correct location and content for acme-challenge-location.conf?

# /etc/nginx/acme-challenge-location.conf
location ^~ /.well-known/acme-challenge/ {
  alias /srv/www/letsencrypt/;
  try_files $uri =404;
}

In troubleshooting I also deleted some files, emptying /srv/www/letsencrypt/ and /var/lib/letsencrypt/csrs/.
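With that location block, nginx answers a request for /.well-known/acme-challenge/<name> from /srv/www/letsencrypt/<name> (alias swaps the matched prefix for the directory), so a challenge or test file is only reachable over HTTP while it exists in that directory. A minimal Python sketch of that mapping, for checking where a given URL should land on disk; `challenge_file_path` is a hypothetical helper and models only this one `location ^~` block, not nginx's full location matching or URI normalization.

```python
from pathlib import PurePosixPath
from typing import Optional

# Values copied from acme-challenge-location.conf above.
LOCATION_PREFIX = "/.well-known/acme-challenge/"
ALIAS_DIR = "/srv/www/letsencrypt/"


def challenge_file_path(uri: str) -> Optional[str]:
    """Return the file this location block would serve for `uri`,
    or None when the URI is outside the block (hypothetical helper)."""
    if not uri.startswith(LOCATION_PREFIX):
        return None
    # alias replaces the matched location prefix with the alias directory.
    remainder = uri[len(LOCATION_PREFIX):]
    return str(PurePosixPath(ALIAS_DIR) / remainder)


print(challenge_file_path("/.well-known/acme-challenge/ping.txt"))
# /srv/www/letsencrypt/ping.txt
```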

swalkinshaw · November 9, 2020, 4:47pm

What’s the scenario here? Is this a new server with a new domain? Some more background details would be helpful.

edit: I just realized we tag the wordpress role with letsencrypt
One important thing is that you can’t just run the wordpress or letsencrypt tags when you toggle those values. I think just wordpress is fine if you turn SSL off, but you’d definitely want to run letsencrypt,wordpress when you toggle it back on.
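That rule of thumb can be captured in a small helper so the tag list isn't forgotten when turning SSL back on. A hypothetical Python sketch that only builds the argument list for the `trellis provision` form used at the top of the thread; it runs nothing, so pass the list to subprocess.run yourself if you want to execute it.

```python
from typing import List


def provision_args(ssl_enabled: bool, environment: str = "production") -> List[str]:
    """Build a `trellis provision` command for an SSL toggle (hypothetical helper).

    Turning SSL off: the wordpress tag is enough.
    Turning SSL back on: run letsencrypt and wordpress together.
    """
    tags = "letsencrypt,wordpress" if ssl_enabled else "wordpress"
    return ["trellis", "provision", "--tags", tags, environment]


print(" ".join(provision_args(ssl_enabled=True)))
# trellis provision --tags letsencrypt,wordpress production
```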

mZoo · November 9, 2020, 4:51pm

It’s an older server running Ubuntu 18. I had (maybe only partially) updated the Trellis codebase. Renewal errored out because the letsencrypt emails hadn’t been set.

I also updated /roles/fail2ban/defaults/main.yml and had to change roles/wordpress-setup/tasks/nginx.yml to state: "{{ item.enabled | default(true) | ternary('link', 'hard') }}" (from `absent` to `hard`).

And yes, I get the error now when running the wordpress tasks with SSL set to true.

Thanks much, Scott. What would I do without you?

mZoo · November 9, 2020, 6:19pm

Ahah. On my local computer:

curl http://example.com/.well-known/acme-challenge/ping.txt -w "%{http_code}"
200%

And

curl http://www.example.com/.well-known/acme-challenge/ping.txt -w "%{http_code}"
200%

On the server, or from another server:

curl http://www.example.com/.well-known/acme-challenge/ping.txt -w "%{http_code}"
curl: (6) Could not resolve host: www.example.com

There is no entry for the domain in my local /etc/hosts file.

I’m not sure what that means or how to fix it.
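curl's error 6 means the host name did not resolve on that machine, so no HTTP request was made at all. HTTP-01 validation also requires every name in the certificate to resolve to the server, so a name that fails here is worth checking in public DNS. A minimal Python sketch that runs the same check for several hosts with socket.getaddrinfo from the standard library; `unresolvable_hosts` is a hypothetical helper, and it only reflects the resolver of the machine it runs on.

```python
import socket
from typing import Callable, Iterable, List


def unresolvable_hosts(
    hosts: Iterable[str],
    resolve: Callable[..., object] = socket.getaddrinfo,
) -> List[str]:
    """Return the hosts that fail DNS resolution (hypothetical helper).

    `resolve` defaults to socket.getaddrinfo; it is a parameter so the
    check can be exercised without network access.
    """
    failed = []
    for host in hosts:
        try:
            resolve(host, 80)
        except socket.gaierror:
            # Same condition curl reports as "(6) Could not resolve host".
            failed.append(host)
    return failed
```

Run on the server, unresolvable_hosts(["example.com", "www.example.com"]) should return an empty list once both names resolve there.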

mZoo · November 9, 2020, 7:32pm

Looks like this is at issue:

 /etc/nginx/ssl/letsencrypt/example.com.key: No such file or directory

strarsis · November 9, 2020, 7:57pm

Also check whether there is an IPv6 AAAA record for your domain; Let’s Encrypt prefers those over IPv4 A records for HTTP-01 validation. Verify that the server is correctly listening on its IPv6 address.
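To see which address families a name currently has, the results of getaddrinfo can be grouped by family: AF_INET entries are IPv4 (A) addresses and AF_INET6 entries are IPv6 (AAAA) addresses. A minimal Python sketch; `address_families` is a hypothetical helper, and the answers come from the local resolver, which may differ from what Let's Encrypt's resolvers see.

```python
import socket
from typing import Callable, Dict, List

FAMILY_NAMES = {socket.AF_INET: "A", socket.AF_INET6: "AAAA"}


def address_families(
    host: str,
    resolve: Callable[..., list] = socket.getaddrinfo,
) -> Dict[str, List[str]]:
    """Split a host's addresses into IPv4 (A) and IPv6 (AAAA) lists."""
    result: Dict[str, List[str]] = {"A": [], "AAAA": []}
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
    for family, _type, _proto, _canon, sockaddr in resolve(host, 80, type=socket.SOCK_STREAM):
        key = FAMILY_NAMES.get(family)
        address = sockaddr[0]
        if key and address not in result[key]:
            result[key].append(address)
    return result
```

If the AAAA list is non-empty, that IPv6 address also has to reach the nginx that serves the challenge files.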

mZoo · November 9, 2020, 8:10pm

Thanks. I’m not sure how to do that but will look into it. When I reprovision (wordpress tasks) without SSL, curl on the http address returns a 301 permanent redirect:

curl http://example.com -w "%{http_code}"
301

When I check the AAAA record with https://mxtoolbox.com it returns:

Test                    Result
DNS Record Published    DNS Record not found

Looks like nginx is listening:

netstat -tlnp | grep nginx
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      957/nginx: master p 
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      957/nginx: master p 
tcp6       0      0 :::80                   :::*                    LISTEN      957/nginx: master p 
tcp6       0      0 :::443                  :::*                    LISTEN      957/nginx: master p
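The same check can be scripted: parse `netstat -tlnp` output and collect the ports nginx listens on per protocol. A minimal Python sketch; the column layout is assumed from the output above, and `listening_ports` is a hypothetical helper. tcp6 entries on :::80 mean nginx accepts IPv6 connections on port 80, which an IPv6-based HTTP-01 check would use.

```python
from typing import Dict, Set

NETSTAT_OUTPUT = """\
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      957/nginx: master p
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      957/nginx: master p
tcp6       0      0 :::80                   :::*                    LISTEN      957/nginx: master p
tcp6       0      0 :::443                  :::*                    LISTEN      957/nginx: master p
"""


def listening_ports(netstat_text: str, program: str = "nginx") -> Dict[str, Set[int]]:
    """Map protocol (tcp/tcp6) to the ports `program` is listening on."""
    ports: Dict[str, Set[int]] = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Columns: proto, recv-q, send-q, local address, foreign address, state, pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        if program not in " ".join(fields[6:]):
            continue
        # The port is whatever follows the last colon of the local address.
        port = int(fields[3].rsplit(":", 1)[1])
        ports.setdefault(fields[0], set()).add(port)
    return ports


print(listening_ports(NETSTAT_OUTPUT))
```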

swalkinshaw · November 9, 2020, 8:24pm

If there’s no key file, then yeah something has likely gone wrong. Can you just re-create the server from scratch and try again?

mZoo · November 9, 2020, 8:25pm

You mean with a new droplet/IP address?

swalkinshaw · November 9, 2020, 8:27pm

Yeah, unfortunately I think if you rebuild a droplet the IP will change (unless you’re using a floating IP).

mZoo · November 9, 2020, 8:28pm

It would be great to avoid having to change the IP. There’s a bit of bureaucracy between us and the registrar.

swalkinshaw · November 9, 2020, 8:32pm

To avoid that, I would delete any files in /etc/nginx/ssl/letsencrypt/ and /srv/www/letsencrypt, and any *.csr files in /var/lib/letsencrypt.

Then provision again (without any tags, so everything).
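Before deleting anything in these locations, a dry run that only lists the matching files is cheap insurance; later in this thread the deleted /etc/nginx/ssl/letsencrypt/example.com.key had to be restored from a snapshot. A hypothetical Python sketch: the three paths and patterns come from the advice above, while `files_to_remove`, its `root` parameter (handy for testing against a scratch directory) and the dry-run default are assumptions. It lists top-level files in the first two directories and *.csr files at any depth under /var/lib/letsencrypt, and removes nothing unless dry_run=False.

```python
from pathlib import Path
from typing import List

# (directory relative to root, glob pattern) for each location mentioned above.
TARGETS = [
    ("etc/nginx/ssl/letsencrypt", "*"),
    ("srv/www/letsencrypt", "*"),
    ("var/lib/letsencrypt", "**/*.csr"),
]


def files_to_remove(root: str = "/", dry_run: bool = True) -> List[Path]:
    """List the matching files; delete them only when dry_run is False."""
    found: List[Path] = []
    for rel_dir, pattern in TARGETS:
        base = Path(root) / rel_dir
        if not base.is_dir():
            continue
        found.extend(p for p in sorted(base.glob(pattern)) if p.is_file())
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```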

mZoo · November 9, 2020, 8:37pm

I’m running that now with -vvv and fingers crossed.

Same error:

Could not access the challenge file for the hosts/domains: www.example.com

Should the challenge file be trying to load over www?

    site_hosts:
      -
        canonical: example.com
        redirects:
          - www.example.com
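On the www question: the error names www.example.com, which suggests the redirect hosts are requested in the certificate alongside the canonical host, so each of them has to pass the HTTP-01 challenge (and therefore resolve to this server). A minimal Python sketch of flattening a site_hosts list into that host list; it assumes the structure shown above, `certificate_hosts` is hypothetical, and it is not Trellis's own code.

```python
from typing import Dict, List

# Mirrors the site_hosts structure from wordpress-sites.yml above.
SITE_HOSTS = [
    {"canonical": "example.com", "redirects": ["www.example.com"]},
]


def certificate_hosts(site_hosts: List[Dict]) -> List[str]:
    """Canonical hosts plus their redirects, in order, without duplicates."""
    hosts: List[str] = []
    for entry in site_hosts:
        for host in [entry["canonical"], *entry.get("redirects", [])]:
            if host not in hosts:
                hosts.append(host)
    return hosts


print(certificate_hosts(SITE_HOSTS))  # ['example.com', 'www.example.com']
```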

Will probably provision a new server if this fails. For Ubuntu 20, should I use the master branch of Trellis? Maybe I should just stick with Ubuntu 18 for now.

mZoo · November 9, 2020, 9:45pm

I was able to restore the /etc/nginx/ssl/letsencrypt/example.com.key from a previous Snapshot.

Now getting a 200 on curl http://example.com/.well-known/acme-challenge/ping.txt -w "%{http_code}".

Going to try cycling ssl: false, ssl: true again.

Now getting:

non-zero return code
The required CSR file /var/lib/letsencrypt/csrs/phytrehab.com-3224635.csr
does not exist. This could happen if you changed site_hosts and have not yet
rerun the letsencrypt role. Create the CSR file by re-provisioning (running
the Trellis server.yml playbook) with `--tags letsencrypt`

Then I ran:

ansible-playbook server.yml -e env=production --tags letsencrypt -vvv

And back to this again:

Could not access the challenge file for the hosts/domains: www.example.com

Running curl without www succeeds, with www it fails (curl: (6) Could not resolve host).

When I run the wordpress tasks with ssl set to false, the browsers are still trying to load the site over https. Is that to be expected?

mZoo · November 9, 2020, 10:23pm

Interesting. I removed the www redirect from wordpress-sites.yml and got a lot further on the letsencrypt tasks.

Failed at: non-zero return code
nginx: [emerg] "resolver" directive is duplicate in
/etc/nginx/h5bp/directive-only/ssl-stapling.conf:37
nginx: configuration file /etc/nginx/nginx.conf test failed.

Running it a second time seemed to succeed.

Then success with:

ansible-playbook server.yml -e env=production --tags wordpress -vvv

Sites not loading on front end, though. Trying to run letsencrypt tasks again with redirect reinstated.

Content of ssl-stapling.conf:

 23 ssl_stapling on;
 24 ssl_stapling_verify on;
 25 
 26 resolver
 27   # (1)
 28   1.1.1.1 1.0.0.1 [2606:4700:4700::1111] [2606:4700:4700::1001]
 29   # (2)
 30   8.8.8.8 8.8.4.4 [2001:4860:4860::8888] [2001:4860:4860::8844]
 31   # (3)
 32   # 216.146.35.35 216.146.36.36
 33   valid=60s;
 34 #trusted cert must be made up of your intermediate certificate followed by root certificate
 35 #ssl_trusted_certificate /path/to/ca.crt;
 36 
 37 resolver 8.8.8.8 8.8.4.4 216.146.35.35 216.146.36.36 valid=60s;
 38 resolver_timeout 2s;

Commenting out line 37 seems to have solved that issue.
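nginx refused the config because `resolver` is set twice in the same file, which is what the [emerg] message reports. A rough Python sketch that finds repeated directives in a snippet like ssl-stapling.conf; `duplicate_directives` is hypothetical and deliberately simplified (it strips # comments, splits statements on ';' and ignores blocks and contexts), so treat it as a quick check rather than an nginx parser.

```python
from collections import Counter
from typing import Dict, List


def duplicate_directives(config_text: str) -> Dict[str, int]:
    """Count directive names that appear more than once (simplified)."""
    # Drop comments: everything from '#' to the end of each line.
    without_comments = "\n".join(line.split("#", 1)[0] for line in config_text.splitlines())
    names: List[str] = []
    # A directive may span several lines, so split on ';' rather than newlines.
    for statement in without_comments.split(";"):
        words = statement.split()
        if words:
            names.append(words[0])
    return {name: n for name, n in Counter(names).items() if n > 1}


SSL_STAPLING_CONF = """
ssl_stapling on;
ssl_stapling_verify on;

resolver
  # (1)
  1.1.1.1 1.0.0.1 [2606:4700:4700::1111] [2606:4700:4700::1001]
  # (2)
  8.8.8.8 8.8.4.4 [2001:4860:4860::8888] [2001:4860:4860::8844]
  # (3)
  # 216.146.35.35 216.146.36.36
  valid=60s;
#ssl_trusted_certificate /path/to/ca.crt;

resolver 8.8.8.8 8.8.4.4 216.146.35.35 216.146.36.36 valid=60s;
resolver_timeout 2s;
"""

print(duplicate_directives(SSL_STAPLING_CONF))  # {'resolver': 2}
```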

mZoo · November 9, 2020, 10:45pm

So for others with the same issue (or for the next time I have it), I guess I would say:

Don’t delete /etc/nginx/ssl/letsencrypt/example.com.key!
DigitalOcean server backups may well be worth paying for (I had a snapshot by luck)
Set ssl to false and run the wordpress-tagged tasks, and possibly the letsencrypt tasks on their own
Now set ssl back to true
Removing redirects from site hosts may also help, particularly if it’s one of the redirects that is coming up in the error output.

Thanks for the time and input @swalkinshaw and @strarsis.

mZoo · November 9, 2020, 11:41pm

@swalkinshaw Where on the server do the letsencrypt_contact_emails end up? I see that they are referenced by the Python script that runs the renewal.

strarsis · November 9, 2020, 11:56pm

Be wary when setting the letsencrypt_contact_emails variable:

mZoo · November 10, 2020, 12:13am

I’m having trouble finding that file on the server. Do you know where it gets generated?

strarsis · November 10, 2020, 12:34am

The GitHub search does seem to have trouble finding some files, but it is in the Trellis repository, as a template:

github.com/roots/trellis/blob/master/roles/letsencrypt/templates/renew-certs.py

#!/usr/bin/env python3

import os
import sys
import time

from subprocess import CalledProcessError, check_output, STDOUT

failed = False
letsencrypt_cert_ids = {{ letsencrypt_cert_ids }}

for site in {{ sites_using_letsencrypt }}:
    csr_path = os.path.join('{{ acme_tiny_data_directory }}', 'csrs', '{}-{}.csr'.format(site, letsencrypt_cert_ids[site]))
    bundled_cert_path = os.path.join('{{ letsencrypt_certs_dir }}', '{}-bundled.cert'.format(site))
    bundled_hashed_cert_path = os.path.join('{{ letsencrypt_certs_dir }}', '{}-{}-bundled.cert'.format(site, letsencrypt_cert_ids[site]))

    # Generate or update root cert if needed
    if not os.access(csr_path, os.F_OK):
        failed = True
        print('The required CSR file {} does not exist. This could happen if you changed site_hosts and have '

This file has been truncated.
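The CSR lookup at the top of that script explains the earlier "required CSR file ... does not exist" message: it expects one file per site, named <site>-<cert id>.csr, in the csrs directory under acme_tiny_data_directory. The same check can be sketched in Python outside Ansible; `missing_csrs` is hypothetical, and the data directory and cert ids have to be filled in by hand (the error message in this thread shows /var/lib/letsencrypt as the data directory), whereas the template gets them from Trellis variables.

```python
import os
from typing import Dict, List


def missing_csrs(data_dir: str, cert_ids: Dict[str, str]) -> List[str]:
    """Return CSR paths the renewal script would look for but cannot find.

    Path layout follows the csr_path line of renew-certs.py:
    <data_dir>/csrs/<site>-<cert id>.csr
    """
    missing = []
    for site, cert_id in cert_ids.items():
        csr_path = os.path.join(data_dir, "csrs", "{}-{}.csr".format(site, cert_id))
        # Same existence test the template uses.
        if not os.access(csr_path, os.F_OK):
            missing.append(csr_path)
    return missing
```

For example, missing_csrs("/var/lib/letsencrypt", {"example.com": "<cert id from the error message>"}) lists the CSR that re-provisioning with --tags letsencrypt is expected to create.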