SSL problem; certificate points to another website on same server

When I try to reprovision a server it fails at TASK [letsencrypt : Generate the certificates].

Snippet of the error message:

Not creating a new certificate.\n\nCertificate file /etc/nginx/ssl/letsencrypt/mywebsite.nl-53859fd.cert already exists\nGenerating certificate for mywebsite.nl\nError while generating certificate for mywebsite.nl\nTraceback (most recent call last):\n File \"/usr/local/letsencrypt/acme_tiny.py\", line 198, in <module>\n main(sys.argv[1:])\n File \"/usr/local/letsencrypt/acme_tiny.py\", line 194, in main\n signed_crt = get_crt(args.account_key, args.csr, args.acme_dir, log=LOGGER, CA=args.ca)\n File \"/usr/local/letsencrypt/acme_tiny.py\", line 149, in get_crt\n domain, challenge_status))\nValueError: www.mywebsite.nl challenge did not pass: {u'status': u'invalid', u'validationRecord': [{u'addressesResolved': [u'server.ip.address', u'2a01:7c8:eb:0:149:210:209:163'], u'url': u'http://www.mywebsite.nl/.well-known/acme-challenge/S5FAhPVNa173VKf_7ZfNTnA0ZSjEeF2GVwBfyEACSog', u'hostname': u'www.mywebsite.nl', u'addressesTried': [], u'addressUsed': u'2a01:7c8:eb:0:149:210:209:163', u'port': u'80'}], u'keyAuthorization': u'S5FAhPVNa173VKf_7ZfNTnA0ZSjEeF2GVwBfyEACSog.ZvFiOy8_ZHj-j7QzmnPC6pTVimWXjsZyO3xpi9M9DnM', u'uri': u'https://acme-v01.api.letsencrypt.org/acme/challenge/', u'token': u'S5FAhPVNa173VKf_7ZfNTnA0ZSjEeF2GVwBfyEACSog', u'error': {u'status': 403, u'type': u'urn:acme:error:unauthorized', u'detail': u'Invalid response from http://www.mywebsite.nl/.well-known/acme-challenge/S5FAhPVNa173VKf_7ZfNTnA0ZSjEeF2GVwBfyEACSog: \"<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\\n<html><head>\\n<title>404 Not Found</title>\\n</head><body>\\n<h1>Not Found</h1>\\n<p\"'}

When I check the certificate on https://www.digicert.com/help/, I see that the domain name is not that of the domain which it should be. The domain name in the certificate is now that of another website on the same server. The website becomes unavailable with a ‘Your connection is not private’ browser error.

The worrying thing is that the same problem occurs on another website whenever I set ssl enabled to false (in wordpress_sites.yml), reprovision, then set ssl to true again and reprovision. The provision fails at the same task, and the domain name in that website's cert now points to yet another site on the same server, again with a browser error.

When I ssh into the server as admin and try to check the SSL folders: /var/lib/letsencrypt/csrs and /etc/nginx/ssl/letsencrypt (as I’ve read in this post), I get a permission denied.

I could really use some advice, as two websites are now inaccessible.

Thank you.

I’m sure this is frustrating. That error seems pretty unusual. I’m not certain of the cause, but here are the ideas that come to mind.

404. The end of the output you shared shows a 404 Not Found. I wonder if that means the challenge location is not factoring into the nginx config. Could you try running the wordpress-setup role only (which could update your nginx conf with the challenge location info), then try running the letsencrypt role?

# first get nginx conf all sorted out (add acme challenge location block)
ansible-playbook server.yml -e env=production --tags wordpress

# now try the LE validation again
ansible-playbook server.yml -e env=production --tags letsencrypt

DNS. It looks like the error is occurring accessing the www version of the site. Make sure you have your DNS set up to accommodate www, but I’m guessing you do, because u'addressesResolved has a value. Also, the url appears to be processed by the server, but just returning a 404.

IPV6. That u'addressesResolved has an IPv6 value, so I wonder if IPv6 being enabled on the server could have anything to do with it. However, I don’t remember hearing of this being an issue, nor have I tried LE on a DO droplet with IPv6 enabled.

ping.txt Trellis normally tries to reach http://www.website.nl/.well-known/acme-challenge/ping.txt (created by Trellis for testing). Can you access that url in your browser?

LE role. Are you intentionally skipping any letsencrypt role tasks or have you customized them in any way?

Trellis version. The LE role has evolved over time and it would help to know which iteration you have. Probably the best indicator you could share is what is the most recent entry in the project’s CHANGELOG.md?

sudo for permissions. If you use admin instead of root to access /var/lib/letsencrypt, you’ll need to invoke sudo using the admin_user password.

Multiple domains on single cert. A given certificate will have all domains listed in site_hosts for a given site, but I don’t think it should list any of the site_hosts from a different site.


Thanks so far!

404. I have run both commands, but it still errors during the --tags letsencrypt provision, at task Generate the certificates. Now for the super weird part: while the task still errors, both sites work again, and the digicert SSL checker also validates for both. While this is a big relief, I still can’t provision. Seeing that one of the certs expires in 4 days, I hope we can resolve the underlying issue.

DNS. Yes the DNS has a www record, nothing changed there.

IPV6. IPV6 is not and hasn’t been enabled on my DO droplet.

ping.txt Yeah, it seems like I can access that url, it shows a blank page with no source code.

LE role. Nope, I haven’t changed or customized anything.

Trellis version. Sorry, should have provided that info in my first post. I’m on Trellis Head at #797

sudo for permissions. I can check the folders now. I did invoke sudo before, but I guess I probably typo’d the password (twice) in the heat of the moment earlier.

Multiple domains on single cert. I don’t know how or why that’s happening. The digicert SSL checker on both sites didn’t validate because both certs had (different) mismatching domain names.

The problems started when provisioning site 1 failed at the certificate generation task. I disabled SSL on site 1, provisioned (no errors), then reenabled SSL and provisioned (with error). At that point I believe the certificate domain mismatch occurred on site 1, pointing to site 2 on the server, along with the consequent browser error.

Then I tried to disable SSL on site 2 (which was still working fine at the time) to see if that would help anything. I provisioned with SSL disabled on both site 1 and site 2 (no errors). Then I reenabled SSL on site 2 while site 1 was still disabled (if I remember correctly), and it provisioned without errors. But that is when site 2 also got the domain mismatch, pointing to yet another site (site 3) on the same server. And of course the browser error.

With both sites inaccessible, by that time I thought it best to reach out for support :slight_smile:


Oh, another thing I come to think of now, which might be of importance. When site 1 failed with the browser error, amongst trying other things in my slight panic mode, I checked pagespeed to see if/how google displayed site 1. And the thumbnail displayed site 2. So perhaps your first instinct is right, in that it could be an NGINX problem somewhere.

sudo nginx -t checks out successfully though.

acme file. Could you make sure the server has this file:
/etc/nginx/acme-challenge-location.conf

acme include directive. Then ensure /etc/nginx/sites-available/website.nl.conf has three instances of
include acme-challenge-location.conf;
Here’s an abbreviated version of where these includes should appear:

server {
  <snip>
  ssl_certificate         /etc/nginx/ssl/letsencrypt/website.nl-53859fd-bundled.cert;
  ssl_certificate_key     /etc/nginx/ssl/letsencrypt/website.nl.key;

  include acme-challenge-location.conf;

  include includes.d/website.nl/*.conf;

  # Prevent PHP scripts from being executed inside the uploads folder.
    <snip>
  }
}

# Redirect to https
server {
  listen 80;
  <snip>

  include acme-challenge-location.conf;

  location / {
  <snip>
}

# Redirect some domains
server {
  listen 443 ssl http2;
  <snip>
  ssl_certificate         /etc/nginx/ssl/letsencrypt/website.nl-53859fd-bundled.cert;
  ssl_certificate_key     /etc/nginx/ssl/letsencrypt/website.nl.key;

  include acme-challenge-location.conf;

  location / {
  <snip>
}
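If counting by eye is error-prone, here is a minimal sketch that counts the uncommented include lines in a conf file's text. The function name `count_acme_includes` is made up for illustration, and the expected count of three is simply the number mentioned above for this setup.

```python
import re

# Matches an uncommented "include acme-challenge-location.conf;" line.
# Lines starting with "#" are not matched because only spaces/tabs may
# precede the word "include".
INCLUDE_RE = re.compile(
    r"^[ \t]*include[ \t]+acme-challenge-location\.conf[ \t]*;",
    re.MULTILINE,
)


def count_acme_includes(conf_text: str) -> int:
    """Return how many active acme-challenge include directives the text has."""
    return len(INCLUDE_RE.findall(conf_text))


# Example usage (path taken from the post above; reading it may need sudo):
#   with open("/etc/nginx/sites-available/website.nl.conf") as fh:
#       print(count_acme_includes(fh.read()))
```

A result lower than three would suggest one of the server blocks is missing the include.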

Reload Nginx. There’s a chance that all your Nginx confs were correct but with the false starts, an Nginx reload was never triggered (a tiny bit related to issues in roots/trellis#783). That’s great you checked sudo nginx -t :+1: and let’s be sure you’ve done a sudo service nginx reload which could/should only help and not hurt.

I suspect that you were able to load www.website.nl/.well-known/acme-challenge/ping.txt only after it redirected to https and that the http version of anything at www.website.nl/.well-known/acme-challenge was actually giving the 404. This 404 could be due to Nginx not having been reloaded, despite the confs being correct.
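To see what plain http returns without a browser silently following the redirect, something like the rough sketch below could help. It relies on `http.client`, which does not follow redirects. The helper names `fetch_status` and `classify` and the result labels are my own, not anything from Trellis.

```python
import http.client


def fetch_status(host, path="/.well-known/acme-challenge/ping.txt",
                 port=80, timeout=10):
    """GET the path over plain HTTP; return (status, Location header or None).

    http.client never follows redirects, so a redirect to https is
    reported as a 3xx status instead of the final https response.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify(status, location):
    """Turn a status/Location pair into a short label (labels are arbitrary)."""
    if status == 200:
        return "served"
    if status in (301, 302, 307, 308):
        if (location or "").startswith("https://"):
            return "redirect-https"
        return "redirect"
    if status == 404:
        return "not-found"
    return "other"


# Example usage (needs network access):
#   print(classify(*fetch_status("www.website.nl")))
```

Note that when a hostname resolves to several addresses, the connection goes to whichever address connects first, so the result may not reflect the address the LE validation server used.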

Registrar’s domain parking page. Another possibility is that for a little while I noticed that your website.nl loaded your registrar’s domain parking page (at an IPv6 for your registrar; not the IPv4 in your DNS). I think that several hours ago I even got that parking page for the www.website.nl domain. I began wondering if your registrar – and its Name Servers in effect for your domains – could be exhibiting some strange behavior. However, this has all stopped now for me: website.nl forwards to www.website.nl and ultimately redirects to https :+1:. Oh wait, scratch that. In Chrome incognito after clearing hsts (chrome://net-internals/#hsts) I’m back to seeing the registrar domain parking page for both www.website.nl and website.nl

I find that rather surprising and I’d recommend moving your Name Servers to Digital Ocean.

Edit. I suspect the DNS entries visible to you with your registrar’s Name Servers only show an A record pointing to your IPv4 and that your DO server is only configured to serve IPv4. However, it seems your Name Servers are nonetheless serving an IPv6 and it corresponds to your registrar, not your DO server.

The output of the original post shows that indeed 'addressesResolved' included your IPv4 and some other IPv6 (registrar). Then, the failed LE validation must have occurred due to 'addressUsed': u'2a01:7c8:eb:0:149:210:209:163' – that is your registrar’s IPv6, not your web server’s IPv4.
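As a rough illustration of that reading, the sketch below compares the addresses in a validation record against the IPs you expect your server to have. The function name `check_validation_record` is hypothetical, and `203.0.113.10` is a documentation placeholder standing in for the real server IPv4, which was redacted in the original output.

```python
import ipaddress


def check_validation_record(record, server_ips):
    """Compare a challenge validationRecord against the expected server IPs.

    Addresses are parsed with ipaddress so that different spellings of the
    same IPv6 address compare equal.
    """
    expected = {ipaddress.ip_address(ip) for ip in server_ips}
    used = ipaddress.ip_address(record["addressUsed"])
    resolved = [ipaddress.ip_address(a) for a in record["addressesResolved"]]
    return {
        "used": str(used),
        "used_is_ipv6": used.version == 6,
        "used_is_ours": used in expected,
        "unexpected": [str(a) for a in resolved if a not in expected],
    }


# Values from the error output above, except the placeholder IPv4.
record = {
    "hostname": "www.mywebsite.nl",
    "addressesResolved": ["203.0.113.10", "2a01:7c8:eb:0:149:210:209:163"],
    "addressUsed": "2a01:7c8:eb:0:149:210:209:163",
}
result = check_validation_record(record, ["203.0.113.10"])
print(result)
```

Here `used_is_ours` comes out false and the IPv6 address lands in `unexpected`, which is the situation described above: the validation request went to an address that is not the web server.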

You could remove any AAAA (IPv6) records you find in your DNS. If there aren’t any, it seems puzzling (euphemism) that your registrar’s NS are returning an AAAA you haven’t set:

# (real domain replaced)
$ nslookup -query=AAAA www.website.nl
...
www.website.nl	canonical name = website.nl.
website.nl	has AAAA address 2a01:7c8:eb::149:210:209:163
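If nslookup isn't handy, a similar check can be sketched with Python's standard `socket.getaddrinfo`. The helper names `resolve` and `a_and_aaaa` are made up here, and this goes through the system resolver (so /etc/hosts and local caching apply), which makes it only an approximation of querying the Name Servers directly.

```python
import socket


def resolve(host, family):
    """Return the sorted unique addresses for host in the given family."""
    try:
        infos = socket.getaddrinfo(
            host, 80, family=family, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # No records of this family, or the lookup failed.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def a_and_aaaa(host):
    """Collect IPv4 (A) and IPv6 (AAAA) results side by side."""
    return {
        "A": resolve(host, socket.AF_INET),
        "AAAA": resolve(host, socket.AF_INET6),
    }


# Example usage (needs network access):
#   print(a_and_aaaa("www.website.nl"))
```

A non-empty "AAAA" list that doesn't belong to your server would point to the same stray record.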

Awesome, provisioning is working!

I’ve checked the DNS and indeed there was an AAAA record. The domain registrar account is that of my client; however, I can’t imagine them creating the record (I have yet to ask them). So it has probably been there since before I took over hosting of their website, about 4 months ago. Then again, if the record was there all along, it’s strange to me that this issue never happened before, since I’ve done numerous successful provisions since then. That it resulted in certificates pointing to different sites on the server is also unexpected.

By the way, you’re correct about the ping.txt, in that it redirected to https and failed on http.

Thanks again!
