Failing acme challenge on both new and existing site

Hi,

I have a server that has been hosting a single site for a while with no issues. I’m trying to add a second domain/website to this server, but when I run the letsencrypt tag the ACME challenge fails. Based on the error, it fails for both the existing site and the new site (though the existing site continues to work).

I have verified that the DNS settings are correct and when I cURL the acme challenge domain I get a valid response. I’ve tried a bunch of different things:

  • Removing the new site and reprovisioning (same error on the existing site, which has a working cert)
  • Disabling SSL on the new site (I can’t disable it on the existing one)
  • Removing the ACME challenge verification step from the playbook (hoping it was a local issue and LE would still verify)

I’m out of ideas at this point… any help would be appreciated. My wordpress_sites.yml is below. The first site is the old, working one. The second is the new one. They should both point to the same IP (the load balancer). I removed the .com from the yml because Discourse only lets me put four links in a post. You can also view the config here: https://pastebin.com/VjDw047z

Thanks in advance!

wordpress_sites:
  realisadiamond:
    site_hosts:
      - canonical: realisadiamond
        redirects:
          - www.realisadiamond
    local_path: …/site # path targeting local Bedrock site directory (relative to Ansible root)
    repo: git@bitbucket.org:jontas/riad.git # replace with your Git repo URL
    repo_subtree_path: site # relative path to your Bedrock/WP directory in your repo
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: true

  realisrareindia:
    site_hosts:
      - canonical: realisrareindia
        redirects:
          - www.realisrareindia
    local_path: …/india # path targeting local Bedrock site directory (relative to Ansible root)
    repo: git@bitbucket.org:jontas/riad.git # replace with your Git repo URL
    repo_subtree_path: india # relative path to your Bedrock/WP directory in your repo
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: true

What happened when you did this? Did you get a new cert for the 2nd site?

I assume you added the verification back at some point. Did it still fail?

When I remove the verification step I get this error: https://pastebin.com/BGCwZYY8

So the 2nd site is using https and has a certificate, but the cert only has the two hostnames from the first site which obviously leads to errors.

ssl.CertificateError: hostname ‘realisrareindia.in’ doesn’t match either of ‘realisadiamond.com’, ‘www.realisadiamond.com’
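To confirm which hostnames a served certificate actually covers, the certificate text can be compared against the names you expect. This is only a sketch: `missing_names` is a hypothetical helper, it looks only at the `DNS:` entries in the output of `openssl x509 -noout -text`, and it does not handle wildcard entries.

```shell
#!/bin/sh
# Hypothetical helper: reads `openssl x509 -noout -text` output on stdin and
# prints every wanted hostname (passed as arguments) that is NOT listed as a
# DNS name in the certificate. Wildcard entries are not expanded.
missing_names() {
  # Collect the DNS names, one per line, with the "DNS:" prefix stripped.
  sans=$(grep -o 'DNS:[^, ]*' | sed 's/^DNS://')
  for name in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    printf '%s\n' "$sans" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Example usage against a live host (not run here):
#   openssl s_client -connect realisrareindia.in:443 \
#       -servername realisrareindia.in </dev/null 2>/dev/null \
#     | openssl x509 -noout -text \
#     | missing_names realisrareindia.in
```

Any hostname it prints is one the certificate does not cover, which is the situation the `ssl.CertificateError` above describes.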

I’m not sure exactly what led to this situation, but you might have to do some manual work to get out of it.

Removed the new site and reprovisioned (same error on the existing site that has a working cert)

Unfortunately when you remove a site it doesn’t delete existing files. I’d probably try and manually delete a few things to be sure:

  • /srv/www/realisrareindia
  • /etc/nginx/ssl/letsencrypt/realisrareindia*

I’d start with those and then try and re-provision again.

I actually just had to do the same on a server I am setting up. I had created staging.example.com and had to delete those entries manually on the server with

sudo rm -rf /srv/www/staging.example.com

and

sudo rm /etc/nginx/ssl/letsencrypt/staging.example* (or something like that.)

Then I was able to successfully provision again with Trellis.
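Before deleting anything, it can help to list what is actually left over for the removed site. A minimal sketch, assuming the two locations named above; `list_leftovers` is a hypothetical helper, and `WWW_ROOT` and `SSL_DIR` are hypothetical override variables added only so it can be tried against a scratch directory:

```shell
#!/bin/sh
# Hypothetical helper: list leftover paths for a removed site so they can be
# reviewed before running rm. Defaults match the paths mentioned in the thread;
# WWW_ROOT and SSL_DIR are assumptions that exist only to allow overriding them.
list_leftovers() {
  site=$1
  # -d lists the directory itself rather than its contents; errors for
  # paths that do not exist are silenced.
  ls -d "${WWW_ROOT:-/srv/www}/$site" \
        "${SSL_DIR:-/etc/nginx/ssl/letsencrypt}/$site"* 2>/dev/null
}

# Example usage (not run here):
#   list_leftovers staging.example.com
```

It only prints paths and deletes nothing, so the `sudo rm` commands can then be run against exactly what was listed.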

I tried deleting those files/directories and now get this error when provisioning:

https://pastebin.com/KfdAtECJ

It looks like one error is from trying to regenerate a cert for the old site that already exists, but I’m not sure what is causing the error regenerating the cert for the new site. Is it safe to delete all the certs (including the old site’s, backed up first of course) and then regenerate them all?

One more update:

If I disable SSL for the new site I can do a provision without errors. However, the new site’s domain does a redirect to the old site, not sure why. Additionally, my load balancer only accepts SSL connections anyway so I need to have SSL working for the new site. Very stuck here, not sure how to continue. I guess I could try provisioning a brand new box and then transferring assets.

Unfortunately, I would suggest that. I’m not sure if there’s an actual Trellis bug here or just some series of events that got you into a bad state, but starting from scratch is the best way to ensure things work.

If I don’t want to reprovision a whole new server (since that would result in downtime on the live site while DNS propagates to LE), is there much risk in blowing away all the nginx and LE configs and then running those tags again?

I took the live prod server out of the load balancer and spun up a brand new server and provisioned it. I still get this error: https://pastebin.com/uRR6kaLQ

I’m out of ideas–any thoughts?

Thanks again for all the help.

Isn’t this a problem? How can Let’s Encrypt verify the challenge file if it can’t connect over HTTP?

You can see in the error:

Wrote file to /srv/www/letsencrypt/FOBZ5V3tlNLPf4Ihqx7dm9SDiA7GWigkTig2_NhA9wg, but couldn’t download http://realisrareindia.in/.well-known/acme-challenge/FOBZ5V3tlNLPf4Ihqx7dm9SDiA7GWigkTig2_NhA9wg"

curl -I http://realisrareindia.in/.well-known/acme-challenge/FOBZ5V3tlNLPf4Ihqx7dm9SDiA7GWigkTig2_NhA9wg
curl: (7) Failed to connect to realisrareindia.in port 80: Connection refused
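The same check can be repeated for every hostname on the certificate, since the HTTP-01 challenge is first requested over plain HTTP on port 80. A rough sketch, where `challenge_url` and `check_hosts` are hypothetical helper names and the token is whatever file name sits in the challenge directory:

```shell
#!/bin/sh
# Hypothetical helper: build the HTTP-01 challenge URL for a host and a token.
challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Hypothetical helper: print the HTTP status code for each host's challenge
# URL. curl reports 000 when it cannot connect at all (for example when the
# connection is refused, as in the output above).
check_hosts() {
  token=$1
  shift
  for host in "$@"; do
    url=$(challenge_url "$host" "$token")
    # -s silent, -o discard the body, -m 10 give up after ten seconds
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}

# Example usage (not run here):
#   check_hosts FOBZ5V3tlNLPf4Ihqx7dm9SDiA7GWigkTig2_NhA9wg realisrareindia.in
```

A `000` for any host means nothing is answering on port 80 for that name, so it is worth checking the load balancer or firewall before looking at nginx.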

You’re right, and I realized that last night and enabled http connections on the LB–that was before my last post, though, when a brand new instance failed to provision. I think I’m going to try them on separate servers next.