Trying to domain map a subsite of my trellis/bedrock-powered subdomain multisite install. Latest trellis/bedrock. Same multisite install I have posted about on here many times before. I have successfully done this process many times:
Change relevant part of group_vars/production/wordpress_sites from - canonical: subdomain.domain.tld to
change DNS so newdomain.tld & www.newdomain.tld point to my production server IP and ping from server to confirm it has propagated
run trellis provision --tags letsencrypt production to update the certificate and nginx config
edit site name and url in /network/site-settings
but I cannot get a successful provision to get the cert! it always fails with one of the following error messages:
Remote end closed connection without response
Non-Zero Return Code
[Errno 104] Connection reset by peer
This multisite has about 35 subsites. I have confirmed every domain and redirect url in wordpress_sites are pinging the correct IP.
I should mention, I had an issue at the beginning of the month where my cert failed to auto update. It turned out to be that every subdomain needed an A record in DNS, strange only because I had previously used a wildcard subdomain A record and had success with letsencrypt. But once each subdomain had specific A records, I was able to reprovision and generate a new cert. That was on 9/7/23. Also successfully reprovisioned and redeployed on 9/15/23 without any cert changes.
I should also add that I searched /var/log/nginx and I see: r3.o.lencr.org could not be resolved (110: Operation timed out) while requesting certificate status, responder: r3.o.lencr.org, certificate: "/etc/nginx/ssl/letsencrypt/domain.tld-bundled.cert". I tried lowering the MTU per something I read in the Let’s Encrypt forums, to no avail.
I have closely studied the docs and searched through many of the posts on here, and all I can come up with is that the Let’s Encrypt API is blocking my IP or there’s some rate limit I’ve hit. All I can think to do at this point is wait and try again.
Tried lots of things with no success, convinced it had something to do with IPv6; my IP is IPv4 and I am not using AAAA records for any URLs.
I finally decided to spin up & move to a new server: copy db and uploads over, spoof DNS, first get it set up for SSL with a manual cert, and finally try the letsencrypt tasks after reassociating my IP to the new server. I was deflated when I saw the same error on the generate certificate task. Specifically, I noticed the error was always "Wrote file to http://oneofmyurls.tld/.well-known/acme-challenge/etc, but couldn't download http://oneofmyurls.tld/.well-known/acme-challenge/etc".
Hmm. I ran the provision again a few times and noticed the corresponding urls for each failure seemed to be advancing down the alphabet later and later in my LONG list of alphabetized site_hosts domains in wordpress_sites.yml.
I ran a ping on all the site urls and a wget -S -O - http://url-to-/.well-known/path for whichever had just failed, and the pings & wgets worked every time so I just kept running it again.
I’m happy to say that after several failures, and watching the error url get further and further down my list of sites, it finally succeeded!
I am chalking this up to Multisite Madness, and hoping for the best in the future!
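The manual check described above (fetching the /.well-known/ path for each host) could be scripted so that every host is probed in one pass. This is only a minimal sketch using Python's standard library: the function names, the `test` token and the host list are made up for illustration and are not part of Trellis. Run it on whichever machine performs the failing download. A 404 still proves the request reached nginx, while a reset or dropped connection is the kind of failure quoted in the errors above.

```python
import urllib.error
import urllib.request


def challenge_url(host, token="test"):
    """Build the http-01 challenge URL for a host (token is a placeholder)."""
    return f"http://{host}/.well-known/acme-challenge/{token}"


def probe(url, timeout=10, opener=urllib.request.urlopen):
    """Fetch url once and describe the outcome as a short string."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered (e.g. 404), so the network path works.
        return f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts, connection resets and closed connections.
        return f"FAILED: {exc}"


def probe_all(hosts, attempts=3):
    """Probe each host several times, since the failures are intermittent."""
    return {
        host: [probe(challenge_url(host)) for _ in range(attempts)]
        for host in hosts
    }


# Example (hypothetical hosts):
# for host, results in probe_all(["newdomain.tld", "www.newdomain.tld"]).items():
#     print(host, results)
```

Repeating each probe a few times is a deliberate choice here: a single successful wget does not rule out a connection that only drops some of the time.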
One thing to note is that when there is an IPv6 (AAAA) record for a domain, the Let’s Encrypt http-01 challenge will prefer the IPv6 address over the IPv4 (A) of the hostname.
So either have a valid IPv6 (AAAA) address (pointing to an HTTP server that also listens on its IPv6 address) or no IPv6 address at all.
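To see which hosts this would affect, the A and AAAA lookups for each site host can be compared. A minimal sketch with Python's standard library follows; `address_families` and `hosts_with_ipv6` are hypothetical helper names, and the results come from the local system resolver, which may not match what the Let’s Encrypt validators see.

```python
import socket


def address_families(host, resolver=socket.getaddrinfo):
    """Return (ipv4_addresses, ipv6_addresses) for host as sorted lists."""
    found = {socket.AF_INET: set(), socket.AF_INET6: set()}
    for family, addresses in found.items():
        try:
            infos = resolver(host, 80, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # no records of this family (or the lookup failed)
        for info in infos:
            # info[4] is the socket address; its first element is the IP.
            addresses.add(info[4][0])
    return sorted(found[socket.AF_INET]), sorted(found[socket.AF_INET6])


def hosts_with_ipv6(hosts, resolver=socket.getaddrinfo):
    """List the hosts that resolve to at least one IPv6 address."""
    return [host for host in hosts if address_families(host, resolver)[1]]


# Example (hypothetical hosts):
# print(hosts_with_ipv6(["newdomain.tld", "www.newdomain.tld"]))
```

Any host this reports needs an HTTP server answering on that IPv6 address, or the AAAA record removed.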
I am again trying to re-provision my multisite production site to add a domain to the SSL certificate.
running trellis provision --tags letsencrypt production
Every time a different error message, but never gets past the “Generate the certificate” task
When the error is: Could not access the challenge file for the hosts/domain...
I test the URLs with pings and wgets: the IPs are correct and they work, so I rerun the command.
Sometimes the error is earlier in the script: gnutls_handshake() failed: Error in the pull function
but it’s usually Error submitting challenges
or wrote file to... but couldn't download....
with Responses like Response: Remote end closed connection without response
or [Errno 104] Connection reset by peer
I suspect it could be a problem with my provider or the networking? - this is a private networked instance with an associated public IP - I have had issues related to my provider in the past, but pinging the url seems fine and I am able to ping out from the instance with no packet loss.
I followed the same course as above, made a new server with a manual cert;
got it working, then updated wordpress_sites.yml to use letsencrypt. Cannot get it to succeed this time even without adding a new domain.
Having the same issue, but this time there is no real order to the errors.
I am now just running it over and over hoping it will work once, but I feel like I need to understand the underlying issue and it should not take several days to add a new domain to the cert.
Hmm ok, that makes sense I think, thanks @strarsis.
My local mac connection was showing as ipv6 on whatsmyip.com, though I do get an ipv4 at api.ipify.org. Is this the problem? I adjusted the network settings on my local machine to use ipv6 for link-local only and now get an ipv4 at whatsmyip.com.
Still having same issue. I will keep trying.
As I think back, pretty sure I was on a different connection when it worked.
Can you make a successful GET request to the affected WordPress site domain from within the terminal you run the ansible playbook in (wget or curl)? When this does not work, it is unlikely that Ansible can perform the test, and you can diagnose what issue occurs during a fetch from within the terminal.
Yes - I have always had successful wgets and curls to all of the domains my multisite is using. I am able to provision everything else just fine with a manual cert; it is just this letsencrypt task.