Trellis provisioning with letsencrypt fails at nginx reload step

The same Trellis setup worked on production with Let's Encrypt, but I am now trying to provision a staging server and it fails.

The staging subdomain's IP has been propagated for several days now and shows as green for all locations except China and some of the more difficult countries.

There are no cert challenge failures.

The error in ansible is:

RUNNING HANDLER [common : reload nginx] ****************************************
fatal: [staging_host]: FAILED! => {"changed": true, "cmd": ["nginx", "-t"], "delta": "0:00:00.009640", "end": "2023-01-24 08:49:48.971955", "msg": "non-zero return code", "rc": 1, "start": "2023-01-24 08:49:48.962315", "stderr": "nginx: [emerg] open() \"/etc/nginx/fastcgi_params\" failed (2: No such file or directory) in /etc/nginx/sites-enabled/\nnginx: configuration file /etc/nginx/nginx.conf test failed", "stderr_lines": ["nginx: [emerg] open() \"/etc/nginx/fastcgi_params\" failed (2: No such file or directory) in /etc/nginx/sites-enabled/", "nginx: configuration file /etc/nginx/nginx.conf test failed"], "stdout": "", "stdout_lines": []}

So it is the nginx configuration test (nginx -t) that fails, not the reload itself.
Line 122 of /etc/nginx/sites-enabled/ is include fastcgi_params;
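
For anyone who hits a similar failure and does not want to read through the whole JSON blob, the missing path can be pulled out of the stderr text with a few lines of Python. This is only a minimal sketch, assuming the message keeps the open() "<path>" failed shape shown in the stderr above; the function name missing_paths is made up for illustration.

```python
import re

# Matches nginx messages of the form: open() "<path>" failed (<errno>: <reason>)
OPEN_FAILED = re.compile(r'open\(\) "([^"]+)" failed \((\d+): ([^)]+)\)')


def missing_paths(stderr: str) -> list[str]:
    """Return paths nginx could not open because they do not exist (errno 2)."""
    return [
        path
        for path, errno, _reason in OPEN_FAILED.findall(stderr)
        if errno == "2"
    ]


# The stderr string from the Ansible output above, with JSON escaping removed.
stderr = (
    'nginx: [emerg] open() "/etc/nginx/fastcgi_params" failed '
    "(2: No such file or directory) in /etc/nginx/sites-enabled/\n"
    "nginx: configuration file /etc/nginx/nginx.conf test failed"
)
print(missing_paths(stderr))
```

If the list is not empty, the problem is a file missing from disk rather than a syntax error in the site config, which points the troubleshooting at the contents of /etc/nginx and not at the certificate.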

The keys exist inside /etc/nginx/ssl/letsencrypt.

The nginx.service is active/running.

With https, the browser says “This site can’t be reached”.
With http, it returns an nginx 404 error.

I have tried:

  • deleting /etc/nginx on the server and provisioning again
  • provisioning without SSL and back again to SSL
  • setting ssh_client_ip_lookup: false in group_vars/all/main.yml
  • setting the full subdomain as the site name, and without
  • changing the letsencrypt email address

This suggests that the Linux distribution on the staging system differs from the one used on production, and from what Trellis expects and supports.
Is the staging server running Ubuntu 20.04 LTS?

Yes, Ubuntu 20.04 (LTS) x64 on staging, prod and dev.

Is there actually a file /etc/nginx/fastcgi_params on that staging system?

No, that file is not on staging, but I see it on prod. Quite a few files are missing on staging.
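
A quick way to see exactly which files differ is to compare file listings from both servers. The following is a rough sketch, assuming each listing is plain text with one path per line (for example, saved from find /etc/nginx -type f on each host). The function name missing_on_staging and the sample listings are hypothetical and not taken from the actual servers.

```python
def missing_on_staging(prod_listing: str, staging_listing: str) -> list[str]:
    """Return paths present in the prod listing but absent from the staging one."""
    prod = {line.strip() for line in prod_listing.splitlines() if line.strip()}
    staging = {line.strip() for line in staging_listing.splitlines() if line.strip()}
    return sorted(prod - staging)


# Illustrative sample data only; real listings would come from the two hosts.
prod_sample = """\
/etc/nginx/nginx.conf
/etc/nginx/fastcgi_params
/etc/nginx/mime.types
"""
staging_sample = """\
/etc/nginx/nginx.conf
"""

for path in missing_on_staging(prod_sample, staging_sample):
    print(path)
```

Anything printed here exists on prod but not on staging, so it is a candidate for a file that the provision did not put back.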

What do you get when you invoke lsb_release -a on staging?

So the complete Trellis provision process runs, not just specific tags?

Yes, the whole provision process ran, except that it produced that nginx reload error.

Figured it out now, thanks to you. Luckily I had used trash-cli to trash /etc/nginx, so I could restore the missing files from the trash. I ran the provision again, and now everything works.

I had cleared out that directory earlier to troubleshoot, but that may have been while the IP had not propagated yet and the provision was failing for a different reason. I assumed the files would be put back on reprovision, but that was not the case! Thanks @strarsis