Provisioning Error + Nginx Config Test Failed

Fernando_Garcia · September 3, 2018, 10:39am

Hey Guys,

So out of the blue, while trying to re-provision my staging server, I received an error. Now my site is unresponsive. I’ve been looking through all the issues and I just can’t seem to track down how to fix this. After running the provision I get the following:

TASK [nginx : Enable Nginx to start on boot] ***********************************
task path: /Users/digital/git/micegroups.com/trellis/roles/nginx/tasks/main.yml:47
Running service
Using module file /usr/local/lib/python2.7/site-packages/ansible/modules/system/service.py
<206.189.210.91> ESTABLISH SSH CONNECTION FOR USER: admin
<206.189.210.91> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=admin -o ConnectTimeout=10 -o ControlPath=/Users/digital/.ansible/cp/e6be2c1615 206.189.210.91 '/bin/sh -c '"'"'sudo -H -S  -p "[sudo via ansible, key=lxpxgqxtrtodqxghzwmieepkclvfhwat] password: " -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-lxpxgqxtrtodqxghzwmieepkclvfhwat; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<206.189.210.91> (1, '\n{"msg": "Job for nginx.service failed because the control process exited with error code. See \\"systemctl status nginx.service\\" and \\"journalctl -xe\\" for details.\\n", "failed": true, "invocation": {"module_args": {"name": "nginx", "pattern": null, "enabled": true, "state": "started", "sleep": null, "arguments": "", "runlevel": "default"}}}\n', 'OpenSSH_7.4p1, LibreSSL 2.5.0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 70592\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 1\r\n')
System info:
  Ansible 2.5.3; Darwin
  Trellis version (per changelog): "Bump Ansible `version_tested_max` to 2.5.3"
---------------------------------------------------
Job for nginx.service failed because the control process exited with error
code. See "systemctl status nginx.service" and "journalctl -xe" for details.

fatal: [206.189.210.91]: FAILED! => {
    "changed": false, 
    "invocation": {
        "module_args": {
            "arguments": "", 
            "enabled": true, 
            "name": "nginx", 
            "pattern": null, 
            "runlevel": "default", 
            "sleep": null, 
            "state": "started"
        }
    }
}
  to retry, use: --limit @/Users/digital/git/micegroups.com/trellis/server.retry

PLAY RECAP *********************************************************************

In my wordpress_sites for staging I have the following:

wordpress_sites:
  micegroups.com:
    site_hosts:
      - canonical: staging.micegroups.com
    admin_user: micegroups
    admin_email: webdev@micegroups.com
    initial_permalink_structure: /%category%/%postname%/
    local_path: ../site # path targeting local Bedrock site directory (relative to Ansible root)
    repo: git@github.com:fernandoagarcia/micegroups.git # replace with your Git repo URL
    repo_subtree_path: site # relative path to your Bedrock/WP directory in your repo
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: false

The problem just started out of nowhere. I can see that nginx is looking in the wrong directory for the access logs, but I can’t see why that would just happen without any changes being made. Any help would be appreciated.

Running
ansible 2.5.3
MacOS Sierra 10.12.6
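
The Ansible error above only says that nginx.service failed and points to "systemctl status nginx.service" and "journalctl -xe". A minimal sketch of a first diagnostic pass, run on the server over SSH, might look like the following. It assumes a systemd host with sudo access; the function name diagnose_nginx is hypothetical and not part of Trellis.

```shell
#!/bin/sh
# Hypothetical helper: first-pass nginx diagnostics on the remote server.
# Assumes systemd and sudo access.
diagnose_nginx() {
  # Config test: on failure, nginx normally names the file, line,
  # or path (for example a log directory) it could not use.
  sudo nginx -t

  # The unit status, as recommended by the Ansible error message.
  sudo systemctl status nginx.service --no-pager

  # The last 50 journal entries for the nginx unit only.
  sudo journalctl -u nginx.service -n 50 --no-pager
}
```

After connecting to the server, source the snippet and call diagnose_nginx. If the config test complains about a missing log path, that would match the access log symptom described above.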

ben · September 3, 2018, 3:57pm

Why not just destroy the server and start over?

Not sure if this matters:

initial_permalink_structure: /%category%/%postname%/

But you probably want to use quotes here:

initial_permalink_structure: "/%category%/%postname%/"

MWDelaney · September 3, 2018, 5:31pm

A quick note about destroying and rebuilding your production environment: make sure you have a backup of your database and your uploads directory. Our existing Trellis project documentation has details on exporting these.
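
For the backup mentioned above, a rough sketch run from the local machine could look like this. It is not taken from the Trellis docs: the function name backup_site is hypothetical, and the web user and the /srv/www/<site>/current and /srv/www/<site>/shared/uploads paths are assumptions based on a default Trellis layout, so check them against your own server first.

```shell
#!/bin/sh
# Hypothetical helper: pull a database dump and the uploads directory
# from a server before destroying it.
# Assumptions: SSH access as the "web" user, WP-CLI installed on the
# server, and the default Trellis directory layout.
backup_site() {
  if [ "$#" -ne 2 ]; then
    echo "usage: backup_site <host> <site>" >&2
    return 2
  fi
  host="$1"   # e.g. staging.micegroups.com
  site="$2"   # the key under wordpress_sites, e.g. micegroups.com

  # "wp db export -" writes the SQL dump to stdout, so it lands locally.
  ssh "web@$host" "cd /srv/www/$site/current && wp db export -" > "$site-db.sql"

  # Copy the uploads directory into a local backup folder.
  rsync -az "web@$host:/srv/www/$site/shared/uploads/" "./$site-uploads/"
}
```

Example call: backup_site staging.micegroups.com micegroups.com. Confirm the dump file is not empty before destroying anything.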

Fernando_Garcia · September 3, 2018, 8:26pm

I will try that. I’m just concerned about production websites. If it happens there, it won’t be so easy.

Fernando_Garcia · September 5, 2018, 7:13am

I wasn’t able to find the source of the issue. But for anyone needing to destroy and re-provision a new server I found taking a snapshot of the newly created server before you provision will allow you to start fresh without changing IP addresses and having to wait for propagation.

henscu · February 11, 2019, 5:27pm

@Fernando_Garcia, I’m getting the same error, and I’m considering destroying my current server and building a new one as you did. There seems to be no alternative.

Did you ever figure out what caused the error?

mcheck · March 20, 2019, 7:08pm

@Fernando_Garcia @henscu, I had a similar issue on my local dev copy in Vagrant, which was fixed by removing nginx entirely after finding this article. Of course, I recommend taking a snapshot and backup like you did, just in case.

I removed nginx from the server completely. SSH into the server:

$ sudo apt-get remove nginx nginx-common # removes the packages but keeps the configs
$ sudo apt-get purge nginx nginx-common # removes the packages and their configs
$ sudo apt-get autoremove # removes dependencies that are no longer needed

Reprovision server (should completely reinstall nginx)

ansible-playbook server.yml -e env=<environment>

YMMV. Hope it helps.