Failure to establish connection when provisioning via ansible-playbook server.yml

There are many possible causes for UNREACHABLE but here’s one that comes to mind:

If you are in fact using an Ubuntu 14.04 server, could you run your command again and share the full debug info?

ansible-playbook server.yml -e env=staging -vvvv

Very strange. It seems to be working now. Although I had previously deleted my droplet and created a 14.04 version (which didn't work, possibly for the same reason as above), this time there didn't seem to be an issue connecting!

Thanks again for the suggestion as I think it was likely the solution.

I have been stuck on this vagrant / ansible / ssh issue for days now. I get this message when trying to ping the remote staging server:

ansible lc-dev1.co.uk -m ping -u root

lc-dev1.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}

I can ssh into the server without a password, which suggests to me that the ssh keys are set up correctly but there's a problem with ansible / vagrant.

I have uninstalled and reinstalled different versions of vagrant and ansible but no change.

Any help would be greatly appreciated. Thanks, Simon

@jajouka Could you share the entire Ansible verbose debug info (add -vvvv):

ansible-playbook server.yml -e env=staging -vvvv

Could you let us know…

  • which VPS provider you are using (e.g., Digital Ocean, AWS, etc.)
  • what your admin_user name is (should be admin if DO or ubuntu if AWS)
  • what user (name) is making the successful manual ssh connection

If you dig in and find that the issue seems different from the rest of the thread above, go ahead and start a new thread, so that this one stays focused.

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ansible-playbook server.yml -e env=staging -vvvv [10:40:01]
Using /home/sbeasley/Sites/lc-blogs-trellis/trellis/ansible.cfg as config file
Loaded callback output of type stdout, v2.0

PLAYBOOK: server.yml ***********************************************************
3 plays in server.yml

PLAY [Ensure necessary variables are defined] **********************************

TASK [Ensure environment is defined] *******************************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/variable-check.yml:8
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

PLAY [Determine Remote User] ***************************************************

TASK [remote-user : Determine whether to connect as root or admin_user] ********
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:2
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ESTABLISH LOCAL CONNECTION FOR USER: sbeasley
EXEC /bin/sh -c '( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852`" && echo ansible-tmp-1472035206.49-189187861603852="`echo $HOME/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852`" ) && sleep 0'
PUT /tmp/tmpLRyRgT TO /home/sbeasley/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852/command
EXEC /bin/sh -c 'LANG=en_GB.UTF-8 LC_ALL=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 /usr/bin/python /home/sbeasley/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852/command; rm -rf "/home/sbeasley/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852/" > /dev/null 2>&1 && sleep 0'
ok: [lc-dev1.co.uk -> localhost] => {"changed": false, "cmd": ["ansible", "lc-dev1.co.uk", "-m", "ping", "-u", "root"], "delta": "0:00:00.634070", "end": "2016-08-24 10:40:07.186104", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "ansible lc-dev1.co.uk -m ping -u root", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 3, "start": "2016-08-24 10:40:06.552034", "stderr": "", "stdout": "lc-dev1.co.uk | UNREACHABLE! => {\n    \"changed\": false, \n    \"msg\": \"Failed to connect to the host via ssh.\", \n    \"unreachable\": true\n}", "stdout_lines": ["lc-dev1.co.uk | UNREACHABLE! => {", "    \"changed\": false, ", "    \"msg\": \"Failed to connect to the host via ssh.\", ", "    \"unreachable\": true", "}"], "warnings": []}

TASK [remote-user : Set remote user for each host] *****************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:8
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ok: [lc-dev1.co.uk] => {"ansible_facts": {"ansible_ssh_user": "root"}, "changed": false, "invocation": {"module_args": {"ansible_ssh_user": "root"}, "module_name": "set_fact"}}

TASK [remote-user : Announce which user was selected] **************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:12
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
Note: Ansible will attempt connections as user = root
ok: [lc-dev1.co.uk] => {}

PLAY [WordPress Server - Install LEMP Stack with PHP 7.0 and MariaDB MySQL] ****

TASK [setup] *******************************************************************
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
<lc-dev1.co.uk> ESTABLISH SSH CONNECTION FOR USER: root
<lc-dev1.co.uk> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/sbeasley/.ansible/cp/ansible-ssh-%h-%p-%r lc-dev1.co.uk '/bin/sh -c '"'"'( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472035209.83-154266183055644`" && echo ansible-tmp-1472035209.83-154266183055644="`echo $HOME/.ansible/tmp/ansible-tmp-1472035209.83-154266183055644`" ) && sleep 0'"'"''
System info:
Ansible 2.1.1.0; Linux
Trellis 0.9.7: April 10th, 2016

Failed to connect to the host via ssh.
fatal: [lc-dev1.co.uk]: UNREACHABLE! => {"changed": false, "unreachable": true}
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=3 changed=0 unreachable=1 failed=0
localhost : ok=0 changed=0 unreachable=0 failed=0

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)»

Thanks for your reply. I am using Digital Ocean, and I am using root as the admin user.

Here's my users.yml file:

admin_user: root

users:
  - name: "{{ web_user }}"
    groups:
      - "{{ web_group }}"
    keys:
      - "{{ lookup('file', '~/.ssh/digital_ocean.pub') }}"
      - "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
      - https://github.com/sb-lc.keys
  - name: "{{ admin_user }}"
    groups:
      - sudo
    keys:
      - "{{ lookup('file', '~/.ssh/digital_ocean.pub') }}"
      - "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
      - https://github.com/sb-lc.keys

web_user: web
web_group: www-data
web_sudoers:
  - "/usr/sbin/service php7.0-fpm *"

I am using the user 'web' as the default. I can successfully connect via ssh using root. I don't know the password for 'web', so I haven't managed to ssh with this deploy user.

I have been using this command to test connections:

ansible -m ping -u vagrant staging

lc-dev1.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}

If I change admin_user: admin in users.yml, I still get the same result from this command. I have reloaded vagrant and still no change.

I am getting the same output when attempting to provision the server after changing the user to admin:

ansible-playbook server.yml -e env=staging -vvvv [11:02:54]
Using /home/sbeasley/Sites/lc-blogs-trellis/trellis/ansible.cfg as config file
Loaded callback output of type stdout, v2.0

PLAYBOOK: server.yml ***********************************************************
3 plays in server.yml

PLAY [Ensure necessary variables are defined] **********************************

TASK [Ensure environment is defined] *******************************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/variable-check.yml:8
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

PLAY [Determine Remote User] ***************************************************

TASK [remote-user : Determine whether to connect as root or admin_user] ********
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:2
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ESTABLISH LOCAL CONNECTION FOR USER: sbeasley
EXEC /bin/sh -c '( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686`" && echo ansible-tmp-1472036609.09-46831348447686="`echo $HOME/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686`" ) && sleep 0'
PUT /tmp/tmpFWvNYd TO /home/sbeasley/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686/command
EXEC /bin/sh -c 'LANG=en_GB.UTF-8 LC_ALL=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 /usr/bin/python /home/sbeasley/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686/command; rm -rf "/home/sbeasley/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686/" > /dev/null 2>&1 && sleep 0'
ok: [lc-dev1.co.uk -> localhost] => {"changed": false, "cmd": ["ansible", "lc-dev1.co.uk", "-m", "ping", "-u", "root"], "delta": "0:00:00.312282", "end": "2016-08-24 11:03:29.458907", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "ansible lc-dev1.co.uk -m ping -u root", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 3, "start": "2016-08-24 11:03:29.146625", "stderr": "", "stdout": "lc-dev1.co.uk | UNREACHABLE! => {\n    \"changed\": false, \n    \"msg\": \"Failed to connect to the host via ssh.\", \n    \"unreachable\": true\n}", "stdout_lines": ["lc-dev1.co.uk | UNREACHABLE! => {", "    \"changed\": false, ", "    \"msg\": \"Failed to connect to the host via ssh.\", ", "    \"unreachable\": true", "}"], "warnings": []}

TASK [remote-user : Set remote user for each host] *****************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:8
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ok: [lc-dev1.co.uk] => {"ansible_facts": {"ansible_ssh_user": "admin"}, "changed": false, "invocation": {"module_args": {"ansible_ssh_user": "admin"}, "module_name": "set_fact"}}

TASK [remote-user : Announce which user was selected] **************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:12
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
Note: Ansible will attempt connections as user = admin
ok: [lc-dev1.co.uk] => {}

PLAY [WordPress Server - Install LEMP Stack with PHP 7.0 and MariaDB MySQL] ****

TASK [setup] *******************************************************************
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
<lc-dev1.co.uk> ESTABLISH SSH CONNECTION FOR USER: admin
<lc-dev1.co.uk> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=admin -o ConnectTimeout=10 -o ControlPath=/home/sbeasley/.ansible/cp/ansible-ssh-%h-%p-%r lc-dev1.co.uk '/bin/sh -c '"'"'( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472036612.09-237376651746114`" && echo ansible-tmp-1472036612.09-237376651746114="`echo $HOME/.ansible/tmp/ansible-tmp-1472036612.09-237376651746114`" ) && sleep 0'"'"''
System info:
Ansible 2.1.1.0; Linux
Trellis 0.9.7: April 10th, 2016

Failed to connect to the host via ssh.
fatal: [lc-dev1.co.uk]: UNREACHABLE! => {"changed": false, "unreachable": true}
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=3 changed=0 unreachable=1 failed=0
localhost : ok=0 changed=0 unreachable=0 failed=0

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)»

Do I need to do something else to change the user to admin? I have just changed the users.yml file, but maybe there's more ssh config I should be doing.

I’d recommend updating Trellis to the latest HEAD version because your version 0.9.7…

Once you’ve updated Trellis, I’d recommend…

  • back up any important data from the DO droplet
  • change admin_user: admin (because Trellis will try root by default, only using admin as fallback)
  • rebuild the droplet (a destroy that maintains IP)

Ansible reports that it is running on Linux. If this means you’re using Windows with Ansible running from within a Vagrant VM, the VM will need your private SSH key in order to make connections. For example, copy/paste the relevant private key content into the VM at ~/.ssh/id_rsa or ~/.ssh/digital_ocean (whichever key corresponds to the public key you have loaded on your DO droplet), then set tighter permissions on the file(s): chmod 0400 ~/.ssh/key_name.
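For example, a minimal sketch of that setup inside the VM (the key name digital_ocean is just an example; use whichever file matches the public key loaded on your droplet):

# run inside the Vagrant VM; key file name is an example
mkdir -p ~/.ssh && chmod 0700 ~/.ssh
# paste the private key content into the file with your editor of choice
nano ~/.ssh/digital_ocean
# tighten permissions so the ssh client will accept the key
chmod 0400 ~/.ssh/digital_ocean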

Note that if you’re not on Windows, then Vagrant and the vagrant user are typically irrelevant to connections to remote staging/production servers. It is just a connection from your Linux local machine to the remote DO servers. The Vagrant dev VM is not involved.

Could you run these two commands on your Ansible control machine? I’m referring to your regular machine if running Linux, or the Vagrant VM (e.g., after vagrant ssh) if running Windows.

  • ssh-agent bash # start your ssh-agent (in case it isn’t already running)
  • ssh-add ~/.ssh/private_key_name # load DO-related private key into ssh-agent

Finally, now try ansible-playbook server.yml -e env=staging

If the Ansible connection still fails and you’re still able to ssh manually, could you share your exact manual ssh command, then share the entire verbose output of the manual ssh command (add -v), e.g.,
ssh -v root@xxx.xxx.xxx.xxx
I hope that seeing your command and output could offer insight into what is going on with your SSH keys.

Thanks for your reply. I am using Ubuntu as my local machine. I updated Trellis, restored my config in the yml files, changed the admin user to admin, rebuilt the DO server droplet (Ubuntu 14.04 64-bit), destroyed and recreated the Vagrant box, added the private key to the ssh agent (I think it was already stored; see code below), ran vagrant up --provision (worked fine), tested that ssh works (ssh -v root@xxx.xxx.xxx.xxx, works fine), and ran the provision command for the staging environment (this failed again with the 'unreachable' ssh error).

Here is my cmd log:

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh-add -l [12:03:25]
2048 e5:b9:e2:ee:81:9f:49:89:0a:b0:6e:14:07:b1:94:af sbeasley@sbeasley-MS-7788 (RSA)
2048 dd:13:ce:f5:c4:f1:e8:f7:8c:8a:bf:d6:96:f1:86:90 sbeasley@leicestercollege.ac.uk (RSA)
2048 1f:c2:27:45:fa:d9:bb:30:a8:9a:44:69:62:7d:31:a2 lc-prod@78.109.168.63.srvlist.ukfast.net (RSA)
2048 6b:81:7f:b5:92:4d:20:7e:38:c7:ed:00:6a:30:f5:1f sbeasley@sbeasley-MS-7788 (RSA)
2048 4c:99:6a:8f:df:65:f9:24:94:fa:8f:38:0e:72:0c:5a sbeasley@sbeasley-MS-7788 (RSA)
sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh-agent bash [12:04:02]
sbeasley@sbeasley-MS-7788:~/Sites/lc-blogs-trellis/trellis$ ssh-add ~/.ssh/digital_ocean
Identity added: /home/sbeasley/.ssh/digital_ocean (/home/sbeasley/.ssh/digital_ocean)
sbeasley@sbeasley-MS-7788:~/Sites/lc-blogs-trellis/trellis$ exit
exit
sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh-add -l [12:05:04]
2048 e5:b9:e2:ee:81:9f:49:89:0a:b0:6e:14:07:b1:94:af sbeasley@sbeasley-MS-7788 (RSA)
2048 dd:13:ce:f5:c4:f1:e8:f7:8c:8a:bf:d6:96:f1:86:90 sbeasley@leicestercollege.ac.uk (RSA)
2048 1f:c2:27:45:fa:d9:bb:30:a8:9a:44:69:62:7d:31:a2 lc-prod@78.109.168.63.srvlist.ukfast.net (RSA)
2048 6b:81:7f:b5:92:4d:20:7e:38:c7:ed:00:6a:30:f5:1f sbeasley@sbeasley-MS-7788 (RSA)
2048 4c:99:6a:8f:df:65:f9:24:94:fa:8f:38:0e:72:0c:5a sbeasley@sbeasley-MS-7788 (RSA)
sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ansible-playbook server.yml -e env=staging [12:05:07]

PLAY [Ensure necessary variables are defined] **********************************

TASK [Ensure environment is defined] *******************************************
skipping: [localhost]

PLAY [Determine Remote User] ***************************************************

TASK [remote-user : Require manual definition of remote-user] ******************
skipping: [lc-dev1.co.uk]

TASK [remote-user : Check whether Ansible can connect as root] *****************
ok: [lc-dev1.co.uk → localhost]

TASK [remote-user : Set remote user for each host] *****************************
ok: [lc-dev1.co.uk]

TASK [remote-user : Announce which user was selected] **************************
Note: Ansible will attempt connections as user = admin
ok: [lc-dev1.co.uk]

TASK [remote-user : Load become password] **************************************
ok: [lc-dev1.co.uk]

PLAY [Install prerequisites] ***************************************************

TASK [Install Python 2.x] ******************************************************
System info:
Ansible 2.1.1.0; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

Failed to connect to the host via ssh.
fatal: [lc-dev1.co.uk]: UNREACHABLE! => {"changed": false, "unreachable": true}
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=4 changed=0 unreachable=1 failed=0
localhost : ok=0 changed=0 unreachable=0 failed=0

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh -v root@178.62.35.88 [12:05:39]
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /home/sbeasley/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to 178.62.35.88 [178.62.35.88] port 22.
debug1: Connection established.
debug1: identity file /home/sbeasley/.ssh/id_rsa type 1
debug1: identity file /home/sbeasley/.ssh/id_rsa-cert type -1
debug1: identity file /home/sbeasley/.ssh/id_dsa type -1
debug1: identity file /home/sbeasley/.ssh/id_dsa-cert type -1
debug1: identity file /home/sbeasley/.ssh/id_ecdsa type -1
debug1: identity file /home/sbeasley/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/sbeasley/.ssh/id_ed25519 type -1
debug1: identity file /home/sbeasley/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8 pat OpenSSH_6.6.1* compat 0x04000000
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5-etm@openssh.com none
debug1: kex: client->server aes128-ctr hmac-md5-etm@openssh.com none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 53:fe:f1:91:03:51:9a:7d:a3:1e:64:b4:e7:3c:4d:3e
debug1: Host '178.62.35.88' is known and matches the ECDSA host key.
debug1: Found key in /home/sbeasley/.ssh/known_hosts:19
debug1: ssh_ecdsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/sbeasley/.ssh/id_rsa
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: sbeasley@sbeasley-MS-7788
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: sbeasley@leicestercollege.ac.uk
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: lc-prod@78.109.168.63.srvlist.ukfast.net
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: sbeasley@sbeasley-MS-7788
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: Authentication succeeded (publickey).
Authenticated to 178.62.35.88 ([178.62.35.88]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env LANG = en_GB.UTF-8
debug1: Sending env LC_CTYPE = en_GB.UTF-8
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-34-generic x86_64)

System information as of Thu Aug 25 10:42:12 UTC 2016

System load: 0.0 Processes: 104
Usage of /: 6.7% of 19.56GB Users logged in: 0
Memory usage: 11% IP address for eth0: 178.62.35.88
Swap usage: 0%

Graph this data and manage this system at:
https://landscape.canonical.com/

0 packages can be updated.
0 updates are security updates.

New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Your Hardware Enablement Stack (HWE) is supported until April 2019.

Last login: Thu Aug 25 10:42:13 2016 from 212.219.188.10
root@lc-blogs-stage:~#

I also tried pinging again:

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ansible -m ping -u vagrant all [12:12:13]
192.168.50.5 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
lc-dev1.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}
leicestercollegeblog.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}

I tried changing the admin user back to root, just in case the ssh key was owned by root and not admin, but I still get the same errors.

Also, I can't see how to obtain the password for the admin user. I can see the hashed version in users.yml, but is there some way I can get it so I can ssh-copy-id -i ~/.ssh/digital_ocean admin@***** ?

The -u vagrant will probably always fail. A default DO Ubuntu droplet will only have the user root, not vagrant, so even if you have the correct ssh key, an attempt to connect as vagrant user will fail. I think the command below is better for testing:

ansible staging -m raw -a whoami -u root

You’ll notice that this is the command Trellis uses to test whether it can connect as root or whether it must fall back to the admin_user. If the connection as root succeeds, Trellis will use root. That’s why I don’t see any reason to change the admin_user to root. Trellis won’t even try the admin_user unless root has already failed. In addition, the purpose of the admin_user is to have a non-root user who can connect in case you’ve heightened security by disabling root login (see security docs).
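For reference, here is a simplified sketch of that fallback logic, reconstructed from the task names in your debug output above (not the verbatim Trellis source):

# roles/remote-user/tasks/main.yml (simplified sketch)
- name: Check whether Ansible can connect as root
  command: ansible {{ inventory_hostname }} -m raw -a whoami -u root
  delegate_to: localhost
  register: connection_status
  failed_when: false
  changed_when: false

- name: Set remote user for each host
  set_fact:
    ansible_ssh_user: "{{ 'root' if connection_status.rc == 0 else admin_user }}"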


I’m not perfectly familiar with all the ssh possibilities, so some of this may be unnecessary, but…
I’m guessing you’re using zsh instead of bash, so, sorry to make you repeat, could you try this:

ssh-agent zsh
ssh-add /home/sbeasley/.ssh/digital_ocean

# Connection Test 1: basic connection
ansible staging -m raw -a whoami -u root

# Connection Test 2: force choice of private ssh key
ansible staging -m raw -a whoami -u root --private-key=/home/sbeasley/.ssh/digital_ocean

If Connection Test 1 succeeds, then I guess that finally adds your key to the ssh-agent and I bet the ansible-playbook command will succeed. If it fails, but Test 2 succeeds, then apparently Ansible is having trouble finding the right ssh key on its own. You could try to figure out why, or just add the --private-key=/home/sbeasley/.ssh/digital_ocean to the end of your ansible-playbook commands. Or, set up your Trellis hosts/staging like this:

# hosts/staging
lc-dev1.co.uk ansible_host=178.62.35.88 ansible_ssh_private_key_file='/home/sbeasley/.ssh/digital_ocean'

[staging]
lc-dev1.co.uk

[web]
lc-dev1.co.uk

(ref for ansible_ssh_private_key_file)

If Connection Tests 1 and 2 both fail, then I’m not sure what to explore next.

  • Have you had a successful Trellis project before or is this project the first attempt? (helps isolate problem to your dev environment vs. your current project configuration)
  • Are you making any modifications to the default bare Ubuntu box from DO before running Trellis commands?
  • Any relevant configs in /home/sbeasley/.ssh/config or /etc/ssh/ssh_config (e.g., on line 19 for Host *)?
  • In case this is a duplicate of an obscure problem, you could try adding control_path = %(directory)s/%%h-%%r to your ansible.cfg under [ssh_connection] (details); a minimal sketch follows this list
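For reference, a minimal sketch of that ansible.cfg addition (the control_path line is the only suggested change; keep whatever else your file already contains):

# ansible.cfg
[ssh_connection]
control_path = %(directory)s/%%h-%%r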

Given that you did all the work to update Trellis, I’d suggest rebuilding your droplet with Ubuntu 16.04.


Again, I don’t see this as being necessary, because I don’t see that Vagrant has anything to do with your connection to a DO staging server. But I want to be sure I’m not missing something that could be the key to resolving the connection issue. Do you have Vagrant involved in some way? What is your understanding of how Vagrant is related to your Ubuntu machine’s connection to your DO staging server?


In the latest version of Trellis, you simply define the admin_user’s raw password in group_vars/<environment>/vault.yml. The admin_user does not exist on the DO bare Ubuntu box. Trellis creates the admin_user (and any other users in group_vars/all/users.yml) as part of the server.yml playbook, in the users role. The SSH-keys docs describe these users and their purposes.
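For example, a minimal sketch of the vault entry (treat the exact variable names as something to verify against your own group_vars files):

# group_vars/staging/vault.yml (sketch; verify names against your file)
vault_users:
  - name: "{{ admin_user }}"
    password: example_password
    salt: "generateme"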

Current Trellis does not have a hashed version of passwords. Any chance you’re still seeing the old version of vault_sudoer_passwords removed in roots/trellis#614?

Trellis will assign the admin_user this password when it creates the admin_user, so you will not need to run ssh-copy-id -i ~/.ssh/digital_ocean admin@*****


Thanks for your help; I followed the instructions carefully. I switched to Ubuntu 16.04 on DO and started ssh-agent under zsh. Connection Tests 1 and 2 didn't work until I added the ansible_ssh_private_key_file setting to my hosts/staging file as you suggested; after that, the connection test succeeded.

I didn't have any ssh config set up, and it seems there was no need for any changes in the ansible.cfg file like you suggested.

It seems I've run into more problems though. When attempting to provision the staging server I get this:

TASK [php : Start php7.0-fpm service] ******************************************
System info:
Ansible 2.1.1.0; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

Job for php7.0-fpm.service failed because the control process exited with
error code. See "systemctl status php7.0-fpm.service" and "journalctl -xe"
for details.

fatal: [lc-dev1.co.uk]: FAILED! => {"changed": false, "failed": true}

NO MORE HOSTS LEFT *************************************************************
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=46 changed=2 unreachable=0 failed=1
localhost : ok=0 changed=0 unreachable=0 failed=0

After a bit of research I found this blob; however, my version of Trellis already had this set.

When I run sudo systemctl status php7.0-fpm.service after vagrant ssh, as suggested here, I get this:

vagrant@lc-blogs-trellis:~$ sudo systemctl status php7.0-fpm.service
● php7.0-fpm.service - The PHP 7.0 FastCGI Process Manager
Loaded: loaded (/lib/systemd/system/php7.0-fpm.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2016-08-27 01:33:49 UTC; 9h ago
Docs: man:php-fpm7.0(8)
Process: 1774 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 31024 (php-fpm7.0)
Status: "Processes active: 0, idle: 2, Requests: 19, slow: 0, Traffic: 0req/sec"
CGroup: /system.slice/php7.0-fpm.service
├─ 1779 php-fpm: pool wordpress
├─ 1969 php-fpm: pool wordpress
└─31024 php-fpm: master process (/etc/php/7.0/fpm/php-fpm.conf)

Aug 27 01:33:49 lc-blogs-trellis systemd[1]: Starting The PHP 7.0 FastCGI Process Manager…
Aug 27 01:33:49 lc-blogs-trellis systemd[1]: Started The PHP 7.0 FastCGI Process Manager.
Aug 27 01:36:16 lc-blogs-trellis systemd[1]: Reloading The PHP 7.0 FastCGI Process Manager.
Aug 27 01:36:16 lc-blogs-trellis systemd[1]: Reloaded The PHP 7.0 FastCGI Process Manager.

I looked at this forum post and tried this:

➜ trellis git:(master) ✗ ansible "web:&staging" -m service -a "name=php7.0-fpm state=reloaded" -u web
lc-dev1.co.uk | FAILED! => {
    "changed": false,
    "failed": true,
    "msg": "Failed to start php7.0-fpm.service: Interactive authentication required.\nSee system logs and 'systemctl status php7.0-fpm.service' for details.\n"
}
➜ trellis git:(master) ✗ ansible "root:&staging" -m service -a "name=php7.0-fpm state=reloaded" -u web
ERROR! Specified hosts and/or --limit does not match any hosts
➜ trellis git:(master) ✗ ansible "admin:&staging" -m service -a "name=php7.0-fpm state=reloaded" -u web
ERROR! Specified hosts and/or --limit does not match any hosts
➜ trellis git:(master) ✗

When I looked in /etc/sudoers.d/web-services on the staging server, as referenced here, I got this:

root@lc-blogs-stage:~# cat /etc/sudoers.d/web-services

# Ansible managed: /home/sie/Sites/lc-blogs-trellis/trellis/roles/users/templates/sudoers.d.j2 modified on 2016-08-26 23:31:34 by sie on sie-Lenovo-G510

web ALL=(root) NOPASSWD: /usr/sbin/service php7.0-fpm *
root@lc-blogs-stage:~#

which seems correct.

So this is as far as I have got. I think the Ubuntu update to 16.04 may be causing the error.

As suggested here, I checked my /etc/sudoers on the staging server; it looks like this:

root@lc-blogs-stage:~# cat /etc/sudoers
#
# This file MUST be edited with the 'visudo' command as root.
#
# Please consider adding local content in /etc/sudoers.d/ instead of
# directly modifying this file.
#
# See the man page for details on how to write a sudoers file.
#
Defaults	env_reset
Defaults	mail_badpass
Defaults	secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# Host alias specification

# User alias specification

# Cmnd alias specification

# User privilege specification
root	ALL=(ALL:ALL) ALL

# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL

# See sudoers(5) for more information on "#include" directives:

#includedir /etc/sudoers.d

I changed the permissions of the sudoers.d folder to 0440

now it looks like this:

dr--r----- 2 root root 4096 Aug 27 00:52 sudoers.d/

and the files inside that folder look like this

Last login: Sat Aug 27 11:36:23 2016 from 86.179.185.201
root@lc-blogs-stage:~# ll /etc/sudoers.d/
total 20
dr--r----- 2 root root 4096 Aug 27 00:52 ./
drwxr-xr-x 103 root root 4096 Aug 27 11:35 ../
-r--r----- 1 root root 119 Aug 26 19:22 90-cloud-init-users
-r--r----- 1 root root 958 Mar 30 19:57 README
-r--r----- 1 root root 210 Aug 27 00:52 web-services

Another issue I'm having since updating Trellis is that in the local machine dev environment, if I use SSL with Let's Encrypt it doesn't work.

I put this in development/wordpress_sites.yml:

ssl:
  enabled: true
  provider: letsencrypt

When I run

vagrant up --provision

I get these errors:

TASK [wordpress-install : Install WP] ******************************************
System info:
Ansible 2.1.1.0; Vagrant 1.8.5; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

failed: [default] (item=leicestercollegeblog.co.uk) => {"changed": true, "cmd": ["wp", "core", "install", "--allow-root", "--url=https://${HTTP_HOST}", "--title=leicestercollegeblog.co.uk", "--admin_user=admin", "--admin_password=admin", "--admin_email=admin@example.dev"], "delta": "0:00:02.807209", "end": "2016-08-27 12:13:55.186162", "failed": true, "item": "leicestercollegeblog.co.uk", "rc": 255, "start": "2016-08-27 12:13:52.378953", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************

RUNNING HANDLER [common : restart memcached] ***********************************
changed: [default]

RUNNING HANDLER [common : reload php-fpm] **************************************
changed: [default]

RUNNING HANDLER [common : reload nginx] ****************************************
System info:
Ansible 2.1.1.0; Vagrant 1.8.5; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

nginx: [emerg]
BIO_new_file("/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-
bundled.cert") failed (SSL: error:02001002:system library:fopen:No such file
or directory:fopen('/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-
bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)
nginx: configuration file /etc/nginx/nginx.conf test failed
fatal: [default]: FAILED! => {"changed": true, "cmd": ["nginx", "-t"], "delta": "0:00:00.080448", "end": "2016-08-27 12:13:59.566172", "failed": true, "rc": 1, "start": "2016-08-27 12:13:59.485724", "stderr": "nginx: [emerg] BIO_new_file(\"/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-bundled.cert\") failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)\nnginx: configuration file /etc/nginx/nginx.conf test failed", "stdout": "", "stdout_lines": [], "warnings": []}

RUNNING HANDLER [fail2ban : restart fail2ban] **********************************
changed: [default]

RUNNING HANDLER [ferm : restart ferm] ******************************************
skipping: [default]

RUNNING HANDLER [ntp : restart ntp] ********************************************
changed: [default]

RUNNING HANDLER [sshd : restart ssh] *******************************************
changed: [default]
to retry, use: --limit @/home/sie/Sites/lc-blogs-trellis/trellis/dev.retry

PLAY RECAP *********************************************************************
default : ok=100 changed=79 unreachable=0 failed=2

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

But then when I run

vagrant provision

it works, but http shows an nginx message and https leads to a "site can't be reached" browser error.

@jajouka Given that your ssh connection issue (the original topic of this thread) is resolved, please start a new thread for any other issues you are unable to resolve.

Regarding the "Job for php7.0-fpm.service failed because the control process exited with error code" error, I believe you could resolve the issue by rebuilding your droplet. Hopefully roots/trellis#642 will prevent anyone from encountering this particular issue in the future. I don't think there is a problem with your sudoers.

I recommend you stick with the development default of provider: self-signed for your dev VM. Let’s Encrypt will only issue a certificate to a publicly accessible server after confirming that it can access a challenge token on the server. Your development VM doesn’t satisfy this requirement.
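In other words, something like this in development/wordpress_sites.yml, mirroring the snippet you posted:

ssl:
  enabled: true
  provider: self-signed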

Let’s Encrypt verifies and creates certificates through a publicly accessible web server for every domain you want on the certificate.
This means you need valid and working DNS records for every site host/domain you have configured for your WP site.

Note that if you end up choosing to set ssl enabled: false for development, your browser will likely have stored an HSTS entry for the domain from its earlier exposure to the letsencrypt setup. If you return to http for development, you'll need to clear the HSTS entry using something like this.

The HSTS header instructs your browser to remember to automatically load your site as https only for some period of time. If your site moves back to http only, the browser obediently won’t load that http version till the original HSTS header has expired, or till it is cleared manually. This is designed to prevent man-in-the-middle attacks that could try to “downgrade” a user’s connection from https to http.
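For reference, the response header involved looks something like this (the max-age value here is only illustrative):

Strict-Transport-Security: max-age=31536000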

I made the changes in roots/trellis#642 and the Ansible provisioning process completed without error, but the site doesn't work; I just get an nginx 404 message. No surprise, as this folder has no website files in it:

root@lc-blogs-stage:~# ll /srv/www/leicestercollegeblog.co.uk/
total 12
drwxr-xr-x 3 web www-data 4096 Aug 29 16:56 ./
drwxr-xr-x 4 web www-data 4096 Aug 29 17:03 ../
drwxr-xr-x 2 web www-data 4096 Aug 29 16:57 logs/
root@lc-blogs-stage:~#

I'm not sure how much longer I can spend trying to make Trellis work; it's consuming my life! Error after error after error, it's like an endurance test.

Provisioning vs. deploying. For staging and production, there are two parts to Trellis.

  1. Provisioning. The server.yml playbook performs the basic setup of your server so that it is ready to host your site. Although the server.yml playbook must be run first, before deployment, you won’t need to run it often.
  2. Deployment. The deploy.yml playbook deploys your latest project code from your repo and should be run as often as you have new project code to deploy.
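Concretely, the two commands look something like this (the site name is a placeholder; check the README of your Trellis version for the exact deploy invocation):

# provision: run once up front, then only when server config changes
ansible-playbook server.yml -e env=staging

# deploy: run whenever you have new project code (site name is an example)
ansible-playbook deploy.yml -e "site=example.com env=staging"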

@jajouka I suspect you haven’t followed the “Deploying to remote servers” step in the README.

You’re reporting a new issue unrelated to the SSH connection topic of this thread. If deploying doesn’t resolve the matter, and you’re unable to resolve it by reading the README, docs, and searching discourse, please start a new thread for the new topic.
