Failure to establish connection when provisioning via ansible-playbook server.yml

There are many possible causes for UNREACHABLE but here’s one that comes to mind:

If you are in fact using an Ubuntu 14.04 server, could you run your command again and share the full debug info?

ansible-playbook server.yml -e env=staging -vvvv

Very strange. It seems to be working now. Although I had previously deleted my droplet and created a 14.04 version (which didn't work, possibly for the same reason as above), this time there didn't seem to be an issue connecting!

Thanks again for the suggestion as I think it was likely the solution.

I have been stuck on this vagrant / ansible / ssh issue for days now. I get this message when trying to ping the remote staging server:

ansible lc-dev1.co.uk -m ping -u root

lc-dev1.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}

I can ssh into the server without a password, which suggests to me that the ssh keys are set up correctly but there's a problem with ansible / vagrant.

I have uninstalled and reinstalled different versions of vagrant and ansible but no change.

Any help would be greatly appreciated. Thanks, Simon

@jajouka Could you share the entire Ansible verbose debug info (add -vvvv):

ansible-playbook server.yml -e env=staging -vvvv

Could you let us know…

  • which VPS provider you are using (e.g., Digital Ocean, AWS, etc.)
  • what your admin_user name is (should be admin if DO or ubuntu if AWS)
  • what user (name) is making the successful manual ssh connection

If you dig in and find that the issue seems different from the rest of the thread above, go ahead and start a new thread, so that this one stays focused.

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ansible-playbook server.yml -e env=staging -vvvv [10:40:01]
Using /home/sbeasley/Sites/lc-blogs-trellis/trellis/ansible.cfg as config file
Loaded callback output of type stdout, v2.0

PLAYBOOK: server.yml ***********************************************************
3 plays in server.yml

PLAY [Ensure necessary variables are defined] **********************************

TASK [Ensure environment is defined] *******************************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/variable-check.yml:8
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

PLAY [Determine Remote User] ***************************************************

TASK [remote-user : Determine whether to connect as root or admin_user] ********
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:2
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ESTABLISH LOCAL CONNECTION FOR USER: sbeasley
EXEC /bin/sh -c '( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852`" && echo ansible-tmp-1472035206.49-189187861603852="`echo $HOME/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852`" ) && sleep 0'
PUT /tmp/tmpLRyRgT TO /home/sbeasley/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852/command
EXEC /bin/sh -c 'LANG=en_GB.UTF-8 LC_ALL=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 /usr/bin/python /home/sbeasley/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852/command; rm -rf "/home/sbeasley/.ansible/tmp/ansible-tmp-1472035206.49-189187861603852/" > /dev/null 2>&1 && sleep 0'
ok: [lc-dev1.co.uk -> localhost] => {"changed": false, "cmd": ["ansible", "lc-dev1.co.uk", "-m", "ping", "-u", "root"], "delta": "0:00:00.634070", "end": "2016-08-24 10:40:07.186104", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "ansible lc-dev1.co.uk -m ping -u root", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 3, "start": "2016-08-24 10:40:06.552034", "stderr": "", "stdout": "lc-dev1.co.uk | UNREACHABLE! => {\n    \"changed\": false, \n    \"msg\": \"Failed to connect to the host via ssh.\", \n    \"unreachable\": true\n}", "stdout_lines": ["lc-dev1.co.uk | UNREACHABLE! => {", "    \"changed\": false, ", "    \"msg\": \"Failed to connect to the host via ssh.\", ", "    \"unreachable\": true", "}"], "warnings": []}

TASK [remote-user : Set remote user for each host] *****************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:8
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ok: [lc-dev1.co.uk] => {"ansible_facts": {"ansible_ssh_user": "root"}, "changed": false, "invocation": {"module_args": {"ansible_ssh_user": "root"}, "module_name": "set_fact"}}

TASK [remote-user : Announce which user was selected] **************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:12
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
Note: Ansible will attempt connections as user = root
ok: [lc-dev1.co.uk] => {}

PLAY [WordPress Server - Install LEMP Stack with PHP 7.0 and MariaDB MySQL] ****

TASK [setup] *******************************************************************
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
<lc-dev1.co.uk> ESTABLISH SSH CONNECTION FOR USER: root
<lc-dev1.co.uk> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/home/sbeasley/.ansible/cp/ansible-ssh-%h-%p-%r lc-dev1.co.uk '/bin/sh -c '"'"'( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472035209.83-154266183055644`" && echo ansible-tmp-1472035209.83-154266183055644="`echo $HOME/.ansible/tmp/ansible-tmp-1472035209.83-154266183055644`" ) && sleep 0'"'"''
System info:
Ansible 2.1.1.0; Linux
Trellis 0.9.7: April 10th, 2016

Failed to connect to the host via ssh.
fatal: [lc-dev1.co.uk]: UNREACHABLE! => {"changed": false, "unreachable": true}
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=3 changed=0 unreachable=1 failed=0
localhost : ok=0 changed=0 unreachable=0 failed=0

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)»

Thanks for your reply. I am using Digital Ocean, and I am using root as the admin user.

Here's my users.yml file:

admin_user: root

users:
  - name: "{{ web_user }}"
    groups:
      - "{{ web_group }}"
    keys:
      - "{{ lookup('file', '~/.ssh/digital_ocean.pub') }}"
      - "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
      - https://github.com/sb-lc.keys
  - name: "{{ admin_user }}"
    groups:
      - sudo
    keys:
      - "{{ lookup('file', '~/.ssh/digital_ocean.pub') }}"
      - "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
      - https://github.com/sb-lc.keys

web_user: web
web_group: www-data
web_sudoers:
  - "/usr/sbin/service php7.0-fpm *"

I am using the user 'web' as the default. I can successfully connect via ssh using root. I don't know the password for 'web', so I haven't managed to ssh with this deploy user.

I have been using this command to test connections:

ansible -m ping -u vagrant staging

lc-dev1.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}

If I change admin_user: admin in users.yml, I still get the same result from this command. I have reloaded vagrant and still no change.

I am getting the same output when attempting to provision the server after changing the user to admin:

ansible-playbook server.yml -e env=staging -vvvv [11:02:54]
Using /home/sbeasley/Sites/lc-blogs-trellis/trellis/ansible.cfg as config file
Loaded callback output of type stdout, v2.0

PLAYBOOK: server.yml ***********************************************************
3 plays in server.yml

PLAY [Ensure necessary variables are defined] **********************************

TASK [Ensure environment is defined] *******************************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/variable-check.yml:8
skipping: [localhost] => {"changed": false, "skip_reason": "Conditional check failed", "skipped": true}

PLAY [Determine Remote User] ***************************************************

TASK [remote-user : Determine whether to connect as root or admin_user] ********
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:2
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ESTABLISH LOCAL CONNECTION FOR USER: sbeasley
EXEC /bin/sh -c '( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686`" && echo ansible-tmp-1472036609.09-46831348447686="`echo $HOME/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686`" ) && sleep 0'
PUT /tmp/tmpFWvNYd TO /home/sbeasley/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686/command
EXEC /bin/sh -c 'LANG=en_GB.UTF-8 LC_ALL=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 /usr/bin/python /home/sbeasley/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686/command; rm -rf "/home/sbeasley/.ansible/tmp/ansible-tmp-1472036609.09-46831348447686/" > /dev/null 2>&1 && sleep 0'
ok: [lc-dev1.co.uk -> localhost] => {"changed": false, "cmd": ["ansible", "lc-dev1.co.uk", "-m", "ping", "-u", "root"], "delta": "0:00:00.312282", "end": "2016-08-24 11:03:29.458907", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "ansible lc-dev1.co.uk -m ping -u root", "_uses_shell": false, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}, "module_name": "command"}, "rc": 3, "start": "2016-08-24 11:03:29.146625", "stderr": "", "stdout": "lc-dev1.co.uk | UNREACHABLE! => {\n    \"changed\": false, \n    \"msg\": \"Failed to connect to the host via ssh.\", \n    \"unreachable\": true\n}", "stdout_lines": ["lc-dev1.co.uk | UNREACHABLE! => {", "    \"changed\": false, ", "    \"msg\": \"Failed to connect to the host via ssh.\", ", "    \"unreachable\": true", "}"], "warnings": []}

TASK [remote-user : Set remote user for each host] *****************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:8
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
ok: [lc-dev1.co.uk] => {"ansible_facts": {"ansible_ssh_user": "admin"}, "changed": false, "invocation": {"module_args": {"ansible_ssh_user": "admin"}, "module_name": "set_fact"}}

TASK [remote-user : Announce which user was selected] **************************
task path: /home/sbeasley/Sites/lc-blogs-trellis/trellis/roles/remote-user/tasks/main.yml:12
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
Note: Ansible will attempt connections as user = admin
ok: [lc-dev1.co.uk] => {}

PLAY [WordPress Server - Install LEMP Stack with PHP 7.0 and MariaDB MySQL] ****

TASK [setup] *******************************************************************
File lookup using /home/sbeasley/.ssh/digital_ocean.pub as file
File lookup using /home/sbeasley/.ssh/id_rsa.pub as file
<lc-dev1.co.uk> ESTABLISH SSH CONNECTION FOR USER: admin
<lc-dev1.co.uk> SSH: EXEC ssh -C -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=admin -o ConnectTimeout=10 -o ControlPath=/home/sbeasley/.ansible/cp/ansible-ssh-%h-%p-%r lc-dev1.co.uk '/bin/sh -c '"'"'( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1472036612.09-237376651746114`" && echo ansible-tmp-1472036612.09-237376651746114="`echo $HOME/.ansible/tmp/ansible-tmp-1472036612.09-237376651746114`" ) && sleep 0'"'"''
System info:
Ansible 2.1.1.0; Linux
Trellis 0.9.7: April 10th, 2016

Failed to connect to the host via ssh.
fatal: [lc-dev1.co.uk]: UNREACHABLE! => {"changed": false, "unreachable": true}
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=3 changed=0 unreachable=1 failed=0
localhost : ok=0 changed=0 unreachable=0 failed=0

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)»

Do I need to do something else to change the user to admin? I have just changed the users.yml file, but maybe there's more ssh config I should be doing.

I’d recommend updating Trellis to the latest HEAD version because your version 0.9.7…

Once you’ve updated Trellis, I’d recommend…

  • back up any important data from the DO droplet
  • change admin_user: admin (because Trellis will try root by default, only using admin as fallback)
  • rebuild the droplet (a destroy that maintains IP)

Ansible reports that it is running on Linux. If this means you’re using Windows with Ansible running from within a Vagrant VM, the VM will need your private SSH key in order to make connections. For example, copy/paste the relevant private key content into the VM at ~/.ssh/id_rsa or ~/.ssh/digital_ocean (whichever key corresponds to the public key you have loaded on your DO droplet), then set tighter permissions on the file(s): chmod 0400 ~/.ssh/key_name.
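For example, a minimal sketch of that setup inside the VM (the key name digital_ocean is just an example; use whichever file matches the public key loaded on your droplet):

# run inside the Vagrant VM; key file name is an example
mkdir -p ~/.ssh && chmod 0700 ~/.ssh
# paste the private key content into the file with your editor of choice
nano ~/.ssh/digital_ocean
# tighten permissions so the ssh client will accept the key
chmod 0400 ~/.ssh/digital_ocean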

Note that if you’re not on Windows, then Vagrant and the vagrant user are typically irrelevant to connections to remote staging/production servers. It is just a connection from your Linux local machine to the remote DO servers. The Vagrant dev VM is not involved.

Could you run these two commands on your Ansible control machine? I’m referring to your regular machine if running Linux, or the Vagrant VM (e.g., after vagrant ssh) if running Windows.

  • ssh-agent bash # start your ssh-agent (in case it isn’t already running)
  • ssh-add ~/.ssh/private_key_name # load DO-related private key into ssh-agent

Finally, now try ansible-playbook server.yml -e env=staging

If the Ansible connection still fails and you’re still able to ssh manually, could you share your exact manual ssh command, then share the entire verbose output of the manual ssh command (add -v), e.g.,
ssh -v root@xxx.xxx.xxx.xxx
I hope that seeing your command and output could offer insight into what is going on with your SSH keys.

Thanks for your reply. I am using Ubuntu as my local machine. I updated Trellis, restored my config in the yml files, changed the admin user to admin, rebuilt the DO server droplet (Ubuntu 14.04 64-bit), destroyed and recreated the Vagrant box, added the private key to the ssh agent (I think it was already stored; see code below), ran vagrant up --provision (worked fine), tested that ssh works (ssh -v root@xxx.xxx.xxx.xxx, works fine), and ran the provision command for the staging environment (this failed again with the 'unreachable' ssh error).

Here is my cmd log:

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh-add -l [12:03:25]
2048 e5:b9:e2:ee:81:9f:49:89:0a:b0:6e:14:07:b1:94:af sbeasley@sbeasley-MS-7788 (RSA)
2048 dd:13:ce:f5:c4:f1:e8:f7:8c:8a:bf:d6:96:f1:86:90 sbeasley@leicestercollege.ac.uk (RSA)
2048 1f:c2:27:45:fa:d9:bb:30:a8:9a:44:69:62:7d:31:a2 lc-prod@78.109.168.63.srvlist.ukfast.net (RSA)
2048 6b:81:7f:b5:92:4d:20:7e:38:c7:ed:00:6a:30:f5:1f sbeasley@sbeasley-MS-7788 (RSA)
2048 4c:99:6a:8f:df:65:f9:24:94:fa:8f:38:0e:72:0c:5a sbeasley@sbeasley-MS-7788 (RSA)
sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh-agent bash [12:04:02]
sbeasley@sbeasley-MS-7788:~/Sites/lc-blogs-trellis/trellis$ ssh-add ~/.ssh/digital_ocean
Identity added: /home/sbeasley/.ssh/digital_ocean (/home/sbeasley/.ssh/digital_ocean)
sbeasley@sbeasley-MS-7788:~/Sites/lc-blogs-trellis/trellis$ exit
exit
sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh-add -l [12:05:04]
2048 e5:b9:e2:ee:81:9f:49:89:0a:b0:6e:14:07:b1:94:af sbeasley@sbeasley-MS-7788 (RSA)
2048 dd:13:ce:f5:c4:f1:e8:f7:8c:8a:bf:d6:96:f1:86:90 sbeasley@leicestercollege.ac.uk (RSA)
2048 1f:c2:27:45:fa:d9:bb:30:a8:9a:44:69:62:7d:31:a2 lc-prod@78.109.168.63.srvlist.ukfast.net (RSA)
2048 6b:81:7f:b5:92:4d:20:7e:38:c7:ed:00:6a:30:f5:1f sbeasley@sbeasley-MS-7788 (RSA)
2048 4c:99:6a:8f:df:65:f9:24:94:fa:8f:38:0e:72:0c:5a sbeasley@sbeasley-MS-7788 (RSA)
sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ansible-playbook server.yml -e env=staging [12:05:07]

PLAY [Ensure necessary variables are defined] **********************************

TASK [Ensure environment is defined] *******************************************
skipping: [localhost]

PLAY [Determine Remote User] ***************************************************

TASK [remote-user : Require manual definition of remote-user] ******************
skipping: [lc-dev1.co.uk]

TASK [remote-user : Check whether Ansible can connect as root] *****************
ok: [lc-dev1.co.uk → localhost]

TASK [remote-user : Set remote user for each host] *****************************
ok: [lc-dev1.co.uk]

TASK [remote-user : Announce which user was selected] **************************
Note: Ansible will attempt connections as user = admin
ok: [lc-dev1.co.uk]

TASK [remote-user : Load become password] **************************************
ok: [lc-dev1.co.uk]

PLAY [Install prerequisites] ***************************************************

TASK [Install Python 2.x] ******************************************************
System info:
Ansible 2.1.1.0; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

Failed to connect to the host via ssh.
fatal: [lc-dev1.co.uk]: UNREACHABLE! => {"changed": false, "unreachable": true}
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=4 changed=0 unreachable=1 failed=0
localhost : ok=0 changed=0 unreachable=0 failed=0

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ssh -v root@178.62.35.88 [12:05:39]
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /home/sbeasley/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to 178.62.35.88 [178.62.35.88] port 22.
debug1: Connection established.
debug1: identity file /home/sbeasley/.ssh/id_rsa type 1
debug1: identity file /home/sbeasley/.ssh/id_rsa-cert type -1
debug1: identity file /home/sbeasley/.ssh/id_dsa type -1
debug1: identity file /home/sbeasley/.ssh/id_dsa-cert type -1
debug1: identity file /home/sbeasley/.ssh/id_ecdsa type -1
debug1: identity file /home/sbeasley/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/sbeasley/.ssh/id_ed25519 type -1
debug1: identity file /home/sbeasley/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8 pat OpenSSH_6.6.1* compat 0x04000000
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5-etm@openssh.com none
debug1: kex: client->server aes128-ctr hmac-md5-etm@openssh.com none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ECDSA 53:fe:f1:91:03:51:9a:7d:a3:1e:64:b4:e7:3c:4d:3e
debug1: Host '178.62.35.88' is known and matches the ECDSA host key.
debug1: Found key in /home/sbeasley/.ssh/known_hosts:19
debug1: ssh_ecdsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/sbeasley/.ssh/id_rsa
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: sbeasley@sbeasley-MS-7788
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: sbeasley@leicestercollege.ac.uk
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: lc-prod@78.109.168.63.srvlist.ukfast.net
debug1: Authentications that can continue: publickey,password
debug1: Offering RSA public key: sbeasley@sbeasley-MS-7788
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug1: Authentication succeeded (publickey).
Authenticated to 178.62.35.88 ([178.62.35.88]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Sending environment.
debug1: Sending env LANG = en_GB.UTF-8
debug1: Sending env LC_CTYPE = en_GB.UTF-8
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-34-generic x86_64)

System information as of Thu Aug 25 10:42:12 UTC 2016

System load: 0.0 Processes: 104
Usage of /: 6.7% of 19.56GB Users logged in: 0
Memory usage: 11% IP address for eth0: 178.62.35.88
Swap usage: 0%

Graph this data and manage this system at:
https://landscape.canonical.com/

0 packages can be updated.
0 updates are security updates.

New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Your Hardware Enablement Stack (HWE) is supported until April 2019.

Last login: Thu Aug 25 10:42:13 2016 from 212.219.188.10
root@lc-blogs-stage:~#

I also tried pinging again:

sbeasley➜Sites/lc-blogs-trellis/trellis(master✗)» ansible -m ping -u vagrant all [12:12:13]
192.168.50.5 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
lc-dev1.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}
leicestercollegeblog.co.uk | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh.",
    "unreachable": true
}

I tried changing the admin user back to root, just in case the ssh key was owned by root and not admin, but I still get the same errors.

Also, I can't see how to obtain the password for the admin user. I can see the hashed version in users.yml, but is there some way I can get it so I can ssh-copy-id -i ~/.ssh/digital_ocean admin@***** ?

The -u vagrant will probably always fail. A default DO Ubuntu droplet will only have the user root, not vagrant, so even if you have the correct ssh key, an attempt to connect as vagrant user will fail. I think the command below is better for testing:

ansible staging -m raw -a whoami -u root

You’ll notice that this is the command Trellis uses to test whether it can connect as root or whether it must fall back to the admin_user. If the connection as root succeeds, Trellis will use root. That’s why I don’t see any reason to change the admin_user to root. Trellis won’t even try the admin_user unless root has already failed. In addition, the purpose of the admin_user is to have a non-root user who can connect in case you’ve heightened security by disabling root login (see security docs).
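For reference, here is a simplified sketch of that fallback logic, reconstructed from the task names in your debug output above (not the verbatim Trellis source):

# roles/remote-user/tasks/main.yml (simplified sketch)
- name: Check whether Ansible can connect as root
  command: ansible {{ inventory_hostname }} -m raw -a whoami -u root
  delegate_to: localhost
  register: connection_status
  failed_when: false
  changed_when: false

- name: Set remote user for each host
  set_fact:
    ansible_ssh_user: "{{ 'root' if connection_status.rc == 0 else admin_user }}"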


I’m not perfectly familiar with all the ssh possibilities, so some of this may be unnecessary, but…
I’m guessing you’re using zsh instead of bash, so, sorry to make you repeat, could you try this:

ssh-agent zsh
ssh-add /home/sbeasley/.ssh/digital_ocean

# Connection Test 1: basic connection
ansible staging -m raw -a whoami -u root

# Connection Test 2: force choice of private ssh key
ansible staging -m raw -a whoami -u root --private-key=/home/sbeasley/.ssh/digital_ocean

If Connection Test 1 succeeds, then I guess that finally adds your key to the ssh-agent and I bet the ansible-playbook command will succeed. If it fails, but Test 2 succeeds, then apparently Ansible is having trouble finding the right ssh key on its own. You could try to figure out why, or just add the --private-key=/home/sbeasley/.ssh/digital_ocean to the end of your ansible-playbook commands. Or, set up your Trellis hosts/staging like this:

# hosts/staging
lc-dev1.co.uk ansible_host=178.62.35.88 ansible_ssh_private_key_file='/home/sbeasley/.ssh/digital_ocean'

[staging]
lc-dev1.co.uk

[web]
lc-dev1.co.uk

(ref for ansible_ssh_private_key_file)

If Connection Tests 1 and 2 both fail, then I’m not sure what to explore next.

  • Have you had a successful Trellis project before or is this project the first attempt? (helps isolate problem to your dev environment vs. your current project configuration)
  • Are you making any modifications to the default bare Ubuntu box from DO before running Trellis commands?
  • Any relevant configs in /home/sbeasley/.ssh/config or /etc/ssh/ssh_config (e.g., on line 19 for Host *)?
  • In case this is a duplicate of an obscure problem, you could try adding control_path = %(directory)s/%%h-%%r to your ansible.cfg under [ssh_connection] (details); a minimal sketch follows this list
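For reference, a minimal sketch of that ansible.cfg addition (the control_path line is the only suggested change; keep whatever else your file already contains):

# ansible.cfg
[ssh_connection]
control_path = %(directory)s/%%h-%%r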

Given that you did all the work to update Trellis, I’d suggest rebuilding your droplet with Ubuntu 16.04.


Again, I don’t see this as being necessary, because I don’t see that Vagrant has anything to do with your connection to a DO staging server. But I want to be sure I’m not missing something that could be the key to resolving the connection issue. Do you have Vagrant involved in some way? What is your understanding of how Vagrant is related to your Ubuntu machine’s connection to your DO staging server?


In the latest version of Trellis, you simply define the admin_user’s raw password in group_vars/<environment>/vault.yml. The admin_user does not exist on the DO bare Ubuntu box. Trellis creates the admin_user (and any other users in group_vars/all/users.yml) as part of the server.yml playbook, in the users role. The SSH-keys docs describe these users and their purposes.
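For example, a minimal sketch of the vault entry (treat the exact variable names as something to verify against your own group_vars files):

# group_vars/staging/vault.yml (sketch; verify names against your file)
vault_users:
  - name: "{{ admin_user }}"
    password: example_password
    salt: "generateme"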

Current Trellis does not have a hashed version of passwords. Any chance you’re still seeing the old version of vault_sudoer_passwords removed in roots/trellis#614?

Trellis will assign the admin_user this password when it creates the admin_user, so you will not need to run ssh-copy-id -i ~/.ssh/digital_ocean admin@*****


Thanks for your help; I followed the instructions carefully. I switched to Ubuntu 16.04 on DO and started ssh-agent under zsh. Connection Tests 1 and 2 didn't work until I added the ansible_ssh_private_key_file setting to my hosts/staging file as you suggested; after that, the connection test succeeded.

I didn't have any ssh config set up, and it seems there was no need for any changes in the ansible.cfg file like you suggested.

It seems I've run into more problems though. When attempting to provision the staging server I get this:

TASK [php : Start php7.0-fpm service] ******************************************
System info:
Ansible 2.1.1.0; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

Job for php7.0-fpm.service failed because the control process exited with
error code. See "systemctl status php7.0-fpm.service" and "journalctl -xe"
for details.

fatal: [lc-dev1.co.uk]: FAILED! => {"changed": false, "failed": true}

NO MORE HOSTS LEFT *************************************************************
[WARNING]: Could not create retry file 'server.retry'. [Errno 2] No such file or directory: ''

PLAY RECAP *********************************************************************
lc-dev1.co.uk : ok=46 changed=2 unreachable=0 failed=1
localhost : ok=0 changed=0 unreachable=0 failed=0

After a bit of research I found this blob; however, my version of Trellis already had this set.

When I run sudo systemctl status php7.0-fpm.service after vagrant ssh, as suggested here, I get this:

vagrant@lc-blogs-trellis:~$ sudo systemctl status php7.0-fpm.service
● php7.0-fpm.service - The PHP 7.0 FastCGI Process Manager
Loaded: loaded (/lib/systemd/system/php7.0-fpm.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2016-08-27 01:33:49 UTC; 9h ago
Docs: man:php-fpm7.0(8)
Process: 1774 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 31024 (php-fpm7.0)
Status: "Processes active: 0, idle: 2, Requests: 19, slow: 0, Traffic: 0req/sec"
CGroup: /system.slice/php7.0-fpm.service
├─ 1779 php-fpm: pool wordpress
├─ 1969 php-fpm: pool wordpress
└─31024 php-fpm: master process (/etc/php/7.0/fpm/php-fpm.conf)

Aug 27 01:33:49 lc-blogs-trellis systemd[1]: Starting The PHP 7.0 FastCGI Process Manager…
Aug 27 01:33:49 lc-blogs-trellis systemd[1]: Started The PHP 7.0 FastCGI Process Manager.
Aug 27 01:36:16 lc-blogs-trellis systemd[1]: Reloading The PHP 7.0 FastCGI Process Manager.
Aug 27 01:36:16 lc-blogs-trellis systemd[1]: Reloaded The PHP 7.0 FastCGI Process Manager.

I looked at this forum post and tried this:

➜ trellis git:(master) ✗ ansible "web:&staging" -m service -a "name=php7.0-fpm state=reloaded" -u web
lc-dev1.co.uk | FAILED! => {
    "changed": false,
    "failed": true,
    "msg": "Failed to start php7.0-fpm.service: Interactive authentication required.\nSee system logs and 'systemctl status php7.0-fpm.service' for details.\n"
}
➜ trellis git:(master) ✗ ansible "root:&staging" -m service -a "name=php7.0-fpm state=reloaded" -u web
ERROR! Specified hosts and/or --limit does not match any hosts
➜ trellis git:(master) ✗ ansible "admin:&staging" -m service -a "name=php7.0-fpm state=reloaded" -u web
ERROR! Specified hosts and/or --limit does not match any hosts
➜ trellis git:(master) ✗

When I looked in /etc/sudoers.d/web-services on the staging server, as referenced here, I got this:

root@lc-blogs-stage:~# cat /etc/sudoers.d/web-services

# Ansible managed: /home/sie/Sites/lc-blogs-trellis/trellis/roles/users/templates/sudoers.d.j2 modified on 2016-08-26 23:31:34 by sie on sie-Lenovo-G510

web ALL=(root) NOPASSWD: /usr/sbin/service php7.0-fpm *
root@lc-blogs-stage:~#

which seems correct.

So this is as far as I have got. I think the Ubuntu update to 16.04 may be causing the error.

As suggested here, I checked my /etc/sudoers on the staging server; it looks like this:

root@lc-blogs-stage:~# cat /etc/sudoers
#
# This file MUST be edited with the 'visudo' command as root.
#
# Please consider adding local content in /etc/sudoers.d/ instead of
# directly modifying this file.
#
# See the man page for details on how to write a sudoers file.
#
Defaults	env_reset
Defaults	mail_badpass
Defaults	secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# Host alias specification

# User alias specification

# Cmnd alias specification

# User privilege specification
root	ALL=(ALL:ALL) ALL

# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL

# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL

# See sudoers(5) for more information on "#include" directives:

#includedir /etc/sudoers.d

I changed the permissions of the sudoers.d folder to 0440

now it looks like this:

dr--r----- 2 root root 4096 Aug 27 00:52 sudoers.d/

and the files inside that folder look like this

Last login: Sat Aug 27 11:36:23 2016 from 86.179.185.201
root@lc-blogs-stage:~# ll /etc/sudoers.d/
total 20
dr--r----- 2 root root 4096 Aug 27 00:52 ./
drwxr-xr-x 103 root root 4096 Aug 27 11:35 ../
-r--r----- 1 root root 119 Aug 26 19:22 90-cloud-init-users
-r--r----- 1 root root 958 Mar 30 19:57 README
-r--r----- 1 root root 210 Aug 27 00:52 web-services

Another issue I'm having since updating Trellis is that in the local machine dev environment, if I use SSL with Let's Encrypt it doesn't work.

I put this in development/wordpress_sites.yml:

ssl:
  enabled: true
  provider: letsencrypt

When I run

vagrant up --provision

I get these errors:

TASK [wordpress-install : Install WP] ******************************************
System info:
Ansible 2.1.1.0; Vagrant 1.8.5; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

failed: [default] (item=leicestercollegeblog.co.uk) => {"changed": true, "cmd": ["wp", "core", "install", "--allow-root", "--url=https://${HTTP_HOST}", "--title=leicestercollegeblog.co.uk", "--admin_user=admin", "--admin_password=admin", "--admin_email=admin@example.dev"], "delta": "0:00:02.807209", "end": "2016-08-27 12:13:55.186162", "failed": true, "item": "leicestercollegeblog.co.uk", "rc": 255, "start": "2016-08-27 12:13:52.378953", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************

RUNNING HANDLER [common : restart memcached] ***********************************
changed: [default]

RUNNING HANDLER [common : reload php-fpm] **************************************
changed: [default]

RUNNING HANDLER [common : reload nginx] ****************************************
System info:
Ansible 2.1.1.0; Vagrant 1.8.5; Linux
Trellis at "Fix #639 - WP 4.6 compatibility: update WP-CLI to 0.24.1"

nginx: [emerg]
BIO_new_file("/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-
bundled.cert") failed (SSL: error:02001002:system library:fopen:No such file
or directory:fopen('/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-
bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)
nginx: configuration file /etc/nginx/nginx.conf test failed
fatal: [default]: FAILED! => {"changed": true, "cmd": ["nginx", "-t"], "delta": "0:00:00.080448", "end": "2016-08-27 12:13:59.566172", "failed": true, "rc": 1, "start": "2016-08-27 12:13:59.485724", "stderr": "nginx: [emerg] BIO_new_file(\"/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-bundled.cert\") failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/etc/nginx/ssl/letsencrypt/leicestercollegeblog.co.uk-bundled.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)\nnginx: configuration file /etc/nginx/nginx.conf test failed", "stdout": "", "stdout_lines": [], "warnings": []}

RUNNING HANDLER [fail2ban : restart fail2ban] **********************************
changed: [default]

RUNNING HANDLER [ferm : restart ferm] ******************************************
skipping: [default]

RUNNING HANDLER [ntp : restart ntp] ********************************************
changed: [default]

RUNNING HANDLER [sshd : restart ssh] *******************************************
changed: [default]
to retry, use: --limit @/home/sie/Sites/lc-blogs-trellis/trellis/dev.retry

PLAY RECAP *********************************************************************
default : ok=100 changed=79 unreachable=0 failed=2

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

But then when I run

vagrant provision

it works, but http shows an nginx message and https leads to a "site can't be reached" browser error.

@jajouka Given that your ssh connection issue (the original topic of this thread) is resolved, please start a new thread for any other issues you are unable to resolve.

Regarding the "Job for php7.0-fpm.service failed because the control process exited with error code" error, I believe you could resolve the issue by rebuilding your droplet. Hopefully roots/trellis#642 will prevent anyone from encountering this particular issue in the future. I don't think there is a problem with your sudoers.

I recommend you stick with the development default of provider: self-signed for your dev VM. Let’s Encrypt will only issue a certificate to a publicly accessible server after confirming that it can access a challenge token on the server. Your development VM doesn’t satisfy this requirement.
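In other words, something like this in development/wordpress_sites.yml, mirroring the snippet you posted:

ssl:
  enabled: true
  provider: self-signed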

Let’s Encrypt verifies and creates certificates through a publicly accessible web server for every domain you want on the certificate.
This means you need valid and working DNS records for every site host/domain you have configured for your WP site.

Note that if you end up choosing to set ssl enabled: false for development, your browser will likely have stored an HSTS entry for the domain from its earlier exposure to the letsencrypt setup. If you return to http for development, you'll need to clear the HSTS entry using something like this.

The HSTS header instructs your browser to remember to automatically load your site as https only for some period of time. If your site moves back to http only, the browser obediently won’t load that http version till the original HSTS header has expired, or till it is cleared manually. This is designed to prevent man-in-the-middle attacks that could try to “downgrade” a user’s connection from https to http.
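For reference, the response header involved looks something like this (the max-age value here is only illustrative):

Strict-Transport-Security: max-age=31536000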

I made the changes in roots/trellis#642 and the Ansible provisioning process completed without error, but the site doesn't work; I just get an nginx 404 message. No surprise, as this folder has no website files in it:

root@lc-blogs-stage:~# ll /srv/www/leicestercollegeblog.co.uk/
total 12
drwxr-xr-x 3 web www-data 4096 Aug 29 16:56 ./
drwxr-xr-x 4 web www-data 4096 Aug 29 17:03 ../
drwxr-xr-x 2 web www-data 4096 Aug 29 16:57 logs/
root@lc-blogs-stage:~#

I'm not sure how much longer I can spend trying to make Trellis work; it's consuming my life! Error after error after error, it's like an endurance test.

Provisioning vs. deploying. For staging and production, there are two parts to Trellis.

  1. Provisioning. The server.yml playbook performs the basic setup of your server so that it is ready to host your site. Although the server.yml playbook must be run first, before deployment, you won’t need to run it often.
  2. Deployment. The deploy.yml playbook deploys your latest project code from your repo and should be run as often as you have new project code to deploy.
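Concretely, the two commands look something like this (the site name is a placeholder; check the README of your Trellis version for the exact deploy invocation):

# provision: run once up front, then only when server config changes
ansible-playbook server.yml -e env=staging

# deploy: run whenever you have new project code (site name is an example)
ansible-playbook deploy.yml -e "site=example.com env=staging"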

@jajouka I suspect you haven’t followed the “Deploying to remote servers” step in the README.

You’re reporting a new issue unrelated to the SSH connection topic of this thread. If deploying doesn’t resolve the matter, and you’re unable to resolve it by reading the README, docs, and searching discourse, please start a new thread for the new topic.
