While provisioning EC2 with env=production - Permission denied (publickey)

bassman2112 · December 28, 2017, 7:31pm

Hey all!

I’ve looked through a lot of threads on this topic, and unfortunately haven’t been able to resolve my issue. Most recently I saw this thread, and though my issue is similar, I know the solution is not the same (in that post’s case, there was version incompatibility between Trellis & Ansible).

The bottom of this post will have more details with what I’ve done to address this; but first, here is my log when running ansible-playbook server.yml -e env=production (replaced IPs and usernames with x’s)

[DEPRECATION WARNING]: The use of 'include' for tasks has been deprecated. Use 
'import_tasks' for static inclusions or 'include_tasks' for dynamic inclusions.
 This feature will be removed in a future release. Deprecation warnings can be 
disabled by setting deprecation_warnings=False in ansible.cfg.

PLAY [Ensure necessary variables are defined] **********************************

TASK [Ensure environment is defined] *******************************************
skipping: [localhost]

PLAY [Test Connection and Determine Remote User] *******************************

TASK [connection : Require manual definition of remote-user] *******************
skipping: [18.xx.xx.xx]

TASK [connection : Specify preferred HostKeyAlgorithms for unknown hosts] ******
skipping: [18.xx.xx.xx]

TASK [connection : Check whether Ansible can connect as root] ******************
ok: [18.xx.xx.xx -> localhost]

TASK [connection : Warn about change in host keys] *****************************
skipping: [18.xx.xx.xx]

TASK [connection : Set remote user for each host] ******************************
ok: [18.xx.xx.xx]

TASK [connection : Announce which user was selected] ***************************
Note: Ansible will attempt connections as user = ubuntu
ok: [18.xx.xx.xx]

TASK [connection : Load become password] ***************************************
ok: [18.xx.xx.xx]

PLAY [Install prerequisites] ***************************************************

TASK [Install Python 2.x] ******************************************************
System info:
  Ansible 2.4.1.0; Darwin
  Trellis at "Add MariaDB 10.2 PPA"
---------------------------------------------------
Failed to connect to the host via ssh: Permission denied (publickey).

fatal: [18.xx.xx.xx]: UNREACHABLE! => {"changed": false, "unreachable": true}
	to retry, use: --limit @/Users/<xxxxuserxxxx>/Documents/GitHub/<xxxxprojectxxxx>/trellis/server.retry

PLAY RECAP *********************************************************************
18.xx.xx.xx               : ok=4    changed=0    unreachable=1    failed=0   
localhost                  : ok=0    changed=0    unreachable=0    failed=0

I know a lot of people choose to use DO instead of AWS/EC2; but I’m confined to AWS due to the parameters of this project. I’ve been following other users’ suggestions for its setup, and thought I had it right. I’ll try to go through everything I’ve done that is relevant (should be noted - I am doing all of the provisioning/etc from my Mac because Ansible is easier to work with here)

Set hosts/production to the public IP address of the instance
Set admin_user to ubuntu (in group_vars/all/users.yml)
Added my .pem key for the SSH to the {{ admin_user }} keys (along with the github one - both of which are confirmed to be in Keychain Access)
Ensured Ansible was compatible with my current build of Trellis (rolled back Ansible to a lower version)
Tried setting sshd_permit_root_login to both True and False
Ensured everything was set up properly with regards to DNS and the instance

I can SSH into the instance flawlessly when I use ssh -i KEYNAME.pem ubuntu@ec2-18-xx-xx-xx.us-xxxx-xx.compute.amazonaws.com, so I know the connection with the key and Keychain Access is working properly. This is also the key I point to in users (formatted like this: - "{{ lookup('file', '~/.ssh/KEYNAME.pem') }}")

Any help would be appreciated, I’m fairly stumped!

Thanks a ton =)

fullyint · December 28, 2017, 9:00pm

A more representative manual test would be ssh ubuntu@18.xx.xx.xx.

Trellis doesn’t indicate the private key to use, as you have done via -i KEYNAME.pem, but you could test whether making the key explicit would help. Try one of these options:

- ansible-playbook server.yml -e env=production --private-key=path/to/KEYNAME.pem
- or add to the [defaults] section of ansible.cfg an entry private_key_file = /path/to/KEYNAME.pem
- or add an entry for Host 18.xx.xx.xx to your local ~/.ssh/config, giving that host a parameter IdentityFile path/to/KEYNAME.pem
- If none of the above have worked yet, follow up the IdentityFile setting with IdentitiesOnly yes, which tells your ssh client to try only the IdentityFile you’ve indicated. This matters if you have many SSH keys and you need your ssh client to try only the correct key, instead of potentially first trying a bunch that will fail, which could cause the remote server to disconnect before your client ever tries the correct key.
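
For reference, the ~/.ssh/config option combined with IdentitiesOnly can be sketched roughly as below. This is only an illustration: ssh_host_stanza is a made-up helper name, and the host and key path are the placeholders from this thread. It prints the stanza instead of editing ~/.ssh/config, so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh config stanza that pins one key to one host.
# Arguments: $1 = host or IP, $2 = path to the private key.
ssh_host_stanza() {
  printf 'Host %s\n' "$1"
  printf '  IdentityFile %s\n' "$2"
  # Offer only the key named above, not every key the agent knows about.
  printf '  IdentitiesOnly yes\n'
}

# Placeholder values from this thread; replace them with your own.
ssh_host_stanza '18.xx.xx.xx' '~/.ssh/KEYNAME.pem'
```

If the output looks right, append it to ~/.ssh/config and retry the manual test ssh ubuntu@18.xx.xx.xx without -i.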

If any of those work, it suggests that your local machine’s ssh agent was not offering up the key. In that case, you’d want to check keys known to the agent using ssh-add -l to see if the output includes the desired key. If not, run ssh-add -K path/to/KEYNAME.pem to add it. This should prevent you from having to indicate the key explicitly as in the bullet points above.
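
The agent check can be scripted roughly like this. It is a sketch under assumptions: key_blob and agent_lists_blob are hypothetical helper names, the key path is the placeholder used above, and the key has no passphrase that would make ssh-keygen -y stop and prompt. It compares the public key derived from the .pem with what ssh-add -L reports, and it only prints a suggestion rather than changing the agent.

```shell
#!/bin/sh
# Hypothetical helper: print the base64 public key blob derived from a private key file.
key_blob() {
  ssh-keygen -y -f "$1" | awk '{print $2}'
}

# Hypothetical helper: succeed if the blob in $1 appears in an `ssh-add -L` listing on stdin.
# An empty blob never matches, so a failed derivation is not reported as a hit.
agent_lists_blob() {
  [ -n "$1" ] && grep -qF -- "$1"
}

key="path/to/KEYNAME.pem"   # placeholder path from this thread

if [ ! -f "$key" ]; then
  echo "no key file at $key"
elif ssh-add -L 2>/dev/null | agent_lists_blob "$(key_blob "$key")"; then
  echo "agent already offers $key"
else
  echo "agent does not list $key; try: ssh-add -K $key"
fi
```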

In any case, if you’re on macOS Sierra or newer, make sure to set up your ssh config to not “forget” your ssh key.
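
As far as I know, on macOS 10.12.2 and later that usually comes down to a Host * block in ~/.ssh/config with UseKeychain and AddKeysToAgent. A sketch that prints such a block; keychain_stanza is a hypothetical helper name and the key path is the placeholder from above.

```shell
#!/bin/sh
# Hypothetical helper: print a Host * stanza that asks macOS ssh to load keys
# into the agent on first use and to keep any passphrase in the Keychain.
# Argument: $1 = path to the private key.
keychain_stanza() {
  printf 'Host *\n'
  printf '  AddKeysToAgent yes\n'
  printf '  UseKeychain yes\n'
  printf '  IdentityFile %s\n' "$1"
}

# Placeholder key path from this thread; replace it with your own.
keychain_stanza '~/.ssh/KEYNAME.pem'
```

UseKeychain is specific to the ssh client Apple ships, so treat this as a macOS-only setting.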

The ssh keys docs point out that the users dictionary only lists public keys, not private keys. Trellis copies the keys from this list into the respective user’s authorized_keys file on the remote server.
You don’t want it copying your private keys to any remote location.

The users dict is used to create the respective users (if they don’t already exist) and grant them access. A common and understandable confusion is that these are the private keys used to access the server on initial provisioning. Not so! The very first connection happens using your private key and its corresponding public key that must already be on the server before Trellis enters the mix.

Be sure to remove your private keys from the users dictionary. I recommend having the only entry in each user’s keys be - https://github.com/bassman2112.keys. If other individuals need to access the server, you could add their corresponding URLs.
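
A quick way to double-check the users dictionary is to test each file it references for a private key header. A rough sketch, assuming PEM or OpenSSH style key files, which carry a "PRIVATE KEY-----" marker in their header line; looks_like_private_key is a made-up helper name, and the script takes the files to check as arguments.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains a private key header line
# such as "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN OPENSSH PRIVATE KEY-----".
looks_like_private_key() {
  grep -q -e 'PRIVATE KEY-----' "$1"
}

# Check every file passed on the command line.
for f in "$@"; do
  if looks_like_private_key "$f"; then
    echo "PRIVATE key, do not list in users: $f"
  else
    echo "no private key header found: $f"
  fi
done
```

For example, run it against ~/.ssh/KEYNAME.pem and any .pub files you reference. If all you have is the .pem from AWS, ssh-keygen -y -f ~/.ssh/KEYNAME.pem prints the matching public key, which is the half that is safe to list.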

bassman2112 · December 28, 2017, 9:09pm

Cool!

So if I’m understanding you correctly, I should be creating a key pair on my Mac, and adding that public key into the authorized_keys file within its ~/.ssh folder?

If so, I think I had a fundamental misunderstanding of how the handshake was happening, and that clears things up a ton!

Edit:

It should be noted that after I made the appropriate changes (thanks again to your awesome reply, it was super insightful) the command still wasn’t working. Though initially discouraging, I restarted the computer and it worked the first time around (go figure). It is now whizzing away and provisioning the VM! Thank you for your help, this community is fantastic =)

fullyint · December 28, 2017, 9:52pm

Congrats on getting it working!

That’s correct in the general concept, but here are details for your specific context with Trellis.

Yes you need to have created a key pair, but in the simplest case you would only need to do so once, reusing the same key pair on projects going forward. However, you may want to use a new and different pair in some cases. For example, pair 1 for all your personal small clients’ projects, pair 2 for your work with agency A’s projects, pair 3 for your work with agency B, etc.

When you mention manually adding a public key to authorized_keys, I assume we’re addressing the topic of “what key does Trellis need on the server and how does it get there?” For its initial connection, Trellis just needs some key, any key, on the server. You typically would not have to add it manually to authorized_keys. Rather, at the time of creating a new server, DO, AWS, etc. give the option of preloading an ssh key, like you probably did with KEYNAME.pub.

Anyway, the minimum you need is 1) create the EC2 instance and have it pre-load some public key and 2) indicate some public key in the users dict to be loaded on the server for the admin_user and web_user. Steps #1 and #2 may use the same public key or different keys. The step #1 key is involved in the first connection Trellis will make. The step #2 key is only relevant for the connections Trellis enables for the future (e.g., for the web_user who deploys sites).

My earlier suggestion to only list your GitHub URL in users was just because I thought that would be convenient. But maybe it would seem simpler to list only KEYNAME.pub, corresponding to the KEYNAME.pem you use to connect. Then in a tidy manner you’d be involving only a single key pair for this project.