Cannot Deploy to Production (SSH Error)

Silverjerk · November 16, 2017, 10:53pm

When deploying to our staging server, everything works beautifully. Deploying to production, however, is a different story.

Staging and production are separate DigitalOcean droplets, and both are configured properly. I’ve checked all the configuration files; I even diffed them against a second Trellis project running on the exact same droplet build, the same version of Trellis/Bedrock, and a nearly identical architecture, and every setting relevant to deployment matches. Yet this deployment still fails with an SSH error:

Failed to connect to the host via ssh: Permission denied (publickey).

fatal: [138.197.29.202]: UNREACHABLE! => {
"changed": false, 
"unreachable": true
}

Steps I’ve taken:

Recreated the droplet several times, using both the exact same settings, and then removing additional features like monitoring and backups, just to ensure the build was as clean as possible.
Removed the entries in known_hosts related to the droplets that were failing.
Manually connected (successfully) to the server via ssh.

Here is my verbose output from the production deployment:

https://gist.github.com/silverjerk/06b412eac62501048e4ac53e9d500b26

➜  trellis ./bin/deploy.sh production firstamericanmerchant.com -vvv  
ansible-playbook 2.4.0.0
  config file = /Users/commonmind/Development/Sites/emb-fam/trellis/ansible.cfg
  configured module search path = [u'/Users/commonmind/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /Library/Python/2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible-playbook
  python version = 2.7.10 (default, Jul 15 2017, 17:16:57) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)]
Using /Users/commonmind/Development/Sites/emb-fam/trellis/ansible.cfg as config file
Parsed /Users/commonmind/Development/Sites/emb-fam/trellis/hosts/development inventory source with ini plugin
Parsed /Users/commonmind/Development/Sites/emb-fam/trellis/hosts/production inventory source with ini plugin

(Output truncated; the full log is in the gist linked above.)

Any help would be greatly appreciated.

cfx · November 16, 2017, 11:44pm

I suspect your DNS settings need to be changed. You’re trying to connect to 138.197.29.202 but your domain doesn’t resolve to that IP:

PING firstamericanmerchant.com (72.52.171.53): 56 data bytes
64 bytes from 72.52.171.53: icmp_seq=0 ttl=47 time=63.719 ms
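
For anyone checking the same thing, here is a rough sketch of that comparison as a shell function. It assumes `dig` is installed; `dns_matches` is a made-up helper name, not part of Trellis, and the domain and IP in the example call are the ones from this thread.

```shell
# Hypothetical helper (not part of Trellis): compare a domain's A record
# with the IP you expect to deploy to. Assumes `dig` is available.
dns_matches() {
  domain="$1"
  expected="$2"
  # `dig +short` may print a CNAME before the address, so keep the last line.
  # Note: with several A records this only checks the last one returned.
  resolved=$(dig +short A "$domain" | tail -n 1)
  if [ "$resolved" = "$expected" ]; then
    echo "MATCH: $domain resolves to $resolved"
  else
    echo "MISMATCH: $domain resolves to ${resolved:-nothing}, expected $expected"
    return 1
  fi
}

# Example call with the values from this thread:
# dns_matches firstamericanmerchant.com 138.197.29.202
```

A mismatch only matters if your Trellis inventory or site config points at the domain rather than the IP, so treat it as a hint, not a diagnosis.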

Silverjerk · November 16, 2017, 11:54pm

I changed the DNS back to our old server, which is why the DNS is resolving differently now. When attempting the deployment, the DNS was set up properly.

MWDelaney · November 17, 2017, 1:24am

Is your Trellis hosts/production set up? Can you share its contents?

MWDelaney · November 17, 2017, 1:25am

Oh! When you created the droplet did you add your SSH key to the server as part of the process?

cfx · November 17, 2017, 2:13am

Good call @MWDelaney. @Silverjerk, first see if you can log in to the production droplet via SSH. If you can’t, something’s wrong with your droplet config or your local config. If you can, something’s wrong with your Trellis config.

MWDelaney · November 17, 2017, 2:15am

For completeness: you say above that you connected manually with success. Was it with a key or a password?

fullyint · November 17, 2017, 2:30am

I noticed your gist output includes:

The authenticity of host '138.197.29.202 (138.197.29.202)' can't be established.
ED25519 key fingerprint is SHA256:KhEIUDlU32mrluOvo96KZBqeGgkJwW2MrVC9gvbhXCE.
Are you sure you want to continue connecting (yes/no)? yes

If you are having to accept a hostkey, perhaps it is just because you…

Removed the entries in known_hosts related to the droplets that were failing.

If, however, the new hostkey means this is your first connection to this iteration of the production server, that could mean that you haven’t yet run server.yml. If that’s the case, the server.yml playbook hasn’t yet created the web user that Trellis tries to use for deploys.

You certainly have run server.yml for staging, but have you run server.yml for production?

…and did you test connecting manually with the web user (the relevant user for deploys)?
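
A minimal sketch of that manual test, in case it helps others. It assumes the deploy user is `web` (as described above) and uses the IP from this thread; `check_ssh_user` is a hypothetical helper name. `BatchMode=yes` is a standard OpenSSH option that disables password prompts, so only key-based login can succeed, which is roughly the situation Ansible is in.

```shell
# Hypothetical helper: test key-based SSH login for a given user and host.
check_ssh_user() {
  user="$1"
  host="$2"
  # BatchMode=yes: never prompt for a password, fail instead.
  # `true` is run remotely just to prove that login works.
  if ssh -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" true; then
    echo "OK: $user@$host accepts your key"
  else
    echo "FAIL: $user@$host rejected the connection"
    return 1
  fi
}

# Example calls with the values from this thread:
# check_ssh_user root 138.197.29.202
# check_ssh_user web 138.197.29.202
```

If the `web` check fails while your admin or `root` login works, that would fit the scenario above where provisioning has not yet run on this droplet. In Trellis versions of that era, provisioning is typically run with `ansible-playbook server.yml -e env=production`.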

MWDelaney · November 17, 2017, 2:50am

@Silverjerk specifically step 6 at the bottom of this page in the docs:

Silverjerk · November 17, 2017, 3:38am

It is, and here are the contents:

[production]
138.197.29.202

[web]
138.197.29.202

Silverjerk · November 17, 2017, 3:48am

I can log into the server via SSH without issue. Also, to answer @MWDelaney’s other question, I did add the SSH key to the server during creation of the droplet.

@fullyint, I followed all of the usual steps, per the docs, which is why I’m so thoroughly baffled. I normally use an aText snippet for Trellis installs to make life a little easier, and I even went back and did everything manually to ensure I hadn’t missed anything.

I’m going to create a new droplet and provision and deploy again and follow the docs to the letter. I’m certain I’m missing something simple, and likely very obvious. Thanks for the assistance; if I find the solution (or realize my error), I will post the results here.

MWDelaney · November 17, 2017, 4:03am

Not sure if this is helpful or not, but I always use DNS names here, rather than IPs. That way if the IP changes for any reason, Trellis can still provision and deploy. If you’re provisioning before making public DNS changes, a local HOSTS file entry does the trick to make it work until DNS is updated.
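
As a rough illustration of that setup, the sketch below writes an inventory in the same layout as the hosts/production posted earlier, but with a hostname, and prints the matching local hosts-file line. Both function names are hypothetical helpers, not Trellis commands, and the example values are the ones from this thread.

```shell
# Hypothetical helpers (not part of Trellis).

# Write an inventory in the same layout as the hosts/production shown
# earlier in the thread, but with a hostname instead of an IP.
write_inventory() {
  host="$1"
  file="$2"
  printf '[production]\n%s\n\n[web]\n%s\n' "$host" "$host" > "$file"
}

# Print the line to add to your local hosts file (/etc/hosts on macOS and
# Linux) so the hostname reaches the new droplet before public DNS changes.
hosts_line() {
  printf '%s %s\n' "$1" "$2"
}

# Example with the values from this thread:
# write_inventory firstamericanmerchant.com hosts/production
# hosts_line 138.197.29.202 firstamericanmerchant.com
```

Remember to remove the hosts-file entry once public DNS points at the new server, otherwise your machine will keep using the hard-coded IP.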

Silverjerk · November 17, 2017, 4:32am

That’s a good tip, and makes a lot of sense. Thanks!

Silverjerk · November 17, 2017, 9:05pm

After trying again this morning, the deploy went off without a hitch. Maybe this was DNS-related, perhaps a propagation issue? I’m not certain, but it is resolved. Thanks for the quick replies, guys. This community is always responsive, helpful, and thorough. I appreciate it very much.