When deploying to our staging server, everything works beautifully. Deploying to production, however, is a different story.
Staging and production are separate DigitalOcean droplets, and both are configured properly. I’ve checked all the configuration files; I’ve even diffed the files from two Trellis projects running on the exact same droplet build, the same version of Trellis/Bedrock, and a nearly identical architecture, and every setting relevant to deployment looks identical. Yet this deployment still fails with an SSH error:
Failed to connect to the host via ssh: Permission denied (publickey).
fatal: [138.197.29.202]: UNREACHABLE! => {
"changed": false,
"unreachable": true
}
Steps I’ve taken:
Recreated the droplet several times, using both the exact same settings, and then removing additional features like monitoring and backups, just to ensure the build was as clean as possible.
Removed the entries in known_hosts related to the droplets that were failing.
Manually connected (successfully) to the server via ssh.
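For anyone retracing the last two steps, a minimal shell sketch follows. The helper names `forget_host` and `verbose_login` are made up for illustration, and the login user is a placeholder, since the thread does not say which account was used for the manual connection.

```shell
# Sketch only: helper names are hypothetical, and the user you pass in
# is whichever account you normally log in with.

# Drop any stale host key for a rebuilt droplet from ~/.ssh/known_hosts.
forget_host() {
  ssh-keygen -R "$1"
}

# Log in manually with verbose output to see which keys ssh offers.
verbose_login() {
  ssh -v "$1@$2"
}
```

For example, `forget_host 138.197.29.202` followed by `verbose_login root 138.197.29.202`. The `-v` output shows each key ssh offers to the server, which helps narrow down a `Permission denied (publickey)` error.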
Here is my verbose output from the production deployment:
I changed the DNS back to our old server, which is why the DNS is resolving differently now. When attempting the deployment, the DNS was set up properly.
Good call @MWDelaney. @Silverjerk see if you can just log in to the production droplet via SSH first. If you can’t do that, then something’s wrong with your droplet config or your local config. If you can, then something’s wrong with your Trellis config.
The authenticity of host '138.197.29.202 (138.197.29.202)' can't be established.
ED25519 key fingerprint is SHA256:KhEIUDlU32mrluOvo96KZBqeGgkJwW2MrVC9gvbhXCE.
Are you sure you want to continue connecting (yes/no)? yes
If you are having to accept a host key, perhaps it is just because you "removed the entries in known_hosts related to the droplets that were failing."
If, however, the new hostkey means this is your first connection to this iteration of the production server, that could mean that you haven’t yet run server.yml. If that’s the case, the server.yml playbook hasn’t yet created the web user that Trellis tries to use for deploys.
You certainly have run server.yml for staging, but have you run server.yml for production?
…and did you test connecting manually with the web user (the relevant user for deploys)?
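A rough way to run that check, sketched in shell. The user names `admin` and `web` are assumed to be the Trellis defaults and may differ in your `group_vars`; `can_login` and `check_users` are hypothetical helper names.

```shell
# Sketch only. Assumes the droplet's host key has already been accepted
# and that root, admin, and web are the relevant accounts.

# Succeeds only if key-based login works for the given user. BatchMode=yes
# makes ssh fail instead of prompting, so a missing user or key surfaces
# as a non-zero exit rather than a password prompt.
can_login() {
  ssh -o BatchMode=yes -o ConnectTimeout=10 "$1@$2" true
}

# Report which accounts on the host accept your key.
check_users() {
  for user in root admin web; do
    if can_login "$user" "$1"; then
      echo "$user: ok"
    else
      echo "$user: failed"
    fi
  done
}
```

Running `check_users 138.197.29.202` prints one line per account. If `web` fails while another account works, provisioning probably has not run for that environment yet; `ansible-playbook server.yml -e env=production` is the usual way to run it before retrying the deploy.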
I can log into the server via SSH without issue. Also, to answer @MWDelaney’s other question, I did add the SSH key to the server during creation of the droplet.
@fullyint, I followed all of the usual steps, per the docs, which is why I’m so thoroughly baffled. I run an aText string when doing Trellis installs to make life a little easier, and I even went back and did everything manually to ensure I hadn’t missed anything.
I’m going to create a new droplet and provision and deploy again and follow the docs to the letter. I’m certain I’m missing something simple, and likely very obvious. Thanks for the assistance; if I find the solution (or realize my error), I will post the results here.
Not sure if this is helpful or not, but I always use DNS names here, rather than IPs. That way if the IP changes for any reason, Trellis can still provision and deploy. If you’re provisioning before making public DNS changes, a local HOSTS file entry does the trick to make it work until DNS is updated.
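A rough sketch of the HOSTS-file approach, assuming a Linux workstation. `example.com` stands in for the real site name, and `hosts_entry` and `resolves_to` are hypothetical helper names. `getent` is generally a Linux tool; on other systems a different lookup command would be needed.

```shell
# Sketch only: the name and IP you pass in are your own site and droplet.

# Print a hosts-file line that maps a name to the droplet IP.
hosts_entry() {
  printf '%s %s\n' "$1" "$2"
}

# Succeeds if the name currently resolves to the expected IP on this
# machine. getent follows the system resolver order, so on most Linux
# setups it sees /etc/hosts entries as well as DNS.
resolves_to() {
  getent hosts "$1" | awk '{print $1}' | grep -qxF "$2"
}
```

For example, `hosts_entry 138.197.29.202 example.com | sudo tee -a /etc/hosts` appends the entry, and `resolves_to example.com 138.197.29.202 && echo ready` confirms the name points at the new droplet before provisioning or deploying. Remove the entry once public DNS has been updated.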
After trying again this morning, the deploy went off without a hitch. Maybe this was DNS-related, a propagation issue? I’m not certain, but it is resolved. Thanks for the quick replies, guys. This community is always responsive, helpful, and thorough. I appreciate it very much.