Why is Trellis so unpredictable? Failed to connect to the host via ssh: Permission denied (publickey)

Being new to Trellis, I’ve been following the instructions as closely as I can. Yet no matter how many times I follow them, I frequently get different results from the previous attempt.

I’m trying to deploy from a local development to staging and at first I got this issue, then once that was cleared I had this issue. So I destroyed my Digital Ocean Droplet and re-provisioned and now I’m getting this error. But I haven’t touched my SSH keys and every other time I’ve re-provisioned I have not received this error:

Because I know my SSH keys haven’t been modified locally, and that I added them to the DO server, I don’t know what keys it’s referring to. So I can’t regenerate them even if I want to (from my perspective).

I see the power of Trellis + Bedrock + Sage, but I feel like it’s unbelievably complicated and not stable. Be honest; do you think I’m just making mistakes or are these things likely weaknesses in Trellis / Bedrock / Sage Docs or code?

A couple excerpts from the troubleshooting docs might help:

Golden rule to debugging any failed command with Ansible:

  1. Re-run the command in verbose mode … to get more details. [add -vvvv]

The subsequent permission denied error may be due to a host key change:

Your server may occasionally offer a different host key than what your local machine has on record in known_hosts. This could happen if you rebuild your server…
If this change in host keys is expected, then clear the old host key from your known_hosts by running the following command (with your real IP or host name).
ssh-keygen -R 12.34.56.78

See my reply here:

You’ll need to delete the known_hosts entry each time you reprovision the server.
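
A minimal sketch of that cleanup as a reusable shell function, in case it helps. The function name `forget_host` is made up for illustration; it only wraps the `ssh-keygen -R` command from the docs excerpt above and first checks with `ssh-keygen -F` whether an entry exists. The IP is the placeholder from the docs, not a real server.

```shell
# forget_host HOST [KNOWN_HOSTS_FILE]
# Hypothetical helper: removes HOST from known_hosts if an entry exists.
# Defaults to ~/.ssh/known_hosts when no file is given.
forget_host() {
  host="$1"
  file="${2:-$HOME/.ssh/known_hosts}"
  # ssh-keygen -F exits 0 when it finds an entry for the host
  if ssh-keygen -F "$host" -f "$file" >/dev/null 2>&1; then
    # -R removes every key recorded for the host (a backup is kept as FILE.old)
    ssh-keygen -R "$host" -f "$file"
  else
    echo "no known_hosts entry for $host"
  fi
}

# Example (placeholder IP from the docs):
#   forget_host 12.34.56.78
```

After clearing the entry, re-running the provision in verbose mode (for example `ansible-playbook server.yml -e env=staging -vvvv`) should show whether the connection now gets past the host key check.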

Thanks for the link to the troubleshooting docs @fullyint. If it’s okay with you I’d like to submit a pull request and move the “Troubleshooting” section to the beginning of the docs. I didn’t know that was there because I’m only on the “Deploys” section right now.

Now that I know I can do that I feel silly having posted some of my questions without running that first and including its output. :slight_smile:

Regarding the host key change, I did clear the old host key with ssh-keygen and re-ran the script, but got the same issue. It must have been cached somehow, because when I re-ran it a few minutes after posting this, the error disappeared and the provision continued.

Thank you both for your input and feedback!

@s3w47m88 I don’t know why you’ve had so much trouble but to ensure this isn’t actually a Trellis problem, I just created a new server on latest Bedrock + Trellis from scratch in about 10 minutes.

Here it is: https://k-thx.com/

I pulled down latest Trellis, cloned latest Bedrock and quickly configured a single site + Let’s Encrypt SSL as well.

Here are the commands I ran and the git diff:

composer create-project roots/bedrock site
ansible-playbook server.yml -e env=staging
./bin/deploy.sh staging k-thx.com

  1. Created a new Bedrock project (really just a git clone)
  2. Already had Trellis cloned from master
  3. Edited Trellis as seen below
  4. Provisioned the server
  5. Deployed the site
  6. Went to https://k-thx.com/ and installed WP
diff --git a/group_vars/all/security.yml b/group_vars/all/security.yml
index 2d9df3d..41c219b 100644
--- a/group_vars/all/security.yml
+++ b/group_vars/all/security.yml
@@ -13,5 +13,5 @@ ferm_input_list:
 # Documentation: https://roots.io/trellis/docs/security/
 # If sshd_permit_root_login: false, admin_user must be in 'users' (`group_vars/all/users.yml`) with sudo group
 # and in 'vault_users' (`group_vars/staging/vault.yml`, `group_vars/production/vault.yml`)
-sshd_permit_root_login: true
+sshd_permit_root_login: false
 sshd_password_authentication: false
diff --git a/group_vars/staging/vault.yml b/group_vars/staging/vault.yml
index 754b854..13ad84d 100644
--- a/group_vars/staging/vault.yml
+++ b/group_vars/staging/vault.yml
@@ -4,21 +4,22 @@ vault_mysql_root_password: stagingpw
 # Documentation: https://roots.io/trellis/docs/security/
 vault_users:
   - name: "{{ admin_user }}"
-    password: example_password
-    salt: "generateme"
+    password: "fh2397qasdfjhdSKJFhSDUFG#@&*Y@#&(HUDS)"
+    salt: "skdfjhaf327r(#(*FDsdf13f29wehj))"
 
 # Variables to accompany `group_vars/staging/wordpress_sites.yml`
 # Note: the site name (`example.com`) must match up with the site name in the above file.
 vault_wordpress_sites:
-  example.com:
+  k-thx.com:
     env:
-      db_password: example_dbpassword
+      db_password: "DFf29y23kjhqae98y23#$*&@YFIUdsf"
       # Generate your keys here: https://roots.io/salts.html
-      auth_key: "generateme"
-      secure_auth_key: "generateme"
-      logged_in_key: "generateme"
-      nonce_key: "generateme"
-      auth_salt: "generateme"
-      secure_auth_salt: "generateme"
-      logged_in_salt: "generateme"
-      nonce_salt: "generateme"
+      once_salt: "generateme"
+      auth_key: "*#05oOtePN|Qn25fyP6BSk:ivVVw3>mz34]@|BK|OZKj?urL7snvcKtH7B2|X:_)"
+      secure_auth_key: "-;OVCQoI})>hgHaG(,u47ui@M{]&wInRGSZ?n@9^sV:=,/ECe4)7#1xR`-hw_*oK"
+      logged_in_key: ">44Q2g8|6dawX#3SbyRUb@*8<[=1p^SUJs#N,E.i}emW8!6_wV8Z:iEoxU<JR.r8"
+      nonce_key: "M>qQxEpMG@)ZYShhTE5}Z)[_r$R-}T_hFp_G.}=1vPAN/xx0X3M0rYfWws8T3R}C"
+      auth_salt: "f3(/H&dB$_nya%@N6Gwqin4#n]F%xhh9vW|ljgx%G2st2]{HGDv_>$1VE1/f{yy@"
+      secure_auth_salt: "E*zh,5R}Ym0k^1h$J79bkGT)p]hX$FdJVLD5BPWii{q;JBM/-;Z4s]cT>ywMv9jO"
+      logged_in_salt: "-L_V:lYG.(,gfJwa*@C+dac67hfbe;)!$j4%Z7?}V+;0^5s9*eiaTI:xlB9hXHbV"
+      nonce_salt: "4,(H^3i$+OSX:;5k4.S0XPV*GSPi$%OW%KbF7RQpSd6Br:AuxH]UDO^M*1!:9i45"
diff --git a/group_vars/staging/wordpress_sites.yml b/group_vars/staging/wordpress_sites.yml
index 054770e..a4fdc09 100644
--- a/group_vars/staging/wordpress_sites.yml
+++ b/group_vars/staging/wordpress_sites.yml
@@ -3,19 +3,18 @@
 # Define accompanying passwords/secrets in group_vars/staging/vault.yml
 
 wordpress_sites:
-  example.com:
+  k-thx.com:
     site_hosts:
-      - canonical: staging.example.com
+      - canonical: k-thx.com
         # redirects:
         #   - otherdomain.com
     local_path: ../site # path targeting local Bedrock site directory (relative to Ansible root)
-    repo: git@github.com:example/example.com.git # replace with your Git repo URL
-    repo_subtree_path: site # relative path to your Bedrock/WP directory in your repo
+    repo: git@github.com:roots/bedrock.git # replace with your Git repo URL
     branch: master
     multisite:
       enabled: false
     ssl:
-      enabled: false
+      enabled: true
       provider: letsencrypt
     cache:
       enabled: false
diff --git a/hosts/staging b/hosts/staging
index 012b254..5e6e09f 100644
--- a/hosts/staging
+++ b/hosts/staging
@@ -2,7 +2,8 @@
 # List each machine only once per [group], even if it will host multiple sites.
 
 [staging]
-your_server_hostname
+k-thx.com
 
 [web]
-your_server_hostname
+k-thx.com
+

Thanks @swalkinshaw for putting in so much effort.

So it sounds like you and @fullyint in another thread are advising I try to deploy the example project and if that goes smoothly rebuild this install?

That makes sense to me and it’s not something I’ve tried at this point.

Well no, I wasn’t really suggesting that. Do you mean roots-example-project? I just deployed the bedrock repo itself because I didn’t want to fork it.

But why not, you can try pretty much exactly what I did to try and minimize other factors.

I also wanted to reply to this.

Right away: every single software project has bugs and Roots’ tools are no exception. Our documentation also isn’t perfect and can’t teach people every concept either. But that doesn’t necessarily mean it’s complicated and it’s definitely not “unstable”. Plenty of people have been using all or some of our tools successfully for years and that includes us. Roots members regularly create new servers with Trellis and have no issues.

Trellis isn’t magic. Someone with next to no knowledge of these concepts could create a server by following instructions but if/when things go wrong, they won’t be able to troubleshoot properly.

You created this thread because of a lack of understanding around SSH which is completely fine and normal. This isn’t something Trellis can realistically solve unfortunately (clearing out your known hosts for example). And there’s a lot of potential issues like this.

It’s likely you are making mistakes but I can’t say for sure what exactly they are. Obviously something went wrong and that was probably compounded by troubleshooting steps which made it worse and then your code/server got into a bad state.

You’ve started 10 threads on this forum in the past 3 days which might be a record. You’ve also posted replies to existing threads. This has made it very hard for us to help you. At this point we have no idea what steps you took at any point and where you’re currently at.

I understand you’re probably frustrated with things not working and just want help. But remember that everyone on these forums is spending their own time helping others out.

Oh I hope I didn’t come off ungrateful. Everyone has been awesome. And I’m not criticizing Roots and co. I’m just trying to share my perspective and figure out if it’s a Spencer-needs-training thing or Roots needs folks-like-Spencer (less-experienced-in-best-practices dev) to contribute to the docs to help others.

I have a couple friends and 4-5 developers that have tried Trellis and co and given up because - in their own words - it was complicated. So I think I represent a demographic. And looking at the user-base of everything Roots contrasted to other WordPress projects it seems like the adoption rate isn’t what it could be.

So this is like my 7th time trying Roots and co and I’m determined to get through to a finished project because it’s obvious that what you guys are doing is spot on, best practice, things that the entire WP community needs to adopt. But I’m hoping that sharing these 10+ specific threads about specific issues will help my staff and others who maybe aren’t as skilled or qualified as you guys (I mean that as a compliment) get on board.

:slight_smile:

I manage about 12 developers doing WordPress theming and Plugins - so my goal here is to do this successfully myself and then train them so we can all be evangelists for the project(s) and contribute to docs and core.

So if you guys will continue to be awesomely patient with me, my staff and I will be faithful contributors and help others in these forums in the future to come.

Thank you!

It’s not about being ungrateful and you didn’t come off that way. And I don’t care much if people criticize us :slight_smile:

It’s two things: 1) it’s just hard to help with so much information spread around, 2) people are less likely to help when so many threads are created in a short time.

Anyway, Trellis probably is complicated for a lot of people. I solely meant that just because you’ve had some issues doesn’t necessarily mean it is. I’ve run into issues with some of the simplest software too :frowning:

If/when you figure things out, we’d be more than happy if you’d contribute back to the docs to make it easier on newcomers.

I can certainly understand how a lot of spread out threads gets confusing.

I’m getting new errors just following the GitHub instructions so I’m thinking there is something fundamentally wrong with my environment. So I’m going to create a new user on my Mac and start from absolute scratch and see where that gets me.

I’ll see if I can conclude all my open threads by tomorrow and lay it all to rest.

Sleep well!

Here’s what I suggest:

  1. Start from scratch again.
  2. Use my diff/code changes above to base your changes off of.
  3. Make note of the steps/commands you take, and the code changes like I did.
  4. If it doesn’t work, post a new thread detailing the above and the errors you encounter.
  5. After that, we can get back to trying to deploy your actual site.
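
One possible way to keep those notes without much effort is a small wrapper that records each command and its output. This is only a sketch: `run_logged` and the file names `deploy-notes.log` and `trellis-changes.diff` are hypothetical, and the example commands are the ones from the post above.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: appends the command line and its combined
# stdout/stderr to LOGFILE while still printing the output.
# Note: in plain sh the return status is tee's, not the command's.
run_logged() {
  log="$1"
  shift
  printf '$ %s\n' "$*" >> "$log"
  "$@" 2>&1 | tee -a "$log"
}

# Example (file names are placeholders):
#   run_logged deploy-notes.log ansible-playbook server.yml -e env=staging -vvvv
#   run_logged deploy-notes.log ./bin/deploy.sh staging k-thx.com
#   git diff > trellis-changes.diff   # code changes, run inside the Trellis clone
```

Posting the log and the diff together in one thread gives everyone the same picture of what was run and in what order.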

Okay, that sounds like a good plan. Thank you!

Having about a dozen sites running on Trellis now I can tell you you’ll learn a lot of things the documentation doesn’t/can’t tell you just by sticking with it. And it gets easier to debug with every install.

Since my last post, I have created a new Mac user, reinstalled all of the prerequisites, destroyed and recreated the Droplet, and followed the Docs without variation.

Yet I ended up getting the same SSH error saying it couldn’t read / write my GitLab repository.

So I removed the SSH keys from my local machine, GitLab, GitHub and the server and recreated them all and now I’m getting some non-descriptive error out of nowhere.

I’m beyond aggravated at this point. It’s unbelievable to me that I’m making a mistake somewhere. There has to be something wrong with the Trellis docs or the codebase. I am completely out of options.

What would it take to have someone just sit here and walk through this with me until it’s done?

Are you able to try to rule out GitLab as a compounding factor by using GitHub for these tests?

It seems to me that the basic troubleshooting steps apply: if the problem (according to the error) is GitLab, then try replacing GitLab with something else to confirm the problem lies there.

You could also confirm that GitLab has your SSH key, and that SSH forwarding is working. You mentioned you’re on a Mac; macOS security settings require that you run ssh-add -K to load your local SSH key into the agent so it can be forwarded when connecting to a remote server.

However! All the remote servers involved (your DO droplet and GitLab) have to agree to accept that forwarded key.

Since you said you created a new user and regenerated your SSH key at least once, you may simply be running into a key mismatch. (Regenerating isn’t strictly necessary, and it means you’ll need to add the new key to both DO and GitLab. SSH key problems shouldn’t require a new key; you should only need to make sure the remote servers you’re talking to are willing to accept your key, whatever it is.)

If this is the case, everything might actually be working as intended. Key mismatches should prevent deploys. Can you confirm that your DO droplet has your (current, new!) ssh key added to it, and that GitLab also accepts that key? You can test the latter by trying to clone your GitLab repo locally over ssh.
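
A rough sketch of how those checks could be run from the local machine. `check_agent` is a hypothetical helper name; it relies on the exit status of `ssh-add -l`. The user `admin` and the host `your_server_hostname` in the comments are placeholders to swap for your own values.

```shell
# check_agent: report whether the local ssh-agent holds a key.
# ssh-add -l exits 0 with keys loaded, 1 with none,
# and 2 when no agent can be contacted.
check_agent() {
  ssh-add -l >/dev/null 2>&1
  case $? in
    0) echo "agent has at least one key loaded" ;;
    1) echo "agent is running but holds no keys; run ssh-add" ;;
    *) echo "could not reach an ssh-agent" ;;
  esac
}

# Further manual checks (placeholders, adjust to your setup):
#   ssh -T git@gitlab.com                        # does GitLab accept the key?
#   ssh -A admin@your_server_hostname 'ssh-add -l'
#                                                # is the agent forwarded to the server?
```

If the last command lists the same key fingerprint on the server that `ssh-add -l` shows locally, forwarding is probably working and the mismatch is more likely on the GitLab side.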

I suggested trying out my basic example from above which avoids any potential SSH key issues. I just deployed the public Bedrock repo itself.

You can always buy support like anyone else can and @fullyint would walk you through it.

I think you’re partially correct by saying:[quote=“s3w47m88, post:9, topic:9328”]
in their own words - it was complicated. So I think I represent a demographic.
[/quote]

But that’s kind of the beauty of it. I’d like to consider myself one of the dumbest people that has managed to learn Roots, and it’s definitely challenging sometimes. It takes a lot of self-research, reading through this discourse constantly, and spinning up new projects over and over until it’s basically muscle memory. By doing this I’ve also learned that 99% of the time the issue was on my end, not Roots’ end.

Even then, it’s a stretch to say I’ve “learned” Roots. Not one person will ever know everything about it.

I think the important thing to keep in mind is to just help the team help you as much as possible, so try to be very detailed without being overwhelming.

And when all else fails, hire them for support. Or even just when you want to understand something very thoroughly. It’s a small price to pay for all the free support and work they put towards this amazing open source tool. If you’re managing a large team running Roots I’d recommend even doing something like weekly consults and sometimes they may be cool with you recording it (so you can turn it into content, or share it with your team and new hires). It’s a very worthy investment if you want your team to run a tight ship.

Also, I keep a public cheatsheet for myself, but I must warn that I’ve been too busy to keep it updated lately. So I can’t say if it’s accurate or not. I also have a quick setup tutorial. I had my friend shadow me during it who was just learning Roots so that we could do troubleshooting. Roots updates so fast that it makes it hard to keep these things updated, but you don’t have to use the most updated version. I state my exact versions I’m using so you can use older versions of everything.

Point being, this community is some of the best coders I can think of in the world (therefore very busy), so it’s only understandable things don’t work sometimes, and you have to pay for help sometimes. But if you stick with it, your Dev company will save so much money down the road, and be able to charge much higher prices. You will also become a much better developer having pushed through it. :slight_smile:

Haha, thanks for the word. I’m glad to know I’m not the only dummy out here. :stuck_out_tongue:

I think that’s good advice and I’m going to take it (particularly the part about hiring them).

I’ve purchased support so we’ll see where that gets me.

I’m pretty sure it’s ok to hire earlier on :stuck_out_tongue:
