[CircleCI] breaks deploy flow after trellis-cli update to 1.4.0 (from 1.0.0)

Hi guys,
I couldn’t find anything helpful on Github or here so I’m creating this topic.

My CircleCI setup stopped working after upgrading to trellis-cli 1.4.0 (from 1.0.0). Unfortunately, I don’t have any data in-between (1.0.0-1.4.0).

A few things that I think have an impact on it:

To compare - November deploys with trellis-cli 1.0.0 had none of the above issues.

When I deploy with trellis-cli from my local machine everything works fine.

Below is some additional info about the setup:

  • repo_accept_hostkey: true
  • Orb that I use is based on ItinerisLtd/tiller-circleci-orb
  • node v16.13.2
  • npm 8.1.2
  • ansible 2.10.16
  • virtualenv 20.10.0
  • pip 21.2.4

I double-checked the SSH keys, everything should be fine. I lost an entire weekend trying to solve it and I couldn’t.

This is just a warning which shouldn’t affect anything. If you aren’t running trellis init then that means you’re managing dependencies (like Ansible) yourself and presumably it worked before as you said.

That’s coming from Ansible and the deploy playbook; nothing related to trellis-cli which just invokes ansible-playbook.

But I’m assuming that’s the actual issue here since that’s an interactive prompt and it won’t get any input so it fails.

repo_accept_hostkey: true doesn’t matter here either. That’s for when git is cloning your repo on the remote server but the deploy is failing well before then. It’s failing trying to SSH from CircleCI to your staging server as the web user.

This is an SSH client issue on CircleCI. This means you need to configure CircleCI to accept your staging server as a known host.

Add additional SSH keys to CircleCI - CircleCI might help but I don’t use CircleCI anymore so I don’t know exactly.
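As a rough illustration of what "accepting the staging server as a known host" could look like in a CI shell step, here is a minimal sketch. `append_host_keys` is a hypothetical helper (not part of trellis-cli, Ansible, or CircleCI), and `staging.example.com` in the usage note is a placeholder; CircleCI may well offer its own mechanism for this.

```shell
#!/usr/bin/env bash
# Sketch only: append_host_keys is a hypothetical helper, not a real CLI command.
# It reads host-key lines on stdin and appends each one to the given
# known_hosts file unless the identical line is already present.
append_host_keys() {
  local known_hosts="$1" line
  mkdir -p "$(dirname "$known_hosts")"
  touch "$known_hosts"
  while IFS= read -r line; do
    # Skip blank lines and comment lines.
    case "$line" in
      ''|'#'*) continue ;;
    esac
    # -x: whole-line match, -F: fixed string (no regex interpretation).
    grep -qxF -- "$line" "$known_hosts" || printf '%s\n' "$line" >> "$known_hosts"
  done
}
```

It could be fed from ssh-keyscan before the deploy step, e.g. `ssh-keyscan -t rsa,ed25519 staging.example.com | append_host_keys "$HOME/.ssh/known_hosts"`. Keys fetched this way are trusted on first use, so comparing them against the server's real fingerprint is still worthwhile.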

Yeah that was my first thought, but I have 3 websites, different nodes/IPs, domains etc, the same circleci setup.

The only thing that has changed is trellis-cli (1.0.0->1.4.0) and trellis (1.7.1 → 1.12.0)

I’m very doubtful this is related to trellis-cli. To confirm that, you can remove it entirely and just run ansible-playbook deploy.yml -e env=staging -e site=whatever

Yeah, you are right. I just recreated the deploy from November 30th, which was successful back then. I used the same environment (trellis-cli 1.0.0, trellis 1.7.1) and it got stuck on the SSH connection.

What puzzles me is that all 3 of my projects get stuck at the same step. What does that mean? Some CircleCI update?

I did provision those servers tho from my local machine with the new trellis code & trellis-cli, maybe something there?

I will keep digging, but yeah, I think it’s a false alarm - sorry for that.

I’m guessing it’s something on Circle’s side. Maybe something caused their algorithm to go from RSA to ED25519? Either way, I don’t know enough about CircleCI to know how to fix it off the top of my head.
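One way to test the RSA vs ED25519 guess is to look at which key types the CI environment has recorded for the server. A minimal sketch, assuming an unhashed known_hosts file; `list_key_types` is a hypothetical helper, not an OpenSSH or CircleCI command:

```shell
#!/usr/bin/env bash
# Sketch only: list_key_types is a hypothetical helper.
# Usage: list_key_types HOST KNOWN_HOSTS_FILE
# Prints the key algorithms (e.g. ssh-rsa, ssh-ed25519) stored for HOST.
# Assumes plain (unhashed) entries; hashed hostnames will not match.
list_key_types() {
  local host="$1" file="$2"
  awk -v host="$host" '
    /^#/ || NF < 3 { next }          # skip comments and malformed lines
    {
      n = split($1, names, ",")      # first field may list several names
      for (i = 1; i <= n; i++) {
        if (names[i] == host) { print $2; break }
      }
    }
  ' "$file" | sort -u
}
```

If this prints only ssh-rsa for the staging host while the server also offers an ED25519 key, that would at least be consistent with the guess above; it doesn’t prove it.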

Btw we recently created GitHub - roots/trellis-deploy-action: GitHub Action for deploying Trellis sites. That combined with trellis-cli’s trellis key generate should automate basically everything. But be warned, it hasn’t been extensively tested yet. I’m not sure I’d switch away from CircleCI just to try and solve this known hosts issue.

Oh wow that’s brilliant, will check Github Action for sure. Thanks.

Re RSA, yeah I think that’s the problem. Will confirm and let you know.

Ok, regarding Github Action I’m trying to make this work…
I hit an error with trellis key generate

[✓] GitHub private key secret set [TRELLIS_DEPLOY_SSH_PRIVATE_KEY]
[✓] GitHub deploy key added [Trellis deploy]
Error: could not set SSH known hosts. ssh-keyscan command failed.
exit status 1

But that’s not a big deal, I added it manually.

But then, after setting up the workflow (based on the example), I got this error in GitHub Actions:

Current runner version: '2.286.0'
Operating System
Virtual Environment
Virtual Environment Provisioner
GITHUB_TOKEN Permissions
Secret source: Actions
Prepare workflow directory
Prepare all required actions
Getting action download info
Error: Unable to resolve action `roots/trellis-deploy-action@v1`, unable to find version `v1`

and I can’t find your Github Action in the Marketplace

or maybe I’m missing something?

Sorry for keeping this in the same thread.

Ah sorry, didn’t realize that example file is wrong.

You need to use uses: roots/trellis-deploy-action@main for now. As mentioned, it’s still early and we wanted to test it more first.

I’ll also add more error output for that keyscan failing, but glad you worked around it :+1:

Yes, in fact I forked it to remove the part that I don’t need:

- uses: actions/setup-node@v2
  with:
    node-version: ${{ inputs.node-version }}
    cache: yarn
    cache-dependency-path: site/web/app/themes/${{ inputs.theme-name }}/yarn.lock

I understand this is for a Sage theme? In that case it won’t work for me, as I have a non-Sage theme.
So I got this error:

Error: Some specified paths were not resolved, unable to cache dependencies.

Maybe make it optional?

Because of that error I forked it and removed that part, and then I got another issue:

Run curl -sL https://roots.io/trellis/cli/get | bash -s -- -b "$HOME/.local/bin"
  curl -sL https://roots.io/trellis/cli/get | bash -s -- -b "$HOME/.local/bin"
  echo "$HOME/.local/bin" >> $GITHUB_PATH
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.9.9/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.9/x64/lib
    SSH_AUTH_SOCK: /tmp/ssh-8YpchYWp87UD/agent.1683
    SSH_AGENT_PID: 1684
roots/trellis-cli info checking GitHub for latest tag
roots/trellis-cli info found version: 1.5.0 for v1.5.0/Linux/x86_64
roots/trellis-cli info installed /home/runner/.local/bin/trellis
Run trellis init
  trellis init
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.9.9/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.9/x64/lib
    SSH_AUTH_SOCK: /tmp/ssh-8YpchYWp87UD/agent.1683
    SSH_AGENT_PID: 1684
No Trellis project detected in the current directory or any of its parent directories.
Error: Process completed with exit code 1.

Does that mean it didn’t clone the repo, or that it was run in the wrong folder?

The thought had occurred to me, so yeah can definitely do that.

What does the folder structure for your repo look like? I realize I definitely wrote this workflow assuming the standard one, so it might need an option to set the site/trellis dir. The 3 steps that use trellis-cli at the bottom need a working-directory set in non-standard cases.
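For a layout with a trellis/ directory next to site/, the step would have to change into that directory before calling trellis, since the runner starts at the repo root. A minimal shell sketch: `find_trellis_dir` is a hypothetical helper, and using deploy.yml as the marker file is an assumption for illustration, since trellis-cli’s own project detection may look at other files.

```shell
#!/usr/bin/env bash
# Sketch only: find_trellis_dir is a hypothetical helper, not part of trellis-cli.
# Usage: find_trellis_dir [ROOT]
# Prints ROOT if it contains deploy.yml, otherwise ROOT/trellis if that does.
# Returns non-zero (with a message on stderr) when neither is found.
find_trellis_dir() {
  local root="${1:-.}"
  if [ -f "$root/deploy.yml" ]; then
    printf '%s\n' "$root"
  elif [ -f "$root/trellis/deploy.yml" ]; then
    printf '%s\n' "$root/trellis"
  else
    echo "no deploy.yml found under $root" >&2
    return 1
  fi
}
```

In a workflow step this might be used as `cd "$(find_trellis_dir "$GITHUB_WORKSPACE")" && trellis init`, which has roughly the same effect as setting working-directory on the step.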

My repo structure (in two of my projects) is the same as the one in the Roots docs (Installing Trellis | Trellis Docs | Roots)

One of my first deploy attempts showed this in GH Actions:

Initializing the repository
  /usr/bin/git init /home/runner/work/*project_name*/*project_name*
  hint: Using 'master' as the name for the initial branch. This default branch name
  hint: is subject to change. To configure the initial branch name to use in all
  hint: of your new repositories, which will suppress this warning, call:
  hint: 
  hint: 	git config --global init.defaultBranch <name>
  hint: 
  hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
  hint: 'development'. The just-created branch can be renamed via this command:
  hint: 
  hint: 	git branch -m <name>
  Initialized empty Git repository in /home/runner/work/*project_name*/*project_name*/.git/
  /usr/bin/git remote add origin https://github.com/*my_org_profile*/*project_name*

In fact, adding working-directory: trellis/ to my forked version of your repo made it work.

Until it died again at SSH… I left a comment on this issue → https://github.com/roots/trellis-deploy-action/issues/1