Deploy hanging indefinitely at the Clone project files step

Hi guys, I have a deploy issue when trying to switch to another repository.
I have added my new repo URL to wordpress_sites.yml like this:

wordpress_sites:
  igsdemo.ml:
    site_hosts:
      - canonical: igsdemo.ml
        # redirects:
          # - www.igs-hcmc.de
    local_path: ../site # path targeting local Bedrock site directory (relative to Ansible root)
    repo: ssh://git@git.conceptual.site:2222/diffusion/1/igs.git # replace with your Git repo URL
    repo_subtree_path: site # relative path to your Bedrock/WP directory in your repo
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: true

All my SSH keys are added properly, but when I run the deploy script it always hangs at the [Clone project files] task. Here’s the full log with -vvv:

TASK [deploy : Clone project files] ********************************************
task path: /Users/conceptualcode/Workspace/igs-hcmc.de/trellis/roles/deploy/tasks/update.yml:24
Using module file /Library/Python/2.7/site-packages/ansible/modules/core/source_control/git.py
<188.166.211.104> ESTABLISH SSH CONNECTION FOR USER: web
<188.166.211.104> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=web -o ConnectTimeout=10 -o ControlPath=/Users/conceptualcode/.ansible/cp/ansible-ssh-%h-%p-%r 188.166.211.104 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1490605045.71-160687618048575 `" && echo ansible-tmp-1490605045.71-160687618048575="` echo ~/.ansible/tmp/ansible-tmp-1490605045.71-160687618048575 `" ) && sleep 0'"'"''
<188.166.211.104> PUT /var/folders/b9/y4z_l37d07n6g4q7ls6blxsc0000gn/T/tmpEz3gzr TO /home/web/.ansible/tmp/ansible-tmp-1490605045.71-160687618048575/git.py
<188.166.211.104> SSH: EXEC sftp -b - -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=web -o ConnectTimeout=10 -o ControlPath=/Users/conceptualcode/.ansible/cp/ansible-ssh-%h-%p-%r '[188.166.211.104]'
<188.166.211.104> ESTABLISH SSH CONNECTION FOR USER: web
<188.166.211.104> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=web -o ConnectTimeout=10 -o ControlPath=/Users/conceptualcode/.ansible/cp/ansible-ssh-%h-%p-%r 188.166.211.104 '/bin/sh -c '"'"'chmod u+x /home/web/.ansible/tmp/ansible-tmp-1490605045.71-160687618048575/ /home/web/.ansible/tmp/ansible-tmp-1490605045.71-160687618048575/git.py && sleep 0'"'"''
<188.166.211.104> ESTABLISH SSH CONNECTION FOR USER: web
<188.166.211.104> SSH: EXEC ssh -vvv -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=web -o ConnectTimeout=10 -o ControlPath=/Users/conceptualcode/.ansible/cp/ansible-ssh-%h-%p-%r -tt 188.166.211.104 '/bin/sh -c '"'"'/usr/bin/python /home/web/.ansible/tmp/ansible-tmp-1490605045.71-160687618048575/git.py; rm -rf "/home/web/.ansible/tmp/ansible-tmp-1490605045.71-160687618048575/" > /dev/null 2>&1 && sleep 0'"'"''

Do you have any ideas how I can further debug this? What are the main causes for this? Thank you.

If you are on macOS, try importing your SSH key’s passphrase into the Keychain by running
ssh-add -K

Note that ssh-agent forgets loaded keys whenever it is restarted, e.g. after a reboot.
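A minimal sketch of the macOS side, assuming your key is ~/.ssh/id_rsa (adjust the path). ssh-add -K is specific to Apple’s ssh-add (newer macOS releases spell it --apple-use-keychain), and UseKeychain is an Apple-only ssh_config option. The AddKeysToAgent/UseKeychain stanza is a common way to have keys re-added from the Keychain after the agent restarts; it is not something Trellis itself requires.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the deploy key is ~/.ssh/id_rsa and Apple's OpenSSH.

load_key_into_agent() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  # -K stores the passphrase in the macOS Keychain (Apple's ssh-add only;
  # newer macOS releases use --apple-use-keychain instead).
  ssh-add -K "$key" || return 1
  # -l lists the fingerprints of keys the agent holds; it fails if none.
  ssh-add -l
}

# Print an ~/.ssh/config stanza that re-adds keys from the Keychain on first
# use, so a restarted agent does not leave you without the key.
# UseKeychain is Apple-specific; other OpenSSH builds reject it.
keychain_config_stanza() {
  printf '%s\n' \
    'Host *' \
    '  AddKeysToAgent yes' \
    '  UseKeychain yes' \
    '  IdentityFile ~/.ssh/id_rsa'
}

# Usage (not run here):
#   load_key_into_agent
#   keychain_config_stanza >> ~/.ssh/config
```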

Also make sure you understand that Trellis uses SSH agent forwarding to connect to your Git repository, as described in the docs. GitHub’s documentation also has a section with tips for troubleshooting when agent forwarding does not work.
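One way to check each hop of agent forwarding by hand, sketched under the assumption that the server and user are the ones in the log above (web@188.166.211.104); substitute your own. If the second hop lists no keys, the deploy’s git clone on the server has nothing to authenticate with.

```shell
#!/usr/bin/env bash
# Sketch: confirm your key travels from your Mac to the server via agent forwarding.
# Assumption: SERVER is the deploy host/user from the log in this thread.
SERVER="web@188.166.211.104"

check_agent_forwarding() {
  # Hop 1: the local agent must hold a key (ssh-add -l exits 1 if it holds
  # none, 2 if no agent is running).
  if ! ssh-add -l; then
    echo "No keys in the local ssh-agent; run ssh-add first." >&2
    return 1
  fi
  # Hop 2: with -A the remote shell talks to your local agent, so listing
  # keys there should print the same fingerprints.
  if ! ssh -A "$SERVER" 'ssh-add -l'; then
    echo "Agent is not forwarded to $SERVER." >&2
    return 1
  fi
}

# Usage (not run here): check_agent_forwarding
```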


@yyyyaaa Here is a possible explanation for why the task is hanging:

The git clone task will cause your server to reach out to the git.conceptual.site host. The Ansible docs for the related git module mention this:

If the task seems to be hanging, first verify remote host is in known_hosts. SSH will prompt user to authorize the first contact with a remote host.

The implication is that if your server doesn’t already “know” the git.conceptual.site host, the task could be hanging at the prompt “The authenticity of host ... can't be established. Are you sure you want to continue connecting (yes/no)?” Ansible doesn’t show you the prompt, and you can’t just type yes because the session is not interactive, so the task simply hangs.

This usually doesn’t happen on the git clone task because the Trellis default is accept_hostkey: true (because of the default repo_accept_hostkey: true), avoiding the prompt that would cause the task to hang.
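To check whether the server already “knows” the Git host, you can ask ssh-keygen -F as the web user. A sketch, assuming the server from the log above. Because the repo URL uses port 2222, OpenSSH looks the host up as [git.conceptual.site]:2222 rather than the bare name. ssh-keygen -F also works when known_hosts is hashed, where a plain grep would find nothing.

```shell
#!/usr/bin/env bash
# Sketch: does the deploy user's known_hosts already contain the Git host?
# Assumption: SERVER is the deploy host/user from the log in this thread.
SERVER="web@188.166.211.104"

server_knows_host() {
  local host_pattern="$1"   # e.g. github.com, or [host]:port for non-22 ports
  # ssh-keygen -F searches ~/.ssh/known_hosts on the server (hashed or not);
  # recent OpenSSH versions exit non-zero when nothing matches.
  ssh "$SERVER" "ssh-keygen -F '$host_pattern'"
}

# Usage (not run here):
#   server_knows_host '[git.conceptual.site]:2222' && echo known || echo unknown
```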

Good news. Trellis gives you a means to make the git.conceptual.site host known ahead of time, avoiding all the trouble above. You could add git.conceptual.site to group_vars/all/known_hosts.yml. This command may help you find the key to add:

ssh-keyscan git.conceptual.site

This is probably the key you want:

git.conceptual.site ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKTAnYTdH7CghvIVHS8NLyAFTXwjgnZHbcUWntkxJ+c3

Review this thread for discussion of known_hosts in general and how it is used in Trellis.
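Since the repo URL in wordpress_sites.yml uses port 2222, the keys that matter are the ones presented on port 2222, and OpenSSH records them under the name [git.conceptual.site]:2222. A plain ssh-keyscan git.conceptual.site queries port 22, which may be served by a different SSH daemon with different keys. The sketch below derives that name from the repo URL and formats ssh-keyscan output as list items. It assumes your known_hosts.yml holds a list of name/key pairs; compare with the entries already in your file before pasting anything.

```shell
#!/usr/bin/env bash
# Sketch: turn the deploy repo URL into known_hosts.yml list items.
# Assumption: known_hosts.yml is a list of {name, key} items.

# Print "host port" for an ssh:// URL (port defaults to 22).
repo_host_port() {
  local rest="${1#ssh://}"   # drop the scheme
  rest="${rest#*@}"          # drop user@
  rest="${rest%%/*}"         # drop /path
  case "$rest" in
    *:*) printf '%s %s\n' "${rest%%:*}" "${rest##*:}" ;;
    *)   printf '%s 22\n' "$rest" ;;
  esac
}

# Host name as OpenSSH writes it in known_hosts: bare for port 22,
# [host]:port for any other port.
known_hosts_name() {
  local host port
  read -r host port <<< "$(repo_host_port "$1")"
  if [ "$port" = 22 ]; then
    printf '%s\n' "$host"
  else
    printf '[%s]:%s\n' "$host" "$port"
  fi
}

# Read ssh-keyscan lines on stdin and print one YAML list item per key.
keyscan_to_yaml() {
  local name="$1" line
  while IFS= read -r line; do
    case "$line" in ''|'#'*) continue ;; esac   # skip blanks and comments
    printf '  - name: "%s"\n    key: "%s"\n' "$name" "$line"
  done
}

# Usage (needs network, not run here):
#   url=ssh://git@git.conceptual.site:2222/diffusion/1/igs.git
#   ssh-keyscan -p 2222 git.conceptual.site 2>/dev/null |
#     keyscan_to_yaml "$(known_hosts_name "$url")"
```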


Hi @fullyint, sorry for the late reply. I’m using an older version of Trellis, so it doesn’t have that known_hosts file. How should I proceed? If I have to upgrade Trellis to the newest version, what is the easiest way to do it?

If you haven’t tried it already, you may gain more info following the docs’ troubleshooting/debugging advice:

SSH into your server and manually run the command where Ansible failed.

Example: if a Git clone task failed during deploys, then SSH into the server as the web user (which is what deploys use) and run the command manually, such as git clone with your repo URL. This will give you a much better clue as to what’s going wrong.

Running git clone on the server may reveal the problem, or may simply prompt you to accept the host key, after which deploys may work fine.
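A sketch of that manual check, assuming the server and repo URL from this thread. It uses git ls-remote, which contacts the repository the same way a clone does but writes nothing, and GIT_SSH_COMMAND (available in Git 2.3 and later) to make the inner SSH connection verbose. The -t flag gives the session a terminal, so if SSH asks you to accept the host key, you can answer it.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the "Clone project files" task by hand as the web user.
# Assumptions: server address and repo URL are the ones in this thread.
SERVER="web@188.166.211.104"
REPO="ssh://git@git.conceptual.site:2222/diffusion/1/igs.git"

manual_clone_check() {
  # -A forwards your local agent, as Trellis deploys do.
  # -t allocates a TTY so a host-key "yes/no" prompt can be answered.
  # ssh -v (via GIT_SSH_COMMAND) shows where the Git SSH handshake stalls.
  ssh -A -t "$SERVER" "GIT_SSH_COMMAND='ssh -v' git ls-remote '$REPO'"
}

# Usage (not run here): manual_clone_check
```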


Ultimately you’ll want to update Trellis. For keeping Trellis updated, some keep their Trellis separate from Bedrock and Sage (basic idea). Others maintain a project combining Trellis, Bedrock, and Sage, using subtrees or cherry-picking commits (recommended).

I think a lot of people just update Trellis manually, grabbing the latest files from upstream master and then 1) pasting in the upstream files they haven’t customized in their local project, 2) identifying and incorporating updates in files they have modified (like group_vars files), and 3) committing the changes.
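For steps 1 and 2, it helps to know which files differ at all. A sketch, assuming your Trellis lives in ./trellis and you have cloned upstream into a temporary directory; it only reports differences and changes nothing.

```shell
#!/usr/bin/env bash
# Sketch: report how a local Trellis copy differs from upstream.
# Assumption: upstream has been cloned separately (see usage below).

compare_with_upstream() {
  local local_dir="$1" upstream_dir="$2"
  # -r recurses, -q prints only "Files X and Y differ" / "Only in DIR: FILE",
  # -x .git skips Git metadata on either side.
  # diff exits 1 when it finds differences; only 2 (trouble) counts as failure.
  diff -rq -x .git "$local_dir" "$upstream_dir" || [ $? -lt 2 ]
}

# Usage (needs network, not run here):
#   git clone --depth 1 https://github.com/roots/trellis.git /tmp/trellis-upstream
#   compare_with_upstream trellis /tmp/trellis-upstream
```

Files reported as “Only in trellis” are usually your own additions; files reported as differing are the candidates for step 2.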


@fullyint thank you, I have updated my trellis folder with the help of this thread:

Hoping for everything to work correctly 😃

Hi @fullyint, would you mind if I follow up with another question? (I’m in the process of doing the steps you suggested.) I have updated Trellis to the latest version, but running vagrant up --provision stopped at this task with an SSL error:

TASK [geerlingguy.daemonize : Download daemonize archive.] *********************
System info:
  Ansible 2.2.1.0; Vagrant 1.9.0; Darwin
  Trellis at "Check Ansible version before Ansible validates task attributes"
---------------------------------------------------
Failed to validate the SSL certificate for github.com:443. Make sure your
managed systems have a valid CA certificate installed. You can use
validate_certs=False if you do not need to confirm the servers identity but
this is unsafe and not recommended. Paths checked for this platform:
/etc/ssl/certs, /etc/pki/ca-trust/extracted/pem, /etc/pki/tls/certs,
/usr/share/ca-certificates/cacert.org, /etc/ansible
fatal: [default]: FAILED! => {"changed": false, "failed": true}

I have googled for 20 minutes but haven’t found an explanation or a viable solution. I would very much appreciate your help.

Failed to validate the SSL certificate for... errors are very often connectivity issues that resolve themselves if you try later (example), maybe 30-60 minutes later. Sometimes you can regain connectivity faster by changing your IP (enable a VPN, go to a coffee shop, etc.).

Such connectivity issues simply happen sometimes, but seem more frequent with dependencies pulled from github.com after you’ve provisioned your VM multiple times in a short period (possibly rate limiting by GitHub).
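If you want to confirm it is connectivity before retrying, a sketch: check from inside the VM that github.com answers over HTTPS, then re-run provisioning after a pause. The retry helper is a hypothetical convenience, not part of Trellis or Vagrant, and the 30-minute delay in the usage line just mirrors the suggestion above.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical retry helper for flaky downloads during provisioning.

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
  done
  return 1
}

# Usage (run from the Trellis directory, not run here):
#   vagrant ssh -c 'curl -sSI https://github.com >/dev/null && echo reachable'
#   retry 3 1800 vagrant provision   # up to 3 tries, 30 minutes apart
```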


Would adding support for GitHub API authentication be helpful?

Yes, that issue was due to connectivity. I have tried all the keys returned by running

ssh-keyscan git.conceptual.site

but none of them worked. I have provisioned the server successfully, but when running the deploy script I get this error:

Git repo ssh://git@git.conceptual.site:2222/diffusion/1/igs.git cannot be
accessed. Please verify the repository exists and you have SSH forwarding set
up correctly.

What could be wrong? My local id_rsa.pub is set up correctly, and when I checked SSH access with this command it showed no error (we use Phabricator to host our projects):
echo {} | ssh git@git.conceptual.site -p 2222 conduit conduit.ping

The result:

{"result":"phabricator","error_code":null,"error_info":null}

I’m fairly new to DevOps and would greatly appreciate your help. Thank you.
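Note that the ping above runs from your Mac, while the deploy’s git clone runs on the server and reaches Phabricator through your forwarded agent. A sketch of the same ping sent through the server, assuming the web@188.166.211.104 host from the earlier log. If it prints the same JSON, forwarding and the key are fine; if the server has never seen the Git host on port 2222, you will likely get “Host key verification failed” instead, which points back to known_hosts.

```shell
#!/usr/bin/env bash
# Sketch: run the Phabricator conduit ping from the server, not your Mac.
# Assumption: SERVER is the deploy host/user from the log in this thread.
SERVER="web@188.166.211.104"

ping_phabricator_via_server() {
  # -A forwards your local agent. The inner ssh runs on the server, so it uses
  # the server's known_hosts and authenticates with your forwarded key.
  # stdin ('{}') flows through both connections to conduit.ping.
  echo '{}' | ssh -A "$SERVER" 'ssh -p 2222 git@git.conceptual.site conduit conduit.ping'
}

# Usage (not run here): ping_phabricator_via_server
```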

Just an update: having Ansible installed with Homebrew, as opposed to pip, can also cause this error. As of today it looks like Ansible 2.4.2 is the version we want, or at least 2.4.

I think it’s basically brew uninstall ansible and pip install ansible==2.4.2.

which ansible
which python (needs to be 2 and not 3 until later this year, I read)
which openssl

These might also be insightful. I’m using versions located in /usr/local/bin, which comes earlier in my $PATH than the system directories, so those commands find binaries there before reaching the system versions, which are probably in /usr/bin.

I’m not quite clear on how or why exactly the Homebrew Python’s pip ends up with one version of OpenSSL while the Homebrew-installed openssl is a different one.
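A small sketch that prints what those which commands would show, plus the OpenSSL version Python’s ssl module is linked against (ssl.OPENSSL_VERSION). That library can differ from the openssl binary on your PATH, which may help untangle the Homebrew/pip mismatch described above. It only reports and changes nothing.

```shell
#!/usr/bin/env bash
# Sketch: show which ansible/python/openssl the shell resolves, and versions.

report_toolchain() {
  local tool
  for tool in ansible python openssl; do
    # command -v prints the path the shell would run, or nothing if absent.
    printf '%s: %s\n' "$tool" "$(command -v "$tool" || echo 'not found')"
  done
  if command -v ansible >/dev/null; then
    ansible --version | head -n 1
  fi
  if command -v python >/dev/null; then
    # The OpenSSL library this Python's ssl module was built against.
    python -c 'import ssl; print(ssl.OPENSSL_VERSION)'
  fi
  if command -v openssl >/dev/null; then
    openssl version
  fi
}

# Usage: report_toolchain
```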