Bedrock-ansible deploys from Codeship

Hey guys,

I’ve been trying to set up Codeship to deploy using the bedrock-ansible deploy playbook, but I get stuck on this task:

TASK: [deploy | Copy project templates] *************************************** 
<107.170.237.109> ESTABLISH CONNECTION FOR USER: web
<107.170.237.109> EXEC ssh -C -tt -vvv -o ForwardAgent=yes -o ControlMaster=no -o ControlPersist=60s -o ControlPath="/home/rof/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=web -o ConnectTimeout=10 107.170.237.109 /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1432860797.37-191618025850755 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1432860797.37-191618025850755 && echo $HOME/.ansible/tmp/ansible-tmp-1432860797.37-191618025850755'
EXEC previous known host file not found for 107.170.237.109
fatal: [107.170.237.109] => SSH Error: ssh: connect to host 107.170.237.109 port 22: Connection timed out
while connecting to 107.170.237.109:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
to retry, use: --limit @/home/rof/deploy.retry

107.170.237.109            : ok=7    changed=4    unreachable=1    failed=0  

I’m not sure why none of the tasks before this one timed out. When I check our server, I can see the release directory was created and everything looks fine.

Any help would be much appreciated.

Does it consistently fail on that task?

Unfortunately there’s not much we can do to help, since it’s not an error with the Ansible task itself. It’s just an SSH timeout.

Definitely weird that it fails mid-playbook, but that “Copy project templates” task is pretty simple and shouldn’t be causing the issue itself.

Yes, consistently on the same task, which is weird. Do you know if increasing ConnectTimeout=10 would help at all? Or what else could cause that timeout?

Might as well try to bump that up and maybe also ControlPersist (these can be set in ansible.cfg).
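
For reference, a minimal sketch of what that could look like in ansible.cfg. It assumes the `timeout` setting under `[defaults]` is the value Ansible passes to ssh as ConnectTimeout; the numbers 30 and 120s are arbitrary examples, not recommendations.

```ini
[defaults]
# SSH connection timeout in seconds (the log above shows ConnectTimeout=10)
timeout = 30

[ssh_connection]
# Same options as in the log above, with ControlPersist raised from 60s
ssh_args = -o ForwardAgent=yes -o ControlMaster=no -o ControlPersist=120s
```

Re-running the playbook with -vvvv should show whether the new values actually end up on the ssh command line.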

Does that task fail instantly? Or does it hang for a while first?

It hangs for about 5 seconds and then fails. I’ll give it a try with those SSH arguments.

OK, I tried a couple of things and still no luck. When I run it with ControlMaster=auto, I don’t get past gathering facts. So I went back to ControlMaster=no, which lets me run the tasks until it times out again on the same task. This time I installed Ansible 1.8.4, since that’s the version I use on my local machine, where I’m able to deploy with no problem. I got a more detailed error this time:

TASK: [deploy | Copy project templates] *************************************** 
<107.170.237.109> 
<107.170.237.109> ConnectTimeout=10 PasswordAuthentication=no KbdInteractiveAuthentication=no User=web ForwardAgent=yes ControlPath=/home/rof/.ansible/cp/ansible-ssh-%h-%p-%r PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey ControlMaster=no ControlPersist=30m

fatal: [107.170.237.109] => SSH encountered an unknown error. The output was:
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /home/rof/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Control socket "/home/rof/.ansible/cp/ansible-ssh-107.170.237.109-22-web" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to 107.170.237.109 [107.170.237.109] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: connect to address 107.170.237.109 port 22: Connection timed out
ssh: connect to host 107.170.237.109 port 22: Connection timed out


FATAL: all hosts have already failed -- aborting

PLAY RECAP ******************************************************************** 
to retry, use: --limit @/home/rof/deploy.retry

107.170.237.109            : ok=8    changed=6    unreachable=1    failed=0 

These are the ssh_args in my ansible.cfg:

[ssh_connection]
ssh_args = -o ForwardAgent=yes -o ControlMaster=no -o ControlPersist=30m

This is the ~/.ssh/config I get on the Codeship VM:

UserKnownHostsFile=/dev/null
StrictHostKeyChecking=no
ServerAliveInterval 3
ServerAliveCountMax 600

Any ideas what else I can try?

You might not like it… but Codeship has Capistrano support out of the box. Maybe that would be a better fit? Just an idea.

You could also try manually SSH’ing into Codeship and running that task’s commands and the SSH connection yourself to see what happens.

Yeah, I saw that when I started playing around with Codeship, but I think it makes more sense to try to get this going. If I can’t figure this out, that option is there.

I created an SSH debug build in Codeship to run the commands directly on the VM.
First I ran this:

ansible web -i hosts/staging -vvvv -u web -m ping

I get the success "ping": "pong" message.

Then I tried to run the deploy playbook, and it fails again on the same task with that timeout message.
After this, if I try to run the Ansible ping again, it now fails with the same timeout error as the deploy playbook.

I’m wondering if this may be something on the remote server, which is a DigitalOcean droplet provisioned with bedrock-ansible. I’m going to try reprovisioning it with the latest changes in your repo and give it another try.

You can try running the actual commands that are in that task manually. The most common cause of Ansible hanging is a command which ends up needing sudo so it prompts for a password behind the scenes and hangs.

Of course that command should not do that normally, but maybe something weird is going on.

Another thought just occurred to me: if you’re making more SSH connections than normal, it could result in a temporary ban imposed by ferm.

See here. You could bump those hits up a lot higher.
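
Before changing the firewall, one rough way to test the rate-limit theory from the Codeship debug session is to open SSH connections in a loop and see on which attempt they start failing. The sketch below uses a hypothetical helper named `probe` (it is not part of Ansible or bedrock-ansible); the host and user in the commented example are the ones from the logs above.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <max_attempts> times and
# report the first attempt that fails.
# Usage: probe <max_attempts> <command> [args...]
probe() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # Discard the command's own output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            echo "attempt $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max attempts succeeded"
    return 0
}

# Example: open 30 fresh SSH connections, each running `true` on the server.
# probe 30 ssh -o BatchMode=yes -o ConnectTimeout=10 -o ControlMaster=no web@107.170.237.109 true
```

If the first attempts succeed and later ones start timing out, that would be consistent with a rate limit rather than with a problem in the task itself.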

Just to document this here: I was not able to get Trellis deployment to work with Codeship, and their support team couldn’t figure out the problem either, so I switched to Capistrano deployments and that worked fine. Anyway, thank you for the help.