Deployment failure - Could not create retry file 'deploy.retry'

anguspaterson · August 25, 2016, 5:12pm

Hi Roots people,

I am currently having issues with deployment to production server.

I’ve triple checked all settings in group_vars and all seem correct.
I have also checked my public key and permissions and am able to ssh onto the server without problem.

On deployment, no dice. I am seeing the message below:

Any ideas?

fullyint · August 25, 2016, 5:43pm

If you’re on a Mac, could you run ssh-add -K ~/.ssh/id_rsa (you may need to adjust the private key name) and then try the deploy again?

Otherwise, could you give us a jumpstart on debugging by sharing what you’ve tried so far? There are many Discourse threads with many possible suggestions for UNREACHABLE errors.

It is often particularly helpful to share the full verbose output from this command:

ansible-playbook deploy.yml -e "env=production site=example.com" -vvvv
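To make that verbose output easy to paste into a reply, it can help to save a copy while it scrolls past. A minimal sketch, assuming a POSIX shell; the function name `deploy_verbose` and the log file name are hypothetical and not part of Trellis:

```shell
# Run the deploy playbook with maximum verbosity and keep a copy of the output.
# Arguments: environment and site name (the same values passed via -e above).
deploy_verbose() {
  env_name=$1
  site=$2
  log_file="deploy-${env_name}.log"   # hypothetical file name, pick any you like
  # 2>&1 folds stderr into stdout so that tee also captures error messages.
  ansible-playbook deploy.yml -e "env=${env_name} site=${site}" -vvvv 2>&1 | tee "$log_file"
}
```

Calling `deploy_verbose production example.com` from the Trellis directory runs the same command as above and leaves the full log in `deploy-production.log`.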

anguspaterson · August 25, 2016, 10:12pm

Hi fullyint,

Thanks for the response & apologies for not sharing a huge amount of info.

So far all I’ve done is double-check that all the info in the vars files is correct and confirm that I can ssh in, which I can. Attached is an image showing the output as requested above.
I also tried ssh-add -K ~/.ssh/id_rsa, but no luck.

Apologies for the somewhat amateur level of debugging. I’m just getting into this new but excellent workflow.

anguspaterson · August 25, 2016, 11:09pm

Hi Phil,

Still having a crack at this.
As per the above my ansible.cfg looks like this:

[defaults]
roles_path = vendor/roles
force_handlers = True

[ssh_connection]
ssh_args = -o ForwardAgent=yes -o ControlMaster=auto -o ControlPersist=60s

I’m guessing I need to add my hosts to this?

fullyint · August 25, 2016, 11:28pm

Trellis version. I’m guessing you are on a version of Trellis from before roots/trellis#544 (April 2, 2016). If you are not on the latest, I strongly recommend updating your Trellis to the latest HEAD, which will

accommodate Ansible 2.1.1.0 (roots/trellis#631)
accommodate Ubuntu 16.04 (the DigitalOcean default) (roots/trellis#626) and its better HTTP/2 support
accommodate WP 4.6 (roots/trellis#640)

Provisioning vs. deploying. For production, there are two parts to Trellis.

Provisioning. The server.yml playbook performs the basic setup of your server so that it is ready to host your site. Although the server.yml playbook must be run first, before deployment, you won’t need to run it often.
Deployment. The deploy.yml playbook deploys your latest project code from your repo and should be run as often as you have new project code to deploy.

Did you provision your server before attempting to deploy? That is, did you run
ansible-playbook server.yml -e env=production
before running ./deploy.sh production disruptivehr.co.uk? If not, that would account for the "Failed to connect to the host via ssh" error. The web user that is used for deploys would not yet exist on the remote, so the connection would fail. The web user is created on the initial run of server.yml.

Hosts. Your second screenshot shows a different error: skipping: no hosts matched. The deploy.yml and server.yml playbooks attempt to connect to the hosts matching the pattern web:&{{ env }}, which is web:&production in this case. This means “match hosts that are simultaneously listed in the web and production groups.”

Your error skipping: no hosts matched indicates that you do not have a host satisfying those match criteria. Could you ensure that your Trellis file hosts/production looks like this:

#hosts/production

# Add each host to the [production] group and to a "type" group such as [web] or [db].
# List each machine only once per [group], even if it will host multiple sites.

[production]
46.101.89.138

[web]
46.101.89.138

Then you should no longer get the skipping: no hosts matched problem.
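To see the "listed in both groups" rule in action, here is a rough sketch that prints the hosts appearing in two groups of a simple inventory file. The helper name `hosts_in_both` is hypothetical, and it only understands plain `[group]` headers with one host per line. It is not how Ansible itself resolves patterns, just an illustration of the intersection:

```shell
# Print hosts that appear in BOTH named groups of a simple INI-style inventory.
# Illustrative helper only: it ignores host variables, ranges, and child groups.
hosts_in_both() {
  inventory=$1
  group_a=$2
  group_b=$3
  awk -v a="$group_a" -v b="$group_b" '
    /^[ \t]*(#|$)/ { next }                                  # skip comments and blank lines
    $1 ~ /^\[/ { group = $1; gsub(/\[|\]/, "", group); next } # remember the current [group]
    group == a { in_a[$1] = 1 }                               # host listed under the first group
    group == b { in_b[$1] = 1 }                               # host listed under the second group
    END { for (h in in_a) if (h in in_b) print h }            # intersection of the two groups
  ' "$inventory" | sort
}
```

With the file above, `hosts_in_both hosts/production web production` prints 46.101.89.138. If it prints nothing, no host is in both groups, which is the situation that produces skipping: no hosts matched.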

ansible.cfg. Trellis typically does not require you to add anything to ansible.cfg.

SSH. If the ideas above don’t resolve the issue, could you share what your command is for your manual ssh? I’m interested to know which user name can connect. Is it ssh root@46.101.89.138? After you run server.yml as described above, you should be able to ssh web@46.101.89.138. For info on users that will run each playbook, and their ssh keys, etc., see the SSH-keys docs.
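A quick way to answer the "which user can connect" question is to try each user non-interactively. This is only a sketch: the function name `check_ssh_users` is hypothetical, and it assumes the OpenSSH client with key-based logins.

```shell
# Report which remote users accept key-based SSH logins on a host.
# Usage: check_ssh_users HOST USER...
check_ssh_users() {
  host=$1
  shift
  for user in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # and ConnectTimeout keeps an unreachable host from hanging the check.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true 2>/dev/null; then
      echo "${user}: ok"
    else
      echo "${user}: failed"
    fi
  done
}
```

For example, `check_ssh_users 46.101.89.138 root web` tries both users in turn. On a server that has not been provisioned yet, web would be expected to fail, since that user is only created by server.yml.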

Posting code. Screenshots can be preferable in many cases, but for the issues addressed in Roots Discourse it is typically preferable to post code/output in code blocks, with a triple-backtick code fence on a separate line above and below your code. I like that because it is searchable and can accommodate long scrolling logs without requiring readers to zoom in on screenshot images. Your screenshots above are perfectly legible; I just mention it for the future.