Unable to re-provision on DigitalOcean: UNREACHABLE!

For the last few days I haven't been able to run the server.yml playbook on my DigitalOcean servers.
Deploys work and I can connect via SSH as the web and admin users, but running the server.yml playbook returns this:

PLAY [Ensure necessary variables are defined] ********************************************************

TASK [Ensure environment is defined] *****************************************************************
skipping: [localhost]

PLAY [Test Connection and Determine Remote User] *****************************************************

TASK [connection : Require manual definition of remote-user] *****************************************
skipping: [example.com]

TASK [connection : Specify preferred HostKeyAlgorithms for unknown hosts] ****************************
skipping: [example.com]

TASK [connection : Check whether Ansible can connect as root] ****************************************
ok: [example.com]

TASK [connection : Warn about change in host keys] ***************************************************
skipping: [example.com]

TASK [connection : Set remote user for each host] ****************************************************
ok: [example.com]

TASK [connection : Announce which user was selected] *************************************************
Note: Ansible will attempt connections as user = admin
ok: [example.com]

TASK [connection : Load become password] *************************************************************
ok: [example.com]

PLAY [Set ansible_python_interpreter] ****************************************************************

TASK [python_interpreter : Get Ubuntu version] *******************************************************
System info:
  Ansible 2.9.10; Linux
  Trellis 1.8.0: February 12th, 2021
---------------------------------------------------
Failed to connect to the host via ssh: ssh: connect to host
example.com port 22: Connection refused
fatal: [example.com]: UNREACHABLE! => {"changed": false, "unreachable": true}

PLAY RECAP *******************************************************************************************
localhost     : ok=0    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
example.com   : ok=4    changed=0    unreachable=1    failed=0    skipped=3    rescued=0    ignored=0

And after running the server.yml playbook I can't SSH to the server for a few minutes:

$ ssh web@example.com
ssh: connect to host example.com port 22: Connection refused

The server runs Ubuntu 20 and I have these settings:

sshd_permit_root_login: false
sshd_password_authentication: false

Could the server be banning me when Ansible tries to connect as root (the "Check whether Ansible can connect as root" task)?
Any ideas?

Yes, this may be the case. Do you have console access with your DO account? Check the logs to see whether your IP has been banned because of too many failed SSH connection attempts. I had a similar issue with failing SSH connections when Trellis changed the host key algorithm (to something safer), but my SSH client had already stored the old host key and didn't like the new one.

I have exactly the same problem, except the server is on Hetzner rather than DO. Provisioning fails at exactly the same task, and SSH also says Connection refused for a couple of minutes afterwards.
SSH generally works (deployment works too, probably because it doesn't use root access), so the SSH keys should be fine. I've never had this before.
I was wondering if fail2ban is interfering here in some way?

Hm, yes. In fail2ban.log I can see that my IP got banned and then unbanned after 10 minutes.
How can I avoid that?
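For anyone checking this on their own server: the ban/unban pair is visible in /var/log/fail2ban.log, and fail2ban-client can lift a ban immediately instead of waiting out the bantime. A small sketch (the two sample log lines and the jail name sshd are assumptions modelled on fail2ban's default log format; the demo greps a scratch copy):

```shell
# Sketch: what the ban/unban pair looks like in fail2ban.log.
# The two sample lines below are assumptions based on fail2ban's
# default log format; on the server, grep the real log instead.
log=/tmp/fail2ban.sample.log
cat > "$log" <<'EOF'
2021-02-12 10:00:01,123 fail2ban.actions [875]: NOTICE [sshd] Ban 203.0.113.7
2021-02-12 10:10:01,456 fail2ban.actions [875]: NOTICE [sshd] Unban 203.0.113.7
EOF

# On the server the real check would be:
#   sudo grep -E 'Ban|Unban' /var/log/fail2ban.log
grep -E 'Ban|Unban' "$log"

# And to lift the ban immediately instead of waiting ~10 minutes:
#   sudo fail2ban-client set sshd unbanip 203.0.113.7
```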

OK, I see that fail2ban uses that ip_whitelist list for fail2ban_ignoreip. And ip_whitelist includes ipify_public_ip, which is presumably the current public IP at provisioning time.
I guess it's quite common to have a different IP when re-provisioning, so I'm wondering: did re-provisioning from a different IP ever work?
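For reference, the relationship described above looks roughly like this in the Trellis group vars (a sketch reconstructed from the variable names mentioned in this thread; check group_vars/all/security.yml in your Trellis version for the exact contents):

```yaml
# group_vars/all/security.yml (sketch; exact layout varies by version)
ip_whitelist:
  - "{{ ipify_public_ip }}"   # resolved to your public IP at provision time

# The fail2ban role then feeds this list into fail2ban's ignoreip option
fail2ban_ignoreip: "127.0.0.1/8 {{ ip_whitelist | join(' ') }}"
```

So whatever IP you provisioned from last time is the one that ends up whitelisted on the server.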

30 minutes later:
Well, I manually edited /etc/fail2ban/jail.local on the server, put my current IP in ignoreip, then ran systemctl restart fail2ban (I forgot this the first time, which cost me another 10 minutes), and now re-provisioning works just fine.
I guess I could write a script that does that for me before every re-provisioning.
But seriously, this can't be the solution, can it?
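That pre-provision script could be as small as this. A sketch with assumptions: fail2ban reads /etc/fail2ban/jail.local and the option is spelled ignoreip (one word); the demo below runs against a scratch file rather than the real one:

```shell
#!/usr/bin/env bash
# Sketch of a "whitelist myself before re-provisioning" helper.
# Assumption: the target file is /etc/fail2ban/jail.local and the
# option is "ignoreip". The demo uses a scratch file instead.
set -euo pipefail

whitelist_ip() {
  local jail_local="$1" ip="$2"
  # Append the IP to the ignoreip line unless it is already listed
  grep -q "ignoreip.*${ip}" "$jail_local" ||
    sed -i "s/^ignoreip *=.*/& ${ip}/" "$jail_local"
}

demo=/tmp/jail.local.demo
printf 'ignoreip = 127.0.0.1 ::1\n' > "$demo"
whitelist_ip "$demo" "203.0.113.7"
cat "$demo"   # ignoreip = 127.0.0.1 ::1 203.0.113.7
```

On the server you would point it at /etc/fail2ban/jail.local (with sudo) and then run systemctl restart fail2ban, as described above. Note that Trellis manages that file, so a manual edit only lasts until the next successful provision, which is fine for this purpose.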


I had a similar issue some years ago; the reason had to do with changed SSH host key algorithms. Maybe that's related to your issue too?


I don't know; those are general SSH issues and I don't see any of them here. And as I wrote, deploys work just fine.

Maybe you can just try to re-provision a machine from a different IP (using a VPN e.g.)?

I just had the exact same issue:

Yesterday I provisioned and deployed one of my sites from my laptop at home to a DO droplet without any problems; today at work, with the same laptop and SSH keys, provisioning fails.

As soon as I try to (re-)provision (with sshd_permit_root_login set to false), fail2ban bans my work IP address, resulting in the UNREACHABLE error. After waiting 10 minutes, I can ssh admin@droplet-ip directly again; then I add the IP address to the fail2ban ip_whitelist and restart fail2ban.

Now provisioning works from my IP address too.
The question is: why does fail2ban ban my IP address in the first place?

I'm not a fail2ban expert, but I guess the way it is configured in Trellis (for the SSH service), it effectively only tolerates repeated SSH root connection attempts from IPs in ip_whitelist, i.e. ignoreip. Maybe there was a recent change somewhere in fail2ban that handles this more strictly, because I can't remember this happening before. But it's also possible that my IP just didn't change that often.
I just had to make that manual change again, so I guess I'll write that script now.

Well, root access generally works of course, but I think every Ansible task is a separate SSH connection, so there are several connections in a short amount of time, and as you can see in the log, SSH only becomes unreachable after the fifth task. Maybe certain fail2ban settings could be tweaked, or the ignoreip setting could be updated as a first task in provisioning.
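If tweaking fail2ban rather than whitelisting is the route, these are the relevant knobs in jail.local (a sketch; the values are illustrative, not recommendations, except that bantime = 600 matches the 10-minute bans seen above):

```ini
# /etc/fail2ban/jail.local (sketch; values illustrative)
[sshd]
maxretry = 10    # failed attempts allowed within findtime
findtime = 60    # window in seconds over which failures are counted
bantime  = 600   # ban length in seconds (the "10 minutes" seen above)
```

Raising maxretry or shrinking findtime would make the sshd jail less trigger-happy about Ansible's burst of connections, at the cost of weaker brute-force protection.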


Yeah that would make sense!
