TASK: [fail2ban | ensure fail2ban is installed]

Hi,
I’m getting this error on vagrant up. Any ideas?

TASK: [fail2ban | ensure fail2ban is installed] *******************************
changed: [default]

TASK: [fail2ban | ensure fail2ban is configured] ******************************
fatal: [default] => Failed to template {{ lookup('file', '~/.ssh/id_rsa.pub') }}: could not locate file in lookup: ~/.ssh/id_rsa.pub

FATAL: all hosts have already failed -- aborting

It looks like you don’t have an SSH key generated on your computer (or maybe there is one, but under a non-standard name).

If you haven’t generated a key, you can do so using the instructions here: https://help.github.com/articles/generating-ssh-keys/


Thank you so much for the quick reply!!

I followed the instructions from the link you posted. Worked like a charm!!

I just needed to generate a new public key.

Thanks again!!

I am working on Windows and have set up SSH forwarding. I tested the key is set up fine on the host & guest machine by running ssh-add -l, and both return the same hash. When I SSH to the remote staging server from the Vagrant guest machine, it logs me in just fine with no password, using the SSH key on my host machine (no SSH key on the Vagrant guest).

But when I run the following command, I get the exact error the original post mentioned

ansible-playbook -i /srv/www/hosts/staging server.yml --ask-become-pass

I disabled remote root login, which is why I use the --ask-become-pass switch.

@nbyloff Try either of these options:

1 - Use a hosted SSH key. If you have your public SSH key on GitHub or some other git host that exposes the key, comment out the two instances of lookup('file', '~/.ssh/id_rsa.pub') in your group_vars/all/users.yml. Then make sure the web user and admin_user each have a keys entry like this (with your actual username):

- https://github.com/nbyloff.keys
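
For reference, here is roughly what group_vars/all/users.yml could look like with hosted keys. This is a sketch only: the users structure and the web_user/web_group/admin_user variables follow Trellis’s usual format, but compare it against your actual file rather than copying it verbatim.

# group_vars/all/users.yml -- sketch only; check your own file for the exact entries.
users:
  - name: "{{ web_user }}"
    groups:
      - "{{ web_group }}"
    keys:
      # - "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"   # local lookup commented out
      - https://github.com/nbyloff.keys
  - name: "{{ admin_user }}"
    groups:
      - sudo
    keys:
      # - "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"
      - https://github.com/nbyloff.keys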

2 - Make the SSH key available to the VM. The option above is probably better, but you could alternatively point the lookup to a path on the VM that does have the public SSH key. For example, you could copy the public key to the directory containing server.yml (on your Windows machine), and change the lookup to

- "{{ lookup('file', '/home/vagrant/id_rsa.pub') }}"

(I’m not sure if that’s the right path on the vm for the default vagrant share folder with windows.)
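
In case it helps, here is where that keys entry would sit in users.yml. Again, just a sketch, and the /home/vagrant path is the same guess as above, so adjust it to wherever the key actually ends up on your VM.

# users.yml sketch for Option 2; the path is an assumption about the VM's synced folder.
users:
  - name: "{{ admin_user }}"
    groups:
      - sudo
    keys:
      - "{{ lookup('file', '/home/vagrant/id_rsa.pub') }}"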

I don’t know why the lookup related to the users hash tends to execute during the fail2ban role.

Your Ansible control machine (i.e., your Vagrant VM when provisioning staging or production as a Windows user) needs access to a public key to put on the remote server you’re provisioning. That key is what gives your admin_user SSH access for running server.yml and your web user SSH access for running deploy.yml.

The Trellis SSH Keys docs give some background on the keys and their use with Trellis.

Play around with the options and concepts above and see if you can get it working. If not, definitely post back. I’m not on Windows, so maybe I missed something.


Option 1 worked perfectly. Thank you!


I’ve identified one additional cause for this error under Windows. It happens when the group_vars/all/users.yml file specifies a local public key using a ~ path to the user’s home directory, e.g. ~/.ssh/keyname_rsa.pub, and a vagrant provision is attempted.

Under Windows, Vagrant logs in to the VM as root when executing the shell provisioner, so when the Ansible fail2ban role is executed (in turn executing the users-related lookup), it looks for the public key in /root/.ssh/ instead of /home/vagrant/.ssh/ and fails with the following error:

==> default: fatal: [127.0.0.1] => Failed to template {{ lookup('file', '~/.ssh/id_rsa_vagrant.pub') }}: could not locate file in lookup: /.ssh/id_rsa_vagrant.pub

id_rsa_vagrant.pub is the public key I use to deploy to production, and I’ve put it in /home/vagrant/.ssh/, as Ansible commands have to be run from the VM under Windows.

I fixed the error by editing my Vagrantfile, adding sh.privileged = false in the following section:

  if Vagrant::Util::Platform.windows?
    config.vm.provision :shell do |sh|
      sh.path = File.join(ANSIBLE_PATH, 'windows.sh')
      # Fixes vagrant provision under Windows
      sh.privileged = false
    end
  else

I discovered the source of the error message by comparing the verbose output under Windows and Linux.

Running > vagrant provision from Windows (note the PUT...TO /root...):

[...]
==> default: <127.0.0.1> PUT /tmp/tmpNPQtoS TO /root/.ansible/tmp/ansible-tmp-1443889992.18-46406665813175/command
==> default: <127.0.0.1> EXEC /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=ydznmgydwcersrmotyvypfgovohgxbtg] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ydznmgydwcersrmotyvypfgovohgxbtg; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /root/.ansible/tmp/ansible-tmp-1443889992.18-46406665813175/command; rm -rf /root/.ansible/tmp/ansible-tmp-1443889992.18-46406665813175/ >/dev/null 2>&1'"'"''
==> default: ok: [127.0.0.1] => {"changed": false, "cmd": ["cat", "/etc/timezone"], "delta": "0:00:00.010372", "end": "2015-10-03 16:33:12.459853", "rc": 0, "start": "2015-10-03 16:33:12.449481", "stderr": "", "stdout": "Etc/UTC", "stdout_lines": ["Etc/UTC"], "warnings": []}
==> default:
==> default: TASK: [common | Set timezone] *************************************
[...]

Successfully running $ windows.sh from the Vagrant Ubuntu VM (note PUT... TO /home/vagrant):

[...]
<127.0.0.1> PUT /tmp/tmpf4qg2_ TO /home/vagrant/.ansible/tmp/ansible-tmp-1443890971.16-245571724493675/command
<127.0.0.1> EXEC /bin/sh -c 'sudo -k && sudo -H -S -p "[sudo via ansible, key=nbhgweenwftolrdratizkytomjhxqgjk] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-nbhgweenwftolrdratizkytomjhxqgjk; LANG=en_US.UTF-8 LC_CTYPE=en_US.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1443890971.16-245571724493675/command; rm -rf /home/vagrant/.ansible/tmp/ansible-tmp-1443890971.16-245571724493675/ >/dev/null 2>&1'"'"''
ok: [127.0.0.1] => {"changed": false, "cmd": ["cat", "/etc/timezone"], "delta": "0:00:00.010763", "end": "2015-10-03 16:49:31.419609", "rc": 0, "start": "2015-10-03 16:49:31.408846", "stderr": "", "stdout": "Etc/UTC", "stdout_lines": ["Etc/UTC"], "warnings": []}

TASK: [common | Set timezone] *************************************************
[...]

As an addendum, am I configuring local public keys incorrectly? After thinking about this issue, it occurred to me that destroying the VM and starting over with vagrant up will break. The fail2ban role will execute the users-related lookup, which will fail because I won’t have had the chance to copy the production deployment public key over to the VM’s /home/vagrant/.ssh/ directory.

This key is only meant for production deployment though, and shouldn’t affect an initial vagrant up, so it doesn’t seem correct to define it in the “all” group_vars directory. To have it only apply to the “production” group_vars, I’d have to duplicate the whole users object, which would complicate future maintenance.

Thanks @juanpescador. Good points.

I believe that taking the approach in Option 1 above (use a remotely hosted ssh key) would bypass the issue of ~/.ssh expanding to /root/.ssh instead of /home/vagrant/.ssh.

However, it’s great that you pointed out that the Trellis default lookup in ~/.ssh will fail for Windows users unless they take some kind of action: either switching to a remotely hosted ssh key or putting a key somewhere for the VM to access locally.

As you mentioned, the public ssh keys aren’t absolutely necessary for a vagrant dev machine, so it would be nice to not require them for dev. We could skip adding the key if currently provisioning a development environment, adding something like this to the add keys task:
when: ansible_ssh_user != 'vagrant'
An alternative might be to skip adding the key if the local key file is not present, e.g., like this
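
Roughly, the add keys task with that condition might look something like this. It’s a sketch only, not the actual Trellis task: the authorized_key module and with_subelements loop are just one common way to write such a task, and the users/keys names assume the structure in group_vars/all/users.yml.

# Sketch only -- not copied from Trellis. Assumes a `users` list where each
# entry has `name` and `keys`, as in group_vars/all/users.yml.
- name: Add SSH keys for users
  authorized_key:
    user: "{{ item.0.name }}"
    key: "{{ item.1 }}"
  with_subelements:
    - users
    - keys
  # Option A: skip entirely when provisioning the local Vagrant dev VM
  when: ansible_ssh_user != 'vagrant'

# Option B (the alternative): instead of the `when` above, check whether the
# local key file exists first (e.g. with a stat or local_action task) and skip
# the key, perhaps with a warning, when it's missing.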

I agree the users/keys definition seems unnecessary for dev. Yet I’m still inclined to define it in group_vars/all to avoid having to repeat it in both staging and production group_vars. I think that would be OK once dev skips adding keys via one of the methods above. Sound right?

I’m not on Windows to check, but I think the VM’s /home/vagrant/.ssh directory will translate to your local Windows machine’s trellis_dir/.ssh directory. I think that latter directory will remain on your Windows machine after the VM is destroyed. So, I think you’d only need to add the key once per Trellis project.

Thanks for your comments @fullyint.

Yes, this is probably what I’ll do in the meantime.

In case I wasn’t clear, it only fails when running vagrant provision, because Vagrant logs in to the VM as root by default. This is fixed with the sh.privileged = false setting, ensuring Vagrant logs in as user “vagrant”. Running the ansible playbooks from inside the VM logged in as user “vagrant” works correctly.

My first inclination is to skip adding the key for the development environment, as this environment is the special case vs. staging and production. As a user, I would expect an error if the key is missing when deploying to staging or production. The second method you mention would allow for a warning at most, which might not be obvious enough (fail fast, fail hard and such).

Rather than testing for the development environment via ansible_ssh_user, is there an “environment” variable that could be queried? I had a quick look and found project_environment.WP_ENV, but I’m not sure if it’s accessible from the add keys task, nor whether it holds the value I’m assuming (development/staging/production).
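
If it is accessible, I imagine the condition would look something like this (purely hypothetical; I haven’t verified that project_environment is in scope for that task or that WP_ENV holds those values):

# Hypothetical variant of the `when` in the task sketch above -- only valid if
# project_environment is in scope for the add keys task and WP_ENV really is
# one of development/staging/production.
  when: project_environment.WP_ENV != 'development'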

I agree, having the definitions in group_vars/all is the best option, maintenance-wise.

I just checked; it doesn’t seem so. I don’t have a trellis_dir/.ssh directory on Windows. I touched a file in the VM’s /home/vagrant/.ssh directory, then did a search on all my drives in Windows, and it doesn’t exist. I haven’t been able to find any reference to this folder syncing in the documentation either.

I really like the idea of it, though. Rather than having to manually copy keys over to the VM, the Windows %userprofile%\.ssh directory could be synced with the VM’s /home/vagrant/.ssh directory. This would mimic the behaviour on Linux, where Ansible is run on the host OS, so access to the user’s keys in ~/.ssh is a given. If I’ve correctly understood @nbyloff’s situation, this feature could have prevented the issue they had (presuming “I tested the key is set up fine on the host & guest” means @nbyloff’s keys are stored in %userprofile%\.ssh on the Windows host).

The documentation would have to be updated to reflect the need for a .ssh directory in the Windows %userprofile% directory.

What do you think?

edit: Vagrant’s passwordless public key would have to be appended to %userprofile%\.ssh\authorized_keys so that Vagrant could log in, which might pose a security risk if anyone is running a Windows SSH server that also uses the authorized_keys file. From a quick search, the two popular Windows SSH servers seem to be freeSSHd, which uses files named after users (p. 13) instead of the authorized_keys file, and Bitvise SSH Server, which has an advanced option to enable use of the authorized_keys file. I think the risk is low, and a warning in the documentation would be enough.