Trellis provisioning (Ansible server.yml) fails when root login is not permitted

Hi all,

I’m trying to provision a staging server with root login not permitted and it keeps failing. I have provisioned the server previously with root and it was fine, but I want to do without root for security purposes.

Initially it was failing because the sudo password for admin_user was incorrect, but that was because I was trying to use Ansible Vault encryption (another issue I’ll have to look into later). Without the encryption, the admin_user step succeeds.

However, it is now failing here:

TASK [users : Add web user sudoers items for services] *************************
Incorrect sudo password
fatal: [188.166.171.125]: FAILED! => {"failed": true}

I don’t understand why this is failing. I thought it was only deploy.yml which used the web_user.

I also find that, now that it has failed at this stage, if I try to provision again it fails because the admin_user sudo password is incorrect. It’s as if Ansible changed the password during the last attempted provisioning.

Is it worth going to all this trouble, or should I just allow root login? I’m very much leaning towards that resolution at the moment.

Thanks

If you changed the sudoer password hash in group_vars/<environment>/vault.yml, then what you’ve said sounds right, because the task right before the failed task is the Setup users task, which would be the one to update the sudoer password hash on the server. I’ll explain.

With root login disabled, you would have had to pass the admin_user sudoer password to the ansible-playbook command using the option --ask-become-pass. You probably passed in the old sudoer password, which would still have been in effect on the server and would have worked for all tasks up till the point the password hash was changed in the Setup users task. Then the next task would fail because Ansible was still trying to use the sudoer password passed to --ask-become-pass, which was suddenly out-of-date.

My guess is that the playbook will run completely and successfully if you run it again, but this time pass in the new sudoer password with --ask-become-pass.
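
For reference, a minimal sketch of that re-run. The `staging` environment name is an assumption, and `run` is a hypothetical helper (not part of Trellis or Ansible) that only prints the command unless `DRY_RUN=0` is set, so nothing touches the server by accident.

```shell
# Sketch only: re-run provisioning with the new sudoer password.
# ENV_NAME, DRY_RUN and run() are illustrative assumptions, not part of Trellis.
set -eu

ENV_NAME="${ENV_NAME:-staging}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it unchanged.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# --ask-become-pass prompts for the connecting user's sudo password.
# Enter the NEW password, the one matching the hash now in vault.yml.
run ansible-playbook server.yml -e "env=${ENV_NAME}" --ask-become-pass
```

Running it with `DRY_RUN=0` from the Trellis directory should prompt once for the sudo password before any task runs.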


Changing the sudoer password hash is appropriate given that you were enabling vault and wanted the newly encrypted vault.yml file to have a new hash that wasn’t in plain text in your git history.

However, changing an existing password for a sudoer while connecting as that same sudoer does present the complication you experienced here. It’s not a scenario I’ve considered before. I suppose a person who wants to change the sudoer password after root login is disabled has four options after changing the sudoer password hash in group_vars/<environment>/vault.yml.

First, you could take your approach of just running server.yml, entering the old sudoer password at the --ask-become-pass prompt, then letting the playbook fail right after the password is changed, as you experienced. Then just use the new password with Ansible commands going forward (with --ask-become-pass).

Second, an alternative would be to change back to sshd_permit_root_login: true and run the sshd role of server.yml to apply that change (e.g., ansible-playbook server.yml -e env=<environment> --tags sshd). Then run the users role, which will change the sudoer password. It will connect as root now, so no need for --ask-become-pass (e.g., just run ansible-playbook server.yml -e env=<environment> --tags users). Then finally, change back to sshd_permit_root_login: false and apply the change by running the sshd role again. That’s a bit of a hassle just to avoid seeing the failed task that you saw.

Third, an alternative would be to change the hash manually by ssh-ing into the server (and still change it in your group_vars/<environment>/vault.yml of course).
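
A rough sketch of that manual route, under some assumptions: the sudoer is named `admin`, the host name is a placeholder, and `NEW_HASH` holds the same crypt-style hash you put in vault.yml. The `run` helper is hypothetical and only prints the command unless `DRY_RUN=0` is set.

```shell
# Sketch only: set the sudoer's password hash by hand over SSH.
# ADMIN_USER, SERVER, NEW_HASH and run() are placeholders/assumptions.
set -eu

ADMIN_USER="${ADMIN_USER:-admin}"
SERVER="${SERVER:-staging.example.com}"
NEW_HASH="${NEW_HASH:-PASTE_HASH_HERE}"   # the hash stored in vault.yml
DRY_RUN="${DRY_RUN:-1}"                   # 1 = only print, 0 = execute

# Print the command in dry-run mode, otherwise execute it unchanged.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# usermod -p expects an already-hashed password, not plain text.
# ssh -t allocates a terminal so sudo can prompt for the OLD password.
run ssh -t "${ADMIN_USER}@${SERVER}" "sudo usermod -p '${NEW_HASH}' ${ADMIN_USER}"
```

Passing the hash on the command line exposes it briefly in the process list. If that matters, running `passwd` interactively on the server and typing the new password is a simpler alternative.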

I guess a fourth alternative would be to back up the database, uploads, etc. (if you even need them), then rebuild the server completely, this time with the new sudoer password hash, of course.


You’re right that it is only deploy.yml that uses web_user, but that user must be set up first before it can be used. The user setup takes place in server.yml, in the users role where you saw the failed task.

Many thanks for your highly detailed reply. I’ve seen similarly lengthy and detailed replies on other topics, and I commend you and the rest of the Roots team for that.

Regarding this specific issue, it’s very strange. I’m pretty certain (I’ve tried provisioning multiple times, so I could be wrong) that it was the first provision, so admin_user was created by the users role with no sudoer password issue. It then failed on the web_user sudoer password, and when I tried to provision again it failed on the admin_user password, even though I hadn’t changed anything.

I think I’m just going to destroy and rebuild the droplet, start again, and see how I get on. I was also having issues with the letsencrypt SSL implementation, as the playbook kept skipping it for some reason.

A couple of questions if you don’t mind:

  1. I’m using db_import in the wordpress_sites.yml file. Would the db import have any effect on the users role task or on the letsencrypt task? I presumed not but want to cover all bases.

  2. Am I right in thinking I do not need to pre-create admin_user on the server before provisioning, and that the users role takes care of this?

Thanks

Thanks for the thanks. It does take a lot of work.

Sounds good. Rebuilding seems to be the most efficient option at this point, assuming your staging server can afford the downtime.

Off the top of my head, the only reason I can think of for why letsencrypt role would skip is if these two conditions were not met:

Regarding db_import: no effect that I can think of.

Regarding admin_user: you are correct. You should not need to pre-create any user at all, assuming your VPS provider gives you at least one user with SSH access. This is commonly root (e.g., DO) or ubuntu (AWS). See this thread for discussion of the situation when the VPS provider does not make available an SSH user at VPS creation time.

After destroying the droplet and starting again I managed to get letsencrypt working. Thanks again for all the help. Just a couple of follow up questions if you don’t mind…

I’ve just changed the mail.yml file in group_vars/all/ locally and run the ssmtp role of the Ansible playbook. Is that the correct thing to do to get just the mail changes onto the production server? I presume I will need to deploy as well, but I’m worried it will do something like a fresh install and I’ll need to back up the production db and then import it, etc.

Thanks

Congrats on perseverance and success!

I see that every config in group_vars/all/mail.yml is only used in the ssmtp role, so running just that role should be sufficient. I am unaware of deploy.yml having any related effect, so you shouldn’t need to “deploy” those changes in group_vars/all/mail.yml. “Deploy” has to do with changes to the site (e.g., stuff in site directory in your repo), as opposed to how “provisioning” applies changes to the server more generally.
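
As a sketch of "running just that role", assuming the ssmtp role is tagged `ssmtp` in your server.yml (worth checking in your copy) and that the environment is `production`. The `run` helper is hypothetical and only prints the command unless `DRY_RUN=0` is set.

```shell
# Sketch only: apply group_vars/all/mail.yml changes via the ssmtp role alone.
# The "ssmtp" tag name, ENV_NAME and run() are assumptions to verify locally.
set -eu

ENV_NAME="${ENV_NAME:-production}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it unchanged.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# --tags limits the play to tasks carrying that tag; no deploy is involved.
run ansible-playbook server.yml -e "env=${ENV_NAME}" --tags ssmtp
```

Because only tagged tasks run, the site files and database are left alone, so no backup and re-import should be needed for a mail config change.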

Many thanks again @fullyint. I had a similar issue where all of a sudden I was getting Incorrect sudo password when provisioning when I had no problems before with admin_user (admin). Using --ask-become-pass worked, but it was the same password in my vault file. I tried decrypting/re-encrypting the vault file, but that didn’t work.

So, just documenting what worked, per above:

Set sshd_permit_root_login: true, then:

ansible-playbook server.yml -e env=production --tags sshd --ask-become-pass
ansible-playbook server.yml -e env=production --tags users

Set sshd_permit_root_login: false, then:

ansible-playbook server.yml -e env=production --tags sshd

@merchantguru
I discovered several hours ago that Ansible 2.2.1.0 was handling the sudo password differently than earlier versions. A fix was pushed in roots/trellis#758.

Assuming you are running 2.2.1.0 and were affected by the issue in roots/trellis#758, I believe your procedure effectively changed the password on your server to
{% raw %}my_password{% endraw %}

I recommend these steps to get your password back to normal and working:

  1. Apply the fix from roots/trellis#758

  2. Run the users role to change your password back to normal on the server
    ansible-playbook server.yml -e env=production --tags users -K
    When prompted, enter your password as {% raw %}my_password{% endraw %}.
    This just allows admin to invoke sudo this one time, and the role then changes your password back to the version that omits {% raw %}.

Going forward, you shouldn’t need --ask-become-pass. Things should be back to normal.

Most users will only need step #1. Step #2 is necessary only for users who took steps similar to what @merchantguru described, and now need to change the password back to normal on the server.

My god, thanks!! Pure genius. Yep, I was running 2.2.1.0 and, lo and behold, I saw your post, noticed that my provisioning was broken again, ran those steps, and I’m up and running. Nice patch.

Ah! I was struggling with this today and just assumed it was because I am fairly new to Trellis!

In fact, I updated Ansible yesterday which I believe is the root of the problem as you described. I’ll try and patch, thanks in advance for turning it round so quickly @fullyint.

EDIT

Worked a treat 🙂

Only needed to patch the role and then reprovision with no additional tags.
