Ansible connection role error while provisioning server TASK [connection : Warn about change in host keys]

When I run “trellis droplet create production” (for DigitalOcean), or whenever I try to re-provision the server, I get the error below.

I’m a newbie, so I’m not really sure how to get around this, and I have no clue what’s causing the error. Could it be related to my SSH setup?

This is my error:

TASK [connection : Warn about change in host keys] *****************************
fatal: [170.64.180.207]: FAILED! => {“msg”: “The conditional check ‘‘REMOTE HOST IDENTIFICATION HAS CHANGED’ in connection_status.stdout’ failed. The error was: error while evaluating conditional (‘REMOTE HOST IDENTIFICATION HAS CHANGED’ in connection_status.stdout): ‘dict object’ has no attribute ‘stdout’. ‘dict object’ has no attribute ‘stdout’\n\nThe error appears to be in ‘/Users/danbraine/Trellis Projects/wallaciaprogressassociation.com.au/trellis/roles/connection/tasks/main.yml’: line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Warn about change in host keys\n ^ here\n”}

Trellis: v1.19.0
Bedrock: v1.21.1
Ansible: 2.10.17
Python 3.10.7

Any help with this will be SUPER appreciated!! Thank you!

Although this appears to be an issue with Ansible, you should be able to work around this issue:

  • Manually connect via SSH to the target server.
  • You should see a remote host key identification warning. Use the command proposed in the warning message to remove the stored host key that no longer matches.
  • Manually connect via SSH to the target server again and accept the new host key (let ssh store it).
  • Manually connect via SSH a third time. The connection should now just work.

Trellis/Ansible should now also work, as its SSH connections work correctly now.
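
For reference, the cleanup step from the list above can also be sketched as a small local playbook. This is only an illustration and not part of Trellis: the file name `forget-host-key.yml` and the variable `stale_host` are made up for this example, `203.0.113.10` is a placeholder address, and it assumes the stored keys live in the default `~/.ssh/known_hosts`. Running the `ssh-keygen -R` command proposed in the warning does the same job.

```yaml
# forget-host-key.yml (hypothetical file name, not shipped with Trellis)
- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    # Placeholder: replace with the IP or host name of your server.
    stale_host: 203.0.113.10
  tasks:
    - name: Remove the stored host key that no longer matches
      known_hosts:
        name: "{{ stale_host }}"
        state: absent   # drops every stored key for this host
```

After running it with `ansible-playbook forget-host-key.yml`, connect once via SSH manually so the new host key gets stored.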

The host key changes because Trellis also updates the SSH server configuration for improved security, which causes new host keys to be generated. Those new host keys should be handled automatically (AFAIK), but sometimes that doesn’t happen.


Thanks for that. I’ve definitely got the public key on the server matching my own in authorized_keys, but strangely I’m getting the same error. I can log in as root, but when I tried admin I was denied with a public key error. Could it be that it’s trying to provision as admin and failing that way? Thanks for the help!

No, this is not about the authorized keys, but about the server host keys.
When you manually connect to that server with SSH, on the same workstation and in the same terminal where you run the Trellis playbook, you get an SSH connection warning, right?

That’s right, I get the warning about host keys.

I’ve added the fingerprint to my known hosts and I can ssh into the server just fine, but when I try to provision I get the same error. Thanks

This is the exact output when provisioning production

Does the roles/connection/tasks/main.yml in your Trellis project really look like this?:

You can run the test task that populates the connection_status variable (which apparently has an unexpected value) manually:

- hosts: <production_or_staging_host_defined_in_ansible>
  gather_facts: no
  tasks:
    - name: Check whether Ansible can connect as web
      command: |
        ansible <production_or_staging_host_defined_in_ansible> -m raw -a whoami -u web -vvvv
      delegate_to: localhost
      failed_when: false
      changed_when: false
      check_mode: no

(Minimal test ansible task taken from a tangentially related issue)

Put that YAML into a file in the Trellis project directory (e.g. test-connection.yml), replace <production_or_staging_host_defined_in_ansible> with the target system host name (as defined in ansible) and then run it manually:

ansible-playbook test-connection.yml -e env=<environment>

You can also run ansible in very verbose -vvvv mode in order to see what values stdout will have from that task. Does stdout have a string as value? You can post it here for further evaluation.
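
A variant of the test playbook above that registers the result and prints it may be easier to read than the raw -vvvv output. This is a sketch, not Trellis code: the `register` line and the `debug` task are additions for illustration, and the host placeholder still has to be replaced as described above.

```yaml
- hosts: <production_or_staging_host_defined_in_ansible>
  gather_facts: no
  tasks:
    - name: Check whether Ansible can connect as web
      command: |
        ansible <production_or_staging_host_defined_in_ansible> -m raw -a whoami -u web -vvvv
      delegate_to: localhost
      failed_when: false
      changed_when: false
      check_mode: no
      # Added for this sketch: keep the result so it can be inspected.
      register: connection_status

    - name: Show everything the test registered
      debug:
        var: connection_status
```

If the printed dictionary has no `stdout` key (for example because the test task shows up as skipped), that would match the “‘dict object’ has no attribute ‘stdout’” part of the error.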

I made a fresh copy of the project and the same issue arises. This is the latest code:

---
- name: Require manual definition of remote-user
  fail:
    msg: |
      When using `--ask-pass` option, use `-u` option to define remote-user:
      ansible-playbook server.yml -e env={{ env | default('production') }} -u root --ask-pass
  when: dynamic_user | default(true) and ansible_user is not defined and cli_ask_pass | default(false)

- name: Check whether Ansible can connect as {{ dynamic_user | default(true) | ternary('root', web_user) }}
  command: |
    ansible {{ inventory_hostname }} -m raw -a whoami
    -u {{ dynamic_user | default(true) | ternary('root', web_user) }} {{ cli_options | default('') }} -vvvv
  delegate_to: localhost
  failed_when: false
  changed_when: false
  check_mode: no
  register: connection_status
  tags: [connection-tests]

- name: Warn about change in host keys
  fail:
    msg: |
      WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!

      If this change in host keys is expected (e.g., if you rebuilt the server
      or if the Trellis sshd role made changes recently), then run the following
      command to clear the old host key from your known_hosts.

        ssh-keygen -R {{ connection_status.stdout | regex_replace('(.|\n)*host key for (.*) has changed(.|\n)*', '\2') }}

      Then try your Trellis playbook or SSH connection again.

      If the change is unexpected, cautiously consider why the host identification
      may have changed and whether you may be victim to a man-in-the-middle attack.

      ---------------------------------------------------
      {{ (connection_status.stdout.replace('Please contact your system administrator.\r\n', '') |
          regex_replace ('(.|\n)*(The fingerprint for the(.|\n)*Host key verification failed.)(.|\n)*', '\2') |
          regex_replace('(\\r\\n|\\n)', '\n\n')).replace('\"', '"') }}
  when: "'REMOTE HOST IDENTIFICATION HAS CHANGED' in connection_status.stdout"
  tags: [connection-tests]

- block:
  - name: Set remote user for each host
    set_fact:
      ansible_user: "{{ ansible_user | default((connection_status.stdout_lines | intersect(['root', '\e[0;32mroot', '\e[0;33mroot']) | count) | ternary('root', admin_user)) }}"
    check_mode: no

  - name: Announce which user was selected
    debug:
      msg: |
        Note: Ansible will attempt connections as user = {{ ansible_user }}

  - name: Load become password
    set_fact:
      ansible_become_pass: "{% raw %}{% for user in vault_users | default([]) if user.name == ansible_user %}{{ user.password | default('') }}{% endfor %}{% endraw %}"
    when: ansible_user != 'root' and not cli_ask_become_pass | default(false) and ansible_become_pass is not defined
    no_log: true

  when: dynamic_user | default(true)

And what stdout does that test have?

Thanks @strarsis for your patience! Running it in -vvv mode was helpful in identifying the blockage. I’m not exactly sure why it wasn’t fixed earlier, but I deleted known_hosts (which I had tried previously) as suggested by the error message in the -vvv output, and it’s now provisioning properly. Thanks for your timely replies!


This could be a bug in Trellis though, as it didn’t show the host-specific error and ansible aborted because of the missing stdout in the registered variable.
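
To illustrate what a more defensive check could look like, here is a minimal, self-contained sketch. It is an assumption about one possible fix, not the actual Trellis code or an official patch: the first task only simulates a connection test that never ran (a skipped task still registers a variable, but without `stdout`), and the shortened `msg` text is a stand-in for the real warning.

```yaml
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    # Simulation only: a skipped task registers a dict without "stdout".
    - name: Simulate a connection test that gets skipped
      command: whoami
      register: connection_status
      when: false

    # default('') lets the conditional evaluate even when stdout is missing,
    # so the play continues instead of aborting on the undefined attribute.
    - name: Warn about change in host keys (guarded variant)
      fail:
        msg: "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!"
      when: "'REMOTE HOST IDENTIFICATION HAS CHANGED' in (connection_status.stdout | default(''))"
```

With this guard the warning task is simply skipped when there is no output to inspect. It would not explain why the test task produced no output in the first place.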

Here you’ll notice that when I run “trellis provision production” I get the host keys error, but when I manually run ansible-playbook server.yml -e env=production it doesn’t error.


@dan213213: Ah, I notice in your terminal output that the test task is skipped by ansible.
Is something in the test task not matching the target system/env, hence ansible skips it?

Sorry, I’m not sure! It’s just a DigitalOcean droplet. I noticed the same error when running trellis deploy production, and I had to run the ansible task manually again…