Hey guys, this has happened a few times. I now know how to fix it, but it seems to happen every time I run server.yml
for the first time.
Here’s the most recent example:
- Run ansible-playbook server.yml -e env=staging for the first time.
- Everything provisions fine except for this error:
TASK [letsencrypt : Generate CSRs] ***********************************************************************************************************************************************************
changed: [server IP address] => (item=mydomainname.com)
TASK [letsencrypt : Generate certificate renewal script] *************************************************************************************************************************************
changed: [server IP address]
TASK [letsencrypt : Generate the certificates] ***********************************************************************************************************************************************
System info:
Ansible 2.4.3.0; Darwin
Trellis at "Add support for includes.d on all sites"
---------------------------------------------------
non-zero return code
fatal: [server IP address]: FAILED! => {"changed": false, "cmd": ["./renew-certs.py"], "delta": "0:00:01.835522", "end": "2018-03-21 03:43:12.677579", "rc": 1, "start": "2018-03-21 03:43:10.842057", "stderr": "", "stderr_lines": [], "stdout": "Generating certificate for mydomainname.com\nError while generating certificate for mydomainname.com\nTraceback (most recent call last):\n File \"/usr/local/letsencrypt/acme_tiny.py\", line 198, in <module>\n main(sys.argv[1:])\n File \"/usr/local/letsencrypt/acme_tiny.py\", line 194, in main\n signed_crt = get_crt(args.account_key, args.csr, args.acme_dir, log=LOGGER, CA=args.ca)\n File \"/usr/local/letsencrypt/acme_tiny.py\", line 149, in get_crt\n domain, challenge_status))\nValueError: staging.mydomainname.com challenge did not pass: {u'status': u'invalid', u'validationRecord': [{u'url': u'http://staging.mydomainname.com/.well-known/acme-challenge/NGe2whdzodNRWN_xGoOFjZgropE-_R2CLbgNE4baYjY', u'hostname': u'staging.mydomainname.com', u'port': u'80'}], u'keyAuthorization': u'NGe2whdzodNRWN_xGoOFjZgropE-_R2CLbgNE4baYjY.jgekYkWtlQyDHjESf2b2t9a-co3qXisH1wMfnu0IkkU', u'uri': u'https://acme-v01.api.letsencrypt.org/acme/challenge/ANYzuZunxqF1U154ibyW4gB2I2oxO1WiwxMQEz-9HPs/3900005802', u'token': u'NGe2whdzodNRWN_xGoOFjZgropE-_R2CLbgNE4baYjY', u'error': {u'status': 400, u'type': u'urn:acme:error:unknownHost', u'detail': u'No valid IP addresses found for staging.mydomainname.com'}, u'type': u'http-01'}", "stdout_lines": ["Generating certificate for mydomainname.com", "Error while generating certificate for mydomainname.com", "Traceback (most recent call last):", " File \"/usr/local/letsencrypt/acme_tiny.py\", line 198, in <module>", " main(sys.argv[1:])", " File \"/usr/local/letsencrypt/acme_tiny.py\", line 194, in main", " signed_crt = get_crt(args.account_key, args.csr, args.acme_dir, log=LOGGER, CA=args.ca)", " File \"/usr/local/letsencrypt/acme_tiny.py\", line 149, in get_crt", " domain, challenge_status))", "ValueError: 
staging.mydomainname.com challenge did not pass: {u'status': u'invalid', u'validationRecord': [{u'url': u'http://staging.mydomainname.com/.well-known/acme-challenge/NGe2whdzodNRWN_xGoOFjZgropE-_R2CLbgNE4baYjY', u'hostname': u'staging.mydomainname.com', u'port': u'80'}], u'keyAuthorization': u'NGe2whdzodNRWN_xGoOFjZgropE-_R2CLbgNE4baYjY.jgekYkWtlQyDHjESf2b2t9a-co3qXisH1wMfnu0IkkU', u'uri': u'https://acme-v01.api.letsencrypt.org/acme/challenge/ANYzuZunxqF1U154ibyW4gB2I2oxO1WiwxMQEz-9HPs/3900005802', u'token': u'NGe2whdzodNRWN_xGoOFjZgropE-_R2CLbgNE4baYjY', u'error': {u'status': 400, u'type': u'urn:acme:error:unknownHost', u'detail': u'No valid IP addresses found for staging.mydomainname.com'}, u'type': u'http-01'}"]}
- Oh, I had a DNS error. OK, no problem: fix DNS, ping the server to be sure DNS is set up correctly this time, all set.
- Run ansible-playbook server.yml -e env=staging again to try and fix the error.
- SSH access is suddenly broken:
System info:
Ansible 2.4.3.0; Darwin
Trellis at "Add support for includes.d on all sites"
---------------------------------------------------
WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!
If this change in host keys is expected (e.g., if you rebuilt the server
or if the Trellis sshd role made changes recently), then run the following
command to clear the old host key from your known_hosts.
ssh-keygen -R [server IP address]
Then try your Trellis playbook or SSH connection again.
If the change is unexpected, cautiously consider why the host identification
may have changed and whether you may be victim to a man-in-the-middle attack.
---------------------------------------------------
The fingerprint for the ED25519 key sent by the remote host is
SHA256:hMbBtaouSGqWdOVGDDLJDfe5ZxEosgSOhqcdH3yo/d4.
Add correct host key in /Users/josephroberts/.ssh/known_hosts to get rid of
this message.
Offending ECDSA key in /Users/josephroberts/.ssh/known_hosts:9
ED25519 host key for [my IP address] has changed and you have requested strict
checking.
Host key verification failed.
fatal: [server IP address]: FAILED! => {"changed": false}
to retry, use: --limit @/Users/josephroberts/localdev/papasteamstores.com/trellis/server.retry
PLAY RECAP ***********************************************************************************************************************************************************************************
[server IP address] : ok=1 changed=0 unreachable=0 failed=1
localhost : ok=0 changed=0 unreachable=0 failed=0
- No SSH access works: not the ‘root’ account (to be expected) nor the ‘joe’ account (definitely not expected).
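The DNS check in the steps above can be made explicit before re-running the playbook. Below is a minimal sketch, assuming either dig (which macOS ships) or getent (Linux) is available. The function name resolves and the CHECK_HOST variable are made up for illustration and are not part of Trellis. It only asks the local resolver, so it is not necessarily what Let’s Encrypt sees.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; names here are illustrative, not part of Trellis.

# Succeeds if the local resolver returns at least one address for the hostname.
# Prefers dig (A records only); falls back to getent where dig is missing.
resolves() {
  host_name="$1"
  if command -v dig >/dev/null 2>&1; then
    # Keep only lines that look like IPv4 addresses, so timeout messages
    # and CNAME targets are not mistaken for a successful lookup.
    dig +short +time=2 +tries=1 "$host_name" A 2>/dev/null \
      | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
  elif command -v getent >/dev/null 2>&1; then
    getent hosts "$host_name" >/dev/null 2>&1
  else
    echo "neither dig nor getent found" >&2
    return 2
  fi
}

# Runs only when CHECK_HOST is set, e.g.
#   CHECK_HOST=staging.mydomainname.com sh preflight.sh
if [ -n "${CHECK_HOST:-}" ]; then
  if resolves "$CHECK_HOST"; then
    echo "$CHECK_HOST resolves; OK to run server.yml"
  else
    echo "$CHECK_HOST does not resolve yet" >&2
    exit 1
  fi
fi
```

Once the hostname resolves, the stale host key can be cleared with the ssh-keygen -R command quoted in the warning above before trying the playbook again.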
Info about my setup:
- I’m using a DO Droplet running Ubuntu 16.04.
- The website I’m deploying is virtually a blank install, except for some changes to the Trellis config files: just the standard changes made when setting up development, staging, and production environments.
- Locally, I’m running macOS High Sierra 10.13.3.
- Running Ansible 2.4.3.0.
- I’m not using my ~/.ssh/id_rsa.pub key. The key I’m actually using, and the one that is set up in my trellis/group_vars/all/users.yml file, is ~/.ssh/id_rsa_digitalocean.pub, as well as my GitHub account keys.
- I secured the server using this tutorial before running server.yml the first time. I also made the user account created in the tutorial a sudoer. That user account was ‘joe’.
- I set sshd_permit_root_login: false in trellis/group_vars/all/security.yml as well.
users.yml:
# Documentation: https://roots.io/trellis/docs/ssh-keys/
admin_user: admin
# Also define 'vault_users' (`group_vars/staging/vault.yml`, `group_vars/production/vault.yml`)
users:
  - name: "{{ web_user }}"
    groups:
      - "{{ web_group }}"
    keys:
      - "{{ lookup('file', '~/.ssh/id_rsa_digitalocean.pub') }}"
      - https://github.com/broskees.keys
  - name: "{{ admin_user }}"
    groups:
      - sudo
    keys:
      - "{{ lookup('file', '~/.ssh/id_rsa_digitalocean.pub') }}"
      - https://github.com/broskees.keys
web_user: web
web_group: www-data
web_sudoers:
  - "/usr/sbin/service php7.2-fpm *"
~/.ssh/config (the DigitalOcean portion):
# Digital Ocean Droplets
# Host IP IP IP IP IP
Host [another_servers_ip] [this_servers_ip]
  IdentityFile ~/.ssh/id_rsa_digitalocean
  ForwardAgent yes
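Given the users.yml and ~/.ssh/config above, one way to narrow this down would be to test a key-only login for each account separately. This is a rough sketch, not something from the Trellis docs: SERVER_IP is a placeholder documentation-range address, the function names are made up, and the user list is just the accounts mentioned in this post (root, joe, admin, web). By default it only prints the commands (DRY_RUN=1); setting DRY_RUN=0 actually attempts the logins.

```shell
#!/bin/sh
# Sketch only: SERVER_IP is a placeholder, replace it with the droplet's address.
SERVER_IP="${SERVER_IP:-203.0.113.10}"
# Private key matching the IdentityFile entry in ~/.ssh/config.
KEY="${KEY:-$HOME/.ssh/id_rsa_digitalocean}"

# Print the ssh command that would test a key-only login for one user.
ssh_probe_cmd() {
  printf 'ssh -i %s -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=5 %s@%s true\n' \
    "$KEY" "$1" "$SERVER_IP"
}

# Probe each account given as an argument.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
probe_users() {
  for user in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      ssh_probe_cmd "$user"
    elif ssh -i "$KEY" -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=5 \
         "$user@$SERVER_IP" true 2>/dev/null; then
      echo "$user: login OK"
    else
      echo "$user: login FAILED"
    fi
  done
}

probe_users root joe admin web
```

BatchMode=yes makes ssh fail instead of prompting for a password, so each result reflects the key alone, and IdentitiesOnly=yes keeps ssh from offering other keys held by the agent.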
I already looked at:
What I’m trying to solve is how to stop this from happening every time I provision a new server. I can probably regain root access and fix it via the DO console, but I don’t want to have to do that EVERY time. In the past I’ve gotten around this by running server.yml, provided there are no errors, then regaining root and only running deployment scripts from there.
Why is this happening? I feel like, due to a lack of understanding of SSH, I’m missing some crucial step.