SSH error on remote provision - fails at TASK [Install Python 2.x]

I had Trellis working perfectly: my local Vagrant server was working fine, my staging server was provisioned, I could deploy, and I was not getting any errors. Then I changed the domain name for my staging server and now I can't provision the server. I get this error:

TASK [Install Python 2.x] …

$ ansible-playbook -vvvv server.yml -e "site=mysite.co.uk env=staging"

OpenSSH_7.2p2 Ubuntu-4ubuntu2.1, OpenSSL 1.0.2g 1 Mar 2016
debug1: Reading configuration data /home/sie/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug2: fd 3 setting O_NONBLOCK
debug2: mux_client_hello_exchange: master version 4
debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
debug3: mux_client_request_session: entering
debug3: mux_client_request_alive: entering
debug3: mux_client_request_alive: done pid = 4200
debug3: mux_client_request_session: session request sent
debug1: mux_client_request_session: master session id: 2
debug3: mux_client_read_packet: read header failed: Broken pipe
debug2: Received exit status from master 100
Shared connection to 138.xx.xxx.xx closed.

fatal: [mysite.co.uk]: FAILED! => {
    "changed": true,
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "sudo apt-get install -qq -y python-simplejson"
        },
        "module_name": "raw"
    },
    "rc": 100,
    "stderr": "OpenSSH_7.2p2 Ubuntu-4ubuntu2.1, OpenSSL 1.0.2g 1 Mar 2016\r\ndebug1: Reading configuration data /home/sie/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 4200\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 100\r\nShared connection to 138.xx.xxx.xx closed.\r\n",
    "stdout": "E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem. \r\n",
    "stdout_lines": [
        "E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem. "
    ]
}

I have read through several posts on this forum, starting here:

Here are my installed program versions:

$ ansible --version
ansible 2.3.0
config file = /home/sie/Sites/wbba/trellis/ansible.cfg
configured module search path = Default w/o overrides

$ vagrant -v
Vagrant 1.8.5

$ virtualbox --help
Oracle VM VirtualBox Manager 5.1.8

I can log in successfully with SSH from my local machine:

$ ssh root@138.xx.xxx.xx -i ~/.ssh/id_rsa

My ~/.ssh/config file looks like this:

Host 52.30.72.186
    User ubuntu
    ForwardAgent yes
Host 52.30.72.186
    User web
    ForwardAgent yes
Host github.com
    User git
    Port 22
    Hostname github.com
    IdentityFile ~/.ssh/id_rsa
    TCPKeepAlive yes
    IdentitiesOnly yes
Host mysite.co.uk
    IdentitiesOnly yes
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent yes

The results of this command:

$ ansible staging -m raw -a whoami -u root

mysite.co.uk | SUCCESS | rc=0 >>
root
Shared connection to 138.xx.xxx.xx closed.

I tried this to change known_hosts:

$ ssh-keygen -R 138.xx.xxx.xx
# Host 138.xx.xxx.xx found: line 21
/home/sie/.ssh/known_hosts updated.
Original contents retained as /home/sie/.ssh/known_hosts.old

and this for authorised keys:

$ cat ~/.ssh/id_rsa.pub | ssh root@138.xx.xxx.xx "cat >> ~/.ssh/authorized_keys"

This command results in can_connect:

$ ssh -o PasswordAuthentication=no root@138.xx.xxx.xx "echo can_connect" || echo cannot_connect

I have also tried this, which gave 'no hosts matched':

$ ansible 138.xx.xxx.xx -m ping -i hosts/staging -u root -vvvv
Using /home/sie/Sites/wbba/trellis/ansible.cfg as config file
[WARNING]: No hosts matched, nothing to do

Loading callback plugin minimal of type stdout, v2.0 from /usr/local/lib/python2.7/dist-packages/ansible-2.3.0-py2.7.egg/ansible/plugins/callback/__init__.pyc

my hosts/staging file looks like this:

mysite.co.uk ansible_host=138.68.131.10 ansible_ssh_private_key_file='~/.ssh/id_rsa'

[staging]
mysite.co.uk

[web]
mysite.co.uk

Any ideas why this is happening? Thanks


Thanks for trying so many steps and reporting them. :+1:
With no message about permission denied or unreachable, this doesn’t strike me as an SSH error.

It looks like the SSH connection succeeded, but then dpkg was interrupted. The one prior report of this error was apparently resolved by following the error message's instruction to run sudo dpkg --configure -a on the server. You could try that, or, because it's staging, just back up any files and DB you need and rebuild the server fresh.
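Concretely, the recovery the error message asks for might look like this (a sketch of commands to run on the staging server over SSH; the apt-get install -f step is my own suggestion to let apt repair any half-configured packages, and python-simplejson is the package the failed Trellis task was installing):

```
# Finish the interrupted dpkg run that apt is complaining about
sudo dpkg --configure -a

# Optionally let apt fix any remaining broken or half-installed packages
sudo apt-get install -f -y

# Retry the install the Trellis task was attempting
sudo apt-get install -qq -y python-simplejson
```

After that, re-running the provision should get past the Install Python 2.x task, assuming dpkg was the only thing wrong.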

Given that you’re using Ansible 2.3, be sure you have the updates from roots/trellis#813.

If you are using the host name mysite.co.uk in hosts/staging, be sure you are using a different name in hosts/production so Ansible doesn’t struggle to know which variables to use (example of how this creates a problem in Trellis; discussion at Ansible). For example, you could use the alias staging.mysite.co.uk in hosts/staging.
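For example, a hosts/staging along these lines keeps the staging alias distinct from the production host name (the IP stays redacted as in your output; the alias name is just an illustration):

```
staging.mysite.co.uk ansible_host=138.xx.xxx.xx ansible_ssh_private_key_file='~/.ssh/id_rsa'

[staging]
staging.mysite.co.uk

[web]
staging.mysite.co.uk
```

With distinct aliases per environment, Ansible can unambiguously match each host to its group_vars.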


Note that the server.yml playbook doesn't need the site variable, and will not make use of it. Only the deploy.yml playbook needs site. You could specify your command more simply:

ansible-playbook -vvvv server.yml -e env=staging

I am experiencing this issue again as well but on a new site since my original post.

I started this project a couple of weeks ago and had other SSH issues that went unresolved. So today I re-started the project from scratch hoping to bypass that issue. Which worked. But now I’m experiencing this issue.

I re-read my original post, which stated I simply re-ran the command after a little time and it seemingly worked itself out.

And then I took the exact same steps he mentioned above.

I also read several articles in search results including this and this.

My Ansible version is 2.4.0.0, Vagrant is 2.0.0.

I, too, can successfully ssh from my local machine to the server using ssh root@project2.mysite.com

I do not have a ~/.ssh/config

I double checked and cleared my ~/.ssh/known_hosts file of any references to the server.

The first time I ran it without verbose output. Then I ran it with verbose output, and maybe my education level is inadequate, but I didn't see anything that explained the failure beyond the fact that it's failing to connect via SSH.

In total, I've spent about two hours re-troubleshooting this issue and I'm unsure how to debug it any further from here. It seems like maybe it's trying to connect to the server with a user besides root, such as admin or web, and the script indicates it successfully created them. Yet when I'm on the server and I check /home there are no user folders.

This is my hosts/production:

# Add each host to the [staging] group and to a "type" group such as [web] or [db].
# List each machine only once per [group], even if it will host multiple sites.

[staging]
thewaitstaffteam.theportlandcompany.com

[web]
thewaitstaffteam.theportlandcompany.com

This is my group_vars/staging/wordpress_sites.yml:
# Documentation: https://roots.io/trellis/docs/remote-server-setup/
# wordpress_sites options: https://roots.io/trellis/docs/wordpress-sites
# Define accompanying passwords/secrets in group_vars/staging/vault.yml

wordpress_sites:
  thewaitstaffteam.theportlandcompany.com:
    site_hosts:
      - canonical: thewaitstaffteam.theportlandcompany.com
        # redirects:
        #   - otherdomain.com
    local_path: ../site # path targeting local Bedrock site directory (relative to Ansible root)
    repo: git@gitlab.iteratemarketing.com:theportlandcompany/thewaitstaffteam.com.git # replace with your Git repo URL
    repo_subtree_path: site # relative path to your Bedrock/WP directory in your repo
    branch: master
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: letsencrypt
    cache:
      enabled: false

All of the topics I see that people created about this issue are unresolved - albeit with attempted solutions.

Thanks to anyone who can shed light on this issue.

@s3w47m88 Your gist shows this:

➜  trellis git:(master) ✗ ssh root@thewaitstaffteam.theportlandcompany.com
Enter passphrase for key '/Users/s3w47m88/.ssh/id_rsa':

If you must interactively enter the passphrase for your ssh key, this suggests you need to load your ssh key's passphrase into your Mac keychain, and perhaps into your ssh agent generally. For tips and context, this was mentioned to you here, mentioned by you here, by the docs, and by github's general guide for setting up SSH on your Mac. I don't believe Ansible accommodates interactive entry of an ssh key passphrase.
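As a sketch, an ~/.ssh/config entry along these lines could help (host name and key path are taken from your gist output; the two agent/keychain options are macOS-specific additions of mine, worth verifying against your OpenSSH version):

```
Host thewaitstaffteam.theportlandcompany.com
    User root
    IdentityFile ~/.ssh/id_rsa
    IdentitiesOnly yes
    # macOS: keep the decrypted key loaded in the agent/keychain so
    # Ansible never hits an interactive passphrase prompt
    AddKeysToAgent yes
    UseKeychain yes
```

Combined with ssh-add, this should let ssh authenticate without prompting.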

I think you probably should have an ~/.ssh/config, as mentioned to you previously here:

  • ssh-add -k did not resolve this SSH connectivity issue I mentioned above, unfortunately.
  • ssh-add -k is mentioned in the docs, as you pointed out, but it is not in the logical or chronological spot. Meaning, when someone like me is trying to follow the instructions verbatim (because they're learning a new system) and a step is missing, it leaves them stranded and confused. After I posted one of the two links you referenced, I submitted a pull request to the docs specifically so others, and myself, wouldn't get stuck there again in the future. However, it was rejected by whoever is in charge of the docs, for reasons unclear to me.
  • The Github docs also specifically state that you can still connect using your password if you choose; that's part of why I wasn't alarmed by seeing the passphrase prompt. I understand that's different from how Ansible works, but that's also why I didn't find that prompt alarming.
  • Although I did not need to manually create an ~/.ssh/config file for my first site to provision and deploy successfully, this time I created an ~/.ssh/config file according to the documents you linked to. But if I understand them correctly they're simply stating.
  • After your last post I re-read all of the links you sent, and almost all of the links they linked to. Deleted my DO server and followed all of the steps again and got the same result.

I see this was unlisted. If the community doesn't want me to continue to post on this topic for any reason I will stop, please just let me know. But it seems it has not been resolved for me, or the others in this thread and those I linked to previously, so I'm assuming it's okay to continue the thread.

Thanks.

Can you provide any reference to this PR? We’re always trying to improve our docs, but I don’t see any PR related to this.

@s3w47m88 You’ll notice in your gist output the trouble centers around the task Check whether Ansible can connect as root. Trellis tries to spare users from having to indicate the SSH user, whether root or the admin_user. Until you’ve sorted out your local machine’s SSH connection issues, you could manually specify the remote user. You’ll see as early as on Ansible’s getting started page (or in output from ansible --help) an example of manually specifying the SSH user via --user=root or -u root:

ansible-playbook server.yml -e env=staging -u root

As you work on resolving your SSH issue, you should search for the error from your verbose output: ERROR! conflicting action statements: raw, async. One search result suggests you may resolve the issue with a clean reinstall of Ansible github.com/ansible/ansible/issues/31116. It appears especially similar because it involves Ansible 2.4 and an ad hoc command (ping, whereas your ad hoc command is raw) plus the async keyword.

Try uninstalling Ansible system files completely (not Trellis project files, just Ansible system files). Then do a fresh install, probably using pip. Don't install using the git clone method, which installs the unstable devel branch.
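For example (a sketch; this assumes Ansible was installed with pip in the first place — adjust accordingly if it came from Homebrew, apt, or another package manager):

```
# Remove the existing Ansible install, then reinstall a stable release from PyPI
pip uninstall -y ansible
pip install ansible

# Confirm the version and that the pip-installed copy is the one on your PATH
ansible --version
which ansible
```

If which ansible still points at an old location after reinstalling, there's likely a second install shadowing the new one.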

Aside from that I can’t think of anything as yet undiscussed (in this thread and other communications) so I’m unable to try to help further. If SSH connections were working previously and Trellis really hasn’t changed anything related to SSH recently (nothing related that I can think of), you could think back through changes you’ve made to your local environment that are potentially relevant.

Your posts over time show switching between Ubuntu and Mac, regularly reinstalling big stuff. There’s virtue and bravery in being so dynamic with your dev environment, but it brings with it the risk and responsibility of troubleshooting issues that arise during the constant change.
