Trellis hanging on WP install

A longstanding Trellis project of mine has started failing to load in the browser when running on Vagrant. While troubleshooting, I found that provisioning consistently gets stuck on the Install WP task.

I first noticed it after running the usual trellis up, which completed without errors. But then my local dev site wouldn’t load, and the request timed out. I tried different browsers, thinking it might be a simple caching issue, but the problem persisted. After reviewing the latest releases, I updated Trellis (it was only one version behind) from 1.20.0 to 1.20.1. Then I ran trellis up again, followed by trellis provision development. This ran up until
TASK [wordpress-install : Create web root of sites] ****************************
at which point it stalled, and I eventually killed the process.

Going further back, I did a vagrant destroy followed by trellis up. This time the process succeeds up until the Install WP task:

TASK [wordpress-install : Install Dependencies with Composer] ******************
ok: [default] => (item=modernadventure.com)

TASK [wordpress-install : Install WP] ******************************************

at which point it just hangs indefinitely (I’ve left it for over ten minutes with no movement).

If I SSH into the half-provisioned VM, I see the message “System information disabled due to load higher than 1.0”, probably due to the failed provisioning?

I recently updated my host machine to macOS Ventura 13.2.1, but I’m not 100% sure the two are related. My issue is similar to the one reported here: What version of VirtualBox is everyone running

Wondering if anyone has any hints?

Trellis 1.20.1
trellis-cli 1.10.1
Vagrant box Ubuntu 18.04.6

trellis check
Checking Trellis requirements...

Required:

[✓] Python [>= 3.8.0]: 3.9.7

Optional:

[✓] Vagrant [>= 2.1.0]: 2.3.4
[✓] VirtualBox [>= 4.3.10]: 7.0.6r155176

This solved my problems, and it was really easy to set up: Introducing Lima to Trellis for Faster Local Development (https://roots.io/introducing-lima-to-trellis-for-faster-local-development/)

Thanks @toddsantoro. I just tried it with Lima, but with less success than you, I’m afraid. Running trellis vm start fails at
TASK [wordpress-install : Install Dependencies with Composer] ******************

..."msg": "Composer could not find a composer.json file in /srv/www/site.com/current To initialize a project, please create a composer.json file...

But I do have a file at site/composer.json; it’s the same one I was using with Vagrant when everything worked. So perhaps Composer is at the heart of this, but I still can’t figure out why it’s unhappy.

Try deleting the entry in your hosts file: sudo nano /etc/hosts
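Before editing by hand, it can help to see exactly which lines map the dev hostname. A minimal Python sketch, assuming the hostname is mysite.test (the canonical host used later in this thread); find_host_entries is a hypothetical helper, not part of Trellis or Vagrant:

```python
def find_host_entries(hosts_text: str, hostname: str) -> list:
    """Return the raw /etc/hosts lines that map `hostname` (comments ignored)."""
    matches = []
    for raw in hosts_text.splitlines():
        # Drop anything after '#', then skip blank or comment-only lines
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address; the remaining fields are host names/aliases
        if hostname in fields[1:]:
            matches.append(raw)
    return matches
```

For example, find_host_entries(open("/etc/hosts").read(), "mysite.test") lists the candidate stale entries to remove.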

If that doesn’t work, back up your local database. Your theme should be in version control; if it isn’t, get a copy of it, along with your plugins and uploads. Save all of that, then blow the project away and set it up again as normal, except run trellis vm start instead of trellis up.

Was the project created with trellis new?

For your first attempt and error with Vagrant, it’s always best to follow our golden rule of debugging: try running the command manually on the VM to get more output.

In the second attempt with Lima, I’d suggest at least verifying that /srv/www/site.com/current contains your local site directory. In either Vagrant or Lima, it has to share/sync your local directory onto the VM. If that failed for whatever reason, then composer.json definitely wouldn’t be there.
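One way to do that verification from inside the VM (trellis vm shell or vagrant ssh) is a small Python check. This is a sketch under assumptions: python3 is available on the guest, the path is /srv/www/site.com/current as in the error above, and check_site_mount is a hypothetical helper. It only tests for the two things a Bedrock site always has at its root, composer.json and web/:

```python
from pathlib import Path


def check_site_mount(current_dir: str) -> list:
    """Return problems found in the synced Bedrock directory; an empty list means it looks OK."""
    root = Path(current_dir)
    if not root.is_dir():
        return [f"{root} does not exist or is not a directory"]
    problems = []
    # A Bedrock site has composer.json at its root; Composer fails without it
    if not (root / "composer.json").is_file():
        problems.append(f"{root / 'composer.json'} is missing")
    # Bedrock's web/ directory is the document root Nginx serves
    if not (root / "web").is_dir():
        problems.append(f"{root / 'web'} is missing")
    return problems
```

If check_site_mount("/srv/www/site.com/current") reports a missing composer.json, the share/sync step is the thing to fix before Composer can succeed.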

Thank you for reminding me of that troubleshooting page, @swalkinshaw; there are a lot of helpful tips in there. The project was not created with trellis new, but I have been using trellis-cli and have run trellis init.

I’ve been pressing ahead on the Lima front and have a few more clues:
When I run trellis vm shell and cd /srv/www/site.com/current/web, that directory does exist, but it is empty except for the .env file, which looks complete. It’s not clear to me what command I should run in the VM to make the rest of the sync happen.

I also noticed that shortly before the
TASK [wordpress-install : Install Dependencies with Composer] ******************

failure, this warning appears:

TASK [wordpress-install : include_tasks] ***************************************
[WARNING]: TASK: wordpress-install : include_tasks: The loop variable 'site' is
already in use. You should set the `loop_var` value in the `loop_control`
option for the task to something else to avoid variable collisions and
unexpected behavior.
included: /Users/sam/Code/site/trellis/roles/wordpress-install/tasks/composer-authentications.yml for default => (item=(censored due to no_log))

… which seems like it could be related to this post: Composer Issue with HTTP auth. I tried renaming the site variable to site2 to avoid the collision, as suggested in that thread, and now when I run trellis vm delete and trellis vm start, the wordpress-install role fails at the Composer authentication setup step:

TASK [wordpress-install : Setup composer authentications (HTTP Basic) - site.com] ***
failed: [default] (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
fatal: [default]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}

I’m using packagist.com with a local auth.json file that holds the credentials. My composer.json is likewise set up to use my Packagist.com repository:

  "repositories": [
    {
      "type": "composer",
      "url": "https://repo.packagist.com/my-org/"
    },
    {
      "packagist.org": false
    }
  ],

:thinking: I encountered something similar when the root CA certs were missing on the system, so HTTPS connections were not trusted.

Can you wget/curl https://packagist.org from within the target system?
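If you want a check that distinguishes a certificate problem from other failures, a rough Python sketch follows. It assumes python3 on the VM; check_https and ca_store_locations are hypothetical helpers, and the URL you test (for example https://repo.packagist.com/ from the composer.json above) is your choice. An HTTP error status such as 401 still counts as TLS success, because the server had to accept the handshake to send it:

```python
import ssl
import urllib.error
import urllib.request


def check_https(url: str, timeout: float = 10.0) -> tuple:
    """GET `url` with default certificate verification; return (tls_ok, detail)."""
    # create_default_context() verifies certificates against the system CA store
    context = ssl.create_default_context()
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an HTTP error (e.g. 401 without credentials),
        # so the TLS handshake and certificate chain were accepted
        return True, f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # exc.reason is an ssl.SSLCertVerificationError if the CA chain is untrusted
        return False, repr(exc.reason)
    except OSError as exc:
        # e.g. a timeout while reading the response
        return False, repr(exc)


def ca_store_locations() -> dict:
    """Where this Python/OpenSSL build looks for trusted root CA certificates."""
    return ssl.get_default_verify_paths()._asdict()
```

An SSLCertVerificationError in the detail string would point at the missing-CA scenario; ca_store_locations() then shows which cafile/capath OpenSSL is consulting.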

Thank you everyone for the suggestions so far. I’m still struggling with this: neither Lima nor Vagrant will provision completely. In Vagrant, after I kill the process when it inevitably hangs at [wordpress-install : Install WP], I can SSH into the incomplete VM and cd to /srv/www/site.com. However, even running ls from that directory hangs, so I can’t investigate its contents. As a result, I can’t manually run the wp core install... command, since I can’t get to the web root.

I have cleared any possibly conflicting records from /etc/hosts and ~/.ssh/config.
I’ve also run vagrant box update, with no change in result.

On the Lima side, a clean trellis vm start still fails at the Composer auth step. I edited trellis/roles/wordpress-install/tasks/composer-authentications.yml to allow logging and got this:

TASK [wordpress-install : Setup composer authentications (HTTP Basic) - site.com] ***
failed: [default] (item=default-type.repo.packagist.com) => {"ansible_loop_var": "item", "changed": false, "item": {"hostname": "repo.packagist.com", "password": "redacted", "username": "redacted"}, "msg": "In ConfigCommand.php line 217: File \"./composer.json\" cannot be found in the current directory config [-g|--global] [-e|--editor] [-a|--auth] [--unset] [-l|--list] [-f|--file FILE] [--absolute] [-j|--json] [-m|--merge] [--append] [--source] [--] [<setting-key> [<setting-value>...]]", "stdout": "\nIn ConfigCommand.php line 217:
File \"./composer.json\" cannot be found in the current directory                                                                
config [-g|--global] [-e|--editor] [-a|--auth] [--unset] [-l|--list] [-f|--file FILE] [--absolute] [-j|--json] [-m|--merge] [--append] [--source] [--] [<setting-key> [<setting-value>...]]\n\n", "stdout_lines": ["", "In ConfigCommand.php line 217:", "                                                                   ", "  File \"./composer.json\" cannot be found in the current directory  ", "                                                                   ", "", "config [-g|--global] [-e|--editor] [-a|--auth] [--unset] [-l|--list] [-f|--file FILE] [--absolute] [-j|--json] [-m|--merge] [--append] [--source] [--] [<setting-key> [<setting-value>...]]", ""]}

I double-checked and I have the correct credentials set up in trellis/group_vars/ according to these instructions.

@strarsis I can open a trellis vm shell and run curl https://packagist.com successfully. I haven’t seen any certificate errors in either the Vagrant or the Lima logs, FWIW.

Now I’m thinking the problem lies in the missing directory sync. The Composer task is trying to run without a composer.json file, and I can clearly see that site.com/current/web exists (in Lima at least), but it’s completely empty. Unfortunately, I’m not sure what the sync command itself looks like, and there are no errors in the VM provisioning process to guide me.

Two things:

  1. Vagrant: if even running ls hangs, then yeah, obviously you can’t really debug, and something must be quite wrong.
  2. Lima: if your site directory is not shared on the VM, then something is also fundamentally wrong and nothing can be done until that’s resolved.

Lima (and Vagrant) use local_path in group_vars/development/wordpress_sites.yml as the shared folder on the VM. So as a sanity check, I’d verify that:

  • it’s set properly in the site config
  • it exists on your local machine and contains a Bedrock based site
  • run cat .trellis/lima/[name of site].yml and verify those paths match up in mounts
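The path comparison in the last two checks can be sketched in Python. Assumptions: both YAML files have already been parsed into dicts (for instance with the third-party PyYAML yaml.safe_load, which is not included here), local_path is resolved relative to the trellis/ directory as the comment in wordpress_sites.yml says, mount locations are absolute (no ~), and the guest path follows the /srv/www/<site>/current layout Trellis uses. mount_matches is a hypothetical helper:

```python
import posixpath


def mount_matches(trellis_dir: str, site_name: str, local_path: str, mounts: list) -> bool:
    """True if some Lima mount shares local_path onto /srv/www/<site_name>/current."""
    # local_path is relative to the Ansible root, i.e. the trellis/ directory
    expected_src = posixpath.normpath(posixpath.join(trellis_dir, local_path))
    expected_dst = f"/srv/www/{site_name}/current"
    for mount in mounts:
        src = posixpath.normpath(mount.get("location", ""))
        dst = posixpath.normpath(mount.get("mountPoint", ""))
        if src == expected_src and dst == expected_dst:
            return True
    return False
```

With trellis_dir /Users/sam/Code/mysite/trellis (inferred from the mount location, so an assumption), local_path ../site resolves to /Users/sam/Code/mysite/site, which is what the Lima mounts entry should contain.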

Agreed @swalkinshaw, something weird is going on.

My group_vars/development/wordpress_sites.yml seems ok:

wordpress_sites:
  mysite.com:
    site_hosts:
      - canonical: mysite.test
    local_path: ../site # path targeting local Bedrock site directory (relative to Ansible root)
    admin_email: [redacted]
    multisite:
      enabled: false
    ssl:
      enabled: true
      provider: self-signed
      hsts_max_age: 0
    cache:
      enabled: false
    xmlrpc:
      enabled: false

And here’s the Lima config (cat .trellis/lima/mysite.com.yml):

vmType: "vz"
rosetta:
  enabled: false
images:
- location: https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
  arch: x86_64
- location: https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-arm64.img
  arch: aarch64

mounts:
- location: /Users/sam/Code/mysite/site
  mountPoint: /srv/www/mysite.com/current
  writable: true

mountType: "virtiofs"
networks:
- vzNAT: true
portForwards:

containerd:
  user: false
provision:
- mode: system
  script: |
    #!/bin/bash
    echo "127.0.0.1 $(hostname)" >> /etc/hosts

I thought destroying and rebuilding the VM might fix it, but I’ve done that plenty of times now in both Vagrant and Lima, and the result is consistent.

Dumb question, but have you rebooted?

Lima support is fairly new so I wouldn’t be too surprised if something went wrong there, but with Vagrant not working as well that might be connected.

I’m assuming /Users/sam/Code/mysite/site is just on your local HDD and not an external drive?

I have rebooted a few times. And my directory at /Users/sam/Code/mysite/site is on my laptop’s HDD.

This project was running fine (Vagrant only, hadn’t tried Lima yet) before this week. The last big change before the problems started was I upgraded macOS from Monterey to Ventura. :person_shrugging:

I’m a bit stumped. One random idea: can you delete your entire project dir and re-clone it? Assuming it’s all been pushed to a Git repo.

I think I may have finally zeroed in on the cause: Ubuntu 18.04. When I first started trying Lima, I was targeting Ubuntu 18.04 (to match the existing remote server) through my trellis-cli config file. In trellis/.trellis/cli.yml I had

vm:
  ubuntu: 18.04

as outlined in the trellis-cli PR for Lima.

Well, I tried simply removing this file, which falls back to the default behavior of building the guest machine on the latest Ubuntu, 22.04. And voilà: the provisioning completed, and for the first time my site files were synced between the host and guest machines!

I also tested this on a fresh project.

I created a brand new project with trellis new, and running trellis vm start with default settings completed fine: I could reach my new vanilla WP install in the browser. Then I ran trellis vm stop and trellis vm delete, added the Ubuntu 18.04 declaration in cli.yml, and ran trellis vm start for a fresh provision. Sure enough, it failed in exactly the same way as I have been experiencing, with folders failing to mount in the guest machine.

I have no idea why Ubuntu 18.04 seems to be breaking both Vagrant and Lima in this way. But it could explain how @toddsantoro had a different experience, after we both reported the same initial problem with WP Install hanging. Possibly he was on an older Ubuntu version as well, and upon switching to Lima it quietly switched boxes to Ubuntu 22.04, which is the default, and which works?

Curious if anyone can verify this as a legit issue.


Interesting! Thanks for all the debugging and reporting back. And I’m really happy that you got it working in some form at least.

Off the top of my head, I can’t think of any reason why 18.04 shouldn’t work. I assume lots of people are still using it, and Trellis hasn’t removed support or anything. Vagrant and Lima also use different base VM images, so while both are “Ubuntu 18.04”, there are slight differences in how they build and prepare the OS images.


That makes sense re: the differences between Vagrant and Lima. In fact, I’m now finding that Ubuntu 22.04 only fixes my syncing issues with Lima. I tried Ubuntu 22.04 with Vagrant and got the same issue as before.

I’m not quite fully out of the woods with Lima yet either. Although the provisioning is completing, and the files are being mounted on the guest, I’m not yet able to access my dev site at https://mysite.test. In my group_vars/development/wordpress_sites.yml I have

    ssl:
      enabled: true
      provider: self-signed

But when I hit https://mysite.test in the browser, I get an ERR_CERT_AUTHORITY_INVALID error. If I bypass the warning, I get a 404 Not Found. I tried SSHing in and manually restarting Nginx, but it made no difference.

Update: if I look in ~/.lima/mysite.com/ha.stderr.log, I see a lot of errors like this:

{"level":"debug","msg":"Stopping udp proxy (read udp 72.14.183.39:123: i/o timeout)","time":"2023-03-23T12:18:06-07:00"}
{"level":"error","msg":"r.CreateEndpoint() = no route to host","time":"2023-03-23T12:18:38-07:00"}
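Since each line of that log appears to be a standalone JSON object, a short Python filter can separate the error entries from the debug noise and show when they occur. This is a sketch that assumes the one-object-per-line format seen in the excerpt; lima_log_errors is a hypothetical helper:

```python
import json


def lima_log_errors(lines) -> list:
    """Return (time, msg) pairs for JSON log lines whose level is "error"."""
    errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not one JSON object per line
        if isinstance(entry, dict) and entry.get("level") == "error":
            errors.append((entry.get("time"), entry.get("msg")))
    return errors
```

For example, pass it an open file handle for ~/.lima/mysite.com/ha.stderr.log (expanded with os.path.expanduser) and compare the timestamps against when requests to the site fail.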

Not sure about those Lima errors, but if you can connect via HTTP at all, then I don’t think they’re related. There were no errors during provisioning? If it’s a 404 Not Found, I’d check the Nginx access/error logs. Double-check what the root path is set to in the Nginx site config and verify everything is where you’d expect it to be. It should point to /srv/www/mysite.com/current/web.
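To pull the root path out of the config without reading the whole file, a small Python sketch can scan for uncommented root directives. It is a rough text match rather than a real Nginx parser, and nginx_roots is a hypothetical helper; one assumed way to feed it is the output of sudo nginx -T run on the VM, which dumps the full loaded configuration:

```python
import re

# Matches an uncommented `root <path>;` directive at the start of a line
ROOT_DIRECTIVE = re.compile(r"^\s*root\s+([^;]+);", re.MULTILINE)


def nginx_roots(conf_text: str) -> list:
    """Return the paths of all `root` directives found in Nginx config text."""
    return [m.group(1).strip().strip("\"'") for m in ROOT_DIRECTIVE.finditer(conf_text)]
```

Every path it returns for the site’s server block should be /srv/www/mysite.com/current/web; anything else would explain a 404.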

Turns out my difficulties accessing the dev site after provisioning with Lima were just a silly oversight on my part. I eventually managed to bypass the cache and reach the WP login screen, and then I was off to the races. The errors in the Lima logs don’t seem to be affecting my setup after all.

So after all that, I’m up and running locally with Lima and Ubuntu 22.04! I wish I could say with more certainty what caused this, but all I really know is that Vagrant stopped working properly after I upgraded to macOS Ventura, and that Lima doesn’t seem to play nicely with Ubuntu 18.04.

Thank you @swalkinshaw and everyone who chimed in with suggestions :pray:
