Dev server freezes after a few minutes

I have a problem with a new Trellis project. After starting the virtual machine I can work without problems for a few minutes, but then the server freezes and I can’t do anything. I can’t shut down the server normally either: the trellis down command ends up forcing a shutdown after the graceful shutdown attempt fails.

==> default: Attempting graceful shutdown of VM...
==> default: Forcing shutdown of VM...

Has this happened to anyone? What would you look at to debug it?

Hi @aitor,

I bet that’s really helping your workflow :roll_eyes: From the format of the log message, I guess you’re using Vagrant, not Lima?

I would start by:

  • Determining which layer is freezing. Is it the VM itself, or the hypervisor? If you have other VMs running, do they continue to be responsive when the Trellis VM freezes? If the issue lies at the hypervisor level, please share version information for Vagrant / VirtualBox etc.
  • If it’s VM based, looking at the kernel log (dmesg) from the previous boot. I think Ubuntu has journalctl, so try journalctl -o short-precise -k -b -1. You may be able to spot some obvious entries, such as OOM kills.

Yes I’m experiencing the same in my latest project using Trellis v1.20.0.
I’m on a M1 MacBook with macOS Ventura 13.2.1 running Vagrant 2.3.4 in Parallels Desktop 18.2.0.

It’s hard to determine exactly when it happens, but it seems to trigger on a theme change during bud dev. I checked the VM error logs but there’s nothing in there referring to the time of the freeze.

At that point I can only do a vagrant reload -f to reboot and all is good again, but it’s getting a bit frustrating. When it freezes I can still access the database fortunately.

The /etc/hosts and /etc/exports entries are also still in place and correct.

I don’t have other VMs running at the same time, so how would one check if the issue lies at the hypervisor level?

It’s only happening in this project and I already tried destroying & recreating the VM box multiple times but this issue keeps coming back.


Hi @Twansparant,

Thanks for the useful info. It sounds to me like the issue resides on the VM itself, because you have identified a VM-based trigger - bud dev.

This would lead me to suspect an OOM condition, even if MySQL is up after the freeze (the kernel will make decisions about which processes to kill in the event of OOM, and it could decide to leave your mysqld up, but take down your tty / shell / bud).

After restarting the VM following a freeze, see what output you get from:

journalctl -o short-precise -k -b -1

You can adjust the last argument to look back across multiple boots (-1 is the last boot, -2 is the boot before that, etc.). If you see a reference to a killed process, it’s likely a memory issue.
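To scan several previous boots in one go, a small filter along these lines may help. It is only a sketch: the function name find_oom_lines is made up for illustration, and the patterns are common kernel OOM messages rather than an exhaustive list.

```shell
# Hypothetical helper: keep only kernel log lines that usually indicate
# an out-of-memory kill. Reads log text on stdin; prints nothing if no
# line matches.
find_oom_lines() {
  grep -Ei 'out of memory|oom-killer|oom_reaper|killed process' || true
}

# Intended use inside the VM (not executed here):
#   for boot in -1 -2 -3; do
#     journalctl -o short-precise -k -b "$boot" | find_oom_lines
#   done
```

If this prints nothing for the boot that froze, an OOM kill becomes a less likely explanation.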

To increase the VM’s memory allocation from within Trellis, try creating a vagrant.local.yml file (it overrides the defaults in vagrant.default.yml) containing at least:

vagrant_memory: 2048 # Increase VM's memory to 2GiB

You’ll likely need to vagrant destroy && vagrant up for that change to take effect, but perhaps not. It’s been a while since I’ve used Vagrant!
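One way to confirm whether a new vagrant_memory value actually reached the guest is to read the total memory from inside the VM. The helper below is a hypothetical sketch (the name mem_total_mib is not part of Trellis or Vagrant); it assumes a Linux guest that exposes /proc/meminfo.

```shell
# Hypothetical helper: print total memory in MiB from a meminfo-style file.
# Defaults to /proc/meminfo, which is the relevant file inside the VM.
mem_total_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Intended use inside the VM (not executed here):
#   mem_total_mib
```

The reported figure is normally a little below the configured value, because the kernel reserves some memory for itself.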


Thanks @talss89!
I will definitely try the journalctl next time it freezes! I presume you run this inside the VM after trellis ssh development right?

I already had a vagrant.local.yml in place with:

vagrant_cpus: 4
vagrant_memory: 16384

But I can increase the memory a bit more I guess!


Yep, it should be run inside the VM. You’re accessing the VM’s kernel logs (dmesg).

But, if you’re allocating 16GiB of memory for the VM, that seems plenty, so would suggest the issue isn’t memory related… :thinking:

Will be interesting to see output from journalctl.

journalctl outputs hundreds of lines. This is the tail. It seems that the VM can’t access the file system.


nfs: server 192.168.56.1 not responding, still trying
Apr 05 09:02:05.210683 espaciosutil kernel: 09:02:05.204436 main     7.0.6 r155176 started. Verbose level = 0
Apr 05 09:02:05.230756 espaciosutil kernel: 09:02:05.227311 main     vbglR3GuestCtrlDetectPeekGetCancelSupport: Supported (#1)
Apr 05 09:04:52.670126 espaciosutil kernel: FS-Cache: Loaded
Apr 05 09:04:52.691576 espaciosutil kernel: FS-Cache: Netfs 'nfs' registered for caching
Apr 05 09:18:02.224277 espaciosutil kernel: nfs: server 192.168.56.1 not responding, still trying
Apr 05 09:24:35.310131 espaciosutil kernel: 09:24:35.303136 control  Guest control service stopped
Apr 05 09:24:35.330100 espaciosutil kernel: 09:24:35.322528 control  Guest control worker returned with rc=VINF_TRY_AGAIN
Apr 05 09:24:35.330141 espaciosutil kernel: 09:24:35.322836 main     Session 0 is about to close ...
Apr 05 09:24:35.330156 espaciosutil kernel: 09:24:35.322912 main     Stopping all guest processes ...
Apr 05 09:24:35.330168 espaciosutil kernel: 09:24:35.322986 main     Closing all guest files ...
Apr 05 09:24:35.350217 espaciosutil kernel: 09:24:35.343446 main     Ended.

That’s interesting - so the VM is failing to communicate with your host machine’s NFS server on 192.168.56.1.

This kind of error would usually suggest a network interruption in the case of a traditional NFS setup (ie. over physical link) or perhaps nfsd is quitting for some reason.

Is there any reason why your network configuration may change or interfaces be stopped / started while you are using the VM? Perhaps VPN software, or similar?

nfsd on OSX seems to log via syslog() so is there anything NFS related in there? Try checking /private/var/log/system.log (on your host machine) for NFS related messages. I’d probably run tail -f /private/var/log/system.log | grep nfs and then try and make a VM crash.

This Vagrant article has some tips on how to troubleshoot NFS issues.
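To see how often the guest logged NFS timeouts, and against which server address, the kernel log can be summarised with a filter like this. It is a sketch only: nfs_timeouts_by_server is a made-up name, and it assumes the message has the same "nfs: server ... not responding" shape as in the output quoted above.

```shell
# Hypothetical helper: count "nfs: server X not responding" kernel
# messages per server address. Reads log text on stdin and prints
# one "<address> <count>" line per server.
nfs_timeouts_by_server() {
  sed -n 's/.*nfs: server \([^ ]*\) not responding.*/\1/p' \
    | sort | uniq -c | awk '{ print $2 " " $1 }'
}

# Intended use inside the VM (not executed here):
#   journalctl -o short-precise -k -b -1 | nfs_timeouts_by_server
```

Comparing the timestamps of those lines with any network or VPN changes on the host may help narrow down the cause.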


There is no system log at that location, but I have VBox errors in the macOS Console.

Process:               VBoxNetAdpCtl [26067]
Path:                  /Applications/VirtualBox.app/Contents/MacOS/VBoxNetAdpCtl
Identifier:            VBoxNetAdpCtl
Version:               0
Code Type:             X86-64 (Native)
Parent Process:        ??? [26066]
Responsible:           Terminal [4652]
User ID:               0

Date/Time:             2023-04-05 11:01:23.485 +0200
OS Version:            macOS 11.6.8 (20G730)
Report Version:        12
Anonymous UUID:        77A64C8B-4D19-553A-535C-90BB5D7175E9

Sleep/Wake UUID:       50BCBE2C-731B-46E6-AE9A-233004D8CF17

Time Awake Since Boot: 140000 seconds
Time Since Wake:       10000 seconds

System Integrity Protection: disabled

Crashed Thread:        0

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Reason:    DYLD, [0x1] Library missing

Application Specific Information:
dyld: launch, loading dependent libraries

Dyld Error Message:
  dyld: Using shared cache: 95795624-443A-32E2-8DD6-6020A71D9F1E
Library not loaded: @rpath/VBoxRT.dylib
  Referenced from: /Applications/VirtualBox.app/Contents/MacOS/VBoxNetAdpCtl
  Reason: image not found

Binary Images:
       0x106e65000 -        0x106e68fff +VBoxNetAdpCtl (0) <F8BE37B0-AF22-312C-AB44-31A408D41E97> /Applications/VirtualBox.app/Contents/MacOS/VBoxNetAdpCtl
       0x1074a0000 -        0x10753bfff  dyld (852.2) <E20C43E3-CEB9-397A-9056-11A6A8D3F86E> /usr/lib/dyld
    0x7fff22b0a000 -     0x7fff22bb8fff  com.apple.framework.IOKit (2.0.2 - 1845.120.6) <61CDC69E-EC9E-32B1-8A63-532619BA310F> /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit

You may also find the OSX syslog at /var/log/system.log

This might be a permissions issue. It’s probably a good time to reinstall VirtualBox and see if that fixes things. If not - does /Applications/VirtualBox.app/Contents/MacOS/VBoxRT.dylib exist and who owns / what permissions?
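The check can be done from Terminal on the host, since an application bundle is an ordinary directory. A minimal sketch follows; describe_file is a hypothetical helper name, and the path is the one from the crash report above.

```shell
# Hypothetical helper: show owner and permissions of a file, or report
# that it does not exist.
describe_file() {
  if [ -e "$1" ]; then
    ls -l "$1"
  else
    echo "missing: $1"
  fi
}

# Intended use on the macOS host (not executed here):
#   describe_file /Applications/VirtualBox.app/Contents/MacOS/VBoxRT.dylib
```

The first columns of the ls -l output are the permission bits, the owner and the group.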


I can’t reach a file inside the application contents with the terminal (I don’t know how), but I can use the GUI to see it. Is that enough? The file is there and has read and write permission for system.

The first time the bug appeared I reinstalled VBox updating to the latest version and the problem persists.

EDIT: I reached the file opening application contents with an IDE. These are its permissions:

-rw-r--r-- 1 root admin 7,9M 11 ene 17:55 VBoxRT.dylib

Also, there are no entries for NFS in /var/log/system.log. I did

less /var/log/system.log | grep nfs

with empty results.

It’s often not a good thing to recommend a different technology to solve a problem with another one… but it might be worthwhile giving trellis-cli’s new VM integration a try because it entirely avoids NFS, which is a very common issue with VMs.

See Local Development | Trellis Docs | Roots for more info. There’s a section on migrating as well; so you don’t need to fully commit to moving off Vagrant. You can try out both (just not at the exact same time).


Thank you. That sounds good. I have tried to use it but it does not support my OS (Big Sur). I’m having trouble upgrading to Ventura on my hackintosh. I’ll have to dedicate a few days to that matter, finally.


Thanks for your reply! I had to wait a bit before another freeze happened :slight_smile:
After rebooting I ran the journalctl command but I can’t see any killed process lines.
The freeze just happened around 11:17 and the last line of the last boot was at 10:12.

The system.log also doesn’t show any info about nfs when running:
tail -f /private/var/log/system.log | grep nfs

In /Library/logs/parallels.log I only see this line around the time of freeze:

04-06 11:17:40.759 F /pvsHostInfo:620:1d28/ hw.cpufrequency error = 2

But that hw.cpufrequency error is in there a lot.
Not sure what else to check?

Just another freeze, this time the journalctl has this at the end:

Apr 06 11:23:27.761941 ona kernel: FS-Cache: Loaded
Apr 06 11:23:27.777950 ona kernel: FS-Cache: Netfs 'nfs' registered for caching
Apr 06 11:27:04.054052 ona kernel: nfs: server 10.211.55.2 not responding, still trying

So similar to @aitor’s.
But I’m using Parallels Desktop and not Virtualbox?

In /etc/hosts this project is the only one that has this line added after:

## vagrant-hostmanager-end
10.211.55.2   	example.test.shared example.test #prl_hostonly shared

Which matches with the IP address not responding?
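To check which hostnames a given address maps to, the hosts file can be filtered by its first column. This is only a sketch; hosts_entries_for_ip is a made-up helper name, and it defaults to /etc/hosts on the machine where it is run.

```shell
# Hypothetical helper: print every hosts-file line whose first field is
# exactly the given IP address. The second argument is the file to read
# (defaults to /etc/hosts).
hosts_entries_for_ip() {
  awk -v ip="$1" '$1 == ip' "${2:-/etc/hosts}"
}

# Intended use on the host (not executed here):
#   hosts_entries_for_ip 10.211.55.2
```

Running the same lookup for other projects’ addresses would show whether any of them use a similar entry.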

Now the /Library/Logs/parallels.log has a few of these lines:

04-06 11:59:43.115 F /HostUtils:620:1d28/ Device [AppleSDXCBlockStorageDevice]: descendant media not found

Nothing changes at this point, no network change, no VPN change, nothing…

/sbin/nfsd has full disk access on my Mac.

I’m facing the same problem and have to perform a vagrant reload every 30 mins or so. Unfortunately, Lima isn’t an option for me because I can’t run Ventura on my Late 2015 iMac :-/

On top of that, I’m not able to connect to the ipv4 address either.

I’m experiencing this same issue. After firing up the vagrant, it will work ok for 10 minutes, then it freezes.

I’ve tried increasing the resources to the vm which did not help.

  • M2 MBP (fully loaded)
  • Ventura 13.3
  • Latest Trellis/Trellis-CLI/Bedrock/Sage
  • Latest Parallels
  • Latest version of Vagrant
  • Brand new project

Switching to Lima resolved the freezing issues for me

Did anyone find a solution/fix for this in the end?
This keeps happening for me, for this one particular project only with trellis version 1.20.0.
I destroyed and re-created my dev box, but same thing unfortunately…