Gzip: stdout: No space left on device

mZoo · September 6, 2018, 12:39am

While re-provisioning a small DO Droplet, TASK [Install Python 2.x] failed with the following output:

    "gzip: stdout: No space left on device", 
    "E: mkinitramfs failure find 141 cpio 141 gzip 1", 
    "update-initramfs: failed for /boot/initrd.img-4.4.0-133-generic with 1.", 
    "dpkg: error processing package initramfs-tools (--configure):", 
    " subprocess installed post-installation script returned error exit status 1", 
    "No apport report written because MaxReports is reached already", 
    "Errors were encountered while processing:", 
    " linux-image-4.4.0-134-generic", 
    " linux-image-extra-4.4.0-134-generic", 
    " linux-image-generic", 
    " linux-generic", 
    " linux-image-extra-4.4.0-133-generic", 
    " initramfs-tools", 
    "E: Sub-process /usr/bin/dpkg returned an error code (1)"

I was able to get provisioning to run by SSHing into the server and running the following manually:

sudo apt autoremove
sudo dpkg --configure -a
sudo apt autoclean
sudo apt-get update && sudo apt-get upgrade

I also found threads about freeing up space on the server’s /boot partition by removing unused Linux kernels, which starts with checking the kernel currently in use:

uname -r

Then listing all the installed kernels with either:

ls /boot

or

dpkg --list | grep linux-image

I didn’t end up removing any of them, so I still have a bunch of old kernels sitting around:

$ dpkg --list | grep linux-image
ii  linux-image-4.4.0-116-generic       4.4.0-116.140                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-4.4.0-119-generic       4.4.0-119.143                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-4.4.0-121-generic       4.4.0-121.145                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-4.4.0-124-generic       4.4.0-124.148                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-4.4.0-127-generic       4.4.0-127.153                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-4.4.0-128-generic       4.4.0-128.154                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-4.4.0-130-generic       4.4.0-130.156                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-133-generic       4.4.0-133.159                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-134-generic       4.4.0-134.160                                            amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-4.4.0-87-generic        4.4.0-87.110                                             amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-extra-4.4.0-116-generic 4.4.0-116.140                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rH  linux-image-extra-4.4.0-119-generic 4.4.0-119.143                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-extra-4.4.0-121-generic 4.4.0-121.145                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-extra-4.4.0-124-generic 4.4.0-124.148                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-extra-4.4.0-127-generic 4.4.0-127.153                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-extra-4.4.0-128-generic 4.4.0-128.154                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-extra-4.4.0-130-generic 4.4.0-130.156                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-extra-4.4.0-133-generic 4.4.0-133.159                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-extra-4.4.0-134-generic 4.4.0-134.160                                            amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
rc  linux-image-extra-4.4.0-87-generic  4.4.0-87.110                                             amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-generic                 4.4.0.134.140                                            amd64        Generic Linux kernel image
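For reading the listing above: in dpkg --list output the first letter is the desired action and the second is the current state, so ii means installed, rc means removed with only configuration files left, and rH means marked for removal but left half-installed. By that reading, only the 116, 133 and 134 kernels are still fully installed here. A minimal sketch of a filter for those codes; the kernel_pkgs_with_status name is hypothetical, and it assumes the default dpkg --list column layout:

```bash
# Print the names of kernel image packages whose two-letter dpkg status
# (first column of `dpkg --list`) equals the first argument, e.g. ii or rc.
# Note: the linux-image-generic metapackage also matches the name filter.
kernel_pkgs_with_status() {
  awk -v want="$1" '$1 == want && $2 ~ /^linux-image-/ { print $2 }'
}

# Usage on a real system (inspect the output before purging anything):
#   dpkg --list | kernel_pkgs_with_status ii   # kernels still installed
#   dpkg --list | kernel_pkgs_with_status rc   # leftovers: config files only
```

Purging the rc entries removes their leftover configuration files, but it should free little space, since the kernel files themselves are already gone.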

I would appreciate any feedback from the Trellis userbase.

Thanks and happy almost Autumn.

MWDelaney · September 6, 2018, 12:45am

How recent is your Trellis install?

Did you rebuild the droplet from scratch, or are you just trying to reprovision it with new settings?

Is there space left on the device?

mZoo · September 6, 2018, 12:54am

About a year old:

VirtualBox >= 4.3.10
Vagrant >= 2.0.1

Just trying to reprovision it with new settings. Actually, my S3 backups stopped working a few weeks ago, so I’m trying to address that.

To be honest, I’m not totally sure, but I think so:

$ df -h
Filesystem                Size  Used Avail Use% Mounted on
udev                      981M     0  981M   0% /dev
tmpfs                     200M   24M  177M  12% /run
/dev/mapper/mps--vg-root   96G  6.8G   85G   8% /
tmpfs                    1000M     0 1000M   0% /dev/shm
tmpfs                     5.0M     0  5.0M   0% /run/lock
tmpfs                    1000M     0 1000M   0% /sys/fs/cgroup
/dev/vda1                 472M  169M  279M  38% /boot
tmpfs                     200M     0  200M   0% /run/user/1002
tmpfs                     200M     0  200M   0% /run/user/1000

Is Ansible 2.4.3.0 compatible with Trellis these days? I’m having trouble confirming that.

swalkinshaw · September 6, 2018, 2:16am

Looks like you did some good Googling. You’ll want to remove some of those old kernels, though.

I still think it’s because /boot doesn’t have enough room (even though it’s not full). Maybe it tries to install, runs out of room, cleans up and fails.
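To check that theory before the next kernel upgrade, a small pre-flight test on /boot can help. A minimal sketch, assuming GNU/Linux df; the has_free_mb name and the 100 MB threshold in the usage line are illustrative assumptions, not a measured requirement for update-initramfs:

```bash
# Succeed if the filesystem holding TARGET has at least MIN_MB MiB available.
has_free_mb() {
  local target="$1" min_mb="$2" avail_kb
  # -P: one line per filesystem (no wrapping); -k: sizes in 1024-byte blocks.
  # Column 4 of the data line is the space available to unprivileged users.
  avail_kb="$(df -Pk "$target" | awk 'NR == 2 { print $4 }')"
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_mb * 1024)) ]
}

# Usage, e.g. before running apt-get upgrade (threshold is an assumption):
#   has_free_mb /boot 100 || echo "Under 100 MB free on /boot; purge old kernels first" >&2
```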

mZoo · September 6, 2018, 3:55am

Did a bunch of these:

sudo apt-get purge linux-image-extra-4.4.0-128-generic

Then

sudo apt-get autoremove
sudo update-grub

This was a useful gist.

I think there’s a way to specify a list of kernels to purge at once, but I’m not sure where I saw it.

Something like this maybe.

NOTE: I had also seen a recommendation to keep at least two or three kernels installed in case you need to boot from one of the others.
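One way to build that list while keeping a couple of fallbacks is to sort the installed versions, drop the newest few and the running one, and pass the rest to apt-get purge in a single call. A minimal sketch, assuming GNU sort and head; purge_candidates is a hypothetical helper, and the linux-image-extra naming in the usage comment matches the 4.4.0-era kernels in this thread but may differ on newer Ubuntu releases:

```bash
# Read installed kernel versions (e.g. 4.4.0-116-generic) on stdin, one per
# line, and print the ones that look safe to purge: everything except the
# KEEP newest versions and the currently running kernel.
purge_candidates() {
  local running="$1" keep="$2"
  # sort -V orders 4.4.0-87 before 4.4.0-116; head -n -KEEP drops the last KEEP lines.
  sort -V | head -n "-$keep" | { grep -vxF -- "$running" || true; }
}

# Usage (review the printed list before purging):
#   versions=$(dpkg --list 'linux-image-[0-9]*' \
#     | awk '$1 == "ii" { sub(/^linux-image-/, "", $2); print $2 }')
#   pkgs=()
#   for v in $(printf '%s\n' "$versions" | purge_candidates "$(uname -r)" 2); do
#     pkgs+=("linux-image-$v" "linux-image-extra-$v")
#   done
#   printf '%s\n' "${pkgs[@]}"           # check first
#   sudo apt-get purge "${pkgs[@]}"
```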

mZoo · April 9, 2020, 7:35pm

Hey dear brothers.

I know this is an old thread, but “disk full” has reared its ugly head again on a recently Trellis-provisioned DO Ubuntu 18 server, and I’m hoping to prevent this site-breaking issue from cropping up again.

When I first logged into the server, the root filesystem (mounted at /) was full:

/dev/vda1         25G   25G     0 100% /

Any process that tried to write data failed with: No space left on device

I rebooted the system and was able to purge some of the old kernel packages (the -extra ones, as noted above).

That freed up about 15G:

Filesystem      Size  Used Avail Use% Mounted on
udev            480M     0  480M   0% /dev
tmpfs            99M  612K   98M   1% /run
/dev/vda1        25G  9.5G   15G  40% /
tmpfs           493M     0  493M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           493M     0  493M   0% /sys/fs/cgroup
/dev/vda15      105M  3.6M  101M   4% /boot/efi
tmpfs            99M     0   99M   0% /run/user/1001

Would someone please enlighten me on what is going on here and what would be a good way to:

- Determine the cause
- Prevent it from happening again

Thanks much and please stay safe.

mZoo · April 10, 2020, 6:17pm

@swalkinshaw If you can spare a few minutes, I would love to get some of your wisdom on this issue. Thank you much and peace.

swalkinshaw · April 10, 2020, 6:19pm

I don’t have any great answers since I don’t know the root cause. But you need to figure out where the space is being taken up. Here are some ideas: https://unix.stackexchange.com/questions/125429/tracking-down-where-disk-space-has-gone-on-linux

mZoo · April 10, 2020, 6:20pm

Thank you, man. Will report back.

UPDATE

I discovered that one of my colleagues had committed a node_modules directory (which is, of course, huge) to the repository, which added about 0.3 GB to the server with every new deploy (up to 6).

This command was useful for listing the 20 largest files or directories:

du -ax / | sort -rn | head -20

And for the current directory only:

du -ax . | sort -rn | head -20
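Plain du -ax reports sizes in 1 KiB blocks, which is why sort -rn orders them correctly but leaves them hard to read. With GNU coreutils, -h on both du and sort keeps the ordering while printing sizes like 1.3G. A minimal sketch wrapping that in a hypothetical largest_paths function:

```bash
# List the N largest files and directories under DIR (default: ., 20 entries).
# -x stays on DIR's filesystem (skips /proc and other mounts), -a includes
# files as well as directories, and sort -h understands du -h suffixes (K, M, G).
largest_paths() {
  local dir="${1:-.}" n="${2:-20}"
  du -axh -- "$dir" 2>/dev/null | sort -rh | head -n "$n"
}

# Usage (run as root on / to see every directory):
#   largest_paths / 20
#   largest_paths /tmp 10
```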

I added node_modules/ to the .gitignore in the repository’s top-level directory and then ran:

git rm -r --cached ../example.com/web/app/themes/custom-theme/node_modules
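git rm --cached only stages the removal; node_modules keeps shipping until that change and the .gitignore entry are committed and pushed. A minimal sketch of the whole sequence as one function; untrack_path, its arguments, and the repository path in the usage line are hypothetical, and the commit message is just an example:

```bash
# Stop tracking TARGET (relative to the repo root) without deleting it from disk,
# add PATTERN to the top-level .gitignore if it isn't there yet, and commit both.
untrack_path() {
  local repo="$1" target="$2" pattern="$3"
  (
    cd "$repo" || exit 1
    grep -qxF -- "$pattern" .gitignore 2>/dev/null || printf '%s\n' "$pattern" >> .gitignore
    git rm -r -q --cached -- "$target" &&
      git add .gitignore &&
      git commit -q -m "Stop tracking $target"
  )
}

# Usage (then push, so the next deploy no longer includes the directory):
#   untrack_path path/to/repo web/app/themes/custom-theme/node_modules 'node_modules/'
```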

I manually removed a few recent releases (rm -rf 20200456787) and freed up a few gigabytes of space.
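Deleting releases by hand works, but it is easy to remove the one the site is actually serving. On a Trellis server each deploy typically lands in a timestamped directory under the site’s releases directory, with a current symlink pointing at the live one. A minimal sketch that keeps the newest few and never touches the live release; prune_releases, its arguments, and the example paths are hypothetical, and it assumes release names sort by age:

```bash
# Delete all but the KEEP newest directories in RELEASES, skipping whichever
# one CURRENT (a symlink) resolves to. Assumes names are timestamps, so
# lexical order is age order. Prints each directory it removes.
prune_releases() {
  local releases="$1" current="$2" keep="$3" live name
  live="$(readlink -f -- "$current")"
  [ -d "$live" ] || { echo "no live release at $current" >&2; return 1; }
  find "$releases" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
    | sort | head -n "-$keep" \
    | while IFS= read -r name; do
        [ "$(readlink -f -- "$releases/$name")" = "$live" ] && continue
        echo "removing $releases/$name"
        rm -rf -- "${releases:?}/$name"
      done
}

# Usage (check the paths on your own server first):
#   prune_releases /srv/www/example.com/releases /srv/www/example.com/current 3
```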

mZoo · April 13, 2020, 6:27pm

Still struggling with this problem on an Ubuntu 18 DO droplet.

/dev/vda1 25G 11G 14G 43% /

is growing by about a gigabyte a day!

I am checking the volume for errors:

    # fsck -nf /dev/vda1
    fsck from util-linux 2.31.1
    e2fsck 1.44.1 (24-Mar-2018)
    Warning! /dev/vda1 is mounted.
    Warning: skipping journal recovery because doing a read-only filesystem check.
    Pass 1: Checking inodes, blocks, and sizes
    Deleted inode 92 has zero dtime. Fix? no
    Inodes that were part of a corrupted orphan linked list found. Fix? no
    Inode 311 was part of the orphaned inode list. IGNORED.
    Inode 1439 was part of the orphaned inode list. IGNORED.
    Inode 1628 was part of the orphaned inode list. IGNORED.
    Inode 1630 was part of the orphaned inode list. IGNORED.
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Free blocks count wrong (4095, counted=3629222).
    Fix? no
    Inode bitmap differences: -92 -311 -1439 -1628 -1630
    Fix? no
    Free inodes count wrong (2003575, counted=2821699).
    Fix? no

    cloudimg-rootfs: ********** WARNING: Filesystem still has errors **********

    cloudimg-rootfs: 1222025/3225600 files (0.0% non-contiguous), 6521084/6525179 blocks

To actually repair the filesystem, it needs to be unmounted:

    # fsck /dev/vda1
    fsck from util-linux 2.31.1
    e2fsck 1.44.1 (24-Mar-2018)
    /dev/vda1 is mounted.
    e2fsck: Cannot continue, aborting.

Now I’m going to see whether I can unmount the volume. That requires stopping every process that is using it; otherwise umount returns umount: /: target is busy.

I guess there are a few ways of doing this.

lsof lists all of the Trellis-managed processes (and then some).

I have read that orphaned inode problems are relatively benign, stemming from processes that were interrupted after creating an inode but before getting around to removing it.

UPDATE

I ran du -ax / | sort -rn | head -20 again and noticed that /tmp was looking rather large.

In it were the past four days’ worth of backup files, which I believe are generated by our backup-to-aws script.

-rw-r--r-- 1 web www-data 405061007 Apr 10 12:01 example.com-staging-2020-04-10-0800.tar.gz

1.3 gigabytes each! Just about the difference in growth of the volume! It seems like those tmp files aren’t being deleted.

I checked the cron logs with grep CRON /var/log/syslog. Not very informative.

I’m going to try running the bash script manually, line by line, as the web user (via sudo su web) and see what happens. It needs to be run from current/scripts/.

Aha! bash: /usr/local/bin/aws: No such file or directory.

This reveals a potential problem with my bash script: if any step in that long chain of task && task && ... fails, the script dies silently.
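A common remedy is to check each step explicitly (or start the script with set -euo pipefail) and register cleanup with trap, so a missing aws binary produces an error message and leaves no stray archive behind. A minimal sketch, not the actual backup-to-aws script; backup_and_upload, its UPLOADER argument, and the upload_to_s3 wrapper and bucket in the usage comment are hypothetical:

```bash
# Archive SRC into a temporary .tar.gz, hand it to UPLOADER (a command or
# function that takes the archive path), and always delete the archive.
# Each step reports its own failure instead of dying silently.
backup_and_upload() {
  local src="$1" upload="$2"
  (
    archive="$(mktemp --suffix=.tar.gz)" || exit 1
    # Remove the temporary archive however this subshell exits, so failed
    # runs can't pile up archives in /tmp.
    trap 'rm -f "$archive"' EXIT
    tar -czf "$archive" -C "$src" . \
      || { echo "backup: tar failed for $src" >&2; exit 1; }
    command -v "$upload" >/dev/null \
      || { echo "backup: upload command '$upload' not found" >&2; exit 127; }
    "$upload" "$archive" \
      || { echo "backup: upload of $archive failed" >&2; exit 1; }
  )
}

# Usage with a hypothetical wrapper around the AWS CLI:
#   upload_to_s3() { aws s3 cp "$1" "s3://example-bucket/backups/"; }
#   backup_and_upload /srv/www/example.com/shared/uploads upload_to_s3
```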

For now, running ansible-playbook server.yml -e env=staging --tags aws-cli provisioned the server with the required application.

I manually removed the offending /tmp files.
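As a stopgap until the script is fixed, stale archives can be swept on a schedule. A minimal sketch, assuming GNU find; clean_stale_archives is hypothetical, and the name pattern and one-day age in the usage line are assumptions to adapt:

```bash
# Delete files in DIR (not subdirectories) whose names match PATTERN and that
# are older than DAYS days, printing each one. find truncates a file's age to
# whole days before comparing, so -mtime +1 matches files at least 2 days old.
clean_stale_archives() {
  local dir="$1" pattern="$2" days="$3"
  find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime "+$days" -print -delete
}

# Usage (for example from a daily cron job):
#   clean_stale_archives /tmp 'example.com-*.tar.gz' 1
```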

df -h is still reporting increased usage on /dev/vda1, but I’m hoping that’s just because the space in /tmp hasn’t actually been freed up yet.
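One common reason df keeps reporting the space after rm is that a running process still holds the deleted file open; the blocks are only released when the last file descriptor closes, which restarting that process (or rebooting) forces. lsof +L1 lists such files. A minimal alternative sketch that reads /proc directly, assuming Linux and GNU find; list_deleted_open_files is a hypothetical name:

```bash
# Print open file descriptors whose target file has been deleted.
# Each /proc/<pid>/fd/<n> entry is a symlink; Linux appends " (deleted)"
# to its target once the file is unlinked. Run as root to see every process.
list_deleted_open_files() {
  find /proc/[0-9]*/fd -maxdepth 1 -type l -lname '* (deleted)' \
    -printf '%p -> %l\n' 2>/dev/null || true
}

# Usage: the PID in each /proc/<pid>/fd path identifies the process to restart.
#   list_deleted_open_files | grep /tmp/
```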

strarsis · April 13, 2020, 6:34pm

The ncdu command is similar to the TreeSize tool for Windows.

mZoo · April 13, 2020, 7:31pm

Indeed, after a reboot, the disk space was freed up.