502 after Acorn update

abel-sch · March 3, 2023, 1:09pm

Hi!

I have recently updated Acorn to v3, which seemed to be going smoothly.

However… after a composer update yesterday my project started returning 502s locally. I reverted those changes and reinstalled the Composer dependencies, thinking that would fix it, but it did not.

Now I have fallen deep into a hole: even after going back in the Git history to a point where the project most definitely worked, installing all dependencies, and destroying and rebuilding the Vagrant box, I can’t get it back up. Running wp acorn optimize:clear does not seem to do anything for me.

I have had success reaching /wp/wp-admin by removing ACF Composer, which led me to believe Acorn and ACF Composer might not be playing nicely anymore. Weirdly, I narrowed it down to a Field with a flexible content block where I load layouts like this: ->addLayout($this->get(ExternalLinks::class)). It breaks whenever one of these layouts has a group with a repeater inserted.

I know this is really specific. Could this maybe be a memory issue?

I find it strange that I’m getting 502s instead of a useful PHP error. All I have to go on is this line in the error.log:
2023/03/03 14:04:14 [error] 858#858: *254 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.56.1, server: staatvandeuitvoering.test, request: "POST /wp/wp-admin/admin-ajax.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm-wordpress.sock:", host: "staatvandeuitvoering.test", referrer: "https://staatvandeuitvoering.test/wp/wp-admin/edit.php?post_type=page"

talss89 · March 3, 2023, 1:26pm

This is a strange one, isn’t it! From the log format, I assume you’re running Nginx.

recv() failed (104: Connection reset by peer) while reading response header from upstream

This indicates that the PHP-FPM upstream behind the unix socket /var/run/php-fpm-wordpress.sock dropped the connection before Nginx received a response. You’re getting a 502 Bad Gateway because Nginx is acting as the gateway here, and its upstream (PHP-FPM, via that socket) did not return a valid response.

It could well be memory related. Here’s what I’d check:

Is PHP-FPM running?
Is PHP-FPM listening on /var/run/php-fpm-wordpress.sock?
What do the PHP-FPM logs report?
Can you see any OOM / kill events logged via dmesg | egrep -i 'killed process'?
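
A rough sketch of how those four checks could be run from inside the VM. The socket path is the one from the Nginx log; the systemd unit name (php8.0-fpm) is an assumption and depends on your PHP version, and the function names are just placeholders. Note that journalctl only shows what the service wrote to the journal; PHP-FPM's own error log is usually a separate file set by error_log in php-fpm.conf.

```shell
#!/bin/sh
# Sketch only: SOCK comes from the Nginx error log; SERVICE is an assumed
# systemd unit name and will differ per PHP version.
SOCK="${SOCK:-/var/run/php-fpm-wordpress.sock}"
SERVICE="${SERVICE:-php8.0-fpm}"

# Report whether a unix socket file exists at the given path.
socket_status() {
    if [ -S "$1" ]; then
        echo "socket present: $1"
    else
        echo "socket missing: $1"
    fi
}

# Run the four checks in order; every command here only reads state.
check_fpm() {
    systemctl is-active "$SERVICE"              # 1. is PHP-FPM running?
    socket_status "$SOCK"                       # 2. does the socket file exist...
    ss -xl | grep -F "$SOCK"                    #    ...and is something listening on it?
    journalctl -u "$SERVICE" -n 50 --no-pager   # 3. recent journal lines for the service
    dmesg | grep -Ei 'killed process'           # 4. OOM killer events (may need sudo)
}
```

Source the file and call check_fpm; no output from the last command simply means the kernel did not log any OOM kills.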

The composer update and ACF issues could well be leading you down the wrong path, even though retracing your steps is definitely a good starting point. Now that you’ve rolled back via Git and recreated the VM, I’d work through the checks above.

abel-sch · March 3, 2023, 2:26pm

Thanks for your suggestions!

PHP-FPM seems to be running fine:

● php8.0-fpm.service - The PHP 8.0 FastCGI Process Manager
     Loaded: loaded (/lib/systemd/system/php8.0-fpm.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-03-03 15:13:37 CET; 3min 43s ago
       Docs: man:php-fpm8.0(8)
    Process: 79087 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
   Main PID: 75287 (php-fpm8.0)
     Status: "Processes active: 0, idle: 3, Requests: 15, slow: 0, Traffic: 0req/sec"
      Tasks: 4 (limit: 4619)
     Memory: 181.4M
     CGroup: /system.slice/php8.0-fpm.service
             ├─75287 php-fpm: master process (/etc/php/8.0/fpm/php-fpm.conf)
             ├─79405 php-fpm: pool wordpress
             ├─79406 php-fpm: pool wordpress
             └─79408 php-fpm: pool wordpress

I believe php-fpm is listening on /var/run/php-fpm-wordpress.sock, as I can make the 502s disappear by commenting out the fields. How do I check this?

The PHP-FPM logs show:
[03-Mar-2023 15:17:58] WARNING: [pool wordpress] child 79469 exited on signal 11 (SIGSEGV - core dumped) after 2.943923 seconds from start

dmesg | egrep -i 'killed process' does not return anything

talss89 · March 3, 2023, 2:33pm

Take a look at /etc/php/8.0/fpm/php-fpm.conf - this is the configuration for your 8.0 PHP process manager, although if commenting out your ACF fields fixes the issue, then PHP-FPM is listening correctly.
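
If you do want to confirm the socket path from the config, something like this prints the listen value of each pool. It is only a sketch: the helper name is made up, and the default directory is an assumption based on the php-fpm.conf path above (pool files commonly live in a pool.d directory next to it).

```shell
#!/bin/sh
# Print the "listen" value of every pool defined under a PHP-FPM config dir.
# The default directory is an assumption derived from the php-fpm.conf path.
fpm_listen_paths() {
    dir="${1:-/etc/php/8.0/fpm}"
    # Match "listen = ..." only, not "listen.owner", "listen.mode", etc.
    grep -rEh '^[[:space:]]*listen[[:space:]]*=' "$dir" \
        | sed 's/^[[:space:]]*listen[[:space:]]*=[[:space:]]*//'
}
```

Running fpm_listen_paths with no arguments searches the assumed default directory; the value it prints should match the socket named in the Nginx log.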

SIGSEGV is the segmentation fault signal, sent by the kernel when a process makes an invalid memory access. With PHP a common cause is an exhausted call stack, probably from infinite recursion, which points back to your ACF configuration. Could you share your flexible content block code?
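
To see whether the crashes line up with the requests that load those fields, it can help to count the SIGSEGV exits per pool. A minimal sketch that reads a PHP-FPM log on stdin and relies only on the line format you posted; the function name is hypothetical.

```shell
#!/bin/sh
# Count PHP-FPM workers that exited on SIGSEGV, grouped by pool name.
# Reads log lines on stdin; expects the "[pool NAME] child ... exited on
# signal 11 (SIGSEGV ..." format shown in the log excerpt in this thread.
count_segfaults() {
    grep -F 'exited on signal 11 (SIGSEGV' \
        | sed -n 's/.*\[pool \([^]]*\)].*/\1/p' \
        | sort | uniq -c
}
```

For example: sudo cat /var/log/php8.0-fpm.log | count_segfaults. That log path is an assumption; use whatever error_log points to in your php-fpm.conf.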

abel-sch · March 3, 2023, 2:57pm

In desperation I updated Trellis (was on 1.14.0) to the latest release (1.20). I really have no clue what could have caused, or solved, this, but it’s back up.

Thanks for your support @talss89