Is the restart of PHP-FPM on each deploy necessary?

charlestroluxe · November 8, 2021, 1:14pm

One of the last things that Trellis does on a deploy is restart the PHP-FPM service. Is there a reason it has to do this, or is it just good housekeeping? We have a number of Trellis-deployed sites on the same server, so, when we deploy updates to one of them, there is the potential for anyone who is logged into the back end of any of them to see a temporary Bad Gateway error while PHP-FPM comes back up. If it’s imperative, we’ll work around it; if not, I was flirting with disabling the restart.

strarsis · November 8, 2021, 4:28pm

php-fpm is actually reloaded and not restarted:

github.com/roots/trellis/blob/17430191bb7211545eb63ba3ba989ee95c262c5f/roles/deploy/hooks/finalize-after.yml#L36

        when: project.update_db_on_deploy | default(update_db_on_deploy) and project.multisite.enabled | default(false)

      - name: Update WP database
        command: wp core update-db {{ project.multisite.enabled | default(false) | ternary('--network', '') }}
        args:
          chdir: "{{ deploy_helper.current_path }}"
        when: project.update_db_on_deploy | default(update_db_on_deploy)

      when: wp_installed.rc == 0

    - name: Reload php-fpm
      shell: sudo service php{{ php_version }}-fpm reload
      args:
        warn: false

Reloading should (if all works well) gracefully end all existing connections and seamlessly hand over to new php-fpm processes.

charlestroluxe · November 8, 2021, 4:49pm

Hmm, you’re right. It does reload. And yet, if I’m clicking around in the backend when it does it, I get a 502. And if I look at the NGINX error.log at that time, I see a series of these:

2021/11/08 08:14:18 [error] 23955#23955: *12105796 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: <redacted>, server: <redacted>, request: "GET /example/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm-wordpress.sock:", host: "example.com", referrer: "https://example.com/example/"

Given what you’re saying, though, this shouldn’t happen?
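
To check whether these errors really line up with deploys, it can help to count them per minute. The following is only a rough sketch: the function name is made up for illustration, and the log path in the usage comment is an assumption that depends on how NGINX logging is set up on your server.

```shell
#!/bin/sh
# Hypothetical helper: count "Connection reset by peer" upstream errors
# per minute in an NGINX error log, so spikes can be compared with deploy times.
count_upstream_resets() {
    log_file="$1"
    # Keep only the upstream reset errors, reduce each line to
    # "<date> <HH:MM>", then count how many fall into each minute.
    grep -F 'recv() failed (104: Connection reset by peer)' "$log_file" \
        | awk '{ print $1, substr($2, 1, 5) }' \
        | sort \
        | uniq -c
}

# Usage (the path is an assumption, adjust it to your site's error log):
#   count_upstream_resets /var/log/nginx/error.log
```

If the counts cluster in the same minute as each deploy, that points at the reload rather than at a general PHP-FPM stability problem.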

charlestroluxe · November 8, 2021, 4:59pm

I’m assuming that I should be increasing the process_control_timeout variable here. But I notice that it’s not in the roles/wordpress_setup/templates/php-fpm.conf.j2 file by default. Obviously, I could add it in, and then define it, but before I do I’d like to know there’s not a good reason it was left out in the first instance.

strarsis · November 8, 2021, 8:03pm

Good question. From what I have read, php-fpm had this issue primarily with PHP 5.x.
You are quite probably using php-fpm 7.x or even 8.x, so apparently the issue is still there.

Without bugs and unexpected side-effects, a server reload should not result in terminated or failed connections.

charlestroluxe · November 11, 2021, 11:28pm

I changed the process_control_timeout setting to 10 seconds, and this problem went away. The default is 0.

You can find this setting in /etc/php/7.4/fpm/php-fpm.conf (assuming you’re using 7.4; if not substitute accordingly).

The entry should look like this:

; Time limit for child processes to wait for a reaction on signals from master.
; Available units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
process_control_timeout = 10
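
For anyone who wants to confirm which value is actually active before reloading, here is a small sketch. The helper name is made up for illustration, and the paths and binary name in the usage comments assume PHP 7.4 on a Debian/Ubuntu-style install.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) value of a directive
# in a php-fpm config file. Lines starting with ";" are comments and are skipped.
fpm_setting() {
    conf_file="$1"
    key="$2"
    # Strip "key = " from matching lines and print the rest; if the key
    # appears more than once, the last occurrence is reported.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$conf_file" | tail -n 1
}

# Usage (paths and version are assumptions, substitute your own):
#   fpm_setting /etc/php/7.4/fpm/php-fpm.conf process_control_timeout
#   sudo php-fpm7.4 -t                # syntax-check the config before reloading
#   sudo service php7.4-fpm reload    # same reload command Trellis runs on deploy
```

An empty result means the directive is not set in that file, in which case the default of 0 applies.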

strarsis · November 12, 2021, 12:23pm

Does this improve the server performance or stability?
If so, it should be added to Trellis.

charlestroluxe · November 12, 2021, 1:31pm

I’ve noticed no difference to the performance, although it’s early. I have noticed that it prevents users who are logged into the backend from getting 502s during deploys, which is what I was trying to achieve.

strarsis · November 12, 2021, 1:44pm

Those HTTP 502s were directly shown in user browsers/clients, disrupting the experience/operations?
If this is the case, it may make sense to increase that value from 0 in the defaults.

charlestroluxe · November 12, 2021, 2:01pm

That’s correct. They also showed up in the logs at the exact same time, in this format:

2021/11/08 08:14:18 [error] 23955#23955: *12105796 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: <redacted>, server: <redacted>, request: "GET /example/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm-wordpress.sock:", host: "example.com", referrer: "https://example.com/example/"

strarsis · November 12, 2021, 2:46pm

If you like, you could create a new issue in the Trellis repository for changing the default value to something non-zero (e.g. 10 as you ended up with) in order to fix these kinds of issues.

swalkinshaw · November 18, 2021, 3:42am

For reference, PHP-FPM needs to be reloaded so PHP recognizes the new underlying path since Trellis creates a new release folder for each deploy and updates the current symlink to point to that one.
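
The release switch described above can be illustrated with a throwaway directory. This is only a sketch: the layout (`releases/1`, `releases/2`, `current`) and the function name are hypothetical stand-ins, not the exact paths Trellis uses.

```shell
#!/bin/sh
# Hypothetical demo: show how the "current" symlink points at a different
# release directory after a deploy, while the "current" path itself is unchanged.
demo_release_switch() {
    root=$(mktemp -d)
    mkdir -p "$root/releases/1" "$root/releases/2"

    # First deploy: "current" points at release 1.
    ln -s "$root/releases/1" "$root/current"
    before=$(readlink "$root/current")

    # Second deploy: repoint "current" at release 2 (-n replaces the link
    # itself instead of creating a new link inside the old target).
    ln -sfn "$root/releases/2" "$root/current"
    after=$(readlink "$root/current")

    # Print only the last path component of each target.
    echo "before: ${before##*/}"
    echo "after: ${after##*/}"

    rm -rf "$root"
}

# Usage:
#   demo_release_switch
```

Long-running PHP-FPM workers that started before the switch may still be working with the old target, which is why the reload step exists.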
