504 Time-out - upstream timed out

masoninthesis · January 24, 2018, 6:35am

Hey guys,

A high traffic site that I haven’t touched for about 4 months recently started spitting 504 Gateway Time-out Errors.

Problem is that I don’t have the machine I originally developed this site on, so I’m trying to fix it manually before I get Trellis all set up again.

Here is my error log.

The error is:

upstream timed out (110: Connection timed out) while reading response header from upstream, client: 75.169.196.167, server: 138.68.238.85, request: "POST /ebook/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm-wordpress.sock", host: "138.68.238.85", referrer: "http://138.68.238.85/ebook/"
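If it helps to see whether the timeouts cluster on one URL, here is a minimal Python sketch that pulls the fields out of lines like the one above and counts them per request. The function names are made up for illustration, and the regex only assumes the `key: value` layout visible in this one log line.

```python
import re
from collections import Counter

# Fields nginx appends to the error line, as seen in the log entry above.
FIELD_RE = re.compile(
    r'\b(client|server|request|upstream|host|referrer): ("[^"]*"|[^,]+)'
)


def parse_upstream_timeout(line):
    """Return a dict of fields for an 'upstream timed out' line, else None."""
    if "upstream timed out" not in line:
        return None
    # Quoted values keep their commas; the surrounding quotes are stripped.
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def count_timeouts_by_request(lines):
    """Count upstream timeouts per request line (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        fields = parse_upstream_timeout(line)
        if fields is not None:
            counts[fields.get("request", "unknown")] += 1
    return counts
```

Feeding it the lines of the nginx error log should show quickly whether only `POST /ebook/` is affected or whether other requests time out too.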

Probably worth noting that the easiest way to replicate this error is to put an email in on the /ebook/ page (which is using gravity forms). Since I don’t have the site cloned to my computer yet, I’m just making manual changes to the staging server.

Here’s what I’ve tried so far:

Turning off all plugins (except Gravity Forms, because the site works fine when not using Gravity Forms)
Restarting Server
Restarting nginx
Restarting php7.1-fpm
Manually editing and increasing buffers: fastcgi_buffers 16 16k; and fastcgi_buffer_size 32k;
Restarting nginx, php-fpm, and entire server
Adding fastcgi_read_timeout 180; and fastcgi_read_timeout 600s; to the http block in nginx.conf (right underneath the fastcgi buffer sizes).

Since this site is fairly high traffic, I need to resolve this ASAP and am willing to pay for a consult if needed. Thanks!

Most circulated related articles/resources I’ve tried out:

MWDelaney · January 24, 2018, 6:37am

Is this a Digital Ocean droplet? If so, what size?

masoninthesis · January 24, 2018, 6:39am

Yeah it is a DigitalOcean droplet, 20GB.

MWDelaney · January 24, 2018, 6:41am

Oh, wow. Does the DO control panel show the processor or memory maxing out? Might increasing the memory and processor resolve the issue?

What about setting up microcaching? You’d need Trellis to do that I think.

masoninthesis · January 24, 2018, 6:56am

I’m not exactly sure what I’m looking for on DO, but I suppose the graphs look fine (bandwidth/CPU/etc).

But I’d be surprised if it has to do with traffic, because I have staging and production on their own droplets, and they’re both throwing the exact same 504 errors, despite the fact that nobody goes to the staging site other than me.

Not sure if it’s microcaching, I’ll look into it. Thanks.

alwaysblank · January 24, 2018, 7:22am

Found this thread with a similar problem: “Gravity Forms won’t submit in production”. Have you checked whatever you’re using to send email from the server (SMTP, Mailgun, etc.) to make sure it’s working correctly? If you turn off notifications for Gravity Forms, does it stop throwing errors?
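One way to check the mail side independently of WordPress is to time an SMTP connection from the server itself. Below is a rough Python sketch using the standard `smtplib` module; `probe_smtp` is a made-up helper name, and the host and port you pass in are whatever your own mail settings point at (nothing in this thread states them).

```python
import smtplib
import time


def probe_smtp(host, port, timeout=10.0):
    """Try to open an SMTP session and report how long the attempt took.

    Returns a tuple (ok, elapsed_seconds, detail).
    """
    started = time.monotonic()
    try:
        # The timeout covers both the TCP connect and waiting for the banner.
        with smtplib.SMTP(host, port, timeout=timeout) as client:
            code, _ = client.noop()
            ok = True
            detail = f"connected, NOOP returned {code}"
    except OSError as exc:
        # OSError covers refusals, timeouts and smtplib's own exceptions.
        ok = False
        detail = f"{type(exc).__name__}: {exc}"
    return ok, time.monotonic() - started, detail
```

If a call such as `probe_smtp("smtp.example.com", 587)` (placeholder host) sits there until the timeout expires, a form submission that sends a notification over the same route will probably stall PHP in the same way, which would match the 504s.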

masoninthesis · January 24, 2018, 7:33am

Oh! I’ve seen this thread, and wrote it off because I thought it had mostly to do w/ an AJAX issue.

But, upon disabling my GravityForms notifications, it’s fixing it. It has something to do w/ that. It could be SendGrid or some SMTP issue. I’ll look further and update. Thanks!

swalkinshaw · January 25, 2018, 3:22am

Just for clarification, upstream timed out means Nginx timed out waiting for the upstream server which in this case is obviously fastcgi (php) as you know. So this just means the actual WP/PHP execution is taking way too long.

Increasing fastcgi_read_timeout should work if you increased it enough, but this won’t really fix anything especially for a high traffic site. The real solution is to fix the underlying cause of the long execution time.

alwaysblank · January 26, 2018, 4:53pm

I haven’t used SendGrid, but I use Mailgun all the time, and I’ve had much better luck using the Mailgun WP plugin, which uses the Mailgun HTTP API to send emails instead of SMTP. You might get better results if you send email through the SendGrid API.
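To illustrate the difference, here is a minimal Python sketch of what sending through the Mailgun HTTP API looks like: one HTTPS POST with an explicit timeout, so a mail problem fails fast instead of hanging the request. This is not what the WP plugin does internally, just the general shape; the helper names, domain, and key are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_mailgun_request(domain, api_key, sender, recipient, subject, text):
    """Build (but do not send) a POST to Mailgun's messages endpoint."""
    url = f"https://api.mailgun.net/v3/{domain}/messages"
    body = urllib.parse.urlencode(
        {"from": sender, "to": recipient, "subject": subject, "text": text}
    ).encode("utf-8")
    # Mailgun uses HTTP basic auth with the literal user name "api".
    token = base64.b64encode(f"api:{api_key}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


def send(request, timeout=10):
    """Send the request; raises quickly if the API does not answer in time."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The point of the explicit `timeout` is that the caller decides how long to wait, rather than leaving the web request blocked until nginx gives up.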

masoninthesis · January 27, 2018, 12:02am

I want to try Mailgun, so that’s great to hear! I’m going to set up the plugin and test it immediately. Only issue is that it’s taken over 24 hours now to verify my domain.

Looks like Sendgrid has a plugin too, so I’ll test that meanwhile.

To update my progress on this issue:
The temporary solution: I already had Slack integrated with GravityForms, so I just disabled notifications and had the client use Slack for now. That gave me some breathing room to get to the bottom of the issue.

I spun up a completely new Trellis environment, installed my old theme, synced databases, and the error persisted. So I know it had nothing to do w/ Trellis or Bedrock itself.

I was still getting 504’s on the new staging server, so I changed my Trellis/group_vars/all/mail.yml back to the default, reprovisioned and deployed, and then the 504’s went away, although obviously email won’t work correctly on Trellis’s default mail.yml.

Perhaps something changed on my Sendgrid SMTP login? Or perhaps the port? I’m not sure what the issue is exactly, but I’ll update with my solution once I find it. Thanks @alwaysblank!

Update on fix:

For now, Sendgrid’s WordPress plugin is fixing the issue. From now on, I’ll probably just include this plugin in my default composer.json since it’s so simple to set up. It has a cool stats widget on the WP Dashboard as well.