Troubleshooting slow server response times

I’m working on a project that has grown quite complex and apparently slow.

Local env: Lando (with xdebug) + Bedrock + Sage 10 (Acorn v3)

Prod env: AWS EC2 (t3.medium) + Trellis (Ubuntu 20.04 deploy)

As for the theme, I’ve been trying to keep plugin usage to a bare minimum. Currently: ACF Pro, Gravity Forms, WooCommerce, WooCommerce Subscriptions, WP Rocket, WPML, Yoast SEO and a few other smaller ones that I don’t consider worth mentioning.

Chrome DevTools reports that “Waiting for server response” varies between 2 and 6.5 sec, even though I’ve already refactored the code thoroughly (easily 40+ hours into it), optimized queries and made sure everything is as DRY as possible (to the best of my knowledge after 7+ years with WP and PHP in general).

For those views that are the slowest I’ve tried troubleshooting with the following tools:

Xdebug - a very useful and highly recommended tool for development in general.

Xdebug profiler - seems very promising, but I haven’t had any luck getting it to work with Lando, although the setup is pretty straightforward. I might look into it once more.

Debug Bar - looked into it, but seems quite similar to Query Monitor and I feel Query Monitor is giving more useful information.

Query Monitor - shows that “Page Generation Time” varies between 2.5 and 4 sec, while “Database Queries” takes only ~0.1 sec (~230 SELECTs) and “Peak Memory Usage” is ~55 MB for the most part.

There is a quite in-depth article on using Query Monitor. The “Plugin profiling with Queries by Component” seems like another interesting aspect to look into.

Also, Profiling and Logging in Query Monitor is something that might help diagnose further, but it is a very manual implementation and its usefulness relies heavily on the developer.

Overall, “profiling” seems to stand out as an important keyword here.

Are there any other “must have” tools to use for diagnosing what’s going on server-side (profiling and timings)? Any other ideas?

First rule out DNS issues: how fast does dig look up the IP address of the site? Use ping to check for network issues (look at the ping times and how much they vary).
Then use a tool like wget or curl and check the response times they report. Are they really that large? Do they vary a lot?
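
To make the curl check more specific, curl's `-w` option can print where the time goes for a single request (DNS lookup, connect, TLS, time to first byte). This is only a minimal sketch: the function names `request_timings` and `server_wait` are made up for illustration, and any URL you pass in is your own.

```shell
#!/bin/sh
# Print curl's timing breakdown for one request (all values in seconds).
# The %{...} variables are built into curl's -w/--write-out option.
request_timings() {
  curl -s -o /dev/null \
    -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} pretransfer=%{time_pretransfer} ttfb=%{time_starttransfer} total=%{time_total}\n' \
    "$1"
}

# Approximate server "think time": time to first byte minus the time spent
# getting ready to send the request (DNS + TCP + TLS).
#   $1 = pretransfer value, $2 = ttfb value
server_wait() {
  awk -v pre="$1" -v ttfb="$2" 'BEGIN { printf "%.3f\n", ttfb - pre }'
}
```

For example, run `request_timings https://example.com/` (placeholder URL) a few times and feed the `pretransfer` and `ttfb` values to `server_wait`. If `dns` and `connect` are small but `ttfb` is large, the time is being spent on the server rather than on the network.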

(Besides profiling) I achieved significant performance improvements by using an object cache (Redis-based in that case).

PHP 8 also improved performance compared to PHP 7.x.

As @strarsis said, DNS is a good place to start.

Also, how are you running WP Cron? Apologies if you’ve already seen this, but disabling WP Cron and running via system crontab is a good idea.
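
A rough sketch of that setup, assuming WP-CLI is installed on the server: WordPress stops spawning cron on page loads once the `DISABLE_WP_CRON` constant is true, and a system crontab entry runs the due events instead. The helper name `wp_cron_entry`, the site path and the five-minute interval are placeholders, not values from this thread.

```shell
#!/bin/sh
# Prerequisite (set in the WordPress config, not here):
#   DISABLE_WP_CRON must be true so page loads no longer trigger wp-cron.php.

# Build a crontab line that runs due WP-Cron events through WP-CLI.
#   $1 = directory WP-CLI should run from (hypothetical path)
#   $2 = interval in minutes
wp_cron_entry() {
  printf '*/%s * * * * cd %s && wp cron event run --due-now >/dev/null 2>&1\n' "$2" "$1"
}

# Preview the line; it would then be added with `crontab -e` as the web user.
wp_cron_entry /srv/www/example.com/current 5
```

Printing the entry first, rather than installing it directly, makes it easy to check the path and interval before touching the crontab.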

As for profiling, running the Xdebug profile output through a visualisation tool like kcachegrind will be very enlightening. I’d push on with that task and see what shows up.
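
For what it's worth, here is a sketch of capturing a single profile with Xdebug 3. It assumes the PHP config has `xdebug.mode` set to include `profile`, `xdebug.start_with_request=trigger`, and an `xdebug.output_dir` you can read. The helper names, the URL and the `/tmp` fallback are placeholders.

```shell
#!/bin/sh
# Append the XDEBUG_TRIGGER parameter to a URL, respecting an existing query string.
with_xdebug_trigger() {
  case "$1" in
    *\?*) printf '%s&XDEBUG_TRIGGER=1\n' "$1" ;;
    *)    printf '%s?XDEBUG_TRIGGER=1\n' "$1" ;;
  esac
}

# Request the page once with profiling triggered, then list the newest
# profile files (cachegrind.out.* is Xdebug's default file name pattern).
#   $1 = URL, $2 = xdebug.output_dir (assumed to be /tmp if omitted)
profile_request() {
  curl -s -o /dev/null "$(with_xdebug_trigger "$1")"
  ls -t "${2:-/tmp}"/cachegrind.out.* 2>/dev/null | head -n 3
}
```

Something like `profile_request https://example.test/my-account/` should then leave a file that kcachegrind can open. With a containerised setup such as Lando the files land inside the container, so the listing step would have to run there.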

As casual advice - I know Query Monitor seems to indicate that the issue isn’t database-bound, but in my experience, large WooCommerce sites suffer from less-than-optimal indices.

We found adding appropriate indices helped a lot. There’s a really useful plugin which can improve the default WP index setup out of the box, and can give you a hand in setting up more to better optimise your application: Index WP MySQL For Speed (on WordPress.org).

That looks promising, I will try that, too!

I went over the recommendations.

The DNS side seems to be fine. DNS is managed in Cloudflare, the dig command responds instantly (I don’t see any timings there), and ping is a stable 10-13 ms (rarely a single 20-30 ms one) that doesn’t seem to be affected at all when I request the slow pages while pinging.

For public pages, wget and curl requests seem instant as well. wget indicates that the homepage is ~300K and loads in 0.06s. Testing private pages (where the main problem is) seems more complicated due to authentication. I’m not entirely sure what else to look for in the wget and curl results.
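
One possible way around the authentication hurdle is to reuse the browser's logged-in session: copy the `wordpress_logged_in_…` cookie (name and value) from DevTools and pass it to curl with `-b`. This is a sketch that assumes the cookie alone is enough for the page in question. The function names, URL and cookie string are placeholders.

```shell
#!/bin/sh
# Time to first byte (seconds) for one request sent with a session cookie.
#   $1 = URL, $2 = cookie string copied from the browser ("name=value")
ttfb_with_cookie() {
  curl -s -o /dev/null -b "$2" -w '%{time_starttransfer}\n' "$1"
}

# Summarise a column of timings read from stdin.
summarize() {
  awk '{ v = $1 + 0
         if (NR == 1 || v < min) min = v
         if (NR == 1 || v > max) max = v
         sum += v }
       END { if (NR) printf "n=%d min=%.3f max=%.3f avg=%.3f\n", NR, min, max, sum / NR }'
}

# Sample the same page several times and summarise the spread.
#   $1 = URL, $2 = cookie string, $3 = number of requests (default 5)
sample_private_page() {
  i=0
  while [ "$i" -lt "${3:-5}" ]; do
    ttfb_with_cookie "$1" "$2"
    i=$((i + 1))
  done | summarize
}
```

Running `sample_private_page` against one of the slow views shows whether the 2-6.5 sec wait seen in the browser reproduces outside the browser and how much it varies between requests.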

Tried out the “Index WP MySQL For Speed” plugin - no effect. I think it’s worth mentioning that the WooCommerce side is pretty light - only 7 products. No extra complexity or custom mods - only some hooks to do something when specific products are purchased.

PHP version was already 8.0. Launched a new EC2 instance with Ubuntu 22.04, upgraded Trellis to v1.12.1 and provisioned the server again - no effect.

Upgraded EC2 instance type to t3.xlarge - no effect.

Moved the database to RDS (same region) - no effect. It even seemed the response times got a bit slower.

I’m yet to check if switching WP Cron to system crontab does anything and look into object cache.

As for object cache, @strarsis, were you referring to the FastCGI caching described in the Trellis documentation?

# ./trellis/group_vars/production/wordpress_sites.yml

cache:
  enabled: true
  skip_cache_uri: /wp-admin/|/wp-json/|/xmlrpc.php|wp-.*.php|/feed/|index.php|sitemap(_index)?.xml|/store.*|/cart.*|/my-account.*|/checkout.*|/addons.*
  skip_cache_cookie: comment_author|wordpress_[a-f0-9]+|wp-postpass|wordpress_no_cache|wordpress_logged_in|woocommerce_cart_hash|woocommerce_items_in_cart|wp_woocommerce_session_

It’s enabled in my case with the configuration above. Doesn’t seem to have a significant effect either.

I must try harder and get the Xdebug Profiling to work… mind boggling.

No, I don’t mean the nginx microcaching (page cache), which is another helpful layer of caching. I mean object caching for WordPress, e.g. with memcached or Redis.

I also encountered a similar issue and funnily it was one rogue plugin that erroneously performed a costly HTTP request on each and every request (it was the Weather Station plugin).

Were you able to profile the site? Does the profiler point to something that takes a very long time on each request?

“CPU on server maxes out on almost every request” is a long but really useful thread on a similar performance troubleshooting journey.

Installing Tideways as a trial should be helpful.

Rarst/laps is another performance profiling tool for WP.

Just wanted to second this recommendation: This exact process (get a profile dump and run it through kcachegrind) helped me debug a similar issue years ago by pointing to the exact function calls that were eating up all the time (in my case it was some “clever” runtime image resizing).

I don’t have experience with Lando, but it looks like they’ve got instructions for setting up Xdebug: “Lando + PhpStorm + Xdebug” in the Lando docs. I’d strongly recommend getting that up and running; being able to profile or step through requests is extremely helpful.

Found out the issue thanks to a combination of “Rarst/laps” plugin suggested by @Punkwart and “Profiling and Logging in Query Monitor” that I mentioned in my initial post.

Rarst/laps plugin provided a very easy implementation of “light WordPress profiler” as the plugin documentation states. It helped me pinpoint the issue to “Template Loader” hook (template_redirect). Going to remember this one! Although I should definitely get Xdebug profiling to work as well in the future (when I make up the lost time… damn).

I then used the custom qm/start and qm/stop profiling actions from “Profiling and Logging in Query Monitor” to see where the code takes a significant amount of time to run:

// Start the 'foo' timer:
do_action( 'qm/start', 'foo' );

// Run some code
my_potentially_slow_function();

// Stop the 'foo' timer:
do_action( 'qm/stop', 'foo' );

Overall, big thanks to everyone who contributed to this thread and provided valuable information!

Hi @RistoKaalma,

I’m really happy to hear that you were successful.
I usually find answers in the roots.io forum as well.

I should also take the time to optimize the performance of one of my sites - thanks for sharing your solution and details with us, it will help me and others.

See you soon!

I’m really happy with New Relic for server monitoring. It does take a bit to set up, but then it’s invaluable.

New Relic can show the aggregate response times of WordPress over time. It monitors RAM, CPU, PHP, nginx load and MySQL queries. It monitors plugin and theme response times and internal and external hook times. It can also ingest logs and relate requests, server responses, response times, and errors to each other. You can zero in on slices of time and set alerts.

Query Monitor is awesome for looking at a single page load; New Relic is awesome for aggregate data over time.

Might check this out in the future, thanks for sharing!