Roots Discourse

Site disappeared from search engines

Hello,

Can anyone help me troubleshoot why a two-year-old site with thousands of users a day has recently disappeared from search engines/Google? The only recent change was switching servers from DigitalOcean to AWS, for which a new environment (called ‘aws’) was created. WP_ENV was not initially set to production, but I have since redeployed the server and explicitly declared that variable. Here is all I know:

(1) I tried to simply resubmit it through Webmaster Tools, but Google Search Console says that the “URL is not available to Google”, with the reason being “Excluded by ‘noindex’ tag”. This didn’t seem to be the case, and moreover, I have since added: <meta name='robots' content='index, follow' />

(2) The site uses The SEO Framework plugin. I have gone through every option to ensure it doesn’t interfere, and I also disabled it, with no real change.

(3) Under Settings > Reading, the “Discourage search engines from indexing this site” checkbox is not enabled.

(4) There is an age-gate in place, but it still has all the proper meta tags that a search engine should read. I have since also created a whitelist so that all crawlers can access the content.

(5) We have a staging site that has a duplicate database in place, but it is not public facing and forces a login to access. For some reason, it doesn’t display the notice from the bedrock-disallow-indexing mu-plugin saying that it is excluded from being indexed, as some of my other sites do. Just to be sure, I have deleted that plugin from the AWS production server.

None of this has worked, and the URL inspector in Google Search Console still refuses to reindex the site:

I’m all out of ideas. The only thing I can think of is that there may be additional NGINX configuration or security/access rules at the server level, put in place by the IT team that set up the production server, as they have always made security a top priority.

Any ideas at all would be greatly appreciated as I am starting to run out of leads at this point.

Thank you!

This happens at the HTTP header level (`X-Robots-Tag`). It can be caused either by the HTTP server config or by PHP (a WordPress plugin or a WordPress theme, though the latter would usually be bad practice anyway).
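One way to confirm this from the outside is to look at the response headers directly. The following is only a minimal sketch, assuming `curl` is available; the function name `has_noindex_header` and the URL `https://example.com/` are placeholders and not something from this thread:

```shell
#!/bin/sh
# Hypothetical helper (name is made up for illustration): reads raw HTTP
# response headers on stdin and succeeds (exit status 0) if any
# X-Robots-Tag header line contains "noindex".
has_noindex_header() {
    grep -i '^x-robots-tag:' | grep -qi 'noindex'
}

# Usage (placeholder URL; replace it with the affected site):
#   curl -sI https://example.com/ | has_noindex_header \
#       && echo "a noindex X-Robots-Tag header is being sent"
```

Note that `curl -sI` sends a HEAD request. If the server or an age-gate treats crawlers differently, the headers Google sees may differ from what this shows.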

What Trellis release are you using? Any special modifications on top of it?

Can you set up a staging server that is publicly reachable by Google, with HTTPS (Let’s Encrypt)?
Then you can disable all plugins and check whether this header goes away.
If it does not, it is caused by the web server configuration.
Otherwise, a particular plugin is responsible.
You can also do a code search through the whole installed site for X-Robots-Tag.
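The code search mentioned above could look roughly like this. It is a sketch, not a tool provided by Trellis: `find_robots_header` is a made-up name, and the directories in the usage line (`/etc/nginx` for the server config, `/srv/www` for the site files) are assumptions about a typical server layout, so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper (name is made up for illustration): prints the path
# of every file under the given directories that mentions X-Robots-Tag.
# -r recurse, -I skip binary files, -i ignore case, -l print file names only.
find_robots_header() {
    grep -rIil 'x-robots-tag' "$@" 2>/dev/null
}

# Usage (paths are assumptions; reading /etc/nginx may require sudo):
#   find_robots_header /etc/nginx /srv/www
```

A match in the nginx config points at the web server, while a match in a plugin or theme directory points at PHP.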

Also note (although I guess you already checked that first) that WordPress core itself has an option to discourage search engines from indexing the site:

Hey that’s super helpful, thank you for the response @strarsis, much appreciated!

After a grep search for X-Robots-Tag on the production server, it turned out that the nginx conf file /etc/nginx/sites-available/{sitename}.com.conf contained the following, which prevented the indexing:

Commenting that out and restarting nginx solved the issue. Any idea how it got there in the first place, and should I do anything differently on the next provision/deploy to ensure it doesn’t make it in there again?

Thanks again!

Trellis adds the configuration for these headers for non-production environments:


Check the Ansible environment for your production system: is it truly production?

Do you know if that’s based on the WP_ENV variable or simply on whether the hosts file is called “production”? For instance, our “production” is actually “aws” (alongside the regular “development” and “staging” hosts), and I was curious whether that would have caused it. Thanks again for all your help!

This is not based on an environment (env) variable during (PHP) runtime.
Rather, it is based on the env variable given to Ansible during server provisioning (applying the Trellis playbook).

From the provisioning instructions for Trellis:

Using the Trellis CLI tool:
trellis provision production

Or manually:
$ ansible-playbook server.yml -e env=<environment>
When invoking ansible-playbook, an extra variable env is passed (via -e) with the environment name as its value (e.g. production).

