Correct method for hiding production site from search engines

Hey guys,

I've got a site where the client wants to hide it from search engines but still keep it live before their product launches.

I tried adding the following to the config/application.php file with the other custom settings:
Config::define('DISALLOW_INDEXING', true);

However the robots.txt file is still showing the following on production:

User-agent: *
Disallow: /wp-admin/
Disallow: /*s=*
Disallow: /xmlrpc.php
Allow: /wp-admin/admin-ajax.php

I tested and can confirm that the DISALLOW_INDEXING constant is showing as true on the website. However, the "hide from search engines" option was unticked and the robots.txt file remained as above. Ticking that option doesn't appear to change the robots.txt file either.

For now I’ve manually created a robots.txt file and thrown in a robots meta tag for good measure, but what is the correct process for this?


@Steve_de_Niese Did you ever find a more automated ‘Trellis-based’ solution, or have you stuck with the manually edited robots.txt file? I’ve got a client that needs the production server not to be indexed. Thanks!

I’ve added the robots_tag_header config in ./group_vars/production/wordpress_sites.yml and have re-provisioned the server.

robots_tag_header:
  enabled: true

When this is set to true, the template at ./trellis/roles/wordpress-setup/templates/wordpress-site.conf.j2 adds the X-Robots-Tag header line:

{% block robots_tag_header -%}
  {% if robots_tag_header_enabled -%}
  # Prevent search engines from indexing non-production environments
  add_header X-Robots-Tag "noindex, nofollow" always;

  {% endif -%}
  {% endblock -%}

This added the line to the site’s NGINX conf file:

add_header X-Robots-Tag "noindex, nofollow" always;
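If you want to confirm the header is really being served after re-provisioning, you can request a page and look at the response headers. Here is a minimal Python sketch for that. It is not part of Trellis, the function names are made up for illustration, and it only understands plain values like "noindex, nofollow" (not the form prefixed with a user agent).

```python
from urllib.request import Request, urlopen


def robots_directives(header_value):
    """Split a value such as "noindex, nofollow" into a set of lowercase directives."""
    return {part.strip().lower() for part in header_value.split(",") if part.strip()}


def blocks_indexing(header_value):
    """Return True if the X-Robots-Tag value asks crawlers not to index the page."""
    directives = robots_directives(header_value or "")
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


def fetch_x_robots_tag(url):
    """Send a HEAD request and return the X-Robots-Tag header, or None if it is absent."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get("X-Robots-Tag")
```

Calling blocks_indexing(fetch_x_robots_tag("https://example.com/")) with your own production URL in place of the placeholder should return True once the NGINX line above is live.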

Hope this helps.

@Steve_de_Niese if you want to hide content from search engines, you need to forget about robots.txt for the moment.

First, you must tag your pages as noindex. If you block search engines from crawling your website (using robots.txt), they never get to see the noindex directive, so they cannot remove already-indexed pages from their index. You could add a <meta name="robots" content="noindex, nofollow"> tag to your HTML pages, or you can send an X-Robots-Tag HTTP header from your web server (NGINX, Apache, or LiteSpeed).

For example:

X-Robots-Tag: noindex, nofollow, nosnippet, noarchive
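To illustrate why robots.txt and noindex work against each other, here is a rough Python sketch using the standard library's urllib.robotparser. The function name is hypothetical and example.com is a placeholder. Python's parser does simple prefix matching, so it will not treat wildcard rules the way Google does; take it as an illustration of the idea only.

```python
from urllib.robotparser import RobotFileParser


def crawler_can_see_noindex(robots_txt, url, user_agent="*"):
    """A crawler only reads a page's noindex directive if robots.txt lets it fetch the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocking everything means crawlers never fetch the page, so they never see noindex.
blocked = "User-agent: *\nDisallow: /\n"

# Only blocking the admin area leaves the front end crawlable, so noindex can be read.
open_front_end = "User-agent: *\nDisallow: /wp-admin/\n"

print(crawler_can_see_noindex(blocked, "https://example.com/"))
print(crawler_can_see_noindex(open_front_end, "https://example.com/"))
```

The first call returns False and the second returns True, which is the reason to leave the pages crawlable and rely on the noindex tag or header instead.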

My two cents here: I prefer using an under-construction plugin. This shows an under-construction/maintenance page or message to logged-out visitors, while logged-in users can still view and interact with the site.

If you want a site that runs in parallel to the production site and is used for previewing, that would be a candidate for a staging site. Bedrock sites already include the Bedrock Disallow Indexing plugin, which disallows indexing when the DISALLOW_INDEXING constant is true (which is the case for the staging and development environment configs in Bedrock sites):
GitHub - roots/bedrock-disallow-indexing: Disallow indexing of your site on non-production environments.