X-Robots-Tag

The X-Robots-Tag is an HTTP response header that controls how search engines like Google crawl and index content on your site. It works much like the meta robots tag, with one key difference: it is applied at the server level, so it can also cover non-HTML files like PDFs, images, and videos. This makes it a flexible tool for managing what content search engines should or shouldn’t index, beyond just web pages.
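To see what this looks like on the wire, here is a sketch of an HTTP response carrying the header (the status line and Content-Type are illustrative; only the X-Robots-Tag line is the directive itself):

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow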

Key Uses of the X-Robots-Tag

  1. Indexing Control for Non-HTML Files: Meta robots tags can only be placed inside HTML pages, but the X-Robots-Tag can control how non-HTML files such as images, videos, and PDFs are indexed by search engines.
  2. Site-Wide Directives: You can apply the X-Robots-Tag globally to cover many files at once (like an entire folder of documents) without adjusting each individual page or file, as the sketch after this list shows. This is particularly helpful if you want to manage indexing directives across large sections of your site.
  3. Custom Directives: You can combine multiple instructions in the X-Robots-Tag, such as noindex (keeping a page or file out of search results) and nofollow (telling crawlers not to follow the links on the page or in the file). This gives you more granular control than the standard robots.txt file, which only manages crawling.
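As a sketch of a site-wide directive, the following Apache configuration (assuming mod_headers is enabled; the /var/www/html/downloads path is hypothetical) keeps every file in a documents folder out of the index:

# Apply the header to everything served from the downloads folder
<Directory "/var/www/html/downloads">
  Header set X-Robots-Tag "noindex"
</Directory>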

When to Use the X-Robots-Tag

  • Non-HTML files: If you want to block or control the indexing of file types like PDFs, images, or videos, setting the X-Robots-Tag in the HTTP response header is an efficient option.
  • Page-specific control: For example, you can keep internal search result pages or a privacy policy out of search results if they aren’t useful to searchers (see the sketch after this list).
  • Greater flexibility: Because the X-Robots-Tag applies directives at the server level, you can roll them out at scale, making it a powerful tool for webmasters who need finer control over how search engines interact with their site.
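For the page-specific case, here is a sketch that keeps internal search result pages out of the index. It assumes an Apache server with mod_headers enabled and a hypothetical /search URL path; note that <Location> blocks belong in the main server or virtual host configuration, not in .htaccess:

# Send the noindex header on every URL under /search
<Location "/search">
  Header set X-Robots-Tag "noindex"
</Location>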

Real-World Example

Let’s say you want to prevent the PDFs on your site from being indexed by search engines. You can add the following X-Robots-Tag directive to your Apache server configuration (or .htaccess file), again assuming mod_headers is enabled:

# Match every file whose name ends in .pdf (the dot is escaped in the regex)
<FilesMatch "\.pdf$">
  # Send the header with each matching response
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

This sends the noindex, nofollow header with every PDF response, so search engines will neither index those files nor follow the links inside them. Keep in mind that a crawler can only see the header if it is allowed to fetch the file, so the PDFs must not also be blocked in robots.txt.
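To confirm the header is actually being sent, you can request just the response headers with curl (the URL here is illustrative):

curl -I https://example.com/whitepaper.pdf

The output should include an X-Robots-Tag: noindex, nofollow line among the response headers.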