The ROBOTS.TXT File
The ROBOTS.TXT file is a text document that establishes crawling guidelines for Bots to follow when exploring your Website.
Bots (also called Spiders or Crawlers) are used by search engines to access your Website and index the content (text, images, files…) of its pages.
WHAT IS IT FOR?
With the ROBOTS.TXT file we can prevent Bots from accessing a certain folder of our Website. We can also keep a specific Bot from crawling our Website altogether, or limit its crawling frequency. Some of the reasons we might want to do this are:
– Avoid duplicate content. This is the most important reason: keeping duplicate pages out of the index helps us rank higher in search engines, thus increasing our traffic.
– Reduce server overload caused by an excess of search engine requests that could saturate it.
– Prevent the indexing of certain pages that you want to be accessible to users but, for privacy reasons, kept out of Google's index.
We can also reference our Website's sitemap (the SITEMAP.XML file) to tell Bots the URLs of all the pages of our site, as shown in the sketch below.
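Putting these rules together, a hypothetical ROBOTS.TXT file could look like the sketch below. The folder name and domain are illustrative placeholders, and the Crawl-delay rule is a non-standard extension: some crawlers (such as Bing) honor it, while Google ignores it.

User-agent: *
# Keep all Bots out of a folder with duplicate or private content (placeholder path)
Disallow: /private-folder/
# Ask compliant crawlers to wait 10 seconds between requests (not all Bots honor this)
Crawl-delay: 10

# Tell Bots where to find the Sitemap (placeholder URL)
Sitemap: https://www.example.com/sitemap.xml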
WHAT IS IT NOT FOR?
As we said before, the ROBOTS.TXT file only establishes crawling guidelines, and Bots may not honor your rules; this is especially true of the so-called «bad bots», whose only purpose is to crawl your Website searching for e-mails, private data or vulnerabilities.
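For instance, the rules below ask a specific crawler to stay out of the entire site («BadBot» is a placeholder name). A well-behaved Bot will obey them, but compliance is entirely voluntary and a malicious bot will simply ignore them:

User-agent: BadBot
# Disallowing the root path excludes the whole Website, but only for Bots that choose to comply
Disallow: /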
If you have sensitive information on your Website and you don’t want bots to crawl it, you should use other security means to protect it.
Nor can the ROBOTS.TXT file protect your Website from Hackers using «brute force» attacks.
HOW TO CREATE A ROBOTS.TXT FILE
You can use an online generator to create the ROBOTS.TXT file, although we highly recommend you follow Google's instructions and create it manually. You can also read the Wikipedia article about the Robots exclusion standard.
It has to be located in the root folder of your Website, the same as the FAVICON and the SITEMAP.
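As a reference, this is the simplest valid ROBOTS.TXT file: it applies to all Bots and, because the Disallow: value is left empty, it blocks nothing, so the whole Website can be crawled:

User-agent: *
Disallow: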
ROBOTS.TXT FILE EXAMPLE
This MetricSpot’s ROBOTS.TXT file:
Disallow: /new/
Disallow: /tos/
Disallow: /items/
Disallow: /no/
Disallow: /condiciones-de-uso/
Disallow: /blog/cat/
Disallow: /blog/tag/
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Disallow: /blog/wp-content/plugins/
Disallow: /blog/wp-content/themes/
Disallow: /blog/feed/
Disallow: /api/
Disallow: /*.js$
Disallow: /*.css$
Sitemap: https://metricspot.com/sitemap.xml
The User-agent: * line indicates that the following rules apply to ALL Bots.
The Disallow: lines block specific pages or folders; each rule is followed by the URI to be blocked.
For security reasons, we have blocked the «Terms of Service» page (/tos/ and its Spanish version, /condiciones-de-uso/) because it includes information we don't want to be indexed.
In order to avoid duplicate content issues we have blocked the /new/ and /items/ folders, which are used by our App to create temporary content. We have also blocked the /blog/cat/ and /blog/tag/ folders used by our Blog to host categories and tags.
The Disallow: /*.js$ and Disallow: /*.css$ rules block crawling of all JavaScript and CSS files to avoid server overload.
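These two rules use wildcard syntax that is not part of the original Robots exclusion standard, but is supported by the major search engines: * matches any sequence of characters and $ anchors the pattern to the end of the URL. As a hypothetical example, the rule below would block any URL ending in .pdf:

# Blocks /manual.pdf and /files/report.pdf, but not /pdf-guide/
Disallow: /*.pdf$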
Finally, the Sitemap: https://metricspot.com/sitemap.xml line shows Bots where to find our Sitemap.