{"id":321,"date":"2013-08-13T19:47:03","date_gmt":"2013-08-13T18:47:03","guid":{"rendered":"https:\/\/metricspot.com\/blog\/?p=321"},"modified":"2016-12-14T17:52:10","modified_gmt":"2016-12-14T16:52:10","slug":"robots-txt-file","status":"publish","type":"post","link":"https:\/\/metricspot.com\/blog\/robots-txt-file\/","title":{"rendered":"The ROBOTS.TXT File"},"content":{"rendered":"<p>The ROBOTS.TXT file is a text document that serves to <strong>establish crawling guidelines<\/strong> for Bots to follow when exploring your Website.<!--more--><\/p>\n<p>Bots (also called Spiders or Crawlers) are <strong>used by search engines<\/strong> to access your Website and index the content (text, images, files&#8230;) of its pages.<\/p>\n<p>&nbsp;<\/p>\n<h3>WHAT IS IT FOR?<\/h3>\n<p>With the ROBOTS.TXT file we can <strong>discourage Bots from accessing a certain folder<\/strong> of our Website. We can also prevent a certain Bot from <strong>crawling our Website<\/strong> or limit its crawling frequency. Some of the reasons we might want to do this are:<\/p>\n<p>&#8211; <strong>Avoid duplicate content<\/strong>. 
This is the most important reason because duplicate pages compete with each other in search results; keeping Bots away from them can help us rank higher in search engines and so increase our traffic.<\/p>\n<p>&#8211; <strong>Reduce server overload<\/strong> due to an excess of search engine requests that could saturate it.<\/p>\n<p>&#8211; <strong>Avoid indexing of certain pages<\/strong> that you want to be accessible to users but not indexed in Google for privacy reasons.<\/p>\n<p>We can also <strong>add a sitemap of our Website<\/strong> or SITEMAP.XML file to show Bots the URLs of all the pages of our site.<\/p>\n<p>&nbsp;<\/p>\n<h3>WHAT IS IT NOT FOR?<\/h3>\n<p>As we have said before, the ROBOTS.TXT file only establishes crawling guidelines, so <strong>bots may not honor your rules<\/strong>, especially the so-called \u00abbad bots\u00bb, whose only purpose is to crawl your Website searching for e-mails, private data or vulnerabilities.<\/p>\n<p>If you have sensitive information on your Website and you don&#8217;t want bots to crawl it, you should use <strong>other security measures<\/strong> to protect it.<\/p>\n<p>Also, with the ROBOTS.TXT file <strong>you can&#8217;t protect your Website from Hackers<\/strong> who use \u00abbrute force\u00bb attacks.<\/p>\n<p>&nbsp;<\/p>\n<h3>HOW TO CREATE A ROBOTS.TXT FILE<\/h3>\n<p>You can use one of the following <a title=\"Robots.txt Tools\" href=\"https:\/\/www.google.com\/search?q=create+robots.txt+tool\" target=\"_blank\">Online tools<\/a> to create the ROBOTS.TXT file, although we highly recommend you follow <strong>Google&#8217;s instructions<\/strong> in order to <a title=\"Create Robots.txt manually\" href=\"http:\/\/support.google.com\/webmasters\/bin\/answer.py?hl=es&amp;answer=156449\" target=\"_blank\">create it manually<\/a>. 
You can also read this <strong>Wikipedia article<\/strong> about <a title=\"Wikipedia: Robots exclusion standards\" href=\"http:\/\/en.wikipedia.org\/wiki\/Robots_exclusion_standard\" target=\"_blank\">Robots exclusion standards<\/a>.<\/p>\n<p>The ROBOTS.TXT file has to be <strong>located in the root folder<\/strong> of your Website, the same as the FAVICON and the SITEMAP.<\/p>\n<p>&nbsp;<\/p>\n<h3>ROBOTS.TXT FILE EXAMPLE<\/h3>\n<p>This is MetricSpot&#8217;s <a title=\"MetricSpot's robots.txt file\" href=\"https:\/\/metricspot.com\/robots.txt\" target=\"_blank\" rel=\"nofollow\">ROBOTS.TXT<\/a> file:<\/p>\n<div class=\"codeblock\">User-agent: *<br \/>\nDisallow: \/new\/<br \/>\nDisallow: \/tos\/<br \/>\nDisallow: \/items\/<br \/>\nDisallow: \/no\/<br \/>\nDisallow: \/condiciones-de-uso\/<br \/>\nDisallow: \/blog\/cat\/<br \/>\nDisallow: \/blog\/tag\/<br \/>\nDisallow: \/blog\/wp-admin\/<br \/>\nDisallow: \/blog\/wp-includes\/<br \/>\nDisallow: \/blog\/wp-content\/plugins\/<br \/>\nDisallow: \/blog\/wp-content\/themes\/<br \/>\nDisallow: \/blog\/feed\/<br \/>\nDisallow: \/api\/www.metricspot.com<br \/>\nDisallow: \/*.js$<br \/>\nDisallow: \/*.css$<br \/>\nSitemap: https:\/\/metricspot.com\/sitemap.xml<\/div>\n<p>&nbsp;<\/p>\n<p>The <strong>User-agent: *<\/strong> line indicates that the following rules apply to ALL Bots.<\/p>\n<p>The lines that follow block specific pages or folders using the <strong>Disallow:<\/strong> rule, followed by the URI to be blocked.<\/p>\n<p>For <strong>security reasons<\/strong>, we have blocked the \u00abTerms of Service\u00bb page because it includes information we don&#8217;t want to be indexed.<\/p>\n<p>In order to avoid <strong>duplicate content issues<\/strong> we have blocked the <strong>\/new\/<\/strong> and <strong>\/items\/<\/strong> folders, which are used by our App to create temporary content. 
We have also blocked the <strong>\/blog\/cat\/<\/strong> and <strong>\/blog\/tag\/<\/strong> folders used by our Blog to host categories and tags.<\/p>\n<p>The <strong>Disallow: \/*.js$<\/strong> and <strong>Disallow: \/*.css$<\/strong> rules block crawling of all JavaScript and CSS files to avoid server overload.<\/p>\n<p>Last, with the <strong>Sitemap: https:\/\/metricspot.com\/sitemap.xml<\/strong> line we show Bots where to find our Sitemap.<\/p>\n","protected":false},"excerpt":{"rendered":"The ROBOTS.TXT file is a text document that serves to establish crawling guidelines for Bots to follow when exploring your Website.","protected":false},"author":2,"featured_media":2196,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[39],"tags":[40,19,41],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/posts\/321"}],"collection":[{"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/comments?post=321"}],"version-history":[{"count":0,"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/posts\/321\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/media\/2196"}],"wp:attachment":[{"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/media?parent=321"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/categories?post=321"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/metricspot.com\/blog\/wp-json\/wp\/v2\/tags?post=321"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}