
robots.txt file

MetricSpot checks for /robots.txt at the root of your domain. It's the first file every crawler fetches — its absence isn't fatal, but it's a missed signal.

What this check does

Sends a GET request to https://yourdomain.com/robots.txt and confirms it returns HTTP 200 with a parseable robots.txt body. A missing file (404) or any other non-200 status fails the check.
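In Python terms, the logic is roughly the following standard-library sketch. This is an illustration, not MetricSpot's actual implementation; the timeout and error handling are assumptions:

```python
import urllib.error
import urllib.request
from urllib import robotparser

def is_valid_robots_response(status: int, body: str) -> bool:
    """True if the response looks like a usable robots.txt: HTTP 200 + parseable body."""
    if status != 200:
        return False  # 404 or any other non-200 status fails the check
    # robots.txt parsing is deliberately lenient (unknown lines are ignored),
    # so parsing here mainly confirms the body is text we can work with.
    parser = robotparser.RobotFileParser()
    parser.parse(body.splitlines())
    return True

def check_robots_txt(domain: str) -> bool:
    """Fetch https://<domain>/robots.txt and run the check."""
    try:
        with urllib.request.urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
            return is_valid_robots_response(
                resp.status, resp.read().decode("utf-8", errors="replace")
            )
    except (urllib.error.URLError, TimeoutError):
        return False  # an unreachable host fails too
```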

Why it matters

robots.txt is the first URL every crawler — Googlebot, GPTBot, ClaudeBot, PerplexityBot, archive.org — fetches before scanning your site. It’s your one chance to:

  • Direct crawlers to your sitemap with a Sitemap: line, dramatically improving discovery for pages not linked from the homepage.
  • Block crawl traps: infinite calendars, faceted search filters, internal-search result pages.
  • Allow or disallow AI crawlers selectively (separate check).

Without a robots.txt, you’re saying “crawl whatever you find, in whatever order” — and crawlers waste budget on pages you don’t care about.

How to fix it

Create /public/robots.txt (or wherever your server serves static files from) with at minimum:

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
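You can confirm this baseline parses, and that the Sitemap: line is picked up, with Python's standard-library robotparser (the domain is a placeholder):

```python
from urllib import robotparser

BASELINE = """\
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = robotparser.RobotFileParser()
parser.parse(BASELINE.splitlines())

print(parser.can_fetch("Googlebot", "https://yourdomain.com/any/page"))  # True
print(parser.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```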

That’s the “open site” baseline. To block specific paths:

User-agent: *
Disallow: /admin/
Disallow: /search?
Disallow: /cart/

Sitemap: https://yourdomain.com/sitemap.xml
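Before deploying rules like these, you can replay them with Python's standard-library robotparser to confirm the right paths are blocked. The URLs below are hypothetical, and robotparser is a simplified prefix matcher that doesn't model every nuance of Google's parser:

```python
from urllib import robotparser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /search?
Disallow: /cart/
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

for path in ("/products/shoes", "/admin/login", "/cart/checkout"):
    allowed = parser.can_fetch("*", f"https://yourdomain.com{path}")
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```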

Common patterns:

  • WordPress: WordPress serves an auto-generated virtual robots.txt unless a physical robots.txt file exists in the web root. SEO plugins such as Yoast and Rank Math let you edit it from the admin.
  • Next.js: create app/robots.ts exporting a MetadataRoute.Robots object.
  • Astro: drop a static public/robots.txt file.

After publishing, verify the file with Google Search Console's robots.txt report (under Settings), which replaced the standalone robots.txt Tester.

Frequently asked questions

Can I block crawlers I don’t want?

Yes: User-agent: GPTBot followed by Disallow: / tells that crawler to stay away. But this only works for crawlers that respect robots.txt, and a growing number of AI scrapers ignore it. For hard blocks, use server-level user-agent rules.
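For a server-level hard block, one common approach is matching the User-Agent header in your web server config. A minimal nginx sketch (the bot pattern and server details are illustrative; both directives go inside the http block):

```nginx
# Flag matching user agents (case-insensitive regex).
map $http_user_agent $blocked_bot {
    default     0;
    ~*GPTBot    1;
}

server {
    listen 443 ssl;
    server_name yourdomain.com;

    # Refuse flagged bots before any content is served.
    if ($blocked_bot) {
        return 403;
    }
}
```

Because the block happens at the server, it works even for crawlers that never read robots.txt; keep the robots.txt rule too so well-behaved bots still get the polite signal.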

Does Disallow: prevent indexing?

No. Disallow: blocks crawling, not indexing. A disallowed page can still appear in search results (with no description) if other sites link to it. To prevent indexing, use a noindex meta tag or an X-Robots-Tag: noindex header instead, and leave the page crawlable so crawlers can actually see the noindex directive.
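Concretely, the meta-tag form goes in the page itself:

```html
<!-- In the page's <head> -->
<meta name="robots" content="noindex">
```

The header form is set by the server and also works for non-HTML resources such as PDFs:

```http
X-Robots-Tag: noindex
```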

What if I want to allow everything?

The simplest valid file is:

User-agent: *
Allow: /

You can omit the file entirely and Google will treat it as “all crawling allowed,” but you also lose the sitemap reference and the explicit signal.

Last updated 2026-05-11