Serve a real 404, not a soft-404
MetricSpot requests a random non-existent URL and checks the status. Soft-404s (200 OK on a "not found" page) bloat the index with junk URLs and confuse crawlers.
What this check does
MetricSpot issues a request to a randomized URL on your domain that almost certainly doesn’t exist (something like /__metricspot_probe_${random}). The check fails when:
- The server returns 200 OK with a “not found” page (a soft 404, the worst case).
- The server returns 5xx instead of 404.
- The server redirects to the homepage, which then returns 200 (also a soft 404 in Google’s eyes).
It passes when the server returns a real 404 (or 410 Gone) with a useful error page.
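The pass/fail rules above can be sketched as a small classifier. This is an illustrative sketch only, not MetricSpot's actual implementation: the names `Verdict`, `probePath`, and `classifyProbe` are hypothetical, and it assumes the status code is read from the first response, before any redirect is followed.

```typescript
// Hypothetical sketch of the verdict logic described above (not MetricSpot's code).
type Verdict = "pass" | "soft-404" | "redirect" | "server-error" | "unexpected";

// Build a path that almost certainly does not exist on the site.
function probePath(): string {
  const random = Math.random().toString(36).slice(2, 12);
  return `/__metricspot_probe_${random}`;
}

// Classify the status code of the first response to the probe URL.
// Assumption: redirects are NOT followed, so a 3xx is visible here.
function classifyProbe(status: number): Verdict {
  if (status === 404 || status === 410) return "pass"; // real "not found"
  if (status >= 200 && status < 300) return "soft-404"; // "not found" page served as success
  if (status >= 300 && status < 400) return "redirect"; // e.g. bounced to the homepage
  if (status >= 500 && status < 600) return "server-error";
  return "unexpected"; // anything else, such as 401 or 403
}
```

Under these assumptions, `classifyProbe(200)` yields "soft-404" and `classifyProbe(410)` yields "pass".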
Why it matters
Soft 404s confuse crawlers and corrupt your index.
- Index bloat. Google indexes a soft-404 page like any normal 200 page. A site with a buggy CMS that returns 200 for typos can end up with thousands of garbage URLs in the index.
- Wasted crawl budget. Googlebot revisits soft-404 URLs looking for new content, eating into the crawl budget for real pages.
- Search Console errors. Search Console will eventually mark soft-404 pages as Soft 404 errors and stop indexing them, but the cleanup takes weeks.
- UX. A real 404 page can suggest related content, route users to search, and recover the visit. A 200-redirect-to-homepage strands them with no signal that the original URL was broken.
How to fix it
Configure your server to return 404 Not Found for unknown paths, and render a helpful page on top.
nginx — define a custom error page that still returns 404:
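server {
    # Use /404.html as the body for missing paths; the status stays 404.
    error_page 404 /404.html;

    location = /404.html {
        # Only internal redirects (such as error_page) may reach this location.
        internal;
    }
}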
Apache (.htaccess):
ErrorDocument 404 /404.html
Next.js (App Router):
// app/not-found.tsx
export default function NotFound() {
  return (
    <main>
      <h1>404 — Page not found</h1>
      <p>Try the <a href="/">homepage</a> or <a href="/search/">search</a>.</p>
    </main>
  );
}
Next.js automatically serves this with a 404 status code.
Astro:
---
// src/pages/404.astro
---
<html>
  <head><title>404 — Not found</title></head>
  <body>
    <h1>404 — Page not found</h1>
    <a href="/">Back to home</a>
  </body>
</html>
Astro builds this as a static 404.html. Many static hosts serve that file with a 404 status automatically; nginx does not, so point error_page 404 at it as in the nginx example above.
WordPress — themes ship a 404.php template. If yours redirects to home instead of returning 404, check functions.php for a misguided wp_redirect() and remove it.
Make the page useful. A real 404 should:
- Be honest — heading reads “Page not found”, not “Welcome back.”
- Suggest 3–4 popular pages (sitewide nav doesn’t count; offer specific links).
- Include search if you have it.
- Match site branding so the user knows they’re still on your site.
Audit yourself:
curl -sI https://yourdomain.com/__definitely_does_not_exist
The first line must be HTTP/1.1 404 Not Found (or HTTP/2 404); a 410 Gone also passes. Anything else fails the check.
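If you want to run this audit from a script, the status-line rule can be checked mechanically. The following is a minimal sketch that assumes you pass it the raw text printed by `curl -sI` (without `-L`, so only one header block is present); the function names `statusFromHeadOutput` and `passesAudit` are hypothetical.

```typescript
// Extract the numeric status from the first line of `curl -sI` output.
// Handles both "HTTP/1.1 404 Not Found" and "HTTP/2 404".
function statusFromHeadOutput(output: string): number | null {
  const firstLine = output.split(/\r?\n/)[0] ?? "";
  const match = /^HTTP\/\d(?:\.\d)?\s+(\d{3})\b/.exec(firstLine);
  return match ? Number(match[1]) : null;
}

// True only for a real "not found" response (404 or 410).
function passesAudit(output: string): boolean {
  const status = statusFromHeadOutput(output);
  return status === 404 || status === 410;
}
```

For example, `passesAudit("HTTP/2 404")` is true, while a response starting with `HTTP/1.1 200 OK` or `HTTP/1.1 301 Moved Permanently` is not.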
Frequently asked questions
Should I redirect 404s to the homepage?
No. A 301 redirect to home tells Google “this URL moved to /” — which is a lie. The user sees the homepage and has no idea their original URL was wrong; the URL is now “indexed” as a duplicate of home. Always return a real 404.
What’s the difference between 404 and 410?
404 Not Found means “we couldn’t find this resource right now.” 410 Gone means “this resource existed but has been permanently removed.” Google may drop 410 URLs from the index slightly faster than 404 URLs. Use 410 for content you’ve deliberately deleted; 404 is fine for typos and bots probing for vulnerabilities.
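One way to apply this distinction is to keep an explicit list of paths you removed on purpose and answer 410 only for those. This is a sketch under that assumption; the `removedPaths` set and its example entries are hypothetical.

```typescript
// Hypothetical list of URLs that were deliberately deleted.
const removedPaths = new Set<string>(["/blog/old-post", "/products/discontinued"]);

// Choose the status for a path that matched no live page:
// 410 for known removals, 404 for everything else (typos, probes).
function statusForMissing(path: string): 404 | 410 {
  return removedPaths.has(path) ? 410 : 404;
}
```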
My 404 page returns 200 — how do I fix that?
Two common causes: (1) your CMS uses a wildcard route that catches everything, including 404s, and serves them with a 200; (2) your CDN cached an early-broken response with status 200. For (1), explicitly set res.status(404) in your framework’s not-found handler. For (2), purge the CDN cache and verify directly against origin.
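For cause (1), the fix amounts to making the catch-all branch set the status explicitly while still rendering a page. The sketch below is framework-agnostic and purely illustrative: `Reply`, `pages`, and `handle` are hypothetical names, and in a real framework the 404 would be set through its own API (for instance the res.status(404) call mentioned above).

```typescript
// Minimal response shape for illustration.
interface Reply {
  status: number;
  body: string;
}

// Hypothetical table of real pages. A Map avoids accidental matches on
// inherited object keys such as "constructor".
const pages = new Map<string, string>([
  ["/", "<h1>Home</h1>"],
  ["/about", "<h1>About</h1>"],
]);

function handle(path: string): Reply {
  const page = pages.get(path);
  if (page !== undefined) {
    return { status: 200, body: page };
  }
  // Catch-all branch: still render a helpful page, but with a 404 status.
  return { status: 404, body: "<h1>Page not found</h1>" };
}
```

The buggy version of this handler is the same code with `status: 200` in the catch-all branch, which is exactly what produces a soft 404.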
Last updated 2026-05-11