Robots.txt Tester

Validate your robots.txt file, check for syntax errors, and view crawl rules.

Why validate robots.txt?

  • Prevent accidental de-indexing of critical pages
  • Ensure XML sitemaps are discoverable by bots
  • Block crawling of sensitive admin or testing areas
  • Review crawl budget optimization rules
  • Fix syntax errors that bots might misinterpret

Frequently Asked Questions

What is a robots.txt file?

It is a text file placed in the root directory of your website (e.g., example.com/robots.txt) that gives instructions to web crawlers about which pages they can or cannot crawl.
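A minimal robots.txt might look like this (the paths and domain are illustrative):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```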

Does Disallow mean 'No Index'?

No. 'Disallow' only blocks crawling. If a page has external links pointing to it, Google may still index the URL without the content. To prevent indexing completely, use the 'noindex' meta tag.
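To keep a page out of the index entirely, the page itself must serve a noindex directive in its HTML head, and the page must remain crawlable so bots can see it:

```
<meta name="robots" content="noindex">
```

Note that combining this tag with a Disallow rule for the same URL is self-defeating: if crawlers are blocked, they never see the noindex instruction.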

How do I specify my sitemap location?

Add a line to your robots.txt file (conventionally at the bottom), using the full absolute URL: Sitemap: https://yourdomain.com/sitemap.xml

What does User-agent: * mean?

The asterisk (*) is a wildcard that represents 'all web crawlers'. Rules following this directive apply to every bot (Googlebot, Bingbot, etc.) unless a more specific user-agent is defined.
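This matching behavior can be verified with Python's standard-library robots.txt parser. The rules below are illustrative: a default group for all bots plus a more specific group for Googlebot.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "",
    "User-agent: Googlebot",
    "Disallow: /private/",
])

# Googlebot matches its own, more specific group, so the '*' rules
# do not apply to it; every other bot falls back to the '*' group.
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))    # True
print(rp.can_fetch("Googlebot", "https://example.com/private/"))  # False
print(rp.can_fetch("Bingbot", "https://example.com/admin/"))      # False
```

This is why a specific user-agent group must repeat any shared rules it still needs: groups override, they do not merge.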

What should I do if my robots.txt returns HTML?

This usually means the file doesn't exist, and your server is returning a custom 404 error page. You should create a plain text file named 'robots.txt' and upload it to your root web directory.
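A quick way to catch this case programmatically is to check whether the response body begins with HTML markup rather than directives. This is a heuristic sketch, and `looks_like_html` is a hypothetical helper name:

```python
def looks_like_html(body: str) -> bool:
    """Heuristic: a valid robots.txt is plain text and never
    begins with an HTML document tag."""
    head = body.lstrip().lower()
    return head.startswith("<!doctype") or head.startswith("<html")

print(looks_like_html("<!DOCTYPE html><html><body>404</body></html>"))  # True
print(looks_like_html("User-agent: *\nDisallow: /admin/"))              # False
```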

Last updated: February 10, 2026. Built by y4yes Tools Team.

Results are generated in real time. For best accuracy, verify critical issues manually.

What this tool checks

  • ✓ Robots.txt Existence (200 OK vs 404)
  • ✓ Syntax Errors (Invalid formatting)
  • ✓ Sitemap Declarations (Discovery)
  • ✓ Allow vs Disallow Rule Count

Common problems this tool finds

  • ⚠️ Accidental blocking of CSS/JS assets
  • ⚠️ Blocking the entire site with a leftover development rule (Disallow: /)
  • ⚠️ Missing Sitemap declaration
  • ⚠️ Returns HTML status page instead of text
  • ⚠️ Conflicting User-agent rules
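Two of the patterns above, side by side: a leftover development block that hides the whole site, and a safer alternative that keeps rendering assets crawlable (directory names are illustrative):

```
# BAD: leftover from staging, blocks every URL on the site
User-agent: *
Disallow: /

# BETTER: block only private areas, keep CSS/JS crawlable
User-agent: *
Disallow: /admin/
Allow: /assets/css/
Allow: /assets/js/
```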

How to fix results (Quick Checklist)

  1. Ensure the file is strictly plain text, not HTML or Rich Text.
  2. Place the 'Sitemap:' directive on its own line at the end.
  3. Use 'Allow:' for specific subfolders inside a Disallowed parent.
  4. Double-check wildcard usage (*) to avoid unintended blocking.
  5. Test changes in Google Search Console after updating.
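Items 2 and 3 of the checklist combined, as a sketch (directory names are illustrative): the Allow rule carves a crawlable subfolder out of a Disallowed parent, and the Sitemap directive sits on its own line at the end.

```
User-agent: *
Disallow: /shop/
Allow: /shop/sale/

Sitemap: https://example.com/sitemap.xml
```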

When to use this tool

  • Launch of a new website (remove dev blocks)
  • Fixing 'Blocked by robots.txt' status in GSC
  • Preventing crawling of admin/login pages
  • Ensuring AI bots can access your content
  • Debugging why images aren't appearing in search
  • Before running a full site crawl audit