Robots.txt Tester
Validate your robots.txt file, check for syntax errors, and view crawl rules.
Why validate robots.txt?
- Prevent accidental de-indexing of critical pages
- Ensure XML sitemaps are discoverable by bots
- Block crawling of sensitive admin or testing areas
- Review crawl budget optimization rules
- Fix syntax errors that bots might misinterpret
Frequently Asked Questions
What is a robots.txt file?
It is a text file placed in the root directory of your website (e.g., example.com/robots.txt) that gives instructions to web crawlers about which pages they can or cannot crawl.
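A minimal robots.txt might look like this (the domain and paths are illustrative):

```
User-agent: *
Disallow: /admin/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
```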
Does Disallow mean 'No Index'?
No. 'Disallow' only blocks crawling. If a page has external links pointing to it, Google may still index the URL without its content. To prevent indexing completely, use a 'noindex' meta tag, and keep the page crawlable so bots can actually see that tag.
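For example, to keep a page out of the index, leave it crawlable and add this tag inside its <head>:

```html
<!-- Allow crawling, but tell bots not to index this page -->
<meta name="robots" content="noindex">
```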
How do I specify my sitemap location?
Add a line at the bottom of your robots.txt file: Sitemap: https://yourdomain.com/sitemap.xml
What does User-agent: * mean?
The asterisk (*) is a wildcard that represents 'all web crawlers'. Rules following this directive apply to every bot (Googlebot, Bingbot, etc.) unless a more specific user-agent is defined.
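For instance, in the following file Googlebot follows only its own group, while every other crawler falls back to the * group (paths are illustrative):

```
User-agent: Googlebot
Disallow: /experiments/

User-agent: *
Disallow: /admin/
```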
What should I do if my robots.txt returns HTML?
This usually means the file doesn't exist, and your server is returning a custom 404 error page. You should create a plain text file named 'robots.txt' and upload it to your root web directory.
Results are generated in real-time. For best accuracy, verify critical issues manually.
What this tool checks
- ✓ Robots.txt Existence (200 OK vs 404)
- ✓ Syntax Errors (Invalid formatting)
- ✓ Sitemap Declarations (Discovery)
- ✓ Allow vs Disallow Rule Count
Common problems this tool finds
- ⚠️ Accidental blocking of CSS/JS assets
- ⚠️ Blocking the entire site during dev (/)
- ⚠️ Missing Sitemap declaration
- ⚠️ Serving an HTML error page instead of plain text
- ⚠️ Conflicting User-agent rules
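The most damaging of these looks deceptively small; a single slash after Disallow blocks the entire site for all crawlers:

```
User-agent: *
Disallow: /
```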
How to fix results (Quick Checklist)
- 1. Ensure the file is strictly plain text, not HTML or Rich Text.
- 2. Place the 'Sitemap:' directive on its own line, conventionally at the end.
- 3. Use 'Allow:' for specific subfolders inside a Disallowed parent.
- 4. Double-check wildcard usage (*) to avoid unintended blocking.
- 5. Test changes in Google Search Console after updating.
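As a quick local check before re-testing in Search Console, Python's standard-library robots.txt parser can evaluate rules against sample URLs. The rules and URLs below are illustrative; note that this parser applies the first matching rule rather than Google's longest-match precedence, so the Allow line is listed before its Disallowed parent:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Blocked: falls under the Disallowed /admin/ parent
print(rp.can_fetch("*", "https://example.com/admin/secret"))       # False
# Allowed: the more specific Allow rule matches first
print(rp.can_fetch("*", "https://example.com/admin/public/page"))  # True
```

This is only a sanity check of your rule logic; Googlebot's own matching behavior is authoritative, so still verify in Search Console.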