How to Create the Perfect robots.txt File for SEO
Master robots.txt syntax and best practices. Learn how to control crawler access, optimize crawl budget, and avoid common mistakes.
By RiseTop Team · May 2026 · 8 min read
The robots.txt file is one of the first things search engine crawlers request when visiting your site, and it tells them which paths they may crawl.
Syntax Basics
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Which crawler | User-agent: Googlebot |
| Disallow | Block paths | Disallow: /admin/ |
| Allow | Allow specific paths | Allow: /admin/public/ |
| Sitemap | Sitemap location (absolute URL) | Sitemap: https://example.com/sitemap.xml |
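To see how the four directives fit together, here is a minimal sketch that parses a sample file with Python's standard `urllib.robotparser` module (Python 3.8 or later for `site_maps()`). The domain `example.com` and the paths are placeholders, not taken from a real site. Note that Python's parser applies the first rule that matches, while Google applies the most specific one, so `Allow` is listed before `Disallow` here to get the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the four directives from the table.
# example.com is a placeholder domain.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# /admin/ is blocked, but the more specific Allow keeps /admin/public/ open.
admin_blocked = not parser.can_fetch("Googlebot", "https://example.com/admin/settings")
public_allowed = parser.can_fetch("Googlebot", "https://example.com/admin/public/help.html")

# site_maps() returns the declared Sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()

print(admin_blocked, public_allowed, sitemaps)
```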
Essential Rules
- Place robots.txt in your site root so it is served at /robots.txt; crawlers do not look for it in subdirectories
- Always include your sitemap as a full, absolute URL
- Do not block CSS and JS files - Google needs them to render pages
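- Test your rules before deploying. Google has retired its standalone robots.txt Tester, so use the robots.txt report in Search Console or a robots.txt parser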
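The rules above can be turned into a rough pre-deploy check. The sketch below is one possible approach, not an official tool: `audit_robots` is a hypothetical helper, and the asset URLs and `example.com` are placeholders. It flags a draft that has no Sitemap line or that blocks files a crawler needs for rendering.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt, required_urls, agent="Googlebot"):
    """Return a list of human-readable problems found in a robots.txt draft."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    problems = []
    # site_maps() is None when the draft declares no Sitemap directive.
    if not parser.site_maps():
        problems.append("no Sitemap directive")
    # Every URL the crawler needs (CSS, JS, key pages) must be fetchable.
    for url in required_urls:
        if not parser.can_fetch(agent, url):
            problems.append(f"{agent} is blocked from {url}")
    return problems


# Placeholder assets that a crawler would need in order to render pages.
required = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
]

bad_draft = """\
User-agent: *
Disallow: /assets/
"""

good_draft = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

bad_problems = audit_robots(bad_draft, required)
good_problems = audit_robots(good_draft, required)
print(bad_problems)
print(good_problems)
```

A check like this only reflects how Python's parser reads the file; individual search engines may interpret edge cases such as wildcards differently.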
Warning: Disallowing a page in robots.txt does NOT remove it from search results. Use a noindex meta tag for that, and leave the page crawlable: a crawler that is blocked from the page never sees the tag.
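The interaction between the two mechanisms can be sketched in code. In the example below, `RobotsMetaFinder` and `noindex_is_visible` are hypothetical helpers written for illustration, and the URL and HTML are made up. The function reports whether a crawler could actually see a page's noindex tag, which requires that robots.txt does not block the page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(p.strip().lower() for p in content.split(","))


def noindex_is_visible(robots_txt, url, html, agent="Googlebot"):
    """True only if the page carries noindex AND the crawler may fetch it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return "noindex" in finder.directives and parser.can_fetch(agent, url)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
url = "https://example.com/old/page.html"  # placeholder URL

# Blocked in robots.txt: the crawler never reads the noindex tag.
hidden = noindex_is_visible("User-agent: *\nDisallow: /old/\n", url, page)
# Crawlable: the noindex tag can take effect.
visible = noindex_is_visible("User-agent: *\nDisallow:\n", url, page)
print(hidden, visible)
```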
Frequently Asked Questions
Should I block admin pages in robots.txt?
Yes, but use Allow for public resources within admin. Remember that disallowed pages may still appear in search results.
What happens if I do not have a robots.txt?
Crawlers will assume they can access everything. This is fine for most small sites, but you miss the opportunity to point to your sitemap.
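As a rough illustration, an empty rule set behaves the same way in Python's `urllib.robotparser`. Parsing an empty list here stands in for a missing file, which is an assumption of this sketch, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # no rules at all, standing in for a missing robots.txt

# With no rules, every path is allowed for every crawler.
everything_allowed = parser.can_fetch("Googlebot", "https://example.com/any/path")

# No Sitemap directive was declared, so site_maps() returns None.
declared_sitemaps = parser.site_maps()
print(everything_allowed, declared_sitemaps)
```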
How do I block AI crawlers?
Add specific User-agent rules for AI bots. For example, User-agent: GPTBot followed by Disallow: / blocks GPTBot, OpenAI's web crawler. Compliance is voluntary, so this only stops bots that honor robots.txt.
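A minimal sketch of such a file, checked with Python's `urllib.robotparser`, might look like this. The `/admin/` rule, the blog path, and `example.com` are illustrative placeholders. The GPTBot group blocks the whole site for that bot only, while other crawlers fall through to the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, other bots only from /admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

post = "https://example.com/blog/post"  # placeholder URL
gptbot_allowed = parser.can_fetch("GPTBot", post)
googlebot_allowed = parser.can_fetch("Googlebot", post)
print(gptbot_allowed, googlebot_allowed)
```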