Robots.txt

A text file placed at a site's root that tells search engine crawlers which URL paths they may and may not crawl. Robots.txt manages crawl behavior but does not prevent indexing: a disallowed page can still be indexed if it is discovered through external links.

Robots.txt is the first file a search engine crawler checks when visiting your site. It uses a simple syntax: User-agent lines specify which bots a group of rules applies to, and Allow/Disallow lines control access to URL paths within that group. The file also commonly lists the location of your XML sitemap via a Sitemap directive.
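A minimal sketch of such a file; the paths and sitemap URL here are hypothetical placeholders:

```
# Applies to any crawler not matched by a more specific group below
User-agent: *
Disallow: /admin/
Disallow: /search

# Googlebot follows this group instead of the wildcard group above
User-agent: Googlebot
Disallow: /search
Allow: /search/help

Sitemap: https://www.example.com/sitemap.xml
```

Crawlers obey the group with the most specific matching User-agent, and when Allow and Disallow rules conflict, the more specific (longer) path wins, which is how /search/help stays crawlable for Googlebot in this sketch.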

For technical SEO, robots.txt is a blunt but powerful tool for managing crawler access. Use it to block crawling of admin pages, internal search results, staging environments, and other low-value paths. Understand its limitations, however: robots.txt prevents crawling, not indexing. If other sites link to a disallowed page, Google may still index the bare URL (without its content) in search results. To keep a page out of the index entirely, use the noindex meta tag or the X-Robots-Tag header instead.

Common mistakes include accidentally blocking CSS and JavaScript files (which prevents Google from rendering pages correctly), blocking important sections of the site with overly broad rules, and forgetting to update robots.txt after a site migration. Test your rules with the robots.txt report in Google Search Console.
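As noted above, keeping a URL out of the index requires a noindex directive rather than a robots.txt rule. It takes two common forms. As a meta tag in the page's head:

```
<meta name="robots" content="noindex">
```

Or as the equivalent HTTP response header, which also works for non-HTML files such as PDFs:

```
X-Robots-Tag: noindex
```

Either way, the page must remain crawlable: if robots.txt disallows the URL, crawlers never fetch it and so never see the noindex directive.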
