Robots.txt

A text file placed at a site's root that tells search engine crawlers which URL paths they may and may not crawl. Robots.txt manages crawl behavior but does not prevent indexing: a disallowed page can still be indexed if it is discovered through external links.

Robots.txt is the first file a search engine crawler checks when visiting your site. It uses a simple syntax: User-agent lines specify which bots a group of rules applies to, and Allow/Disallow lines control access to URL paths within that group. The file also commonly lists the location of your XML sitemap via a Sitemap directive.
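A minimal sketch of such a file; the paths and sitemap URL here are hypothetical placeholders:

```
# Applies to any crawler not matched by a more specific group below
User-agent: *
Disallow: /admin/
Disallow: /search

# Googlebot follows this group instead of the wildcard group above
User-agent: Googlebot
Disallow: /search
Allow: /search/help

Sitemap: https://www.example.com/sitemap.xml
```

Crawlers obey the group with the most specific matching User-agent, and when Allow and Disallow rules conflict, the more specific (longer) path wins, which is how /search/help stays crawlable for Googlebot in this sketch.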

For technical SEO, robots.txt is a blunt but powerful tool for managing crawler access. Use it to block crawling of admin pages, internal search results, staging environments, and other low-value paths. Understand its limitations, however: robots.txt prevents crawling, not indexing. If other sites link to a disallowed page, Google may still index the bare URL (without its content) in search results. To keep a page out of the index entirely, use the noindex meta tag or the X-Robots-Tag header instead.

Common mistakes include accidentally blocking CSS and JavaScript files (which prevents Google from rendering pages correctly), blocking important sections of the site with overly broad rules, and forgetting to update robots.txt after a site migration. Test your rules with the robots.txt report in Google Search Console.
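As noted above, keeping a URL out of the index requires a noindex directive rather than a robots.txt rule. It takes two common forms. As a meta tag in the page's head:

```
<meta name="robots" content="noindex">
```

Or as the equivalent HTTP response header, which also works for non-HTML files such as PDFs:

```
X-Robots-Tag: noindex
```

Either way, the page must remain crawlable: if robots.txt disallows the URL, crawlers never fetch it and so never see the noindex directive.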
