SEO · Intermediate · Updated 2026-03-22

What is Robots.txt?

Robots.txt is a plain-text file placed at your domain’s root directory (yoursite.com/robots.txt) that tells search engine crawlers which URLs they can and can’t access — controlling how Googlebot and other bots interact with your site.

Every major search engine — Google, Bing, Yahoo — checks this file before crawling your site. Think of it as a bouncer’s list. Not a lock on the door, but a clear set of instructions that well-behaved bots follow.

According to Google’s own documentation, Googlebot checks robots.txt before making any request to your server. For sites with thousands of pages, that file becomes one of the most important pieces of your technical SEO setup.

Why Does Robots.txt Matter?

Getting your robots.txt wrong can tank your rankings overnight. One misplaced directive and Google can’t see your most important pages.

  • Crawl budget protection — Large sites have limited crawl budget. Blocking low-value pages (admin panels, staging areas, duplicate filters) keeps Googlebot focused on what matters.
  • Keeps crawlers out of sensitive areas — Internal search results, login pages, and cart pages don’t belong in the SERP. Robots.txt keeps well-behaved bots away from them (though on its own it doesn’t guarantee a page stays out of the index).
  • Faster discovery of new content — When crawlers aren’t wasting requests on junk pages, they find your new blog posts and product pages faster.
  • Server load management — Aggressive bots can strain small servers. Blocking unnecessary crawling reduces resource consumption.

If you’re publishing content regularly — whether that’s 5 pages or 30 articles a month — you need crawlers spending their time on the right URLs.

How Robots.txt Works

The file uses a simple syntax. Three core directives handle most use cases.

User-Agent

This line specifies which crawler the rule applies to. User-agent: * targets all bots. User-agent: Googlebot targets only Google’s crawler. You can stack multiple rules for different bots in the same file.

Disallow

The Disallow directive blocks a specific path. Disallow: /admin/ prevents crawlers from accessing anything under the /admin/ directory. Leave it blank (Disallow:) and you’re allowing everything. A single forward slash (Disallow: /) blocks the entire site.

Allow and Sitemap

Allow overrides a broader Disallow rule for specific paths — useful when you block a directory but want one page inside it crawled. The Sitemap directive points crawlers to your XML sitemap, helping them discover all your important URLs without guessing.
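Putting the three directives together, a minimal file might look like this sketch (the domain and paths are placeholders):

```text
User-agent: *
Disallow: /admin/
Allow: /admin/help.html

Sitemap: https://yoursite.com/sitemap.xml
```

Here the Allow line carves one page out of the blocked /admin/ directory, and the Sitemap line applies site-wide regardless of which User-agent group it sits near.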

How Google Processes It

Googlebot fetches your robots.txt before crawling anything else. If the file returns a 200 status, Google follows the rules. A 404 means “no restrictions” — everything gets crawled. A 5xx error makes Google temporarily cautious and limits crawling until the file becomes accessible again.
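You can sanity-check how a set of rules evaluates before deploying, using Python’s standard-library robots.txt parser (the rules and URLs below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules mirroring the directives described above
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Blocked: /admin/ is disallowed for all user agents
print(parser.can_fetch("Googlebot", "https://yoursite.com/admin/settings"))  # False

# Allowed: no rule matches this path
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/new-post"))  # True
```

One caveat: Python’s parser applies rules in file order, while Google uses the most specific (longest) matching rule, so results can differ for files that mix Allow and Disallow on the same path.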

Types of Robots.txt Directives

Robots.txt directives fall into 4 main categories:

  • Access directives (Allow/Disallow) — Control which paths bots can visit. The foundation of every robots.txt file.
  • User-agent directives — Target rules at specific bots. You might block SEMrushBot while allowing Googlebot full access.
  • Crawl-delay directives — Tell bots to wait between requests. Google ignores this directive, but Bing and Yandex respect it.
  • Sitemap directives — Point to your sitemap file. Not technically a “rule,” but a discovery mechanism bots rely on.

Most small-to-medium sites only need access directives and a sitemap reference. Crawl-delay matters more for large-scale sites with server constraints.
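A file combining all four directive types could look like this sketch (the bot names and paths are illustrative):

```text
# Google gets full access
User-agent: Googlebot
Disallow:

# Block an SEO tool's crawler entirely
User-agent: SemrushBot
Disallow: /

# Everyone else: stay out of /admin/ and slow down
# (Crawl-delay is honored by Bing and Yandex, ignored by Google)
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://yoursite.com/sitemap.xml
```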

Robots.txt Examples

Example 1: Local plumbing company. A plumber in Austin has a WordPress site with /wp-admin/, /cart/, and /internal-pricing/ directories. Their robots.txt blocks all three and includes a sitemap reference. Result: Googlebot spends its time on service pages and blog posts — not admin panels.

Example 2: eCommerce store with filtered pages. An online retailer has 50 products but 3,000 filter combinations (size + color + price). Without robots.txt blocking /products?filter=, Googlebot wastes crawl budget on duplicate filtered pages. One Disallow line fixes it.
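The fix for those filtered URLs can be a single pattern (the parameter name is illustrative):

```text
User-agent: *
Disallow: /products?filter=
```

Any URL beginning with /products?filter= is skipped, so the 50 canonical product pages stay crawlable while the 3,000 filter combinations are not.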

Example 3: Accidentally blocking the entire site. A marketing agency moved from staging to production and left Disallow: / in robots.txt. For 3 weeks, nothing got indexed. Traffic dropped to zero. One character caused it — the forward slash after Disallow.

Robots.txt vs Meta Robots Tag

These two do different jobs at different stages. Robots.txt stops crawlers before they reach a page. The meta robots tag gives instructions after a crawler has already accessed it.

  • Where it lives — Robots.txt: root directory file. Meta robots tag: HTML <head> of individual pages.
  • When it acts — Robots.txt: before crawling. Meta robots tag: after crawling.
  • Scope — Robots.txt: entire directories or paths. Meta robots tag: individual pages.
  • Can it prevent indexing? — Robots.txt: no, it only prevents crawling. Meta robots tag: yes, noindex removes pages from search.
  • Best for — Robots.txt: blocking sections of a site. Meta robots tag: removing specific pages from search.

Here’s the catch: if you block a page with robots.txt, Google can’t see a noindex tag on that page. So the page might still appear in search results (with no snippet) because Google found a link to it elsewhere. To truly remove a page from search, use the meta robots tag — not robots.txt.
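To actually remove a page from search, the tag goes in that page’s HTML head, and the page must remain crawlable so Google can see it:

```html
<!-- In the <head> of the page to remove from search results -->
<!-- The page must NOT be blocked in robots.txt, or Google never sees this tag -->
<meta name="robots" content="noindex, follow">
```

Using content="noindex, follow" keeps the page out of results while still letting crawlers follow its links.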

Robots.txt Best Practices

  • Always include a Sitemap directive — Point to your XML sitemap so crawlers have a complete map of your site. One line: Sitemap: https://yoursite.com/sitemap.xml.
  • Never block CSS or JavaScript files — Google needs to render your pages to understand them. Blocking these resources hurts your on-page SEO.
  • Test before deploying — Use Google Search Console’s robots.txt report to check your rules. A typo can block your entire site.
  • Review quarterly — As your site grows, new directories appear. What made sense 6 months ago might be blocking important content today.
  • Pair with a content strategy — Robots.txt manages what gets crawled, but you still need pages worth crawling. Services like theStacc publish 30 SEO-optimized articles per month, giving Googlebot fresh content to discover on every visit.

Frequently Asked Questions

Does robots.txt stop pages from appearing in Google?

Not directly. Robots.txt prevents crawling, not indexing. If other sites link to a blocked page, Google may still show it in results — just without a description snippet. Use a noindex meta tag to fully remove a page from search.

Where do I put my robots.txt file?

Place it at your domain’s root: https://yoursite.com/robots.txt. Subdirectory placement doesn’t work. Each subdomain needs its own robots.txt file.

Can robots.txt improve my rankings?

Indirectly, yes. Blocking low-value pages preserves crawl budget for your important content. On large sites, this means faster discovery and indexing of new pages — which can speed up ranking improvements.

Do all bots follow robots.txt rules?

Legitimate search engine bots (Googlebot, Bingbot) respect robots.txt. Malicious bots and scrapers typically ignore it. Don’t rely on robots.txt for security — it’s a guideline, not a firewall.


Want to make sure your SEO content actually gets crawled and ranked? theStacc publishes 30 SEO-optimized articles to your site every month — automatically. Start for $1 →


Ready to automate your SEO?

Start ranking on Google in weeks, not months with theStacc's AI SEO automation. No writing, no SEO skills, no hassle.

Start Free Trial

$1 for 3 days · Cancel anytime