What is Robots.txt?
Robots.txt is a plain-text file at your website's root that instructs search engine crawlers which URLs they can and can't access — controlling how Googlebot and other bots interact with your site.
What is Robots.txt?
Robots.txt is a text file placed at your domain’s root directory (yoursite.com/robots.txt) that tells search engine crawlers which pages or sections of your site they’re allowed to access.
Every major search engine — Google, Bing, Yahoo — checks this file before crawling your site. Think of it as a bouncer’s list. Not a lock on the door, but a clear set of instructions that well-behaved bots follow.
According to Google’s own documentation, Googlebot checks robots.txt before making any request to your server. For sites with thousands of pages, that file becomes one of the most important pieces of your technical SEO setup.
Why Does Robots.txt Matter?
Getting your robots.txt wrong can tank your rankings overnight. One misplaced directive and Google can’t see your most important pages.
- Crawl budget protection — Large sites have limited crawl budget. Blocking low-value pages (admin panels, staging areas, duplicate filters) keeps Googlebot focused on what matters.
- Prevents indexing of sensitive areas — Internal search results, login pages, and cart pages don’t belong in the SERP. Robots.txt keeps bots away.
- Faster discovery of new content — When crawlers aren’t wasting requests on junk pages, they find your new blog posts and product pages faster.
- Server load management — Aggressive bots can strain small servers. Blocking unnecessary crawling reduces resource consumption.
If you’re publishing content regularly — whether that’s 5 pages or 30 articles a month — you need crawlers spending their time on the right URLs.
How Robots.txt Works
The file uses a simple syntax. Three core directives handle most use cases.
User-Agent
This line specifies which crawler the rule applies to. User-agent: * targets all bots. User-agent: Googlebot targets only Google’s crawler. You can stack multiple rules for different bots in the same file.
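A minimal sketch of how stacked rules look in practice (the paths here are illustrative, not a recommendation):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/

# Additional rules that apply only to Googlebot
User-agent: Googlebot
Disallow: /admin/
Disallow: /tmp/
```

Each `User-agent` line starts a new group, and a crawler follows the most specific group that matches its name.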
Disallow
The Disallow directive blocks a specific path. Disallow: /admin/ prevents crawlers from accessing anything under the /admin/ directory. Leave it blank (Disallow:) and you’re allowing everything. A single forward slash (Disallow: /) blocks the entire site.
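You can sanity-check Disallow rules locally with Python's standard-library robots.txt parser. The rules and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt matching the Disallow example above
rules = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Anything under /admin/ is blocked; everything else is allowed
print(parser.can_fetch("*", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post"))       # True
```

This is handy for testing a draft file before you upload it to your root directory.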
Allow and Sitemap
Allow overrides a broader Disallow rule for specific paths — useful when you block a directory but want one page inside it crawled. The Sitemap directive points crawlers to your XML sitemap, helping them discover all your important URLs without guessing.
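A sketch of the override pattern, with illustrative paths and a placeholder domain:

```
User-agent: *
Disallow: /downloads/
Allow: /downloads/whitepaper.pdf

Sitemap: https://yoursite.com/sitemap.xml
```

Here everything under /downloads/ is blocked except the one PDF the Allow line carves out.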
How Google Processes It
Googlebot fetches your robots.txt before crawling anything else. If the file returns a 200 status, Google follows the rules. A 404 means “no restrictions” — everything gets crawled. A 5xx error makes Google temporarily cautious and limits crawling until the file becomes accessible again.
Types of Robots.txt Directives
Robots.txt directives fall into 4 main categories:
- Access directives (Allow/Disallow) — Control which paths bots can visit. The foundation of every robots.txt file.
- User-agent directives — Target rules at specific bots. You might block SEMrushBot while allowing Googlebot full access.
- Crawl-delay directives — Tell bots to wait between requests. Google ignores this (use Google Search Console instead), but Bing and Yandex respect it.
- Sitemap directives — Point to your sitemap file. Not technically a “rule,” but a discovery mechanism bots rely on.
Most small-to-medium sites only need access directives and a sitemap reference. Crawl-delay matters more for large-scale sites with server constraints.
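Putting the directive types together, a file might target one bot while throttling another (bot names are real crawlers; the paths and delay value are illustrative):

```
# Block a third-party SEO crawler entirely
User-agent: SemrushBot
Disallow: /

# Ask Bing to wait 10 seconds between requests
# (Google ignores Crawl-delay)
User-agent: Bingbot
Crawl-delay: 10

# Everyone else: full access, plus a sitemap pointer
User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml
```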
Robots.txt Examples
Example 1: Local plumbing company
A plumber in Austin has a WordPress site with /wp-admin/, /cart/, and /internal-pricing/ directories. Their robots.txt blocks all three and includes a sitemap reference. Result: Googlebot spends its time on service pages and blog posts — not admin panels.
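The plumber's file could look like this (the domain is a placeholder):

```
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /internal-pricing/

Sitemap: https://yoursite.com/sitemap.xml
```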
Example 2: eCommerce store with filtered pages
An online retailer has 50 products but 3,000 filter combinations (size + color + price). Without robots.txt blocking /products?filter=, Googlebot wastes crawl budget on duplicate filtered pages. One Disallow line fixes it.
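That one-line fix, assuming the filter parameter shown in the example:

```
User-agent: *
Disallow: /products?filter=
```

Robots.txt matching is prefix-based, so this blocks every URL that begins with /products?filter= while leaving the 50 canonical product pages crawlable.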
Example 3: Accidentally blocking the entire site
A marketing agency moved from staging to production and left Disallow: / in robots.txt. For 3 weeks, nothing got indexed. Traffic dropped to zero. One character caused it — the forward slash after Disallow.
Robots.txt vs Meta Robots Tag
These two do different jobs at different stages. Robots.txt stops crawlers before they reach a page. The meta robots tag gives instructions after a crawler has already accessed it.
| | Robots.txt | Meta Robots Tag |
|---|---|---|
| Where it lives | Root directory file | HTML <head> of individual pages |
| When it acts | Before crawling | After crawling |
| Scope | Entire directories or paths | Individual pages |
| Can prevent indexing? | No — only prevents crawling | Yes — noindex removes from search |
| Best for | Blocking sections of a site | Removing specific pages from search |
Here’s the catch: if you block a page with robots.txt, Google can’t see a noindex tag on that page. So the page might still appear in search results (with no snippet) because Google found a link to it elsewhere. To truly remove a page from search, use the meta robots tag — not robots.txt.
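For completeness, the noindex directive goes in the page's HTML head, and the page must stay crawlable so Google can actually see it:

```html
<!-- In the <head> of the page you want removed from search -->
<meta name="robots" content="noindex">
```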
Robots.txt Best Practices
- Always include a Sitemap directive — Point to your XML sitemap so crawlers have a complete map of your site. One line: Sitemap: https://yoursite.com/sitemap.xml.
- Never block CSS or JavaScript files — Google needs to render your pages to understand them. Blocking these resources hurts your on-page SEO.
- Test before deploying — Use Google Search Console’s robots.txt tester to check your rules. A typo can block your entire site.
- Review quarterly — As your site grows, new directories appear. What made sense 6 months ago might be blocking important content today.
- Pair with a content strategy — Robots.txt manages what gets crawled, but you still need pages worth crawling. Services like theStacc publish 30 SEO-optimized articles per month, giving Googlebot fresh content to discover on every visit.
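Tying the best practices together, here is one common baseline for a WordPress site — a sketch, not a universal recommendation, and the domain is a placeholder:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
```

The Allow line keeps admin-ajax.php reachable because some WordPress themes load front-end resources through it; blocking it can break rendering.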
Frequently Asked Questions
Does robots.txt stop pages from appearing in Google?
Not directly. Robots.txt prevents crawling, not indexing. If other sites link to a blocked page, Google may still show it in results — just without a description snippet. Use a noindex meta tag to fully remove a page from search.
Where do I put my robots.txt file?
Place it at your domain’s root: https://yoursite.com/robots.txt. Subdirectory placement doesn’t work. Each subdomain needs its own robots.txt file.
Can robots.txt improve my rankings?
Indirectly, yes. Blocking low-value pages preserves crawl budget for your important content. On large sites, this means faster discovery and indexing of new pages — which can speed up ranking improvements.
Do all bots follow robots.txt rules?
Legitimate search engine bots (Googlebot, Bingbot) respect robots.txt. Malicious bots and scrapers typically ignore it. Don’t rely on robots.txt for security — it’s a guideline, not a firewall.
Want to make sure your SEO content actually gets crawled and ranked? theStacc publishes 30 SEO-optimized articles to your site every month — automatically. Start for $1 →
Sources
- Google Search Central: Robots.txt Specifications
- Google Search Central: How Google Interprets Robots.txt
- Moz: Robots.txt — Learn SEO
- Ahrefs: Robots.txt — The Ultimate Guide
Related Terms
Crawling
Crawling is the process search engines use to discover and scan web pages. Learn how crawling works, the role of Googlebot, and how to ensure your pages get crawled.
Index / Indexing
Indexing is the process of adding web pages to a search engine's database. Learn how indexing works, how to check if pages are indexed, and how to fix indexing issues.
Meta Robots Tag
The meta robots tag is an HTML element that instructs search engines how to crawl, index, and display a specific page. Directives include noindex, nofollow, nosnippet, and more.
Technical SEO
Technical SEO is the practice of optimizing your website's infrastructure — crawlability, indexability, site speed, security, and structured data — so search engines can access, understand, and rank your content effectively.
Sitemap (XML)
An XML sitemap is a file that lists all the important URLs on your website, helping search engines like Google discover, crawl, and index your pages more efficiently.