Robots.txt: Complete Guide for Search Engine Crawling
The robots.txt file tells search engine crawlers which pages they can and cannot access on your site. It is one of the first files crawlers look for when visiting a website.
How Robots.txt Works
The file must live at the root of the host it covers (example.com/robots.txt); each subdomain needs its own file. Well-behaved crawlers fetch it before requesting any other page. Rules specify which user agents (crawlers) may access which paths.
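As a minimal sketch, a file served at example.com/robots.txt (example.com and the path are placeholders) could contain just a user-agent line and one rule:

User-agent: *
Disallow: /private/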
Basic Syntax
User-agent specifies which crawler the rules apply to. Use * for all crawlers. Disallow prevents crawlers from accessing specified paths. Allow explicitly permits access to paths within a disallowed directory.
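Putting the three directives together, a simple group might look like this (the paths are illustrative):

User-agent: *
Disallow: /private/
Allow: /private/press-kit/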
Common Configurations
Block all crawlers from a directory: Disallow: /admin/. Allow a specific page inside that blocked directory: Allow: /admin/public. Block a specific crawler entirely: User-agent: BadBot followed by Disallow: /. A combined example appears below.
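Combined in one file, those configurations could read as follows (BadBot and the /admin/ paths are example names, not real crawlers or directories):

User-agent: *
Disallow: /admin/
Allow: /admin/public

User-agent: BadBot
Disallow: /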
Important Considerations
Robots.txt is a request, not a security measure: well-behaved crawlers honor it, but malicious bots can ignore it, and a blocked URL can still appear in search results if other sites link to it. Do not rely on it to hide sensitive information; use authentication or a noindex directive instead. Also avoid blocking CSS and JavaScript files, since search engines need them to render your pages and blocking them can hurt SEO.
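If a blocked directory also contains assets, one common pattern, assuming the crawler supports the * and $ wildcards that major search engines honor (they are not part of the original robots.txt standard), is to explicitly re-allow those file types; /assets/ here is a placeholder path:

User-agent: *
Disallow: /assets/
Allow: /assets/*.css$
Allow: /assets/*.js$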
Sitemap Reference
Always include a Sitemap directive pointing to your XML sitemap. This helps crawlers discover all your pages efficiently. Use our Robots.txt Generator to create properly formatted files.
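The Sitemap directive takes an absolute URL and can appear anywhere in the file; example.com stands in for your own domain:

Sitemap: https://example.com/sitemap.xml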