Robots.txt: Complete Guide for Search Engine Crawling
The robots.txt file tells search engine crawlers which pages they can and cannot access on your site. It is one of the first files crawlers look for when visiting a website.
How Robots.txt Works
The file must live at the root of the host it covers (example.com/robots.txt); each subdomain needs its own file. Well-behaved crawlers fetch it before requesting any other page. Rules specify which user agents (crawlers) may access which paths.
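As a minimal sketch, a file served at example.com/robots.txt (example.com and the path are placeholders) could contain just a user-agent line and one rule:

User-agent: *
Disallow: /private/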
Basic Syntax
User-agent specifies which crawler the rules apply to. Use * for all crawlers. Disallow prevents crawlers from accessing specified paths. Allow explicitly permits access to paths within a disallowed directory.
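Putting the three directives together, a simple group might look like this (the paths are illustrative):

User-agent: *
Disallow: /private/
Allow: /private/press-kit/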
Common Configurations
Block all crawlers from a directory: Disallow: /admin/. Allow a specific page inside that blocked directory: Allow: /admin/public. Block a specific crawler entirely: User-agent: BadBot followed by Disallow: /. A combined example appears below.
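Combined in one file, those configurations could read as follows (BadBot and the /admin/ paths are example names, not real crawlers or directories):

User-agent: *
Disallow: /admin/
Allow: /admin/public

User-agent: BadBot
Disallow: /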
Important Considerations
Robots.txt is a request, not a security measure: well-behaved crawlers honor it, but malicious bots can ignore it, and a blocked URL can still appear in search results if other sites link to it. Do not rely on it to hide sensitive information; use authentication or a noindex directive instead. Also avoid blocking CSS and JavaScript files, since search engines need them to render your pages and blocking them can hurt SEO.
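If a blocked directory also contains assets, one common pattern, assuming the crawler supports the * and $ wildcards that major search engines honor (they are not part of the original robots.txt standard), is to explicitly re-allow those file types; /assets/ here is a placeholder path:

User-agent: *
Disallow: /assets/
Allow: /assets/*.css$
Allow: /assets/*.js$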
Sitemap Reference
Always include a Sitemap directive pointing to your XML sitemap. This helps crawlers discover all your pages efficiently. Use our Robots.txt Generator to create properly formatted files.
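The Sitemap directive takes an absolute URL and can appear anywhere in the file; example.com stands in for your own domain:

Sitemap: https://example.com/sitemap.xml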