robots.txt and Sitemap.xml: Best Practices for Crawl Control

2026-03-24 Technical SEO

robots.txt Fundamentals

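robots.txt is a plain-text file at the root of your domain that tells web crawlers which URL paths they may and may not crawl. It is advisory: well-behaved crawlers honor it, but it is not access control, and a disallowed URL can still appear in search results if other pages link to it.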

User-agent: *
Disallow: /admin/
Disallow: /internal/
Allow: /

User-agent: Googlebot
Disallow: /admin/
Disallow: /internal/
Disallow: /staging/

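Key points:
- `User-agent: *` applies to every crawler that has no group of its own. A crawler with a named group (such as Googlebot) follows only that group and ignores the `*` rules entirely, so repeat any shared rules in the named group.
- `Disallow: /` blocks all crawling. It is useful on staging environments, but it does not keep already-known URLs out of the index, so password protection is the safer guard.
- The file must be served at https://yourdomain.com/robots.txt. Each host (including each subdomain) and each protocol needs its own file.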
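Python's standard-library `urllib.robotparser` offers a quick way to check how a crawler reads these groups. The sketch below parses two hypothetical versions of the file, one whose Googlebot group lists only `/staging/` and one that repeats the shared rules, and asks whether Googlebot may fetch a few paths. This parser applies the first matching rule and does not understand `*` or `$` wildcards inside paths, whereas Google uses the most specific (longest) match, so treat it as an approximation rather than a Google simulator.

```python
from urllib.robotparser import RobotFileParser

# Googlebot group lists only /staging/, so Googlebot never reads the * rules
NARROW = """\
User-agent: *
Disallow: /admin/
Disallow: /internal/
Allow: /

User-agent: Googlebot
Disallow: /staging/
"""

# Googlebot group repeats the shared rules
FIXED = """\
User-agent: *
Disallow: /admin/
Disallow: /internal/
Allow: /

User-agent: Googlebot
Disallow: /admin/
Disallow: /internal/
Disallow: /staging/
"""


def report(robots_txt: str, agent: str, paths: list[str]) -> dict[str, bool]:
    """Parse robots.txt text (no network fetch) and test each path for one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


paths = ["/", "/admin/", "/staging/"]
print("narrow:", report(NARROW, "Googlebot", paths))
print("fixed: ", report(FIXED, "Googlebot", paths))
```

With the first version Googlebot is allowed into `/admin/`, because it matches its own group and skips the `*` group; the second version blocks it. Crawlers without a named group, such as Bingbot here, fall back to the `*` rules in both versions.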

Sitemap.xml Fundamentals

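A sitemap lists the URLs you want search engines to discover and index, with optional metadata about each one. It is a hint, not a directive: Google, for example, uses `<lastmod>` when it is consistently accurate but ignores `<changefreq>` and `<priority>`. A minimal sitemap looks like this: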

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-24</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
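If you generate sitemaps from code, the standard library is enough. Below is a minimal sketch using `xml.etree.ElementTree`, assuming your pages arrive as a list of (absolute URL, last-modified date) pairs; the function name `build_sitemap` is illustrative. It writes only `<loc>` and `<lastmod>`, since the other two tags are optional and Google ignores them, and it enforces the protocol's limit of 50,000 URLs per sitemap file.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # sitemaps.org limit per file (files must also stay under 50 MB uncompressed)


def build_sitemap(pages: list[tuple[str, date]]) -> bytes:
    """Return sitemap XML for (absolute URL, last-modified date) pairs."""
    if len(pages) > MAX_URLS:
        raise ValueError("too many URLs: split into several sitemaps plus a sitemap index")
    # Setting xmlns as a plain attribute gives a default namespace without ns0: prefixes
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        if not loc.startswith(("http://", "https://")):
            raise ValueError(f"<loc> must be an absolute URL, got {loc!r}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # characters like & are escaped on output
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # W3C date: YYYY-MM-DD
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


print(build_sitemap([("https://example.com/", date(2026, 3, 24))]).decode("utf-8"))
```

Letting `ElementTree` serialize the document also takes care of entity escaping (for example `&` in query strings becomes `&amp;`), which the sitemap protocol requires.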

Validating via API

curl -X POST https://api.toolkitapi.io/v1/seo/validate-robots \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

{
  "robots_txt_found": true,
  "sitemap_declared": true,
  "sitemap_url": "https://example.com/sitemap.xml",
  "sitemap_urls_count": 142,
  "blocked_paths": ["/admin/", "/internal/"],
  "issues": []
}
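The same check can run from a CI job. This is a minimal Python sketch using only the standard library: the endpoint, the `X-API-Key` header and the request body are taken from the curl example, and the field names come from the sample response. Everything else is an assumption, in particular that a `"/"` entry in `blocked_paths` means the whole site is blocked and that `issues` entries can be printed as strings; `build_request`, `crawl_problems` and `run_check` are hypothetical helper names.

```python
import json
import os
import urllib.request

# Endpoint and header name as shown in the curl example
API_URL = "https://api.toolkitapi.io/v1/seo/validate-robots"


def build_request(site_url: str, api_key: str) -> urllib.request.Request:
    """POST {"url": site_url} as JSON with the X-API-Key header."""
    body = json.dumps({"url": site_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def crawl_problems(report: dict) -> list[str]:
    """Map a validation response to human-readable problems (empty list = pass)."""
    problems = []
    if not report.get("robots_txt_found"):
        problems.append("robots.txt not found")
    if not report.get("sitemap_declared"):
        problems.append("no Sitemap line in robots.txt")
    # Assumption: a "/" entry in blocked_paths means the whole site is disallowed
    if "/" in report.get("blocked_paths", []):
        problems.append("Disallow: / blocks the whole site")
    # Assumption: issue entries are printable; their exact shape is not shown in the sample
    problems.extend(str(issue) for issue in report.get("issues", []))
    return problems


def run_check(site_url: str) -> list[str]:
    """Call the API; needs network access and an API_KEY environment variable."""
    request = build_request(site_url, os.environ["API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return crawl_problems(json.load(response))
```

In a pipeline you would call `run_check("https://example.com")` and fail the build when the returned list is non-empty. For the sample response above, `crawl_problems` returns an empty list.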

Common Mistakes

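| Mistake | Consequence |
| --- | --- |
| Blocking CSS/JS via robots.txt | Google can't render pages correctly |
| `Disallow: /` on production | Crawling stops site-wide; pages drop out of results or appear without a description |
| Sitemap not declared in robots.txt | Crawlers may not find it unless it is submitted directly, e.g. in Google Search Console |
| Sitemap contains non-canonical URLs | Sends conflicting signals about which URL is the preferred version |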
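Three of these mistakes can be caught before deploying by linting the file itself. Below is a rough sketch with `urllib.robotparser`; `lint_robots` and the asset paths passed to it are hypothetical, and the same caveats as before apply: the parser uses first-match rules and ignores path wildcards, so a clean result is a sanity check, not proof of how Google will read the file. The non-canonical-URL check needs your pages' canonical tags and is left out.

```python
from urllib.robotparser import RobotFileParser


def lint_robots(robots_txt: str, asset_paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Flag three common robots.txt mistakes for the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    warnings = []
    # Disallow: / (or any rule that blocks the homepage)
    if not parser.can_fetch(agent, "/"):
        warnings.append(f"{agent} cannot crawl the homepage; is Disallow: / live?")
    # CSS/JS the crawler needs for rendering
    for path in asset_paths:
        if not parser.can_fetch(agent, path):
            warnings.append(f"{agent} cannot fetch {path}; rendering may break")
    # Missing Sitemap declaration (site_maps() returns None when there is none)
    if not parser.site_maps():
        warnings.append("no Sitemap line declared")
    return warnings


STAGING_LEAK = "User-agent: *\nDisallow: /\n"
for warning in lint_robots(STAGING_LEAK, ["/static/site.css", "/static/app.js"]):
    print("WARN:", warning)
```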

Linking Sitemap in robots.txt

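Always declare each sitemap with its full, absolute URL. The `Sitemap` line is independent of the `User-agent` groups, so it can go anywhere in the file, and you can list more than one: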

Sitemap: https://example.com/sitemap.xml
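To confirm what a parser actually picks up, `RobotFileParser.site_maps()` (Python 3.8+) returns every `Sitemap` URL in the file regardless of which group it sits in. The helper below, `sitemap_declarations`, is a hypothetical name, and the second sitemap URL in the sample is made up for illustration; the helper separates absolute URLs from relative ones, which the sitemaps protocol does not accept in robots.txt.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def sitemap_declarations(robots_txt: str) -> tuple[list[str], list[str]]:
    """Return (absolute, relative) Sitemap URLs declared anywhere in robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    absolute: list[str] = []
    relative: list[str] = []
    # site_maps() returns None when the file has no Sitemap lines
    for url in parser.site_maps() or []:
        parts = urlsplit(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            absolute.append(url)
        else:
            relative.append(url)
    return absolute, relative


EXAMPLE = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml

User-agent: Googlebot
Disallow: /admin/
Sitemap: /sitemap-blog.xml
"""

print(sitemap_declarations(EXAMPLE))
# (['https://example.com/sitemap.xml'], ['/sitemap-blog.xml'])
```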


