Sitemap Validator
Validate any XML sitemap against the sitemaps.org spec in seconds. Auto-detects <urlset> vs <sitemapindex>, checks required <loc>, ISO 8601 <lastmod>, <changefreq> enum, <priority> range, 50K-URL and 50MB spec limits. Line-numbered errors. 100% in-browser.
How to Use This Tool
- Fetch the sitemap you want to validate. Open the live URL in a browser (e.g., https://example.com/sitemap.xml), view source, and copy the full XML. Or, if you host the file locally, grab it directly from your server. The tool accepts both regular sitemaps (<urlset>) and sitemap indexes (<sitemapindex>) and auto-detects which one you pasted.
- Paste the XML or upload the .xml file. Drop the content into the large textarea, click Upload .xml file to load from disk, or drag-and-drop a file onto the textarea. Files up to 50MB are accepted; anything larger violates the spec and must be split into multiple sitemaps inside a sitemap index.
- Click Validate (or press Ctrl/Cmd+Enter). The tool first parses the XML with the browser's native DOMParser to confirm well-formedness, then runs every spec check: namespace match, required <loc>, ISO 8601 <lastmod>, <changefreq> enum, <priority> range, the 50K-URL limit, and the 50MB file-size limit (a sketch of this pipeline appears after this list).
- Read the verdict banner. Green means the sitemap is valid. Yellow means it passes but has warnings (typically duplicate URLs). Red means at least one hard error: Google would refuse to process the file.
- Work through the line-numbered errors. Every error shows the line number in your pasted XML, the severity, a human-readable message, and a recommended fix. Jump to that line in your editor or CMS template, correct the issue, then re-validate.
- Export or download fixes. Copy the error list to paste into a Jira ticket, export errors as CSV for the dev team, or click Download Cleaned XML to get a version where invalid priorities, non-ISO dates, and bad changefreqs are stripped and URLs with missing or invalid <loc> are removed entirely.
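The parse-then-check order from step 3 can be approximated in a few lines of browser-side TypeScript. This is an illustrative sketch, not the tool's actual source; the function name and error messages are invented:

```ts
// Illustrative sketch of the parse-then-detect flow; not the tool's actual source.
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

function detectSitemapType(xml: string): "urlset" | "sitemapindex" {
  const doc = new DOMParser().parseFromString(xml, "application/xml");

  // DOMParser never throws: well-formedness failures surface as a <parsererror> element.
  if (doc.querySelector("parsererror")) {
    throw new Error("XML is not well-formed");
  }

  const root = doc.documentElement;
  if (root.namespaceURI !== SITEMAP_NS) {
    throw new Error(`Wrong or missing namespace; expected ${SITEMAP_NS}`);
  }

  const name = root.localName;
  if (name === "urlset" || name === "sitemapindex") {
    return name; // type-specific spec checks run after this point
  }
  throw new Error(`Unexpected root element <${name}>`);
}
```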
About XML Sitemaps & Validation
The XML sitemap protocol, published at sitemaps.org in 2006 by Google, Yahoo, and Microsoft, is the de facto standard for telling search engines which URLs on your site you want indexed. Every significant crawler (Googlebot, Bingbot, YandexBot, Baiduspider, and a long tail of alternative engines) consumes sitemaps per the same 0.9 specification. In 2026, sitemaps are arguably more important than they were two decades ago: modern sites ship thousands of JavaScript-rendered, API-hydrated, or dynamically generated pages that are hard for crawlers to discover through links alone. A well-formed sitemap shortcuts that discovery problem and puts crawl priority in your hands.
Why validate? Because invalid sitemaps silently fail. If Google Search Console reports "Couldn’t fetch" or "Sitemap could not be read," the typical culprit is a subtle spec violation: a missing namespace, a stray BOM character, a <lastmod> in MM/DD/YYYY format instead of ISO 8601, a <priority> value of 1.5, a <changefreq> value of sometimes, or a URL over 2048 characters. These bugs rarely break the XML syntactically (the file parses fine), but they violate the sitemaps.org schema, so Google either rejects the file or silently drops the offending entries. Our validator catches each of these at the spec level, before you submit.
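As a concrete (hypothetical) illustration, the entry below is perfectly well-formed XML and DOMParser raises no objection, yet all three optional fields are spec-invalid:

```ts
// Hypothetical example: the parser reports no error, yet every optional field fails the spec.
const wellFormedButInvalid = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>04/04/2026</lastmod>      <!-- MM/DD/YYYY, not W3C datetime -->
    <changefreq>sometimes</changefreq> <!-- not one of the seven enum values -->
    <priority>1.5</priority>           <!-- outside the 0.0-1.0 range -->
  </url>
</urlset>`;

const doc = new DOMParser().parseFromString(wellFormedButInvalid, "application/xml");
console.log(doc.querySelector("parsererror")); // null: the XML itself is fine
```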
The spec limits every search engineer should memorize: 50,000 URLs per sitemap file, 50MB uncompressed per file, 2048 characters per <loc>. Alongside those ceilings, a sitemap must declare xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" on its root element. The root is <urlset> for a regular sitemap and <sitemapindex> for an index of sitemaps; you cannot mix <url> and <sitemap> children in the same file. Each <url> must contain exactly one <loc>; <lastmod>, <changefreq>, and <priority> are optional hints.
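For reference, a minimal sitemap that satisfies every structural rule above looks like this (the URL and field values are placeholders):

```ts
// A minimal spec-valid sitemap: correct namespace, one <url>, one required <loc>.
const minimalValidSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-04</lastmod>   <!-- optional -->
    <changefreq>weekly</changefreq> <!-- optional -->
    <priority>0.8</priority>        <!-- optional -->
  </url>
</urlset>`;
```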
Common mistakes we see in production sitemap audits:
1. <lastmod> in locale-specific formats like 4/4/2026 or Apr 4 2026, which silently drop the date signal.
2. <priority> values outside 0.0-1.0, usually from CMS plugins that misinterpret user input as percentages.
3. Non-enum <changefreq> values like rare, biweekly, sometimes.
4. Relative URLs in <loc>, which Google ignores outright; the spec requires absolute URLs.
5. URLs pointing at http:// instead of the canonical https://.
6. Trailing-slash inconsistency between sitemap URLs and canonical URLs, which splits indexing signals.
7. Files exceeding 50,000 URLs without a sitemap index to wrap them.
8. Duplicate <loc> entries, usually from paginated CMSes that republish every URL across every category.
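Mistakes (4) and (8) are the easiest to script checks for. A minimal sketch, assuming the <loc> values have already been extracted into an array (names are illustrative; the real tool reports line numbers alongside each finding):

```ts
// Sketch of checks for mistakes (4) and (8): relative URLs and duplicate <loc> entries.
function checkLocs(locs: string[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();

  for (const loc of locs) {
    // The spec requires fully qualified absolute URLs; new URL() throws on relative ones.
    try {
      new URL(loc);
    } catch {
      problems.push(`Relative or malformed URL: ${loc}`);
    }
    if (seen.has(loc)) {
      problems.push(`Duplicate <loc>: ${loc}`);
    }
    seen.add(loc);
  }
  return problems;
}
```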
Sitemap indexes are the standard escape hatch for large sites. Instead of one sitemap.xml containing every URL on the site, you ship an index file with children like sitemap-pages.xml, sitemap-blog.xml, sitemap-products-1.xml, sitemap-products-2.xml. Each child stays under the 50K URL cap, and Google discovers them automatically when it reads the index. EmproIT's recommended practice — codified in our internal sitemap batching guidance — is to cap each child sitemap at 200 URLs, well below the spec limit, so individual sitemaps remain small, fast to regenerate on content updates, and trivial to debug when a single page is misindexed. At 200 URLs per file, a 100,000-URL site needs 500 child sitemaps under one index — still far inside the 50,000-child-per-index ceiling.
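A batching step along those lines takes only a few lines to script. This sketch assumes a flat list of URLs and uses the 200-URL cap described above; the file-naming scheme is hypothetical:

```ts
// Split a flat URL list into child sitemaps of at most batchSize URLs each.
function chunkUrls(urls: string[], batchSize = 200): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    batches.push(urls.slice(i, i + batchSize));
  }
  return batches;
}

// 100,000 URLs -> 500 batches of 200, matching the arithmetic above.
const allUrls = Array.from({ length: 100_000 }, (_, i) => `https://example.com/p/${i}`);
const batches = chunkUrls(allUrls);
console.log(batches.length); // 500
const childNames = batches.map((_, i) => `sitemap-products-${i + 1}.xml`);
```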
Submission workflow: host the sitemap at a stable URL (most commonly /sitemap.xml at the root), add a Sitemap: directive to robots.txt, and submit the sitemap URL in Google Search Console (Indexing → Sitemaps) and Bing Webmaster Tools. If you ship a sitemap index, submit the index URL only — not the children. Search engines also auto-discover sitemaps via robots.txt, so the Search Console submission is redundant but helpful for monitoring. Google Search Console will tell you how many URLs it discovered, how many it indexed, and whether parsing errored out. Our validator catches every parse error before submission, so you never hit the red "Sitemap could not be read" banner in Search Console.
At EmproIT, our Technical SEO team builds and maintains sitemaps for enterprise sites with hundreds of thousands of URLs: multi-language, multi-regional, paginated category archives, dynamically generated landing pages, syndicated content, and hreflang clusters. We run automated validation on every deploy — catching regressions before they hit production — and monitor Search Console indexing metrics weekly. For teams without a dedicated SEO engineer, sitemap validation is a fast-win hygiene item that pays back in improved indexing rates and cleaner Search Console reports. Combine this tool with our Sitemap Generator (to build valid sitemaps from scratch), our Robots.txt Generator (to advertise your sitemap correctly), and our Canonical Checker (to confirm the URLs in your sitemap match your canonical URLs exactly).
Frequently Asked Questions
What is the sitemaps.org spec?
sitemaps.org is the open protocol co-authored by Google, Yahoo, and Microsoft in 2006 that defines the XML sitemap format. Every major search engine — Google, Bing, Yandex, Baidu, DuckDuckGo — consumes sitemaps per the same 0.9 spec. A valid sitemap must use the namespace http://www.sitemaps.org/schemas/sitemap/0.9 on either a <urlset> root (for regular sitemaps) or a <sitemapindex> root (for an index of child sitemaps). Each <url> entry requires a <loc> element containing an absolute URL up to 2048 characters, and may optionally include <lastmod>, <changefreq>, and <priority>. The spec also caps each file at 50,000 URLs and 50MB uncompressed. Our validator checks every one of these requirements.
Why is the sitemap limit 50,000 URLs per file?
The 50,000-URL-per-file cap comes directly from the sitemaps.org protocol, and it exists so crawlers can process sitemaps with predictable memory usage and bandwidth. Google enforces the limit strictly — if your sitemap exceeds 50,000 URLs, Google stops reading at 50,000 and ignores the rest. The standard workaround is a sitemap index: instead of one giant sitemap.xml, you ship sitemap-1.xml, sitemap-2.xml, sitemap-3.xml (each under 50K URLs) and reference them from a <sitemapindex>. EmproIT recommends splitting at 200-1000 URLs per file for easier debugging and faster incremental updates — a much tighter cap than the spec allows, but trivial to host and maintain. Our validator warns when any single file exceeds the 50K limit.
Why is the sitemap file size limit 50MB?
The 50MB cap (uncompressed) also comes from the sitemaps.org spec. The original protocol capped files at 10MB; the limit was later raised to 50MB to accommodate larger sites. If a sitemap exceeds 50MB, Google will not process it: the whole file is rejected, not truncated. In practice, sitemaps rarely hit the byte cap before hitting the URL cap; 50,000 typical URLs with all optional fields run about 10-15MB. If you are anywhere near 50MB, split the file. gzip compression (sitemap.xml.gz) is accepted by Google and reduces transfer size by about 80%, but the 50MB limit is measured uncompressed. Our validator computes the byte size at validation time and flags any file that exceeds the cap.
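The "measured uncompressed" detail matters when checking size in code: byte length, not character count, is what counts, since multi-byte UTF-8 characters occupy more than one byte each. A minimal sketch:

```ts
// The spec defines 50MB as 52,428,800 bytes, measured uncompressed.
const MAX_BYTES = 52_428_800;

function exceedsSizeCap(xml: string): boolean {
  // TextEncoder gives the UTF-8 byte length; xml.length would undercount multi-byte characters.
  const bytes = new TextEncoder().encode(xml).length;
  return bytes > MAX_BYTES;
}
```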
What is the correct date format for <lastmod>?
<lastmod> must be a W3C datetime, which is a subset of ISO 8601. The simplest valid form is YYYY-MM-DD (e.g., 2026-04-04). You can add precision: YYYY-MM-DDThh:mmTZD (e.g., 2026-04-04T14:30+00:00), YYYY-MM-DDThh:mm:ssTZD (e.g., 2026-04-04T14:30:00Z), or YYYY-MM-DDThh:mm:ss.sTZD with fractional seconds. The timezone designator (TZD) is either Z for UTC or an offset like +05:30 or -08:00. Formats that LOOK similar but are invalid: 04/04/2026, 2026/04/04, Apr 4 2026, epoch timestamps, or any timezone abbreviation like EST/PST. Our validator checks <lastmod> against the full W3C datetime regex and flags anything outside the spec as an error.
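One way to express that check is a single regular expression covering the forms listed above. This is a sketch and may differ from the validator's production regex; calendar validity (e.g., month 13) would need a separate check:

```ts
// Matches YYYY-MM-DD, optionally followed by Thh:mm[:ss[.s]] plus a required
// timezone designator (Z or an +hh:mm / -hh:mm offset).
const W3C_DATETIME =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$/;

W3C_DATETIME.test("2026-04-04");           // true
W3C_DATETIME.test("2026-04-04T14:30:00Z"); // true
W3C_DATETIME.test("04/04/2026");           // false
W3C_DATETIME.test("2026-04-04T14:30");     // false: a time requires a timezone
```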
What are the valid <changefreq> values?
The <changefreq> element accepts exactly one of seven values: always, hourly, daily, weekly, monthly, yearly, never. Anything else — sometimes, rare, once, biweekly, occasional, or any capitalized form Daily/DAILY — is invalid. Note that <changefreq> is a hint to crawlers, not a guarantee. Google has publicly stated since 2017 that it essentially ignores <changefreq> and relies on its own change-detection heuristics: HTTP headers, content hashes, historical crawl patterns. Bing still uses <changefreq> as input. Best practice in 2026 is to ship accurate <lastmod> values (which Google does use) and to either omit <changefreq> or set it realistically per-page. Our validator rejects any value outside the seven-option enum.
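The check itself is a case-sensitive set membership test; a minimal sketch:

```ts
// The seven-value enum is case-sensitive: "Daily" and "DAILY" fail just like "sometimes".
const CHANGEFREQ = new Set([
  "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
]);

const isValidChangefreq = (v: string) => CHANGEFREQ.has(v);
```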
What do <priority> values mean?
<priority> is a decimal between 0.0 and 1.0 (default 0.5) indicating the relative importance of a URL within your own site. It does NOT affect how your URLs rank against other sites; it is strictly a within-site hint. Google, like Bing, largely ignores <priority> because every site owner inflates their own values. Typical allocation: 1.0 for the homepage, 0.8-0.9 for major category/service pages, 0.5-0.7 for inner content, 0.3 or omitted for deep leaf pages. Our validator flags any value outside the 0.0-1.0 range as an error and warns on non-standard formatting (more than one decimal place, priorities written as percentages, and so on). If you want to simplify sitemaps, it is perfectly valid to omit <priority> entirely; the spec treats it as optional.
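A strict version of that check can insist on a plain decimal form, which implies the 0.0-1.0 range for free. A sketch (the real validator distinguishes hard errors from formatting warnings):

```ts
function isValidPriority(value: string): boolean {
  // Accept only 0, 0.x, 1, or 1.0 forms; anything else is rejected.
  return /^(0(\.\d+)?|1(\.0+)?)$/.test(value);
}

isValidPriority("0.8"); // true
isValidPriority("1.5"); // false: outside 0.0-1.0
isValidPriority("80%"); // false: a percentage, not a decimal
```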
What is the difference between <urlset> and <sitemapindex>?
A <urlset> is a regular sitemap: its root is <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> and its children are <url> elements, each containing a <loc>. A <sitemapindex> is a sitemap OF sitemaps: its root is <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> and its children are <sitemap> elements, each containing a <loc> that points at a child <urlset>. Sitemap indexes are the standard way to handle the 50,000-URL-per-file cap: ship sitemap.xml as the index, sitemap-pages.xml / sitemap-blog.xml / sitemap-products.xml as child urlsets. Google and Bing discover child sitemaps through the index automatically. Our validator auto-detects the root type and applies the appropriate rule set — it does not try to validate a sitemap index against <url> rules or vice versa.
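For comparison with the minimal <urlset> example earlier, a minimal sitemap index looks like this (file names and dates are placeholders):

```ts
// A minimal sitemap index: <sitemap> children instead of <url>, each <loc>
// pointing at a child urlset. <lastmod> is optional on each entry.
const sitemapIndex = `<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-04-04</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>`;
```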
How do I submit a sitemap to Google Search Console?
First, host the sitemap at a stable URL — the root is fine (https://example.com/sitemap.xml), or any subdirectory. Reference it from robots.txt with Sitemap: https://example.com/sitemap.xml (absolute URL, one line per sitemap or index). Then in Google Search Console, open Indexing → Sitemaps, paste the sitemap URL, and click Submit. Google reads the file within hours (often minutes). If you ship a <sitemapindex>, submit ONLY the index URL — Google will follow it and discover the child sitemaps automatically. For Bing, the process is identical in Bing Webmaster Tools. Google Search Console reports parse errors, number of URLs discovered, and number indexed — our validator catches the parse errors BEFORE you submit so you avoid the "Couldn’t fetch" and "Sitemap could not be read" statuses entirely.