SEO - Advanced Sitemaps
General Information
Magento's native sitemap generator was designed for small catalogues. Once your store grows past a few thousand SKUs, generation becomes slow and memory-hungry, and the resulting files can exceed Google's per-file limits of 50 MB (uncompressed) and 50,000 URLs. Search engines reject over-sized sitemap files, and when they do, the URLs you most need indexed may never get crawled.
The Qoliber SEO Advanced Sitemaps extension replaces Magento's sitemap engine with a production-grade pipeline that streams items one at a time, splits output across as many files as needed, and tells search engines exactly where to find them.
What Does This Extension Do?
- Splits large sitemaps: configure URLs-per-file (default 1,500) and the module emits sitemap_products_1.xml, sitemap_products_2.xml, … so every file stays under search-engine limits.
- Streams XML generation: items are yielded one at a time instead of being buffered in memory. A 100,000-product catalogue uses the same memory footprint as a 100-product one.
- Gzip-compressed output: an optional .xml.gz variant, supported by every major search engine, dramatically reduces bandwidth on each crawl.
- Auto-announces sitemaps in robots.txt: the module appends a Sitemap: line for every registered sitemap. No more hand-editing robots.txt after each generation run.
- Per-entity-type sitemaps: separate files for products, categories, CMS pages and images, each with its own scheduling and limits.
- Excludes specific CMS pages: keep your terms-and-conditions, login and cart pages out of the index, where they don't belong.
- Pluggable architecture: implement SitemapGeneratorInterface to add custom entity types (blog posts, brand pages, store locator) without touching core code.
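Streaming and CMS-page exclusion fit together naturally in a generator function. The Python sketch below is not the module's PHP code; it assumes a hypothetical CmsPage record with identifier, url and is_active fields and a hypothetical cms_sitemap_urls function, and only shows the idea of yielding one eligible URL at a time.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Iterator


@dataclass(frozen=True)
class CmsPage:
    """Hypothetical stand-in for a Magento CMS page row."""
    identifier: str
    url: str
    is_active: bool


def cms_sitemap_urls(pages: Iterable[CmsPage], excluded: AbstractSet[str]) -> Iterator[str]:
    """Yield one URL at a time, skipping unpublished and excluded pages."""
    for page in pages:
        if not page.is_active or page.identifier in excluded:
            continue
        yield page.url


pages = [
    CmsPage("home", "https://www.mystore.com/", True),
    CmsPage("terms-and-conditions", "https://www.mystore.com/terms-and-conditions", True),
    CmsPage("about-us", "https://www.mystore.com/about-us", True),
    CmsPage("old-promo", "https://www.mystore.com/old-promo", False),
]
urls = list(cms_sitemap_urls(pages, excluded={"terms-and-conditions"}))
# urls == ["https://www.mystore.com/", "https://www.mystore.com/about-us"]
```

Because nothing is collected inside the function, the caller decides how many URLs to pull at once, which is what keeps memory flat on large catalogues.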
Why Is This a Game Changer?
- Won't crash on big catalogues. Streaming generation keeps memory usage constant regardless of catalogue size, so there are no "Allowed memory size of …" errors during the sitemap cron.
- Search engines will actually read it. Sub-50 MB files, gzip transport, and robots.txt discoverability mean Google, Bing and other crawlers can find and process every sitemap file.
- No manual robots.txt edits, ever. When you add a new sitemap entry in admin, its Sitemap: line appears in robots.txt automatically.
- Works on a stock Magento install. No custom indexers, no extra cron jobs, no infrastructure beyond what Magento already uses.
How Does It Work?
The module overrides Magento's sitemap generation pipeline with a streaming, multi-generator pipeline. Each generator owns one entity type — products, categories, CMS pages, images, hreflang clusters, video, static URLs — and yields its items one at a time. The service collects the output into chunks (capped at your configured URLs-per-file limit), writes each chunk to disk as a numbered child sitemap, and finishes by writing a parent sitemapindex that links them all together. A separate RobotsTxtInjector plugin appends Sitemap: lines to your store's robots.txt, keeping the announcement in sync with what's actually on disk.
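The chunk-and-index pipeline can be sketched in Python. This is an illustration, not the module's PHP implementation: the function names (chunked, render_urlset, render_index, generate_children) are hypothetical, the sketch returns rendered XML strings in a dict instead of writing each chunk to disk, and <lastmod> is omitted for brevity.

```python
import io
from itertools import chain, islice
from typing import Dict, Iterable, Iterator, List
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Pull at most `size` URLs at a time from a lazy iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def render_urlset(urls: Iterable[str]) -> str:
    """Render one child sitemap, XML-escaping every <loc> value."""
    out = io.StringIO()
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
    for url in urls:
        out.write(f"  <url><loc>{escape(url)}</loc></url>\n")
    out.write("</urlset>\n")
    return out.getvalue()


def render_index(child_locs: Iterable[str]) -> str:
    """Render the parent sitemapindex that links every child file."""
    out = io.StringIO()
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
    for loc in child_locs:
        out.write(f"  <sitemap><loc>{escape(loc)}</loc></sitemap>\n")
    out.write("</sitemapindex>\n")
    return out.getvalue()


def generate_children(prefix: str, items: Iterable[str], per_file: int) -> Dict[str, str]:
    """Split one generator's output into child sitemaps keyed by file name."""
    chunks = chunked(items, per_file)
    first = next(chunks, None)
    if first is None:
        return {}  # nothing to write for this entity type
    # Look one chunk ahead to choose between single-file and numbered names.
    second = next(chunks, None)
    if second is None:
        return {f"{prefix}.xml": render_urlset(first)}
    files: Dict[str, str] = {}
    for number, chunk in enumerate(chain([first, second], chunks), start=1):
        files[f"{prefix}_{number}.xml"] = render_urlset(chunk)
    return files


lazy_urls = (f"https://www.mystore.com/product-{i}.html" for i in range(5))
children = generate_children("sitemap_products", lazy_urls, per_file=2)
base = "https://www.mystore.com/media/sitemap/default/en_US/"
index_xml = render_index(base + name for name in children)
```

The one-chunk lookahead is one way a streaming writer can apply the unsuffixed single-file name without counting items first; a disk-backed version would only need to hold two chunks in memory before the naming decision is made.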
Generated sitemaps location
Sitemaps are generated in {MAGENTO_ROOT_DIR}/pub/media/sitemap/{STORE_CODE}/{STORE_LOCALE} — the path you configure in Marketing → SEO & Search → Site Map.
Sitemaps split by entity
Every entity type ends up in its own file (or its own set of files, when the URL count exceeds the per-file limit). The default generators ship with these prefixes:
| Generator | File prefix | Contents |
|---|---|---|
| Products | sitemap_products | Every visible, in-stock catalogue product URL |
| Categories | sitemap_categories | Every active, anchored category URL |
| CMS pages | sitemap_cms | Every published, non-excluded CMS page |
| Images | sitemap_images | Image-sitemap entries (<image:image>) for products with gallery images |
| Hreflang | sitemap_hreflang | Per-URL <xhtml:link rel="alternate" hreflang="…"> clusters across store views |
| Static URLs | sitemap_static | Homepages, contact, cart, login and other static routes |
| Videos | sitemap_videos | Video-sitemap entries (<video:video>) for products with attached video |
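Hreflang entries follow Google's convention that each <url> entry lists every language version in its cluster, itself included. The Python sketch below uses xml.etree.ElementTree and assumes a cluster is a hypothetical dict mapping hreflang codes to store-view URLs (the URLs are illustrative); it builds the tree in memory for brevity, unlike the module's streaming writer.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize with a default namespace and an xhtml: prefix, as sitemaps do.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def hreflang_urlset(clusters: Iterable[Dict[str, str]]) -> ET.Element:
    """One <url> per cluster member, each listing every alternate, itself included."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for cluster in clusters:
        for url in cluster.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            for lang, alternate in cluster.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": alternate},
                )
    return urlset


cluster = {
    "en-us": "https://www.mystore.com/blue-yoga-pants.html",
    "de-de": "https://www.mystore.com/de/blaue-yogahose.html",
}
xml_text = ET.tostring(hreflang_urlset([cluster]), encoding="unicode")
```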
File-naming rules
- Single chunk: when a generator's output fits in one file, no suffix is added (sitemap_products.xml).
- Multi-chunk: when the URL count exceeds the per-file limit, output is split into sitemap_products_1.xml, sitemap_products_2.xml, ….
- Gzip: when gzip is enabled, every file gets a .gz suffix (sitemap_products.xml.gz, sitemap_products_1.xml.gz, …).
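The naming rules, the number of files a catalogue produces, and gzip output can be sketched in Python. The helpers chunk_count, chunk_filenames and write_sitemap_file are hypothetical; only the naming pattern and the 1,500 default come from this page, and compression uses Python's standard gzip module rather than the module's own PHP code.

```python
import gzip
from pathlib import Path
from typing import List


def chunk_count(total_urls: int, per_file: int) -> int:
    """Number of child files needed: ceiling division."""
    return -(-total_urls // per_file)


def chunk_filenames(prefix: str, chunks: int, use_gzip: bool) -> List[str]:
    """No suffix for one chunk, _1.._n for several, plus .gz when gzip is on."""
    ext = ".xml.gz" if use_gzip else ".xml"
    if chunks == 1:
        return [f"{prefix}{ext}"]
    return [f"{prefix}_{n}{ext}" for n in range(1, chunks + 1)]


def write_sitemap_file(directory: Path, name: str, xml: str) -> Path:
    """Write UTF-8 XML, gzip-compressing it when the name ends in .gz."""
    path = directory / name
    data = xml.encode("utf-8")
    if name.endswith(".gz"):
        with gzip.open(path, "wb") as fh:
            fh.write(data)
    else:
        path.write_bytes(data)
    return path


# 100,000 product URLs at the default 1,500 per file:
files = chunk_filenames("sitemap_products", chunk_count(100_000, 1_500), use_gzip=True)
# len(files) == 67; files[0] == "sitemap_products_1.xml.gz"
```

At the default limit, 100,000 products need 67 files (66 full files of 1,500 URLs plus one of 1,000), each far below the 50,000-URL ceiling.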
The behaviour is backward-compatible with pre-2.1 installs that expected unsuffixed single-chunk filenames.
Example: parent sitemap.xml (sitemapindex)
The root index file (configured per Magento's standard sitemap admin) references every child:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_products_1.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_products_2.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_categories.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_cms.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_images.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_hreflang.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
</sitemapindex>
```

Example: a child sitemap (sitemap_products_1.xml)
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.mystore.com/blue-yoga-pants.html</loc>
    <lastmod>2026-05-08T14:22:01+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://www.mystore.com/black-yoga-mat.html</loc>
    <lastmod>2026-05-08T11:09:48+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>
  <!-- … up to URLs-per-file limit … -->
</urlset>
```

Example: an image sitemap (sitemap_images.xml)
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.mystore.com/blue-yoga-pants.html</loc>
    <image:image>
      <image:loc>https://www.mystore.com/media/catalog/product/cache/.../mp01-blue-yoga-pants.jpg</image:loc>
      <image:title>Blue Yoga Pants</image:title>
      <image:caption>Blue Yoga Pants - Premium technical fabric</image:caption>
    </image:image>
  </url>
</urlset>
```

Example: robots.txt announcement
The module appends a Sitemap: line per registered sitemap so search engines discover the entire tree from robots.txt:
```text
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Sitemap: https://www.mystore.com/media/sitemap/default/en_US/sitemap.xml
```

(Crawlers fetch the index, then follow every child sitemap referenced inside it.)
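The RobotsTxtInjector plugin itself is PHP, and its internals are not described here. The Python sketch below only illustrates one property such an injector should have, idempotence: running generation twice must not duplicate Sitemap: lines. The inject_sitemaps function is hypothetical.

```python
from typing import Iterable


def inject_sitemaps(robots_txt: str, sitemap_urls: Iterable[str]) -> str:
    """Append a Sitemap: line for each URL not already announced."""
    lines = robots_txt.rstrip("\n").splitlines()
    announced = {
        line.split(":", 1)[1].strip()
        for line in lines
        if line.lower().startswith("sitemap:")
    }
    for url in sitemap_urls:
        if url not in announced:
            lines.append(f"Sitemap: {url}")
            announced.add(url)
    return "\n".join(lines) + "\n"


robots = "User-agent: *\nDisallow: /admin/\nDisallow: /checkout/\n"
index_url = "https://www.mystore.com/media/sitemap/default/en_US/sitemap.xml"
once = inject_sitemaps(robots, [index_url])
twice = inject_sitemaps(once, [index_url])
# twice == once: the second run adds nothing
```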
Configuration at a glance
Find the module under Stores → Configuration → Qoliber → SEO: Advanced Sitemaps. Toggle gzip output, set URLs-per-file, and decide which entity types to include — the rest is automated.
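The admin options can be thought of as a small, validated settings object. The Python names below (SitemapSettings, enabled_types, urls_per_file, use_gzip, selected_generators) are hypothetical and do not correspond to the module's configuration paths; the 1,500 default and opt-in gzip come from this page, and the 50,000 upper bound is the sitemap protocol's per-file URL limit.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence

MAX_URLS_PER_FILE = 50_000  # sitemap protocol limit per file


@dataclass(frozen=True)
class SitemapSettings:
    """Hypothetical mirror of the admin options; not the module's config paths."""
    enabled_types: FrozenSet[str]
    urls_per_file: int = 1_500  # default stated on this page
    use_gzip: bool = False      # gzip output is opt-in

    def __post_init__(self) -> None:
        if not 1 <= self.urls_per_file <= MAX_URLS_PER_FILE:
            raise ValueError(
                f"urls_per_file must be between 1 and {MAX_URLS_PER_FILE}"
            )


def selected_generators(settings: SitemapSettings, available: Sequence[str]) -> List[str]:
    """Keep the registry order, dropping entity types switched off in admin."""
    return [name for name in available if name in settings.enabled_types]


settings = SitemapSettings(enabled_types=frozenset({"products", "categories", "cms"}))
order = ["products", "categories", "cms", "images", "hreflang", "static", "videos"]
active = selected_generators(settings, order)
# active == ["products", "categories", "cms"]
```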
What's New in 2.1
- Streaming generation: sitemap items are now yielded one at a time, so the memory footprint is decoupled from catalogue size.
- Gzip output toggle: an opt-in .xml.gz variant cuts crawler bandwidth.
- Auto-injection into robots.txt: registered sitemaps appear in robots.txt without manual edits.
- SitemapGeneratorInterface::getItems() returns iterable: a backward-compatible signature change for custom generators (existing array returns continue to work).
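In PHP, the iterable type accepts both arrays and Traversable objects such as generators, which is why existing array-returning generators keep working. The same idea as a Python sketch, with hypothetical brand-page generators: a consumer that only iterates over get_items() cannot tell a pre-built list from a lazy generator.

```python
from typing import Iterable, Iterator, List


class LegacyBrandGenerator:
    """Pre-2.1 style (hypothetical): builds the full list before returning."""

    def get_items(self) -> List[str]:
        return [
            "https://www.mystore.com/brand/acme",
            "https://www.mystore.com/brand/zenith",
        ]


class StreamingBrandGenerator:
    """2.1 style (hypothetical): yields one URL at a time."""

    def get_items(self) -> Iterator[str]:
        for slug in ("acme", "zenith"):
            yield f"https://www.mystore.com/brand/{slug}"


def collect(items: Iterable[str]) -> List[str]:
    """A consumer that only iterates accepts either return style."""
    return list(items)


legacy = collect(LegacyBrandGenerator().get_items())
streaming = collect(StreamingBrandGenerator().get_items())
# legacy == streaming
```

Only the streaming version keeps memory flat as the item count grows; the list-returning version still works, but it materialises every item before the pipeline sees the first one.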