4 min read · May 9, 2026 · by jakub

SEO - Advanced Sitemaps

General Information

Magento's native sitemap generator was designed for small catalogues. Once your store crosses a few thousand SKUs, generation becomes slow and memory-hungry, and the resulting files can exceed the per-file limits that Google enforces: 50 MB uncompressed and 50,000 URLs. Search engines reject over-sized sitemap files, and when they do, the URLs you most need indexed may never get crawled.

The Qoliber SEO Advanced Sitemaps extension replaces Magento's sitemap engine with a production-grade pipeline that streams items one at a time, splits output across as many files as needed, and tells search engines exactly where to find them.

What Does This Extension Do?

  • Splits large sitemaps — configure URLs-per-file (default 1,500) and the module emits sitemap_products_1.xml, sitemap_products_2.xml, … so every file stays under search-engine limits.
  • Streams XML generation — items are yielded one at a time instead of buffered into memory. A 100,000-product catalogue uses the same memory footprint as a 100-product one.
  • Gzip-compressed output — optional .xml.gz variant, supported by every major search engine, dramatically reduces bandwidth on each crawl.
  • Auto-announces sitemaps in robots.txt — the module appends a Sitemap: line for every registered file. No more hand-editing robots.txt after each generation run.
  • Per-entity-type sitemaps — separate files for products, categories, CMS pages and images, each with their own scheduling and limits.
  • Excludes specific CMS pages — keep your terms-and-conditions, login and cart pages out of the index, where they don't belong.
  • Pluggable architecture — implement SitemapGeneratorInterface to add custom entity types (blog posts, brand pages, store locator) without touching core code.
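
To illustrate the gzip point above, here is a minimal sketch in Python (used only as a neutral illustration language; the module itself is not written in Python). The sample URLs are hypothetical. It shows that gzip is lossless and that repetitive sitemap markup compresses well:

```python
import gzip

# Hypothetical sample: a urlset with repetitive markup, as sitemaps tend to have.
urls = "".join(
    f"<url><loc>https://www.mystore.com/product-{i}.html</loc></url>"
    for i in range(1, 1001)
)
xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    f"{urls}</urlset>"
).encode("utf-8")

compressed = gzip.compress(xml)

# Decompression restores the original bytes exactly.
restored = gzip.decompress(compressed)
print(f"{len(xml)} bytes plain, {len(compressed)} bytes gzipped")
```

Note that the per-file size limit applies to the uncompressed file, so gzip saves bandwidth but does not let a file hold more URLs.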

Why Is This a Game Changer?

  • Won't crash on big catalogues. Streaming generation means memory usage is constant regardless of catalogue size — no Allowed memory size of … errors during the sitemap cron.
  • Search engines can actually read it. Sub-50 MB files, optional gzip compression, and robots.txt discoverability mean Google, Bing and other crawlers can find and parse every sitemap file.
  • No manual robots.txt edits, ever. When you add a new sitemap entry in admin, the Sitemap: line appears in robots.txt automatically.
  • Works on a stock Magento install. No custom indexers, no extra cron jobs, no infrastructure beyond what Magento already uses.

How Does It Work?

The module replaces Magento's sitemap generation with a streaming, multi-generator pipeline. Each generator owns one entity type (products, categories, CMS pages, images, hreflang clusters, videos, static URLs) and yields its items one at a time. The service collects the output into chunks capped at your configured URLs-per-file limit, writes each chunk to disk as a child sitemap, and finishes by writing a parent sitemapindex that links them all together. A separate RobotsTxtInjector plugin appends Sitemap: lines to your store's robots.txt, keeping the announcement in sync with what's actually on disk.
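
The streaming-and-chunking idea can be sketched as follows. This is a Python illustration of the general technique, not the module's actual code; `product_urls` and `chunked` are hypothetical names, and the URLs are made up. The generator yields one item at a time, and the chunker holds at most one chunk in memory:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def product_urls(count: int) -> Iterator[str]:
    """Hypothetical generator: yields one URL at a time instead of building a list."""
    for i in range(1, count + 1):
        yield f"https://www.mystore.com/product-{i}.html"


def chunked(items: Iterable[str], limit: int) -> Iterator[List[str]]:
    """Group a stream into lists of at most `limit` items, one chunk at a time."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, limit))
        if not chunk:
            return
        yield chunk


# 3,200 URLs with a 1,500-per-file limit give three chunks.
sizes = [len(chunk) for chunk in chunked(product_urls(3200), 1500)]
print(sizes)  # [1500, 1500, 200]
```

Peak memory in this sketch depends on the chunk size, not on the total number of items, which is the property the streaming design relies on.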

Generated sitemaps location

Sitemaps are generated in {MAGENTO_ROOT_DIR}/pub/media/sitemap/{STORE_CODE}/{STORE_LOCALE} — the path you configure in Marketing → SEO & Search → Site Map.

Sitemaps split by entity

Every entity type ends up in its own file (or its own set of files, when the URL count exceeds the per-file limit). The default generators ship with these prefixes:

| Generator | File prefix | Contents |
| --- | --- | --- |
| Products | sitemap_products | Every visible, in-stock catalogue product URL |
| Categories | sitemap_categories | Every active, anchored category URL |
| CMS pages | sitemap_cms | Every published, non-excluded CMS page |
| Images | sitemap_images | Image-sitemap entries (<image:image>) for products with gallery images |
| Hreflang | sitemap_hreflang | Per-URL <xhtml:link rel="alternate" hreflang="…"> clusters across store views |
| Static URLs | sitemap_static | Homepages, contact, cart, login and other static routes |
| Videos | sitemap_videos | Video-sitemap entries (<video:video>) for products with attached video |

File-naming rules

  • Single chunk — when a generator's output fits in one file, no suffix is added: sitemap_products.xml.
  • Multi-chunk — when the URL count exceeds the per-file limit, output is split: sitemap_products_1.xml, sitemap_products_2.xml, ….
  • Gzip — when gzip is enabled, every file gets a .gz suffix: sitemap_products.xml.gz, sitemap_products_1.xml.gz, ….

The behaviour is backward-compatible with pre-2.1 installs that expected unsuffixed single-chunk filenames.
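
The naming rules above can be expressed as a small function. This is a Python sketch for illustration only; `chunk_filenames` is a hypothetical helper, not part of the module's API:

```python
from typing import List


def chunk_filenames(prefix: str, chunk_count: int, gzip_enabled: bool = False) -> List[str]:
    """Return file names for a generator's output, following the rules listed above."""
    if chunk_count < 1:
        return []
    extension = ".xml.gz" if gzip_enabled else ".xml"
    if chunk_count == 1:
        # Single chunk: no numeric suffix.
        return [f"{prefix}{extension}"]
    # Multi-chunk: suffixes start at 1.
    return [f"{prefix}_{n}{extension}" for n in range(1, chunk_count + 1)]


print(chunk_filenames("sitemap_products", 1))
print(chunk_filenames("sitemap_products", 2, gzip_enabled=True))
```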

Example: parent sitemap.xml (sitemapindex)

The root index file (configured per Magento's standard sitemap admin) references every child:

XML
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_products_1.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_products_2.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_categories.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_cms.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_images.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.mystore.com/media/sitemap/default/en_US/sitemap_hreflang.xml</loc>
    <lastmod>2026-05-09T08:31:14+00:00</lastmod>
  </sitemap>
</sitemapindex>
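
For readers who want to see how an index of this shape can be produced, here is a minimal sketch using Python's standard library. It is an illustration under assumptions, not the module's implementation; `build_sitemap_index` is a hypothetical name, and the input is a list of (loc, lastmod) pairs:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(children: Iterable[Tuple[str, str]]) -> bytes:
    """Serialise (loc, lastmod) pairs into a sitemapindex document."""
    # Emit the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in children:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("https://www.mystore.com/media/sitemap/default/en_US/sitemap_products_1.xml",
     "2026-05-09T08:31:14+00:00"),
    ("https://www.mystore.com/media/sitemap/default/en_US/sitemap_categories.xml",
     "2026-05-09T08:31:14+00:00"),
])
print(index_xml.decode("utf-8"))
```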

Example: a child sitemap (sitemap_products_1.xml)

XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.mystore.com/blue-yoga-pants.html</loc>
    <lastmod>2026-05-08T14:22:01+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>
  <url>
    <loc>https://www.mystore.com/black-yoga-mat.html</loc>
    <lastmod>2026-05-08T11:09:48+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.7</priority>
  </url>
  <!-- … up to URLs-per-file limit … -->
</urlset>
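
To sanity-check a generated child sitemap against the 50 MB / 50,000-URL limits mentioned earlier, a check along these lines can help. It is a Python sketch, not a tool shipped with the module; `check_sitemap` is a hypothetical name, and the 50 MB limit is taken as 50 × 1024 × 1024 bytes of uncompressed XML:

```python
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # uncompressed size


def check_sitemap(xml_bytes: bytes) -> dict:
    """Count <url> entries incrementally and compare against the per-file limits."""
    url_count = 0
    for _event, element in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if element.tag == f"{{{SITEMAP_NS}}}url":
            url_count += 1
            element.clear()  # drop the entry's children once it has been counted
    return {
        "urls": url_count,
        "bytes": len(xml_bytes),
        "within_limits": url_count <= MAX_URLS and len(xml_bytes) <= MAX_BYTES,
    }


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.mystore.com/blue-yoga-pants.html</loc></url>"
    "<url><loc>https://www.mystore.com/black-yoga-mat.html</loc></url>"
    "</urlset>"
).encode("utf-8")

report = check_sitemap(sample)
print(report)
```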

Example: an image sitemap (sitemap_images.xml)

XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.mystore.com/blue-yoga-pants.html</loc>
    <image:image>
      <image:loc>https://www.mystore.com/media/catalog/product/cache/.../mp01-blue-yoga-pants.jpg</image:loc>
      <image:title>Blue Yoga Pants</image:title>
      <image:caption>Blue Yoga Pants - Premium technical fabric</image:caption>
    </image:image>
  </url>
</urlset>

Example: robots.txt announcement

The module appends a Sitemap: line per registered sitemap so search engines discover the entire tree from robots.txt:

User-agent: *
Disallow: /admin/
Disallow: /checkout/

Sitemap: https://www.mystore.com/media/sitemap/default/en_US/sitemap.xml

(Crawlers fetch the index, then follow every child sitemap referenced inside it.)
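
The append step can be made idempotent so that repeated generation runs do not duplicate lines. The sketch below shows one way to do that in Python. It illustrates the idea only and is not the RobotsTxtInjector plugin's code; `inject_sitemap_lines` is a hypothetical name:

```python
from typing import Iterable


def inject_sitemap_lines(robots_txt: str, sitemap_urls: Iterable[str]) -> str:
    """Append a Sitemap: line for each URL that robots.txt does not announce yet."""
    existing = set()
    for line in robots_txt.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("sitemap:"):
            existing.add(stripped.split(":", 1)[1].strip())
    # dict.fromkeys removes duplicates in the input while keeping order.
    missing = [url for url in dict.fromkeys(sitemap_urls) if url not in existing]
    if not missing:
        return robots_txt
    body = robots_txt.rstrip("\n")
    separator = "\n\n" if body else ""
    return body + separator + "\n".join(f"Sitemap: {url}" for url in missing) + "\n"


base = "User-agent: *\nDisallow: /admin/\nDisallow: /checkout/\n"
index_url = "https://www.mystore.com/media/sitemap/default/en_US/sitemap.xml"
once = inject_sitemap_lines(base, [index_url])
twice = inject_sitemap_lines(once, [index_url])
print(once)
```

Running the function a second time with the same URL returns the text unchanged.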

Configuration at a glance

Find the module under Stores → Configuration → Qoliber → SEO: Advanced Sitemaps. Toggle gzip output, set URLs-per-file, and decide which entity types to include — the rest is automated.

What's New in 2.1

  • Streaming generation — sitemap items now yield one at a time; memory footprint is decoupled from catalogue size.
  • Gzip output toggle — opt-in .xml.gz variant cuts crawler bandwidth.
  • Auto-injection into robots.txt — registered sitemaps appear in robots.txt without manual edits.
  • SitemapGeneratorInterface::getItems() returns iterable — backward-compatible signature change for custom generators (existing array returns continue to work).