
Identify Your Sitemap in Robots.txt

Matt Oakley - 14 February 2018

In recent posts we’ve been discussing XML Sitemaps and how important they are for SEO. The primary purpose of a sitemap is to assist search engines in mapping the structure of your site. It’s your opportunity to influence which content they index.

Search engine crawlers are reasonably intelligent and expect many sites to provide a sitemap. By convention, a sitemap lives at /sitemap.xml, or at /sitemap.xml.gz if it is compressed.

Crawlers will generally check for these files before continuing their crawl. Sometimes, though, you may want to give your sitemap a different name or location.

Website owners can use the /robots.txt file to tell search crawlers where sitemaps can be found. In fact, robots.txt is usually the first file a “bot” requests before it starts to crawl, which makes it the perfect place to leave instructions about the sitemap.

To specify the location of your sitemap in robots.txt, add the following line, replacing the domain and path with your own. The URL should be absolute, not relative:

Sitemap: http://www.yourdomain.com/sitemap.xml

It’s as simple as that. You’re no longer relying on the search crawler to guess where your sitemap is hosted.
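If you want to confirm that the record reads the way a crawler would read it, here is a minimal sketch using Python's standard-library urllib.robotparser (its site_maps() method needs Python 3.8 or later). The helper name sitemaps_from_robots and the inlined robots.txt text are our own assumptions for illustration; real crawlers may parse the file differently.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap records are present
    return parser.site_maps() or []


# Hypothetical robots.txt content, inlined for the example
example = """\
User-agent: *
Disallow:

Sitemap: http://www.yourdomain.com/sitemap.xml
"""

print(sitemaps_from_robots(example))
```

To check a live site instead, you could fetch its /robots.txt with any HTTP client and pass the response text to the same function.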

Using Multiple Sitemaps?

No problem. Multiple sitemap records are supported within robots.txt and can be listed one per line:

Sitemap: http://www.yourdomain.com/sitemap-products.xml
Sitemap: http://www.yourdomain.com/sitemap-categories.xml
Sitemap: http://www.yourdomain.com/sitemap-articles.xml
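If your robots.txt is generated by a build or deploy script, records like the ones above can be appended programmatically. The following is only a sketch: the function name with_sitemap_records is hypothetical, and it assumes the robots.txt content is already available as a string.

```python
def with_sitemap_records(robots_txt, sitemap_urls):
    """Append a Sitemap record for each URL that is not already declared."""
    lines = robots_txt.splitlines()
    # Collect URLs already declared; the directive name is matched case-insensitively
    declared = {
        line.split(":", 1)[1].strip()
        for line in lines
        if line.strip().lower().startswith("sitemap:")
    }
    # dict.fromkeys() drops duplicate inputs while keeping their order
    missing = [url for url in dict.fromkeys(sitemap_urls) if url not in declared]
    if missing and lines and lines[-1].strip():
        lines.append("")  # blank line between existing rules and the new records
    lines.extend(f"Sitemap: {url}" for url in missing)
    return "\n".join(lines) + "\n"


# Hypothetical starting file that already declares one sitemap
base = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://www.yourdomain.com/sitemap-products.xml\n"
)
wanted = [
    "http://www.yourdomain.com/sitemap-products.xml",
    "http://www.yourdomain.com/sitemap-categories.xml",
]
print(with_sitemap_records(base, wanted))
```

Skipping URLs that are already declared means the script can be re-run safely without producing duplicate records.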

If you are using multiple sitemaps, you might consider a sitemap index file, as described in this dated but relevant article from Moz. A sitemap index is itself an XML file that lists the location of each individual sitemap.

If you use an index file, simply point to it in robots.txt and let the index do the rest:

Sitemap: http://www.yourdomain.com/sitemap-index.xml
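For reference, a sitemap index is a small XML document that lists each child sitemap. Here is a rough sketch of generating one with Python's standard-library xml.etree.ElementTree. The function name build_sitemap_index is our own, the three URLs reuse the placeholder names from earlier in this post, and optional elements such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing the given sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        # Each child sitemap gets a <sitemap> entry with its URL in <loc>
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


index_xml = build_sitemap_index([
    "http://www.yourdomain.com/sitemap-products.xml",
    "http://www.yourdomain.com/sitemap-categories.xml",
    "http://www.yourdomain.com/sitemap-articles.xml",
])
print(index_xml)
```

Saving that output as sitemap-index.xml at the site root would match the robots.txt record shown above.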

Conclusion

Specifying the sitemap in robots.txt is not strictly required; it’s more of a best practice. There’s no reason not to include the record, so we make sure it is present on every site we build.
