Ultimate Magento Robots.txt File Examples

An extremely common question in eCommerce – and in Magento SEO in particular – is what a robots.txt file should look like and what belongs in it. For this article, I decided to take all of our knowledge and experience, some sample robots.txt files from our clients’ sites, and some examples from other industry-leading Magento studios, and try to figure out an ultimate Magento robots.txt file.

Please note that you should never blindly take one of these generic files and drop it in as the robots.txt of your specific Magento store. Every store has its own structure, and in almost every case some of the robots.txt content needs to be modified to fit your store’s URL structure and your indexing priorities. Always ask your eCommerce consultants to tailor the robots.txt file to your specific case, and before you deploy it live, double-check with the Google Webmaster Tools robots.txt testing tool that everything that should be indexable actually is.
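
If you want to sanity-check a draft from the command line as well, here is a minimal sketch using Python’s standard-library urllib.robotparser (the example.com URLs and the expected results are illustrative, not tied to any specific file below). Note that this parser implements the original robots.txt standard and ignores Googlebot’s wildcard extensions (* and $), so treat it as a rough pre-flight check, not a replacement for Google’s testing tool:

from urllib import robotparser

# Parse a local draft of robots.txt before deploying it to the site root.
rules = robotparser.RobotFileParser()
with open("robots.txt") as f:
    rules.parse(f.read().splitlines())

# URL -> whether we expect crawlers to be allowed to fetch it.
checks = {
    "http://example.com/some-category/some-product.html": True,
    "http://example.com/catalogsearch/result/?q=shoes": False,  # /catalogsearch/
    "http://example.com/customer/account/login/": False,        # /customer/
}

for url, expected in checks.items():
    allowed = rules.can_fetch("*", url)
    flag = "OK" if allowed == expected else "CHECK ME"
    print(flag, "can_fetch =", allowed, url)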

Inchoo’s recommended Magento robots.txt boilerplate:

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=

As you can see, the file above allows image indexing for image search while disallowing some blank image gallery pages, as explained in a tutorial by my mate Drazen.

It also blocks crawling of the folders that are usually unwanted in the index for a common Magento online store setup.

Please note that it doesn’t disallow most of the sorting and pagination parameters, as we assume you’ll handle pagination with a rel="prev"/rel="next" implementation and add meta “noindex, follow” to the rest of the sorting parameters. For more info on why meta “noindex, follow” and not “noindex, nofollow”, read this.
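
As a quick sketch of what that looks like in practice (the category URL below is made up), page 2 of a paginated category would declare its neighbors in the <head>, while a sorted variant of the same page would carry the noindex meta tag instead:

<!-- page 2 of a paginated category (URL is a made-up example) -->
<link rel="prev" href="http://example.com/category.html?p=1" />
<link rel="next" href="http://example.com/category.html?p=3" />

<!-- a sorted variant (e.g. ?p=2&dir=asc): keep it out of the index,
     but let crawlers follow its links -->
<meta name="robots" content="noindex, follow" />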

In some cases you might want reviews to be indexed. In that case, remove the “Disallow: /review/” line from the robots.txt file.

UPDATE: Since a lot of people in the comments brought up JavaScript and image blocking without reading the instructions in this post carefully, I decided to edit the recommended robots.txt file. The one above now allows both to be crawled. You’ll also notice that the file now allows “/checkout/”. This is due to our new findings that it is beneficial to allow Google to see your checkout. Read more in this post.

Here are some robots.txt examples from the portfolio websites of other top Magento agencies.

One from BlueAcorn:

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /customer/
Disallow: /checkout/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Allow: /media/catalog/product/
Disallow: /*.php$
Disallow: /skin/
Disallow: /catalog/product/view/

User-agent: Googlebot-Image
Disallow: /
Allow: /media/catalog/product/

Sitemap: http://example.com/sitemap/sitemap.xml

Here’s another one from BlueAcorn, similar to our recommended robots.txt file but with a little twist:

# Crawlers Setup
User-agent: *
Crawl-delay: 10

# Allowable Index
Allow: /*?p=

Allow: /media/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

As you can see above, they allow the ?p parameter but disallow it whenever another parameter is used alongside it. This approach is quite interesting, as it supports the rel="prev"/rel="next" implementation while disallowing a lot of combinations with other attributes. I still prefer solving those issues through “noindex, follow”, but this is not bad either.
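
To make the matching concrete, here is how a few made-up URLs would fare under that pair of rules, keeping in mind that Google resolves Allow/Disallow conflicts in favor of the longest, most specific matching pattern:

# Allow: /*?p=       - pagination alone stays crawlable
# Disallow: /*?p=*&  - pagination combined with another parameter does not
#
# /shoes.html?p=2          -> allowed (only Allow: /*?p= matches)
# /shoes.html?p=2&dir=asc  -> blocked (Disallow: /*?p=*& is the longer,
#                             more specific match, so it wins)
# /shoes.html?dir=asc&p=2  -> allowed by default (neither pattern matches,
#                             since the literal "?p=" only occurs when p
#                             is the first query parameter)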

Here is an example of a robots.txt file, very similar to what we’re using, coming from Groove Commerce’s portfolio:

# Groove Commerce Magento Robots.txt 05/2011
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.eckraus.com/sitemap.xml

# Crawlers Setup

# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /blog/

# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
User-agent: *
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

Here’s an example from Astrio’s portfolio:

User-agent: *
Disallow: /*?
Disallow: /app/
Disallow: /catalog/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /customer/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /tag/
Disallow: /review/
Disallow: /var/

Source: Magento Robots.txt Examples from different top Magento Agencies