When building an eCommerce site, the robots.txt file is one of the most important elements of your site’s search engine optimization (SEO).
eCommerce sites are typically larger than most sites on the Web, partly because they contain features such as faceted navigation that can generate huge numbers of URLs.
This is why eCommerce sites need tighter control over how search engines like Google crawl them: it lets them manage their crawl budget and prevent Googlebot from wasting it on low-quality pages.
While Shopify’s default robots.txt is solid for most cases, some sites will need to adjust the file. Shopify has been growing rapidly, and the sites on the platform are getting larger and more complex, which means they require more crawl budget management, i.e. more robots.txt optimization.
In this article, we will discuss:
- The Shopify robots.txt
- Default Shopify robots.txt
- Shopify Robots.txt file location
- Constructing a Shopify Robots.txt.liquid
- Modifying the Shopify Robots.txt file
- Limitations of the Shopify robots.txt file
- Use cases for modifying the robots.txt
What Is Shopify Robots.txt?
The Shopify robots.txt file’s function is to give directions to search engines (e.g. Google, Bing) about which URLs on your site they may crawl. The robots.txt file usually blocks search engines from crawling low-quality pages that you don’t want crawled. On Shopify, the robots.txt file is generated from the robots.txt.liquid template file.
What Is NOT Crawled by Default in Shopify’s Robots.txt?
If you look at your Shopify site, you will see an already configured robots.txt file. To find it, go to: domain.com/robots.txt
When you access the robots.txt file, you’ll find a set of preconfigured rules. The default preconfigured rules are set to block search engines from crawling nonessential pages. Here are some of the most useful rules you will find in the default Shopify robots.txt file:
- Disallow: /search: For blocking internal site search
- Disallow: /cart: For blocking the Shopping Cart page
- Disallow: /checkout: For blocking the Checkout page
- Disallow: /account: For blocking the Account page
- Disallow: /collections/*+*: For blocking duplicate category pages generated by the faceted navigation
- Sitemap: [Sitemap Links]: For referencing the sitemap.xml link
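Put together, an abridged sketch of the default output looks something like this (the live file contains additional rules and rule groups, the sitemap line points at your store’s actual sitemap, and examplestore.com is a placeholder):
User-agent: *
Disallow: /search
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /collections/*+*
Sitemap: https://examplestore.com/sitemap.xml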
Most Shopify store owners won’t need to make any adjustments to their robots.txt file, as the default configuration is enough for most cases. Because most Shopify sites are on the smaller side, crawl control isn’t a problem for many of them.
However, for the sites that do need it, store owners can create additional rules to customize the robots.txt for their site. You can do this by creating and editing the robots.txt.liquid file.
Where Is the Shopify Robots.txt File Located?
To make adjustments to your Shopify robots.txt file, you first need to locate it. The file lives at the root of your store’s primary domain name, i.e. the domain name followed by /robots.txt.
For example: examplestore.com/robots.txt
What Is the Process for Constructing a Shopify Robots.txt.liquid?
Below you will find a step-by-step guide to constructing the Shopify robots.txt.liquid file in your store:
- From the left-hand menu of your Shopify admin page, go to Online Store > Themes
- Select Actions > Edit code
- Under “Templates”, click on the “Add a new template” link
- Click the dropdown on the far left and choose “robots.txt”
- Select “Create template”
After this, the Shopify robots.txt.liquid file will open in the editor, and you can start editing it.
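At the time of writing, the newly created template contains Liquid along these lines, looping over Shopify’s default rule groups and printing each group’s user agent, rules, and sitemap reference (treat this as a sketch, since Shopify may change the default):
{% for group in robots.default_groups %}
{{- group.user_agent }}
{%- for rule in group.rules -%}
{{ rule }}
{%- endfor -%}
{%- if group.sitemap != blank -%}
{{ group.sitemap }}
{%- endif -%}
{% endfor %}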
How Can I Modify the Shopify Robots.txt File?
There are a few ways to modify or edit the Shopify robots.txt file:
- You can add additional rules to the Shopify robots.txt by writing new blocks of code into the robots.txt.liquid file, as in the example below ([URLPath] is a placeholder for the path you want to block):
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: [URLPath]' }}
{%- endif -%}
- If your Shopify site uses /search-results/ for the internal search function and you want to block it by editing the robots.txt, you simply need to add this rule:
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /search-results/' }}
{%- endif -%}
- For blocking multiple directories (/search-results/ and /private/), you need to add these two blocks to the file:
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /search-results/' }}
{%- endif -%}
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /private/' }}
{%- endif -%}
When you complete this action, these lines will appear in your Shopify robots.txt file:
Disallow: /search-results/
Disallow: /private/
Sitemap: ##########
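Note that these if-blocks only take effect once they sit inside the template’s loop over robots.default_groups. As a sketch, assuming the default template shown earlier, a complete robots.txt.liquid with both custom rules in place would look like this:
{% for group in robots.default_groups %}
{{- group.user_agent }}
{%- for rule in group.rules -%}
{{ rule }}
{%- endfor -%}
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /search-results/' }}
{{ 'Disallow: /private/' }}
{%- endif -%}
{%- if group.sitemap != blank -%}
{{ group.sitemap }}
{%- endif -%}
{% endfor %}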
Know the Limitations of a Robots.txt File
While you can edit your Shopify robots.txt file to steer search engines away from pages you don’t want crawled, there are still some limitations, such as:
- A robots.txt file prevents crawling, not indexing: Google can still index a blocked page if other sites link to it.
- Not all crawlers will understand all the instructions in your robots.txt file, as different crawlers interpret the syntax differently.
- You can’t guarantee that a crawler will obey your robots.txt instructions, because the directives are advisory and not all crawlers respect them.
Possible Use Cases of Modifying Robots.txt
Editing a Shopify robots.txt file carries risk, and it is not recommended if you don’t need to or if you don’t know what you are doing. So, how do you determine whether your site would benefit from editing the robots.txt file? Here are some cases where you could seriously consider editing your Shopify robots.txt file:
Internal site search
One of the best things you can do for your site’s SEO is to block your site’s internal search through the robots.txt file. The reason is that users can enter a practically infinite number of queries into internal search, and allowing Google to crawl the resulting pages would fill the index with a plethora of low-quality search results.
The good news is that Shopify’s default robots.txt already blocks the standard internal search with this rule:
Disallow: /search
Having said that, many Shopify sites don’t use the default internal search and instead opt for apps or other internal search technology, which frequently changes the URL of the internal search results. When this occurs, Shopify’s default rules no longer protect your site.
Suppose, for example, that the internal search results render at URLs with /pages/search in the path. Since the default rules only cover /search, Google is allowed to crawl these internal search URLs. In such cases, the site owner should consider editing Shopify’s robots.txt rules, i.e. adding custom commands to block Google from crawling the search pages or directory.
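A minimal sketch of such a rule, assuming search results live under the hypothetical /pages/search path from the example above (the if-block goes inside the default groups loop in robots.txt.liquid):
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /pages/search' }}
{%- endif -%}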
Faceted navigation
Another case where you should consider adjusting your Shopify robots.txt file is when your site has faceted navigation, i.e. a set of filters users can apply to your category pages, usually found on the left-hand side of the page.
Let’s take as an example a Shopify clothing site that lets users filter products by color, size, product type, etc. If we select the “Black” and “White” color filters, we see a URL with the “?color” parameter loaded. Shopify’s default robots.txt successfully blocks certain page paths that faceted navigation might create, but not all of them. When the “color” facet is not blocked, Google is allowed to crawl those filtered pages.
In such cases, you should consider blocking these pages with robots.txt in Shopify. Because of the number of possible filter combinations, faceted navigation can produce a very large number of crawlable URLs. To reduce or prevent the crawling of these low-quality or near-duplicate pages, identify the facets involved, e.g. size, color, etc., and add rules to the robots.txt that block their crawl, as in the sketch below.
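For instance, assuming filtered category URLs carry a color parameter (e.g. /collections/dresses?color=Black, a hypothetical path), a rule like this inside the default groups loop would block any collection URL whose query string contains color=:
{%- if group.user_agent.value == '*' -%}
{{ 'Disallow: /collections/*color=' }}
{%- endif -%}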
Conclusion
By default, Shopify’s robots.txt stops search engines from crawling certain pages on stores, such as password-protected pages and pages related to the checkout process. However, users can modify the robots.txt file, which is served from the root of the store’s domain, to allow or block the crawling of specific pages as needed.
To construct a robots.txt file, you can use the default file provided by Shopify as a starting point and add or modify the instructions as needed.
It should be noted that there are limitations to Shopify’s robots.txt file. The file can be used to instruct search engines not to crawl pages you don’t want to be crawled but it does not guarantee it will block pages from being indexed. That’s why it is important to regularly review and update the file as needed to ensure that it is accurate and effective.
There are some cases where it is advisable to modify the Shopify robots.txt file, such as blocking pages related to internal site search and faceted navigation, or allowing and blocking specific pages as needed. By understanding how to use the robots.txt file and taking advantage of its capabilities, you can improve the visibility and SEO of your Shopify website.