How the Squarespace Robots.txt File Negatively Impacts Search Engine Optimization
Because of the peculiar robots.txt file (yes, I’m still talking about this stupid robots.txt file again!) that is included in every user’s website, and the way its Disallow rules are written without trailing slashes on the subfolders, you could very easily keep important pages and journal URLs from being crawled and properly indexed by the search engines. Let’s get straight to the gory details!
Take a look at your website’s robots.txt file by going to: yoursitenameandnotwhatihavehere.squarespace.com/robots.txt
You should find something similar to the following:

Disallow: /login
Pages and Posts that will not be Crawled
You will notice that these sub-directories, folders, or files are not followed by a trailing / (slash). So what does this mean, you ask? It means that any files, pages, or blog titles whose URLs begin with one of the above paths will not be crawled! There is a much better chance of this happening if you are excluding dates from your journal URLs (which I recommend, unless you are posting time-sensitive content or news, OR you have already used dates for a while, since removing them will require you to redirect ALL the previous dated posts to their new URLs).
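To see that prefix-matching behavior in action, here is a minimal sketch using Python’s built-in urllib.robotparser. The example.com domain and the /login-tips.html path are hypothetical, chosen only to illustrate how a rule without a trailing slash swallows unrelated pages:

```python
from urllib.robotparser import RobotFileParser

def blocked(rules, path):
    """Return True if the given robots.txt rules block a crawler from `path`."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return not rp.can_fetch("*", "http://example.com" + path)

# A rule WITHOUT a trailing slash matches every URL that merely starts with it.
print(blocked("User-agent: *\nDisallow: /login", "/login-tips.html"))   # True: blocked
# A rule WITH a trailing slash only matches paths inside the /login/ directory.
print(blocked("User-agent: *\nDisallow: /login/", "/login-tips.html"))  # False: crawlable
```

Same page, completely different outcome, and the only difference is that one trailing slash.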
Let me show you some examples:
You have a page that provides info about all the contributors to your non-profit and you name it /contributers.html; it will not be crawled.
You have a blog post that features all the relevant info in your industry and you title the post Monthly Round-up | May 2012 (i.e. /blog/monthly-round-up-may-2012.html); it will not get crawled.
If you have ANY tags (I would mention categories, but all category pages are already disallowed) that begin with any of the above folder names (the part following the last slash), they will not be crawled.
Any page you create using the Contact module will automatically be disallowed and not get crawled. Hopefully you don’t have any useful info on this page!
If you use the Site Changes module to display a condensed view of all your website’s updates, it will not get crawled. (I’m not sure what happens if you make this page your home page!)
Hopefully you get the idea and can come up with additional ways this blanket policy can negatively impact your website’s search engine optimization. I doubt it’s a widespread problem, but it should definitely be on your radar when publishing new pages and posts. Personally, I believe that Squarespace should re-evaluate their “robot” policy and get with the times! You can check your robots.txt with this robots checker: http://tool.motoricerca.info/robots-checker.phtml
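If you’d rather test locally than use an online checker, the same Python module can fetch your live robots.txt and check your own paths against it. The domain below is a hypothetical placeholder, and the sample paths are just the examples from this post; swap in your site and the URLs you actually care about:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical placeholder domain: use your own site's robots.txt URL.
rp = RobotFileParser("https://yoursitename.squarespace.com/robots.txt")
rp.read()  # fetches and parses the live file

for path in ["/contributers.html", "/blog/monthly-round-up-may-2012.html", "/about"]:
    status = "crawlable" if rp.can_fetch("*", path) else "BLOCKED"
    print(f"{path}: {status}")
```

A quick run of this against your own domain every time you publish a new page or post is a cheap insurance policy.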
The takeaway is not to let any of your page URLs begin with the paths that are disallowed in the robots file, to avoid any confusion or risk. The only exceptions would be pages you use to internally redirect affiliate links, and of course pages you actually don’t want the search engines to crawl!
*Affiliate Links to Squarespace included in post. Thanks for supporting Squarespace Plugins!