How Squarespace Robots.txt File Negatively Impacts Search Engine Optimization
Because of the peculiar robots.txt file (yes, I’m still talking about this stupid robots.txt file again!) that is included in every user’s website and the way it’s set up without escaping the subfolders, you could very easily keep important pages and journal urls from being crawled and properly indexed by the search engines. Let’s get straight to the gory details!
Take a look at your website’s robots.txt file by going to: yoursitenameandnotwhatihavehere.squarespace.com/robots.txt
You should find something similar to the following:

Will not Crawl
Disallow: /contributor
Disallow: /blog/category
Disallow: /blog/week
Disallow: /blog/month
Disallow: /blog/recommend
Disallow: /blog/author
Disallow: /login
Disallow: /contact
Disallow: /site-changes
Pages and Posts that will not be Crawled
You will notice that these sub-directories, folders, or files are not followed by a trailing / (slash). So what does this mean, you ask? It means that any file, page, or blog post whose URL begins with one of the above paths will not be crawled! There is a much better chance of this happening if you exclude dates from your journal URLs (which I recommend, unless you post time-sensitive content or news, or you have already used dates for a while, since removing them would require you to redirect ALL the previously dated posts to their new URLs).
Let me show you some examples:
You have a page that provides info about all the contributors to your non-profit and you name it /contributors.html. It will not be crawled.
You have a blog post that features all the relevant info in your industry and you title the post Monthly Round-up | May 2012 (i.e. /blog/monthly-round-up-may-2012.html). It will not get crawled.
If you have ANY tags (I would mention categories, but all categories are already disallowed) that begin with any of the above folder names (the part following the last slash), they will not be crawled.
Any page you create using the contact module will automatically be disallowed and not get crawled. Hopefully you don’t have any useful info on this page.
If you use the Site Changes module to display a condensed view of all your website’s updates, it will not get crawled. (I’m not sure what happens if you make this page your home page!)
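To see the prefix matching in action, here is a minimal Python sketch that uses the standard library's urllib.robotparser. The "User-agent: *" line, the helper name is_crawlable, and the sample paths are my assumptions for illustration (the listing above doesn't show which user agent the rules belong to), and Python's parser is only a stand-in for a real search engine crawler, which may handle edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Disallow rules copied from the listing above. The "User-agent: *" line is
# an assumption: the listing does not show which agent group the rules sit in.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contributor
Disallow: /blog/category
Disallow: /blog/week
Disallow: /blog/month
Disallow: /blog/recommend
Disallow: /blog/author
Disallow: /login
Disallow: /contact
Disallow: /site-changes
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules.

    is_crawlable is a hypothetical helper name. The parser treats each
    Disallow value as a plain prefix of the URL path.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are illustrative, not taken from a real site.
    for path in (
        "/contributors.html",
        "/blog/monthly-round-up-may-2012.html",
        "/contact-us",
        "/about.html",
    ):
        print(path, "->", "crawlable" if is_crawlable(path) else "blocked")
```

Any path that merely starts with one of the disallowed strings gets blocked here, which is exactly the behavior described in the examples above.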
Conclusion
Hopefully you get the idea and can come up with other ways this blanket policy could hurt your website’s search engine optimization. I doubt it’s a widespread problem, but it should definitely be on your radar when publishing new pages and posts. Personally, I believe that Squarespace should re-evaluate their “robot” policy and get with the times! You can check your robots.txt with this robot checker: http://tool.motoricerca.info/robots-checker.phtml
The takeaway: to avoid any confusion or risk, don’t let any of your pages use URL paths that begin with a path disallowed in the robots file. The only exceptions would be internally redirected affiliate links and, of course, pages you actually don’t want the search engines to crawl!
*Affiliate Links to Squarespace included in post. Thanks for supporting Squarespace Plugins!








So what is the fix for this?
You can simply avoid beginning page and post titles/URIs with any of the disallowed “un-escaped” folder names in your robots.txt file. Or you can create your own robots.txt file (even use the exact same robots file, but add a slash (/) to the end of the problem URLs), upload it to your file storage, and then use the URL redirect feature (website management dropdown menu) to point /robots.txt at /storage/robots.txt. This will override the default Squarespace robots file. See this related article on redirects: http://www.squarespaceplugins.com/squarespace-journal-titles-and-urls-quick-tip/
I realize as I’m typing, that my explanation sounds more confusing than it actually is! If you give me the site you’re working on, I can give you more specific details.
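As a rough sketch of the "add a slash" idea, the Python below builds a replacement rule set with a trailing slash on each problem path and checks the result with the standard library's urllib.robotparser. The helper names build_robots_txt and allowed, the "User-agent: *" line, and the archive-style path /blog/month/2012/05 are assumptions for illustration only. Your real file should also keep whatever other lines the default Squarespace file contains.

```python
from urllib.robotparser import RobotFileParser

# The problem paths from the default file, as listed in the post.
PROBLEM_PATHS = [
    "/contributor",
    "/blog/category",
    "/blog/week",
    "/blog/month",
    "/blog/recommend",
    "/blog/author",
    "/login",
    "/contact",
    "/site-changes",
]


def build_robots_txt(paths):
    """Build robots.txt text with a trailing slash on every Disallow path.

    The "User-agent: *" line is an assumption, not copied from Squarespace.
    """
    lines = ["User-agent: *"]
    for path in paths:
        fixed = path if path.endswith("/") else path + "/"
        lines.append("Disallow: " + fixed)
    return "\n".join(lines) + "\n"


def allowed(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    fixed = build_robots_txt(PROBLEM_PATHS)
    print(fixed)
    # A post whose slug merely starts with "month" is no longer caught...
    print(allowed(fixed, "/blog/monthly-round-up-may-2012.html"))
    # ...while a hypothetical archive URL under /blog/month/ still is.
    print(allowed(fixed, "/blog/month/2012/05"))
```

One side effect to keep in mind: with the slash added, a bare /contact or /login URL (no trailing slash) is no longer matched by the rule, so decide whether you still want those pages kept out before uploading the file.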
Hi Holly – Thanks for the tip. Although I read this the other day, I really didn’t think it would apply to me. After all, I haven’t noticed any of those problems yet. Plenty of other issues, just not this one. Or so I thought.
Doing a simple google search for my website turned up this on the first page of the search: Squarespace Standard Robot Exclusion # Access is disallowed to …
http://www.capitalthinking.net [ETA by Webmaster: Eric- Don't want to encourage links to your robots file so separating] /robots.txt
# Squarespace Standard Robot Exclusion # Access is disallowed to functional / filtering URLs User-agent: * Disallow: /display/admin/ Disallow: /display/Search …
Understand this is new. But I haven’t done anything with the .txt file for weeks – and then only because it was suggested by one website or another. What’s worse, is that underneath the google search entry was a suggestion (highlighted in blue) to block all references from my site from this point on. Not that I’m taking it personally or anything, but I thought that was a bit much.
So, back to work on this file I go. Any thoughts or suggestions would be welcome.
Are you working on that webinar thing yet?
Thanks,
Eric
First, always check that you are using “unpersonalized” search; otherwise you’ll see results that probably only you will see (or add this to the end of your search URL to get more generic results: &pws=0&gl=us). When I check your site using the site operator site:capitalthinking.net, it looks fine (other than typical SS limitations). However, I do see your robots file showing up in the SERPs on the first page. Did you link to it from a forum or somewhere else? It’s not really uncommon to see a robots.txt file in Google, but it’s not something you typically find unless it’s been referenced somewhere else.
(see screenshot http://screencast.com/t/D7wZz638). However, shame on you if you haven’t put a meta description in your website settings yet!!! (I say that because I don’t see it in the SERPs for your homepage… sometimes Google picks its own snippets, though.) In fact, you need to go add meta descriptions for all your PAGES, Eric! (“Business Strategy” is NOT a description.) I mean this in the most loving way possible, of course!
As far as the “# Access bla bla”, etc. goes: anything after the pound sign is just a comment that crawlers ignore, if I’m thinking correctly (lacking sleep right now!), and I wouldn’t worry about that part. You could have:
#Squarespace Plugins is an awesome website. All Hail Squarespace.
And it wouldn’t make a difference to Google’s crawlers.
On a positive note, your site at least looks to be keeping dupe content out by not having any non-www references in the SERPs, except for a login page.
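For anyone who wants to check the point about comments, here is a small Python sketch using the standard library's urllib.robotparser. It parses a file once with the "#" lines and once without, then compares the verdicts. The rule set is cut down to the one complete Disallow line visible in the search snippet above, the test paths are made up, and the helper names are hypothetical. Python's parser is only a stand-in here, so treat this as an illustration rather than proof of how Google's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Comment lines taken from the snippet and the joke line above, followed by
# the one complete rule visible in the snippet.
WITH_COMMENTS = """\
# Squarespace Standard Robot Exclusion
# Access is disallowed to functional / filtering URLs
#Squarespace Plugins is an awesome website. All Hail Squarespace.
User-agent: *
Disallow: /display/admin/
"""


def strip_comment_lines(text):
    """Drop every line whose first non-blank character is '#'."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    return "\n".join(kept) + "\n"


def verdicts(robots_txt, paths, agent="*"):
    """Return a list of True/False crawl verdicts, one per path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [parser.can_fetch(agent, p) for p in paths]


if __name__ == "__main__":
    # Hypothetical paths, used only to compare the two versions of the file.
    paths = ["/display/admin/", "/display/admin/settings", "/blog/post.html"]
    without = strip_comment_lines(WITH_COMMENTS)
    print(verdicts(WITH_COMMENTS, paths) == verdicts(without, paths))
```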
Thanks Holly. I appreciate the feedback and the criticism. I don’t remember ever linking to the robots.txt file in a forum – or anywhere else for that matter. I have used a plug-in (Outbrain) which seems to lose track every so often and put up links to pages it shouldn’t (those I’ve selected against). Website Login is one of those titles; which is funny because the pic (and the link if you click it) is not the Login page at all. Just a random article.
This is my “experimental” site. I use it to play with and to try things out. It’s a personal blog and not targeted to anyone – or any group specifically. It just gives me a place to learn a bit about the platform, the tools, and the underlying design. Granted, I’m not moving in the right direction fast enough (for me).
I appreciate the help and your articles. And trust me, there is an audience out there for the webinar. I know that Jon Morrow does a lot of his seminar (classes) using WordPress as the primary platform so I think something along those lines would easily be within your wheelhouse of expertise.
Thanks again,
Eric
Hi Holly – In the process of going through the SEO articles and your helpful posts, I came across this article regarding SEO issues with LinkWithin. Currently, I use Outbrain (discussed elsewhere), which works pretty well. There are some issues with my installation of the product that I need to deal with, but overall I like it.
At the same time, I thought it interesting that LinkWithin worked as it does – which may explain the speed differences as well.
Thanks again for all the help,
Eric
Well – it would be nice if I actually remembered to include the links (there are two)
http://www.johnfdoherty.com/switch-from-linkwithin-to-nrelate/
http://www.geekinheels.com/2012/02/02/bloggy-thursdays-why-i-stopped-using-linkwithin-and-switched-to-nrelate.html
Sorry about that.
Eric
Don’t sell yourself short Eric. It’s all a work in progress and I think in many ways, everyone’s website is “experimental”. I think another thing we have to keep in mind is that many of the sites we follow have teams, if not legions of people behind the scenes able to pool all their knowledge together.
Thanks for bringing the related post thing up, Eric. I honestly never looked closely at the SEO implications of these kinds of interlinking widgets. In fact, the “SimpleSlide” widget (“Recommended for You”) that pops up from the bottom of the screen as you scroll down does exactly the same thing by routing links through their site. Their reasoning is that they provide some “helpful” stats.
Funny thing, I had nRelate on this blog before it was a “real” website (strictly a developmental site that didn’t even have a domain for almost a year!). I really like it and it worked great until I started using my current template…it jacked everything all up.
I didn’t know this, but it looks like they have a Squarespace version (I haven’t had a chance to look further into it, but I might give it a look-see when I get a chance). I’ve tried both LinkWithin and Outbrain on my Squarespace site, but Outbrain never showed up and, like you mentioned, site speed became an issue. However, I know someone has written a post about how to make any of the above load ONLY on single posts. A Google search might find it. I think Nathan at squarecoach.com may have written something about Outbrain as well.