Squarespace Case Study | SEO, Categories, and Robots.txt


You better check your site, before you wreck your site

Ok, perhaps I’ve overstated things a bit with that headline, but I do want to bring something to your attention that I think is very important about how Squarespace directs search engines to crawl your website. We all have a robots.txt file that guides search engines like Google in crawling and indexing your site, which is essentially how your website is found and viewed by searchers.

To view your robots.txt file go to:
www.yoursite.com/robots.txt.

Your robots.txt file should look similar to this:

# Squarespace Standard Robot Exclusion
# Access is disallowed to functional / filtering URLs

User-agent: *
Disallow: /display/admin/
Disallow: /display/Search
Disallow: /display/Login
Disallow: /display/RecoverPassword
Disallow: /display/common.css
Disallow: /login
Disallow: /contributor

Disallow: /blog/category
Disallow: /blog/week
Disallow: /blog/month
Disallow: /blog/recommend
Disallow: /blog/author
Disallow: /login

Note the Disallow: /blog/category line, as that’s my focus for now. What this essentially tells the bots is not to crawl or follow any links on these category pages. I believe Squarespace’s intention is to prevent duplicate content issues, which can dilute your search results. However, it doesn’t keep those pages out of the index, especially if someone is linking to that content.

You can check your own site for category inclusion in the SERPs with this site operator:
site:yoursite.com inurl:/category/
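If you’d rather script the check than eyeball the file, Python’s standard library ships a robots.txt parser. A minimal sketch, with the relevant default rules pasted in as a string (yoursite.com is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The relevant portion of the default Squarespace robots.txt shown above
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/category
Disallow: /blog/week
Disallow: /blog/month
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Category pages are blocked for all crawlers; regular posts are not
print(parser.can_fetch("*", "http://yoursite.com/blog/category/milk"))   # False
print(parser.can_fetch("*", "http://yoursite.com/blog/chocolate-milk"))  # True
```

To check your live file instead of a pasted string, call set_url("http://yoursite.com/robots.txt") followed by read() in place of parse().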

Unfortunately, as with any “fix” or reaction, there can be consequences that are possibly worse than the original problem. And so I think this may be one of those situations. Let me give you a real world example.

Squarespace SEO Case Study: OkayGeek

One of the most popular articles on this site links to other Unofficial Squarespace Resources, and one of the websites I mention is OkayGeek, with a link to their Squarespace category. Now granted, the Squarespace Plugins site is fairly new and probably doesn’t offer much link juice to the popular OkayGeek website, but even if I were TechCrunch, OkayGeek would get little if any benefit from my link, because robots.txt is essentially telling Google that page is not important. It doesn’t, however, keep the page from showing up in the search engines if someone links to it.
Here’s how the OkayGeek Squarespace category looks in Google SERPs:


No meta description in Category due to disallow: category

Where’s the Beef?

And if they do link to you, they have no information (meta description) to include in the results because you disallowed the category pages from the search bots. And then the juice doesn’t flow!

Impact on Sitelinks

Disallowing categories can also have a negative impact on your sitelinks (the indented links below the main result). Instead of popular categories like reviews, OkayGeek’s sitelinks include their login page and a couple of other low-priority links. Wouldn’t it be more beneficial if categories like Reviews, Videos, Gaming, and Editorials were shown as sitelinks instead?


OkayGeek Sitelinks: Wouldn't Categories be better?

Impact on Search

So, let’s look at one more example. What happens when you search for your brand along with a category keyword? I figured searching “OkayGeek reviews” would list tons of the reviews they have featured on their website, as reviews are probably some of their most popular articles. As you can see, I get a total of two links that might be relevant, a link to Jason’s profile (I’m sure Jason is great, but probably not what I’m looking for), and a Twitter account. And by the way, HOLY MOLY, do you see link 4!!!? The NG4 website ranks higher than OKAYGEEK for their own review! (I assume a canonical tag might have prevented this.) Sheesh!
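For context, the canonical tag I just mentioned is a single line in a page’s head section telling search engines which URL is the preferred version of the content. A small sketch of what building one looks like (the URL here is purely illustrative):

```python
from xml.sax.saxutils import quoteattr

def canonical_tag(url):
    """Build the <link rel="canonical"> element pointing at the preferred URL."""
    return f"<link rel=\"canonical\" href={quoteattr(url)}>"

# A duplicate page would carry this tag, pointing back to the original review
print(canonical_tag("http://www.okaygeek.com/blog/some-review.html"))
```

With that tag in place on the duplicate, engines consolidate ranking signals onto the preferred URL instead of splitting them.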


Where's the reviews?!

So What’s the Solution?

Don’t you hate it when people bring up problems but have no recommended solutions? I know I do. I have a couple of ideas, but they certainly are not ideal. The best solution would be for Squarespace to use canonical tags and/or allow page-by-page meta robots editing. This could significantly reduce duplicate content issues without hacking the robots.txt file. (Check your Google Webmaster Tools and you will see what I’m talking about with duplicate content as well as duplicate titles.) Honestly, I’m hoping someone reads this and can offer something better than the solutions I share below.

So until hell freezes over, here are some options. 

  • Edit your robots.txt file and remove the Disallow: /blog/category line. Yes, you can do this. Copy your current robots.txt file into a text editor, delete the Disallow: /blog/category line, and save it as a .txt file. Then upload it to your file storage and use a 301 redirect to your new file. (If you need help, feel free to contact me.) This may create duplicate content issues, but at least your juice will flow.
    • My SEOmoz Pro account gives me access to some of the real SEO experts. Here’s a reply from Phil Sharp, SEO manager at Practice Fusion, to my question concerning the Squarespace robots.txt category issue:

      Some people like to prevent search engines from crawling category pages out of a fear of duplicate content. For example, say you have a post that’s at this URL:
      site.com/blog/chocolate-milk-is-great.html

      and it’s also the only post in the category “milk” with this url:
      site.com/blog/category/milk

      then search engines see the same exact content (your blog post) on two different URLs. Since duplicate content is a big no-no, many people choose to prevent the engines from crawling category pages. Although, in my experience, it’s really up to you. Do you feel like your category pages will provide value to users? Would you like them to show up in search results? If so, then make sure you let Google crawl them.

      If you DON’T want category pages to be indexed by Google, then I think there’s a better choice than using robots.txt. Your best bet is applying the noindex, follow tag to these pages. This tag tells the engines NOT to index this page, but to follow all of the links on it. This is better than robots.txt because robots.txt won’t always prevent your site from showing up in search results (that’s another long story), but the noindex tag will.

  • Be vigilant in how you organize your content with categories and tags. Don’t include more than two categories in a post. And perhaps use just tags or just categories. (Tags do get crawled, but they are part of the duplicate content issue as well.)
  • Keep the robots.txt the way it is, and create your own category index page. For instance, OkayGeek would create a page at OkayGeek.com/reviews that includes a list of links to all their reviews using the titles as anchor text.
  • Use the Squarespace Excerpt Feature. This pertains more to duplicate content issues, but is probably the easiest to carry out. Don’t display full posts on any index pages. Only display the full content on the permalink page.
  • Create a sitemap and submit it in Google Webmaster Tools. I just read a reply from Bonnie Gibbons on the developer forums that suggests submitting a sitemap will help with duplicate content. Perhaps editing out the Disallow: /blog/category line along with submitting a sitemap would be most beneficial. Create a free one (up to 500 pages) at http://www.xml-sitemaps.com/.
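If you’d rather build the sitemap yourself than use a third-party generator, the format is simple XML. A minimal sketch (the URLs are placeholders, not a real site inventory):

```python
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Return a minimal sitemap.xml body listing the given URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

sitemap = build_sitemap([
    "http://www.okaygeek.com/reviews",
    "http://www.okaygeek.com/blog/some-review.html",
])
print(sitemap)
```

Save the output as sitemap.xml, upload it to your file storage, and submit that URL in Google Webmaster Tools.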

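The noindex, follow tag Phil recommends above is a single meta element in the page head. Here’s a small sketch, using only Python’s standard library, that checks whether a fetched page carries a robots meta directive (the sample HTML is illustrative):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of any <meta name="robots"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", ""))

# A category page carrying the tag Phil suggests
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
finder = RobotsMetaFinder()
finder.feed(page)
print(finder.directives)  # ['noindex, follow']
```

Unlike a robots.txt disallow, this tag lets the bots crawl the page (so link juice flows) while keeping it out of the index.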
I’m also doing an experiment with redirecting the default category index pages to custom index pages. I don’t suggest you try this unless you want to risk having major issues with your website. I’ve just started creating the custom pages and haven’t completed the redirects, so I have no idea what the consequences will be. I’ll get back to you with the results.

To be clear, I love Squarespace and think it even has SEO advantages that other platforms don’t have: semantic code, fast page loading, asynchronous loading widgets, meta tag optimization, etc. This gets most of us past the 80/20 threshold of optimization. However, sometimes the remaining 20%, or at least part of it, is the difference between success and struggling in highly competitive markets. I don’t know about you, but I put a lot of blood, sweat, and tears into my websites. I’m not willing to rely on the “build it and they will come” mantra any longer.

Please Share and Comment

I’ve never asked or begged anyone to share anything on this website. But I’m asking you now. If you have followers with an interest in Squarespace or SEO, I’m asking that you share this article with them, especially if they can offer additional insights and best practices on how to limit duplicate content while, at the same time, letting the link juice flow within the Squarespace framework. And of course, if you have any insights, please post them in the comments. I know SEO isn’t nearly as fun as adding cool stuff like accordions and social media share buttons, but I don’t think any of it matters as much if we are losing traffic due to a “not as optimal as it could be” website. I know 70-80% of my views come from search, and I’m guessing I’m not alone. I’ll continue to update this post with your useful suggestions and helpful links, so check back often as I’m sure I’ll need to make changes.


*Searches were done with and without personal search.


If you would like to receive other tips and tutorials, follow Squarespace Plugins via RSS and sign up for the SSP Monthly Newsletter, which will include all the tutorials for the month as well as the latest Squarespace happenings around the net. (Your info will not be shared or sold to anyone.)

Squarespace user since 2009. Traveling throughout North America with my first husband and three kids. You can follow our journey at FoggyPhils. (http://www.foggyphils.com)

17 Comments on "Squarespace Case Study | SEO, Categories, and Robots.txt"

  1. Scott says:

    There is a way to input page by page meta robot tags and rel canonical. I just haven’t tested it fully yet. And the option is available for both standard and unlimited packages. Check out http://developers.squarespace.com/design-coding/post/1673227#post1818959 for more detail.

    I also implemented a meta robots noindex, nofollow on one of my client’s login pages by manipulating the page description fields that Squarespace offers. Have yet to see how well it works, since the first meta tag tells the bot to index the page and then the second tells it not to. I am hoping the bot listens to the latter. Time will tell, but I hope this helps.

  2. Holly says:

    Just a reminder to test your “experiments”. I suggest reading this article before using any canonical tags.
    http://www.seomoz.org/blog/catastrophic-canonicalization

  3. Hey Holly,
    You seem to be reasonably clued in on Squarespace. Did you find any workaround to helping SEO for images within Squarespace? The alt tags don’t seem to work properly on Squarespace 6 anymore, and you cannot choose a custom filename, since all the images are uploaded to the cloud and renamed.

    Przemek

    • Holly says:

      Przemek,
      It looks like they have been working on this. And perhaps more of a problem than alt tags is the use of a CDN to host the images. This is/was causing SEO issues in that the images weren’t being indexed or credited to the site. However, they just posted an update saying it’s fixed. Let me know if you are still having issues and I’ll look at it further. http://blog.squarespace.com/better-links-and-more

      • Thanks for the reply Holly,

        Yeah, I’ve just got an email about it. I have to say, their customer service is pretty damn good.

        Just checked the new sitemap that Squarespace generates and it looks that they preserved the original filenames that I uploaded. Also the captions are not included in the sitemap, which should theoretically put me back into Google Images in a few days. I’ll report back with findings. The odds look good.

      • Just to give you an update. The issue with the CDN was never fixed. I kind of worked around it by copying the URL of the image from the CDN and embedding it into a blog post with a standard img src tag. Seems to be getting picked up by Google after a few days. Most of them anyway.

  4. Bobby says:

    Hi Holly,

    My page views dropped immensely when I migrated from Blogger to Squarespace. From receiving 1300 views a day minimum, on Squarespace I struggle to make 400 a day.

    I changed the robots.txt file, deleting the category part, and my page views improved massively for that one day, reflecting the page views I expect to be getting. Now it has returned to the 400 views I call measly, and very incorrect. I contacted the support team and they couldn’t figure out why. Is the robots.txt file the reason? (I say measly because it was no problem on Blogger, and with 980 followers on Twitter and nearly 400 fans on Facebook, that’s why I believe the views aren’t correct.)

    Also, my link in Google shows the incorrect page titles below the website, as shown in one of your images, which is frustrating. If you could help me I would be forever grateful!

    It would also aid my Computing degree I am currently studying. Thanks in advance.

    • Holly says:

      Hey Bobby!
      Not sure if you figured this out yet, but I’ll take a couple of “general” stabs. First, your robots.txt is fine as far as allowing bots to crawl your site. No Blockages there. When you are comparing views are you using the same statistics program (ex google analtyics) or blogger vs squarespace? Unless you are using the same stat source, it’s really comparing apples to oranges as they all seem to count “views” a bit different. What I imagine your problem stems from is your url mapping from blogger to squarespace. Are the urls EXACTLY the same? If not, you would need to do 301 redirects. Now, google will eventually figure it out. BUT, you will not get “Credit” in the google ranks from other sites that have linked to you in the past. Careful review of Google Webmaster tools is a MUST when moving a site. It will help you see crawl errors and invalid urls that may no longer be valid.

      • Bobby says:

        Thanks for the swift reply!

        I understand completely what you’re saying. Is there any way I could check my old URL links, bearing in mind I completely removed my Blogger account? Originally I compared Blogger views to Squarespace, however now I use Google Analytics alongside the Squarespace traffic stats, and it doesn’t add up entirely. At some points I can see active viewers in double figures when I look at GA, however when I refresh the Squarespace traffic overview the views stay exactly the same. It’s a tough one to digest, and I thank you in advance for what you’re saying. Also want to point out that this is a VERY informative site and has had answers for many of my Squarespace dilemmas!

        • Holly says:

          Bobby! Sorry for the now super delayed reply! ;) You really can’t depend on or expect any two analytics packages to report exactly the same numbers. You will most definitely drive yourself batty by doing so! They all use different algorithms for counting views (some software only counts a view if the visitor has been on the site more than 10 seconds), new visits can be based on different time segments, and of course they all have different cookies. However, they shouldn’t be completely way off; I’d say up to 30% can even be considered normal differentiation. As far as seeing your old URLs: do you have Google Webmaster Tools? It should show you almost all your links and will tell you which ones are coming back as 404 errors. I actually track several Squarespace users’ Google WMT accounts. If you would like to add me as a user so I can look at your specific situation, please fill out a request using the contact form link (http://www.squarespaceplugins.com/contact/). I don’t charge for basic monitoring and an initial consult. In exchange, I get to observe how well the migration process is going for current users over periods of time, which allows for smoother transitions for those interested in future migrations and the community as a whole.

  5. Karen says:

    I’m trying to figure out if my robot traffic is anywhere close to normal. I’ve got one site where robot hits are 4x page views, and another where they are 2x page views. A friend just showed me her stats (a much bigger blog than any of mine) and her robot hits are just under half her total page views.

    I’m not using that comparison to correlate the two numbers. I just want to know how much robot traffic I should expect. What’s normal? When should I (or anyone else) be concerned?

    FTR, all the sites I’m referring to are on Squarespace.

    • Holly says:

      Karen, sorry for such a long delay; your post ended up in my spam filters. As far as robots hitting your site, there are quite a few out there. How they compare to pageviews really depends on your traffic: the more traffic you have, the smaller the percentage of your pageviews that comes from robots. I wouldn’t worry about it much, especially when using Squarespace to host your site. They watch for any nefarious robots.

  6. Brent says:

    I think I have a serious disallow problem and am not being indexed properly. Can you check it out and contact me? Thanks

  7. Dan Adams says:

    Great article, thank you for posting it. While reading through it I noticed that the link to http://www.okaygeek.com/blog/category/squarespace was broken. Just a heads up, but awesome job on the write-up!

    I just signed up for Squarespace 5 days ago and these articles are very helpful. You can see my Squarespace site at http://www.myx100life.com
