You better check your site before you wreck your site
Ok, perhaps I’ve overstated things a bit with that headline, but I do want to bring something to your attention that I think is very important: how Squarespace directs search engines to crawl your website. We all have a robots.txt file that guides search engines like Google as they crawl and index your site, and it essentially shapes how your website is found and viewed by searchers.
To view your robots.txt file, go to yourdomain.com/robots.txt.
Your robots.txt file should look similar to this (abbreviated here to the lines that matter):
# Squarespace Standard Robot Exclusion
# Access is disallowed to functional / filtering URLs
User-agent: *
Disallow: /blog/category/
I’ve added emphasis to the Disallow: /blog/category/ line, as that’s my focus for now. What this essentially tells the bots is not to crawl these category pages or follow any links on them. I believe Squarespace’s intention is to prevent duplicate content issues, which can dilute your search results. However, it doesn’t keep those pages out of the index, especially if someone is linking to that content.
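If you want to verify what your own file blocks, here’s a quick sketch using Python’s standard-library robots.txt parser. The domain and paths are placeholders; swap in your own:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (placeholder domain).
rp = RobotFileParser()
rp.set_url("https://www.yoursite.com/robots.txt")
rp.read()

# With the standard Squarespace rules, the category page is blocked
# for every crawler, while the post's own permalink is not.
print(rp.can_fetch("*", "https://www.yoursite.com/blog/category/reviews"))  # expect: False
print(rp.can_fetch("*", "https://www.yoursite.com/blog/my-post"))           # expect: True
```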
You can check your own site for category inclusion in the SERPs with a site operator like this: site:yoursite.com/blog/category/
Unfortunately, as with any “fix” or quick reaction, there can be consequences that are worse than the original problem, and I think this may be one of those situations. Let me give you a real-world example.
Squarespace SEO Case Study: OkayGeek
One of the most popular articles on this site links to other Unofficial Squarespace Resources, and one of the websites I mention is OkayGeek, with a link to their Squarespace Category. Now granted, the Squarespace Plugins site is fairly new and probably doesn’t offer much link juice to the popular OkayGeek website, but even if I were TechCrunch, OkayGeek would get little if any benefit from my link, because robots.txt is essentially telling Google that page is not important. However, it doesn’t keep the page from showing up in the search engines if someone links to you.
Here’s how the OkayGeek Squarespace Category looks in Google SERPs:
Where’s the Beef?
And if they do link to you, Google has no information (meta description) to include in the results, because you disallowed the category pages from the search bots. And then the juice doesn’t flow!
Impact on Sitelinks
Disallowing categories can also have a negative impact on your sitelinks (the indented links below the main link). Instead of popular categories like reviews, OkayGeek’s sitelinks include their login page and a couple of other low-priority links. Wouldn’t it be more beneficial if categories like Reviews, Videos, Gaming, and Editorials were shown as sitelinks instead?
Impact on Search
So, let’s look at one more example. What happens when you search for your brand along with a category keyword? I figured searching OkayGeek reviews would list tons of the reviews they’ve featured on their website, as reviews are probably some of their most popular articles. As you can see, I get a total of two links that might be relevant: a link to Jason’s profile (I’m sure Jason is great, but probably not what I’m looking for) and a Twitter account. And by the way, HOLY MOLY, do you see link 4?! The NG4 website ranks higher than OKAYGEEK for their own review! (I assume a canonical tag may have prevented this.) Sheesh!
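For context, a canonical tag is a single line in a page’s head that names the preferred URL for its content, e.g. <link rel="canonical" href="https://www.yoursite.com/blog/my-post">, and a robots meta tag looks like <meta name="robots" content="noindex, follow"> (more on that one in the solutions below). Here’s a rough sketch, using only Python’s standard library and a placeholder URL, that reports whether a page exposes either tag:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class HeadTagChecker(HTMLParser):
    """Collects the canonical link and robots meta tag, if present."""
    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.robots = a.get("content")

# Placeholder URL; point this at one of your own pages.
html = urlopen("https://www.yoursite.com/blog/my-post").read().decode("utf-8", "ignore")
checker = HeadTagChecker()
checker.feed(html)
print("canonical:", checker.canonical)  # None means no canonical tag found
print("meta robots:", checker.robots)   # None means no robots directive found
```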
So What’s the Solution?
Don’t you hate it when people bring up problems but have no recommended solutions? I know I do. I have a couple of ideas, though they certainly are not ideal. The best solution would be if Squarespace used canonical tags and/or allowed page-by-page meta robots editing. That could significantly reduce duplicate content issues without hacking the robots.txt file. (Check your Google Webmaster Tools and you will see what I’m talking about with duplicate content as well as duplicate titles.) Honestly, I’m hoping someone reads this and can offer something better than the solutions I share below.
So until hell freezes over, here are some options.
- Edit your robots.txt file and remove the Disallow: /blog/category/ line. Yes, you can do this. Copy your current robots.txt file into a text editor, delete the Disallow: /blog/category/ line, and save it as a .txt file. Then upload it to your file storage and use a 301 redirect to point to your new file. (If you need help, feel free to contact me.) This may create duplicate content issues, but at least your juice will flow. (There’s a quick sketch of this edit right after this list.)
- My SEOmoz Pro account gives me access to some real SEO experts. Here’s a reply from Phil Sharp, SEO manager at Practice Fusion, to my question concerning the Squarespace robots.txt category issue:
Some people like to prevent search engines from crawling category pages out of a fear of duplicate content. For example, say you have a post that’s at its own permalink URL (something like yoursite.com/blog/your-post),
and it’s also the only post in the category “milk” (at a URL like yoursite.com/blog/category/milk),
then search engines see the same exact content (your blog post) on two different URLs. Since duplicate content is a big no-no, many people choose to prevent the engines from crawling category pages. Although, in my experience, it’s really up to you. Do you feel like your category pages will provide value to users? Would you like them to show up in search results? If so, then make sure you let Google crawl them.
If you DON’T want category pages to be indexed by Google, then I think there’s a better choice than using robots.txt. Your best bet is applying the noindex, follow tag to these pages. This tag tells the engines NOT to index this page, but to follow all of the links on it. This is better than robots.txt because robots.txt won’t always prevent your site from showing up in search results (that’s another long story), but the noindex tag will.
- Be vigilant in how you organize your content with categories and tags. Don’t include more than two categories in a post. And perhaps use just tags or just categories. (Tags do get crawled, but they are part of the duplicate content issue as well.)
- Keep the robots.txt the way it is, and create your own category index page. For instance, OkayGeek could create a page at OkayGeek.com/reviews that includes a list of links to all their reviews, using the titles as anchor text.
- Use the Squarespace Excerpt feature. This pertains more to the duplicate content issue, but it’s probably the easiest option to carry out. Don’t display full posts on any index pages; only display the full content on the permalink page.
- Create a sitemap and submit it in Google Webmaster Tools. I just read a reply from Bonnie Gibbons on the developer forums suggesting that submitting a sitemap will help with duplicate content. Perhaps editing out the Disallow: /blog/category/ line along with submitting a sitemap would be most beneficial. You can create one for free (up to 500 pages) at http://www.xml-sitemaps.com/, or roll your own with the sketch after this list.
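As promised in the first option, here’s the robots.txt edit as a rough sketch in Python, just to make plain which line gets dropped. The domain is a placeholder, and you’d still need to upload the result to your file storage and set up the 301 redirect yourself:

```python
from urllib.request import urlopen

# Fetch your current robots.txt (placeholder domain).
current = urlopen("https://www.yoursite.com/robots.txt").read().decode("utf-8")

# Keep every line except the category Disallow rule.
kept = [line for line in current.splitlines()
        if line.strip().lower() != "disallow: /blog/category/"]

# Save the edited copy, ready to upload to file storage.
with open("robots.txt", "w") as f:
    f.write("\n".join(kept) + "\n")
```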
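And if you’d rather build the sitemap yourself instead of using a generator, the XML format is simple enough to write by hand. A minimal sketch, with placeholder URLs; list your real permalinks:

```python
# Minimal sitemap generator: one <url> entry per page.
pages = [
    "https://www.yoursite.com/",
    "https://www.yoursite.com/blog/my-post",
]

entries = "\n".join(f"  <url><loc>{url}</loc></url>" for url in pages)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("sitemap.xml", "w") as f:
    f.write(sitemap)
```

Submit the finished file in Google Webmaster Tools so Google knows which URLs you want crawled.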
I’m also doing an experiment with redirecting the default category index pages to custom index pages. I don’t suggest you try this unless you want to risk major issues with your website. I’ve just started creating the custom pages and haven’t completed the redirects, so I have no idea what the consequences will be. I’ll get back to you with the results.
To be clear, I love Squarespace and think it even has SEO advantages that other platforms don’t have: semantic code, fast page loading, asynchronously loading widgets, meta tag optimization, etc. This gets most of us past the 80/20 threshold of optimization. However, sometimes that last 20%, or at least part of it, is the difference between success and struggling in highly competitive markets. I don’t know about you, but I put a lot of blood, sweat, and tears into my websites. I’m not willing to rely on the “build it and they will come” mantra any longer.
Please Share and Comment
I’ve never asked or begged anyone to share anything on this website, but I’m asking you now. If you have followers with an interest in Squarespace or SEO, I’m asking that you share this article with them, especially if they can offer additional insights and best practices on how to limit duplicate content while letting the link juice flow within the Squarespace framework. And of course, if you have any insights, please post them in the comments. I know SEO isn’t nearly as fun as adding cool stuff like accordions and social media share buttons, but I don’t think any of it matters as much if we’re losing traffic due to a “not as optimal as it could be” website. I know 70-80% of my views come from search, and I’m guessing I’m not alone. I’ll continue to update this post with your useful suggestions and helpful links, so check back often as I’m sure I’ll need to make changes.
Further Reading and References:
*Searches were done with and without personalized search.