Duplicate content is one of those buzzwords that gets thrown around quite a bit in the SEO world. Like most buzzwords, plenty of people use it without knowing exactly what it means. In this article, we'll look at what duplicate content is, how it hurts sites, and a few simple, effective ways to solve duplicate content problems.
What is duplicate content?
Duplicate content, as the name suggests, is content that has been duplicated on another page or site somewhere on the web. There are, however, a few different types of duplicate content that have different origins.
URL Variations: Let’s say you recently migrated your site from HTTP to HTTPS. If the HTTP version is still being indexed, this means that the number of pages on your site, in Google’s eyes, just doubled.
URL variations are very common and can happen in a few different ways. Here are some common variations that cause duplicate content problems:
- Www pages vs non-www pages
- Dynamically generated URLs (pages that generate a unique URL when visited)
- Product pages that have parameter variations
- Trailing slashes
- Printer-friendly pages
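To make this concrete, here is a hypothetical set of URLs (the domain and paths are made up for illustration) that could all serve the exact same page, yet each be treated by Google as a separate page:

```
http://example.com/lawn-chairs
https://example.com/lawn-chairs
https://www.example.com/lawn-chairs
https://www.example.com/lawn-chairs/
https://www.example.com/lawn-chairs?sessionid=12345
https://www.example.com/lawn-chairs?sort=price
https://www.example.com/lawn-chairs/print
```

One page of actual content, seven indexable URLs.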
Boilerplate Content: Many sites use the same copy across multiple pages. E-commerce sites run into this a lot. A site that sells lawn chairs will probably have the same sales copy on the page for red lawn chairs as it does on the page for blue lawn chairs. Although these are two distinct pages, when Google crawls the site it sees multiple pages with the same content.
Stolen (Scraped) Content: Probably one of the most annoying things that can happen to digital marketers, this happens when another site steals your content and publishes it as its own. When Google crawls the two sites, it has to figure out which version is the original and which one should rank. This doesn't only happen with blog articles; it can happen with any kind of text-based content.
The Damaging Effects of Duplicate Content
This won't come as a surprise to anyone reading this, but duplicate content is bad. Its consequences boil down to two main things: wasted crawl budget and diluted link equity.
Wasted Crawl Budget: Crawl budget can be defined as "the number of URLs Google crawls in a certain amount of time." In other words, Google only crawls a certain number of pages each time it visits your site. If a site has duplicate content issues stemming from URL variations, this can be a big problem.
In this circumstance, Google could be wasting its crawl budget on multiple versions of the same page instead of crawling the valuable pages you want to rank. This can become a huge problem for large sites that really need that crawl budget to rank their pages that bring in traffic.
Link Equity Dilution: As we all know, backlinks are extremely valuable. If a site has duplicate content issues, multiple versions of the same page could each be collecting backlinks.
Let's say you wrote an awesome blog post (https://www.awesomeblog.com/greatpost) that starts to gain some traction and organically brings in 20 backlinks. If this blog post has duplicate content problems, all those backlinks could essentially be wasted. There could be 8 backlinks to https://www.awesomeblog.com/greatpost, 5 to https://awesomeblog.com/greatpost, and 7 to https://www.awesomeblog.com/greatpost/. Instead of your post getting all the link equity those 20 backlinks would bring, it's now being split across three URLs. That sucks.
How to Fix Duplicate Content Issues
Duplicate content can really hurt a site; fortunately, there are a few ways to fix it. We'll go over three of the simplest and most effective solutions: 301 redirects, canonical tags, and robots.txt/noindex tags.
301 Redirects: By setting up a 301 redirect from the duplicate page to the original page, all your visitors will be rerouted to the original content. Another nice thing about 301 redirects is that they pass the vast majority of link equity (commonly cited as 90-99%).
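As a sketch of what this looks like in practice, here is a hypothetical .htaccess snippet for an Apache server with mod_rewrite enabled (the domain is made up); it 301-redirects every HTTP and non-www request to a single https://www version, fixing two of the URL variations listed earlier in one rule:

```apache
# .htaccess — hypothetical example; assumes Apache with mod_rewrite enabled.
# Sends all HTTP and non-www requests to the canonical https://www version
# with a 301 (permanent) redirect.
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
```

Nginx and most CMS platforms offer equivalent ways to set up the same permanent redirect.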
Canonical Tags: If a page has duplicate content issues due to URL variations, a canonical tag is a quick and effective solution. Basically, a canonical tag tells Google which page among the different URL variations is the original one that should receive all the link equity.
Canonical tags won't redirect your visitors to the original URL, but from an SEO standpoint they accomplish the goal of passing link equity to the URL you specify.
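Here's what that looks like, using the blog post URL from our earlier example. The same tag goes in the `<head>` of every variant of the page (the non-www version, the trailing-slash version, and so on), all pointing at the one URL you want to rank:

```html
<!-- Placed in the <head> of each duplicate variant of the page. -->
<link rel="canonical" href="https://www.awesomeblog.com/greatpost" />
```

It's also a common practice to put a self-referencing canonical tag on the original page itself.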
Robots.txt/Noindex Tags: This is one of the oldest approaches to duplicate content. Disallowing duplicate URLs in the robots.txt file tells Google not to crawl those pages (though a blocked URL can still show up in results if other sites link to it). Noindex tags go a step further: added to the code of a specific page, they tell Google to drop that page from its index, which completely removes the duplicate URL from Google's results.
One downside of these solutions is that they don't pass along link equity to the original page. For this reason, if a duplicate page has backlinks worth keeping, a canonical tag or 301 redirect is usually the better choice, so all the good stuff still gets passed along.
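For illustration, here is a hypothetical robots.txt that blocks the printer-friendly and session-ID URL variations mentioned earlier (the paths are made up; yours will differ):

```
# robots.txt — hypothetical example; lives at the root of the site.
User-agent: *
Disallow: /print/
Disallow: /*?sessionid=
```

The noindex alternative is a single meta tag placed in the `<head>` of the duplicate page itself: `<meta name="robots" content="noindex">`. Note that Google has to be able to crawl a page to see its noindex tag, so don't block that same page in robots.txt at the same time.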
Let's go over everything we've learned. Duplicate content happens when there are multiple variations of a site's URLs, or when another site steals your content and republishes it. If a site is experiencing duplicate content problems, it can hurt the site's rankings by wasting crawl budget and diluting link equity. Luckily, duplicate content is pretty easy to fix. The three simplest ways are 301 redirects, canonical tags, and robots.txt/noindex tags.