10 Things About Duplicate Content

Duplicate content is one of the most perplexing problems in SEO. In this post I am going to outline 10 things about how Google handles duplicate content.

  • Google’s standard response is to filter out duplicate pages and show only one page for a given set of algorithmic parameters in its returned results.
  • It’s clear from viewing SERPs that large media companies seem to be able to show copies of press releases without being filtered out.
  • Google will hardly ever penalize a site for duplicate content. Its view is that the duplicate is merely irrelevant.
  • There are times when Google will penalize, but this takes some pretty blatant plagiarism, a site that has very little value to the end user, or a site that is made simply for the purpose of advertising. I have seen instances of algorithmically applied penalties for sites with large amounts of duplicate content.
  • Examples of sites that add little value are thin affiliate sites: sites that use copies of third-party material for their content and exist just to get search traffic and promote affiliate programs. If this is your site, Google may well seek to penalize you.
  • When Google visits your site, it has a number of pages that it is likely to crawl, based on data from its previous crawls. One of the costs of duplicate content is that when the crawler loads a duplicate page, one that it is not going to index, it has loaded that page instead of a page that it might index. This is a big downside to duplicate content if your site is not fully indexed as a result.
  • Google finds it easy to detect certain types of duplicate content, such as print pages, archive pages in blogs, and thin affiliates. These are usually recognized as being inadvertent and don’t get indexed as standard.
  • Google is still working on RSS feeds and the best way to keep them from showing up as duplicate content. Google’s recent purchase of FeedBurner should speed the resolution of that issue.
  • One key signal Google uses to select a page from a group of duplicates is which page is linked to the most.
  • Finally, if you DO want to see duplicate content results, just do your search, get the results, append the “&filter=0” parameter to the end of the search results URL, and refresh the page.
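
If you would rather not edit the address bar by hand, the last tip can be scripted. This is only a rough sketch: the function name with_filter_disabled is my own invention, and the example search URL is just an assumed starting point. The only thing taken from the tip above is the filter=0 parameter itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_filter_disabled(search_url: str) -> str:
    """Return search_url with filter=0 appended to its query string.

    Hypothetical helper for illustration; any existing "filter"
    parameter is dropped first so the URL carries only one value.
    """
    parts = urlsplit(search_url)
    # Keep every existing query parameter except an earlier "filter".
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Assumed example URL; paste in the URL of your own results page.
    print(with_filter_disabled("https://www.google.com/search?q=duplicate+content"))
```

Load the printed URL in your browser to see the results page with the duplicate filter switched off.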