© 2003-2007 Googlerankings.com
Googlerankings.com is in no way affiliated
with or is the property of Google Inc.
 


Duplicate content in Google

To combat plagiarism and scraper sites, and to provide higher-quality search results, Google applies a filter that identifies duplicates of web pages and other documents found on the web. URLs judged to point to content that is also available at another URL are lowered in importance, and eventually become supplemental results or are dropped from the index.

Known issues

Case 1,
The now-obsolete practice of keeping a backup copy of a web page or an entire web site (a.k.a. a mirror site, hosted on a different server under a different domain name) alongside the one intended to be the "original" will trigger this filter.

+ Resolution: Immediately shut down the mirror site and all copies of the content you control. Redirect visitors to the single copy that you wish to keep.
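
The redirect step can be sketched as a small redirect-only server left running on the old mirror host. This is a minimal illustration, not a recommended production setup: the address in CANONICAL_BASE, the port, and the helper names are hypothetical, and the use of a permanent (301) redirect is an assumption, since the guide only says to redirect visitors.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canonical address; replace with the single copy you keep.
CANONICAL_BASE = "http://www.example.com"


def redirect_target(path, base=CANONICAL_BASE):
    """Map a path requested on the mirror to the same path on the kept site."""
    if not path.startswith("/"):
        path = "/" + path
    return base.rstrip("/") + path


class MirrorRedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET or HEAD request with a permanent (301) redirect."""

    def _redirect(self):
        self.send_response(301)  # assumption: a permanent redirect is wanted
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port=8080):
    """Run the redirect-only server on the old mirror host (blocks forever)."""
    HTTPServer(("", port), MirrorRedirectHandler).serve_forever()
```

Calling serve() on the mirror host would send every request to the matching path on the kept site. In practice the same effect is usually configured in the web server itself rather than in a separate script.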


Case 2,
In certain instances, the URL history, the crawl rate or pattern, the PageRank, the directory level, or the TrustRank of the new copy suggests that the new web page is the more important one. The "original" URL will then be marked as a supplemental result or dropped from the index.

+ Resolution: You should not keep an identical copy of any single web page, or of an entire web site, on the web at the same time as the original. If you notice your web pages being plagiarized by a third party, contact the webmaster and request that the copy be deleted. If the webmaster does not respond, contact the hosting company, the Internet Service Provider, or the registrar directly, and report the problem to Google representatives through the Google Webmaster Tools control panel.
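
Before contacting a webmaster, it can help to confirm how closely a suspected copy matches your own page. The sketch below compares two texts by their overlapping five-word sequences (a generic "shingling" comparison). It is only a rough self-check under assumed settings, not the method Google uses, and the function names and the sequence length of five are hypothetical choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Short texts are treated as a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=5):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 for the visible text of your page and the third-party page suggests a near-verbatim copy, while a score near 0.0 suggests the pages merely share a topic.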


Case 3,
Sometimes a single web page can be accessed through multiple URLs, so the index appears to contain two identical copies of the same content. The algorithm will then most likely judge one of them to be the duplicate and set its attributes in the database accordingly. In certain cases, where Google cannot identify the URL that the webmaster considers the original, or where the multiple-URL pattern is perceived as spam, both or all URLs will be marked as supplemental or dropped from the index.

+ Resolution: Google does its best to identify patterns of good-faith duplicate content, such as the www.example.com and example.com versions of the same URL pointing to a single web page. In certain cases, however, the algorithm cannot decide whether the duplicate content is spam or the result of erroneous inbound links or of inconsistent navigation or parameters for the same URL.
For more information on how to resolve this issue, see Canonical URLs.
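
To illustrate how several URLs can collapse onto one page, the sketch below maps common variants (host with or without www, a trailing index file, reordered or irrelevant query parameters) to a single form. The preferred host, the list of ignored parameters, and the index file names are hypothetical and would need to match your own site; this is one possible normalization, not a rule Google publishes.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical choices for illustration: which host is preferred and which
# query parameters do not change the page content.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
IGNORED_PARAMS = {"sessionid", "ref"}
INDEX_FILES = ("index.html", "index.php")


def canonical_url(url):
    """Collapse the common variants of one page onto a single URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    # Keep an explicit port only when it is not a default web port.
    if parts.port and parts.port not in (80, 443):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Drop parameters that do not affect content and sort the rest so that
    # ?a=1&b=2 and ?b=2&a=1 map to the same URL.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme or "http", host, path, urlencode(query), ""))
```

Running your internal links through a function like this shows which addresses differ only in form. Each such group is a candidate for consistent linking to one URL.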

Case 4,
In extremely rare cases, a proxy server or a hacked website may cache web pages or entire websites and, knowingly or by chance, allow Google to index those pages. Sometimes Google may not be able to determine the original source of the content and keeps the URLs of the proxy in its index instead of the URLs of the website being copied. Google engineers are currently working on resolving this issue.

+ Resolution: To avoid being taken by surprise, you may set up a Google Alert at http://www.google.com/alerts for your domain name and inspect the reports for any suspicious URLs that use the domain name as part of the address or contain bits of your unique content. Either way, you will need to identify the bot that requests the pages from your website and block any further copying of the content through your .htaccess settings. Read more on Hijacking.
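
Identifying the bot usually starts with the server's access log. The sketch below counts requests per IP address and user agent so that an unusually active client stands out. It assumes the log uses the Apache "combined" format, and the function name and limit are hypothetical; other servers or custom log formats would need a different pattern.

```python
import re
from collections import Counter

# Matches the Apache "combined" log format (an assumption about your server).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_clients(log_lines, limit=10):
    """Count requests per (IP address, user agent) and return the busiest."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:  # lines in any other format are skipped
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(limit)
```

Passing the lines of an access log to top_clients returns the heaviest requesters first. An address that fetches most of the site in a short time, and that does not belong to a search engine you recognize, is a candidate for blocking in .htaccess.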



Resources
Google Webmaster Guidelines
http://www.google.com/support/webmasters/bin/answer.py?answer=35769

Avoiding Duplicate Content penalties ( Elixir Systems )
http://www.elixirsystems.com/seo_tips/avoiding-duplicate-content-penalty.php

Duplicate Content Issues (Yahoo & Google) ( Search Engine Roundtable )
http://www.seroundtable.com/archives/006714.html

Duplicate Content - Get it right or perish ( Webmasterworld )
http://www.webmasterworld.com/google/3060898-4-30.htm

How do I prevent Googlebot from following links on my pages? ( Google Webmaster Help Center )
http://www.google.com/support/webmasters/bin/answer.py?answer=33581

How do I tell Googlebot not to crawl a single outgoing link on a page? ( Google Webmaster Help Center )
http://www.google.com/support/webmasters/bin/answer.py?answer=33582

Google and PageRank are trademarks of Google Inc.
This site is developed by Sorensen Online and Graphite-Works