© 2003-2007 Googlerankings.com
Googlerankings.com is in no way affiliated with, nor is it the property of, Google Inc.
Hijacking is an accidental or deliberate abuse of known vulnerabilities in Search Engine indexing, in which a web page poses as the original source of another URL while being hosted and owned by a completely different entity. It takes place when, during a crawl, Search Engine bots discover a URL that sits on a different domain than the one holding the original content, yet gives off false signals of being the new location of the same page. If the hijacker goes unnoticed and no action is taken, the original web page may be replaced in the Index by the falsely assumed new location, resulting in a complete takeover of a website's rankings by a third party.

Search Engines sometimes mistake certain server messages for legitimate requests to index the source URL (the hijacker) instead of, or ahead of, its target (the hijacked page). In other cases online plagiarism can push the original source out in favor of the hijacker, if the abusing party copies the target's content and owns a domain name with the right parameters to outrank it. Such factors are always technical in nature: the misinterpretation of a server redirect, or the false assumption that a web page has migrated from an old URL to a new one.

In either case the act violates international copyright laws, and is therefore not only unethical but, in some cases, illegal. A deliberate act of web page hijacking may lead to legal action by the owner of the original (hijacked) website. Taking proper precautions, monitoring your content on the web, and acting swiftly and firmly at the first sign of a hijack are of high importance.

Hijacking was a widespread problem until 2006, but seems to be less of an issue since the temporary server redirect (302 redirect) exploit in Google was fixed. However, plagiarism and proxy hijacking may still pose a problem, for which a final resolution from Search Engine technicians is still in the works.
Precautions include setting up access control that bans known proxies up front, along with any request that presents itself as a Search Engine bot but does not arrive from an IP address associated with the domain of that crawler. A properly set up Google alert may also give timely hints if your URL or unique content from your pages is being used elsewhere on the Internet. To do this, go to the Google Alerts setup page and request reports on specific content (putting the queries in quotes) as well as on the domain name itself.

Known issues

Case 1

Resolution: In the past, the resolution to this problem was to contact the webmaster of the offending domain and ask them to disable the redirect or remove the page(s) in question and, if the Index had already taken note of the new URL, to also file a spam report at Google explaining the situation. If the webmaster did not respond, the proper action was to contact the hosting company, the server park, the registrar or any other entity that could act against the hijack by making the offending page or domain inaccessible from the web.
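
When reporting a redirect of this kind, it can help to state exactly what the offending URL returns. The following is a minimal Python sketch, not a tool from this guide: it takes a status code and `Location` header that you have already observed with any HTTP client (redirect following switched off) and labels the response. The function names, the result labels and the treatment of 307 as a temporary redirect alongside 302 are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: 307 is grouped with 302 as "temporary", 308 with 301 as "permanent".
TEMPORARY = {302, 307}
PERMANENT = {301, 308}


def host_of(url):
    """Return the lower-cased host of a URL, or '' if it has none."""
    return (urlsplit(url or "").hostname or "").lower()


def belongs_to(host, own_domain):
    """True if host is own_domain itself or one of its subdomains."""
    own_domain = own_domain.lower()
    return host == own_domain or host.endswith("." + own_domain)


def classify_redirect(source_url, status, location, own_domain):
    """Label one observed response from source_url.

    Returns 'temporary' or 'permanent' when a foreign host redirects
    to own_domain, and 'ignore' for everything else (other status
    codes, missing or relative Location, redirects within your own
    site, redirects to third parties).
    """
    source, target = host_of(source_url), host_of(location)
    if status not in TEMPORARY | PERMANENT or not target:
        return "ignore"
    if belongs_to(source, own_domain) or not belongs_to(target, own_domain):
        return "ignore"
    return "temporary" if status in TEMPORARY else "permanent"
```

A result of `'temporary'` corresponds to the 302-style redirect discussed in this guide and is worth quoting, together with the source URL, in a message to the webmaster or in a spam report.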
Resolution: Block access to the pages of your website from the domain, IP or IP range that is copying the content. Make sure not to shut out other visitors, but do everything you can to keep the unwanted bots off your server. File a spam report at Google explaining the situation and, should the hijack be deliberate, you may seek legal advice on whether to file a DMCA complaint. Most importantly, you need to communicate the issue to the Search Engines that have misinterpreted the content appearing on another set of URLs, and block this and any further attempts by automated scrapers to copy your content, without denying access to legitimate requests such as crawling by Googlebot.

Keep in mind that the proxy bots copying your pages may identify themselves as another entity to bypass your security. For this reason you may need to match the IP address of each request to the domain it resolves to, and deny access where cloaking is evident (a bot identifying itself as a Search Engine crawler while its IP address shows no relation to the domain of the bot in question, e.g. googlebot.com, crawl.yahoo.net).

Most often hijacking poses a problem only because it invokes a filter that wrongly treats the original URLs as the duplicates. Read more on Duplicate content.
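
The IP-to-domain check described above can be sketched in Python as a reverse DNS lookup followed by a confirming forward lookup. This is an illustrative sketch under stated assumptions, not a complete access control setup: the function names, the list of user-agent tokens and the simple suffix match are hypothetical choices, the two crawler domains are the ones named in the text, and `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket

# Crawler domains named in the text; extend for other Search Engines.
CRAWLER_DOMAINS = ("googlebot.com", "crawl.yahoo.net")
# Hypothetical user-agent tokens that mark a request as claiming to be a crawler.
CRAWLER_TOKENS = ("googlebot", "slurp")


def reverse_lookup(ip):
    """Return the host name an IP address resolves to, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of IPv4 addresses a host name resolves to."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def is_genuine_crawler(ip, reverse=reverse_lookup, forward=forward_lookup):
    """True if ip resolves to a known crawler domain and back to itself."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    in_domain = any(
        hostname == d or hostname.endswith("." + d) for d in CRAWLER_DOMAINS
    )
    # The forward lookup guards against a forged reverse DNS record.
    return in_domain and ip in forward(hostname)


def should_deny(ip, user_agent, reverse=reverse_lookup, forward=forward_lookup):
    """Deny requests that claim to be a crawler but fail the DNS check."""
    claims_crawler = any(t in user_agent.lower() for t in CRAWLER_TOKENS)
    return claims_crawler and not is_genuine_crawler(ip, reverse, forward)
```

The resolver functions are passed in as parameters so the logic can be exercised without network access. In practice the result would be cached per IP address, since a DNS lookup on every request is slow.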
Resources
Proxy Hijack - Now what should I do? (Webmasterworld)
Hijacking - Some Advice for Webmasters (Webmasterworld)
Report a Spam Result (Google.com)
Digital Millennium Copyright Act (DMCA, filing a notice of infringement - Google.com)
Live Search's Weblog: search robots in disguise (How to identify MSN / Live.com bots - MSDN Blogs)
How to verify Googlebot (Official Google Webmaster Central Blog)
Yahoo! Search Crawler, Slurp, is moving (Yahoo! Search Blog)
What is "proxy hijacking"? What do I need to know about proxies? (Spamhaus)
Google Alerts (Google.com)