Google goes Deep Web
Google is doing something important: it has been experimenting with HTML forms. Basically, it has been filling in forms on high-quality sites and crawling the URLs that the resulting queries produce. This takes time to digest, so it is better to read the post Crawling through HTML forms on the Google Webmaster Central Blog.
Google, of course, quickly follows this up by noting that the experiment observes good Internet citizenry practices, which involve:
Only a few useful sites were included in the experiment
Googlebot strictly adhered to robots.txt, nofollow, and noindex directives: if a search form is forbidden, the URLs that the form would generate are not crawled.
Also, Google used only GET forms and not forms that require personal information.
Google also limited the number of fetches per web site because of the load that repeated fetches could place on a particular website.
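To make the rules above concrete, here is a minimal sketch of how a polite form crawler might apply them. It is not Google's implementation: the function name `candidate_urls`, the cap `MAX_FETCHES_PER_SITE`, the `example.com` URLs and the field name `q` are all hypothetical. It only builds the URLs a GET form would produce, drops the ones robots.txt forbids, and stops at a per-site limit. Nothing is actually fetched.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser

MAX_FETCHES_PER_SITE = 5  # hypothetical cap, not a figure from Google


def candidate_urls(form_action, method, field, values, robots_lines,
                   agent="Googlebot", limit=MAX_FETCHES_PER_SITE):
    """Return the form-generated URLs a polite crawler would be allowed to fetch."""
    # Only GET forms are considered; anything else is skipped entirely.
    if method.upper() != "GET":
        return []

    # Parse the site's robots.txt rules (passed in as lines, so no network access).
    robots = RobotFileParser()
    robots.parse(robots_lines)

    urls = []
    for value in values:
        if len(urls) >= limit:  # respect the per-site fetch limit
            break
        url = form_action + "?" + urlencode({field: value})
        if robots.can_fetch(agent, url):  # skip URLs the site has forbidden
            urls.append(url)
    return urls


robots_txt = ["User-agent: *", "Disallow: /private/"]
allowed = candidate_urls("http://example.com/search", "GET", "q",
                         ["deep web", "forms", "crawl"], robots_txt, limit=2)
blocked = candidate_urls("http://example.com/private/search", "GET", "q",
                         ["deep web"], robots_txt)
print(allowed)
print(blocked)
```

In this sketch a forbidden form simply yields no URLs at all, which mirrors the rule that the URLs a forbidden form would generate are never crawled.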
Seems to be above board.
But why is this Google experiment important?
First, it is a significant effort on Google's part to mine the Hidden Web, also known as the Deep Web: in other words, data, information and knowledge not reachable by search engines.
Second, if consistently successful, Google might be able to add high-quality content to its usual search results.
It makes one wonder what the competitors are doing. And if Google succeeds, does this mean a ten-fold increase in Information Overload?
Tags: google, Information Overload
POSTED IN: Information Overload