But the article cited above shows how this barrier is slowly cracking. Now I can enter "fedex 791725670102" into Google (not Federal Express) and discover that the jigsaw puzzle I mailed to an author in Australia was signed for by him.
Of course, Google has to send me to the Federal Express site (which takes an extra click) to complete the search, but the principle is established: a search at Google can kick off a deep search on another site.
The burn-out of the dot-com era left a smoldering envy of those few dot-commers who managed to stay alive. Google is foremost among these. If it can continue pulling in dynamic data from more and more sites, its dominance may well continue--for access to dynamic data is indeed the key to the next big improvement in search.
A generalization of the Google/FedEx collaboration would lead to what are commonly called metasearch engines, a peer-to-peer solution to the search problem that involves a radically different architecture from any of the current popular engines. I said different, not new. The idea of peer-to-peer search was aired at least as far back as early 2000. I described it in my first article on peer-to-peer systems in May of that year:
Gnutella is a fairly simple protocol. It defines only how a string is passed from one site to another, not how each site interprets the string. One site might handle the string by simply running fgrep on a bunch of files, while another might insert it into an SQL query, and yet another might assume that it's a set of Japanese words and return rough English equivalents, which the original requester may then use for further searching. This flexibility allows each site to contribute to a distributed search in the most sophisticated way it can. Would it be pompous to suggest that Gnutella could become the medium through which search engines operate in the 21st century?
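To make the idea concrete, here is a minimal sketch in Python of one string being handed to several sites, each of which interprets it in its own way. It is not the Gnutella protocol itself: there is no networking, and every name in it (substring_peer, database_peer, translating_peer, distributed_search) and all the sample data are invented for illustration.

```python
from typing import Callable, Dict, List

# A peer is modeled as a function from the raw query string to a list of
# results. How it interprets the string is entirely its own business.
Handler = Callable[[str], List[str]]


def substring_peer(documents: List[str]) -> Handler:
    """Hypothetical peer that scans its documents for the literal string,
    roughly what running fgrep over a set of files would do."""
    def handle(query: str) -> List[str]:
        return [doc for doc in documents if query in doc]
    return handle


def database_peer(rows: Dict[str, str]) -> Handler:
    """Hypothetical peer that uses the string as a lookup key, standing in
    for a site that inserts it into an SQL query."""
    def handle(query: str) -> List[str]:
        value = rows.get(query)
        return [value] if value is not None else []
    return handle


def translating_peer(glossary: Dict[str, str]) -> Handler:
    """Hypothetical peer that treats the string as foreign words and
    returns rough English equivalents for the ones it knows."""
    def handle(query: str) -> List[str]:
        return [glossary[word] for word in query.split() if word in glossary]
    return handle


def distributed_search(query: str, peers: List[Handler]) -> List[str]:
    """Pass the same string to every peer and collect whatever comes back."""
    results: List[str] = []
    for peer in peers:
        results.extend(peer(query))
    return results


# Made-up sample data for the three peers.
peers = [
    substring_peer(["a puzzle shipped to Australia", "notes on search engines"]),
    database_peer({"puzzle": "catalog entry: jigsaw puzzle"}),
    translating_peer({"pazuru": "puzzle"}),
]

puzzle_results = distributed_search("puzzle", peers)
pazuru_results = distributed_search("pazuru", peers)
```

The requester never needs to know which kind of site answered. The translating peer's answer to "pazuru" could in turn be fed back in as a new query, which is the "further searching" the passage describes.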
What's holding back metasearch is the lack of standards for categorizing data and knowing what to search for. It's easy to guess that "fedex 791725670102" should be interpreted as a search for a Federal Express package, but anything less strictly defined is a big metadata problem.
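The easy case can be sketched in a few lines of Python, which also shows where the hard case begins. This is only an illustration under stated assumptions: the routing table, the "package-tracking" label, and the 12-digit pattern (taken from the single example number above, not from any carrier's specification) are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical routing table: a leading keyword maps to a target label and
# to the pattern the rest of the query must match. The 12-digit rule is an
# assumption based only on the example number in this article.
ROUTES = {
    "fedex": ("package-tracking", re.compile(r"\d{12}")),
}


def route_query(query: str) -> Optional[Tuple[str, str]]:
    """Return (target, argument) if the query is strictly defined enough
    to hand off to another site, or None if it is not."""
    parts = query.strip().split(maxsplit=1)
    if len(parts) != 2:
        return None
    keyword, argument = parts[0].lower(), parts[1].strip()
    route = ROUTES.get(keyword)
    if route is None:
        return None
    target, pattern = route
    if pattern.fullmatch(argument):
        return target, argument
    return None


tracked = route_query("fedex 791725670102")
unrouted = route_query("jigsaw puzzle australia")
```

The first query matches a rule and can be passed along. The second returns None, and nothing in a table like this can say which site ought to receive it. That gap is the metadata problem.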
A lot of people have dumped on the ideal of metadata, notably Cory Doctorow in the article "Metacrap." So the waters of the deep web will be slow to stir, but as the benefits become clear, more and more sites may emerge.
What business model would drive metasearch? That question is classic in peer-to-peer systems, because distributed systems typically have problems generating and distributing income. Sites could be motivated to solve the metadata problem because they'd draw more traffic by joining the system, and expose more of their data to people's searches.
As for the aggregating site--Google or a competitor--it would potentially have an easier road to profitability than Google has now. The aggregating site could continue to derive revenue from ads and from the sale of search software. Since the computing resources it needed would be vastly less than those Google needs today, it would need less revenue from ads and sales. And since the use of its software would be a prerequisite to joining (although one hopes it would tolerate the use of compatible, competing software), it should be able to land more sales.
Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.
oreillynet.com Copyright © 2006 O'Reilly Media, Inc.