The Mechanics of a Deep Net Metasearch Engine

Most of the Web isn't reachable by a crawler. It sits behind search forms - on government registers, library catalogues, scientific databases, niche topic engines - and the only way in is to ask. This is a paper about asking thousands of them at once.

This 2003 IADIS paper describes the architecture of Turbo10 (later T10) - a metasearch engine that aggregated thousands of specialised "deep net" search engines into a single query interface. The hard part isn't the searching. It's keeping the adapters alive: each engine has its own form, its own URL pattern, its own way of formatting results, and they all change without warning.

[Figure: the visible web indexed by Google sits above the crawler waterline; below it lies the deep net of thousands of specialised engines. Each dot is a search form - the only way in is to ask.]
The Web most users see, against the Web behind the search forms.

One query, thousands of asks

A user types a query once. The metasearch broker fans it out to a chosen set of relevant engines - PubMed, the patent office, library catalogues, niche commerce engines - through small adapters that translate the query into the shape each engine expects. Results come back in different formats; the broker normalises and merges them, deduplicates, and ranks the unified list before showing it to the user.

[Figure: a query enters the broker, which fans out through one adapter per engine (PubMed, Patents, Library, Shop, Niche) and merges the results.]
Adapters do the dirty work - one for each engine, each one keeping pace with a moving target.
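The fan-out and merge described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Turbo10's implementation: the `Result` type, the `canonical` URL key, the reciprocal-rank scoring and the stub adapters are all hypothetical names and choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Result:
    url: str
    title: str
    engine: str
    rank: int  # 1-based position in the source engine's own list


# An adapter takes the user's query and returns already-normalised results.
Adapter = Callable[[str], List[Result]]


def canonical(url: str) -> str:
    """Crude dedup key (an assumption): lower-case host, no trailing slash."""
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}?{parts.query}"


def metasearch(query: str, adapters: Dict[str, Adapter]) -> List[Result]:
    # Fan out: ask every chosen engine concurrently.
    # Leaving the `with` block waits for all adapters to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(adapters))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in adapters.items()}

    merged: Dict[str, Result] = {}
    scores: Dict[str, float] = {}
    for future in futures.values():
        try:
            results = future.result()
        except Exception:
            continue  # a broken adapter must not sink the whole query
        for r in results:
            key = canonical(r.url)
            merged.setdefault(key, r)  # dedupe: keep the first copy seen
            scores[key] = scores.get(key, 0.0) + 1.0 / r.rank

    # Rank by summed reciprocal rank, so agreement between engines lifts a result.
    return sorted(merged.values(), key=lambda r: -scores[canonical(r.url)])


# Hypothetical stub adapters standing in for real engines.
def stub_pubmed(query: str) -> List[Result]:
    return [
        Result("https://example.org/a", "A", "pubmed", 1),
        Result("https://example.org/b/", "B", "pubmed", 2),
    ]


def stub_library(query: str) -> List[Result]:
    return [Result("https://EXAMPLE.org/b", "B", "library", 1)]


def stub_broken(query: str) -> List[Result]:
    raise RuntimeError("result format changed")
```

Calling `metasearch("aspirin", {"pubmed": stub_pubmed, "library": stub_library, "broken": stub_broken})` returns two results: the page both engines agree on ranks first, and the failing adapter is skipped rather than failing the query.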

The hard problem: keeping adapters alive

Crawler-based search has a single hard problem (scale). Metasearch has a different one: adapter rot. Forms change. URL structures change. Result formats change. Engines disappear. The paper describes the automation around this - a process for generating, monitoring and repairing adapters at scale, so the system can offer thousands of engines without an army of maintainers.
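One way to picture the monitoring half of that automation is a periodic probe per adapter. The sketch below is an assumption for illustration only; the paper's actual process is not reproduced here. `AdapterHealth`, `probe`, `sweep`, the probe query and the three-strikes threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AdapterHealth:
    name: str
    search: Callable[[str], List[str]]  # returns result URLs for a query
    probe_query: str                    # a query known to return hits
    min_results: int = 1
    failures: int = 0                   # consecutive failed probes
    status: str = "ok"                  # "ok", "suspect" or "needs_repair"


def probe(adapter: AdapterHealth, repair_after: int = 3) -> str:
    """Run one probe query and update the adapter's status."""
    try:
        hits = adapter.search(adapter.probe_query)
        healthy = len(hits) >= adapter.min_results
    except Exception:
        healthy = False  # network error, parse error, vanished engine

    if healthy:
        adapter.failures = 0
        adapter.status = "ok"
    else:
        adapter.failures += 1
        # One bad probe may be a blip; repeated failures suggest adapter rot.
        if adapter.failures >= repair_after:
            adapter.status = "needs_repair"
        else:
            adapter.status = "suspect"
    return adapter.status


def sweep(adapters: List[AdapterHealth]) -> List[str]:
    """Probe every adapter once; return the names queued for repair."""
    return [a.name for a in adapters if probe(a) == "needs_repair"]
```

Run on a schedule, `sweep` turns silent breakage into a repair queue: an adapter whose probe comes back empty three times in a row is flagged, while one that recovers drops straight back to `ok`.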

That automation is what made the difference between a clever demo and a system that ran for almost a decade. At its peak, T10 handled around 100 million searches a day. It later pivoted into a search-advertising network and finally closed in 2012, after Google's anti-competitive practices in EU search advertising eroded the addressable market.

Title: The Mechanics of a Deep Net Metasearch Engine
Author: Nigel Hamilton
Conference: IADIS International Conference e-Society 2003 · Lisbon · pp. 1034–1036
Read it: iadisportal.org / The Mechanics of a Deep Net Metasearch Engine

Related: "Search Trails: Back to the Future" picks up the thread four years later, and "A patent for blazing search trails" describes the trail-recording method that grew out of this work.
