Twenty years ago I took on a new client, Ask Jeeves, a search engine that was designed to answer questions about anything using natural language processing. It worked better the more people asked the same question, and if Jeeves didn’t know an answer, it tasked one of the many in-the-flesh librarians and UC Berkeley grad students sitting in cramped cubicles to find the answer.
Obviously, this approach lacked scalability – this was before tech companies outsourced their workforces overseas – so it wasn’t long before Google came up with a scalable search engine based not only on algorithmically determined popularity (PageRank) but also on the efforts of domain holders to optimize search through SEO. Advertising (and God knows what else) later played and still plays a major role in what results show up first when you conduct a search.
Five years ago, Seymour Rubinstein, whose company launched the first commercially successful word processor, WordStar, in 1979, hired me to position and publicize Webthresher, a search engine he’d been working on for more than a decade. He claimed Webthresher could find more relevant information by scraping other search engines, which you could select, and by using a patented algorithm that identified significance by word length. Basically, the shorter the word, the less relevant.
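To make the word-length idea concrete, here is a minimal sketch in Python. It is not Webthresher's patented algorithm, whose details I don't have; it only illustrates the general principle that a shorter word counts for less. The function names `tokenize` and `length_weighted_score` are my own inventions, and the scoring rule (each matching query word contributes its length in characters) is an assumption made purely for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def length_weighted_score(query: str, document: str) -> int:
    """Score a document against a query, weighting each match by word length.

    Hypothetical heuristic, not Webthresher's actual method: every distinct
    query word that also appears in the document adds its character count
    to the score, so "the" adds 3 while "processor" adds 9.
    """
    doc_words = set(tokenize(document))
    query_words = set(tokenize(query))
    return sum(len(word) for word in query_words if word in doc_words)
```

With the query "the first word processor", a page reading "WordStar was the first commercially successful word processor" scores 21, while a page reading "the first of the month" scores only 8, because the short words it shares with the query carry little weight.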
Webthresher has not yet made it to the commercial stage, although it still promises features like complete privacy, the ability to cache searches into personal files, and an automated synopsis of web sites presented alongside an index of other possibly related sites.
Taking a different approach is the Internet Archive’s Wayback Machine, developed ten years ago to capture and curate the millions of web sites that might otherwise disappear, as did the Supreme Court opinions at one point. Until now, the Wayback Machine was used by researchers, techies, and academics, who had to know the exact URL they were seeking in order to conduct a search.
On October 26, at the Internet Archive’s 20th anniversary celebration at its headquarters in the Richmond district of San Francisco, Brewster Kahle, founder and chief librarian, demonstrated how to search for sites – including one million Wikipedia sites whose broken links were restored by the Archive – by using common words. Searches are totally ad-free, and so is use of the Wayback Machine.
The Archive is a nonprofit, initially funded by Kahle, who sold his company, Alexa, to Amazon in 1999 for $400 million in stock options. The Archive – which employs 200 people at centers around the world – is now funded by federal grants and private philanthropies as well as by the people who use it.
The Internet Archive does not like to describe its Wayback Machine as a search engine. True, it’s no replacement for Google if you want to find up-to-date information. But as an engine for searching web sites of the past – GIFs on GeoCities, games like The Oregon Trail, recordings of famous speeches, music, and actual newspaper articles – the Internet Archive’s Wayback Machine is fulfilling its original mission to save, curate, and provide easy access to the world’s knowledge and culture.