Crawling The Web
Source: Tom's Hardware – Keywords: second, hand, smoke
Conversely, All the Web and Northern Light want to cover as many pages on the Web as possible by indexing their content. Crawling the Web and trawling through this much data requires a lot of hardware power, and it gives these companies bragging rights. From the background information I have read on both companies, I believe that Northern Light has indexed 150 million Web pages, and FAST Search and Transfer, the Norwegian company behind All the Web, has indexed over 200 million pages and wants to go as high as 500 million.
I particularly like the way Northern Light organizes search results into a menu of folders and sub-folders, which makes it very easy to sort through the links by category. For example, if my "Joe Bloggs" search digs up a number of articles Joe wrote for Zippity Magazine, I might get a Zippity folder with all those links sorted into it, as well as the general listing based on occurrences of the term.
The one very annoying thing about Northern Light is a set of documents it calls its Special Collection. In effect, these are documents that Northern Light sells, and frankly, they're a pain when they appear in your search results; it's almost as bad as calling up a product catalog when you don't want to buy anything. In addition, most of these documents are out of date and very expensive.
All the Web just spews out a great big list. It ain't pretty, but if you are a purist, it makes for many interesting hours of chasing down links and information. In fact, All the Web has become my first port of call when I am doing research, followed by Google and then Northern Light.
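Northern Light's folder scheme, as described above, amounts to grouping ranked hits by a shared attribute. The following is only a rough Python sketch of that grouping step, not Northern Light's actual method: it assumes each hit carries a hypothetical `source` field (the publication it came from) and files hits into one folder per source while keeping their ranked order. A real engine could group on other attributes, such as subject, in the same way.

```python
from collections import defaultdict

# Hypothetical ranked hits as (title, source) pairs. The 'source' field is an
# illustrative assumption, not a documented Northern Light data format.
results = [
    ("Joe Bloggs on modems", "Zippity Magazine"),
    ("Joe Bloggs home page", "bloggs.example.com"),
    ("Joe Bloggs reviews scanners", "Zippity Magazine"),
]

def group_into_folders(hits):
    """File (title, source) hits into one folder per source, preserving rank order."""
    folders = defaultdict(list)
    for title, source in hits:
        folders[source].append(title)
    return dict(folders)

folders = group_into_folders(results)
for name, titles in folders.items():
    print(f"{name} ({len(titles)})")
    for title in titles:
        print("   ", title)
```

The flat ranked list (`results`) is still available alongside the folders, which matches the article's description of getting both a folder view and the general listing.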
I asked Tom Wilde, senior product manager at FAST's offices in Westboro, Massachusetts, how his company perceives the future of search engines. Let me preface his answer by saying that the general consensus among search engine experts is that the Web cannot be indexed completely and never will be, so there have to be a number of different methods for searching and collating results.
Wilde said, "From a technical standpoint, many of today's search engines will begin to reach their limitation in terms of catalog size, and as a result, much of the Web's new content will not be available for searching. New developments in search will include new algorithmic approaches to solving the relevancy challenge, such as Google and Ask.com. These solutions are not particularly capital intensive and can be defensible if done correctly."
"Spidering the entire Web has become an extremely complex problem to solve that does require significant capital to create the appropriate infrastructure. Eventually the Web will become so large that software-only solutions will not be adequate to handle the size. A combination hardware/software approach will be needed to deliver fast, relevant results to users. FAST's Pattern Matching Chip is an example of this evolution. With dedicated hardware handling the processing, the user will experience a very fast, fixed response time to their queries regardless of the complexity of the query. This is not true of software-only solutions, where the response time increases almost geometrically with query complexity. The hardware solution will manifest itself most clearly in the 'alert' or 'reverse search' model."