History of Search Technology

The Web did not have good search engines for a long time. The first search engines did not even analyze page copy; they looked only at titles and had no ranking criteria. As the convenience and commercial potential of search engines became more obvious, more advanced systems were developed.

Excite was the first serious commercial search engine. It was developed at Stanford and was purchased for $6.5 billion by @Home. In 2001, Excite and @Home went bankrupt, and InfoSpace bought Excite for $10 million.

When the first search engines were rolling out, web directories were still strong competitors, at first because search results were poor and later because of spam and abuse.
Parts of the Search Engine
There are three main parts to every search engine:

Spider
Index
Web Interface

A spider crawls the web. It follows links and scans web pages. All search engines have periods of deep crawl and quick crawl. During a deep crawl, the spider follows all links it can find and scans web pages in their entirety. During a quick crawl, the spider does not follow all links and may not scan pages in their entirety.

The job of the spider is to discover new pages and to collect copies of those pages, which are then analyzed in the index.
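The difference between a deep crawl and a quick crawl can be illustrated with a minimal sketch. It assumes a tiny in-memory "web" (the `TOY_WEB` dictionary and its URLs are invented for illustration) instead of real network requests, and the `max_depth` and `max_chars` parameters are hypothetical stand-ins for "does not follow all links" and "may not scan pages in their entirety". Real spiders are far more elaborate.

```python
from collections import deque

# Hypothetical toy web: URL -> (page text, outgoing links). No network access.
TOY_WEB = {
    "a.example/": ("home page", ["a.example/news", "a.example/about"]),
    "a.example/news": ("news story", ["a.example/archive"]),
    "a.example/about": ("about us", []),
    "a.example/archive": ("old stories", ["a.example/"]),
}


def crawl(seed, max_depth=None, max_chars=None):
    """Breadth-first crawl starting at seed; returns URL -> stored copy.

    max_depth=None follows every link (deep crawl); a small max_depth
    stops following links early (quick crawl). max_chars, if set,
    keeps only the first part of each page.
    """
    copies = {}
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        if url not in TOY_WEB:
            continue  # dead link: nothing to collect
        text, links = TOY_WEB[url]
        copies[url] = text if max_chars is None else text[:max_chars]
        if max_depth is not None and depth >= max_depth:
            continue  # quick crawl: do not follow links from this page
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return copies


deep = crawl("a.example/")                            # every page, full text
quick = crawl("a.example/", max_depth=1, max_chars=4)  # fewer pages, truncated
```

In this toy setup the deep crawl collects all four pages, while the quick crawl never reaches the archive page and stores only the first four characters of each page it does visit.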
Crawl Rate
Pages that are considered important get crawled frequently. For example, the New York Times may be crawled every hour or so to put new stories in the index. Less authoritative sites with lower PageRank (PR) are crawled less frequently, sometimes as rarely as once a month. The crawl rate depends directly on link popularity and domain authority.

If many links point to a website, it may be an important site, so it makes sense to crawl it more often than a site with fewer links. This is also a money-saving issue. If search engines were to crawl all sites at an equal rate, it would take more time overall and cost more as a result.
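As a rough sketch of this idea, the function below maps an inbound-link count to a revisit interval. The formula, the function name `crawl_interval_hours`, and the bounds (one hour, roughly one month) are assumptions chosen to echo the figures in the text; no search engine is known to use this exact rule.

```python
import math


def crawl_interval_hours(inbound_links, min_hours=1.0, max_hours=720.0):
    """Return a revisit interval in hours for a site (hypothetical formula).

    More inbound links give a shorter interval. The logarithm damps very
    large link counts, and the result is clamped to [min_hours, max_hours].
    """
    interval = max_hours / (1.0 + math.log10(1 + inbound_links)) ** 4
    return max(min_hours, min(max_hours, interval))


def fetches_per_month(inbound_links):
    """Approximate number of crawls in a 720-hour month for one site."""
    return 720.0 / crawl_interval_hours(inbound_links)


unlinked = crawl_interval_hours(0)         # crawled about once a month
popular = crawl_interval_hours(1_000_000)  # crawled about once an hour
```

Under these assumptions a site with no inbound links costs one fetch per month, while a very heavily linked site costs hundreds, which is where the crawl budget is concentrated.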
Web Interface
When you search using a web interface (like Google.com), the results have in many cases already been presorted to some extent. How much presorting is done depends on the complexity of the ranking algorithm. If applying an algorithm to the index takes considerable time, that algorithm is applied in advance. Other algorithms are applied at the moment the search query is submitted.
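The split between work done in advance and work done at query time can be sketched with a toy inverted index. Everything here is an illustrative assumption: the `DOCS` data, the use of inbound-link counts as the precomputed score, and the simple "number of query words matched" score applied at query time.

```python
from collections import defaultdict

# Hypothetical documents: id -> (text, inbound link count).
DOCS = {
    "d1": ("search engines crawl the web", 50),
    "d2": ("web directories list sites", 5),
    "d3": ("spiders crawl web pages for search", 200),
}


def build_index(docs):
    """Index time: build word -> postings, presorted by link count."""
    postings = defaultdict(list)
    for doc_id, (text, links) in docs.items():
        for word in set(text.split()):
            postings[word].append((links, doc_id))
    for word in postings:
        postings[word].sort(reverse=True)  # query-independent work, done once
    return postings


def search(postings, query):
    """Query time: apply the cheap, query-dependent part of the ranking."""
    matches = defaultdict(int)
    static = {}
    for word in query.split():
        for links, doc_id in postings.get(word, []):
            matches[doc_id] += 1
            static[doc_id] = links
    # Rank by number of query words matched, then by the precomputed score.
    return sorted(matches, key=lambda d: (matches[d], static[d]), reverse=True)


INDEX = build_index(DOCS)
results = search(INDEX, "web search")
```

For a single-word query the presorted posting list is already in final order, so almost no work remains at query time; for multi-word queries the precomputed score serves as a tiebreaker.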


