Google and other crawler-based search engines have their listings created and updated automatically. This work is carried out by software which crawls the web, and what it finds is what you can search using the engine. Search engines will find changes on your web pages sooner or later, and those changes can affect your listing. Titles, the copy on the pages themselves and other elements of your pages can all be updated and crawled.
The Open Directory and some other directories are staffed by humans who create their listings. Website owners can submit a short description of their site to these directories, and the directory's editors may also write descriptions of the sites they review. When searching these directories, only those descriptions are searched for matches, so when you make a change to a page on your site, your listing will not be updated. However, a site with good content is more likely to be reviewed favorably by the editors than a poor site with substandard content.
The first part is the crawler (also called a spider or a bot). The crawler visits a page, "reads" it, and follows the links it contains to the other pages of the site. The crawler will come back every month or two to see whether anything has changed.
The index (also known as the catalogue) is the second part of a search engine. This contains the findings of the crawlers: every page the crawlers find is listed here. When changes are made to a web page, the information in the index is updated to match. It can take some time after a page has been spidered for the change to make it into the index, and until a page has been not only spidered but indexed, it will not be searchable using the search engine.
The search engine software is the last part. This is the software which sifts through the index, finding matches and ranking them by order of relevance to the query.
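To make these three parts concrete, here is a minimal sketch in Python. The tiny in-memory "web", the naive link extraction and the word-count scoring are illustrative assumptions standing in for pages fetched over HTTP and a real engine's far more elaborate machinery; it sketches the crawler, index and search pipeline, not how any actual engine works.

```python
from collections import defaultdict

# A toy "web": page URL -> (title, body). In a real crawler these pages
# would be fetched over HTTP and links extracted from the HTML.
PAGES = {
    "a.html": ("Search engines explained", "A crawler visits pages and follows links. See b.html"),
    "b.html": ("Building an index", "Every word the crawler finds is stored in the index. See a.html"),
}

def crawl(start, pages):
    """The 'spider': follow links breadth-first from a starting page."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in pages:
            continue
        seen.add(url)
        _, body = pages[url]
        # naive link extraction: any token ending in .html counts as a link
        queue.extend(w for w in body.split() if w.endswith(".html"))
        yield url

def build_index(urls, pages):
    """The 'catalogue': word -> {url: occurrence count}."""
    index = defaultdict(lambda: defaultdict(int))
    for url in urls:
        title, body = pages[url]
        for word in (title + " " + body).lower().split():
            index[word.strip(".,")][url] += 1
    return index

def search(index, query):
    """The search software: score pages by how often the query words appear."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

index = build_index(crawl("a.html", PAGES), PAGES)
print(search(index, "crawler index"))
```

Running it prints b.html ahead of a.html, simply because the query words appear there more often.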
Not all crawler-based search engines work the same way, however. All of them have these same three parts, but the parts are set up a little differently from search engine to search engine. Because of this, you can search for the same term on two different search engines and get two different sets of results.
When you search using a crawler-based search engine, you'll receive a list of results ranked by relevance to your search terms. These listings are retrieved in a second or less from among the millions of web pages in the search engine's index.
Search engines aren't perfect though – there will be the odd page which is completely irrelevant and there are times when you have to dig deeper into your search results than you may have expected. Generally speaking though, the search engines do what they do very well indeed.
The problem is that a search engine doesn't have its own judgment to draw on and it also can't learn from experience like we can.
How exactly do crawler-based search engines determine relevance, given the mind-boggling number of pages they must look through? Each uses a complex algorithm, and the workings of each search engine's algorithm are a jealously guarded secret, but all of them work according to the same basic principles.
The location and frequency of keywords on a page are part of determining relevance. A web page which has a given search term in its HTML title tag is assumed to be more relevant to that subject than other pages, and search engines also assume that a page which includes the search terms near the beginning of the page is more relevant. The other important factor is frequency: the more often the keywords appear on a page, the more relevant the search engine judges it to be.
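As a hedged illustration of how a location-and-frequency score might be combined, the toy function below gives a bonus when a query term appears in the title tag or near the top of the page, and adds the raw frequency on top. The weights, the 50-word cutoff and the function name are invented for the example and are not taken from any real engine's algorithm.

```python
def relevance_score(query, title, body,
                    title_weight=5.0, early_weight=2.0, early_cutoff=50):
    """Toy location-and-frequency score; the weights are made-up assumptions."""
    words = body.lower().split()
    score = 0.0
    for term in query.lower().split():
        score += words.count(term)               # frequency on the page
        if term in title.lower().split():        # location: in the title tag
            score += title_weight
        if term in words[:early_cutoff]:         # location: near the top of the page
            score += early_weight
    return score

print(relevance_score("search engines",
                      title="How search engines rank pages",
                      body="Search engines use crawlers to index billions of pages"))
```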
Each search engine has its own formula for weighting location and frequency, which is one reason no two search engines yield the same results. Some search engines also index more pages, and some index them more often, so no two search engines are ever working with exactly the same set of web pages.
Search engine spamming can also cause web pages to be excluded from the index. One example of spamming is using the same keyword literally hundreds of times on a single page in an effort to improve its place in the search results. Search engines watch for a number of tricks like this, as well as looking into users' complaints.
Crawler-based search engines also use criteria beyond the page itself to rank web pages, having caught on to the practice of webmasters continually rewriting their copy to achieve a higher ranking.
These "off-page" factors are much harder for webmasters to do anything about.
Search engines can tell a lot about a page by looking at how pages link to one another, which helps them determine its relevance. Increasingly sophisticated techniques are also used to find and discount artificial links.
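The best-known example of this kind of link analysis is a PageRank-style score, where a page linked to by many other pages, especially well-linked ones, accumulates more authority. The sketch below is a deliberately stripped-down version, assuming a tiny hand-written link graph and ignoring refinements such as dangling pages; it only shows the basic idea.

```python
def simple_pagerank(links, damping=0.85, iterations=20):
    """Very simplified PageRank-style link analysis.
    links: page -> list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(simple_pagerank(links))  # "c", with the most incoming links, scores highest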
Click-through measurement is another important off-page factor. This refers to the behavior of search engine users: which results they actually choose when performing a search. In this way, highly ranked pages which are not attracting visitors may drop in rank, and vice versa.
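As a purely hypothetical illustration of click-through adjustment, the sketch below blends a page's relevance score with a smoothed click-through rate, so a result users actually click can overtake one they ignore. The blend factor, the smoothing constants and the scaling are all made-up assumptions for the example.

```python
def rerank_by_clicks(results, clicks, impressions, blend=0.3):
    """Nudge a relevance ranking using observed click-through rates.
    results: list of (url, relevance_score); clicks/impressions: url -> counts."""
    reranked = []
    for url, score in results:
        # smoothed CTR so pages with few impressions aren't judged too harshly
        ctr = (clicks.get(url, 0) + 1) / (impressions.get(url, 0) + 10)
        reranked.append((url, (1 - blend) * score + blend * ctr * 10))
    return sorted(reranked, key=lambda item: item[1], reverse=True)

results = [("popular.html", 0.8), ("ignored.html", 0.9)]
clicks = {"popular.html": 120, "ignored.html": 2}
impressions = {"popular.html": 300, "ignored.html": 300}
print(rerank_by_clicks(results, clicks, impressions))
```

With this sample data, popular.html overtakes ignored.html despite starting with a lower relevance score.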
There are also systems in place to counterbalance the "black hat" efforts of webmasters to artificially increase their page rankings in this area, just as with link analysis.