Indexing robots

How does a robot decide where to visit?

This depends on the robot; each one uses a different strategy.
In general, they start from a historical list of URLs, especially documents that link to many other places, such as server lists, "What's New" pages, and the most popular sites on the Web.

Most indexing services also allow you to submit URLs manually, which will then be queued and visited by the robot.
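As a rough illustration, the queue of URLs a robot works through could be kept like this. This is only a sketch under assumptions: the names `Frontier`, `LinkExtractor` and `extract_links` are made up for this example, the example.com URLs are placeholders, and real robots add politeness rules, priorities and much more. Seed URLs and manually submitted URLs go into the same queue, and links harvested from a parsed page can be fed back in as one more source.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class Frontier:
    """FIFO queue of URLs to visit; each URL is queued at most once."""

    def __init__(self, seeds=()):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        """Queue a URL unless it was seen before. Returns True if queued."""
        url, _fragment = urldefrag(url)  # drop "#section" fragments
        if not url or url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next_url(self):
        """Return the next URL to visit, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute.startswith(("http://", "https://")):
                self.links.append(absolute)


def extract_links(base_url, html):
    """Return the http(s) links found in an already fetched HTML page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Historical seed list (placeholder URL).
    frontier = Frontier(seeds=["http://example.com/whats-new.html"])
    # A manually submitted URL joins the same queue.
    frontier.add("http://example.com/submitted.html")
    # Links found in a page that was parsed (no network access here).
    page = '<a href="/a.html">A</a> <a href="mailto:x@example.com">mail</a>'
    for link in extract_links("http://example.com/whats-new.html", page):
        frontier.add(link)
```

The `_seen` set is what keeps the robot from queueing the same document twice when several pages link to it.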

Sometimes other sources of URLs are used, such as scanning USENET postings, published mailing list archives, etc. Given those starting points, a robot can select URLs to visit and index, and to parse as a source of new URLs.

How does an indexing robot decide what to index?

If an indexing robot knows about a document, it may decide to parse it and insert it into its database.

How this is done depends on the robot. Some robots index the HTML titles or the first few paragraphs; others parse the entire HTML and index all words, with weightings depending on HTML constructs. Some parse the META tag or other special hidden tags. We hope that as the Web evolves, more facilities become available to efficiently associate metadata, such as indexing information, with a document. This is being worked on.
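To make "weightings depending on HTML constructs" a little more concrete, here is a minimal sketch. It is not how any particular robot works: the weight values, the `WeightedIndexer` class and the `index_document` function are assumptions invented for this example. Words in the title score highest, words from META keywords/description score in between, and ordinary body text scores lowest.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical weights: a word in the title counts more than one in the body.
WEIGHTS = {"title": 5, "meta": 3, "body": 1}
WORD = re.compile(r"[a-z0-9]+")


class WeightedIndexer(HTMLParser):
    """Accumulates a per-word score for a single HTML document."""

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def _add(self, text, field):
        for word in WORD.findall(text.lower()):
            self.scores[word] += WEIGHTS[field]

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self._add(attrs.get("content") or "", "meta")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and stylesheet text
        self._add(data, "title" if self._in_title else "body")


def index_document(html):
    """Return a Counter mapping each word to its weighted score."""
    parser = WeightedIndexer()
    parser.feed(html)
    parser.close()
    return parser.scores


if __name__ == "__main__":
    doc = (
        "<html><head><title>Web Robots</title>"
        '<meta name="keywords" content="robots, crawler"></head>'
        "<body><p>Robots visit pages.</p></body></html>"
    )
    for word, score in index_document(doc).most_common():
        print(word, score)
```

A robot that only indexes titles would simply set the other weights to zero; one that indexes every word equally would set them all to the same value.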

How do I register my page with a robot?

You guessed it, it depends on the service :-) Many services have a link to a URL submission form on their search page, or have more information in their help pages. For example, Google has Information for Webmasters.
