Wednesday, March 23, 2011

SharePoint: Incremental Crawl vs. Full Crawl

Full Crawl: A “full crawl” crawls the entire content under a content source. What that covers depends on two settings specified when the content source is created: “Content Source type” and “Crawl Settings”.

Incremental Crawl: An “incremental crawl” crawls only the content that has been added or modified since the last successful crawl.
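
To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not how the SharePoint crawler is implemented: the Item class, the select_for_crawl function, and the example URLs are all made up for illustration, and it assumes each item carries a simple "modified" timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for a document under a content source."""
    url: str
    modified: datetime  # last time the item was added or changed


def select_for_crawl(items: List[Item], crawl_type: str,
                     last_successful_crawl: Optional[datetime]) -> List[Item]:
    """Return the items a crawl of the given type would visit (toy model)."""
    if crawl_type not in ("full", "incremental"):
        raise ValueError(f"unknown crawl type: {crawl_type}")
    if crawl_type == "full" or last_successful_crawl is None:
        # A full crawl visits everything. Assumption: with no previous
        # successful crawl there is nothing to diff against, so all
        # items are visited as well.
        return list(items)
    # Incremental: only items added or modified since the last successful crawl.
    return [i for i in items if i.modified > last_successful_crawl]


items = [
    Item("http://example/docs/a.docx", datetime(2011, 3, 1)),
    Item("http://example/docs/b.docx", datetime(2011, 3, 20)),
    Item("http://example/docs/c.docx", datetime(2011, 3, 22)),
]
last_crawl = datetime(2011, 3, 15)

full = select_for_crawl(items, "full", last_crawl)
incremental = select_for_crawl(items, "incremental", last_crawl)
print(len(full), [i.url for i in incremental])
```

With the sample data, the full crawl visits all three items, while the incremental crawl visits only the two modified after March 15.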

Why do we need Incremental Crawl?
A “full crawl” covers every piece of content under a content source, but we still need incremental crawls because they process only the content that has been added or modified since the last successful crawl.
Full crawls take more time and resources to complete than incremental crawls. Consider the following facts before choosing a full crawl over an incremental crawl:


1. Compared with incremental crawls, full crawls consume more memory and CPU cycles on the index server.
2. Full crawls consume more memory and CPU cycles on the Web Front End servers when crawling content in your farm.
3. Full crawls use more network bandwidth than incremental crawls.

Crawling puts an overhead on resources. If content has already been crawled and indexed, why crawl it again? An incremental crawl handles this case by picking up only the content added or modified since the last successful crawl.
There are some scenarios where an incremental crawl is not enough and you need to run a full crawl.

Why do we need Full Crawl?

1. Software updates or service packs have been installed on servers in the farm.
2. An SSP administrator has added a new managed property.
3. Crawl rules have been added, deleted, or modified.
4. A corrupted index needs to be repaired. In this case, the system may attempt a full crawl on its own (depending on the severity of the corruption).
5. A full crawl of the site has never been done.
6. Security changes made on a file share after its last full crawl need to be detected.
7. Incremental crawls are failing consecutively. In rare cases, if an incremental crawl fails one hundred consecutive times at any level in a repository, the index server removes the affected content from the index.
8. ASPX pages on Windows SharePoint Services 3.0 or Office SharePoint Server 2007 (MOSS) sites need to be reindexed. The crawler cannot detect when ASPX pages on these sites have changed. Because of this, incremental crawls do not reindex views or home pages when individual list items are deleted.
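
The conditions in the list above can be read as a checklist. The following Python sketch encodes them as one; it is only an illustration and does not call any SharePoint API. The FarmState record, its field names, and the failure_threshold parameter are all hypothetical. In particular, the source only says incremental crawls are "failing consecutively", so the default threshold of 2 is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FarmState:
    """Hypothetical snapshot of facts an administrator would check by hand."""
    updates_installed: bool = False            # software updates / service packs
    managed_property_added: bool = False
    crawl_rules_changed: bool = False          # added, deleted, or modified
    index_corrupted: bool = False
    full_crawl_ever_done: bool = True
    file_share_security_changed: bool = False  # since the last full crawl
    consecutive_incremental_failures: int = 0
    aspx_pages_need_reindex: bool = False


def reasons_for_full_crawl(state: FarmState,
                           failure_threshold: int = 2) -> List[str]:
    """Return the listed reasons that apply; an empty list means none do."""
    checks = [
        (state.updates_installed, "software updates or service packs installed"),
        (state.managed_property_added, "new managed property added"),
        (state.crawl_rules_changed, "crawl rules changed"),
        (state.index_corrupted, "index is corrupted"),
        (not state.full_crawl_ever_done, "no full crawl has ever been done"),
        (state.file_share_security_changed, "file share security changed"),
        # failure_threshold is an assumed cut-off, not a documented value.
        (state.consecutive_incremental_failures >= failure_threshold,
         "incremental crawls failing consecutively"),
        (state.aspx_pages_need_reindex, "ASPX pages need reindexing"),
    ]
    return [reason for applies, reason in checks if applies]


state = FarmState(crawl_rules_changed=True, consecutive_incremental_failures=5)
reasons = reasons_for_full_crawl(state)
print(reasons)
```

If the returned list is empty, none of the listed reasons apply and an incremental crawl should be enough.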

The system does a full crawl even when an incremental crawl is requested under the following circumstances:
· A shared services administrator stopped the previous crawl.
· A content database was restored. This applies to MOSS and Windows SharePoint Services 3.0 content databases only.
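
The two circumstances above amount to a small override rule. Here is a hedged Python sketch of it; the function and parameter names are invented for illustration and this is not the actual SharePoint logic.

```python
def effective_crawl_type(requested: str,
                         previous_crawl_stopped: bool = False,
                         content_db_restored: bool = False) -> str:
    """Return the crawl type that actually runs (illustrative model only)."""
    if requested not in ("full", "incremental"):
        raise ValueError(f"unknown crawl type: {requested}")
    # An incremental request is promoted to a full crawl if the previous
    # crawl was stopped or a content database was restored.
    if requested == "incremental" and (previous_crawl_stopped or content_db_restored):
        return "full"
    return requested


print(effective_crawl_type("incremental"))
print(effective_crawl_type("incremental", previous_crawl_stopped=True))
print(effective_crawl_type("incremental", content_db_restored=True))
```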


Note: Avoid pausing content source crawls frequently, or pausing several content source crawls at once, because every paused crawl consumes memory on the index server.
