Friday, February 19, 2010

SharePoint 2007 Search Administration: Scheduling and the UI

Jeremy Williams, Sr. Director, Modern Workplace

As an SSP administrator, one can control and tweak the many aspects of SharePoint Search.  One piece of the puzzle an admin can configure is a content source.  For the uninitiated, a content source is simply a location where SharePoint Search is going to go out and look for files.  Some examples of content sources could be all of your SharePoint sites, a file share, or even your public website.  When configuring a content source, an administrator must choose how often that source will be crawled. 

There are two types of crawls: a full crawl and an incremental crawl.  While there is an (almost) infinite number of schedules one could set up for a content source, I often see full crawls scheduled relatively infrequently (once a week or month, depending on content size) and incremental crawls scheduled quite frequently (every 5-60 minutes).  Why?  Let’s say a user uploads a file to SharePoint and tells their buddy, “Hey, I just uploaded the document about Client ABC to SharePoint, go check it out!”  Their buddy (since he doesn’t know where the file is located) goes right to his trusty SharePoint search box and searches for “Client ABC”.  If an incremental crawl hasn’t run since the file was uploaded, the file won’t appear in the results.  This in turn makes for frustrated users and an all-around ‘bad’ SharePoint experience [but alas, I digress...]

Back to the point, when scheduling an incremental crawl, here’s a fairly typical starting point (picture below):


Everything seems to make sense in there, except for the last two text boxes.  One way to interpret that section is, “Repeat the incremental crawl every 15 minutes and allow it to run for 1440 minutes”.  Based on this interpretation, one may be led to believe that an incremental crawl could start and run for as long as 1440 minutes (that’s 1 full day).  Well, as an SSP admin, I would hate for there to (potentially) be only 1 crawl each day, so I’m going to change that 1440 to 20.  Great, I’ve got my crawl set up and my users should always be able to find their files (relatively) quickly, right?

Sadly, the answer is no.  That text is actually interpreted as, “Repeat the incremental crawl every 15 minutes, and continue crawling every 15 minutes for 1440 minutes.”  So, in my example above, the text becomes, “Repeat the incremental crawl every 15 minutes, and continue crawling every 15 minutes for 20 minutes.”  This will yield exactly 2 incremental crawls every day: one around midnight, and the other around 12:15 AM.  After that, you’ll have to wait until midnight rolls around again.
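
To make the arithmetic concrete, here is a rough Python sketch of how the two text boxes behave under the interpretation described above.  It is only a model for illustration, not SharePoint code: the function name `crawl_start_times` and its parameters are made up, the midnight start time is an assumption, and I'm also assuming a crawl starts only while the elapsed time is strictly less than the "for" value.

```python
from datetime import datetime, timedelta


def crawl_start_times(start, repeat_every_min, for_min):
    """Model the schedule: start a crawl every `repeat_every_min` minutes,
    but only while fewer than `for_min` minutes have elapsed since `start`.

    Hypothetical helper for illustration; not part of any SharePoint API.
    """
    times = []
    offset = 0
    while offset < for_min:  # assumption: the window boundary is exclusive
        times.append(start + timedelta(minutes=offset))
        offset += repeat_every_min
    return times


# Assumed daily start time of midnight.
midnight = datetime(2010, 2, 19, 0, 0)

default = crawl_start_times(midnight, 15, 1440)   # every 15 min, for 1440 min
shortened = crawl_start_times(midnight, 15, 20)   # every 15 min, for 20 min

print(len(default))                                  # 96
print([t.strftime("%H:%M") for t in shortened])      # ['00:00', '00:15']
```

With the 1440 left alone, the model produces 96 crawl starts spread across the whole day.  Change it to 20 and only the midnight and 12:15 AM starts fit inside the window, which matches the two-crawls-a-day behavior described above.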

For some users, this might have been a “duh” type of post because they interpreted the text and control in the manner Microsoft intended.  However, other users may have incorrectly interpreted the text/controls as my example pointed out…