Tuesday, November 8, 2011

SharePoint 2007 Timer Job Woes

Jeremy Williams, Sr. Director, Modern Workplace

Scenario: You realize one day that you are no longer receiving alerts from SharePoint.  Upon investigation, you notice that you can create a new alert and receive the notification email that an alert has been created for you.  However, you receive no notification emails after the first one is sent for that particular alert.  Upon further investigation, you discover that no timer jobs have been executed in a couple of days.

Solution One: Thinking that it’s the typical case of a dead SharePoint timer job, you go and restart the Windows SharePoint Services Timer service.  After restarting the Windows service and waiting patiently (and impatiently running stsadm -o execadmsvcjobs), you realize that this isn’t the problem, so you continue to …

Solution Two: You decide to attempt clearing the timer job cache directory on the SharePoint server… This is better documented at Joe Rodgers’ blog.  Essentially, you navigate to the All Users/Program Data directories, clear out the XML files in the SharePoint -> Config -> GUID folder, then restart the timer service and wait for it to rebuild (and execute) all of the files.  After going through the same patient waiting, you (sadly) realize that this didn’t fix the issue either.

Sanity Check: Go through all of your service accounts and their related services and make sure that they are:

  • Running
  • Running under the correct Identity

After going through your Sanity Check, try the first two solutions again… just to make sure you aren’t going crazy (after all, it’s possible).
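If you’d rather not click through the Services console for every account, the Sanity Check can be roughed out as a script.  This is only a minimal sketch, and it assumes a few things that aren’t in the write-up above: that Python 3.7+ is available on the server, that the built-in Windows sc.exe tool is on the path, and that your timer service is registered under the usual name SPTimerV3.  Add the other services your farm depends on to the list.

```python
import re
import subprocess

# Assumption: "SPTimerV3" is the usual service name for the Windows SharePoint
# Services Timer on a 2007-era farm. Add the other services your farm uses.
SERVICES = ["SPTimerV3"]


def parse_state(sc_query_output):
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_query_output)
    return match.group(1) if match else None


def parse_account(sc_qc_output):
    """Return the logon account (SERVICE_START_NAME) from `sc qc` output."""
    match = re.search(r"SERVICE_START_NAME\s*:\s*(.+)", sc_qc_output)
    return match.group(1).strip() if match else None


def run_sc(*args):
    """Run sc.exe (Windows only) and return its text output."""
    return subprocess.run(["sc", *args], capture_output=True, text=True).stdout


def report(services=SERVICES):
    """Print the state and logon account for each service in the list."""
    for name in services:
        state = parse_state(run_sc("query", name))
        account = parse_account(run_sc("qc", name))
        print(f"{name}: state={state}, runs as {account}")
```

Call report() from an elevated prompt on the server, then compare each account against what you expect it to be.  Anything that isn’t RUNNING, or that is running under the wrong identity, is worth fixing before you move on.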

Safety-First: Now you’re (probably) starting to get a bit nervous about the overall health of your SharePoint 2007 Server (and depending on how many users are angry at you, you might be nervous about your own health)…  Before proceeding, take a deep breath and MAKE SURE YOU HAVE BACKUPS.  I don’t care if you need to call up your off-site tape storage people and have them make an emergency delivery…get those backups!! [It’ll be the cheapest insurance you’ve ever not-paid-for]

Big Hammer Solution: You’re going to be running the SharePoint Products and Technologies Configuration Wizard… This will (effectively and hopefully) clean up whatever issue you were having in your environment this whole time.  Remember that running this will take SharePoint offline in a BIG way (for a minimum of 5 minutes, and with no conceivable upper bound).  Once you’re ready, go ahead and launch the application from the Start menu…

…Hiccup 1:  You’re presented with the following error message: “unable to upgrade SharePoint products and technologies because an upgrade is already in progress”.  Of course, that’s quite annoying, but don’t worry, the fix is relatively straightforward. [Note: my steps below are modified from those presented here: http://blogs.technet.com/b/manjesh/archive/2009/11/10/unable-to-run-sharepoint-configuration-wizard-error-unable-to-upgrade-sharepoint-products-and-technologies-because-an-upgrade-is-already-in-progress.aspx]

  1. Stop the Windows SharePoint Services Timer service.
  2. On the SharePoint server that hosts the Central Administration site, browse to C:\Documents and Settings\All Users\Application Data\Microsoft\SharePoint\Config\<guid>
  3. Move all the XML files to another location.
  4. Back up the cache.ini file as well, then edit cache.ini and replace the current value with "1".
  5. Restart the Windows SharePoint Services Timer service. New xml files will start appearing in the guid folder.
  6. Verify the cache.ini now contains its previous value (or verify that the value is no longer "1"; any non-1 value is fine).
  7. On the SharePoint server, run the command "stsadm -o setproperty -pn command-line-upgrade-running -pv no" (without quotes).
  8. On the SharePoint server, run the command "stsadm -o execadmsvcjobs" (without quotes).
  9. Open up Central Administration and Navigate to Timer Job Status (under Operations).  Look for a One-Time job of ‘Upgrade Job’.  If you find it, delete it.
  10. Run the SharePoint Products and Technologies Configuration Wizard again.
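The file shuffling in steps 2 through 4 is easy to fumble when you’re stressed, so here is one way it could be scripted.  Treat it as a sketch rather than a supported tool: it assumes Python 3 is available on the server, the function name reset_config_cache is made up for this post, and you still have to stop the timer service yourself (step 1) before running it.

```python
import shutil
from pathlib import Path


def reset_config_cache(cache_dir, backup_dir):
    """Move *.xml out of cache_dir, back up cache.ini, then set it to "1".

    Returns the number of XML files moved. Stop the timer service first.
    """
    cache_dir, backup_dir = Path(cache_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Step 3: move every XML file out of the <guid> folder.
    moved = 0
    for xml_file in list(cache_dir.glob("*.xml")):
        shutil.move(str(xml_file), str(backup_dir / xml_file.name))
        moved += 1

    # Step 4: keep a copy of cache.ini, then replace its value with "1".
    ini = cache_dir / "cache.ini"
    shutil.copy2(str(ini), str(backup_dir / "cache.ini"))
    ini.write_text("1")
    return moved
```

Pass it the <guid> folder from step 2 and any backup folder you like (for example, a hypothetical D:\cache-backup), then carry on with step 5.  Nothing is deleted, so the old XML files and the original cache.ini value are still sitting in the backup folder if you need them.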

Big Hammer Solution (cont’d): Alright, so now your SharePoint server is running through its paces on the upgrade/configuration…  Step 8 of the wizard (not step 8 in the list above) will probably be the longest step in the world… so go grab some coffee, relax, etc…  If it seems that it’s hanging on Step 8, you can run the command shown after this paragraph from the command prompt.  It will effectively cancel your previous upgrade efforts and restart everything.  Once you execute this command, it’s best to step away and stop looking at the system… You’ll only serve to drive yourself a bit mad.  After what could very well be a few hours, the job will complete and your farm should (once again) be in a sustainable and running state.

psconfig -cmd upgrade -inplace b2b -wait -force
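Since this run can take hours, it may help to launch it from a small wrapper that writes everything psconfig prints to a log file, so you really can step away.  The sketch below assumes Python 3 on the server and assumes psconfig.exe lives in the default “12 hive” BIN folder; adjust BIN_DIR if your install differs.  The helper names and the log path are made up for illustration.

```python
import subprocess
import time
from pathlib import PureWindowsPath

# Assumption: default install location of the "12 hive" BIN folder.
BIN_DIR = r"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN"

# The same arguments as the command shown in this post.
PSCONFIG_ARGS = ["-cmd", "upgrade", "-inplace", "b2b", "-wait", "-force"]


def psconfig_command(bin_dir=BIN_DIR):
    """Build the full psconfig command line as a list of arguments."""
    return [str(PureWindowsPath(bin_dir) / "psconfig.exe")] + PSCONFIG_ARGS


def run_logged(command, log_path):
    """Run command, write its output to log_path, return (exit_code, seconds)."""
    started = time.monotonic()
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode, time.monotonic() - started
```

On the server you would call something like run_logged(psconfig_command(), r"C:\Temp\psconfig.log") from an elevated prompt, then check the exit code and the log when you come back from that coffee.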

Validating it all worked: Go ahead and check your timer service to make sure that it’s executing jobs now… Assuming that it is, you can go ahead and make sure that your alerts are working again too.

Phew!  I’m glad that’s over… Now what the FLIP happened?!?!?!  In the particular case that I’m walking through above, there were a couple of factors at play…  There was a SQL Server failure, service account identity changes, and a previous upgrade (service pack/cumulative update) that appeared never to have completed…  By running the SharePoint Products and Technologies Configuration Wizard (psconfig), we allowed SharePoint to recompile (read: recompose) itself a bit, and also to execute the upgrade job properly.  Once this job was able to complete, the timer service was able to execute jobs as designed.

And yes, I realize that I didn’t actually answer what happened here, since I’m not terribly sure of the actual root cause.  A service pack/CU package may have been halted midway, or perhaps SQL went offline during a patching window, or perhaps the gremlins that run SharePoint were just a bit angry on the particular day this all went down.  However, the solution above should help anyone in a similar situation, since the problems encountered here are fairly serious ones for any SharePoint farm…