Aw man, it's been a while! My latest project is in full swing, placing me in that doldrum-ish phase when all the new/fun stuff has been figured out and I'm left with coding through the minutiae of the requirements. Between that, playing with my new Windows Phone 7 (which is simply exquisite), and working on a pet project on the side, I haven't had much time to write.
As I attack my blog backlog, I'll start with one of the more interesting discoveries, made not solely by me, but with the guidance and wisdom of my handsome project manager, Matt Schaub.
We got a call from our client that they were seeing random errors on the site. Poring over the ULS logs, we saw not only crazy COM exceptions in general, but specifically, exceptions in corners of our code base that had NEVER blown up. Imagine my panic: this is New Balance's intranet! My baby! Over the months of building and testing, we had scrupulously beaten the application into submission at every possible breaking point. There was no way some new logical unhappy path had been discovered that eluded our QA efforts and suddenly brought the site to its digital knees!
My contribution to this resolution was noticing that all of the exceptions (and like I said, these were random COM exceptions: different "HResult" codes and meaningless stack traces) traced back to code that made updates through the API. It was Matt who suggested that I take a look at the content database. As a good SharePoint architect, I knew to pretend that the content database didn't exist; I never thought to look anywhere near the database server.
But out of options, I obliged him, as you should always do when your PM adoringly offers technical advice. I logged in, and immediately saw that the hard drive on which SQL Server stored its data had only a few scarce kilobytes free; the transaction logs had bloated and gone into a post-Thanksgiving-like food coma.
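In hindsight, a dead-simple disk-space check would have caught this long before the random COM exceptions started. As a quick sketch (the threshold and path are hypothetical, not our actual monitoring setup), something like this could run on a schedule against the SQL data drive:

```python
import shutil

def check_free_space(path, min_free_mb=1024):
    """Return (free_mb, ok) for the drive containing `path`.

    `min_free_mb` is an assumed alert threshold; transaction logs can
    quietly eat an entire drive before anything else complains.
    """
    usage = shutil.disk_usage(path)
    free_mb = usage.free // (1024 * 1024)
    return free_mb, free_mb >= min_free_mb

# Example: warn when the data drive is nearly full
free_mb, ok = check_free_space(".")
if not ok:
    print(f"Low disk space: only {free_mb} MB free")
```

Nothing fancy, but a nightly run of a check like this (pointed at the drive holding the `.mdf`/`.ldf` files) would have turned a site outage into a routine ticket.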
Fortunately, SharePoint is smart enough to understand such a situation and put itself into "read-only" mode, keeping the site up and running, but disallowing updates (or anything that could potentially increase the size of the database any further). The behavior is the same as if you had explicitly marked the database as read-only via SQL Server Management Studio.
So we called New Balance back, and set their DBAs loose on the problem. They truncated the transaction log (turns out the My Sites web application was the culprit, not our app), and configured some other settings to disallow any further unchecked expansion. As soon as the hard drive was freed up, the database was happy, and so was our site.
This was one of those things that was amazingly absent from Bing; how has no one beaten me to this blog topic? Either what we were doing was so heinous that it's embarrassing to share with the world, or I simply wasn't searching well enough. Either way, I hope this saves you from the panic that ensued for us!