Anatomically speaking, Encoding is the heart of SUESS. (To stretch the metaphor, I guess that makes Silverlight the head, SmoothStreaming the eyes and ears, and Uploading the digestive system.) Encoding, implemented via Expression Encoder, is what makes the process of media uploading feasible for end users.
By "feasible" I really mean "usable" - people want online functionality to be really easy to understand. Usability 101 tells us that the more hoops people have to jump through, the less inclined they will be to use your site, regardless of how pretty the UI is. The hoops here are presented by the fact that Silverlight does not support every media format under the sun. We can't simply upload a random file and then watch it; Silverlight has to "know" how to play it.
Do you think if people could only upload Flash media to YouTube it would have been nearly as popular? Of course not. I'm surprised that abilities like uploading media and capturing live web cam feeds are exoteric enough for the general Internet-goer. But encoding a movie or image? Not only is it an extra hoop, but it's simply too much to expect from a user.
By automating the encoding process, we can enable people to provide IIS SmoothStreaming media to their website's audience with just as much effort as it would take to upload an image. SmoothStreaming itself is the other side of the coin: we need to encode anyway if we want to take advantage of this new media streaming technology! So by putting in all this effort to solve these intrinsic media usability problems (large files, wide arrays of formats, etc.) we are rewarded by having the otherwise heavily-guarded SmoothStreaming gate automatically unlocked for us! (See the final post on SUESS for more information about IIS SmoothStreaming; we still need to open the gate, of course!)
Those are the two reasons why we encode. The next question, which leads us into our architectural discussion, is of course: how? I went with Microsoft Expression Encoder 3.0 (4.0 was still in beta during development). Here are some of the determining factors that led me to this decision:
- .NET API that exposes progress events
- Much wider file support than native Silverlight
- Explicitly supports encoding to the IIS SmoothStreaming format
Also, I didn't feel like dealing with the Media Player SDK, buying and kludging a third-party control, or reverting to a "Web 1.0" paradigm of an upload screen that says something like "Thank you for uploading your media. Please try back later to see it when it's done because we couldn't figure out a more elegant way to implement our site's media administration."
I wanted this to be first class.
Let's start by first taking a look at the detailed architecture of the Encoder. Back to everyone's favorite Visio diagram:
Basically, there are three major sub components running on the Media Server tier of SUESS. (And boy am I ever excited to have built something complex enough to have "sub components!") The first one is the Media Service WCF endpoint. This is a standard WCF service that the Silverlight client calls, once it has completed the Uploading phase of SUESS, to kick off the encoding process.
The interface for the Media Service is very simple, containing only two methods: Encode and Cancel. I tried to choose really intuitive names here; "Encode" performs the encoding, and "Cancel" cancels it. Encode wraps the Expression APIs and does all the work needed to get our media into the IIS SmoothStreaming format.
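To make the shape of that interface concrete, here is a minimal sketch of what a two-method WCF contract could look like. Only the method names Encode and Cancel come from the description above; the interface name, the parameter, and the return types are my assumptions for illustration, and the callback side used for progress reporting is left out entirely.

```csharp
using System.ServiceModel;

// Hypothetical contract: the name IMediaService and the signatures below
// are assumptions; only "Encode" and "Cancel" come from the post.
[ServiceContract]
public interface IMediaService
{
    // Assumed to take the name of the file the Uploader just wrote
    // to the temporary upload path.
    [OperationContract]
    void Encode(string fileName);

    // Assumed to stop whatever encode is currently running.
    [OperationContract]
    void Cancel();
}
```

Keeping the surface this small means the Silverlight client only ever needs to know a file name; everything else lives in configuration on the server.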
There is also a web.config file that goes along with the Media Service to store the application settings (along with all the WCF goo). These dictate the size of the thumbnail images, and the locations of both the temporary upload path for our raw media files and the destination for the IIS SmoothStreaming-formatted files. These settings, combined with the file name passed in from the Uploader, are all the Encoder needs to do its thing.
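As a rough illustration of how those settings might be pulled together, the sketch below reads four values from an appSettings-style collection and combines the upload path with the incoming file name. The key names (ThumbnailWidth, ThumbnailHeight, UploadPath, OutputPath) and the EncoderSettings class are hypothetical, not the actual SUESS configuration. In a real service you would pass in ConfigurationManager.AppSettings, which is the same NameValueCollection type.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;
using System.IO;

// Hypothetical settings holder; class and key names are assumptions.
public sealed class EncoderSettings
{
    public int ThumbnailWidth { get; private set; }
    public int ThumbnailHeight { get; private set; }
    public string UploadPath { get; private set; }
    public string OutputPath { get; private set; }

    // Builds the settings from an appSettings-style collection and
    // fails fast if a key is missing.
    public static EncoderSettings FromAppSettings(NameValueCollection appSettings)
    {
        return new EncoderSettings
        {
            ThumbnailWidth = int.Parse(Require(appSettings, "ThumbnailWidth"), CultureInfo.InvariantCulture),
            ThumbnailHeight = int.Parse(Require(appSettings, "ThumbnailHeight"), CultureInfo.InvariantCulture),
            UploadPath = Require(appSettings, "UploadPath"),
            OutputPath = Require(appSettings, "OutputPath")
        };
    }

    // Resolves the raw media file for the name handed over by the Uploader.
    // Path.GetFileName drops any directory part of the incoming name.
    public string SourceFileFor(string fileName)
    {
        return Path.Combine(UploadPath, Path.GetFileName(fileName));
    }

    private static string Require(NameValueCollection settings, string key)
    {
        string value = settings[key];
        if (string.IsNullOrEmpty(value))
            throw new InvalidOperationException("Missing appSetting: " + key);
        return value;
    }
}
```

Failing fast on a missing key is a design choice on my part: a misconfigured path is much easier to diagnose at startup than halfway through an encode.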
There is one more architectural consideration to point out before we jump into code. The Expression Encoder APIs are 32-bit only. This is generally not an issue if you are hanging out in 2007 or earlier, but starting with Windows Server 2008, 64-bit has become the de facto standard in making developers crazy. That's not to say that 32-bit applications won't work; Windows handles all of that "WOW" stuff for us.
However, when you try to have a 64-bit process (such as a Windows Service, or an ASP.NET app like SharePoint) call into the 32-bit Encoder DLLs, things die. If your first instinct (like mine) was to compile the Media Service using Visual Studio's "x86" or "Any CPU" configurations, you won't get there. The problem isn't that one piece of code can't talk to another; it's that everything loaded into a process has to share that process's architecture.
So to circumvent this, we need to be running our entire stack in 32-bit mode. Whether it's a Windows Service set explicitly to compile to x86 or an IIS application pool on a 64-bit server that has 32-bit support enabled, since Encoder requires us to be in 32-bit world, we need a 32-bit process. The clear choice here is to use IIS and flip one setting (App Pool -> Advanced Settings -> Enable 32-Bit Applications -> True) versus going through all the hell of building, deploying, and configuring a Windows Service.
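If you want the service to complain loudly when it ends up hosted in the wrong kind of process, a small guard like the one below can help. This is my own sketch, not part of SUESS; the class and method names are made up. It relies only on IntPtr.Size, which is 4 in a 32-bit process and 8 in a 64-bit one.

```csharp
using System;

// Hypothetical helper; not part of the original service.
public static class ProcessBitness
{
    // IntPtr.Size is 4 bytes in a 32-bit process and 8 in a 64-bit one.
    // (On .NET 4 and later, Environment.Is64BitProcess gives the same answer.)
    public static bool Is32BitProcess
    {
        get { return IntPtr.Size == 4; }
    }

    // Call this before touching any 32-bit-only assembly so the failure
    // is a readable message rather than an image-load error.
    public static void EnsureCanLoad32BitLibraries()
    {
        if (!Is32BitProcess)
        {
            throw new PlatformNotSupportedException(
                "This service must run in a 32-bit process. " +
                "In IIS, set 'Enable 32-Bit Applications' to True on the app pool.");
        }
    }
}
```

Calling the guard at the top of the Encode operation turns a confusing load failure into an error message that tells you exactly which setting to flip.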
Windows Services suck.
The second sub component is the Encoder API itself. Let's look at the Encode method first to get an idea of everything that's going on programmatically. Then, we're going to jump back to the WCF side of things and discuss the bi-directional communication (awesome).
The main workhorse of the Expression Encoder API is the Job object. I couldn't find any formal documentation on the API, so I sort of reverse engineered it by mapping class names in Visual Studio to menu options in the Encoder product. Whenever you encode something using the application, you create a new "Job" and set a bunch of options on it. This is pretty much the same way I learned SharePoint many years ago; the Encoder API, although undocumented, is more intuitive than SharePoint's, so I was able to hack my way through it a little easier.
So let's take a look. Line #5 is the WCF bi-directional stuff, which we'll look at in a bit. Line #9 instantiates our Job object. It's global in the service because, skipping down to Lines #75-77, where we hook the events, we need to refer back to properties on the Job object. Again, we'll look at the event handlers themselves when we move back from the Encoder to the WCF/Silverlight communication logic. Lines #10-14 round out the Job initialization code, calling helper methods that simply grab values from the web.config file.
Starting at Line #16, we get into the meat and potatoes of the media side of the Encoder API. Encoder allows you to control a vast array of characteristics of an encoding process. The first one here is the video complexity (which I covered in the Uploader post). Basically, this enumeration dictates an arbitrary ratio between video quality and encoding speed. While this particular property is a bit black