Wednesday, September 29, 2010

Treating Umbraco macro references as data

A few months back, I was working on a website for a client using the Umbraco CMS.  Umbraco is an Open Source, .Net-based CMS (the version released in July (4.5) adds support for .Net 4.0).  The thing I like about building a public-facing site with Umbraco is how much control I retain as a developer – the CMS doesn’t dictate very much at all of what I can do on the front end part of the site, and provides a nice interface for users to enter and manage content.  It also provides a nice facility for me to restrict what kinds of content/formatting users can provide – if I don’t allow the option to put custom styles, they’re stripped out.  If I don’t allow anything beyond bold/italic/underline, I don’t have to worry about users (accidentally or intentionally) trying to do more, like adding H1 tags or changing font faces or sizes.  In addition, there are a large number of high-quality packages available from the community to add additional functionality to the product.  In each version, it seems the features of several of these community packages find their way into the main product.

Umbraco’s power and flexibility come from two places.  The first is the use of ASP.Net masterpages for templates (including support for nested masterpages).  This means that when I need a template that is a slight customization of another template, I just need to make sure the parent template has the needed placeholders, and override at will in the child.  The second is the ability to create macros (using XSLT or .Net usercontrols, but we’ve mostly been using XSLT) to render content from the tree in new and interesting ways.  In an Umbraco XSLT macro, the entire content tree is just an XPath query away.

In addition to being referenced directly from a template/masterpage, a macro can also be dropped into a content block (assuming the field allows it) and configured by the user.  However, the only out-of-the-box way to use that user-supplied macro is to render it.  On this project, we needed to extract some information from the macro references for use in another template.  In this case, it was to extract information from a video player macro for use in a video sitemap, but the principle applies anytime you’d like to be able to run XPath queries against user-supplied macro references.

The first question is why – why not just create a separate content item for the video in the tree, have the macro reference that, and then use the content item to build the sitemap?  The answer to that is very simple – I’m trying to make it easier on the user.  For the video sitemap, I have to know not only all the metadata about the video, but also where it appears on the site.  If I have a content folder for videos, each video has to have a reference to the page it’s being shown on, and I have to rely on the user to maintain that.  If I create each video content item underneath the page it lives on, I solve that problem, but I make it more complicated for the user to select the correct video item when dropping the macro in, and still have room for the user to make a mistake.  We thought of going one of these ways, but it turned out that the ‘macro as data’ approach was reasonably easy to do, and ran *much* faster than I expected, so there was no reason to complicate things further.

Now, let’s dive into what is needed to make this work.  First, let’s look at what a macro reference looks like in a data node.  (Note: I’m using the 4.0 XML schema here.  4.5 introduces a new XML schema in which each property is a child element named after its alias, but it shouldn’t be much different in practice: change ./data[@alias='X'] to ./X and the same approach should still work.)

      <node id="3723">
        <data alias="bodyTextLeft"><![CDATA[
<p>Pellentesque id vate.</p>
<?UMBRACO_MACRO macroAlias="VideoPlayer"
tags="Suspendisse,Phasellus" description="Duis vel enim sed
nisi mollis congue sed nec arcu." title="Sed in nunc ligula." />
<p>Integer elit massa, ultricies a varius id.</p>
]]></data>
      </node>

As you can see, an Umbraco macro reference is basically a <?UMBRACO_MACRO /> tag inline in the content block.  All we have to do is parse it out of the content, parse the properties, and we’re golden.  XSLT is a text-processing engine, right?  How hard can it be?

Yes, XSLT is a text parsing engine (of sorts), but the lack of regular expressions and advanced string manipulation (since the .Net framework still doesn’t support XSLT 2.0) means this code is much easier to write in .Net.  Luckily, Umbraco supports writing custom XSLT extensions that are callable from your XSLT macros.  Therefore, this will take two parts: an XSLT template that pre-filters our properties, calls the extension, and processes the results, and a C# XSLT extension that parses the property value.

Let’s start with the C# XSLT extension.  This is going to be passed a string that is the contents of the <data/> node in the document I showed above.

using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.XPath;

/// <summary>
/// XSLT extensions, referenced from /config/xsltExtensions.config:
///   <ext assembly="/bin/MyAssembly" type="MyTypeName" alias="MyExtensions"/>
/// </summary>
public class MyExtensions
{
    // Matches one macro reference; group 1 captures its attribute list.
    // The '?' is escaped so it is read literally rather than as a quantifier.
    private static readonly Regex MacroExpression = new Regex("<\\?umbraco_macro\\s+(.*?)/>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
    // Matches a single name="value" pair inside the attribute list.
    private static readonly Regex PropertyExpression = new Regex("(\\S+?)=\"(.*?)\"",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public XPathNodeIterator DecodeMacroReferences(string input)
    {
        var doc = new XDocument();
        var macros = new XElement("macros");
        foreach (Match match in MacroExpression.Matches(input))
        {
            // One <macro> element per reference, one attribute per property.
            var macro = new XElement("macro");
            foreach (Match propMatch in PropertyExpression.Matches(match.Groups[1].Value))
                macro.Add(new XAttribute(propMatch.Groups[1].Value, propMatch.Groups[2].Value));
            macros.Add(macro);
        }
        doc.Add(macros);
        return doc.CreateNavigator().Select("/");
    }
}

Basically, this code runs a regular expression to parse out all the macro references, another to parse the properties out from that, and builds an XDocument with all the information.  To use this, I have to drop my assembly in the /bin directory of my site, and add it to /config/xsltExtensions.config (format is in the comment on the class).  Once I’ve done that, I’m ready to call it from my XSLT.
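To see what the two regular expressions pull out, here is a minimal console sketch that runs outside Umbraco. It is not the extension itself: DecodeDemo and Decode are hypothetical names, the patterns are a condensed copy with the leading <? escaped, and the content block is a shortened, single-line version of the sample node.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class DecodeDemo
{
    // Condensed copies of the two patterns so this sketch compiles on its own.
    private static readonly Regex Macro = new Regex("<\\?umbraco_macro\\s+(.*?)/>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
    private static readonly Regex Property = new Regex("(\\S+?)=\"(.*?)\"",
        RegexOptions.Singleline);

    // Returns <macros> with one <macro> child per macro reference found in input.
    public static XElement Decode(string input)
    {
        var macros = new XElement("macros");
        foreach (Match m in Macro.Matches(input))
        {
            var macro = new XElement("macro");
            foreach (Match p in Property.Matches(m.Groups[1].Value))
                macro.Add(new XAttribute(p.Groups[1].Value, p.Groups[2].Value));
            macros.Add(macro);
        }
        return macros;
    }

    public static void Main()
    {
        // Hypothetical content block, shortened from the sample node.
        string body = "<p>Pellentesque id vate.</p>\n" +
            "<?UMBRACO_MACRO macroAlias=\"VideoPlayer\" tags=\"Suspendisse,Phasellus\" " +
            "title=\"Sed in nunc ligula.\" />\n" +
            "<p>Integer elit massa.</p>";

        // Each macro reference becomes an element; each property becomes an attribute.
        foreach (XElement macro in Decode(body).Elements("macro"))
            Console.WriteLine("{0}: {1}",
                (string)macro.Attribute("macroAlias"), (string)macro.Attribute("title"));
    }
}
```

One thing this sketch does not handle: a macro reference that repeats a property name would make XAttribute throw, so a production version may want to guard against duplicates.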

First, I have to add the setup information for my namespace to the <xsl:stylesheet/> tag:

<xsl:stylesheet xmlns:MyExtensions="urn:MyExtensions" exclude-result-prefixes="MyExtensions">

Now, I can call MyExtensions:DecodeMacroReferences from my XSLT.

<xsl:variable name="macros-output">
  <xsl:for-each select="./data[contains(string(.), 'UMBRACO_MACRO') and contains(string(.), 'VideoPlayer')]">
    <xsl:copy-of select="MyExtensions:DecodeMacroReferences(string(.))"/>
  </xsl:for-each>
</xsl:variable>
<xsl:variable name="macros" select="msxml:node-set($macros-output)//macro"/>

I'm doing a little pre-filtering before calling my extension method, as an easy way to minimize the number of times the extension gets called. The original code here was inside a for-each element that traversed the entire document tree (so . was a node element). To use it in a different context, adjust the './data' path so that it is relative to your own context node.

Now that I have the macro references as a node-set (by running it through msxml:node-set), I can query it and use XPath to traverse it just like it was any other input document. This means that I can 'see through' macro references my users put into their pages and do something other than render the macro. In my case, that means I can build a video sitemap with the users having to do nothing more than drop the video player where they want it, but the same technique is usable anywhere else you need to treat macros as data.
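The same idea can be tried outside of XSLT. The sketch below is only an illustration: it hard-codes a hypothetical decoded result in the macros/macro shape the extension builds (the Gallery entry and the QueryDemo name are made up for the example) and runs an ordinary XPath query over it.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class QueryDemo
{
    public static void Main()
    {
        // Hypothetical decoded output: one <macro> per reference, properties as attributes.
        var doc = XDocument.Parse(
            "<macros>" +
            "<macro macroAlias=\"VideoPlayer\" tags=\"Suspendisse,Phasellus\" title=\"Sed in nunc ligula.\" />" +
            "<macro macroAlias=\"Gallery\" title=\"Not a video\" />" +
            "</macros>");

        // Plain XPath: keep only the video player references.
        foreach (XElement video in doc.XPathSelectElements("//macro[@macroAlias='VideoPlayer']"))
        {
            Console.WriteLine((string)video.Attribute("title"));
            // The tags property is a comma-separated list, so split it for per-tag output.
            foreach (string tag in ((string)video.Attribute("tags")).Split(','))
                Console.WriteLine("  tag: " + tag);
        }
    }
}
```

In the XSLT macro, the equivalent would be a for-each over $macros with the same kind of attribute predicate.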

XML+XSLT+C# can go pretty far. Just another reason why I like Umbraco as a CMS.