The following logic flow seems to come up a lot in social media applications whenever there is some type of event aggregation (i.e. tracking page views, calculating popularity, counting "likes" or "comments," etc.).
If a record exists, increment a counter by one; otherwise, insert the record with a value of one.
On the surface, this isn't very difficult: depending on the technology you're dealing with, you're going to make some kind of query to see if the record exits, and a second to either insert or increment. For applications with a standard SQL backend, this logic can be wrapped up in a stored procedure. If you're using Entity Framework 4.0 (EF4), you can do it all in .NET code with Linq syntax. I've even implemented this logic using a SharePoint SPList as my data store. (Which...looking back...yikes...yeah...don't do that.)
However, irrespective to the implementation, things are not as easy as they seem. This logic flow can easily lend itself to a race condition, whereby one request is incrementing the counter at the same time another is querying it. In the grand scheme of things (of which nothing today is on a grander scheme than social media), dirty reads won't be the end of the world. If one article has 50,000 views and another has 20,000, the data analytics gathered will tell the marking team the same story as if one had 50,001 and the other had 19,999 if race conditions weren't alleviated.
But what if we are building a financial application that leaves absolutely no room for such imprecision? We need a way to ensure that these counters can be reliably incremented and reported against. Thankfully, the good folks on the SQL Server 2008 team have given us a statement that wraps all of this logic up, while at the same time maintaining the database locks necessary to ensure that the race conditions will run away.
This is the MERGE statement.
What I really wanted to talk about in this post was a little issue I had with merge. If you look around online, you'll see several examples where merge is used to do exactly what it was designed to do: literally merge two data sets together, with clauses to instruct SQL Server on what to do in the following cases (parentheticals with quoted text use the merge statement-specific jargon):
- Incoming data (the "source") is not found in existing data (the "target") [usually an insert is performed]
- Source data is found ("matched") in the target [usually an update is performed]
- Target data is not found in the source [optional, but usually a delete is performed when implemented]
My issue was that, well, it didn't work! I had a table with two integer columns: an Id (that was a foreign key into another table) and the actual counter. I created a proc that took in an int Id, and wrapped the logic to either insert or update the corresponding counter in a merge statement. I screwed with it for hours, but couldn't get the syntax right. I was getting annoyed because I kept looking at different examples, and they all seemed straightforward; my code, however, refused to merge!
I was getting weird outcomes. Some attempts of mine would increment every row's counter. Others always inserted new rows for the same Ids. Whenever I was sure that I had the syntax right, inserts would work, but I'd get a SQL error that multiple rows were matched on an update. What was going on? I felt like the examples were mocking me in some way.
And then, after adventuring into the fourth...even fifth page of Bing search results, I had finally grazed through enough examples to discern my situation form the rest: merge really wants to deal with two TABLES for the source and target, not "loose" data (such as the parameters passed to a stored procedure) as the source and a table as the target.
So I added a transaction to my proc, and within it, before my merge call, created a temp table that had one row (with the Id and counter I passed in). Then I merged the temp table with my real table, and just like all the other examples, everything worked just fine!
So the moral of the story is that since SQL wants to merge tables, then give it tables! Create a temp table that matches the schema of your existing table, and insert your input data into that, and then do your merge as follows:
Line #13 performs a "join" between the source and target, as well as isolates the row corresponding to the project page view we want to update. The INSERT and UPDATE clauses are then relative to this row.