Thursday, May 30, 2013

When a query is not a query

During a recent round of testing, we had a report of some performance issues related to excessive memory consumption. This was all happening during some critical testing with our business users - very not good. Some quick checking eliminated the DB as the problem source and pointed to both CPU and memory usage on the application server, and I needed to dig in.

My first thought was that they were probably related - I couldn't get reliable GC stats from .Net (via perfmon - several w3wp processes were mixing up their stats), but I could tell that GCs were happening, but not collecting much - .Net believed that memory had a purpose. My next step was to figure out *why* it thought it needed that memory. There are several tools that can provide information like this, but sometimes the best place to start is with the basic tools you can get from Microsoft. I installed the Debugging Tools for Windows (via the Windows SDK installer) on our test server, and copied over an adplus configuration file I keep around for just these occasions (attached). The format for these configuration files has changed a bit in the last version or two of the tools, but the basic idea is the same.

The configuration format is documented at <link>, but basically, this file tells ADPlus to launch a debugger session for each w3wp.exe process on the system, each of which does the following:

  1. Load SOS (a debugger extension that adds a number of commands to help you debug a .Net application)
  2. Reload modules, and try to get symbol information (which helps you get better stack traces)
  3. Dump out:
  4. The list of CLR threads
  5. The managed stack for each thread
  6. Statistics about the number and total size of each type of managed object
  7. GC stats
  8. Detach
  9. All of that information is then written out to log files in e:\dumps (specified in the configuration file), where I can analyze it. After looking at the end of the memory stats, I saw this:

    000007feecd27e10    90736      7071712 System.Collections.Generic.HashSet`1+Slot[[System.Data.Objects.EntityEntry, System.Data.Entity]][]
    
    000007feec5a7d78   183028      7321120 System.Data.Query.InternalTrees.VarRefOp
    
    ...
    
    000007feebe40268   215737      8629480 System.Collections.Generic.List`1[[System.Data.Query.InternalTrees.Node, System.Data.Entity]]
    
    000007feec5e8f30    65994     10205712 System.Collections.Generic.Dictionary`2+Entry[[System.Data.Query.InternalTrees.SubTreeId, System.Data.Entity],[System.Data.Query.InternalTrees.SubTreeId, System.Data.Entity]][]
    
    ...
    
    000007fe9cbb70f8   326963     10462816 <>f__AnonymousType9`2[[CertificationTrack],[StudentCertification]]
    
    000007feec57b798   105522     10974288 System.Data.Objects.EntityEntry
    
    ...
    
    000007feec59f8f0   186921     14953680 System.Data.Metadata.Edm.TypeUsage
    
    000007fe9be0d038    93579     14972640 System.Data.Objects.DataClasses.EntityReference`1[[CertificationTrack]]
    
    000007fef743dc90   843401     20241624 System.Int32
    
    000007feec5792d8   325918     20858752 System.Data.Objects.DataClasses.RelationshipNavigation
    
    000007fef743dc30   423051     22590408 System.Int32[]
    
    000007feec5a53d8   425204     30614688 System.Data.EntityKey
    
    000007fef741f1b8   392221     32182808 System.Object[]
    
    000007feecd07328   175224     57172704 System.Data.Objects.StateManagerValue[]

    Notice all of those System.Data-related objects, and things with CertificationTrack or StudentCertification in them? It looks like we have a DbContext alive somewhere that's tracking a lot of data. That seems odd to me, as we usually don't load a large amount of EF objects from our DbContext - if we're loading a lot of data, it's usually not for update, and hence loaded as a projection of just the data we need, and EF doesn't track projections. Next up, the stack traces - I want to figure out what code could possibly be doing this. Here's an excerpt from the only thread that had our application code on it:

    00000000102fd770 000007feec428ad9 (MethodDesc 000007feec040768 +0x99 System.Data.Objects.Internal.LazyLoadBehavior+<>c__DisplayClass7`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].<GetInterceptorDelegate>b__1(System.__Canon, System.__Canon)), calling (MethodDesc 000007feebf710d0 +0 System.Data.Objects.Internal.LazyLoadBehavior.LoadProperty[[System.__Canon, mscorlib]](System.__Canon, System.String, System.String, Boolean, System.Object))
    
    ...
    
    00000000102fd7e0 000007fe9cbc17ae (MethodDesc 000007fe9bcc5978 +0x4e System.Data.Entity.DynamicProxies.CertificationTrack_2343197D59B1BD2A312E6351989B47F5840D68982CB54A56548D9016568042A1.get_StudentCertifications())
    
    ...
    
    00000000102fd970 000007fef5a21b20 (MethodDesc 000007fef57cd9f0 +0x50 System.Linq.Enumerable.ToList[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>)), calling (MethodDesc 000007fef6dacc50 +0 System.Collections.Generic.List`1[[System.__Canon, mscorlib]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>))
    
    00000000102fd9b0 000007fe9cbc07cb (MethodDesc 000007fe9b032468 +0x5fb GrantRecertificationOrchestrator.GrantRecertification(PersonInfo, PersonRecertificationProgressSummary, System.Collections.Generic.List`1<System.Net.Mail.MailMessage>)), calling (MethodDesc 000007fef57cd9f0 +0 System.Linq.Enumerable.ToList[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>))

    Hmm… See that orchestrator method call to ToList calls a property getter, which then calls something with LazyLoad in its name? I wonder if we're accidentally triggering lazy loading somewhere we aren't intending to. Let's jump to the code in the orchestrator method:

        foreach(var toProcess in dataToProcess)
        {
           
    var certId = toProcess.CertId;
           
    var personId = toProcess.PersonId;
           
    var cert = cx.Certifications.FirstOrDefault(x => x.CertificationID == certId);
           
    if (cert.StartOver)
               
    continue;
           
    var currentCert = (from track in cert.CertificationTracks
                               
    from sc in track.StudentCertifications
                               
    where sc.PersonID == personId
                               
    where sc.IsCurrent
                               
    select sc).FirstOrDefault();
           
    // make updates to currentCert
            cx.SaveChanges();
        }

    (the above is simplified to consolidate all the code together) - look at the query that assigns currentCert. That *looks* like a LINQ-to-EF query that will be efficiently evaluated by the database, but look closer. We load the Certification object for the ID as a full .Net object, then run a query starting from cert.CertificationTracks. Since we didn't .Include() that in our request, it's not already loaded, so it has to hit the DB (there's almost always a single track for this model, so that's not the bad part). Then, for each track, we load *all* the StudentCertification objects into the context (some tracks have 200k StudentCertification objects), and *then* filter by PersonID and IsCurrent. By simply changing to something that *is* an actual EF query:

        foreach (var toProcess in dataToProcess)
        {
           
    var certId = toProcess.CertId;
           
    var personId = toProcess.PersonId;
           
    var cert = cx.Certifications.FirstOrDefault(x => x.CertificationID == certId);
           
    if (cert.StartOver)
               
    continue;
           
    var currentCert = (from track in cx.CertificationTracks
                              
    where track.CertificationID == certId
                              
    from sc in track.StudentCertifications
                              
    where sc.PersonID == personId
                              
    where sc.IsCurrent
                              
    select sc).FirstOrDefault();
           
    // make updates to currentCert
            cx.SaveChanges();
        }

    We no longer do all the lazy loading, and get what we expect - an efficient query that runs on the server to give us the current StudentCertification object for a given user and certification.

    So, why did this consume so much CPU time and memory? Object tracking. When loading a full object from the database, the EF DbContext/ObjectContext remembers the object and the original state of it’s properties, so it can do change detection. This tracking isn’t noticable for small numbers of objects, but when you have 200k tracked objects which you repeatedly reload from the database, it’s not exactly fast, or easy on the memory usage.