Optimizing Entity Framework (3): It’s all SQL at the end of the day

So now that we have our integration tests ready to freely experiment and optimize our code, we can easily attach a profiler such as dotTrace to our tests and we’re ready to have informed decisions about our bottlenecks instead of educated guesses.

In the test’s fixture setup, you might have noticed a commented line EntityFrameworkProfiler.Initialize(), this tells EF profiler to start monitoring your EF usage, namely the DbContext instances being created, the SQL being generated and sent to the server and the data returned. It is important to comment that line out when doing doing execution profiling – with dotTrace or any other profiler – otherwise it will mess up your stats as it adds to the overhead of some operations showing them as bottlenecks in the trace when they are not.

Before we start, forget about DbContext

Depending on your architecture, you will probably find yourself creating many DbContext instances when initializing your Repository or UoW. These could be created on a Repository constructor, or in its simplest form, you will have some variation of this code repeated as many times as the database operations you have:

using(DbContext context = new SomeDbContext())
{
    // Do something with DbContext
}

At first glance, without measurement, this feels like an obvious bottleneck to your application, you might feel tempted to cache your DbContext or share it somehow … don’t! Unless you really really know what you’re doing, this will create more problems than it will solve.

Quoting an MSDN blog post:

Some people are, quite understandably, concerned about the cost of recreating the ObjectContext again and again. The reality is this cost is actually pretty low, because mostly it simply involves copying, by reference, metadata from a global cache into the new ObjectContext. Generally I don’t think this cost is worth worrying about, but as always, there will be exceptions to that rule.

DbContext is just a wrapper around ObjectContext so the same statement applies to it. For me, even after reading such statements, I was still suspicious about the cost of creating these contexts, but profiling and measuring my application, it did turn out that the cost is actually negligible both in absolute and relative terms.

EF generates SQL .. DuH!

Everything that you’re doing with EF will eventually be transformed into SQL. As naive as you might think this observation is, it’s easy to lose track of that fact thinking that you’re simply working on objects. Most of your optimization will fall in the category of optimizing this generated this SQL, and I found that EF can really surprise you when it comes to the SQL that it generates.

Here is a very contrived example of a very simple model compromising of Artists, Songs and ProductionCompanies. We want to write a query to return all songs that were produced by a company based in “Sudan”. I know that I have three songs produced by a Sudan-based company, here are three variations of how to get this information

[Test]
public void ShouldReturnThreeSongs_Query1()
{
    using (var context = new MusicContext())
    {
        var sudanBasedCompanies = context.Companies
            .Where(company => company.Country == "Sudan")
            .Select(company => company.Id).ToList();

        var result = context.Songs
            .Where(song => sudanBasedCompanies.Contains(song.ProductionCompany_Id));

        Debug.WriteLine(result.ToString());

        Assert.AreEqual(3, result.Count());
    }
}
        
[Test]
public void ShouldReturnThreeSongs_Query2()
{
    using (var context = new MusicContext())
    {
        var sudanBasedCompanies = context.Companies
             .Where(company => company.Country == "Sudan");

        var result = context.Songs
             .Where(song => sudanBasedCompanies.Contains(song.ProductionCompany));

        Debug.WriteLine(result.ToString());

        Assert.AreEqual(3, result.Count());
    }
}

[Test]
public void ShouldReturnThreeSongs_Query3()
{
    using (var context = new MusicContext())
    {
        var result = context.Songs
            .Where(song => song.ProductionCompany.Country == "Sudan");

        Debug.WriteLine(result.ToString());

        Assert.AreEqual(3, result.Count());
    }
}

The three methods are returning 3 rows as expected but let’s look at the SQL they generate.

Query1 uses the IN keyword:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[IsClassic] AS [IsClassic], 
[Extent1].[ProductionCompany_Id] AS [ProductionCompany_Id], 
[Extent1].[Artist_Id] AS [Artist_Id]
FROM [dbo].[Songs] AS [Extent1]
WHERE [Extent1].[ProductionCompany_Id] IN (1,2)

while Query2 used WHERE EXISTS:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[IsClassic] AS [IsClassic], 
[Extent1].[ProductionCompany_Id] AS [ProductionCompany_Id], 
[Extent1].[Artist_Id] AS [Artist_Id]
FROM [dbo].[Songs] AS [Extent1]
WHERE  EXISTS (SELECT 
	1 AS [C1]
	FROM [dbo].[ProductionCompanies] AS [Extent2]
	WHERE ([Extent1].[ProductionCompany_Id] = [Extent2].[Id]) AND (N'Sudan' = [Extent2].[Country])
)

And Query3 used INNER JOIN:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Name] AS [Name], 
[Extent1].[IsClassic] AS [IsClassic], 
[Extent1].[ProductionCompany_Id] AS [ProductionCompany_Id], 
[Extent1].[Artist_Id] AS [Artist_Id]
FROM  [dbo].[Songs] AS [Extent1]
INNER JOIN [dbo].[ProductionCompanies] AS [Extent2] ON [Extent1].[ProductionCompany_Id] = [Extent2].[Id]
WHERE N'Sudan' = [Extent2].[Country]

This is just a simplified example to show how difficult it is to understand what EF is going to do unless you peak into the generated SQL. In my real-life case, I had a variation of Query1 except that the list of IDs I was matching against contained 15,000 IDs not just 2. This meant that a huge SQL statement was sent to the server everytime this seemingly-innocent query was called. Bad, embarrassing and gives your DBA more reasons to be mean (not that they need any!). In this real-life project, changing EF to use a JOIN created the sensible SQL statement I had in mind in the first place, and saved a huge amount of data being sent on the network.

This is just one example, but other changes like the order where you call .Single (after .Where as opposed to replacing .where), or calling .First() – or any other extension method – after materializing your list as opposed to adding it to your IQueryable, and many many other similar small changes will have a big impact on the SQL generated.

So the morale of the story is, turn on EF tracing (by using EF Profiler, adding IQueryable.ToString() in your Debug code or by using SQL Server Profiler) and monitor how EF interprets your code. You’ll be surprised.

Next: More optimizations


Comments Closed

Comments are closed.