Curled Cloudy Code

Generic data loaders for Entity Framework in GraphQL

This article contains information that may be useful for users of both the GraphQL .NET and Entity Framework Core libraries. It describes the process that led to the result. If you just want inspiration for a generic data loader, feel free to skip to the end.

GraphQL is an amazing framework for creating (re)usable APIs. Don’t ask me why, but I ended up stuck with Entity Framework Core as my data access layer. When you are trying to develop a well-thought-out, usable and performant API, you will get to use the data loader sooner rather than later.

While figuring out how to work with the data loader, I absolutely refused to write the same kind of code over and over again. I did not feel like writing a different data loader resolver for each and every entity type I need to retrieve in my API. If this is also the case for you, read on!

The dataloader itself

Feel free to skip this part if you know about the dataloader. This part is mainly for people just starting out with the dataloader, or completely unfamiliar with it.

Just a quick refresher on the data loader mechanism shipped with the GraphQL .NET library: it is only there to help you retrieve data somewhat efficiently in batches. Imagine the following data structure (from one of my hobby projects):

flight {
    departure {
        airfield {
            name
        }
        time
    }
    arrival {
        airfield {
            name
        }
        time
    }
}

The departure and arrival objects are of the same type, but represent a different piece of information. The way this information would be retrieved without a dataloader is as follows:

  1. All queried flights are retrieved from the database
  2. For each flight result a database request for departure information is made
    • This departure type requests information about the airfield this flight departed from
  3. For each flight another request for its arrival information is made
    • For which a request for airfield information is made again.

For a single result in the result set (which is usually about 20 items long), about 5 requests need to be made. For a full page of information this results in 100 database roundtrips, which obviously is not performant.

Assuming we have a 15ms latency to the database server from our dev machine, and each database call takes about 5ms, every roundtrip costs about 20ms and this would become a 2-second call (100 × 20ms). Anything but performant.

By using the dataloader to batch our requests based on the entities we are loading we can reduce this to three different calls (3% of the previous number of roundtrips):

  1. A call for the flight entities
  2. A request to retrieve flight information (departure and arrival)
  3. A request for airfield information
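The back-of-the-envelope numbers above can be reproduced with a few lines of C#. This is only a sketch of the arithmetic: the class and member names are made up for illustration, and the latency and page-size figures are the assumed ones from the text, not measurements.

```csharp
using System;

public static class RoundtripEstimate
{
    // Assumed figures from the text: 15 ms latency plus 5 ms query time per call.
    public const int LatencyMs = 15;
    public const int QueryMs = 5;

    // With a dataloader: one batched call per entity type
    // (flights, flight information, airfields).
    public const int BatchedRoundtrips = 3;

    // Without a dataloader: roughly 5 requests for each of the 20 flights on a page.
    public static int NaiveRoundtrips(int pageSize = 20, int requestsPerFlight = 5)
        => pageSize * requestsPerFlight;

    // Total time spent waiting on sequential database roundtrips, in milliseconds.
    public static int TotalMs(int roundtrips)
        => roundtrips * (LatencyMs + QueryMs);
}
```

With these assumptions, RoundtripEstimate.TotalMs(RoundtripEstimate.NaiveRoundtrips()) comes out at 2000 ms, while RoundtripEstimate.TotalMs(RoundtripEstimate.BatchedRoundtrips) is 60 ms.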

We’ll get to the technical implementation of a dataloader later in the article.

What do we need?

There is something common about every data loader I have ever written. The query behind it is always one of the following two:

q => ids.Contains(q.Id)

or

q => ids.Contains(q.ParentId)

The differences are in the way we load the data. The first one will usually have a 1:1 mapping, while the second one will usually involve a 1:many relationship. The common ground is that we always aim to retrieve a single entity type. There are only a few differences in each data loader method we usually have to write manually:

  1. The type of data loader being initialized
  2. The type for which we create the dataloader
  3. The predicate provided
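To make the two shapes a bit more tangible, here is a small in-memory sketch without EF Core or GraphQL involved. The Airfield and Passenger classes and the BatchShapes helper are hypothetical and only exist for this illustration. The 1:1 case naturally ends up in a dictionary, the 1:many case in a lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities, loosely based on the flight example above.
public class Airfield
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class Passenger
{
    public int Id { get; set; }
    public int FlightId { get; set; } // plays the role of ParentId
    public string Name { get; set; } = "";
}

public static class BatchShapes
{
    // 1:1, every requested id maps to at most one entity.
    public static Dictionary<int, Airfield> ById(
        IEnumerable<Airfield> source, ICollection<int> ids)
        => source.Where(q => ids.Contains(q.Id)).ToDictionary(q => q.Id);

    // 1:many, every requested parent id maps to zero or more entities.
    public static ILookup<int, Passenger> ByParentId(
        IEnumerable<Passenger> source, ICollection<int> ids)
        => source.Where(q => ids.Contains(q.FlightId)).ToLookup(q => q.FlightId);
}
```

Apart from the entity type, the key being selected and the result shape, both methods are identical, which is exactly the repetition worth abstracting away.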

Working with expression trees

LINQ queries are totally amazing! I love them, and I prefer them over weakly typed strings any time. In my GraphQL endpoints I only care about the data I want to request, and not so much about the implementation details related to the dataloader. Therefore my goal: I want to provide a predicate and a value for that predicate, and retrieve all records for which it evaluates to true.

I found using LinqPad highly beneficial while working with expression trees. You can easily dump the result of a subexpression to see what you’re working with.

The predicate shall be provided in the Expression<Func<T, TValue>> form. This could be used to provide a LINQ key selector like q => q.Id.

We face two problems:

  1. In order to query data we need to turn the Expression<Func<T, TValue>> into an Expression<Func<T, bool>>, so we can use it as the argument of the IQueryable<T>.Where(Expression<Func<T, bool>>) method. It can be compared with writing a dbSet.Where(q => q.Id == 1) query.
  2. We not only need to write a where query, we also need to compare values from a database to a list we created locally. (EF Core will translate this into a SQL IN ('', '', ...) statement.)

The combination of those two is interesting because we are going to call a method on a local value (the list) from within an expression tree.

The code we would usually be writing will look a bit like this:

dbSet.Where(q => list.Contains(q.Id));

Let’s go through our variant built with expression trees.

Calling the contains method

var containsExpression = Expression.Call(
    Expression.Constant(ids),
    typeof(ICollection<TValue>)
        .GetMethod("Contains", new[] { typeof(TValue) }),
    predicate.Body
);

The Expression.Call method takes 3 arguments:

  1. The object to call a method on (of the type ICollection<TValue>)
  2. The method to call (on a type of ICollection<TValue>, with a single argument of TValue)
  3. The arguments (in this case of TValue) to invoke the method with.
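The three arguments listed above can be poked at in isolation. Below is a minimal sketch that wraps the same Expression.Call in a helper so the result can be inspected. The Flight class and the ContainsCallDemo helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical entity used only for this illustration.
public class Flight
{
    public int Id { get; set; }
}

public static class ContainsCallDemo
{
    // Builds the expression equivalent of ids.Contains(q.Id),
    // given a key selector such as q => q.Id.
    public static MethodCallExpression Build<T, TValue>(
        ICollection<TValue> ids,
        Expression<Func<T, TValue>> predicate)
    {
        return Expression.Call(
            // 1. The object to call the method on.
            Expression.Constant(ids),
            // 2. The method to call: ICollection<TValue>.Contains(TValue).
            typeof(ICollection<TValue>)
                .GetMethod("Contains", new[] { typeof(TValue) }),
            // 3. The argument: the body of the key selector, for example q.Id.
            predicate.Body);
    }
}
```

Calling ContainsCallDemo.Build(new List<int> { 1, 2 }, (Expression<Func<Flight, int>>)(q => q.Id)) gives back a MethodCallExpression whose Method is Contains and whose single argument is the very same q.Id node that came in through the key selector. Dumping it in LinqPad is a nice way to explore it.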

If we dump the contents of the predicate body in LinqPad we can see what it could look like:

[Screenshot: expression body contents]

The result we have now looks like the list.Contains(q.Id) section of a LINQ query. It will form the body of our final LINQ query.

Building the full expression

What’s left for us to do is to wrap the call in a lambda expression that we can pass to a where clause, and to provide an actual value to run the expression with. This actual value (the entity q) will be provided by the .Where LINQ method.

var expression = Expression.Lambda<Func<T, bool>>(
    containsExpression,
    predicate.Parameters
);

predicate.Parameters may seem pretty vague at first sight, but it’s nothing more than this data structure, which tells us that the variable q represents the type Flight (it can be any type you want it to be if it’s generic):

[Screenshot: predicate.Parameters]
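For those who want to try this without EF Core, here is a small sketch that builds the full expression and runs it against an in-memory list via AsQueryable(). The Flight class and the PredicateDemo helper are hypothetical. The construction itself mirrors the two steps described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used only for this illustration.
public class Flight
{
    public int Id { get; set; }
}

public static class PredicateDemo
{
    // Wraps ids.Contains(<key>) in a lambda that reuses the parameter (q)
    // of the key selector.
    public static Expression<Func<T, bool>> BuildFilter<T, TValue>(
        ICollection<TValue> ids,
        Expression<Func<T, TValue>> predicate)
    {
        var containsExpression = Expression.Call(
            Expression.Constant(ids),
            typeof(ICollection<TValue>).GetMethod("Contains", new[] { typeof(TValue) }),
            predicate.Body);

        return Expression.Lambda<Func<T, bool>>(
            containsExpression,
            predicate.Parameters);
    }

    // Runs the filter against an in-memory list; no database is involved here.
    public static List<int> MatchingIds(IEnumerable<Flight> flights, ICollection<int> ids)
    {
        var filter = BuildFilter<Flight, int>(ids, q => q.Id);
        return flights.AsQueryable().Where(filter).Select(q => q.Id).ToList();
    }
}
```

Reusing predicate.Parameters is not just convenient. The body refers to that exact ParameterExpression instance, so a freshly created parameter that merely has the same name would fail when the expression is compiled. For a selector written as q => q.Id, the single parameter has the name q and the type Flight.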

The full result

So to make a quick recap, these are the extension methods I am using to create a data-loader for a certain type.

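using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using GraphQL.DataLoader;
using Microsoft.EntityFrameworkCore;

public static class GenericDataLoader
{
    /// <summary>
    /// Turn a key selector (q => q.Id) into a filter (q => items.Contains(q.Id)).
    /// </summary>
    public static Expression<Func<T, bool>> MatchOn<T, TValue>(
        this ICollection<TValue> items,
        Expression<Func<T, TValue>> predicate)
    {
        return Expression.Lambda<Func<T, bool>>(
            Expression.Call(
                Expression.Constant(items),
                typeof(ICollection<TValue>).GetMethod("Contains", new[] { typeof(TValue) }),
                predicate.Body
            ),
            predicate.Parameters
        );
    }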

    /// <summary>
    /// Register a dataloader for T by the predicate provided.
    /// </summary>
    /// <typeparam name="T">The type to retrieve from the DbSet</typeparam>
    /// <typeparam name="TValue">The value to filter on</typeparam>
    /// <param name="dataLoader">A dataloader to use</param>
    /// <param name="dbSet">Entity Framework DbSet</param>
    /// <param name="predicate">The predicate to select a key to filter on</param>
    /// <param name="value">Value to filter items on</param>
    /// <returns>T as specified by the predicate and TValue</returns>
    public static async Task<T> EntityLoader<T, TValue>(
        this IDataLoaderContextAccessor dataLoader,
        DbSet<T> dbSet,
        Expression<Func<T, TValue>> predicate,
        TValue value)
        where T : class
    {
        if (value == null) return default;

        var loader = dataLoader.Context.GetOrAddBatchLoader<TValue, T>(
            $"{typeof(T).Name}-{predicate.ToString()}",
            async (items) =>
        {
            return await dbSet
                .AsNoTracking()
                .Where(items
                    .ToList()
                    .MatchOn(predicate))
                .ToDictionaryAsync(predicate.Compile());
        });

        var task = loader.LoadAsync(value);
        return await task;
    }

    /// <summary>
    /// Register a dataloader for an IEnumerable{T} by the predicate provided.
    /// </summary>
    /// <typeparam name="T">The type to retrieve from the DbSet</typeparam>
    /// <typeparam name="TValue">The value to filter on</typeparam>
    /// <param name="dataLoader">A dataloader to use</param>
    /// <param name="dbSet">Entity Framework DbSet</param>
    /// <param name="predicate">The predicate to select a key to filter on</param>
    /// <param name="value">Value to filter items on</param>
    /// <returns>IEnumerable{T} as specified by the predicate and TValue</returns>
    public static async Task<IEnumerable<T>> EntityCollectionLoader<T, TValue>(
        this IDataLoaderContextAccessor dataLoader,
        DbSet<T> dbSet,
        Expression<Func<T, TValue>> predicate,
        TValue value)
        where T : class
    {
        if (value == null) return default;

        var loader = dataLoader.Context.GetOrAddCollectionBatchLoader<TValue, T>(
            $"{typeof(T).Name}-{predicate.ToString()}",
            async (items) =>
        {
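            var compiledPredicate = predicate.Compile();

            // Fetch the batch asynchronously, then group the results in memory by key.
            var entities = await dbSet
                .AsNoTracking()
                .Where(items
                    .ToList()
                    .MatchOn(predicate))
                .ToListAsync();

            return entities.ToLookup(compiledPredicate);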
        });

        var task = loader.LoadAsync(value);
        return await task;
    }
}

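To summarize the above code:

  1. MatchOn turns a key selector such as q => q.Id, together with a list of values, into a q => items.Contains(q.Id) filter.
  2. EntityLoader registers a batch loader for the 1:1 case and returns the single entity matching the value.
  3. EntityCollectionLoader registers a collection batch loader for the 1:many case and returns all entities matching the value.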

What’s next?

As I’m progressing with the GraphQL .NET library I’m starting to understand more and more about the design principles used while building this framework. This enables me to abstract more logic away than ever before. Right now I’m working on some additional general purpose data-loaders and I’m planning to release them as a NuGet package sometime soon.

I’ll let you know when this library is available.

5 habits for writing reliable software

Practice shows that writing reliable software is extremely difficult. It’s not that writing code is so difficult. Even monkeys can write code. No, it’s the edge cases that make it difficult: undocumented behavior, or maybe even worse, unexpected behavior. Black boxes which do not show what’s going wrong deep within the program, and when an error finally gets the chance to bubble up the program crashes, with no trace left behind. I’ll share with you 5 of my habits for writing reliable code:

1. Fail hard, fail fast

Basically it’s the other way around. If anything goes wrong, try to fail fast, and then fail hard. Let your application be vocal about how it feels. Let it throw a tantrum when something goes wrong. A parent will probably be able to handle it.

Throw errors if you cannot handle them properly, rethrow errors whenever applicable, and please, keep the stack trace intact!

In case you are not entirely sure what to do and the application (state) might be corrupted, don’t feel bad about making the app crash. A crashing application is usually better than a crippled one which might cause continuous damage while running.
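In C#, keeping the stack trace intact mostly comes down to writing throw; instead of throw ex; when rethrowing. The sketch below shows both variants next to each other. All names in it (RethrowDemo, ReadSettings and so on) are made up for this illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowDemo
{
    // Hypothetical failing operation. NoInlining keeps its frame visible in the stack trace.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void ReadSettings()
        => throw new InvalidOperationException("Settings are corrupt.");

    // Good: "throw;" rethrows the caught exception with its original stack trace.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void RethrowIntact()
    {
        try { ReadSettings(); }
        catch (InvalidOperationException)
        {
            // Log or clean up here if needed, then let a caller deal with it.
            throw;
        }
    }

    // Bad: "throw ex;" restarts the stack trace here, hiding where things really failed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void RethrowReset()
    {
        try { ReadSettings(); }
        catch (InvalidOperationException ex)
        {
            throw ex;
        }
    }

    // Helper for experimenting: returns the stack trace produced by the given action.
    public static string TraceOf(Action action)
    {
        try { action(); return ""; }
        catch (Exception ex) { return ex.StackTrace ?? ""; }
    }
}
```

Comparing RethrowDemo.TraceOf(RethrowDemo.RethrowIntact) with RethrowDemo.TraceOf(RethrowDemo.RethrowReset) shows the difference: only the first trace still mentions ReadSettings, the place where the error actually originated.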

2. Avoid duplication

Code duplication also means bug duplication. Limit the amount of code you have and I promise you’ll still be amazed by the number of bugs in it. Besides that, it’s just flat out annoying and embarrassing when you fixed a bug only to discover the bug still exists in the same code in a slightly different place.

One part of avoiding code duplication is making sure that you know where to find certain methods for certain tasks. There are too many cases where the same code has been written multiple times because the function was hidden in some obscure namespace or class.

Having duplicate code throughout the code base may be an indication that there’s something wrong with your architecture. Go fix these issues!

3. Write automated tests

Small changes in code can have a big impact throughout your application. As the application grows you will miss things, forget about pieces of code and lose details about the application. Automated tests will save your ass.

Tests are not the law though. If a test fails you can either change the tested method or the test function. In case of the latter, just be sure you know what you’re doing!

If you’re just starting out, and you’re wondering how to test your code, a good start is to think about all the small tasks your code has to do. Write tests which describe the desired output for your task, write your task and test it. Writing tests forces you to think about the purpose of a block of code. Doing so will generally result in better code reuse throughout the application.
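As a minimal sketch of what "describe the desired output for a small task" can look like: the example below avoids any test framework so it has no dependencies, although in a real project you would normally use something like xUnit or NUnit. The FlightDuration task and its tests are hypothetical.

```csharp
using System;

public static class FlightDuration
{
    // The small task under test: how long did a flight take?
    public static TimeSpan Between(DateTime departure, DateTime arrival)
    {
        if (arrival < departure)
            throw new ArgumentException("Arrival cannot be before departure.", nameof(arrival));

        return arrival - departure;
    }
}

public static class FlightDurationTests
{
    // Each test states one desired outcome and returns true when it holds.
    public static bool ReturnsElapsedTime()
    {
        var departure = new DateTime(2020, 1, 1, 10, 0, 0);
        var duration = FlightDuration.Between(departure, departure.AddMinutes(90));
        return duration == TimeSpan.FromMinutes(90);
    }

    public static bool RejectsArrivalBeforeDeparture()
    {
        var departure = new DateTime(2020, 1, 1, 10, 0, 0);
        try
        {
            FlightDuration.Between(departure, departure.AddMinutes(-1));
            return false; // no exception means the task accepted invalid input
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Note how the second test also pins down the fail-fast behavior from habit 1: invalid input is rejected right away instead of producing a negative duration.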

4. Use logging

Writing stable code is amazing, but there will still be bugs. Part of stable code is minimizing the downtime and damage caused by bugs. In case there is a bug in a production environment (of course you’re not testing in production), it’s extremely beneficial to have a proper stack trace. Even better would be to have a log of events leading up to the bug so you quickly have an idea of what is going wrong.

Please be aware that there’s such a thing as too much logging. Emitting hundreds of log messages a minute might be useful while debugging, but it’ll be a total disaster to sift through these in production. In addition this might result in real errors being missed due to the high volume of messages. And let’s be honest, logging code scattered throughout your codebase is just ugly.

5. Prepare for the worst

Believe it or not, even if you apply all of this advice you will still encounter bugs at the worst possible times. It’s due to Murphy’s law:

“Anything which can go wrong will go wrong” 

Calculate your risk, and prepare for the unexpected.

Writing code isn’t hard. Writing stable software is.