Curls, clouds and code

Implementing pagination with GraphQL.NET and Relay

Trigger warning:
In this article I bash quite a bit on traditional REST APIs. Not that you can't build a great architecture with REST, but in my experience these APIs are badly implemented more often than not. The same goes for the tech described in this article: if used incorrectly you'll end up voluntarily amputating one of your limbs.

As a friend noted: “Talk to REST and get a certain object, or ask GraphQL and retrieve some information”.


0. Contents

  1. Introduction
  2. Background
    2.1 Cursor Based Pagination
    2.2 Connection Specification
    2.3 Specification, but simplified
    2.4 The Connection’s Structure
    2.5 Generating Cursors
  3. Overview
  4. Architecture
  5. The GraphQL Connection Endpoint
  6. Data Retrieval
    6.1 [Contextual] Model and Graph Types
    6.2 Resolving Arguments
    6.3 Data Filters
    6.4 Slicing for Pagination
    6.5 Creating the Connection object
  7. Overview of a real-world implementation
    7.1 The Connection
    7.2 Data access function
    7.3 Defining order on a connection
  8. Other Resources

1. Introduction

Pagination is always a topic that’s a bit tricky to tackle. The goal is to give as much freedom as possible for (client side) queries, while staying in control of the data that goes out.

GraphQL is no different, but thankfully the Relay extension has been developed on top of GraphQL. Relay consists of a set of conventions which solidifies the GraphQL specification a bit further, so that client-side tools can quickly scaffold application logic against such an API.

For the assumptions Relay makes about a GraphQL server, see this document.

2. Background

While quite poorly documented in the GraphQL.NET docs (at the time of writing), functionality for Relay compatible endpoints has been implemented. Some of this logic resides in the main graphql-dotnet package, while other tools and helpers reside in the relay package.

Because Relay is only a specification of assumptions about interfaces and available methods, it is fairly easy to implement yourself. Consider most of the helper methods provided by the graphql-dotnet/relay package as suggestions, and think about writing specialized variants which suit your use-cases. More on that later.

If you are already familiar with the design philosophy that goes with GraphQL, and indirectly with the graphql-dotnet package, feel free to skip to the overview.

2.1 Cursor Based Pagination

The biggest part of the Relay specification consists of details on the Connection types. A connection is a type which describes a list of objects, along with arguments to filter and slice this list. By default a connection accepts four arguments: first, after, last and before.

These four arguments make up a powerful concept called cursor based pagination. In this model every object has a unique identifier (you most certainly have one already anyway), which you can use to create slices of data.

These slices of data are relative to one another. For the first call to the api you could provide the first argument to a connection. For all subsequent calls you would use the cursor of the last element in this list to retrieve a new slice of data.
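
To make that call pattern concrete, here is a minimal sketch (in Python rather than the article's C#; the in-memory data source and fetch_page function are hypothetical stand-ins for the API, and the cursors are simply the item ids):

```python
# Hypothetical in-memory "API"; cursors here are simply the item ids.
DATA = [{"id": i, "name": f"user-{i}"} for i in range(1, 8)]

def fetch_page(first, after=None):
    """Return at most `first` items that come after the item with id `after`."""
    start = 0
    if after is not None:
        # Find the cursor's position; the slice starts right after it.
        start = next(i for i, item in enumerate(DATA) if item["id"] == after) + 1
    return DATA[start:start + first]

page1 = fetch_page(first=3)                         # first call: only `first`
page2 = fetch_page(first=3, after=page1[-1]["id"])  # next call: cursor of the last item
```

Each page is addressed relative to the previous one, which is exactly what makes this model play nicely with changing data.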

This type of pagination, although it takes a bit more work to implement than traditional offset based pagination, is a delight to work with. On the client side it allows you to implement both traditional page based pagination methods and more progressive, user-friendly paradigms. Another improvement over offset based pagination is that this method works fine with realtime data: a lot of complexities that are inherent to combining realtime data with offset based pagination simply disappear.

If you look on Google you'll find plenty of reasons why we should stop doing offset based pagination. Honestly, I do not care much about the technical implications; the thing is that it's just not user-friendly. The beginning of this article digs into that a bit deeper.

2.2 Connection Specification

For a detailed specification about connections and cursors, see this document: https://facebook.github.io/relay/graphql/connections.htm. Item 4.3 of the Relay specification covers expected behaviour of the pagination algorithm.

2.3 Specification, but simplified

The specification describes the way data should be retrieved. It boils down to the following steps:

  1. Apply your data filtering operations
  2. If after is specified:
    • Take all data starting from this cursor
  3. If before is specified:
    • Take all data before this cursor
  4. If first is specified:
    • Take the specified amount of entities from the beginning
  5. If last is specified:
    • Take the specified amount of entities from the end
  6. Return data
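
The steps above can be sketched as a single function (a Python sketch over (cursor, item) pairs; as the article argues below, real implementations should push this slicing into the data store rather than run it in memory):

```python
def apply_slice(rows, after=None, before=None, first=None, last=None):
    """Apply the Relay slicing steps to an already-filtered, ordered
    list of (cursor, item) pairs. In-memory sketch for illustration only."""
    cursors = [c for c, _ in rows]
    if after is not None:                      # step 2: everything after this cursor
        rows = rows[cursors.index(after) + 1:]
        cursors = [c for c, _ in rows]
    if before is not None:                     # step 3: everything before this cursor
        rows = rows[:cursors.index(before)]
    if first is not None:                      # step 4: take `first` from the beginning
        rows = rows[:first]
    if last is not None and last < len(rows):  # step 5: take `last` from the end
        rows = rows[len(rows) - last:]
    return [item for _, item in rows]          # step 6: return the data
```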

The specification defines the cases where all parameters (after, before, first and last) are specified at once. There are few use-cases where using all these parameters at the same time makes any sense, but it does lend some consistency and predictability to the use of cursors.

While it is possible to run all filter operations on the .NET runtime, I would totally NOT recommend doing so. It implies that all data has to be loaded locally, which in my case resulted in the infamous OutOfMemoryException. Even if it works for you now, once your data starts scaling it becomes incredibly difficult to keep it all running smoothly. I recommend investing some time researching the options for cursor based pagination within your data provider, whether that is MsSql, MySql, ElasticSearch or whatever your tool of choice is. If you still decide you want to shoot your own foot off and do pagination in-memory, go ahead. There are some methods available in the graphql-dotnet/relay project, for which I have created the following extension method:

public static Connection<TSource> ToConnection<TSource, TParent>(
    this IEnumerable<TSource> items,
    ResolveConnectionContext<TParent> context,
    int sliceStartIndex,
    int totalCount,
    bool strictCheck = true)
{
    // Qualify the call with the helper class from the graphql-dotnet/relay
    // package; an unqualified call would recurse into this extension method.
    return ConnectionUtils.ToConnection(items, context, sliceStartIndex, totalCount, strictCheck);
}

But again, this is definitely the unfun way to do it! If this is all you've been looking for, you can close this article now. The remainder of this article elaborates on offloading these filter and slice operations to an external data store.

2.4 The Connection’s Structure

The connection consists of more than just the four arguments you can slice your data with. Another part of the connection types is a set of subtypes which help you query through the data. The main structure looks as follows:

friends(first: 10, after: "opaqueCursor") {
  edges {
    cursor
    node {
      id
      name
    }
  }
  pageInfo {
    hasNextPage
  }
}

As taken from this example.
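
The shape a server ultimately has to produce for such a query can be sketched as plain data (a Python sketch; to_cursor and build_connection are hypothetical names, and base64-encoding an index is merely a common convention for opaque cursors):

```python
from base64 import b64encode

def to_cursor(index):
    # Relay cursors are opaque strings; base64 of an index or id is a common choice.
    return b64encode(f"cursor:{index}".encode()).decode()

def build_connection(nodes, first):
    """Shape an ordered list of nodes like the connection result above."""
    sliced = nodes[:first]
    return {
        "edges": [
            {"cursor": to_cursor(i), "node": node}
            for i, node in enumerate(sliced)
        ],
        "pageInfo": {"hasNextPage": len(nodes) > first},
    }
```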

Cursor based pagination with SQL Server using C# (and SQL)

Looking for some code examples?

0. Contents

  1. Introduction
  2. Requirements
  3. Proof of concept
    3.1 OFFSET and FETCH
    3.2 The subquery method
  4. Implementation

1. Introduction

On my quest to find a usable form of pagination (for realtime purposes) I learned about cursor based pagination. With this concept you retrieve chunks of data relative to surrounding data. If you are not yet familiar with it, check out this answer on StackOverflow, which describes it concisely.

In this article we will explore one way to implement cursor based pagination for SQL Server. In order to make our lives easier we will continue with some C# code, in which we are going to build a helper method to apply cursor based pagination techniques to (SqlKata) queries.

The C# code will be compliant with .NET Standard 2.0. We will also use SqlKata as a dependency to help us constructing and building the SQL queries.

About using Entity Framework
I have tried using EF Core to implement cursor based pagination, but found it was not a viable option at the time of writing due to lack of support for the required methods. In the meantime I have created a feature request for implementing SkipWhile() in the Entity Framework Core repo. I realize implementing a single method wouldn't be sufficient to support bidirectional pagination, but it would be a start.

2. Requirements

Cursor based pagination (as described in the GraphQL Relay specification) consists of four arguments (first, after, last and before) which are resolved in a specific way.

The goal is to resolve these four arguments in a way that we can take a query, and apply these slicing arguments regardless of the contents of said query. One of the more important implementation details is that we should be able to order our resultset however we want, and the cursor should (usually) be based on the primary key.

Although GraphQL is no part of this post, we’re mostly following the GraphQL Relay spec detailing how these four arguments should be resolved.

Note that the implementation we're creating deviates from the specification in how combined usage of the first and last properties is handled: this implementation will select the overlap between these two properties based on the used cursors, if applicable. See the sketch below:

3. Proof of concept

Just to verify we can do cursor based pagination with SQL Server at all, we ran some experiments to see whether it would be capable enough for our needs. Two possible, somewhat generic options were available, and we went on to compare them.

The cornerstone of both methods is the ROW_NUMBER() function, which numbers all rows according to the provided order arguments. Note that this function starts numbering records at 1.

3.1 OFFSET and FETCH

The first method relies on the OFFSET and FETCH clauses to retrieve our data in the specified slices.

Note that [Query] is defined as a CTE (Common Table Expression).

SELECT * FROM [Query] ORDER BY [Timestamp] DESC OFFSET (
    SELECT TOP 1 [RowNumber]
        FROM (
            SELECT 
                [Query].Id,
                ROW_NUMBER() OVER(ORDER BY [Timestamp] DESC) AS [RowNumber]
            FROM [Query]
        ) [Temp]
    WHERE [Temp].Id = '0b12d2c9-1c38-4ffe-8177-0671c5d3b709'
) ROWS
FETCH NEXT 20 ROWS ONLY

The idea is that we’re building a sorted index of our data in-memory. This approach requires minimal mutation of the query we are working with. However, this method would have the following implications:

Another implication of this method is that we cannot easily iterate backwards over our data-set. While the FETCH documentation mentions a PRIOR keyword, we cannot use it, as it requires an active cursor, which involves locking records for data retrieval. One of the few options to possibly mitigate this would be to invert the ORDER BY clauses.

Therefore I conclude this method is not suitable for argument combinations which require iterating backwards over the data-set (such as those involving last or before).

3.2 The subquery method

The second approach involves butchering the query in order to include an additional (RowNumber) field. With this field injected into the CTE we get the ability to query more flexibly.

While I did not run any proper benchmarks, some rough timing suggests that the CPU time of this method is slightly higher than with the offset/fetch method, but the elapsed time is usually cut in half.

Resulting queries look a bit like the one below. Each of the four arguments will resolve to a WHERE clause which checks the RowNumber field.

SELECT *
FROM [Query]
WHERE [RowNumber] > (
    SELECT [RowNumber]
    FROM [Query]
    WHERE [Id] = '136f30f6-cf73-4754-a1cf-aecf4d6f6cd7')
  AND [RowNumber] < (
    SELECT [RowNumber]
    FROM [Query]
    WHERE [Id] = '05528934-DE90-43AF-ABE3-8535433FE53A')

As with the offset/fetch method we have to remove the ORDER BY clauses. This time, however, we will inject the order statements back into the CTE as part of our ROW_NUMBER() selection (see below).

ROW_NUMBER() OVER(ORDER BY [Timestamp] DESC) AS RowNumber

The amazing thing about this approach is that we get bloody fast results when we know the exact row numbers to query; that, however, is not usually the case. Even then, this method is faster than the offset/fetch method discussed earlier, and it gives us the flexibility to slice our data however we want by using the four arguments described in chapter 2.
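
The window those row-number comparisons carve out can be sketched arithmetically (a Python sketch; row_number_bounds is a hypothetical helper, and the row_of lookup stands in for the subqueries against the CTE):

```python
def row_number_bounds(row_of, max_row, after=None, before=None, first=None, last=None):
    """Compute the inclusive (lo, hi) RowNumber window described by the
    WHERE clauses. `row_of` maps a cursor (primary key) to its 1-based
    ROW_NUMBER() value."""
    lo, hi = 1, max_row
    if after is not None:
        lo = row_of[after] + 1          # RowNumber > row(after)
    if before is not None:
        hi = row_of[before] - 1         # RowNumber < row(before)
    if first is not None:
        hi = min(hi, lo + first - 1)    # at most `first` rows from the front
    if last is not None:
        lo = max(lo, hi - last + 1)     # at most `last` rows from the back
    return lo, hi
```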

4. Implementation

In order to limit complexity throughout the application we aim for an extension method which can be used in a generic way. To achieve this we cannot depend on the contents of the query; however, due to the dynamic nature of the query argument we are required to make certain assumptions about it.

Heads-up: as I am no expert on SQL, these are just the obvious cases off the top of my mind. There are probably many more catches, but this framework should suffice for most of your everyday query slicing needs.

The Slice method does not return a data-set either. Doing so would require tight coupling between the query generation logic and information about the database (credentials etc.), which we are trying to stay away from.

[Pure]
public static Query Slice(
    this Query query,
    string after = null,
    int first = 0,
    string before = null,
    int last = 0,
    string column = "Id")
{
    var queryClone = query.Clone();

    // Manually compile the order clauses for later use in the query
    var order = new SqlServerCompiler()
        .CompileOrders(new SqlResult
        {
            Query = queryClone
        });

    queryClone.Clauses.RemoveAll(q => q.Component == "order");
            
    if (string.IsNullOrWhiteSpace(order))
        throw new Exception($"{nameof(query)} does not have an order by clause");

    queryClone.SelectRaw($"ROW_NUMBER() OVER({order}) AS [RowNumber]");

    var internalQuery = new Query()
        .With("q", queryClone)
        .From("q");
            
    // Select all rows after provided cursor
    if (!String.IsNullOrWhiteSpace(after))
    {
        internalQuery.Where("RowNumber", ">",
            new Query("q")
                .Select("RowNumber")
                .Where(column, after));
    }

    // Select all rows before provided cursor
    if (!String.IsNullOrWhiteSpace(before))
    {
        internalQuery.Where("RowNumber", "<",
            new Query("q")
                .Select("RowNumber")
                .Where(column, before));
    }

    // Select the first x amount of rows
    if (first > 0)
    {
        // If the after cursor is defined
        if (!String.IsNullOrWhiteSpace(after))
        {
            internalQuery.Where("RowNumber", "<=",
                new Query("q")
                    .SelectRaw($"[RowNumber] + {first}")
                    .Where(column, after));
        }
        // If no after cursor is defined
        else
        {
            internalQuery.Where("RowNumber", "<=", first);
        }
    }

    // Select the last x amount of rows
    if (last > 0)
    {
        // If the before cursor is defined
        if (!String.IsNullOrWhiteSpace(before))
        {
            internalQuery.Where("RowNumber", ">=",
                new Query("q")
                    .SelectRaw($"[RowNumber] - {last}")
                    .Where(column, before));
        }
        // If we have to take data all the way from the back
        else
        {
            internalQuery.Where("RowNumber", ">",
                new Query("q")
                    .SelectRaw($"MAX([RowNumber]) - {last}"));
        }
    }

    return internalQuery;
}

Note that the SqlKata library does not use pure functions, for performance reasons. While I usually follow the conventions of the library or framework I am extending, it would not make sense to do so in this case: the Query object is mutated in such a specific way that the only use case for a non-pure function would be to introduce obscure bugs.

The techniques described in this article came to be after a lot of fiddling and trial and error. Given my lack of experience writing SQL I am sure there are optimizations available for this logic. If you know about an optimization or improvement to this code, please let me (us) know!