Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Optimistic concurrency #1004

Open
bart-degreed opened this issue May 25, 2021 · 6 comments
Open

RFC: Optimistic concurrency #1004

bart-degreed opened this issue May 25, 2021 · 6 comments

Comments

@bart-degreed
Copy link
Contributor

This RFC describes design considerations for enabling end-to-end optimistic concurrency in JsonApiDotNetCore. The topic was discussed at json-api/json-api#600 and json-api/json-api#824 in an effort to add this to the JSON:API spec, but the conversion seems to have stalled.

Introduction

By default, partial patch allows users to update different attributes of the same resource in parallel. For example, user A updates attribute X while user B updates attribute Y. The result is that both changes are applied. When multiple users update the same attribute, the last one wins. This has great scalability and is often the desired behavior.

But when the attributes of a resource are related, these partial changes are undesired. For example, changing the city of an address while another user changes the zipcode should fail, as merging both changes results in incorrect data. With optimistic concurrency enabled, the second request fails because the original resource was modified in-between. This can be detected by requiring that the client sends the original version, which the server compares with the stored version. This trades scalability for stronger consistency.

Our implementation for optimistic concurrency should provide a great experience when used with EF Core. In EF Core, optimistic concurrency is achieved by tagging an entity property to act as version stamp. The database then ensures that the version of the record changes each time the record is updated. To detect a collision from another user that updated the record in-between, the WHERE clause of UPDATE and DELETE statements get an extra condition:

UPDATE User SET FirstName='Joe'
WHERE ID=1

becomes:

UPDATE User SET FirstName='Joe'
WHERE ID=1 AND Version=3

In case the statement doesn't affect any rows, EF Core translates this to a concurrency exception. This is an all-or-nothing experience, ie there is no way to skip this check.

The conflict detection also affects relationship updates: if the value of a foreign key column changes, then the record version changes too. It could be debated whether this behavior is desired, because linking to a related resource that was changed since it was retrieved is considered a conflict. One could suggest to fetch the related resource as part of the Relationship Update request processing, but that only reduces the chance of failure: the related resource could still change between retrieving it and saving the changes (it is not transactional). Such an approach is like fighting against how EF Core works, without being able to guarantee consistent behavior.

The biggest problem is that for a Delete Resource request, the client has no way to supply the original version in the request. The same applies for a Patch Relationship request.

Solutions

  1. Send the version in a query string parameter or http header. This works nicely for a Delete Resource request, but for updating a to-many relationship this needs to be an array of versions. For an atomic:oprations request, this needs to be an array per operation. In practice, the length of URLs and header values is limited, so this poses a restriction on requests that affect many resources. It also requires escaping, which makes it harder to read. Lastly, having this information separate from the resources it applies to makes it harder to debug.

  2. Send the version in the request body
    This adds a version field to the 'data' json element, which becomes required for resource types that use optimistic concurrency. This means a Delete Resource request now requires a request body. For a Patch Relationship request, the 'data' element refers to the right side of the relationship, thus there is still no way to send the version for the left side of the relationship. An atomic:operations request would require adding a version field to the 'ref' element.

  3. Send the version in the request body and URL parameter
    This solves the problem for Patch Relationship requests, but requires an extra route parameter in all non-GET endpoints, which is a breaking change. But we would no longer need a request body for Delete Resource. We need to add extra action methods on our controller base class that take a versioned route. And we'd need to change the signature of resource service methods and various other places. Even if we postpone this to the next major release, it would still complicate existing APIs to upgrade. It also goes against all URL examples in the JSON:API spec.

  4. Encode the version in the resource ID
    This sounds like an odd solution at first, because the version part has a validating characteristic instead of being part of the resource identity. But it allows us to implement this in a non-breaking way. A separator like ';v~' makes a collision with an existing ID unlikely, while preventing ASP.NET Core to escape it. By ignoring the version part on GET requests we make it easy for clients to follow links. But the client needs to cut off the version from the ID string before storing it, to be able to uniquely identify the resource at a later time. Likewise we'd need to ignore the version part when used in a filter on ID. Alternatively, we could consider to make the version part optional in GET requests, yet strongly match on it when specified. This would return a 404 on version mismatch, without telling the client the correct version, so it does not seem very useful. ETags provide a better solution for such conditional requests, because they return the new data if the version has changed.

After various discussions, we prefer 4, but are open to feedback.

Aside from these, we'll need to ensure rendered links include the version, because a client should be able to patch/delete against that URL. For an atomic:operations request, we should track the last-assigned version while processing the list of operations and prefer the tracked version over the one in the request body, because due to foreign key changes versions are subject to change after each operation. We need to take special care when sparse fieldsets are used that we silently fetch the version as well.

Implementation

Depending on the type of database used, version stamps have different data types. It could be a DateTime or DateTimeOffset, a uint or even a byte[]. EF Core allows multiple columns to become part of the version. API developers need to map this from/to a string value that is exposed to the API client. Example:

// JADNC

public interface IVersionedIdentifiable : IIdentifiable
{
    public string Version { get; set; }
}

// API

public sealed class Address : Identifiable, IVersionedIdentifiable
{
    [Attr]
    public string ZipCode { get; set; }

    [NotMapped]
    public string Version
    {
        get => RowConcurrencyToken.ToString();
        set => RowConcurrencyToken = uint.Parse(value);
    }

    // PostgreSQL-specific column type
    public uint RowConcurrencyToken { get; set; }
}

public sealed class AppDbContext : DbContext
{
    protected override void OnModelCreating(ModelBuilder builder)
    {
        builder.Entity<Address>()
            .Property(address => address.RowConcurrencyToken)
            .HasColumnName("xmin")
            .HasColumnType("xid")
            .ValueGeneratedOnAddOrUpdate()
            .IsConcurrencyToken();
    }
}

Please let us know what you think. Did we miss anything? Should we implement this as proposed?

@maurei
Copy link
Member

maurei commented May 25, 2021

I'd like to point out a few more implications of option 4.

Given the following part of the spec

Within a given API, each resource object’s type and id pair MUST identify a single, unique resource.

this approach could be taken as a violation of the spec, because strictly speaking id no longer together with type identifies a resource, but instead only a part of the id field is required for that (because, as you described in detail, the remaining bit only serves for validation, not for identification). On the other hand, the spec only states that id must be a string, i.e. it does not state that the entire value of the id field as a whole must be used in identifying. It could be leaving the door open for encoding additional information. It's a grey area, but I think we should give it the benefit of the doubt.

In a different way of looking at it, one could consider the conjunction of the id and the version the "real id", therefore disregarding difference between the validating characteristic and identifying characteristic of the two parts in the id. However, this introduces a new problem: what is considered uniquely identifying is now no longer consistent over all endpoints. For example, for a GET endpoint the id can be trimmed to leave out the version and the request can still succeed, which implies that for this endpoint the id is defined as that smaller chunk data. For a PATCH endpoint however, the identification would be incomplete. I think this breaks with the intuitive notion that what defines an id should be equal across all endpoints of an API.

One can try to get around this problem by taking away any knowledge from a client of there being a version encoded in the id, for example by scrambling the value, but this would result in having non-constant id values for the same resource when for example repeatedly GETting a resource. This to me also clearly breaks with an intuitive notion of how ids should behave.

All in all, even though approach 4 has some drawbacks, I still think it is the best option we have considering the limitations in length of HTTP headers/query strings for option 1 and the inability of creating a spec extension without breaking with backward compatibility if we were to introduce an version field in the body for approaches 2 and 3.

@bjornharrtell
Copy link
Contributor

I have experience with this type of requirement in conjunction with a JSON API. We implement versioning fully in the data model so it is not a generic solution but fully separated from the JSON API fundamentals. We also physically separate historic versions from the current versions in the database and require conflicts to be explicitly resolved before "merges".

I have difficulties seeing clear benefits of a generic solution and I worry about the complexity it will add to this project. That said it is just my initial perhaps unfounded feeling.

@ThomasBarnekow
Copy link

Looking at option 4, how would that work with SQL Server, where the rowversion is a binary(8) (or varbinary(8) if it is nullable)? Further, could you provide an example of how this would look like from a client's perspective, e.g., for an Identifyable<Guid>?

@bart-degreed
Copy link
Contributor Author

@bjornharrtell Thanks for sharing your thoughts. What you're describing sounds like Temporal tables. EF Core support for them is on the roadmap for v6. Assuming such historic data is available, it becomes possible to resolve conflicts server-side, which provides a better client experience compared to what's proposed here. Without it, the best a server can do is detect that 'something changed' and leave it up to the client to re-fetch and try again, and that's what this proposal is about.

This design describes an opt-in feature that API developers can choose to activate for the subset of resource types within their API that require stronger consistency. It does not affect existing APIs and its clients. Existing APIs/clients that use non-versioned resources are oblivious to this, we're not going to break them by always sending/requiring a version where we didn't in the past.

I suspect that implementing this in JADNC is not complex nor a large amount of changes. It requires special handling in a few key locations (atomic:operations support is a little trickier) and input validation. Our main challenge is to ensure we've put the right set of integration tests in place for when this feature is enabled, so that the right error messages are produced etc.

One problem this solves is being unable to use the default authentication in ASP.NET Core: the generated database tables (AddEntityFrameworkStores() in https://docs.microsoft.com/en-us/aspnet/core/security/authorization/secure-data?view=aspnetcore-5.0) all have these tokens. Another reason for adding this to JADNC is that some users expect this to 'just work' because EF Core supports it. It protects against silently overwriting changes from other users, which can be a requirement in some business domains (security, payments, order inventories etc).

@ThomasBarnekow I intend to experiment with this using MSSQL too, and possible add an example to the documentation. I'm thinking to base64- or hex-encode the byte array in the getter/setter of IVersionedIdentifiable.Version to convert to/from string. Note that a nullable rowversion doesn't have much effect (see https://stackoverflow.com/questions/42215930/adding-a-nullable-rowversion-column-to-a-table).

What I've come up with so far is the following, but these are implementation details and subject to change:

  • Add IJsonApiRequest.PrimaryVersion, which contains only the version part from the ID value in URL. This type is injectable, so by using this property we avoid a breaking change in controller/service/repository methods. For the existing IJsonApiRequest.PrimaryId, we'll cut off the version part, if applicable.
  • The former concerns the primary ID aka left side of a relationship. Because the right side of a relationship already uses IIdentifiable in method signatures, we can do a safe cast to IVersionedIdentifiable and obtain the version from there.
  • IIdentifiable.StringId contains the full ID value from URL or request body, including version. IVersionedIdentifiable.Version contains only the version part and existing IIdentifiable<T>.Id contains the typed ID value (int, guid, long, string etc) as-is. We'll enhance the existing conversion to/from Id/StringId to check if the resource implements IVersionedIdentifiable and special-case for that.

So to answer your question, when using a Guid primary key instead of int, the only line that changes in the example is:

public sealed class Address : Identifiable<Guid>, IVersionedIdentifiable

@bjornharrtell
Copy link
Contributor

Sound good @bart-degreed and it removes my concerns.

@bart-degreed
Copy link
Contributor Author

From the solutions in the first post, I've tried to implement option 3 (Send the version in the request body and URL parameter) in #1119. I've come quite a long way but got stuck for now. So I thought I'd share what I came up with, so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants