A comparison of JavaScript CRDTs

Collaboration is one of the most requested features on uMap. I’ve talked in previous articles how we could add real-time features “the simple way”, by:

This works well in general, but it doesn’t take care of conflicts handling, especially when a disconnect can happen.

For this reason, I got more into “Conflict-free Resolution Data Types” (CRDTs), with the goal of understanding what they are, how they work, what are the different libraries out there, and which one would be a good fit for us, if any.

As things are changing quickly in this field, note that this article was written in March 2024.



Part 1 - What are CRDTs?

Conflict-free Resolution Data Types are a family of data types able to merge their states with other states without generating conflicts. They handle consistency in distributed systems, making them particularly well-suited for collaborative real-time applications.

CRDTs ensure that multiple participants can make changes without strict coordination, and all replicas converge to the same state upon synchronization, without conflicts.

Append-only sets” are probably one of the most common type of CRDT: you can add the same element again and again, it will only be present once in the set. It’s our old friend Set, as we can find in many programming languages.

Why using CRDTs?

For uMap, CRDTs offer a solution to several challenges:

  1. Simultaneous Editing: When multiple users interact with the same map, their changes must not only be reflected in real-time but also merged seamlessly without overwriting each other’s contributions. We need all the replicas to converge to the same state.

  2. Network Latency and Partition: uMap operates over networks that can experience delays or temporary outages (think editing on the ground). CRDTs can handle these conditions gracefully, enabling offline editing and eventual consistency.

  3. Simplified Conflict Resolution: Traditional methods often require complex algorithms to resolve conflicts, while CRDTs inherently minimize the occurrence of conflicts altogether.

  4. Server load: uMap currently relies on central servers (one per instance). Adopting CRDTs could help lower the work done on the server, increasing resilience and scalability.

Traditional data synchronization methods

Traditional data synchronization methods typically rely on a central source of truth (the server) to manage and resolve conflicts. When changes are made by different users, these traditional systems require a round-trip to the server for conflict resolution and thus can be slow or inadequate for real-time collaboration.

In contrast, CRDTs leverage mathematical properties (the fact that the data types can converge) to ensure that every replica independently reaches the same state, without the need for a central authority, thus minimizing the amount of coordination and communication needed between nodes.

This ability to maintain consistency sets CRDTs apart from conventional synchronization approaches and makes them particularly valuable for the development of collaborative tools like uMap, where real-time updates and reliability are important.

Solving complex cases

At first, I found CRDTs somewhat confusing, owing to their role in addressing complex challenges. CRDTs come in various forms, with much of their intricacy tied to resolving content conflicts within textual data or elaborate hierarchical structures.

Fortunately for us, our use case is comparatively straightforward, and we probably only need LWW registers.

Last Write Wins Registers

As you might have guessed from the name, a LWW register is a specific type of CRDT which “just” replaces the value with the last write. The main concern is establishing the sequence of updates, to order them together (who is the last one?).

In a single-client scenario or with a central time reference, sequencing is straightforward. However, in a distributed environment, time discrepancies across peers can complicate things, as clocks may drift and lose synchronization.

To address this, CRDTs use vector clocks — a specialized data structure that helps to solve the relative timing of events across distributed systems and pinpoint any inconsistencies.

A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations.

Wikipedia

CRDTs converging to the same state

Note that we could also use a library such as rxdb — to handle the syncing, offline, etc. — because we have a master: we use the server, and we can use it to handle the merge conflicts. But by doing so, we would give more responsibility to the server, whereas when using CRDTs it’s possible to do the merge only on the clients (enabling no-master replications).

State-based vs Operation based

While reading the literature, I found that there are two kinds of CRDTs: state-based and operation-based. It turns out most of the CRDTs implementation I looked at are operation-based, and propose an API to interact with them as you’re changing the state, so it doesn’t really matter in practice.

The two alternatives are theoretically equivalent, as each can emulate the other. However, there are practical differences. State-based CRDTs are often simpler to design and to implement; their only requirement from the communication substrate is some kind of gossip protocol. Their drawback is that the entire state of every CRDT must be transmitted eventually to every other replica, which may be costly. In contrast, operation-based CRDTs transmit only the update operations, which are typically small.

Wikipedia on CRDTs

How the server fits in the picture

While discussing with the automerge team, I understood that I was expecting the server to pass along the messages to the other parties, and that would be the way the synchronization would be done. It turns out I was mistaken: in this approach, the clients send updates to the server, which merges everything together and only then sends the updates to the other peers. It makes it easy for the server to send back the needed information to the clients (for new peers, or if the peers didn’t cache the data locally).

In order to have peers working with each other, I would need to change the way the provider works, so we can have the server be “brainless” and just relay the messages.

For automerge, it would mean the provider will “just” handle the websocket connection (disconnect and reconnect) and all the peers would be able to talk with each other. The other solution for us would be to have the merge algorithm working on the server side, which comes with upsides (no need to find when the document should be saved by the client to the server) and downsides (it takes some cpu and memory to run the CRDTs on the server)

How offline is handled

I was curious about how offline editing might work, and what would happen when going back online. Changes can happen both online and offline, making no difference for the “reconciliation” step. When going back online, a “patch” is computed by the newly reconnected peer, and sent to the other peers.


Part 2: JavaScript CRDTs

Now that we’re familiar with CRDTs and how they can help us, let’s create a map application which syncs marker positions, on different browsers.

We’ll be comparing three JavaScript libraries: Y.js, Automerge and JSON Joy, considering:

  1. Their external API: is it easy to use in our case? What are the challenging parts?
  2. Community and Support: What is the size and activity of the developer community / ecosystem?
  3. Size of the JS library, because we want to limit the impact on our users browsers.
  4. Efficiency: Probe the bandwidth when doing edits. What’s being transmitted over the wire?

I setup a demo application for each of the libraries. Everything is available in a git repository if you want to try it out yourself.

The demo application

All the demos are made against the same set of features. It

This should probably be enough for us to try out.

Here’s the whole code for this, using Leaflet - a JavaScript library for interactive maps.

import L from "leaflet";
import "leaflet/dist/leaflet.css";

// Create a map with a default tilelayer.
const map = L.map("map").setView([48.1173, -1.6778], 13);
L.tileLayer("https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png", {
    maxZoom: 19,
    attribution: "© OpenStreetMap contributors",
}).addTo(map);

// Features contains a reference to the marker objects, mapped by the uuids
let features = {};

// An upsert function, creating a marker at the passed latlng.
// If an uuid is provided, it changes the coordinates at the given address
function upsertMarker({ latlng, uuid }) {
    if (!uuid) uuid = crypto.randomUUID();
    let marker; 

    if (Object.keys(features).includes(uuid)) {
        marker = features[uuid];
        marker.setLatLng(latlng);
    } else {
        marker = new L.marker(latlng, {
            draggable: true,
            autoPan: true,
            uuid: uuid,
        });

        features[uuid] = marker;
        marker.addTo(map);
        marker.on("dragend", ({ target }) => {
            console.log("dragend");
        });
    }
}

// Add new features to the map with a click
map.on("click", ({ latlng }) => {
    upsertMarker({ latlng });
});

It does the following:

Note that the data is not “reactive” (in the sense of React apps): there is no central state that’s updated and triggers the rendering of the user interface.

Y.js

Y.js is the first library I’ve looked at, probably because it’s the oldest one, and the more commonly referred to.

The API seem to offer what we look for, and provides a way to observe changes. Here’s how it works:

import * as Y from "yjs";
import { WebsocketProvider } from "y-websocket";

// Instanciate a document
const doc = new Y.Doc();

When we add a new marker, we update the CRDT (markers.set).

let markers = doc.getMap("markers");
markers.set(target.options.uuid, target._latlng);

Another connected peer can observe the changes, like this:

markers.observe((event, transaction) => {
  if (!transaction.local) {
    event.changes.keys.forEach((change, key) => {
      let value = markers.get(key);
      switch(change.action){
          case 'add':
          case 'update':
              upsertMarker({ latlng: value, uuid: key, local: true });
              break;
          case 'delete':
              console.log(`Property "${key}" was deleted. ".`);
      }
    });
  }
});

We cycle on the received changes, and then apply them on our map. In the case of an offline peer coming back online after some time, the observe event will be called only once.

I’m not dealing with the case of marker deletions here, but deleted items are also taken into account. The data isn’t really deleted in this case, but a “tombstone” is used, making it possible to resolve conflicts (for instance, if one people deleted a marker and some other updated it during the same time).

Y.js comes with multiple “connection providers”, which make it possible to sync with different protocols (there is even a way to sync over the matrix protocol 😇).

More usefully for us, there is an implemented protocol for WebSockets. Here is how to use one of these providers:

// Sync clients with the y-websocket provider
const provider = new WebsocketProvider(
    "ws://localhost:1234",
    "leaflet-sync",
    doc
);

This code setups a WebSocket connection with a server that will maintain a local copy of the CRDT, as explained above.

It’s also possible to send “awareness” information (some state you don’t want to persist, like the position of the cursor). It contains some useful meta information, such as the number of connected peers, if they’re connected, etc.

map.on("mousemove", ({ latlng }) => {
  awareness.setLocalStateField("user", {
    cursor: latlng,
  });
});  

I made a quick proof of concept with Y.js in a few hours flawlessly. It handles offline and reconnects, and exposes awareness information.

Python bindings

Y.js has been ported to rust with the Y.rs project, making it possible to have binding in other languages, like ruby and python (see Y.py). The python bindings are currently looking for a maintainer.

Library size

The data being transmitted

In a scenario where all clients connect to a central server, which handles the CRDT locally and then transmits back to other parties, adding 20 points on one client, then 20 points in another generates ~5 Ko of data (so approximately 16 bytes per edit).

Pros:

Cons:


Automerge

Automerge is another CRDT library started by the folks at Ink & Switch with Martin Kleppmann. Automerge is actually the low level interface. There is a higher-level interface named Automerge-repo. Here is how to use it:

import { Repo } from "@automerge/automerge-repo";

const repo = new Repo();
let handle = repo.create()
// or repo.find(name)

To change the document, call handle.change and pass it a function that will make changes to the document.

Here, when a new marker is added:

handle.change((doc) => {
    doc[uuid] = cleanLatLng(target._latlng);
});

I had to use a cleanLatLng function in order to not pass the whole object, otherwise it wouldn’t serialize. It’s really just a simple helper taking the properties of interest for us (and letting away all the rest).

Another peer can observe the changes, like this:

handle.on("change", ({ doc, patches }) => {
  patches.forEach(({ action, path }) => {
    if (path.length == 2 && action === "insert") {
      let value = doc[path[0]];
      upsertMarker({ latlng: value, uuid: path[0], local: true });
    }
  });
});

The “patch” interface is a bit less verbose than the one from Y.js. It wasn’t well documented, so I had to play around with the messages to understand exactly what the possible actions were. In the end, it gets me what I’m looking for, the changes that occurred to my data.

Here I’m using path.length == 2 && action === 'insert' to detect that a change occurred to the marker. Since we don’t make a difference between the creation of a marker and its update (when being moved), it works well for us.

Python bindings

There are python bindings for automerge “core”. (It doesn’t — yet — provide ways to interact with “repos”).

Library size

This is quite a large bundle size, and the team behind automerge is aware of it and working on a solution.

The data being transmitted

In the same scenario, adding 20 points on one client, then 20 points in another generates 90 messages and 24,94 Ko of data transmitted (~12 Ko sent and ~12Ko received), so approximately 75 bytes per edit.

Pros:

Cons:


JSON Joy

JSON Joy is the latest to the party.

It takes another stake by providing multiple libraries with a small functional perimeter. It sounds promising, even if still quite new, and would leave us with the hands free in order to implement the protocol that would work for us.

Here is how to use it. On the different peers you start with different forks of the same document:

import {Model} from 'json-joy/es2020/json-crdt';
import { s } from "json-joy/es6/json-crdt-patch";

// Initiate a model with a custom ID

const rootModel = Model.withLogicalClock(11111111);

// populate it with default data
rootModel.api.root({
    markers: {},
});

// Fork it on each client
let userModel = rootModel.fork();

When adding a new marker, we can define a new constant, by using s.con

userModel.api.obj(["markers"]).set({
    [uuid]: s.con(target._latlng),
});

… and then create a patch and send it to the other peers:

import { encode, decode } from "json-joy/es6/json-crdt-patch/codec/verbose";

let patch = userModel.api.flush();
let payload = encode(patch);

On the other peers, when we receive a patch message, decode it and apply it:

let patch = decode(payload);
model.api.apply(patch);

We can observe the changes this way:

userModel.api.onPatch.listen((patch) => {
    patch.ops.forEach((op) => {
        if (op.name() === "ins_obj") {
            let key = op.data[0][0];
            let value = userModel.view().markers[key];
            upsertMarker({ latlng: value, uuid: key, local: true });
        }
    });
});

Similarly to what we did with automerge, we’re having a look into the patch, and filter on the operations of interest for us (ins_obj). The names of the operations aren’t clearly specified by the spec.

Metrics:

Pros:

Cons:


Part 3: Comparison table

I found Y.js and Automerge quite similar for my use case, while JSON Joy was taking a different (less “all-included”) approach. Here is a summary table to help read the differences I found.

Y.js Automerge JSON Joy
Python bindings Yes Yes No
Syncing Native JS structures Transactional functions Specific types (bonus points for handling constants)
Coded in JavaScript / Rust TypeScript / Rust Typescript
Awareness protocol Yes, with presence Yes, without presence No
Conflict-detection API No Yes No
Library size 24.3Ko §
1,74 mb § 143 ko

§ size of the connectors included

Working with patches

In order to observe the changes, we need to inspect the given patches and work on what we find. I found the different libraries expose different sets of APIs. All of these APIs were quite a bit hard to find, and it’s not clear if they are public or private.

One thing to keep in mind is that these “patch” events happen only once per patch received. You can see it as a “diff” of the state between the current and the incoming states.

Conclusion

The goal here is not to tell which one of these libraries is the best one. They’re all great and have their strenghs. None of them implement the high-level API I was expecting, where the clients talk with each other and the server just relays messages, but maybe it’s because it’s better in general to have the server have the representation of the data, saving a roundtrip for the clients.

I wasn’t expecting to have a look at patches to understand what changed at the low level. The way it’s currently implemented is very suitable for “reactive” applications, but require more involvement to sync between the CRDTs state and the application state.

In the end, adding CRDTs is made very simple, probably due to the fact all we really need is a sort of distributed key/value store.

I’m not sure yet which library we will end up using for uMap (if any), but my understanding is clearer than it was when I started. I guess that’s what progress looks like 😎

Extra notes

YATA and RGA

While researching, I found that the two popular CRDTs implementation out there use different approaches for the virtual counter:

  • RGA [used by Automerge] maintains a single globally incremented counter (which can be ordinary integer value), that’s updated anytime we detect that remote insert has an id with sequence number higher that local counter. Therefore, every time we produce a new insert operation, we give it a highest counter value known at the time.
  • YATA [used by Yjs] also uses a single integer value, however unlike in case of RGA we don’t use a single counter shared with other replicas, but rather let each peer keep its own, which is incremented monotonically only by that peer. Since increments are monotonic, we can also use them to detect missing operations eg. updates marked as A:1 and A:3 imply, that there must be another (potentially missing) update A:2.

Resources

#crdts , #umap , #sync - Posté dans la catégorie code