model-based systems engineering (MBSE) Eye tracking in VR: Everything you need to know
X
Definition

conflict-free replicated data type (CRDT)

What is a conflict-free replicated data type (CRDT)?

A conflict-free replicated data type (CRDT) is a data structure that lets multiple people or applications make changes to the same piece of data. In metaverse applications, CRDTs let one person make a change in a shared environment, such as the location of a piece of equipment, and have the change automatically propagated to everyone else in the same virtual world. CRDTs are widely used in collaborative applications such as Google Docs, Apple Notes and the design tool Figma to ensure changes made by one user are seen by all.

CRDTs are also used in certain distributed systems to propagate changes across databases and file systems shared by multiple participants. For example, in peer-to-peer applications, the InterPlanetary File System (IPFS) uses CRDT to ensure consistency of file changes on different servers. Redis, an open source in-memory data structure store, also uses IPFS to update database changes on multiple servers staged closer to various users.

The data structure is a good choice for applications that need to support concurrent updates. For example, when two users make different edits to the same sentence in Google Docs, CRDT provides a simple mechanism for conflict resolution. It is also relatively easy to implement. It can deliver a reasonable degree of consistency, but it's not perfect and is unsuitable for high-stakes financial systems, systems of record changes and other systems that require strong consistency guarantees. Alternative consensus protocols and algorithms such as two-phase commit (2PC), Paxos or Raft might be better choices in these cases.

Use cases

Common use cases for CRDTs include the following:

  • Chat applications.
  • Collaborative editing.
  • Auctions.
  • Data replication.
  • Gaming.
  • Shared virtual environments.
  • Map updates.
  • Distributed streaming applications.
  • Distributed caching.
  • Internet of things (IoT) data synchronization.
  • Desktop and mobile synchronization.

Types of CRDT

There are three broad types of CRDT to propagate changes for different copies of data. They include the following:

  1. Operation-based CRDTs focus on propagating updates across the different copies of the data and are more efficient since they only send changes. However, operation-based CRDTs require a more sophisticated broker to ensure changes aren't lost or duplicated owing to a bad internet connection or lost wireless signal, which could result in multiple retries.
  2. State-based CRDTs send the complete state across the network, which is merged with the copies for each client. State-based CRDTs are easier to develop but require significantly more network and processing overhead.
  3. Delta-state CRDTs are a hybrid between state- and operation-based CRDTs. They distill recent state changes and only send these rather than the entire state. They're more efficient than state-based approaches and less efficient than operation-based approaches. They also have more local processing and complexity to distill local changes.

Data consistency challenges

CRDTs can be challenging to work with, particularly when they're used in distributed environments. The following are some issues to be aware of:

  • Network failures can break updates.
  • Latency can confound the order of updates.
  • Offline updates might not propagate correctly.
  • Simultaneous editing can scramble changes.
  • Updates can conflict with one another.
  • Some kinds of updates can't be represented.

Eventual consistency vs. strong consistency

When multiple people work on the same document or collaborate in a shared world, it's important to keep everyone on the same page, a state known as consistency. But sometimes, things can get out of sync. For example, if you move the location of a piece of equipment to the right, and your colleague moves it up, where should it be once all the changes sync up? And what happens when two people put different virtual items in the same spot?

A spectrum of consistency in the progression of changes has the following ranges:

  • At one end, eventual consistency guarantees that all copies of the data eventually converge to the same state. This might be fine when the pace of updates is relatively slow, and people aren't changing the same things simultaneously.
  • In the middle, strong eventual consistency ensures that all copies of the data converge once everyone receives the same updates. This works well for simple things like numerical counters or incremental changes to different parts of a document, but problems can occur when various types of changes are possible, as in a 3D shared virtual environment.
  • At the other end of the spectrum, strong consistency ensures that all changes are made on a copy that accurately reflects everyone else's changes. Although this ensures everyone stays on the same page, you have to wait for everyone else to complete editing before you can start making a change.

Different types of CRDTs can be implemented using various algorithms to enforce these levels of consistency. CRDTs generally excel at eventual and strong eventual consistency, but they aren't ideal for strong consistency.

How are CRDTs developed?

Developing a CRDT starts by breaking down a specific problem into some combination of components that can then be combined into data structures to create the desired user experience and level of consistency. Those components include the following:

  • A counter is useful to capture incremental changes, such as the number of letters in a specific change in a document.
  • A set reflects a collection of things, such as all of the characters in a word, sentence or paragraph when considered as a whole.
  • A sequence captures the location of things, such as where a paragraph appears in a document.

Various CRDT algorithms are better suited for making changes across these basic constructs, so it's important to figure out which algorithms to use and their order of execution. Different combinations will make sense depending on the use case. For example, for a shopping list, it might make sense to start with algorithms that prioritize adding new items. In a document editor application, it could be better to prioritize algorithms that fluidly update individual words or characters. A collaborative spreadsheet might need to prioritize changes to individual cells, while a shared database might prioritize changes to individual records.

After identifying the best combination of algorithms, you must test the implementation to identify any problems for the intended use case. For example, in a document editor, testing might prioritize a quick response but also examine how concurrent changes to the same sentence appear. It's also important to explore what happens when the connection between multiple windows is slowed or temporarily blocked to see how this affects experience and eventual consistency.

This was last updated in February 2024

Continue Reading About conflict-free replicated data type (CRDT)

Dig Deeper on Digital transformation