The most surprising thing about Google Docs is that it doesn’t use a single, unified algorithm for real-time collaboration; it uses a hybrid approach.
Imagine three people editing a document simultaneously: Alice is typing "hello", Bob is deleting "l", and Carol is inserting "y" after "o". How does the document consistently end up as "heyo" for everyone, regardless of network latency or the order their operations arrive? This is the core problem real-time collaborative editors solve.
Google Docs, at its heart, uses a technique called Operational Transformation (OT). The idea is that instead of sending the raw document state, we send a sequence of operations (like "insert 'h' at position 0", "delete character at position 1", "insert 'y' at position 4"). When an operation arrives at a client that has already processed other operations since the sender’s version, it needs to be transformed.
For example, if Alice sends "insert 'h' at 0" and Bob sends "delete at 1" (which might have been "l" before Alice typed), Bob’s delete operation needs to be transformed. Since Alice inserted at position 0, Bob’s intended delete at position 1 is now effectively at position 2. So, Bob’s operation becomes "delete at 2". This transformation ensures that operations are applied in a way that respects the changes that have already occurred locally.
Here’s a simplified view of how OT might work in practice. Let’s say Alice is typing and Bob is deleting.
Initial State: abc
Alice’s Action: Inserts 'x' at position 1.
- Operation sent:
insert(1, 'x') - Local state:
axbc
Bob’s Action: Deletes character at position 2 (which is 'c').
- Operation sent:
delete(2) - Local state:
ab
Now, imagine Bob’s operation arrives at Alice’s client after Alice’s insert(1, 'x') has been applied. Alice’s client receives delete(2). Before applying it, it transforms the operation based on Alice’s local change:
- Alice’s local change:
insert(1, 'x') - Bob’s operation:
delete(2) - Transformation: Since an insertion happened before the target index of the delete, the delete index needs to be incremented. So,
delete(2)transforms todelete(3). - Applying transformed operation: Alice’s client has
axbc. Applyingdelete(3)results inaxb.
This transformation logic can get incredibly complex, especially with concurrent edits. Different OT algorithms exist (e.g., LWC, PoCS) with varying strategies for defining transformation functions and handling concurrency.
However, OT has a significant drawback: it’s notoriously difficult to implement correctly, especially for complex data structures or when dealing with rich text formatting. The transformation functions become a tangled mess of edge cases. This is where Conflict-free Replicated Data Types (CRDTs) come in.
CRDTs are an alternative approach. Instead of transforming operations, CRDTs are data structures that are designed to automatically converge to the same state on all replicas, even with concurrent updates and network partitions, without requiring a central server to coordinate. They achieve this by ensuring that operations are commutative or by using mechanisms that guarantee convergence.
Consider a CRDT counter. You can increment it on any replica, and the final value will be the sum of all increments, regardless of order. For text, a common CRDT approach is to assign a unique identifier to each character, often based on its position and the time of insertion. When characters are inserted or deleted, these identifiers ensure that the order is eventually consistent across all replicas.
Here’s a conceptual example using a character-based CRDT. Each character might have a unique ID like (site_id, timestamp, sequence_number).
Initial State: Empty document.
Alice: Inserts 'h' at the beginning.
- Character:
hwith ID(alice_id, t1, 1) - Document:
[h]
Bob: Inserts 'e' at the beginning.
- Character:
ewith ID(bob_id, t2, 1) - Document:
[e, h](IDs are compared lexicographically to determine order. Ifbob_id<alice_id,ecomes first.)
Alice: Inserts 'l' after 'h'.
- Character:
lwith ID(alice_id, t3, 2)(Sequence number increases for edits from the same site) - Document:
[e, h, l]
The beauty of CRDTs is that the order of operations doesn’t matter for convergence. If Bob’s insert of 'e' arrived after Alice’s insert of 'l', the document would still converge to the same state because the unique IDs dictate the final order.
Google Docs actually uses a combination. For simpler text edits, they might leverage OT for its performance characteristics when the transformation logic is manageable. However, for more complex scenarios, like rich text formatting, collaboration features (comments, suggestions), or when dealing with potential network issues that make OT transformations harder, they might use CRDT-like principles or specific CRDT implementations. This hybrid approach allows them to get the best of both worlds: the efficiency of OT where it’s feasible and the robustness of CRDTs where complexity demands it.
The "site ID" in CRDTs isn’t just a random number; it’s typically a unique identifier for the client session. When a document is opened by multiple users, each user gets a distinct site ID. This ID is crucial for generating unique character identifiers and for breaking ties when comparing insertion points.
One aspect often overlooked is how these systems handle the history and undo/redo functionality. In OT, undoing an operation means applying its inverse. However, if the original operation was transformed, its inverse also needs to be transformed. In CRDTs, undo can be more complex, sometimes involving keeping a log of operations or using specialized CRDTs that support reversibility.
The next challenge you’ll face is understanding how rich text formatting (bold, italics, colors) is integrated into these collaborative editing models, as it significantly complicates the transformation or convergence logic.