How SAFE Sequence CRDT Data Type is implemented

happybeing · July 20, 2020, 11:51am

I’m looking into CRDT data types for implementing a FileTree data type, and would like to understand how the existing CRDT type(s) work and what their features are.

For now I’m browsing the code and will post my notes, but any insights or summary that can be provided would be helpful (@oetyng do you know of any design notes which could be shared?).

Here’s how it works AFAIK

each vault copy of a Sequence CRDT is stored as a single file, and loaded into memory in its entirety when in use for gets (e.g. range requests) or mutation (e.g. insert/delete)
when the client accesses or mutates a Sequence CRDT, the entire object is fetched to the client; the mutation is applied to the local Sequence CRDT and also sent as an individual CRDT op to the vaults. Note: This differs from local-first implementations, where local mutations are typically accumulated within the local CRDT and batched to other peers on demand. In SAFE, mutations are propagated as soon as possible so that they are not lost if the client device shuts down (or crashes). <- assumption
the Sequence CRDT object will grow in size with every mutation even if the size of the key-value entries stays the same, because the CRDT metadata grows as it records the entire history of changes.
at some point this growth will become a problem for vaults, but even sooner for client devices and applications because of device memory needed to hold the Sequence and the ‘costs’ (e.g. latency and mobile data costs) of retrieving a copy from vaults.
it is up to the client to encrypt the content (key-value entries) of a Sequence CRDT
the CRDT metadata is, I think, not encrypted. <- assumption
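
To make the points above concrete, here is a toy sketch in Rust. It is not the SAFE implementation: `ToySequence`, `ToyClient`, `Op` and the `outbox` (a stand-in for ops sent to vaults) are all hypothetical names, and tombstones are just one simple way a sequence CRDT can end up storing more than its live entries. It shows a mutation being applied to the local replica and emitted as an individual op, and the stored size growing even when the number of live entries does not.

```rust
/// Hypothetical element id: (per-client counter, client/actor id).
type Id = (u64, u8);

/// One CRDT operation, the unit sent to other replicas.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Append { id: Id, value: String },
    Delete { id: Id },
}

/// Toy replica: deleted entries stay behind as tombstones (`None`).
#[derive(Debug, Default)]
struct ToySequence {
    slots: Vec<(Id, Option<String>)>,
}

impl ToySequence {
    /// Apply a local or remote op. Idempotent and order-independent.
    fn apply(&mut self, op: &Op) {
        match op {
            Op::Append { id, value } => {
                // Ignore an id we already know (duplicate or already deleted).
                if self.slots.iter().all(|(existing, _)| existing != id) {
                    self.slots.push((*id, Some(value.clone())));
                    self.slots.sort_by_key(|(id, _)| *id);
                }
            }
            Op::Delete { id } => {
                match self.slots.iter().position(|(existing, _)| existing == id) {
                    Some(i) => self.slots[i].1 = None, // keep the slot as a tombstone
                    None => {
                        // Delete arrived before its append: record the tombstone now.
                        self.slots.push((*id, None));
                        self.slots.sort_by_key(|(id, _)| *id);
                    }
                }
            }
        }
    }

    /// Live values, in id order.
    fn values(&self) -> Vec<&str> {
        self.slots.iter().filter_map(|(_, v)| v.as_deref()).collect()
    }

    /// Number of slots actually stored, tombstones included.
    fn stored_len(&self) -> usize {
        self.slots.len()
    }
}

/// Toy client: mutates its local replica and queues each op for the vaults.
struct ToyClient {
    actor: u8,
    counter: u64,
    replica: ToySequence,
    outbox: Vec<Op>, // assumption: stands in for the network send
}

impl ToyClient {
    fn new(actor: u8) -> Self {
        ToyClient { actor, counter: 0, replica: ToySequence::default(), outbox: Vec::new() }
    }

    fn append(&mut self, value: &str) -> Id {
        self.counter += 1;
        let id = (self.counter, self.actor);
        self.mutate(Op::Append { id, value: value.to_string() });
        id
    }

    fn delete(&mut self, id: Id) {
        self.mutate(Op::Delete { id });
    }

    /// Apply locally first, then emit the same op for other replicas.
    fn mutate(&mut self, op: Op) {
        self.replica.apply(&op);
        self.outbox.push(op);
    }
}
```

After appending two entries and deleting one, `values()` returns a single entry while `stored_len()` is still two, and a second `ToySequence` fed the same ops in any order ends up in the same state.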

Let me know if I’ve got this wrong, or make corrections directly and post a reply noting the change.

@maidsafe all hints appreciated, thanks.

This post is a Wiki, so feel free to correct or add to it, with refs where useful.

bochaco · July 21, 2020, 1:38pm

We added a memory storage for keeping the local replica: https://github.com/maidsafe/safe-client-libs/blob/master/safe_core/src/client/mod.rs#L822

happybeing · July 21, 2020, 2:17pm

Thanks. This gives more impetus to questions I have about Sequence size. My assumption is that, as currently implemented, Sequence data will grow with each mutation even if the number of entries remains small, and that at some point this will create difficulties. This was already a question for me given that each Sequence is stored as a single file on a vault, but it matters more if the whole Sequence has to be fetched and cached in the client.

Apps could attempt to keep their size small by creating new copies, but this could be tricky to anticipate and manage because a sensible size limit will depend very much on the end user’s client device (e.g. mobile) and network connection (e.g. bandwidth, data cost).
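
As a rough illustration of that app-side workaround, here is a hypothetical Rust sketch. `SizedSequence`, `approx_size`, `fresh_copy` and `should_copy` are invented for this example and are not part of any SAFE API, and the per-op overhead and budget are made-up numbers that each app would have to choose per device and connection.

```rust
/// Hypothetical stand-in for a Sequence: live entries plus a count of
/// every mutation ever recorded in its history.
#[derive(Debug, Default)]
struct SizedSequence {
    entries: Vec<String>,
    ops_recorded: usize,
}

impl SizedSequence {
    fn append(&mut self, value: &str) {
        self.entries.push(value.to_string());
        self.ops_recorded += 1;
    }

    fn remove_last(&mut self) {
        // A removal shrinks the live entries but still adds to the history.
        if self.entries.pop().is_some() {
            self.ops_recorded += 1;
        }
    }

    /// Rough fetch cost: payload bytes plus an assumed fixed overhead per op.
    fn approx_size(&self, bytes_per_op: usize) -> usize {
        let payload: usize = self.entries.iter().map(|e| e.len()).sum();
        payload + self.ops_recorded * bytes_per_op
    }

    /// New copy holding only the live entries, so its history is one op each.
    fn fresh_copy(&self) -> SizedSequence {
        let mut copy = SizedSequence::default();
        for entry in &self.entries {
            copy.append(entry);
        }
        copy
    }
}

/// True when the estimated size exceeds what this device is willing to fetch.
fn should_copy(seq: &SizedSequence, budget_bytes: usize, bytes_per_op: usize) -> bool {
    seq.approx_size(bytes_per_op) > budget_bytes
}
```

The sketch leaves out the hard parts: the new copy would live at a different address, and other writers would need some way to learn about it.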

So it would be preferable for the ‘cost’ of Sequence data retrieval and mutation to be kept low by the SAFE API. Are there any plans on how to do this? I guess one reason I imagined that SAFE was not keeping a local copy is that it seemed a neat solution to just send the mutations to the vaults. Was that considered?

Understanding Maidsafe’s implementation decisions for Sequence CRDT could help a lot with designing a solution for a FileTree CRDT, which I’d like to contribute to. Alternatively I can leave that side with @danda, but if you feel it is worthwhile me having a better understanding I’m keen to learn.

EDIT: I’ve updated the OP with my current understanding, so please fix it or let me know if I have anything wrong. Note: I have stated, perhaps incorrectly, that Sequence CRDT does NOT work like the local-first CRDTs as described by MK.

bochaco · July 21, 2020, 3:54pm

We haven’t taken decisions around all those aspects, I think we will eventually need to support them all. We are just going step by step, keeping it simple to first support the basic or most common scenario, then we can evolve it. E.g. right now trying to get the Policies mutations to work properly on CRDT land atop current Seq impl.

happybeing · July 21, 2020, 4:01pm

This is obviously sensible. I’m trawling for information because there’s internal knowledge, discussion and docs which I’m not aware of, so my problem is that I don’t know what is there, and I may continue with annoying questions from time to time. Sorry!

I think I have now got a pretty good idea of Sequence CRDT, but if you could read and comment on the two “<- assumptions” in the OP that would help. I’m keeping it updated as a reference.

Thank you.