The SafeNET's stance on the CAP Theorem

ethanpailes · April 17, 2017, 7:57pm

When I first got interested in the SafeNET I did a lot of reading of various bits of documentation and source code. As I learned more, there were two feasibility questions that I could not quite wrap my head around. The first is the economics of the pay-for-put model (which I don’t want to talk about in this thread); the second is the SafeNET’s position on the CAP theorem. The theorem basically holds that in the face of a partition, a networked application can be either consistent or available, but not both. (The theorem is often presented as a choice between three properties, but you can never choose for your network not to experience partitions.) From an app developer’s point of view, it would be nice to know precisely what consistency/availability guarantees, if any, the SafeNET provides.
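To make the C-versus-A choice concrete, here is a toy sketch (not SafeNET code; `Replica`, `read_cp`, `read_ap` and `write` are hypothetical names) of two replicas where one is cut off by a partition. A consistency-first read refuses to answer on the cut-off side, while an availability-first read answers with whatever it has, even if it is stale.

```python
from dataclasses import dataclass
from typing import Optional


class Unavailable(Exception):
    """Raised by a consistency-first read that cannot confirm it is up to date."""


@dataclass
class Replica:
    # Hypothetical toy replica: 'value' is its local copy and 'partitioned'
    # means it cannot currently reach the rest of the cluster.
    value: Optional[str] = None
    partitioned: bool = False

    def read_cp(self) -> Optional[str]:
        # Consistency first: refuse to answer rather than risk a stale value.
        if self.partitioned:
            raise Unavailable("cannot confirm latest value during partition")
        return self.value

    def read_ap(self) -> Optional[str]:
        # Availability first: always answer, possibly with stale data.
        return self.value


def write(replicas, value: str) -> None:
    # Only replicas on the writer's side of the partition receive the update.
    for r in replicas:
        if not r.partitioned:
            r.value = value
```

After `write([a, b], "v2")` with `b` partitioned, `b.read_ap()` still returns the old value while `b.read_cp()` raises: exactly one of the two properties is given up on that side of the split.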

I would not be surprised if there was an RFC somewhere that talks about this in detail, but I have not been able to find it.

If there is a clear answer, and other people think this is as important as I do, it would be nice to make the answer more visible to new developers. The SafeNET functions like a database for app developers, and its approach to the CAP theorem is an important question to ask when evaluating any new distributed database technology.

dirvine · April 17, 2017, 8:13pm

It’s not a simple answer, I am afraid. We are implementing data chains (a small part of the answer) to provide high availability and, importantly, the ability for the network to accept data from any location and prove it was previously stored securely on the network. For consistency we use group consensus to return the latest data (mutable data is versioned). Immutable data, for us, is not versioned and is always the latest.
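One way to picture "group consensus returns the latest data" is a majority read over the group holding a mutable data item. The sketch below is a hypothetical illustration only: the `quorum_read` name, the strict-majority rule and the `(version, value)` reply shape are assumptions, not the network's actual consensus protocol.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# A reply from one group member: (version number, value), or None if the node
# did not answer. This reply shape is an assumption made for the sketch.
Reply = Optional[Tuple[int, str]]


def quorum_read(replies: Sequence[Reply], group_size: int) -> Optional[Tuple[int, str]]:
    """Return the (version, value) pair that a strict majority of the group reports.

    Hypothetical sketch: a stale minority of nodes cannot outvote the
    up-to-date majority. If no pair reaches a majority, return None and
    treat the item as temporarily unavailable.
    """
    quorum = group_size // 2 + 1
    counts = Counter(r for r in replies if r is not None)
    for pair, n in counts.items():
        if n >= quorum:
            return pair  # at most one pair can reach a strict majority
    return None
```

The `None` branch is where the trade-off shows up: when too few members answer consistently, the read prefers to return nothing over returning a value the group has not agreed on.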

So can we guarantee the latest version is returned? This is the question data chains seek to resolve, and they can in fact provide evidence that a data item is not the latest if that is the case. What this means is that the data identifier (think metadata) is secured in a chain of evidence; this chain (like a blockchain, but in tree form) is cryptographically verifiable. Therefore the network will know the hash of the latest version, but may not actually have the data (yet). It can therefore provide a “valid data identifier, but no data (yet)” response. A user may select an earlier version if there is a rush on it, but would not be able to update it, as the network will wait for the data to appear (through nodes restarting and republishing the data).
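A simplified sketch of the idea: a hash-linked chain of data identifiers lets anyone verify which version is the latest, even when the bytes for that version are not currently held. This is a linear chain rather than the tree form described above, SHA-256 is an assumed hash, and all names (`Link`, `append`, `verify`, `is_latest`, `get_latest`) are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


def h(data: bytes) -> str:
    """SHA-256 hex digest (an assumed stand-in for the network's hash)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Link:
    data_id: str    # hash of one version of the data: the "data identifier"
    prev: str       # link_hash of the previous link ("" for the first link)
    link_hash: str  # hash over prev + data_id, binding this link to its history


def append(chain: List[Link], content: bytes) -> None:
    """Record a new version's identifier; the content itself is stored elsewhere."""
    prev = chain[-1].link_hash if chain else ""
    data_id = h(content)
    chain.append(Link(data_id, prev, h((prev + data_id).encode())))


def verify(chain: List[Link]) -> bool:
    """Recompute every link; any tampering breaks the chain."""
    prev = ""
    for link in chain:
        if link.prev != prev or link.link_hash != h((prev + link.data_id).encode()):
            return False
        prev = link.link_hash
    return True


def is_latest(chain: List[Link], content: bytes) -> bool:
    """Evidence check: is this content the newest version the chain knows of?"""
    return verify(chain) and bool(chain) and chain[-1].data_id == h(content)


def get_latest(chain: List[Link], store: Dict[str, bytes]) -> Tuple[str, Union[str, bytes]]:
    if not chain or not verify(chain):
        raise ValueError("chain missing or failed verification")
    latest = chain[-1].data_id
    data = store.get(latest)
    if data is None:
        # The identifier is proven, but no node has republished the bytes yet.
        return ("identifier_only", latest)
    return ("data", data)
```

`is_latest` corresponds to "provide evidence a data item is not the latest", and the `"identifier_only"` result corresponds to the "valid data identifier, but no data (yet)" response.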

Therefore the theorem may in fact be challenged, or at least we will be able to provide figures on how consistent/available data is. As I say, it’s not a simple binary decision to select C or A in our case. I hope that makes sense :-)

ethanpailes · April 17, 2017, 8:23pm

Thanks!

So it sounds like the idea is to lean towards consistency, but aim for availability when it comes to metadata. It makes sense to me that we would want a blend of the two (because availability is nice for the less critical app data, but consistency seems really important for any cryptocurrency). I think I’ve heard that SafeCoin will be implemented with mutable data (which sounds like it has the right consistency semantics). It sounds like you guys have thought deeply about this (as expected)!

What is the approach to fixing places where the data diverges in data chains? If a partition occurs and the tree starts to branch is it ever possible to get things to look like a chain again? If there are already resources about data chains out there, I would be happy with just a link.

Cheers,
Ethan

dirvine · April 17, 2017, 8:27pm

Thanks for that. There will be a lot more, but for now there are a few links here and early POC code. There is background work on integrating some PBFT/Tangaroa-like semantics as well. That’s all happening in the background now and will be published soon, so you will have plenty of background to help see the decision making, the testing done, etc.

Hope it all helps

ethanpailes · April 17, 2017, 8:29pm

Absolutely. I’ve just been curious about this.

happybeing · April 17, 2017, 9:21pm

Off-topic: isn’t this a change? I thought immutable data was planned to be versioned so no file state was ever lost. It was one of the things that blew my mind in the first days of learning about the project.

ethanpailes · April 17, 2017, 11:20pm

If the same XorName resolves to a different value, it does not feel that immutable to me. I thought the idea of immutable data was that you can write it once, and that’s it. Not even the owner is allowed to delete it (so other people can rely on it always being there).
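The write-once property follows naturally from content addressing, where a chunk's name is the hash of its bytes. A minimal sketch under that assumption (SHA-256 stands in for whatever hash the network actually uses to derive a XorName; `ImmutableStore` is a hypothetical name):

```python
import hashlib
from typing import Dict


class ImmutableStore:
    """Toy write-once store: a chunk's name is the hash of its content.

    Hypothetical sketch; SHA-256 is an assumed stand-in for the network's hash.
    There is deliberately no delete or overwrite method.
    """

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        name = hashlib.sha256(content).hexdigest()
        # Storing identical content again is a no-op: same bytes, same name.
        self._chunks.setdefault(name, content)
        return name

    def get(self, name: str) -> bytes:
        content = self._chunks[name]
        # Anyone can check that the name still matches the content.
        if hashlib.sha256(content).hexdigest() != name:
            raise ValueError("content does not match its name")
        return content
```

Because the name is derived from the content, a name can never resolve to a different value; changed bytes simply produce a new name.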

dirvine · April 18, 2017, 7:31am

Immutable data itself is not versioned, so the content always hashes to the same value. The pointer to the data’s ID is a datamap, which can be stored as mutable data. Therefore the data state is always the same with immutable data. If you store a file as a straight store (immutable), then its state is never going to change. If you update it, then it’s considered a new file (the network does not know of files, just bytes).
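A rough sketch of how a mutable pointer over immutable content gives a file history without ever changing stored bytes. This simplifies a datamap to a single content hash per version (a real datamap lists several chunk hashes), and `FilePointer` and `put_immutable` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

# Stand-in for the network's immutable storage (name -> bytes).
chunks: Dict[str, bytes] = {}


def put_immutable(content: bytes) -> str:
    name = hashlib.sha256(content).hexdigest()
    chunks.setdefault(name, content)
    return name


@dataclass
class FilePointer:
    """Hypothetical mutable record holding a versioned list of immutable names.

    Simplification: each entry stands for a whole datamap.
    """

    versions: List[str] = field(default_factory=list)

    def update(self, content: bytes) -> None:
        # Updating never touches old bytes: it stores new immutable content
        # and appends its name, which bumps the version.
        self.versions.append(put_immutable(content))

    def latest(self) -> bytes:
        return chunks[self.versions[-1]]

    def at(self, version: int) -> bytes:
        return chunks[self.versions[version]]
```

Only the pointer mutates; every earlier version stays retrievable through `at()`, which is one way an app layer could expose a file's history.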

@happybeing I am not sure I am answering you here, shout if I am missing something.

happybeing · April 18, 2017, 8:04am

Thanks David, very clear even for me

Do you think we can expect the NFS API level will support access to a history of immutable versions of a file, or would that be a higher level (app provided feature)?

dirvine · April 18, 2017, 8:10am

The API is being finalised, but yes, it will provide all of that, and apps can also provide a layer to do this. I imagine the future will really be that if you see info you wish to be maintained, then you should be able to add yourself to the list of “owners”. That may be a few iterations away, but it would be valuable for retaining proof IMHO. So say I posted something weird like “aliens abducted me” and you thought, that’s unusual for David, I will copy it. The copy is OK but does not keep the signed evidence that I posted it (I am thinking more of governments etc. here). So adding yourself as an owner to the data would be cool. Of course you would mutate this away in a branch and the actual owner would mutate in line, but if the original owner deleted it, you would have the signed evidence that the original owner’s ID did in fact post it.
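The co-ownership idea can be sketched as a data item that survives until its last owner removes it, while the original owner's signed record stays attached. This is a hypothetical model of a future feature: `OwnedData` is an invented name, and `signature` is an opaque placeholder; producing and verifying a real public-key signature is out of scope here.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class OwnedData:
    """Hypothetical data item with several owners.

    'signature' stands for the original owner's signature over the content;
    creating and checking it is deliberately left out of this sketch.
    """

    content: bytes
    original_owner: str
    signature: bytes
    owners: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.owners.add(self.original_owner)

    def add_owner(self, who: str) -> None:
        # Co-owning keeps the original signed record alive, unlike a plain copy.
        self.owners.add(who)

    def delete(self, who: str) -> bool:
        """Drop 'who' as an owner; return True if no owners remain."""
        self.owners.discard(who)
        return not self.owners
```

If the original poster deletes, the item (with the poster's ID and signature) remains for the other owners; a plain copy would only preserve the bytes.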

For other things, like Wikipedia, perhaps we don’t care, and several copies regardless of owner are fine.

happybeing · April 18, 2017, 8:16am

Way ahead of me as usual David. Mind blown again. Thank you so much man.