RFC 47 – MutableData

frabrunelle · November 15, 2016, 6:06pm

Discussion topic for RFC 47 – MutableData

tfa · November 16, 2016, 4:52pm

I don’t see any mention of removal of a whole mutable data. Can you confirm that they cannot be deleted? (That would be ok with me)

Another thing about deletion: will deletion of a value be implemented by emptying the content and incrementing the entry version? (That would be coherent with structured data deletion and would give an actual meaning to the entry version)

Qi_Ma · November 17, 2016, 6:15am

In the permissions section,

pub enum Permission {
    Allow(User),
    Deny(User),
}

should be

pub enum Permission {
    Allow(Action),
    Deny(Action),
}

Insert, Update, Delete and ManagePermissions are defined. Are they ordered, as in update < insert < delete < ManagePermissions? Or are they separate toggles that have to be enabled or disabled individually?

Also, permissions apply to the whole data field of an MD. When a mutable_data represents a forum thread (each data entry represents a post), doesn’t that mean any user can modify any other user’s post?

ustulation · November 17, 2016, 2:59pm

We are not supporting deletion in the first implementation to keep things simple, but we will after that (someone will hopefully inform when that happens).

Yes that is correct.

happybeing · November 17, 2016, 5:37pm

I’m liking this a lot

permissions: {
    User::Anyone → [
        Permission::Allow(Action::Insert),
        Permission::Deny(Action::Update),
    ],
    User::Key(K1) → [
        Permission::Deny(Action::Insert),
    ],
}

Notice that as User::Anyone allows everyone to update comments and User::Key(K1) forbids only insertion of new comments, it will still allow an owner of the key K1 to update his comments.

Should the above be as follows?

permissions: {
    User::Anyone → [
        Permission::Allow(Action::Insert),
        Permission::Allow(Action::Update),
    ],
    User::Key(K1) → [
        Permission::Deny(Action::Insert),
    ],
}
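A minimal sketch of how such a permission map might be evaluated. The lookup order is an assumption inferred from the quoted RFC sentence (an explicit rule for the specific key wins, otherwise the User::Anyone rule applies, otherwise the action is denied). The u64 key stand-in and the is_allowed, explicit_rule and forum_permissions names are hypothetical and not part of the RFC.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Action { Insert, Update, Delete, ManagePermissions }

/// `u64` stands in for a real public key (assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum User { Anyone, Key(u64) }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Permission { Allow(Action), Deny(Action) }

pub type Permissions = BTreeMap<User, Vec<Permission>>;

/// Returns Some(true/false) if the list holds an explicit rule for `action`.
fn explicit_rule(rules: &[Permission], action: Action) -> Option<bool> {
    rules.iter().find_map(|p| match *p {
        Permission::Allow(a) if a == action => Some(true),
        Permission::Deny(a) if a == action => Some(false),
        _ => None,
    })
}

/// Assumed resolution order: key-specific rule, then Anyone, then deny.
pub fn is_allowed(perms: &Permissions, key: u64, action: Action) -> bool {
    let for_user = |u: User| perms.get(&u).and_then(|r| explicit_rule(r, action));
    for_user(User::Key(key))
        .or_else(|| for_user(User::Anyone))
        .unwrap_or(false)
}

/// The corrected forum example: anyone may insert and update, K1 may not insert.
pub fn forum_permissions(k1: u64) -> Permissions {
    let mut perms = Permissions::new();
    perms.insert(
        User::Anyone,
        vec![Permission::Allow(Action::Insert), Permission::Allow(Action::Update)],
    );
    perms.insert(User::Key(k1), vec![Permission::Deny(Action::Insert)]);
    perms
}
```

Under these assumptions each action is an independent toggle: K1 is refused Insert but still falls through to the Anyone rule for Update, which matches the sentence quoted from the RFC.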

Versions

Some things are unclear to me here.

The RFC implies that versions are purely for network use by saying they are needed due to churn, but the client interface uses them, so they aren’t just for churn. For example: set_mutable_data_user_permissions() has a version parameter. Is this method conditional on finding a matching version?

Is there a relationship between the version of a Value, and the version of its MD? If none, what is the purpose of the former? I think the RFC should say when/how a Value’s version is changed, and maybe the initial value for a version for both MD and Value.

If they are independent, is it possible to make mutation of a Value conditional on its version? This doesn’t seem to be the case yet but could be useful for shared access.

And/or conditional on the version of the Value’s MD? Not sure, but expect it would be useful.

EDIT: BTW great job on these RFCs, they are very clear and concise.

ben · November 17, 2016, 10:32pm

Yes.

But also, it should say

As it isn’t bound to whether they have previously touched them.

Re Version:

In general, when we say “churn”, the meaning is “in order to find consensus quickly” (as more work there causes more network churn). The same is meant here. For certain activities (like deleting an entry), referencing the version lowers network churn, because nodes can agree on things more quickly. Essentially the group agrees on a new value with the latest version, and the highest version always wins. Just take the scenario of deleting data without a version: which one is the current one? The deleted entry or the one that holds data? Well, if you have a version you know: the one with the higher number.

By supplying these from the client side, certain activities can be sped up and made to work more reliably.
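The “highest version always wins” rule can be sketched as below. This only illustrates the comparison; it is not how vaults actually reach consensus. The Entry type and the resolve function are hypothetical names, and a deleted entry is modelled as empty content.

```rust
/// One candidate state of an entry as seen by some node (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub content: Vec<u8>,
    pub version: u64,
}

/// Pick the candidate with the highest version; None if there are no candidates.
pub fn resolve(candidates: &[Entry]) -> Option<&Entry> {
    candidates.iter().max_by_key(|e| e.version)
}
```

With a populated entry at version 2 and an emptied (deleted) entry at version 3, resolve returns the deleted one, so there is no ambiguity about which state is current.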

Secondly, there are two distinct versions in the MutableData now: the MD-version and the individual value-version. They are unrelated to one another: value versions change with any update or delete of that specific key-value, whereas MD-versions only change when the metadata changes, that is the owner or the permissions. Thus they are meant to keep track of each part separately.

No, not right now. What use case are you thinking of? We were thinking that it is handier to be able to change both independently and not have to fetch that meta information just to do an update, especially since the MD-version is linked to the metadata and thus barely of concern when you change specific keys. Do you have a specific use case that would require that? We’d be interested to learn about that.
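A rough sketch of the two independent counters, assuming both start at 0 (the thread does not state the initial value) and that deletion empties the content and bumps the entry version, as confirmed earlier in the thread. The u64 owner stand-in and the method names are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Value {
    pub content: Vec<u8>,
    /// Bumped on every update or delete of this key only.
    pub entry_version: u64,
}

#[derive(Debug, Default)]
pub struct MutableData {
    pub data: BTreeMap<Vec<u8>, Value>,
    /// `u64` stands in for a public key (assumption).
    pub owner: u64,
    /// Bumped only when metadata (owner, permissions) changes.
    pub version: u64,
}

impl MutableData {
    /// Insert a new entry at entry version 0 (assumed initial value).
    pub fn insert(&mut self, key: &[u8], content: &[u8]) -> bool {
        if self.data.contains_key(key) {
            return false;
        }
        let value = Value { content: content.to_vec(), entry_version: 0 };
        self.data.insert(key.to_vec(), value);
        true
    }

    /// Replace the content and bump the entry version; MD version untouched.
    pub fn update(&mut self, key: &[u8], content: &[u8]) -> bool {
        match self.data.get_mut(key) {
            Some(v) => {
                v.content = content.to_vec();
                v.entry_version += 1;
                true
            }
            None => false,
        }
    }

    /// Deletion modelled as emptying the content and bumping the version.
    pub fn delete(&mut self, key: &[u8]) -> bool {
        self.update(key, &[])
    }

    /// A metadata change bumps the MD version and no entry version.
    pub fn change_owner(&mut self, new_owner: u64) {
        self.owner = new_owner;
        self.version += 1;
    }
}
```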

happybeing · November 17, 2016, 10:49pm

The situation is not specific, but a general case of shared writable data, like the HTTP If-Match / If-None-Match headers.

So I’m suggesting it as an optional condition rather than a requirement. So a write could either be purely permission based, or permission and optional “only this version”, for example.
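To make the suggestion concrete, here is a sketch of an update with an optional “only this version” condition. It is not part of the RFC; update_entry, WriteError and the expected_version parameter are hypothetical, and permission checks are left out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Value {
    pub content: Vec<u8>,
    pub version: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    NoSuchKey,
    /// The caller's expected version did not match the stored one.
    VersionMismatch { current: u64 },
}

/// Update `key`. If `expected_version` is Some, the write only succeeds when
/// it equals the stored version; None means an unconditional write.
/// Returns the new version on success.
pub fn update_entry(
    data: &mut BTreeMap<Vec<u8>, Value>,
    key: &[u8],
    content: &[u8],
    expected_version: Option<u64>,
) -> Result<u64, WriteError> {
    let value = data.get_mut(key).ok_or(WriteError::NoSuchKey)?;
    if let Some(expected) = expected_version {
        if expected != value.version {
            return Err(WriteError::VersionMismatch { current: value.version });
        }
    }
    value.content = content.to_vec();
    value.version += 1;
    Ok(value.version)
}
```

If two editors both read version 0 and both submit a conditional write, the first succeeds and the second gets VersionMismatch instead of silently overwriting the first edit.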

Given the values are big enough to be small to medium sized files, this could have quite a few uses around synchronised data stores or shared documents.

How about several users editing a document, where each value holds a small group of words ?

Remember that Google thing they canned a few years back? Google Wave - it was awesome. You could type in one language and others viewing the document could see you typing character by character anywhere in the document they were also working on. With a plugin, your text was translated into their language on the fly as you typed. One big problem was it was too centralised to scale. Suddenly…

ben · November 18, 2016, 10:58am

Let’s continue the discussion here, rather than Authentication Flow, as it is more applicable to here than to the flow itself.

I agree with you and we have considered this case. In particular to make requests and permissions based on Regular Expressions for keys. So, you could ask to only retrieve all files matching safe-* or allow write requests only based on those.
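As an illustration of the kind of key matching meant here, the sketch below selects entries whose keys start with a given prefix such as safe-. A plain prefix match is used as a stand-in for regular expressions to keep the example dependency-free, and it assumes keys are stored in plaintext. The keys_with_prefix name is hypothetical.

```rust
use std::collections::BTreeMap;

/// Return all keys starting with `prefix`, in the map's sorted order.
/// Assumes plaintext keys; encrypted keys would not match any prefix.
pub fn keys_with_prefix<'a>(
    data: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<&'a [u8]> {
    data.keys()
        .filter(|k| k.starts_with(prefix))
        .map(|k| k.as_slice())
        .collect()
}
```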

In order to be able to finish something, we have left this out for now, but it is designed in a way to allow this extension later. However, this isn’t going to be trivial to make happen once you incorporate the encryption, which is transparent to vaults. It makes keys random and hard to match. Thus it would only work for unencrypted data, which we assume to be the smaller amount of data, actually. So, that’s a problem.

So, we’ve pushed this idea down the road. We are getting this system done for now and will see how we want to extend it specifically once we have gained more experience of how it behaves and what apps need.

happybeing · November 18, 2016, 11:34am

Thanks @ben this is getting more interesting. I’m guessing this is a big step towards the kind of NoSQL service that David has had in mind all along. I can see it better now.

Just for clarification, I hadn’t considered match of keys etc. At this point I was only thinking of writes having the option to fail if not matching version etc. But I don’t fully understand what I’m talking about, so may be using wrong terminology.

bochaco · November 24, 2016, 8:37pm

Would it make sense to have a “Read” Action to allow/disallow fetching data from a MutableData?
I’m thinking of the case where you want to deny an app even reading data from this structure.

Also, will the MutableData provide an encryption key like a private AppendableData does? (this can be probably covered if the “Read” Action is added). In other words, can I implement the email app with the MutableData as is?

Viv · November 25, 2016, 3:01pm

It was certainly something that was discussed in the initial stages; however, it didn’t seem to fit in with the network very well. Fundamentally, data in the system is protected via the data itself: GETs are open and available to anyone, and the content is protected via encryption (self-encryption or otherwise). It also highlights that data in the network is stored by nodes that the owner does not have to trust individually. Thus just having access to the content shouldn’t compromise it. In this case, vaults storing the data would still have access to the actual data they’re storing, even if they aren’t given read access. While the owner might be under the impression that they set the “read” permission to nobody, that doesn’t quite hold anymore, and to keep the data secure they end up needing to keep it encrypted so that even the vaults storing it don’t get to understand the content. At that point the “Read” permission kinda becomes redundant.

In such cases the location (DataIdentifier) in the network where this data is stored can itself be kept undisclosed, and even if someone does know the location, when the data is encrypted, having the data doesn’t get them anything but cipher text, regardless of whether it is another client fetching it or the multiple vaults storing it.

That should certainly get provided (via the authenticator or the data itself holding a pre-defined key), depending on the MutableData in question (created by the authenticator) and whether its content is encrypted.

This certainly should be possible and we’ll be getting all the tutorials updated ourselves to the new approach too once the initial impl gets further along ofc.

bochaco · November 25, 2016, 6:21pm

Thanks @Viv for your detailed explanation, it makes sense.

I’m not following this part. Why do you mention the authenticator? Who provides the encryption key for an AppendableData with Private set to true? Is it currently the Launcher?

Viv · November 28, 2016, 2:17pm

Reason I mention authenticator is cos in the new auth flow proposal, there are “containers” (MutableData objects pretty much) that the Authenticator is expected to create by default for a user when they create an account. It’s detailed in the supplementary section for RFC New Auth flow - Containers. Hope that helps

digipl · December 1, 2016, 3:14pm

In the “RFC New Auth flow - Containers” I read that:

It is recommended that the app should encrypt all data it isn’t intending to publicly share with the encryption key it was given for its AppContainer.

As I understand, except by doing a deep analysis of the code or data, we cannot know if the App is encrypting the data or not. Even worse, we don’t know if the App is encrypting the data with the Authenticator-assigned encryption key or with its own key, hijacking the user data.

Is that right or am I missing something?

ben · December 1, 2016, 4:11pm

No, you aren’t missing anything. Once we hand over access via an SDK, anyone can modify the source of that SDK and write whatever and wherever they want (as that App). We can’t enforce any encryption on those from the network layer, especially not any specific one, that the very same vaults - for security purposes - shouldn’t even know of.

However, the RFC can state what is expected and generally promote good citizenship. The same we will do in the SDK, so unless you actively disable it by patching, the SDK will transparently encrypt all information the app stores for them. Thus we can expect that for the vast majority of apps (basically all using our vanilla SDK), the recommendation actually holds.

But even in that case an app could potentially encrypt data on their side in some other form. There will never be a way to prevent that or an App hiding data from its users.

digipl · December 1, 2016, 4:42pm

Thanks for your answer. This behaviour reassures me because I fear laziness more than evil.

tfa · December 25, 2016, 10:47pm

This topic is created as a continuation of a post about features of SDs and MDs in the other forum.

MDs are supposed to replace SDs but the new structure is far from perfect:

There are features that were present in SDs but are removed in MDs, like the explicit signature list and the possibility to add the receiver’s signature in this list (to allow proof of receiver acceptance of transfer)
There are features that were absent in SDs and are still not present in MDs like the possibility to store the history of transactions on an object.

So, I will throw out some ideas about missing features to design the ultimate MD structure and then the community can discuss and improve them. I know this should merit an RFC but I don’t have time to write one.

A preliminary remark: MDs are not that bad because:

They keep the sane SDs basic principle: an object is identified by a tag and a name in the global XOR namespace
They add the very useful notion of sub-objects within such an object (the values of the MDs).

That said, choices in the implementation (like the separation between the values and the owners’ keys and the absence of signatures) limit the potential use cases for MDs:

Usage of an enum wrapping both the owners’ keys and the values would allow some other use cases.

Use case: current MDs:

First, the enum could be initialized by a variant implementing exactly what is currently defined by Maidsafe (modifications and ownership transfers governed by a set of rules and owners globally applicable to a MD):

enum MutableDataEnum {
    GlobalOwnership {
        data: BTreeMap<Vec<u8>, Value>,
        permissions: BTreeMap<User, PermissionSet>,
        version: u64,
        owner: PublicKey,
    },
    // Other variants …
}

pub struct MutableData {
    /// Network address
    name: XorName,
    /// Type tag
    tag: u64,
    /// Data variant
    data: MutableDataEnum,
}

Use case: defunct SDs + partial transfers:

Then we could add a variant to allow modifications and ownership transfers on each value. The creators of the MD would be the initial owners of all the values. With only one value the use cases would be the same as those of SDs; a compatibility layer could even be implemented over such MDs to revive defunct SDs. But with several values this variant allows SDs on steroids, with partial ownership transfers on a value-by-value basis.

    PartialOwnership {
        data: BTreeMap<Vec<u8>, PartialData>,
    },

Where PartialData structure is defined by:

pub struct PartialData {
    data: Vec<u8>,
    version: u64,
    owner: PublicKey,
}

No permission set is defined because:

the owner of an object can do whatever he/she wants with it: modify it, transfer ownership or delete it
no new values can be added (the initial creator of the MD object defines the complete list of values)
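A sketch of how the PartialOwnership rules above might be enforced, under the assumption that every successful change bumps the value's version. PublicKey is a u64 stand-in, and OwnershipError, owned_by, modify, transfer and owner_of are hypothetical names used only for illustration.

```rust
use std::collections::BTreeMap;

/// Stand-in for a real public key type (assumption for this sketch).
pub type PublicKey = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct PartialData {
    pub data: Vec<u8>,
    pub version: u64,
    pub owner: PublicKey,
}

#[derive(Debug, PartialEq)]
pub enum OwnershipError { NoSuchKey, NotOwner }

pub struct PartialOwnership {
    data: BTreeMap<Vec<u8>, PartialData>,
}

impl PartialOwnership {
    /// The creator fixes the complete key list and initially owns every value.
    pub fn new(creator: PublicKey, keys: Vec<Vec<u8>>) -> Self {
        let data = keys
            .into_iter()
            .map(|k| (k, PartialData { data: Vec::new(), version: 0, owner: creator }))
            .collect();
        PartialOwnership { data }
    }

    /// Fetch a value mutably, but only for its current owner.
    fn owned_by(&mut self, key: &[u8], requester: PublicKey)
        -> Result<&mut PartialData, OwnershipError>
    {
        let value = self.data.get_mut(key).ok_or(OwnershipError::NoSuchKey)?;
        if value.owner != requester {
            return Err(OwnershipError::NotOwner);
        }
        Ok(value)
    }

    pub fn modify(&mut self, key: &[u8], requester: PublicKey, data: Vec<u8>)
        -> Result<(), OwnershipError>
    {
        let value = self.owned_by(key, requester)?;
        value.data = data;
        value.version += 1;
        Ok(())
    }

    pub fn transfer(&mut self, key: &[u8], requester: PublicKey, new_owner: PublicKey)
        -> Result<(), OwnershipError>
    {
        let value = self.owned_by(key, requester)?;
        value.owner = new_owner;
        value.version += 1;
        Ok(())
    }

    pub fn owner_of(&self, key: &[u8]) -> Option<PublicKey> {
        self.data.get(key).map(|v| v.owner)
    }
}
```

There is deliberately no insert method: the key set is fixed at creation, which is the restriction the next use case tries to work around.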

Use case: previous use case + divisibility

To work around the impossibility of adding a new value in the previous use case, we could create a special variant with pairs of 2 integers as keys. It would allow an owner of a sub-object to split it into several parts. The owner could then transfer these parts to different owners. For example, the owner of the (10, 29) sub-object could split it into 2 objects, (10, 21) and (22, 29), and then transfer (10, 21) to a new owner but keep ownership of (22, 29). The routing crate must be able to read the keys to validate divisions.

    DivisibleOwnership {
        data: BTreeMap<(u64, u64), PartialData>,
    },

Where PartialData structure is the same as above.
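The division check could look roughly like the sketch below: a range (start, end) may only be replaced by two adjacent ranges that cover it exactly, and only by its owner. That both halves inherit the data and owner and restart at version 0 is an assumption, as are the split function and the SplitError type. PublicKey is a u64 stand-in.

```rust
use std::collections::BTreeMap;

/// Stand-in for a real public key type (assumption for this sketch).
pub type PublicKey = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct PartialData {
    pub data: Vec<u8>,
    pub version: u64,
    pub owner: PublicKey,
}

#[derive(Debug, PartialEq)]
pub enum SplitError { NoSuchRange, NotOwner, BadSplitPoint }

/// Split `(start, end)` into `(start, mid)` and `(mid + 1, end)`.
pub fn split(
    map: &mut BTreeMap<(u64, u64), PartialData>,
    range: (u64, u64),
    mid: u64,
    requester: PublicKey,
) -> Result<(), SplitError> {
    let (start, end) = range;
    // Both halves must be non-empty and together cover the original exactly.
    if mid < start || mid >= end {
        return Err(SplitError::BadSplitPoint);
    }
    match map.get(&range) {
        None => return Err(SplitError::NoSuchRange),
        Some(p) if p.owner != requester => return Err(SplitError::NotOwner),
        Some(_) => {}
    }
    let original = map.remove(&range).expect("presence checked above");
    // Assumption: each half keeps the data and owner and restarts at version 0.
    let half = PartialData { data: original.data, version: 0, owner: original.owner };
    map.insert((start, mid), half.clone());
    map.insert((mid + 1, end), half);
    Ok(())
}
```

Splitting (10, 29) at 21 yields exactly the (10, 21) and (22, 29) keys from the example; each half can then be transferred independently.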

Use case: legal applications (for property titles management and audit systems).

Finally, we could add a variant to allow tracing of all modifications and ownership transfers. The aim is to retain the complete history of the object. The keys of sub-objects are the version numbers, and the routing crate must be able to read them to validate a new version. Signatures are added to each value to prove the history of ownership. Contrary to other variants where all values could be modified or deleted in parallel, here none can be modified or deleted. Values can only be added one by one, and each new value is a new version of the whole object with new data and/or new owners.

    HistoricalOwnership {
        data: BTreeMap<u64, HistoricalData>,
    },

Where HistoricalData structure is defined by:

pub struct HistoricalData {
    data: Vec<u8>,
    owners: BTreeSet<PublicKey>,
    /// Sign data is composed of MD name, MD tag, u64 key of value (version), data and owners
    /// It must be signed by a majority of owners of previous version and
    /// all the new owners (so that no one can add someone else public key)
    /// all the signatures are verified by vaults
    signatures: BTreeMap<PublicKey, Signature>,
}
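The signing rule in the comment above (a majority of the previous owners plus all of the new owners) can be sketched as a set check. Cryptographic verification is out of scope here: signers is assumed to already contain only keys whose signatures were verified. PublicKey is a u64 stand-in, has_required_signatures is a hypothetical name, and “majority” is taken to mean strictly more than half.

```rust
use std::collections::BTreeSet;

/// Stand-in for a real public key type (assumption for this sketch).
pub type PublicKey = u64;

/// `signers` holds the keys whose signatures were already verified.
/// Returns true if a strict majority of `prev_owners` and every one of
/// `new_owners` appear among the signers.
pub fn has_required_signatures(
    prev_owners: &BTreeSet<PublicKey>,
    new_owners: &BTreeSet<PublicKey>,
    signers: &BTreeSet<PublicKey>,
) -> bool {
    let prev_signed = prev_owners.intersection(signers).count();
    let majority_of_previous = prev_signed * 2 > prev_owners.len();
    let all_new_owners = new_owners.is_subset(signers);
    majority_of_previous && all_new_owners
}
```

Requiring every new owner to sign is what prevents someone from adding another person's public key without their consent.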

I didn’t try to find a solution that combines both divisibility and history retention. What I propose here seems enough of a step forward compared to current MDs, which implement neither of them.

ben · January 2, 2017, 10:14am

I’ve split 19 posts into the new topic Using new (modified) MData to Model Safecoin, as they were diverging the conversation away from the Mutable Data at hand to the specifics of a Safecoin version on top of it.

ben · January 2, 2017, 10:24am

Could you name other use cases that are not possible with the currently proposed structure?

The Safecoin use case is a very specific one and although we might take some learnings from this model made here, it will be a first-class data type in the network and thus have its own datatype and own behaviours. There is no reason to adapt this general purpose datatype to fit those specific needs (especially at this point).

In order to be able to ship anything, we’ve also decided to not take care of history for the time being. Being able to change multiple keys simultaneously is already quite a challenge, so we have to tackle history support later.

That said, I am very interested in discussing other use cases and problems you see that we might be able to solve before releasing this data type. In the current description there is a proposed alternative, but I don’t see any use case/problem it solves that the current structure might not also solve, which makes the variants rather abstract and hard to reason about. Would you have some other use cases/problems that are not covered by the current structure?

tfa · January 2, 2017, 10:44pm

Instead of specific ad hoc mutable data types for each use case, variants of a common base type could be defined because many parts in their management don’t depend on the variant and only need to know the type tag, the xor name and the data part. But I don’t have a strong opinion on the matter, if Maidsafe prefers distinct types, then let it be.

Notes:

Version number is excluded from this base because in many use cases MDs don’t have a global version.
Owner keys are excluded from this base because in many use cases the ownership unit is the individual value.
I have edited my proposal and changed the set of owners to only one owner in all but the HistoricalData variant (I modelled this on the current MD structure, but I doubt that multi-ownership can be handled without signatures).

PartialOwnership variant can be used when an initial owner creates a fixed set of related objects that can be transferred individually. At any point in time such an object has one owner. This owner can modify the data part. It can also decide to transfer the ownership to a new owner. Concrete examples are corporation shares, condominium units, pre-mined altcoins … anything that is collectively owned and that needs to record its individual owners. The key can identify the object in the real world (for example “flat 205”) or just express an abstract identification (for example “coin 198”). The data doesn’t play any role in the registry itself but can be useful for other purposes like personal data storage, or individual vote storage when a collective decision is to be made. In this variant, each unit’s weight can be handled implicitly with an agreed convention among the parties (for example flat 205 has 100 m² of living area and so has more weight than flat 206, which has only 75 m²).

DivisibleOwnership variant is a subcase of the previous variant where each unit can be divided. It also handles units’ weight explicitly. In this variant, a key conveys both an abstract identification and a weight. One use case is the land registry of a city, which registers land plots that can be divided. Another one is divisible safecoins. The difference between the latter and all the other use cases is mainly in the way they are created: safecoins are generated by the network whereas the others are created by users. After initial creation, users can transfer them, divide them and transfer the generated sub-parts in the same way. They can be deleted the same way, though in the case of safecoins, the network can automatically trigger the deletion when the user stores a file in the network (but deletion is still controlled by the user’s public key).

HistoricalData variant is a totally different one. It describes the history of a single object: the history of its modifications and of its owners. At any point in time such an object has a set of current owners. A majority of them can modify the data part. They can also decide to transfer the ownership to a new set of owners. The new version must be signed by all the new owners and by a majority of the previous owners. Contrary to other variants, no deletion is possible for this one (neither at the MD level, nor at the value level). The use case for this variant is legal applications like a notary service that registers the successive owners of an object. It can be a real-world object like real estate, but not necessarily: an altcoin could use this variant for users wanting traceability of transactions.