Transparency or opacity of SD modifications

But now let’s say it’s a forum and most readers don’t know the poster, except that they post some juicy whistle-blowing stuff and links. Then they delete it, and a scammer recreates the SD with the links pointing to spoof/scam pages. The issue here might be that readers had, before it was deleted, made links to that SD telling others how good the juicy story is. Ironically, this is the reason the whistle-blower deleted the SD.

Please note that this is just an example and can be pulled apart on its merits. The idea is to see the process and the technicalities behind it. This, I believe, was @tfa’s point: a destroyed SD can be replaced with another SD spoofing the original. And the original owner may not be trusted or known, so it’s not always reasonable to use SD ID + Owner ID to uniquely identify an SD as being genuine. Instead we have to work purely with the SD ID (addr+tag+ver) to assume an SD is genuine. Obviously a contract or similar legal info should include the trusted owner ID, and my bad for the bad example of a contract.

This is the crux of the problem - (for me at least) there cannot be a notion of semi-trust. Trust is, mathematically or cryptographically, all or nothing in the digital world. If you don’t trust, or even know, the owner as you say, why would you expect him to be well behaved? What makes you think that this unknown owner could not have changed the content of his SD to contain something even worse than what the new guy, equally untrusted and unknown, would put? It’s like going to a forum where the poster is tagged anonymous and distributing a link to a website he posted in some comment. For all you know, after he gets a certain number of hits he changes the content to something gross or even illegal in your country.

So if you trust someone enough to distribute his/her SD name, just extract the sign key too and send it along as a tuple (SD::name, SD::sign-key). That would solve all your problems.
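To make the tuple idea concrete, here is a minimal sketch. All the types (`StructuredData`, `SharedPointer`) and the function `is_from_expected_owner` are hypothetical simplifications, not the routing crate's API, and the sketch assumes that signature validity has already been checked elsewhere.

```rust
// Hypothetical, simplified stand-ins; these are NOT the routing crate's types.
type Name = [u8; 4];
type SignKey = [u8; 4];

#[derive(Debug, Clone)]
struct StructuredData {
    name: Name,
    owner_keys: Vec<SignKey>,
    data: Vec<u8>,
}

/// What gets handed around instead of a bare name: (SD::name, SD::sign-key).
#[derive(Debug, Clone)]
struct SharedPointer {
    name: Name,
    sign_key: SignKey,
}

/// True only if the fetched SD sits at the expected name AND lists the
/// expected owner key. Assumes signatures were already verified elsewhere.
fn is_from_expected_owner(ptr: &SharedPointer, fetched: &StructuredData) -> bool {
    fetched.name == ptr.name && fetched.owner_keys.contains(&ptr.sign_key)
}

fn main() {
    let original = StructuredData {
        name: [1, 2, 3, 4],
        owner_keys: vec![[9, 9, 9, 9]],
        data: b"juicy story".to_vec(),
    };
    let ptr = SharedPointer { name: original.name, sign_key: original.owner_keys[0] };

    // Same name, recreated by someone else after the original was deleted.
    let spoof = StructuredData {
        name: original.name,
        owner_keys: vec![[6, 6, 6, 6]],
        data: b"scam links".to_vec(),
    };

    println!(
        "{} -> expected owner: {}",
        String::from_utf8_lossy(&original.data),
        is_from_expected_owner(&ptr, &original)
    );
    println!(
        "{} -> expected owner: {}",
        String::from_utf8_lossy(&spoof.data),
        is_from_expected_owner(&ptr, &spoof)
    );
}
```

With this shape, a recreated SD at the same name fails the check because its owner key differs, which is the point of sending the key along with the name.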

[As an aside:
You might be surprised to know that the version field of SD was (and I think still is) never meant to be user-facing. It is something purely for the backend - for the network to handle and resolve conflicts and churn situations in a particular way. If we figure out another mechanism which we think is better, we could even remove/replace it. That is why the version bumps etc. are internally handled by safe_core and vaults. What is user-facing is the data field. That is what users should manipulate if they need any special behaviour.]

I mean the data field of the StructuredData struct in the routing crate.

Currently the validity of the previous owners’ signatures is also checked when an SD is deleted. You cannot revamp this system without specifying what happens during a deletion.

The problem is that the owner can do this without modifying version field.

I want the simple mechanism which was implemented earlier and which maintained the following invariants:

  • version 0 proves that the object has never been modified since its creation

  • version > 0 indicates the number of times the object has been modified
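As a rough illustration of these two invariants, here is a toy model. The `Sd` type and the `edit_label` helper are hypothetical, and the sketch assumes the network bumps the version by exactly one on every accepted update, which is precisely the guarantee being debated in this thread.

```rust
// Hypothetical, minimal model of an SD; not the routing crate's StructuredData.
#[derive(Debug, Clone)]
struct Sd {
    data: Vec<u8>,
    version: u64,
}

impl Sd {
    /// A freshly created object always starts at version 0.
    fn create(data: Vec<u8>) -> Sd {
        Sd { data, version: 0 }
    }

    /// Assumption: every accepted update bumps the version by exactly one.
    fn update(&mut self, data: Vec<u8>) {
        self.data = data;
        self.version += 1;
    }
}

/// What a forum app could display next to a post, based on the version alone.
fn edit_label(sd: &Sd) -> String {
    match sd.version {
        0 => "never modified".to_string(),
        n => format!("edited {} time(s)", n),
    }
}

fn main() {
    let mut post = Sd::create(b"first draft".to_vec());
    println!("{}: {}", String::from_utf8_lossy(&post.data), edit_label(&post));

    post.update(b"second draft".to_vec());
    post.update(b"third draft".to_vec());
    println!("{}: {}", String::from_utf8_lossy(&post.data), edit_label(&post));
}
```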

The Safe network should be trustless. Seeing a version equal to 0 should be enough to know that an SD has not been modified. We shouldn’t have to trust A for that, because he/she may behave nicely for months and then cheat for a very specific contract.

In fact, the version field has just been added to the low-level API at my request (RFC 42). So there is no need for a complex mechanism: if a user sees version 0, then he/she can be sure that the SD has not been modified since its initial creation.

Of course this is true only if SD destruction is reimplemented as before.

I don’t agree with this - ownership and trust go hand-in-hand. If someone owns something and you rely on it, you basically trust that someone - that is the whole point of signatures - else why would people require something to be signed? What if the person did change it and the version is now > 0? All you would know is that the data has changed. That person owns the data and has every right to change it. You cannot demonstrate any involvement in that contract, so you cannot say it’s a violation of a contract between you and him - it clearly is not, since you were never a part of it in the first place. If you want true immutability, rely on ImmutableData - which is network owned and cannot be modified. If you are part of a contract which you do not want changed without your involvement, add yourself to the owners list with equal weightage. The system that you seek (trusting someone else’s data but not that someone) seems to be fundamentally flawed.

Yes, and now that I know why, I don’t think there is any reason for that any longer. You want to trust some data owned by someone without trusting that someone - doesn’t seem fair. If you want permanent immutability, look for a different data-type (ImmutableData). If you want something to be owned and not changed, make yourself a stakeholder in it.

Anyway at this point I am only repeating myself. Please go through all the discussions above so far to see what you can’t do with existing functionality. (To me) the whole premise of trustless trust seems fundamentally, logically and cryptographically flawed.

Multisig Revamp for StructuredData/AppendableData - #19 post explains further.

@tfa Just remembered two more points if it helps clear things further:

  1. Version can be incremented even if the data did not change. Changes in the previous-owners field, current-owners field etc. will all result in a version increment. You can have an SD of version 20 in which the data field never got changed. If some other field is added in future and that requires the version field to be manipulated in a particular way by the backend, all the apps that rely on its behaviour will suffer. That is why I am not comfortable with people using non-user-facing aspects of a data-type/network in frontend apps. It was not meant for that, is subject to change and may not do what you think it does. Using it in a particular way would mean a guarantee from us that this is what it is for, when at least currently it is not (and might cause both us and app devs trouble if we commit to it without proper thought).

  2. If a system wants to track the changes, use versioning. Store data as versioned files. And if for some reason there is a requirement that only cares to check whether an SD that is being pointed to has been modified, just store the hash of it (or of its relevant fields - e.g. data) along with that pointer (to the SD). That will be more robust, much better supported and wouldn’t need to tie in with how the backend should go about deleting, version increments etc., which is ideally left to the way the backend thinks best.
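Both points can be seen in one small sketch. Everything here (`Sd`, `Pointer`, `data_hash`, `data_was_edited`) is a hypothetical model rather than real network code, and `DefaultHasher` is only a standard-library stand-in for a cryptographic hash such as SHA-256; it is not cryptographically secure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, simplified model; field names are illustrative only.
#[derive(Debug, Clone)]
struct Sd {
    data: Vec<u8>,
    owners: Vec<u8>, // stand-in for owner public keys
    version: u64,
}

impl Sd {
    /// Point 1: an ownership change bumps the version, `data` is untouched.
    fn change_owners(&mut self, owners: Vec<u8>) {
        self.owners = owners;
        self.version += 1;
    }

    fn change_data(&mut self, data: Vec<u8>) {
        self.data = data;
        self.version += 1;
    }
}

/// Stand-in for SHA-256 over the data field (DefaultHasher is NOT cryptographic).
fn data_hash(sd: &Sd) -> u64 {
    let mut hasher = DefaultHasher::new();
    sd.data.hash(&mut hasher);
    hasher.finish()
}

/// Point 2: the hash is kept next to whatever is used to locate the SD.
struct Pointer {
    name: String,
    expected_hash: u64,
}

fn data_was_edited(ptr: &Pointer, current: &Sd) -> bool {
    ptr.expected_hash != data_hash(current)
}

fn main() {
    let mut sd = Sd { data: b"hello".to_vec(), owners: vec![0], version: 0 };
    let ptr = Pointer { name: "some-sd".to_string(), expected_hash: data_hash(&sd) };

    // Twenty ownership changes: the version climbs, the data never changes.
    for i in 1..=20u8 {
        sd.change_owners(vec![i]);
    }
    println!(
        "{}: version {}, {} owner(s), data edited: {}",
        ptr.name, sd.version, sd.owners.len(), data_was_edited(&ptr, &sd)
    );

    sd.change_data(b"hello, edited".to_vec());
    println!(
        "{}: version {}, data edited: {}",
        ptr.name, sd.version, data_was_edited(&ptr, &sd)
    );
}
```

In this model the version alone cannot distinguish the twenty ownership changes from a data edit, while the stored hash only reacts to the field it was computed over.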

I am not talking about immutability but about another feature: mutable data for which I want to know whether it has been modified or not. This feature was available in the past and was removed when the implementation of SD destruction was changed.

I don’t want to trust someone or some organisation. I want to trust data that are mathematically proved to be generated in sequence. Of course immutable data can do that, but here it is something else: the data can be modified and the proof only lasts while the data is not modified.

I am not sure I understand: I presented a valid usage for the version field. Maidsafe accepted the feature, apparently without really understanding it. And now that you understand it you will remove it because you don’t agree with it.

I think you were typing when i wrote the post just above and didn’t get a chance to read it. Can you go through it and see if it helps ?

You mention there:

version > 0 indicates the number of times the object has been modified (for example it could be the number of edits of a post)

And this is what i highlight in point 1 here. It might not do what you expect.

Point 2

I don’t think version field was meant to be the feature addressing that problem. Point 1.

I was not involved so I cannot comment, but what I know is that sometimes there is a huge (as in really huge) amount of work we tend to do to deliver the product that you see. Some details might have been missed by the people involved, or they might have been too occupied at the time to give this a lot of thought. In that case, apologies if that caused inconvenience.

It’s not up to me to remove. I will however vote for deprecating its usage, unless there is a convincing reason not to (which so far I don’t see). I think you will agree on this too - a half-baked thing that is not going to fly is better revoked. But if there is no alternative to it for some requirement then it should certainly be looked into and a proper solution sought, but that requirement is yet to be presented.

At least, if SD deletion is reimplemented as before, we still could assert that: if version field is 0 then data has never been modified.

Furthermore I think that most apps won’t develop complex things like adding another user able to modify a post in a forum app (I mean other than its creator). And when they do they might consider counting any kind of modifications in the edit count. This is the simplest implementation, but it can be justified.

Versioning is costly in terms of safecoins because an immutable data is created for each version. It should be used only when we want to be able to retrieve previous content. Also I don’t think we can trust versioned files: if I remember correctly, versioned files are stored in versioned SDs whose history can be manipulated client side.

Anyway, versioning is a more complete use case. The use case I was talking about is much simpler: detecting if a SD has been modified.

Testing that its version is equal to 0 would be much simpler to implement. And just displaying version field can be more easily trusted by users than a specific app implementation.

No problem with that. I have answered points 1 & 2 in previous post.

Just to be sure that everything is clear, I would like both:

  • version field exposed to the user

  • SD deletion reimplemented as before

Apps would then have a simple way to indicate whether an SD has been modified and the number of times it has been modified (any kind of modification).

All that a user sees is the specific app’s implementation of things. If the app cannot interpret the data field of an SD in the way it wants, that SD is useless to the app and hence to the user. If the app makes any mistake (even in something as simple as reading a particular field or data), that’s what the user will see (the mistake).

pseudo-code

// Approach 1: rely on the version field
let stored_ptr_to_sd = S;
if stored_ptr_to_sd.SD.version > 0 { println!("Edited") } // Relies on SD being deleted in a particular way,
                                                          // version being incremented in a particular way, etc.
// vs
// Approach 2: store a hash alongside the pointer
let (stored_ptr_to_sd, stored_hash) = (S, H);
if stored_hash != sha256(stored_ptr_to_sd.SD) { println!("Edited") }

I don’t think it is significantly simpler or harder - hardly noticeable in a code base.

Why ?

You can achieve those without caring about the ones mentioned previously (versioning and way SD is deleted).

Your use case might be to check for any modification. Someone else’s might be to check just for data modification and so on. The facilities provided can help achieve both.

Anyway I guess I have said all there is to it (I can literally now link things from previous posts to answer you). I’ll stop now and give others a chance to comment.

My conclusion is this:

  1. version field is not user facing - if that is the behaviour wanted then MaidSafe should be ready to attach guarantees of operation to it and not change it in future (as currently it is meant to resolve conflict handling, churn etc).

  2. Whatever you want, i think i have demonstrated all of those can be achieved without relying on how SD is deleted, how/when version is incremented etc.

I will go with whatever the majority thinks - my personal vote being none of those requirements (of exposing the version field and deleting SD in a way to suit you) are currently of any use.

Of course, you don’t take into account the management of the hash.

With version field you only need the SD itself to know if it has been modified or not. With the hash you have to store it somewhere which creates complications like:

  • user has to be able to check that the object containing the hash has not been modified, so hash of this object, stored in another object, and then again … infinite recursion there

  • the SD can be a standalone one without any object pointing to it

  • the SD can also be pointed to by another object created before the SD, by another user (for example chain of SD objects whose names are determined by hash of a root name and then hash of hash, …)

I think that to solve all these problems you have to store the hash in an immutable data created in parallel to the SD. But then the question is: where to store its name? Not in the SD because storing the name here would modify its hash. Here I don’t see any solutions.

It is definitely simpler to get the information internally from the version field of the SD itself.

No, you have not demonstrated this. Or at least your demonstration is not complete because it doesn’t solve the storage of the hash.

you store the hash in the same place you stored the pointer to the SD.

[Just as a suggestion if you want - start a new thread pointing to this - give some pseudo code or something concrete to show exactly what you want (not how you want to get what you want) - we can start there pointing to this thread from where we branched off]

Read again my last post:

  • There can be no such place (in a standalone SD)

  • There can be no possibility to modify this place (the SD can be pointed to by another object created previously by another user)

I have already stated what I want: the possibility for an app to prove that a SD has not been modified.

I proposed this way of doing it because it is very simple and is already implemented (exposing the version field) or was already implemented in the past (SD deletion).

You say it is possible to do that without them but I don’t see in your solution where the hash can be stored.

If you want to access an SD (to verify its version) how do you plan on doing that? You need to store its name/DataIdentifier or something to point to it - what is so difficult about storing another piece of information along with that “something”?

For example it could be an SD storing the weather in Paris on 09/24/2016. The user selects the town and the date, and the app concatenates these two pieces of information, hashes the result and retrieves the corresponding SD. There is no pointer to the SD.
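A minimal sketch of that lookup, under stated assumptions: `sd_name` is a hypothetical helper, the `HashMap` stands in for the network, the stored report text is a made-up placeholder, and `DefaultHasher` again stands in for a cryptographic hash (a real SD name would be a much longer hash output).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Derive an SD name from (town, date). The '|' separator is an assumption
/// added here so that different (town, date) pairs cannot concatenate to the
/// same string. DefaultHasher is a non-cryptographic stand-in.
fn sd_name(town: &str, date: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    format!("{}|{}", town, date).hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Stand-in for the network: SD name -> SD content.
    let mut network: HashMap<u64, String> = HashMap::new();

    // The generating application stores a report under the derived name.
    network.insert(sd_name("Paris", "09/24/2016"), "placeholder report".to_string());

    // The displaying app recomputes the name from the user's selection;
    // no pointer or correspondence table is stored anywhere.
    match network.get(&sd_name("Paris", "09/24/2016")) {
        Some(report) => println!("found: {}", report),
        None => println!("no SD at that name"),
    }
    println!(
        "other date present: {}",
        network.contains_key(&sd_name("Paris", "09/25/2016"))
    );
}
```

Because the name is recomputed on demand, the displaying app has no natural place of its own in which to keep an expected hash, which is the difficulty being raised here.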

Using a SD allows 2 things:

  • no need to store a correspondence (town, date) => immutable data name

  • the SD can be modified if an error has been found after generation, and in this case it would be interesting to know that it has been modified and how many times.

But it could do that and get the behaviour you want (if noting it is important to you), or the app could do that on your behalf - and that would be better and more granular, because maybe there was no error in generation, just an addition of ownership. Now assume we did make guarantees on the version field - or maybe even introduced a new field that is user-facing and guaranteed to increment for every change. What if somebody else then demands the opposite - I own this data and I don’t want people to know how many times I edited it by looking at my SD at any point in time?

Yes, but generating SDs and managing a correspondence table is necessarily more complex than just generating SDs.

In my example the app facing the user is just a displaying app. No risk that the user adds modifications of any kinds to SDs. The SDs are owned by another application generating them.

Yes, my proposal doesn’t allow that. But I prefer transparency. Maybe this is the crux of the debate. It makes me think of editable blockchains, and clearly I don’t like that.

It is - that’s the reason I am not comfortable committing to it, especially since it’s not just the version field; it also assumes deletion should work in a particular way etc.

And if you consider AppendableData, the version field does not even work that way - the data field can keep changing while the version does not increment - then suddenly (because the owner decided to do something) the version changes. Also, even for StructuredData, you could do a POST of the same data, which would increment the version without changing anything else - at that point all you would be tracking is the number of POSTs someone has done (not necessarily any editing) - that seems too much of a niche use-case (which can still be handled by way of hashing and without relying on versions), and such demands can be endless. This whole notion of a version field is for something completely different - banking on it for a behaviour which must also involve certain other guarantees from certain other operations (e.g. deletion) is dangerous right now. If I did commit and then later had to change it for some (back-end) reason, it would cause endless woes.

Nice thread :slight_smile:

Personally I’d like to side with “nullify signatures and data. ALSO bump the version” (not sold on this yet). Basically it’s a sugar-coated POST operation at this point. I’m not sure whether this operation qualifies for a separate RPC (DELETE) or whether it should just be done as part of POST, with DELETE not supported for these data types (SD or AD). Sure, right now DELETE itself can get removed in such a case since ID doesn’t support it, but that might change with Owned ID or something, so I’m just being specific here.

In terms of “how should delete function”, if I haven’t lost track of the details in this thread:

  • Option A: @tfa, I guess, means don’t remove the data itself, but nullify the data on delete and let someone else claim it if need be; the version would be incremented as part of the delete and then again when a new owner claims it.
  • Option B: @ustulation, you’re not comfortable with ^^ I guess? If so, I guess you’re making the case that delete means the data is gone from the network and recreating it would start with version 0 or whatever it normally starts from.
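The difference between the two readings of delete can be sketched with a toy in-memory model. The `Network` type and its methods are hypothetical, and treating empty data as the "nullified" marker is a simplification made only for this sketch.

```rust
use std::collections::HashMap;

// Hypothetical toy model; not the vault or routing implementation.
#[derive(Debug, Clone)]
struct Sd {
    data: Vec<u8>,
    version: u64,
}

#[derive(Default)]
struct Network {
    store: HashMap<String, Sd>,
}

impl Network {
    /// Create an SD, or claim a nullified one. Returns the resulting version.
    fn put(&mut self, name: &str, data: Vec<u8>) -> Result<u64, String> {
        let version = match self.store.get(name) {
            None => 0,
            // A nullified entry (empty data in this model) can be claimed,
            // but the version history carries on.
            Some(old) if old.data.is_empty() => old.version + 1,
            Some(_) => return Err("name already taken".to_string()),
        };
        self.store.insert(name.to_string(), Sd { data, version });
        Ok(version)
    }

    /// Nullify the data and bump the version; the entry stays stored.
    fn delete_nullify(&mut self, name: &str) {
        if let Some(sd) = self.store.get_mut(name) {
            sd.data.clear();
            sd.version += 1;
        }
    }

    /// Hard delete: the entry is gone and its history with it.
    fn delete_hard(&mut self, name: &str) {
        self.store.remove(name);
    }
}

fn main() {
    let mut a = Network::default();
    a.put("post", b"original".to_vec()).unwrap();
    a.delete_nullify("post");
    let v = a.put("post", b"recreated".to_vec()).unwrap();
    println!("nullify + bump: recreated at version {}", v);

    let mut b = Network::default();
    b.put("post", b"original".to_vec()).unwrap();
    b.delete_hard("post");
    let v = b.put("post", b"recreated".to_vec()).unwrap();
    println!("hard delete: recreated at version {}", v);
}
```

In this model only the nullify-and-bump variant lets a reader tell from the version that the name has a history; after a hard delete the recreated SD is indistinguishable from a fresh one by version alone.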

I see pros and cons to both sides here :frowning: like I said personally prefer option A, but just to confirm I’m not missing anything here:

The main advantage being mentioned for tfa’s approach is that by using the version one can confirm whether the data has “since” been mutated (regardless of the data field in the SD changing). I can certainly see the benefit/use-case of this. On the other side, yes, this means you’re storing the expected version somewhere, and in its place, with option B, one can simply store the hash of the SD to get the same behaviour.

This is where I see two things. One is: why get the client to do a hash operation or equivalent if they don’t need to? Granted, this might not be too expensive and could be ignored. Also, as a bit more future scope, maybe vaults could support a GetVersion or some other RPC where someone can query for just the version without retrieving the whole data, before they validate it and confirm they want the payload. This can help avoid downloading useless data that isn’t going to be used. It certainly needs support and discussion from the Vault side, but if acceptable it certainly favours the version approach over the hash approach, I think? Unless vaults generate the hash and return it :slight_smile:

Also, option A certainly makes churn handling easier in vaults, which was one of the reasons it existed previously, as it helps prevent data coming back from the dead. Sure, there are ways to work around this by keeping a delete cache that blocks any refresh of the data from getting into the chunk store, an approach that was tried and found functional, but this leaves open a time-based cache which itself isn’t refreshed to other nodes.
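For readers unfamiliar with the delete-cache workaround, here is a rough sketch of the idea as described above. It is not the actual vault code: the `Vault` type, its fields, the `ttl_secs` parameter and the explicit `now` argument are all assumptions made for illustration.

```rust
use std::collections::HashMap;

// Hypothetical sketch of a vault with a time-based delete cache.
struct Vault {
    chunk_store: HashMap<String, Vec<u8>>,
    delete_cache: HashMap<String, u64>, // name -> expiry time in seconds
    ttl_secs: u64,
}

impl Vault {
    fn new(ttl_secs: u64) -> Vault {
        Vault { chunk_store: HashMap::new(), delete_cache: HashMap::new(), ttl_secs }
    }

    /// Hard delete: drop the data and remember the deletion for a while.
    fn delete(&mut self, name: &str, now: u64) {
        self.chunk_store.remove(name);
        self.delete_cache.insert(name.to_string(), now + self.ttl_secs);
    }

    /// A refresh arriving from another node during churn.
    /// Returns whether it was accepted into the chunk store.
    fn handle_refresh(&mut self, name: &str, data: Vec<u8>, now: u64) -> bool {
        let expiry = self.delete_cache.get(name).copied();
        if let Some(expiry) = expiry {
            if now < expiry {
                return false; // recently deleted here: block the resurrection
            }
            self.delete_cache.remove(name); // entry expired, forget it
        }
        self.chunk_store.insert(name.to_string(), data);
        true
    }
}

fn main() {
    let mut vault = Vault::new(60);
    vault.chunk_store.insert("sd".to_string(), b"payload".to_vec());
    vault.delete("sd", 100);

    // A stale refresh shortly after the delete is blocked...
    println!("accepted at t=120: {}", vault.handle_refresh("sd", b"payload".to_vec(), 120));
    // ...but once the cache entry has expired, the same refresh gets through.
    println!("accepted at t=200: {}", vault.handle_refresh("sd", b"payload".to_vec(), 200));
}
```

The sketch also shows the weakness mentioned above: the cache lives only on this node and only for a limited time, so a node without the entry, or one whose entry has expired, would accept the stale data.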

On the pro side of an actual delete, the main benefit is that the network isn’t handling useless data which is considered deleted. With a nullified SD, the network is still holding it in the vaults and continuing to refresh it with churn. The network thereby does work, albeit small, for something which the client has deemed not needed anymore. How scalable is this, and is it eventually going to be too much for the network (similar to the arguments which start to favour owned ID)? Also, how might refunds work? With an actual delete it’s easy to work out an agreeable refund, I’d guess, but if it’s just a nullify operation and the network is still “working” to keep this zeroed data, then should there be a cost factor to it, with no full refund, if any refund is allowed at all?

This last point is what makes it hard to just pick option A. Removing my bias towards keeping vaults simple for churn handling :innocent: right now I can see the use-case of nullifying an SD rather than deleting it being “achievable” even if we just did a hard delete.

I don’t think he wants someone else to claim it either - i might be wrong though.