Transparency or opacity of SD modifications

ustulation · September 24, 2016, 7:40pm

you store the hash in the same place you stored the pointer to the SD.

[Just as a suggestion if you want - start a new thread pointing to this - give some pseudo code or something concrete to show exactly what you want (not how you want to get what you want) - we can start there pointing to this thread from where we branched off]

tfa · September 24, 2016, 8:03pm

Read again my last post:

There can be no such place (in a standalone SD)
There can be no possibility to modify this place (the SD can be pointed to by another object created previously by another user)

I have already stated what I want: the possibility for an app to prove that a SD has not been modified.

The how to do that I have proposed is because it is very simple and is already implemented (exposing the version field) or was already implemented in the past (SD deletion).

You say it is possible to do that without them but I don’t see in your solution where the hash can be stored.

ustulation · September 24, 2016, 8:21pm

If you want to access an SD (to verify its version) how do you plan on doing that? You need to store its name/DataIdentifier or something to point to it - what is so difficult about storing another piece of information along with that “something”?

tfa · September 24, 2016, 8:33pm

For example it could be an SD storing the weather in Paris on 09/24/2016. The user selects the town and the date, and the app concatenates these two pieces of information, hashes the result and retrieves the corresponding SD. There is no pointer to the SD.

Using a SD allows 2 things:

no need to store a correspondence (town, date) => immutable data name
the SD can be modified if an error has been found after generation, and in this case it would be interesting to know that it has been modified and how many times.
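
A minimal sketch of the lookup described above, for illustration only. The function names, the separator byte and the 64-bit FNV-1a hash are assumptions made to keep the example dependency-free; a real app would use whatever hash and name type the network defines.

```rust
/// Stand-in 64-bit FNV-1a hash (an assumption for this sketch, not the
/// network's real hash function).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: derive the SD name from (town, date).
/// The NUL separator keeps ("ab", "c") and ("a", "bc") from producing
/// the same concatenated key.
fn sd_name(town: &str, date: &str) -> u64 {
    let key = format!("{}\u{0}{}", town, date);
    fnv1a_64(key.as_bytes())
}
```

Because the name is recomputed from the user's selection every time, the displaying app has nowhere natural to keep a per-SD hash or previous version, which is the point being made in this post.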

ustulation · September 24, 2016, 8:58pm

But it could do that and get the behaviour you want (if noting it was important to you) or the app could do that on your behalf - and that would be better and more granular because maybe there was no error in generation but addition of ownership. Now assume we did make guarantees on the version field - or maybe even introduce a new field that is user-facing and guaranteed to increment for every change. Now consider what if somebody else demands the opposite - I own this data and i don’t want people to know how many times i edited it by looking at my SD at any point of time?

tfa · September 24, 2016, 9:19pm

Yes, but generating SDs and managing a correspondence table is necessarily more complex than just generating SDs.

In my example the app facing the user is just a displaying app. No risk that the user adds modifications of any kinds to SDs. The SDs are owned by another application generating them.

Yes my proposal doesn’t allow that. But I prefer transparency. Maybe this is the crux of the debate. It makes me think of editable blockchains, and clearly I don’t like that.

ustulation · September 24, 2016, 9:23pm

It is - that’s the reason i am not comfortable committing to it, especially since it’s not just the version field, it also assumes deletion should work in a particular way etc.

And if you consider AppendableData version field does not even work that way - the data field can keep changing and version does not increment - then suddenly (because owner decided to do something) version changed. Also even for StructuredData, you could do a POST of same data which would increment the version without changing anything else - at that point all you would be tracking is number of POSTs someone has done (not necessarily any editing) - that seems too much of a niche use-case (which can still be handled by way of hashing and without relying on versions) and such demands can be endless. This whole notion of version field is for something completely different - banking on it for a behavior which also must involve certain other guarantees from certain other operation (e.g. deletion) is dangerous right now. If i did commit and then later had to change it due to some reason (back-end) it would cause endless woes.
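
The difference between counting POSTs and detecting edits can be sketched as follows. This is a toy model, not the real StructuredData type: the field names, the `post` method and the FNV-1a stand-in hash are all assumptions for illustration.

```rust
/// Stand-in 64-bit FNV-1a hash (assumption; a real client would use a
/// cryptographic hash).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Toy model of an SD: every accepted POST bumps `version`,
/// whether or not anything actually changed.
struct StructuredData {
    data: Vec<u8>,
    owners: Vec<u64>,
    version: u64,
}

impl StructuredData {
    fn new(data: Vec<u8>, owners: Vec<u64>) -> Self {
        StructuredData { data, owners, version: 0 }
    }

    /// Hypothetical POST: replaces the fields and increments the version.
    fn post(&mut self, data: Vec<u8>, owners: Vec<u64>) {
        self.data = data;
        self.owners = owners;
        self.version += 1;
    }

    /// Hash over the data field only; a client could hash other fields too.
    fn data_hash(&self) -> u64 {
        fnv1a_64(&self.data)
    }
}
```

Re-posting identical content moves `version` from 0 to 1 while `data_hash()` stays the same, so a version comparison reports a change that a hash comparison does not.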

Viv · September 26, 2016, 11:15am

Nice thread

Personally I’d like to side with “nullify signatures and data. ALSO bump the version” (not sold on this yet). Basically it’s a sugar-coated POST operation at this point. I’m not sure whether this operation qualifies as a separate RPC (DELETE) or should just be expected to be done as part of POST, with DELETE not supported for these data types (SD or AD). Sure, right now DELETE itself can get removed in such a case since ID doesn’t support it, but that might change with Owned ID or something, so just being specific here.

In terms of “how should delete function”, if I haven’t lost track of the details in this thread:

@tfa I guess means don’t remove the data itself, but nullify the data for delete and let someone else claim it if need be but the version would be incremented as part of the delete and then further again when new owner claims it.
@ustulation you’re not comfortable with ^^ I guess? If so I guess you’re making a case for delete means the data is gone from the network and recreating it would start with version 0 or whatever it normally starts from.

I see pros and cons to both sides here like I said personally prefer option A, but just to confirm I’m not missing anything here:

The main advantage being mentioned for tfa’s approach is by using the version one can confirm if the data has “since” been mutated(regardless of data field in SD changing). I can certainly see the benefit/use-case to this. On the other side yes this means you’re storing the expected version somewhere and in its place with option B, one can simply store the hash of the SD to get the same behaviour.

This is where I see two things, one is why get the client to do a hash operation or equivalent if they don’t need to. Granted this might not be too expensive and get ignored. Also say a bit more future scope could be, maybe get vaults to support just a GetVersion or some other RPC where someone can just query for the version without actually retrieving the whole data before they validate it and confirm they want the payload. This can help not downloading useless data if they aren’t going to use it. It certainly needs support and discussion from Vault side but if acceptable certainly favours the version approach than using the hash approach I think? unless vaults generate the hash and return it
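
A rough sketch of how a version-only query could save a download. Vaults do not currently offer such an RPC; `get_version`, the `Vault` and `Client` types and the fetch counter below are all hypothetical and exist only to show the idea.

```rust
use std::collections::HashMap;

/// Toy in-memory vault. `get_version` models the hypothetical cheap RPC,
/// `get` models the full (expensive) retrieval.
struct Vault {
    store: HashMap<u64, (u64, Vec<u8>)>, // name -> (version, payload)
    payload_fetches: usize,
}

impl Vault {
    fn new() -> Self {
        Vault { store: HashMap::new(), payload_fetches: 0 }
    }

    fn put(&mut self, name: u64, version: u64, data: Vec<u8>) {
        self.store.insert(name, (version, data));
    }

    fn get_version(&self, name: u64) -> Option<u64> {
        self.store.get(&name).map(|entry| entry.0)
    }

    fn get(&mut self, name: u64) -> Option<(u64, Vec<u8>)> {
        self.payload_fetches += 1;
        self.store.get(&name).cloned()
    }
}

/// Client that remembers the last version and payload it saw per SD.
struct Client {
    cache: HashMap<u64, (u64, Vec<u8>)>,
}

impl Client {
    fn new() -> Self {
        Client { cache: HashMap::new() }
    }

    /// Downloads the payload only when the remote version differs
    /// from the cached one.
    fn read(&mut self, vault: &mut Vault, name: u64) -> Option<Vec<u8>> {
        let remote = vault.get_version(name)?;
        if let Some((cached_version, data)) = self.cache.get(&name) {
            if *cached_version == remote {
                return Some(data.clone());
            }
        }
        let (version, data) = vault.get(name)?;
        self.cache.insert(name, (version, data.clone()));
        Some(data)
    }
}
```

The same shape would work if vaults returned a hash instead of a version, which is the alternative mentioned at the end of the paragraph above.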

Also it certainly makes churn handling easier in vaults with option A, which was one of the reasons it existed previously, as it helps prevent data coming back from the dead. Sure, there are ways to work around this by keeping a delete cache that will block any refresh of the data getting into the chunk store, which was an approach that was tried and found functional, but this does leave open a time-based cache which itself isn’t refreshed to other nodes.

On the pro side of actual delete, the main benefit is that the network isn’t handling useless data which is considered deleted. By nullifying the SD, the network is still holding it in the vaults and continuing to refresh it with churn. It thereby gets the network to do work, albeit small, for something which the client has deemed no longer needed. How scalable is this, and is it eventually going to be too much for the network (similar arguments start to favour owned ID)? Also, how might refunds work? With actual delete it’s easy to work out an agreeable refund I’d guess, but if it’s just a nullify operation and the network is still “working” in terms of keeping this zeroed data, then should there be a cost factor to it, and should full refunds be disallowed if any refund is allowed at all?

This last point is what makes it hard to just pick option A. Removing my bias to try and keep vaults simple for churn handling right now I can see the use-case of nullify SD than delete being “achievable” even if we just did a hard delete.

ustulation · September 26, 2016, 11:18am

I don’t think he wants someone else to claim it either - i might be wrong though.

rob · September 26, 2016, 11:41am

That’s my understanding as well and an important aspect.

So I gather it’s to protect against the original user destroying the SD, then recreating it and claiming that it is the original and has never been different, i.e. falsifying the original. With a version anyone can tell it’s been modified. It’s also to protect against others spoofing something that was destroyed because the owner no longer wanted it available and the address was linked or computed.

I am wondering if we could have a zero-data SD that basically only contains address+tag+version and can be claimed by anyone. Effectively the SD contents are destroyed, it consumes little space on the network, and this keeps the version number.

If the owner wishes to keep ownership then they do not “delete” but simply zero the data and mark the SD as empty. Maybe a flag in the SD can denote an empty SD. Or have the “deleted” SD also optionally contain the owner field.
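
One way to picture this proposal is the sketch below. The struct layout, the `empty` flag, the optional owner and the method names are assumptions made for illustration; they are not part of any existing SD definition.

```rust
/// Toy SD that survives "deletion" as a small tombstone:
/// name + tag + version remain, contents and (optionally) owner go.
#[derive(Debug, Clone, PartialEq)]
struct StructuredData {
    name: u64,
    tag: u64,
    version: u64,
    data: Vec<u8>,
    owner: Option<u64>, // None = unowned, claimable by anyone
    empty: bool,        // hypothetical flag marking a zeroed SD
}

impl StructuredData {
    fn new(name: u64, tag: u64, owner: u64, data: Vec<u8>) -> Self {
        StructuredData { name, tag, version: 0, data, owner: Some(owner), empty: false }
    }

    /// Owner keeps the SD but zeroes its contents.
    fn clear(&mut self) {
        self.data.clear();
        self.empty = true;
        self.version += 1;
    }

    /// "Delete": contents and ownership are dropped, the version keeps counting.
    fn delete(&mut self) {
        self.data.clear();
        self.empty = true;
        self.owner = None;
        self.version += 1;
    }

    /// Anyone may claim an unowned SD; the version continues from where it was.
    fn claim(&mut self, new_owner: u64, data: Vec<u8>) -> Result<(), &'static str> {
        if self.owner.is_some() {
            return Err("SD is still owned");
        }
        self.owner = Some(new_owner);
        self.data = data;
        self.empty = false;
        self.version += 1;
        Ok(())
    }
}
```

Since the version never resets, a recreated SD cannot pass for the untouched original, while the tombstone itself only costs a few fixed-size fields.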

ustulation · September 26, 2016, 11:41am

This is something i have a problem with. As you said the original intended purpose of version field was (and for me is) solely :

It does not indicate change in a way ppl might think -

Maybe the data changed.
Maybe only the owners changed. Maybe some other field in future-scope.
Maybe nothing changed - I just did a post of exactly the same SD (it’s free anyway).

I don’t know what ppl would be tracking by tracking the change in version, unless it’s for the mere pleasure of version-gazing. For anything more serious, a more concrete and better method should be sought (vaults return a hash of the entire SD or of fields of choice, or clients keep it themselves, or whatever) - this being my argument and suggestion. If vaults could handle churn and the rest without version, we could even remove it from SD without affecting users.

This is just for my understanding - what was the downside to a full delete and restart from version 0 ? Was it the above concern mainly or was it tied to replay attacks or were there other caveats ?

rob · September 26, 2016, 11:44am

Oh about 1001 things especially with book keeping APPs and any sort of business auditing system

ustulation · September 26, 2016, 11:45am

Pls read that in context of that entire para from which you quoted.

rob · September 26, 2016, 11:53am

For some things, using the version that is updated by the network on any change can save on complexity and additional storage needs elsewhere. Let’s use an indexing service as an example: rather than keeping a parallel metadata system (for knowing when a set of indexes changes), it relies on the version # of the SD storing that set of indexes. This is just to show the concept of what it can be useful for.

Unless I completely misread what you are saying then I think the above should give you a concept where network version can save additional space & computation by being able to use the version #

(I know its also for network functionality)

Viv · September 26, 2016, 12:02pm

Don’t really mind something being more than “originally intended” as long as its useful. Don’t see a point in restricting something just cos it wasn’t considered before.

In this case though it might not be considered “useful” which I think we’d agree on if the explicit requirement was to track the data portion and not the type’s metadata.

I’d say you’re lame for doing that and maybe vaults should probably return an error for no mutation. and on repeat occurrence prolly ping your ClientManagers and charge you some safecoin for spamming (just kidding or am I)

Yep churn handling complication. It’s a topic by itself and pretty much a reason why no data type offered a pure delete previously.

ustulation · September 26, 2016, 12:04pm

As i said it can be updated for no change at all too - point 3.

Even if you depended on versions, you would (in most cases i assume) need to keep what the previous version was (a u64) so that you know that the freshly fetched SD version denotes a change. If you are keeping that u64 why not keep a hash, which is way more robust if you combine this para with what i am saying in the previous para.

Viewing things from a bigger perspective (the big picture / philosophical aspect of something) is much more beneficial than thinking about specialised use-cases. If you are looking for guarantees, look at it in conjunction with all the caveats i am mentioning. There can be better alternatives.

ustulation · September 26, 2016, 12:12pm

It’s not the question of being lame - it’s the question of guarantees. You can have an api:

fn foo() -> i32 { ... } // returning a signed integer

and mention that only positive values can be returned. Then if someone returned a negative integer you could say you are lame OR you could make an api thus:

fn foo() -> u32 { ... } // returning an unsigned integer

My choice would be the second.

If someone depends on me changing my version only through actual mutation of data, i could always trick them by not doing so. With a hash it would be a guaranteed approach. In secure systems, ppl should learn to depend on tangible and concrete guarantees rather than specific-tailored or extrapolated guarantees.

rob · September 26, 2016, 12:22pm

I know and my answer was written with that knowledge. Unless it’s changing often it shouldn’t really change the uses. Just that it might trigger an update elsewhere which results in no changes elsewhere.

It’s increasing the processing and storage. Let’s say the indexing is creating a search engine for the SAFE network. Your extra u64 & hash processing will add a LOT of storage and electricity usage across the globe just so the versions are kept secret??? I am assuming that the version is updated by the network whenever changes of any type are made to the SD.

The point is let’s design to reduce work for APPs when it does not impact the network code. Every u64 that has to be stored has to be multiplied for every APP and every SD the APPs use version # for. So if there are 1 x 10^9 indexes taking 1 x 10^7 SDs to store them, and it takes 1 µs to compute each hash to check the version, that means at least 8 x 10^7 bytes just to store the hashes. What about when the number of users is at 2 billion and there are 10 billion safeweb pages (10^12 indexes)? Then you need 80 TBytes of storage just for the hashes, on top of the version # of each being stored. The address is computed so there is no need to store SD addresses. It’s also possible to store the version # as a u16 (or u32), since it’s unreasonable to expect the version to change 65,000 times between accesses. So the hash is more than needed for version #.
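
The storage estimate above can be reproduced with a small helper. The function name is hypothetical, and the sketch assumes one stored value per SD per app and the ratio of 100 indexes per SD implied by the first pair of figures (10^9 indexes in 10^7 SDs).

```rust
/// Bytes one app must keep to remember a single value (version or hash)
/// for every SD that holds its indexes.
fn bookkeeping_bytes(indexes: u64, indexes_per_sd: u64, bytes_per_entry: u64) -> u64 {
    let sds = indexes / indexes_per_sd;
    sds * bytes_per_entry
}
```

With these assumptions, `bookkeeping_bytes(1_000_000_000, 100, 8)` gives 8 x 10^7 bytes, matching the first figure. For 10^12 indexes the same call gives 8 x 10^10 bytes (about 80 GB) per app for an 8-byte value, 2 x 10^10 bytes for a u16, and 3.2 x 10^11 bytes for a 32-byte hash, so the 80 TBytes figure presumably assumes many apps each keeping their own copy or some other multiplier not stated in the post.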

Fraser · September 26, 2016, 12:31pm

I don’t have long to chip in here (Viv’s a hard task-master and he’s wanting Routing work done today!) but in the meantime here are my hurried thoughts:

I prefer the deletion to be in the form of a separate, new version appended to the latest.
I always thought the SD version could be used by client apps. Spandan, I take your point about version increments not always relating to data changes, but we could handle that by making the version something like:

pub struct Version {
    /// Sequential number incremented when `SD::data` changes.
    pub major_index: u32,
    /// Sequential number incremented when any SD field other than `SD::data` changes.
    pub minor_index: u32,
    /// Arbitrary, version-specific information.  May be empty.
    pub data: Vec<u8>,
}

Thus the major_index represents the number of changes to the data, but we can still retain a means for strict total ordering of all versions.
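
A sketch of how such a version could be ordered and queried, under stated assumptions: the derived lexicographic ordering, the two `after_*` helpers, and the choice not to reset `minor_index` on a data change are all illustrative and not part of the proposal.

```rust
/// Same fields as the struct above. Deriving `Ord` compares fields in
/// declaration order: major_index, then minor_index, then data.
#[derive(Debug, Clone, Default, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major_index: u32,
    pub minor_index: u32,
    pub data: Vec<u8>,
}

impl Version {
    /// Hypothetical: next version after `SD::data` changed.
    pub fn after_data_change(&self) -> Version {
        Version {
            major_index: self.major_index + 1,
            minor_index: self.minor_index,
            data: Vec::new(),
        }
    }

    /// Hypothetical: next version after any other SD field changed.
    pub fn after_metadata_change(&self) -> Version {
        Version {
            major_index: self.major_index,
            minor_index: self.minor_index + 1,
            data: Vec::new(),
        }
    }

    /// True if the data field changed at least once since `earlier`.
    pub fn data_changed_since(&self, earlier: &Version) -> bool {
        self.major_index != earlier.major_index
    }
}
```

Every successor compares strictly greater than its predecessor, so vaults keep a total order, while a client interested only in the payload can compare `major_index` and ignore ownership changes.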

For actually removing the SD from the network, I prefer an expiry time as a mutable member variable of the SD. This would be a separate mechanism to the delete we’re talking about here (which is more akin to making the SD invisible I suppose). If an SD is Put with a long duration until the requested expiry time, the user can pay more. If the SD is modified to change the expiry time, the user can be refunded or further charged appropriately. When the expiry point happens, the vaults managing the SD just remove it. This isn’t trivial, and I know that using timestamps is highly controversial, but that’s my preferred approach just now.
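
The charge/refund idea can be sketched with a deliberately naive pricing rule. Everything here is assumed for illustration: the `ExpiringSd` type, the flat `PRICE_PER_SECOND`, timestamps as plain seconds, and the function names. How vaults would agree on "now" is exactly the controversial part and is not addressed.

```rust
/// Toy SD carrying only a mutable expiry timestamp (seconds).
struct ExpiringSd {
    expiry: u64,
}

/// Hypothetical flat price per second of remaining lifetime.
const PRICE_PER_SECOND: i64 = 1;

/// Changes the expiry and returns the balance adjustment:
/// positive = further charge, negative = refund.
fn adjust_expiry(sd: &mut ExpiringSd, now: u64, new_expiry: u64) -> i64 {
    let old_remaining = sd.expiry.saturating_sub(now) as i64;
    let new_remaining = new_expiry.saturating_sub(now) as i64;
    sd.expiry = new_expiry;
    (new_remaining - old_remaining) * PRICE_PER_SECOND
}

/// Vault-side check: once this is true the SD can simply be removed.
fn is_expired(sd: &ExpiringSd, now: u64) -> bool {
    now >= sd.expiry
}
```

Extending the lifetime yields a positive amount to charge and shortening it yields a negative amount to refund, in line with the description above.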

OK - scurrying back to Routing now!

ustulation · September 26, 2016, 12:48pm

I don’t get this part - can you elaborate a bit more ? What was the requirement ? Was it to know whether a version changed ? In that case you would have to store the previous version (a u64) to know if it changed (else how would you know of a change).