Transparency or opacity of SD modifications

ustulation · September 26, 2016, 12:57pm

Yes, these things i would agree to - if there are specific changes/additions etc made to guarantee a certain user-side behavior then i am for it. Though i would maybe debate a little further on merits of doing this at client side vs vaults - like what use cases benefit from it and if there can be better alternatives. [quote=“Fraser, post:39, topic:138”]
For actually removing the SD from the network, I prefer an expiry time as a mutable member variable of the SD
[/quote]

You will need to elaborate more on this though - like why is it needed, what does it buy us, what are the downsides of not doing this etc.

rob · September 26, 2016, 1:33pm

Keeping version # in the APP would only require a variable large enough to detect change. EG using @Fraser’s idea the APP could store version number with a u8 so that it can see changes. Should only change little. If expect it to change more then use u16 and worse case u32. This holds true even for version simply being a u64 updated whenever anything changes (or more often)

Remember many APPs generate the SD address and do not actually store it and version # can be kept similarly using sparse array methods

so can use an sparse array of u8 (or u16) to store needed info for versions.

But remove versions from being used by clients then need version u8 (or u16) and hash u64

The difference is 72 bytes (or 80 bytes) instead of u8 (or u16).

Now if you talk after years of operation there are more than 100 billion safe web pages and 100 billion other public files which have on average 500 indexes per page/file which result in 1 x 10^14 indexes. Then each SD has 200 indexes stored in each. This means we need 5 x 10^11 index SD required.

5 x 10^11 times u16 sparse array is 10^12 bytes of storage needed (approx 10^7 SDs)

But without versions being revealed to client

5 x 10^11 times 10 bytes is 5 x 10^12 bytes of storage needed (approx 5 x 10^7 SDs) PLUS the processing (electricity) to compute the hashes when needed.

Why should we make APPs more complex & wasteful when version can be given to the client.

Remember some seemingly small increases in storage/computing can be massive when there is a lot of similar objects to be stored.

Same here and surprised there is consideration to removing it from the client’s access.

dirvine · September 26, 2016, 1:46pm

Hmmm … Time is related here as it how many X we discuss it and how many X’s is it close to the first tool we choose to pick up, or how X will break the system (btw X == time in this sentence). I think many people prefer time, servers and the status quo, but that is not what we are doing here, at least till now.

The amount of pushing to get (agreed and synchronised) time into the system really should be done as a separate RFC (there is one RFC that touches on dht based time like capabilities). It’s not a tool available to us just now.

It’s known to be a large issue so no point it appearing in so many suggestions without really exploring the side effects etc. of giving up on full decentralisation and relying on centralised trusted managed owned servers or hardware devices on each computer.

dirvine · September 26, 2016, 1:48pm

I probably agree with this part, but think that it shoudl not be able to be set by client apps (i.e. read/only) as the network does use this for resolving conflicts.

ustulation · September 26, 2016, 2:12pm

You will have to agree to this part in conjunction to the rest of what he mentions though:

With those modifications, yes - else all that i state e.g here in 3 points etc shows it wouldn’t make much sense in doing so. People will complain later and say we should have made it explicit or not expose it as a user-facing feature at all.

Also @tfa and @rob it might be worth mentioning the way you guys see Delete operation tying into this as i remember @tfa saying it (the usage of version field) would be useful only in conjunction to a particular way things are deleted. There might be better posts explaining those, but the most recent mentions are here and here. Worth mentioning those too in case they get missed getting discussed.

ustulation · September 26, 2016, 2:20pm

Well if using hash then just the hash is enough and sha256(something) == 32 bytes but yes compared to u16 you mention (== 2 bytes) it is certainly more. Also i think you misinterpreted u16/u64 etc (correct me if i am wrong though) - u64 is 64 bits == 8 bytes. u16 is 16 bits == 2 bytes etc.

(Edit: FWIW md5 would be 16 bytes i think (don’t know if there is anything shorter))

ustulation · September 26, 2016, 5:49pm

@Viv @dirvine @AndreasF Pls do look into this suggestion to see how much it complicates (or does not) churn/refreshes other operations - the point about version there - if one field is only suppose to change if data changes, and another for other changes etc. Also it would be great if you guys could state what you think about the discussion so far - expose version field - keeping in mind the use-cases, caveats etc. mentioned. Do you want to expose it with a disclaimer, do you not want to expose it, do you want changes in the structure as fraser suggests or somewhere in the operation of vaults and then expose it, what concrete use case would it solve etc ? Also remember if you are going for merging SD and AD would you want more fields for finer information in the struct Fraser presented - how would those things scale in future etc - vs of-course clients hashing the field they care about and want to track changes for.

Sorry if i seem to be a skeptical - just don’t want to rush in make a wrong decision which ends up supporting a total of 1 use case while causing confusion for rest.

dirvine · September 26, 2016, 5:52pm

No problems at all, if it’s exposed read only I cannot see a problem. I do not think though that having several version are going to work right now. Baby steps are Ok, but large changes like that needs an RFC itself, I would be amazed if there were no side effects there. So the version as is now is my vote really.

So I would say if we did expose the inner version then Ok, but not any more right now.

ustulation · September 26, 2016, 5:55pm

And would you want to put a disclaimer in the API that a version increment could mean any of these 3 points ?

dirvine · September 26, 2016, 5:57pm

I would say it’s an internal incrementing counter, required by the network to exist and only increment by 1 on each change of any data element within it. That may be enough?

ustulation · September 26, 2016, 5:59pm

If i did a post of same data none of the data elements would change though - just the version increments - is that OK?

If yes then we will have to say something like: it might increment for no change too.

And ofcourse for AppendableData - it might not change even if data was changed - depending on if owner did it as a POST OR owner/users did it as APPEND ?

Aren’t those going to cause confusions ^^^ ?

dirvine · September 26, 2016, 6:09pm

Yes possibly, perhaps “any POST will increment the version by one only or fail”?

ustulation · September 26, 2016, 6:18pm

Ok - so leaving it exposed in safe_core with this disclaimer. @Krishna Could you also put this disclaimer in the Launcher-App api’s - Something along the lines: version will increment for any successful POST operation by 1, but beyond that there is no guarantee - all internals could have changed, only a particular internal could have changed OR there might have been no change at all. Also if tracking for AppendableData, a changes in internals might not result in version increment.

( I want to see how a concrete use-case/production-grade-app based on above disclaimers/uncertainities even looks like).

Viv · September 26, 2016, 6:37pm

Personally I’m not for keeping individual versions for the same reason as you mentioned that the types get bloated based on the level of detail expected based on use case and this might better be served elsewhere at the app scope itself unless its generic enough to fit all types and be scalable with future changes.

If what we’re after is basic fingerprinting of data(if this is the use-case), then we can always add a hash identifier to the data type that can be retrieved from the vault explicitly via a different RPC to validate data before GET(and save some BW) or using it to match expected data when pulling OR provide the same RPC but do the hash in vaults on the fly and return the hash.

I still don’t see a specific use-case that cannot be achieved with whats proposed so in that case if we CAN achieve the requirements, would certainly prefer in not bloating the fundamental types as it might not be a feature for all.

ustulation · September 26, 2016, 6:40pm

This ^^^ i would vote and prefer infinitely more to an application that bases itself on uncertainties.

tfa · September 26, 2016, 8:39pm

Sugar-coated POST operation: I like this expression, it is exactly both what I want and what was implemented before. The consequences at the safe_level level are the following:

Nullify signatures (I suppose you rather mean nullify owner key): The user deleting the SD has many possibilities to define who can recreate the SD with the owner key field:
- anybody by nullifying this field
- only him/herself by entering his/her public key
- nobody (even him/herself) by entering the hash of a known string (like “David Irvine - Creator of the Safe Network”)
- a set of users by entering their public keys
Nullify data: The user has to provide an empty data that is part of the SD that can be signed
Bump version: the version must be incremented because the deleted version is normally different from previous version, but not necessarily like in any POST operation.

At the low level API these fields can be computed automatically, though the choice could be left to the user for the owner key, possibly a simple enumeration like: not rewritable, rewritable by user, rewritable by anybody. Just to be sure everybody follows: if the SD is recreated afterward, then its version field is bumped again.

We can also simplify things by checking in handle_delete function at the vault level, that the owner key field contains the hash of a specific string. That way no deleted SD can be recreated, whatever its origin (low lew API or safe_core).

But I don’t have a strong opinion about it (either the latter or the user choice).

A comment in former vault code indicates the reason:

// Reducing content to empty to avoid later on put bearing the same name

And it is exactly the reason why I want this code restored.

Yes, I completed agree with this. To be precise version is set by core, vaults continue to check it is sequentially incremented and strictly nothing is changed in the code managing this field.

Fraser:

pub struct Version {
    /// Sequential number incremented when `SD::data` changes.
    pub major_index: u32,
    /// Sequential number incremented when any SD field other than `SD::data` changes.
    pub minor_index: u32,
    /// Arbitrary, version-specific information.  May be empty.
    pub data: Vec&lt;u8&gt;,
}

I think that current version field is good enough.

I have mixed feelings about major_index and minor_index fields, because in the majority of cases only data is modified. They add complexity because currently vaults just check correct increment of version without actually comparing data or keys, which is fine for me because the rule is: if SD is modified then its version is incremented. The converse is not necessarily true, it is up to the application: it can repost a SD with same data and keys and version will be incremented. A use case can be the periodic generation of a SD and the app doesn’t bother about whether or not something has changed, or the version is an index into something else.

Currently this is possible because low level API increments version automatically. Maintaining automatic increment of the right version(s) depending on the kind of operations done by the user is an added complexity to low level API.

I don’t think these fields are worth the added complexity in both safe_vault and low level API. I am an adept of KISS principle: there is currently a field that indicates how many times the SD has been modified. We don’t know the kind of modifications and it is an upper bound but it is enough for transparency and certainly better than nothing.

Lastly, I don’t agree at all about data field (in struct Version) because vaults cannot check its correct increment. Users have to trust the app about this field, so this version could be simply added in the data part of the SD (by the app itself).

rob · September 27, 2016, 12:25am

Thats what I understood was the intention to be previously.

Read access only supplied through the API

I thought @Fraser’s idea of major/minor version was good. It would allow an APP to see when Data changes and When control/management info changes. Could be good to see that owners or other info has been updated for some APPs without the need to keep hashes. Yes I know thats low level but I used to write RTOSs for fun too.

I agree too, otherwise the your code would be more complex for no real benefit. And in the API doco it says that version increments on a number of factors data, owner, etc changes and can change for network operations.

To my way of thinking the version should be updated on an append. You changed the SD afterall. Seems odd you could increment the version on a network operation yet not increment on an append.

Fraser · September 27, 2016, 7:58am

Yeah - fair enough. I’d like to put together an RFC on this, but suffice to say that I’m not a fan of centralisation nor of trusted managed owned servers (I’d argue that we’re already proposing to make use of these hardware devices on each computer already, since we do make use of timers) so those aspects would not be what I was pushing for in an RFC.

But I agree that it’s a large topic and there will doubtless be difficulties and edge cases, so yes, an RFC would be good.

Fraser · September 27, 2016, 8:31am

As per my comment above - it’s not something I can justify quickly - it needs an RFC and I don’t want to drag this off topic. The main reasons for preferring this though are the ability to completely remove SD from the network, which is beneficial for the network (since it doesn’t then have to manage unwanted data ad infinitum) and for the clients (who get a refund or only pay proportionately to the amount of time a piece of data is managed by the network).

dirvine · September 27, 2016, 8:41am

We make use of durations, not time. The difference is agreeing what the time is, hardware based atomic clocks etc. may help but still have edge cases…