Transparency or opacity of SD modifications

rob · September 26, 2016, 11:53am

For some things, using the version that the network updates on any change can save complexity and additional storage elsewhere. Take an indexing service: rather than keeping a parallel metadata system (for knowing when a set of indexes changes), it relies on the version # of the SD storing that set of indexes. This is just to show the concept of what it can be useful for.

Unless I completely misread what you are saying, I think the above should give you an idea of how the network version can save additional space & computation.

(I know it’s also for network functionality)

Viv · September 26, 2016, 12:02pm

Don’t really mind something being more than “originally intended” as long as it’s useful. Don’t see a point in restricting something just cos it wasn’t considered before.

In this case though it might not be considered “useful” which I think we’d agree on if the explicit requirement was to track the data portion and not the type’s metadata.

I’d say you’re lame for doing that, and maybe vaults should return an error for no mutation, and on repeat occurrence prolly ping your ClientManagers and charge you some safecoin for spamming (just kidding, or am I)

Yep churn handling complication. It’s a topic by itself and pretty much a reason why no data type offered a pure delete previously.

ustulation · September 26, 2016, 12:04pm

As i said it can be updated for no change at all too - point 3.

Even if you depended on versions, you would (in most cases i assume) need to keep the previous version (a u64) so that you know whether the freshly fetched SD version denotes a change. If you are keeping that u64, why not keep a hash instead, which is way more robust if you combine this para with what i am saying in the previous para?
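To make the comparison concrete, here is a minimal sketch of the two client-side checks, under stated assumptions: `FetchedSd`, `digest`, `changed_by_version` and `changed_by_digest` are hypothetical names, not safe_core API, and `DefaultHasher` is a non-cryptographic stand-in for whatever hash a real client would pick.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical client-side view of a fetched SD (not a safe_core type):
/// a network-maintained version counter plus the data payload.
pub struct FetchedSd {
    pub version: u64,
    pub data: Vec<u8>,
}

/// 64-bit digest of the data. Assumption: DefaultHasher is only a
/// stand-in here; it is not a cryptographic hash.
pub fn digest(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Reports a change whenever the version differs from the remembered one,
/// even if the data itself is identical.
pub fn changed_by_version(previous_version: u64, fetched: &FetchedSd) -> bool {
    fetched.version != previous_version
}

/// Reports a change only when the digest of the data differs from the
/// remembered digest.
pub fn changed_by_digest(previous_digest: u64, fetched: &FetchedSd) -> bool {
    digest(&fetched.data) != previous_digest
}
```

In this sketch both approaches remember one 8-byte value per SD; the difference is that a re-post of identical data trips the version check but not the digest check.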

Viewing things from a bigger perspective (the big picture / philosophical aspect of something) is much more beneficial than thinking about a specialised use-case. If you are looking for guarantees, look at it in conjunction with all the caveats i am mentioning. There can be better alternatives.

ustulation · September 26, 2016, 12:12pm

It’s not the question of being lame - it’s the question of guarantees. You can have an api:

fn foo() -> i32 { ... } // returning a signed integer

and mention that only positive values can be returned. Then if someone returned a negative integer you could say you are lame OR you could make an api thus:

fn foo() -> u32 { ... } // returning an unsigned integer

My choice would be the second.
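A slightly fuller sketch of the same idea, with hypothetical function names chosen only for illustration: the first function relies on documentation for its guarantee, the second encodes it in the return type.

```rust
use std::convert::TryFrom;

/// Documented as "never negative", but the type still admits negative
/// values, so callers have to trust the documentation.
pub fn count_documented(raw: i32) -> i32 {
    raw
}

/// The return type itself rules out negative values; input that cannot be
/// represented is rejected explicitly as `None`.
pub fn count_typed(raw: i32) -> Option<u32> {
    u32::try_from(raw).ok()
}
```

With the second signature a caller can never observe a negative count, whatever the implementation does internally.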

If someone depends on me changing my version only through actual mutation of data, i could always trick them by not doing so. With a hash it would be a guaranteed approach. In secure systems, ppl should learn to depend on tangible and concrete guarantees rather than specifically tailored or extrapolated guarantees.

rob · September 26, 2016, 12:22pm

I know and my answer was written with that knowledge. Unless it’s changing often it shouldn’t really change the uses. Just that it might trigger an update elsewhere which results in no changes elsewhere.

It’s increasing the processing and storage. Let’s say the indexing is creating a search engine for the SAFE network. Your extra u64 & hash processing will add a LOT of storage and electricity usage across the globe just so the versions are kept secret??? I am assuming that the version is updated by the network whenever a change of any type is made to the SD.

The point is, let’s design to reduce work for APPs when it does not impact the network code. Every u64 that has to be stored is multiplied by every APP and every SD the APPs use the version # for. So if there are 1 x 10^9 indexes taking 1 x 10^7 SDs to store them, and it takes 1 µs to compute each hash to check the version, that is at least 8 x 10^7 bytes just to store the hashes. What about when there are 2 billion users and 10 billion safeweb pages (10^12 indexes)? Then you need 80 TBytes of storage just for the hashes, on top of the version # stored for each. The address is computed, so there is no need to store SD addresses. It’s also possible to store the version # as a u16 (or u32), since it’s unreasonable to expect the version to change 65,000 times between accesses. So the hash is more than what is needed for the version #.

Fraser · September 26, 2016, 12:31pm

I don’t have long to chip in here (Viv’s a hard task-master and he’s wanting Routing work done today!) but in the meantime here are my hurried thoughts:

I prefer the deletion to be in the form of a separate, new version appended to the latest.
I always thought the SD version could be used by client apps. Spandan, I take your point about version increments not always relating to data changes, but we could handle that by making the version something like:

pub struct Version {
    /// Sequential number incremented when `SD::data` changes.
    pub major_index: u32,
    /// Sequential number incremented when any SD field other than `SD::data` changes.
    pub minor_index: u32,
    /// Arbitrary, version-specific information.  May be empty.
    pub data: Vec<u8>,
}

Thus the major_index represents the number of changes to the data, but we can still retain a means for strict total ordering of all versions.
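One way this could work is sketched below. The struct mirrors the one above; the helper methods (`after_data_change`, `after_metadata_change`, `ordering_key`) and the free function `data_changed_between` are assumptions added for illustration, as is the choice to never reset `minor_index`.

```rust
/// Same shape as the struct proposed above.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Version {
    pub major_index: u32,
    pub minor_index: u32,
    pub data: Vec<u8>,
}

impl Version {
    /// Hypothetical helper: the version after `SD::data` changed.
    pub fn after_data_change(&self) -> Version {
        Version { major_index: self.major_index + 1, ..self.clone() }
    }

    /// Hypothetical helper: the version after any other SD field changed.
    /// Assumption: `minor_index` is never reset when `major_index` grows.
    pub fn after_metadata_change(&self) -> Version {
        Version { minor_index: self.minor_index + 1, ..self.clone() }
    }

    /// Tuples compare lexicographically, and every change bumps exactly
    /// one index, so this key orders all versions strictly.
    pub fn ordering_key(&self) -> (u32, u32) {
        (self.major_index, self.minor_index)
    }
}

/// A client that only cares about the data compares `major_index` alone.
pub fn data_changed_between(older: &Version, newer: &Version) -> bool {
    older.major_index != newer.major_index
}
```

A client tracking only data changes would then store a single `u32` per SD, while the vaults could still order updates by the full key.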

For actually removing the SD from the network, I prefer an expiry time as a mutable member variable of the SD. This would be a separate mechanism to the delete we’re talking about here (which is more akin to making the SD invisible I suppose). If an SD is Put with a long duration until the requested expiry time, the user can pay more. If the SD is modified to change the expiry time, the user can be refunded or further charged appropriately. When the expiry point happens, the vaults managing the SD just remove it. This isn’t trivial, and I know that using timestamps is highly controversial, but that’s my preferred approach just now.

OK - scurrying back to Routing now!

ustulation · September 26, 2016, 12:48pm

I don’t get this part - can you elaborate a bit more ? What was the requirement ? Was it to know whether a version changed ? In that case you would have to store the previous version (a u64) to know if it changed (else how would you know of a change).

ustulation · September 26, 2016, 12:57pm

Yes, these things i would agree to - if there are specific changes/additions etc made to guarantee a certain user-side behavior then i am for it. Though i would maybe debate a little further on the merits of doing this at client side vs vaults - like what use cases benefit from it and if there can be better alternatives.

[quote=“Fraser, post:39, topic:138”]
For actually removing the SD from the network, I prefer an expiry time as a mutable member variable of the SD
[/quote]

You will need to elaborate more on this though - like why is it needed, what does it buy us, what are the downsides of not doing this etc.

rob · September 26, 2016, 1:33pm

Keeping version # in the APP would only require a variable large enough to detect change. E.g. using @Fraser’s idea the APP could store the version number in a u8 so that it can see changes. It should only change a little. If you expect it to change more then use a u16, and worst case a u32. This holds true even for the version simply being a u64 updated whenever anything changes (or more often)

Remember many APPs generate the SD address and do not actually store it and version # can be kept similarly using sparse array methods

So the APP can use a sparse array of u8 (or u16) to store the needed info for versions.

But if versions are removed from client use, the APP needs a version u8 (or u16) plus a u64 hash
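A minimal sketch of such a sparse store, assuming a `HashMap` keyed by a computed slot number stands in for the sparse array; `VersionCache` and `observe` are hypothetical names, not part of any SAFE API.

```rust
use std::collections::HashMap;

/// Hypothetical sparse store of truncated versions. Only SDs that have
/// actually been seen occupy an entry.
pub struct VersionCache {
    seen: HashMap<u64, u8>,
}

impl VersionCache {
    pub fn new() -> Self {
        VersionCache { seen: HashMap::new() }
    }

    /// Records the low 8 bits of the network version for `slot` and
    /// returns true if they differ from the remembered value, or if
    /// nothing was remembered yet.
    pub fn observe(&mut self, slot: u64, network_version: u64) -> bool {
        let truncated = network_version as u8; // keeps only the low 8 bits
        let previous = self.seen.insert(slot, truncated);
        previous != Some(truncated)
    }
}
```

The caveat of truncating is that a version which advances by an exact multiple of 256 between two observations looks unchanged; a `u16` or `u32` entry makes that correspondingly less likely.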

The difference is 72 bytes (or 80 bytes) instead of u8 (or u16).

Now suppose that after years of operation there are more than 100 billion safe web pages and 100 billion other public files, which have on average 500 indexes per page/file, resulting in 1 x 10^14 indexes. Say each SD stores 200 indexes. This means 5 x 10^11 index SDs are required.

5 x 10^11 times u16 sparse array is 10^12 bytes of storage needed (approx 10^7 SDs)

But without versions being revealed to client

5 x 10^11 times 10 bytes is 5 x 10^12 bytes of storage needed (approx 5 x 10^7 SDs) PLUS the processing (electricity) to compute the hashes when needed.
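The figures in this estimate can be reproduced with a few lines. This is only a check of the arithmetic as stated (10^14 indexes, 200 per SD, 2 bytes for a u16 version versus 10 bytes for a u16 version plus a u64 hash); the constant and function names are made up for the sketch.

```rust
/// Total number of indexes assumed in the estimate (1 x 10^14).
pub const INDEXES: u64 = 100_000_000_000_000;
/// Indexes stored per SD, as assumed in the estimate.
pub const INDEXES_PER_SD: u64 = 200;

/// Number of SDs needed to hold all indexes.
pub fn index_sd_count() -> u64 {
    INDEXES / INDEXES_PER_SD
}

/// Client-side bytes needed to track every index SD at a given per-SD cost.
pub fn tracking_bytes(bytes_per_sd: u64) -> u64 {
    index_sd_count() * bytes_per_sd
}
```

`tracking_bytes(2)` gives the version-only figure and `tracking_bytes(10)` the version-plus-hash figure, a fivefold difference under these assumptions.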

Why should we make APPs more complex & wasteful when the version can be given to the client?

Remember some seemingly small increases in storage/computing can be massive when there are a lot of similar objects to be stored.

Same here and surprised there is consideration to removing it from the client’s access.

dirvine · September 26, 2016, 1:46pm

Hmmm … Time is related here as it how many X we discuss it and how many X’s is it close to the first tool we choose to pick up, or how X will break the system (btw X == time in this sentence). I think many people prefer time, servers and the status quo, but that is not what we are doing here, at least till now.

The amount of pushing to get (agreed and synchronised) time into the system really should be done as a separate RFC (there is one RFC that touches on DHT-based, time-like capabilities). It’s not a tool available to us just now.

It’s known to be a large issue so no point it appearing in so many suggestions without really exploring the side effects etc. of giving up on full decentralisation and relying on centralised trusted managed owned servers or hardware devices on each computer.

dirvine · September 26, 2016, 1:48pm

I probably agree with this part, but think that it should not be able to be set by client apps (i.e. read-only) as the network does use this for resolving conflicts.

ustulation · September 26, 2016, 2:12pm

You will have to agree to this part in conjunction to the rest of what he mentions though:

With those modifications, yes - else all that i state e.g here in 3 points etc shows it wouldn’t make much sense in doing so. People will complain later and say we should have made it explicit or not expose it as a user-facing feature at all.

Also @tfa and @rob it might be worth mentioning the way you guys see Delete operation tying into this as i remember @tfa saying it (the usage of version field) would be useful only in conjunction to a particular way things are deleted. There might be better posts explaining those, but the most recent mentions are here and here. Worth mentioning those too in case they get missed getting discussed.

ustulation · September 26, 2016, 2:20pm

Well if using hash then just the hash is enough and sha256(something) == 32 bytes but yes compared to u16 you mention (== 2 bytes) it is certainly more. Also i think you misinterpreted u16/u64 etc (correct me if i am wrong though) - u64 is 64 bits == 8 bytes. u16 is 16 bits == 2 bytes etc.

(Edit: FWIW md5 would be 16 bytes i think (don’t know if there is anything shorter))
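The sizes involved can be checked directly. In the sketch below the digest type aliases and `bytes_per_entry` are hypothetical names; the byte widths themselves are fixed by the integer types and by the MD5 (128-bit) and SHA-256 (256-bit) output sizes.

```rust
use std::mem::size_of;

/// Raw digest sizes: MD5 is 128 bits, SHA-256 is 256 bits.
pub type Md5Digest = [u8; 16];
pub type Sha256Digest = [u8; 32];

/// Bytes a client would store per SD for a version of type `V`
/// plus a hash of type `H`.
pub fn bytes_per_entry<V, H>() -> usize {
    size_of::<V>() + size_of::<H>()
}
```

For example, `bytes_per_entry::<u8, u64>()` is 9 and `bytes_per_entry::<u16, u64>()` is 10, i.e. 72 and 80 bits rather than bytes, while a u16 plus a SHA-256 digest comes to 34 bytes.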

ustulation · September 26, 2016, 5:49pm

@Viv @dirvine @AndreasF Pls do look into this suggestion to see how much it complicates (or does not complicate) churn/refreshes and other operations - the point about version there - if one field is only supposed to change if data changes, and another for other changes etc. Also it would be great if you guys could state what you think about the discussion so far - expose version field - keeping in mind the use-cases, caveats etc. mentioned. Do you want to expose it with a disclaimer, do you not want to expose it, do you want changes in the structure as Fraser suggests or somewhere in the operation of vaults and then expose it, what concrete use case would it solve etc ? Also remember if you are going for merging SD and AD, would you want more fields for finer information in the struct Fraser presented - how would those things scale in future etc - vs of course clients hashing the field they care about and want to track changes for.

Sorry if i seem to be skeptical - just don’t want to rush in and make a wrong decision which ends up supporting a total of 1 use case while causing confusion for the rest.

dirvine · September 26, 2016, 5:52pm

No problems at all, if it’s exposed read only I cannot see a problem. I do not think though that having several versions is going to work right now. Baby steps are Ok, but large changes like that need an RFC themselves, I would be amazed if there were no side effects there. So the version as is now is my vote really.

So I would say if we did expose the inner version then Ok, but not any more right now.

ustulation · September 26, 2016, 5:55pm

And would you want to put a disclaimer in the API that a version increment could mean any of these 3 points ?

dirvine · September 26, 2016, 5:57pm

I would say it’s an internal incrementing counter, required by the network to exist and only increment by 1 on each change of any data element within it. That may be enough?

ustulation · September 26, 2016, 5:59pm

If i did a post of same data none of the data elements would change though - just the version increments - is that OK?

If yes then we will have to say something like: it might increment for no change too.

And ofcourse for AppendableData - it might not change even if data was changed - depending on if owner did it as a POST OR owner/users did it as APPEND ?

Aren’t those going to cause confusions ^^^ ?

dirvine · September 26, 2016, 6:09pm

Yes possibly, perhaps “any POST will increment the version by one only or fail”?

ustulation · September 26, 2016, 6:18pm

Ok - so leaving it exposed in safe_core with this disclaimer. @Krishna Could you also put this disclaimer in the Launcher-App APIs - something along the lines: version will increment for any successful POST operation by 1, but beyond that there is no guarantee - all internals could have changed, only a particular internal could have changed OR there might have been no change at all. Also if tracking for AppendableData, a change in internals might not result in a version increment.
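The disclaimer can be read as the following toy model. Everything here is an assumption made for illustration: the types are simplified stand-ins rather than the real safe_core structures, and the model assumes the poster supplies the next version and the POST fails otherwise.

```rust
/// Simplified stand-in for StructuredData (not the safe_core type).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StructuredData {
    pub version: u64,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PostError {
    VersionMismatch { expected: u64, got: u64 },
}

impl StructuredData {
    /// Assumed rule: a POST succeeds only if it carries `version + 1`.
    /// The data is not inspected, so re-posting identical data still
    /// bumps the version.
    pub fn post(&mut self, update: StructuredData) -> Result<(), PostError> {
        let expected = self.version + 1;
        if update.version != expected {
            return Err(PostError::VersionMismatch { expected, got: update.version });
        }
        *self = update;
        Ok(())
    }
}

/// Simplified stand-in for AppendableData.
pub struct AppendableData {
    pub version: u64,
    pub entries: Vec<Vec<u8>>,
}

impl AppendableData {
    /// An APPEND changes the contents but leaves the version untouched.
    pub fn append(&mut self, entry: Vec<u8>) {
        self.entries.push(entry);
    }
}
```

Under this model a version increment says only that a POST succeeded, and an unchanged AppendableData version says nothing about whether entries were appended.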

(I want to see what a concrete use-case/production-grade app based on the above disclaimers/uncertainties even looks like.)