Transparency or opacity of SD modifications

But it could do that and get the behaviour you want (if knowing it was important to you), or the app could do that on your behalf - and that would be better and more granular, because maybe there was no error in generation but an addition of ownership. Now assume we did make guarantees on the version field - or even introduced a new field that is user-facing and guaranteed to increment on every change. Now consider what happens if somebody else demands the opposite: "I own this data and I don't want people to be able to tell how many times I edited it by looking at my SD at any point in time."

Yes, but generating SDs and managing a correspondence table is necessarily more complex than just generating SDs.

In my example the app facing the user is just a display app. There is no risk that the user adds modifications of any kind to the SDs. The SDs are owned by another application that generates them.

Yes, my proposal doesn't allow that, but I prefer transparency. Maybe this is the crux of the debate. It makes me think of editable blockchains, and I clearly don't like those.

It is - that's the reason I am not comfortable committing to it, especially since it's not just the version field; it also assumes deletion should behave in a particular way, etc.

And consider that AppendableData's version field does not even work that way - the data field can keep changing while the version does not increment, and then suddenly (because the owner decided to do something) the version changes. Even for StructuredData, you could do a POST of exactly the same data, which would increment the version without changing anything else - at that point all you would be tracking is the number of POSTs someone has done (not necessarily any editing). That seems too much of a niche use-case (which can still be handled by hashing, without relying on versions), and such demands can be endless. The whole notion of the version field exists for something completely different - banking on it for a behaviour which also requires certain guarantees from certain other operations (e.g. deletion) is dangerous right now. If I did commit to it and later had to change it for some (back-end) reason, it would cause endless woes.

Nice thread :slight_smile:

Personally I'd like to side with "nullify signatures and data, ALSO bump the version" (not sold on this yet). Basically it's a sugar-coated POST operation at this point. I'm not sure whether this operation qualifies for a separate RPC (DELETE), or whether it should just be expected to be done as part of POST, with DELETE not supported for these data types (SD or AD). Sure, right now DELETE itself could be removed in such a case since ID doesn't support it, but that might change with Owned ID or something, so just being specific here.

In terms of “how should delete function”, if I haven’t lost track of the details in this thread:

  • @tfa, I guess, means: don't remove the data itself, but nullify the data on delete and let someone else claim it if need be; the version would be incremented as part of the delete and then again when a new owner claims it.
  • @ustulation, you're not comfortable with ^^ I guess? If so, you're making a case for delete meaning the data is gone from the network, and recreating it would start at version 0 or whatever it normally starts from.

I see pros and cons to both sides here :frowning: Like I said, I personally prefer option A, but just to confirm I'm not missing anything here:

The main advantage being mentioned for tfa's approach is that by using the version one can confirm whether the data has "since" been mutated (regardless of the data field in the SD changing). I can certainly see the benefit/use-case for this. On the other side, yes, this means you're storing the expected version somewhere, and with option B one can simply store the hash of the SD instead to get the same behaviour.

This is where I see two things. One is: why get the client to do a hash operation or equivalent if they don't need to? Granted, this might not be too expensive and could be ignored. Also, a bit more future scope could be to get vaults to support a GetVersion or some other RPC where someone can query just the version, without actually retrieving the whole data, before they validate it and confirm they want the payload. This can help avoid downloading data they aren't going to use. It certainly needs support and discussion from the Vault side, but if acceptable it certainly favours the version approach over the hash approach, I think? Unless vaults generate the hash and return it :slight_smile:
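As a rough illustration of that flow (a minimal sketch only; get_version is a hypothetical RPC that vaults don't currently offer, and the client/SD types here are stand-ins):

struct StructuredData {
    version: u64,
    data: Vec<u8>,
}

trait SdClient {
    /// Hypothetical cheap query: return only the current version of the SD at `addr`.
    fn get_version(&self, addr: [u8; 32]) -> Option<u64>;
    /// Normal full GET of the SD at `addr`.
    fn get_sd(&self, addr: [u8; 32]) -> Option<StructuredData>;
}

/// Only pull the full payload when the version the vaults report differs from what we cached.
fn refresh_if_changed<C: SdClient>(client: &C, addr: [u8; 32], cached_version: u64) -> Option<StructuredData> {
    match client.get_version(addr) {
        Some(v) if v != cached_version => client.get_sd(addr), // something was POSTed since we last looked
        _ => None, // unchanged (or unreachable): skip the download
    }
}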

Also, option A certainly makes churn handling easier in vaults, which was one of the reasons it existed previously, as it helps prevent data coming back from the dead. Sure, there are ways to work around this, such as keeping a delete cache that blocks any refresh of the data from getting into the chunk store - an approach that was tried and found functional - but it does leave open a time-based cache which itself isn't refreshed to other nodes.

On the pro side of actual delete, the main benefit is that the network isn't handling useless data which is considered deleted. By nullifying the SD, the network is still holding it in the vaults and continuing to refresh it with churn. That gets the network to do work - albeit small - for something which the client has deemed no longer needed. How scalable is this, and is it just going to eventually be too much for the network (kind of similar to the arguments which start to favour owned ID)? Also, how might refunds work? With actual delete it's easy to work out an agreeable refund, I'd guess, but if it's just a nullify operation and the network is still "working" to keep this zeroed data, should there be a cost factor to it, with no full refund - if any refund is allowed at all?

This last point is what makes it hard to just pick option A. Setting aside my bias towards keeping vaults simple for churn handling :innocent:, right now I can see the use-case for nullifying the SD rather than deleting it being "achievable" even if we just did a hard delete.

I don't think he wants someone else to be able to claim it either - I might be wrong though.

That's my understanding as well, and an important aspect.

So I gather it's to protect against the original user destroying the SD and then recreating it, claiming that it is the original and has never been different, i.e. falsifying the original. With a version, anyone can tell it has been modified. It's also to protect against others spoofing something that was destroyed because the owner no longer wanted it available and the address was linked or computed.

I am wondering if we could have a zero-data SD that basically only contains address + tag + version and can be claimed by anyone. Effectively the SD contents are destroyed, it consumes little space on the network, and the version number is kept.

If the owner wishes to keep ownership then they do not "delete" but simply zero the data and mark the SD as empty. Maybe a flag in the SD could denote an empty SD, or the "deleted" SD could optionally still contain the owner field.
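Purely to illustrate the shape being suggested (my own guess at field names, not the real StructuredData layout):

struct TombstonedSd {
    type_tag: u64,          // tag, so the address can still be computed/claimed
    name: [u8; 32],         // the SD's address on the network
    version: u64,           // last version survives the "delete"
    empty: bool,            // flag denoting an emptied/deleted SD
    owner: Option<Vec<u8>>, // optionally keep the owner key if ownership is retained
}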

This is something I have a problem with. As you said, the original intended purpose of the version field was (and for me still is) solely:

It does not indicate change in the way people might think:

  1. Maybe the data changed.

  2. Maybe only the owners changed, or maybe some other field added in future scope.

  3. Maybe nothing changed - I just did a POST of exactly the same SD (it's free anyway).

I don't know what people would be tracking by tracking the change in version, unless it's for the mere pleasure of version-gazing. For anything more serious, a more concrete and better method should be sought (vaults returning a hash of the entire SD or of fields of choice, or clients keeping it themselves, or whatever) - this being my argument and suggestion. If vaults could handle churn and the rest without the version, we could even remove it from SD without affecting users.

This is just for my understanding - what was the downside to a full delete and restart from version 0? Was it mainly the above concern, or was it tied to replay attacks, or were there other caveats?

Oh, about 1001 things, especially with book-keeping APPs and any sort of business auditing system.

Please read that in the context of the entire paragraph from which you quoted.


For some things, using the version that is updated by the network on any change can save on complexity and additional storage elsewhere. Let's use an indexing service as an example: rather than keeping a parallel metadata system (for knowing when a set of indexes changes), it relies on the version # of the SD storing that set of indexes. This is just to show the concept of what it can be useful for.

Unless I completely misread what you are saying, I think the above should give you a case where using the network version # can save additional space and computation.

(I know it's also there for network functionality.)


I don't really mind something being used for more than "originally intended" as long as it's useful. I don't see a point in restricting something just because it wasn't considered before.

In this case, though, it might not be considered "useful" - which I think we'd agree on - if the explicit requirement was to track the data portion and not the type's metadata.

I'd say you're lame for doing that :slight_smile: and maybe vaults should return an error for no mutation, and on repeat occurrences probably ping your ClientManagers and charge you some safecoin for spamming :smiley: (just kidding, or am I).

Yep, churn-handling complication. It's a topic by itself and pretty much the reason why no data type offered a pure delete previously.

As I said, it can be updated with no change at all too - point 3.

Even if you depended on versions, you would (in most cases, I assume) need to keep what the previous version was (a u64) so that you know the freshly fetched SD's version denotes a change. If you are keeping that u64, why not keep a hash instead, which is far more robust if you combine this paragraph with what I am saying in the previous one?

Viewing things from a bigger perspective (the big picture / philosophical aspect of something) is much more beneficial than thinking about a specialised use-case. If you are looking for guarantees, look at it in conjunction with all the caveats I am mentioning. There can be better alternatives.

It's not a question of being lame - it's a question of guarantees. You can have an API:

fn foo() -> i32 { ... } // returning a signed integer

and document that only positive values will be returned. Then if it returned a negative integer you could call that lame, OR you could make the API like this:

fn foo() -> u32 { ... } // returning an unsigned integer

My choice would be the second.

If someone depends on me changing my version only through actual mutation of the data, I could always trick them by not doing so. With a hash it would be a guaranteed approach. In secure systems, people should learn to depend on tangible and concrete guarantees rather than specially tailored or extrapolated ones.
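A minimal sketch of the hash-based check (I'm using the sha2 crate here purely as an example; which hash, and whether vaults or the client compute it, is exactly what's up for debate):

use sha2::{Digest, Sha256};

/// True if the SD's data portion differs from the hash we stored last time,
/// no matter how many POSTs have bumped the version in between.
fn data_changed(previous_hash: &[u8], fetched_data: &[u8]) -> bool {
    Sha256::digest(fetched_data).as_slice() != previous_hash
}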

I know, and my answer was written with that knowledge. Unless it's changing often, it shouldn't really change the uses; it might just trigger an update elsewhere which results in no changes elsewhere.

It's increasing the processing and storage. Let's say the indexing is for a search engine for the SAFE network. Your extra u64 & hash processing will add a LOT of storage and electricity usage across the globe just so the versions are kept secret??? I am assuming that the version is updated by the network whenever a change of any type is made to the SD.

The point is, let's design to reduce work for APPs when it does not impact the network code. Every u64 that has to be stored is multiplied across every APP and every SD the APPs use the version # for. So if there are 1 x 10^9 indexes taking 1 x 10^7 SDs to store them, and 1 uSec to compute the hash to check the version, that means at least 8 x 10^7 bytes just to store the hashes. What about when the number of users is 2 billion and there are 10 billion safeweb pages (10^12 indexes)? Then you need 80 TBytes of storage just for the hashes, on top of the version # of each being stored. The address is computed, so there is no need to store SD addresses. It's also possible to store the version # as a u16 (or u32), since it's unreasonable to expect the version to change 65,000 times between accesses. So the hash is more than is needed for the version #.

I don’t have long to chip in here (Viv’s a hard task-master and he’s wanting Routing work done today! :slight_smile: ) but in the meantime here are my hurried thoughts:

  1. I prefer the deletion to be in the form of a separate, new version appended to the latest.

  2. I always thought the SD version could be used by client apps. Spandan, I take your point about version increments not always relating to data changes, but we could handle that by making the version something like:

pub struct Version {
    /// Sequential number incremented when `SD::data` changes.
    pub major_index: u32,
    /// Sequential number incremented when any SD field other than `SD::data` changes.
    pub minor_index: u32,
    /// Arbitrary, version-specific information.  May be empty.
    pub data: Vec<u8>,
}

Thus the major_index represents the number of changes to the data, but we still retain a means for strict total ordering of all versions (see the ordering sketch after this list).

  3. For actually removing the SD from the network, I prefer an expiry time as a mutable member variable of the SD. This would be a separate mechanism from the delete we're talking about here (which is more akin to making the SD invisible, I suppose). If an SD is Put with a long duration until the requested expiry time, the user can pay more. If the SD is modified to change the expiry time, the user can be refunded or charged further as appropriate. When the expiry point is reached, the vaults managing the SD just remove it. This isn't trivial, and I know that using timestamps is highly controversial, but that's my preferred approach just now.
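On the Version struct in point 2, here is the ordering I read into it - a sketch of my own, comparing major_index first and minor_index as a tie-break, with the data payload ignored for comparisons:

use std::cmp::Ordering;

pub struct Version {
    pub major_index: u32, // bumped when `SD::data` changes
    pub minor_index: u32, // bumped when any other SD field changes
    pub data: Vec<u8>,    // arbitrary, version-specific information
}

impl PartialEq for Version {
    fn eq(&self, other: &Self) -> bool {
        (self.major_index, self.minor_index) == (other.major_index, other.minor_index)
    }
}

impl Eq for Version {}

impl Ord for Version {
    fn cmp(&self, other: &Self) -> Ordering {
        // Data changes dominate; metadata-only changes act as the tie-break,
        // giving a strict total order over all versions.
        (self.major_index, self.minor_index).cmp(&(other.major_index, other.minor_index))
    }
}

impl PartialOrd for Version {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}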

OK - scurrying back to Routing now! :slight_smile:


I don't get this part - can you elaborate a bit more? What was the requirement? Was it to know whether a version changed? In that case you would have to store the previous version (a u64) to know it changed (else how would you know about a change?).

Yes, these things I would agree to - if there are specific changes/additions etc. made to guarantee a certain user-side behaviour, then I am for it. Though I would maybe debate a little further the merits of doing this at the client side vs in vaults - like which use-cases benefit from it and whether there are better alternatives.

[quote="Fraser, post:39, topic:138"]
For actually removing the SD from the network, I prefer an expiry time as a mutable member variable of the SD
[/quote]

You will need to elaborate more on this though - like why it is needed, what it buys us, what the downsides of not doing it are, etc.

Keeping the version # in the APP would only require a variable large enough to detect change. E.g. using @Fraser's idea, the APP could store the version number as a u8 so that it can see changes, since it should only change a little. If you expect it to change more, then use a u16, and worst case a u32. This holds true even if the version is simply a u64 updated whenever anything changes (or more often).

Remember, many APPs generate the SD address and do not actually store it, and the version # can be kept similarly using sparse-array methods.

So you can use a sparse array of u8 (or u16) to store the needed info for versions.
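A rough sketch of how I read that (a plain HashMap standing in for the sparse array; the SD address is assumed to be computed by the APP rather than stored):

use std::collections::HashMap;

/// Sparse per-SD version tracking: only SDs we actually watch get an entry,
/// and each entry is a small counter (u16 here) rather than a full hash.
struct VersionTracker {
    last_seen: HashMap<[u8; 32], u16>, // key: computed SD address, value: truncated last-seen version
}

impl VersionTracker {
    /// Record the fetched version (truncated to u16) and report whether it has moved
    /// since we last looked. Addresses we have never seen count as changed.
    fn has_changed(&mut self, addr: [u8; 32], fetched_version: u64) -> bool {
        let truncated = fetched_version as u16;
        self.last_seen.insert(addr, truncated) != Some(truncated)
    }
}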

But if versions are removed from client use, then you need a version u8 (or u16) and a hash (u64).

The difference is 72 bytes (or 80 bytes) instead of a u8 (or u16).

Now if, after years of operation, there are more than 100 billion safe web pages and 100 billion other public files, which have on average 500 indexes per page/file, that results in 1 x 10^14 indexes. With 200 indexes stored in each SD, this means we need 5 x 10^11 index SDs.

5 x 10^11 entries times a u16 in a sparse array is 10^12 bytes of storage needed (approx 10^7 SDs).

But without versions being revealed to the client:

5 x 10^11 entries times 10 bytes is 5 x 10^12 bytes of storage needed (approx 5 x 10^7 SDs), PLUS the processing (electricity) to compute the hashes when needed.

Why should we make APPs more complex and wasteful when the version can be given to the client?

Remember, some seemingly small increases in storage/computing can become massive when there are a lot of similar objects to be stored.

Same here, and I'm surprised there is consideration of removing it from the client's access.

Hmmm ... Time is relevant here, as in how many X we discuss it, how many X's it is close to being the first tool we choose to pick up, or how X will break the system (btw X == time in this sentence). I think many people prefer time, servers and the status quo, but that is not what we are doing here, at least up to now.

The amount of pushing needed to get (agreed and synchronised) time into the system means it really should be handled as a separate RFC (there is one RFC that touches on DHT-based time-like capabilities). It's not a tool available to us just now.

It's known to be a large issue, so there is no point in it appearing in so many suggestions without really exploring the side-effects etc. of giving up on full decentralisation and relying on centralised, trusted, managed, owned servers or hardware devices on each computer.

I probably agree with this part, but think that it should not be settable by client apps (i.e. read-only), as the network does use this for resolving conflicts.
