RFC 38 – Appendable Data

frabrunelle · August 29, 2016, 3:44pm

Discussion topic for RFC 38 – Appendable Data

Fraser · August 30, 2016, 12:22pm

I have a few questions on this:

It does seem like we can achieve the functionality of StructuredData using these types, with the exception of the missing type_tag field. Do we still need StructuredData if these types are added (and if we add type_tag to them)?
Why do we only have the choice of a blacklist or whitelist? Wouldn’t it be more flexible to allow a filter where each element was a blacklisted or whitelisted node?
Could AppendedData::pointer be a Vec<u8> rather than DataIdentifier to avoid the indirection and cost of storing a tiny amount of data as an ImmutableData chunk? I guess if we keep it as is, users will abuse the field to store arbitrary data as a DataIdentifier anyway to avoid the cost.
I guess the name field is immutable even for the owner? (bullet point 3 of “Detailed Design” doesn’t imply this)
Do we have a plan for handling conflicting concurrent POST requests? It might not even be an issue?
For the sake of three characters each, couldn’t we rename Priv to Private and Pub to Public for the data types?
I’m not sure that we need both current_owner_keys and previous_owner_keys in the data types. We really just need the current keys I think. Keeping both does allow the signature validation to be carried out by any node which doesn’t already hold the previous version of the data, but I’m not sure that we need that ability?

ustulation · August 30, 2016, 12:39pm

Noooooooooooo

StructuredData (SD) will also have data field and then with that all the complexity of versioned vs unversioned representation of data. OTOH AppendableData (AD) have filter, encrypt-key, deleted-data, appended-data etc field which are not required by normal SD and hence would bloat it. So this was done to avoid complexity and bloating of a single type.

They are mutually exclusive so it didn’t make sense to have them both - but could have missed something. Can you explain more ? So if both are present and a key is not listed in either one of them, what do you do with it ? And similar questions.

Indirection meant that pointed to data could be any type as determined by user/app. It could be SD as well as ID.
Also under normal usage (no misuse) the pointer would take much less data. Keeping data as is could be an encouragement to put data inline which would bloat AD and necessitate frequent clean-ups.

They will both be required during transfer of ownership. The trnasferers would be in previou_owners field and transferee in the current_owners. This is the same as SD.

Viv · August 30, 2016, 12:44pm

Shouldnt be an issue and should get dealt same as SD for POST requests.

If we plan to use the chunk store stored item to extract previous owners and the incoming request for current to do validation, then this can hopefully get discussed and actioned separately as it’d impact SD in its current impl too than just this type.

ustulation · August 30, 2016, 12:53pm

Yes, that’s a valid point. Will need to state that modifying the name of existing AD will not make sense as it will no longer point to it - or something along those lines.

It’s the same for SD right now i think - so all advs/limitations there would apply here. The additional operation is the merge of data field by vaults. Every POST will need a proper version bump and only owner can do it. In that sense it is not very different than an SD.

lol - this will collection of public opinions, so will wait for more to comment. My personal vote is no. Besides becoming a longer type-name, another one i have is given we use rust it’s more rust-like to abbreviate - CondVar SocketAddr etc. But apart from convenience (without sacrificing expressionism) and more-rust-like, i don’t have any other reason.

dirvine · August 30, 2016, 12:57pm

I take your 5 chars and replace with one 3

[Edit my vote as you will know is no ]

AndreasF · August 30, 2016, 1:04pm

Since deleted_data is signed, safe_vault can’t remove it without breaking the signature. That means that only the owner can remove outdated entries in there, e.g. in their next update. Is that intended?

ustulation · August 30, 2016, 1:06pm

Yes - it’s in the 5th bullet point rfcs/text/0038-appendable-data/0038-appendable-data.md at c513fd24f6ba742fe97d27f5dd0d5ef132b5fcc6 · maidsafe/rfcs · GitHub

Fraser · August 30, 2016, 1:15pm

I’m not following… SD is versioned in the same way as these new types isn’t it?

By bloating, do you mean code bloat? The space required to store or transmit these isn’t very different, right? If so, I think there may be more code required to handle these three types rather than just two (getting rid of SD). Do you agree that SD could be replaced by if we add the type_tag, or am I missing something?

Yeah - I was thinking of e.g. a case where a user decides to change from allowing anyone to comment, but has blacklisted a few folk, to allowing only friends to comment. So I saw this as a blacklist and a whitelist. I guess I should have thought it through a bit further though, since I think you’re right - they are mutually exclusive at any given point in time. Yeah - just pretend that question never happened

Not sure what the frequent clean-up are? (moving to deleted_data won’t reduce the size of the element to below 100KiB) but I think it’s less work for the network and cheaper/quicker for the user to store a 90KiB piece of data in the pointer field rather than putting that up as a second data chunk. This is even more pronounced if the data is only a byte or so. With the current proposal, for data less than 65 bytes, I’d just stick it in the pointer field anyway to save myself the cost and effort.

Yes, and I had the same question for SD at the time. At transfer of ownership, if the vault handling this doesn’t have the old version, it’s not going to process the request anyway. If it does have the old version, it already has the previous owner keys (in the old version). The only way I see having both sets of keys as worthwhile is if we want to have vaults which don’t have the previous version validate the request. I don’t think we want or need that?

Fraser · August 30, 2016, 1:20pm

I originally wrote this as “three chars each” and then saw the hypocrisy…

dirvine · August 30, 2016, 1:30pm

My feeling is SD versioned SD, PrivAppend, PubAppend are very close and will end up possibly as subtypes or similar. We discussed that at some length but backed out of inheritance/traits etc. for now untill all use cases were tested in the wild.

As this would be an internal refactr if it happened (later) rather than an API change itself (if we get that right) then it probably makes sense to look at similarities but not currently couple the types. So very similar features in dynamic data types for now and when complete (I still see push types being added) then looking again at the base type may be a very good ides for code and algorithmic completeness IMHO.

Fraser · August 30, 2016, 1:44pm

Yeah, if this is all hidden from users, so we can seamlessly go from storing mutable data as one type to another in the future, then I have less of an objection.

I think my concern is imagining the Vault code to handle this - I see there being a lot of common code for all these types, and it may make sense to try to move the point at which that funnelling happens as close to the Client API as possible to reduce the overall complexity and code size.

I’m not at all convinced by my own argument here though - so probably best to just stick with the proposal as is if this issue has already been discussed!

Fraser · August 30, 2016, 1:52pm

A minor point on implementation (not nearly as important as the missing three characters!)… would it be slightly cleaner to change

filter: (FilterType, Vec<sign::PublicKey>)

to just Filter where this is defined as:

enum Filter {
    BlackList(Vec<sign::PublicKey>),
    WhiteList(Vec<sign::PublicKey>),
}

AndreasF · August 30, 2016, 2:40pm

Let’s use BTreeSet instead of HashSet, because impl<T> Hash for BTreeSet<T> where T: Hash, so we can #[derive(Hash)] for PubAppendableData.

ustulation · August 30, 2016, 3:23pm

ah no, this is more sort of the way safe_core utilizes the data field of SD (nothing to do with the versioning of SD itself). So for instance when SD of type 500/501 is created any data added to data field is represented in implementation specific way by safe_core. This is because 500 represents unversioned data in SD while 501 is versioned data in SD. So we didn’t want to club SD with AD just yet and have common implementation.

Not the code-bloat, rather the type-bloat. When not creating AD (as was all the use case until now due to absence of AD), one wouldn’t want the presence of other fields bloating the type, because some of them do not even have a sensible default.

Complete deletion is moving to delete_data followed by purging from there too (after sometime).

that’s a valid point - considering we went that way (data itself as Vec) how do we differentiate between small data (typing a 5 byte text) vs an Image of 100 KB which would end up utilising the complete space ? One solution i can think is to further wrap it into an enum:

pub enum Representation {
    Data(Vec<u8>),
    Pointer(DataIdentifier),
}

But now the interface actually lures ppl into using Representation::Data all the time and that would require frequent cleanups by the owner to recover the limited 100KiB space.

Do you have any particular solution in mind ?

The apps putting the data and apps reading it must agree else it will be discarded as invalid DataIdentifier.

EDIT: btw XorName is 32 bytes now, not 64

Fraser · August 31, 2016, 7:52am

Ah - OK. So would you plan to move unversioned data exclusively to SD and versioned to Appendable?

But this won’t be exposed to users, right? So we have the option of a single type with more fields than might be required for every context, and maybe some branches in functions in Client/Vault to handle the different contexts versus multiple types each designed to suit a specific context where we have to handle each in separate functions? Anyway - I think this point has now been decided.

I’m comfortable with either a plain Vec<u8> (if we have type_tag it’s maybe easier for client apps to know what the field represents) or the enum. Not sure that Clients or Vaults benefit from knowing what the field represents, so I guess I have a slight preference for the plain Vec<u8> for simplicity.

Agreed. Again, the type_tag might be of benefit here.

Oh yes! Good point

AndreasF · August 31, 2016, 9:35am

The RFC doesn’t mention the response types yet. If the wrapper types are separate, probably we need PubAppend, PubAppendSuccess and PubAppendFailure request/response variants and three analogous Priv... variants.

Alternatively, we could make the AppendWrapper an enum with a Pub and Priv variant. Then we’d only have Append, AppendSuccess and AppendFailure request/responses, with a DataIdentifier in the responses to distinguish?

ustulation · August 31, 2016, 2:09pm

Not sure i understand completely - but data field of SD is very different from data field of AD. So i guess part of the confusion is the name data for the field. Just for this discussion let’s rename the field to SD::data and AD::appendable_data.

SD::data is signed, AD::appendable_data is unsigned. So the mention of version/unversioned data only applies to SD::data a.t.m. AD::appendable_data is neither - it has a fixed representation in which public is only allowed to add and owner is allowed to do whatever.

If you combine SD and AD, and make it a new type SAD (couldn’t resist the temptation) then you would have two fields now: SAD::data which is signed and will represent unversioned/versioned/neither data according to type-tag and SAD::appendable_data which is unsigned and has fixed format (and now there is further complication because the format is different for Pub and Priv AppendableData - and we dont want only one type-tag to mean public and one to mean private - so we will have to allocate a range of type-tag and pemutaions will occur (versioned SAD with priv appedable data etc) - not nice). So when i stated we didn’t want to club SD and AD to have a common implementation i was referring to this part (along with the rest which is clear i think - Filters, encrypt-keys etc).

Viv · September 9, 2016, 12:15pm

k just to note some pending points here in terms of where the impl currently stands. A few of these points maybe need discussed a bit too:

In terms of safe_vault handling with PR https://github.com/maidsafe/safe_vault/pull/547/files

Append requests dont go through ClientManager yet. Currently its going straight to NaeManager and thereby not incurring the charge for same operation

An amount needs to be decided for this. Currently pre-safecoin, capacity is just a num with PUT’s restricted to 500 and each PUT operation consuming 1 from allowance. Should Append also just consume 1? Ideally considering this “num” is just a temporary thing until safecoin, we don’t have to make this very complicated.

ErrorCodes needs agreed on. Currently APPEND RPC failures due to Filter/POST(invalid sig/version/…) all come back as InvalidSuccessor

This again needs thought about not just in terms of creating new error codes but if we want to return errors in all such cases. One example raised by @ustulation was to maybe not inform blocked senders from knowing their request was blocked with a failure. Similar to current spam filters. Is this going to be relevant when the spammer can just get the AppendableData anyways and check if (s)he is in the filter which they cant do normally?

Append requests are limited to 1 at a time from the target group PoV. So multiple users appending to the same item, only a maximum of 1 is to succeed until quorum vaults agree and update their chunk stores. This isnt going to be great for famous items ofc that are being updated in parallel in close intervals.
Following maybe generic to not just be about this type but also about SD/Immut
Response handling. With the group refreshing the incoming data before accepting it to the chunk_store, non accumulated requests no longer trigger a failure to the sender. This is going to then trigger a timeout after safe_core exhausts waiting for a reply and is going to be misleading with general timeouts due to churn and probaly needs thought about. Maybe for this when vaults accumulate a particular IdAndVersion, they should actively clear any other entries in pending_write and return failure for the same with a separate code to thereby indicate to the sender their request needs to be tried again.

Vaults verifying data cap. This certainly is generic where vaults need to enforce the size restriction in their incoming requests which they aren’t doing yet.

dirvine · September 9, 2016, 12:21pm

Perhaps make this count 5000 then
Id = 100
SD = 10
AppendItem = 1

These are only rations and we have not yet figured how to let the network decide on the numbers yet, so a “good guess” right now may be simplest. These magic numbers should vanish at some stage though. The item is only a pointer and the pointed thing is paid for as per the other rules. So perhaps this would be a small change right now with an RFC to tidy this up.