My understanding is that newPublic() for example doesn’t cause a GET for the MD, but that calling GetVersion() triggers a GET.
I’m looking at how to minimise GETs for MutD and ImmD in my code, so first I’m interested in when the API will and won’t trigger a GET, and when it will return information held locally on a MutD or ImmD without touching the network.
So for example, does every MutD access API (mData.getEntries(), entries.get(), etc.) cause a GET, and if not every such API does a GET, then which do and which don’t?
For ImmD, similar questions! And maybe these need to be considered separately for different file sizes (less than X, greater than X).
Please include access to metadata for both MutD and ImmD.
Finally, the NFS emulation: what network access happens in fetch(), open(), read() etc?
If somebody from @MaidSafe who knows this area could write a short guide it will help a lot, and I hope lead to faster SAFE apps! Mine will certainly benefit.
I’m about to start caching data structures in SafenetworkJs so this information will help a lot - and be of immediate benefit to the SAFE FUSE app which is now working for read, but too slooooow!
We have already started writing new documentation and guides describing the inner workings of Client Libs. What you’re asking for sounds like a very useful addition to that, and since it’s practical for SAFE app developers we’ll happily prioritize this topic. I’ll get back to you here with a full answer within a few days!
That would be brilliant, thanks. If during the process you are OK letting me have a rough, not-100%-accurate draft, that could also be useful as a guide. Some info sooner could help, but I’m also OK waiting - plenty of testing to do, and that’s bound to throw up things to get on with.
@happybeing Here’s some info about GETs/mutations in the API provided by Client Libs. It should closely mirror the Javascript API we provide but if there is any confusion let me know, it’s also not a complete list yet.
Also, feel free to give suggestions for improvement for when we release an official guide/reference. I’m thinking of moving this to table format but I don’t know if it’s possible on these forums.
MutableData
mdata_put - 1 mut
mdata_get_version - 1 GET
mdata_serialised_size - 1 GET
mdata_get_value - 1 GET
mdata_entries - 1 GET
mdata_list_keys - 1 GET
mdata_list_values - 1 GET
mdata_mutate_entries - 1 mut
mdata_list_permissions - 1 GET
mdata_list_user_permissions - 1 GET
mdata_set_user_permissions - 1 mut
mdata_del_user_permissions - 1 mut
MDataInfo
mdata_info_new_private - 0 GET
mdata_info_random_public - 0 GET
mdata_info_random_private - 0 GET
…
all mdata_info_* functions - 0 GET
ImmutableData
idata_new_self_encryptor - 0 GET
idata_write_to_self_encryptor - 0 GET, 0 mut, done locally until SE is closed
idata_close_self_encryptor - 1 mut
idata_fetch_self_encryptor - 1 GET
idata_serialised_size - 1 GET
idata_size - 0 GET
idata_read_from_self_encryptor - 0 GET
idata_self_encryptor_writer_free - 0 GET
idata_self_encryptor_reader_free - 0 GET
Notes
The actual number of GETs and mutations depends on the size of the data. Any idata at least 3 kB in size will be self-encrypted into at least 3 chunks. Smaller than that, and the entire idata is held in the datamap. Each chunk is a separate GET/mutation. The maximum size of a single chunk is 3 MB. So keep in mind that getting a whole idata will almost always be more expensive than e.g. getting the size.
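As a rough rule of thumb, the chunk counts above can be sketched in a few lines of JavaScript. The thresholds and rounding here are assumptions taken from the numbers in this post, not from the actual self_encryption implementation:

```javascript
// Rough estimate of how many chunks (and thus GETs/mutations) an idata
// costs, based on the sizes described above: below ~3 kB the data is held
// inline in the datamap; above that it self-encrypts into at least 3
// chunks of at most 3 MB each.
const MIN_SELF_ENCRYPT_SIZE = 3 * 1024;       // assumed inline-in-datamap threshold
const MAX_CHUNK_SIZE = 3 * 1024 * 1024;       // maximum size of a single chunk
const MIN_CHUNKS = 3;                         // self-encrypted data has >= 3 chunks

function estimateChunks(sizeBytes) {
  if (sizeBytes < MIN_SELF_ENCRYPT_SIZE) return 0;  // held inline in the datamap
  return Math.max(MIN_CHUNKS, Math.ceil(sizeBytes / MAX_CHUNK_SIZE));
}

// 1 GET for the datamap, plus one GET per chunk for a full read
function estimateGetsForFullRead(sizeBytes) {
  return 1 + estimateChunks(sizeBytes);
}
```

So a 1 kB file is a single GET, a 10 kB file is around 4, and costs only start growing again past roughly 9 MB.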
Entry Actions
all mdata_entry_action_* functions - 0 GET
Note that all queue’d up actions perform a single mutation total when mdata_mutate_entries is called.
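The batching behaviour above can be modelled in a few lines. This is a toy illustration of the costing model only, not the real safe_app API; the class and method names are made up for the sketch:

```javascript
// Toy model of entry-action batching: actions are queued locally
// (0 GETs, 0 muts) and applied in a single mutation, however many
// actions were queued - mirroring mdata_entry_action_* plus
// mdata_mutate_entries.
class MockMutation {
  constructor() { this.actions = []; }
  insert(key, value) { this.actions.push({ op: 'insert', key, value }); return this; }
  update(key, value) { this.actions.push({ op: 'update', key, value }); return this; }
  remove(key)        { this.actions.push({ op: 'remove', key }); return this; }
}

class MockMutableData {
  constructor() { this.entries = new Map(); this.mutationCount = 0; }
  applyEntriesMutation(mutation) {      // stands in for mdata_mutate_entries
    for (const a of mutation.actions) {
      if (a.op === 'remove') this.entries.delete(a.key);
      else this.entries.set(a.key, a.value);
    }
    this.mutationCount += 1;            // one network mutation for the whole batch
  }
}

const md = new MockMutableData();
const mut = new MockMutation().insert('a', '1').insert('b', '2').remove('a');
md.applyEntriesMutation(mut);           // three actions, one mutation
```

The practical upshot: queue as many entry actions as you can into one mutate call rather than mutating entry by entry.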
NFS
file_open - 0 or 1 GET (to get the datamap if reading or writing in APPEND mode), +3 GET for medium files (except in OVERWRITE mode)
file_size - 0 GET
file_read - 0 GET
file_write - 0 GET
file_close (writing only) - 1 mut (to put the new datamap), +3 mut for a medium file
Notes
file_open initially does 3 GETs for a medium file when writing in APPEND mode, same as reading. We need to get the existing data and decrypt so we can re-encrypt it again after writing – in OVERWRITE mode the existing data doesn’t matter.
The amount of muts by file_close depends on the file size after writing, as you can open a small file and it can be medium or large by the time you’re done writing.
Thanks @marcin, this is great - and it arrived before I needed it, so thanks very much. Now, I have questions!
I was thinking a GET would be a useful unit for performance estimation - so effectively a unit of latency - but you have differentiated between GET and number of chunks so I’m not sure how that will affect latency. So some guidance here would help - in how to interpret 1 GET, 1 mut etc.
For my immediate purposes, I’m wondering whether it affects things much if a GET results in one chunk or three? For example, would that happen in parallel and therefore best be regarded as just one unit of latency? A bit more than one? Or three?
Again, from a latency per API call point of view, what is the impact of “1 mut”?
If I want to look up a safe_app_nodejs API call to see which lib calls it makes, is there a single file or directory to look at?
It might be handy to include PUTs while you’re compiling this and I will be wanting that for performance before long myself.
BTW you can do tables on the forum using HTML tags. I’m not sure what the capabilities are and didn’t find the reference just now, but this is what I just did:

<table>
  <tr><th>One Heading</th><th>Two Columns</th></tr>
  <tr><td>Line</td><td>one</td></tr>
  <tr><td>Line</td><td>two</td></tr>
  <tr><td>Line</td><td>three</td></tr>
</table>
Good question. Client Libs and Routing are fully asynchronous so these operations can be done in parallel, and each chunk will have a different XorName and usually will be handled by a different DataManager. 3 GETs should not be much more expensive than 1 GET most of the time. As for mutations, they will be more expensive than GETs because they have to go through a MaidManager to make a PUT request which is then passed to a DataManager. A GET can go to a DataManager directly. I’m not sure if we have quantified the difference in latency though.
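The "3 GETs cost about as much as 1" point comes down to issuing the requests concurrently. Here is a sketch of that pattern; fetchChunk is a hypothetical stand-in for a network GET, not a real API call:

```javascript
// Because each chunk has an independent XorName (and usually a different
// DataManager), a client can request all chunks concurrently rather than
// one after another.
function fetchChunk(name) {
  // Simulated network GET with ~50 ms latency
  return new Promise(resolve =>
    setTimeout(() => resolve(`data-for-${name}`), 50));
}

async function fetchAllChunks(names) {
  // All requests are in flight at once; Promise.all preserves order
  return Promise.all(names.map(fetchChunk));
}

fetchAllChunks(['chunk0', 'chunk1', 'chunk2']).then(chunks => {
  // Total wall time is roughly one round trip, not three
  console.log(chunks.join(','));
});
```

If the library itself parallelises chunk fetches this way, treating a multi-chunk GET as a bit more than one unit of latency seems a reasonable first-order estimate.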
You can look at the API definitions in src/api to find the underlying Client Libs functions. For example, newPrivate is a wrapper around mdata_info_new_private.
Now, the newPublic function you mentioned in your first post does not have a Client Libs equivalent, but you can rest assured that it does not cost any GETs, as all the work is done locally.
Thanks for the correction @tfa, the file_* functions were totally wrong. In my defense, Mark said the first draft didn’t have to be totally accurate.
I’ll be editing my post with the following corrections:
file_open - 0 GET -> 0 or 1 GET (to get the datamap if reading or writing in APPEND mode), +3 GET for all medium files
file_size - 0 GET -> 0 GET
file_read - 0 GET -> 0 GET
file_write - 0 GET -> 0 GET
file_close (writing only) - 0 GET -> 1 mut (to put the new datamap), +3 mut for a medium file
If you look at reading a medium file, for example, you’ll find the numbers match up now (dir_fetch_file + file_open + file_read = 1 + 4 + 0 = 5 GET).
You’ll notice that file_open initially does 3 GETs for a medium file when writing in APPEND mode, same as reading. We need to get the existing data and decrypt so we can re-encrypt it again after writing – in OVERWRITE mode the existing data doesn’t matter. Also, the amount of muts by file_close depends on the file size after writing, as you can open a small file and it can be medium or large by the time you’re done writing.
@marcin I want to check: is the NFS fetch really 1 GET? I tried examining the code but got lost looking for what implements dir_fetch_file.
I’m asking because I don’t understand how it gets the information about a file. Is that part of the value for the NFS file entry? If not, and the info has to be accessed from somewhere else, I’d expect a fetch() to be two GETs: one to get the value, and another to look up the file info. I’m just trying to understand what happens inside these functions so I can make a performant design.
A few things seem odd because I don’t have a grasp of how MD is implemented across the API to the network. For example, if you GET the MD, I would assume you now have all the keys and values, but if you then do a get(key) it costs an extra GET.
Same for most other operations: at least one GET. So there’s no caching, even though we hold a reference to the MD? Is that reference just an address, with every access related to the MD going to the network?
This seems sensible, from a data integrity point of view, but at a cost to performance. So in some applications it might be useful to be able to get a cached MD in one GET and then work with that? Does that make sense, and if so, what’s the cheapest way to obtain that now?
I’d find it useful to have an explanation of how this operates, or if not a pointer to the relevant code might help. Cheers.
When you ask the network for an MD you actually get back an MDataInfo which then needs to be passed as a handle to other functions to get further information about the MD.
If you want all the entries, you simply call mdata_entries with an MDataInfo handle to get an MDataEntriesHandle. The full list of entries is now in the object cache locally (meaning there’s no performance hit for further access) and you can access it using the mdata_entries_get function, passing in the corresponding entries handle.
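That fetch-once-then-read-locally pattern is worth imitating at the application level too. Here is a minimal sketch, independent of the real API; fetchEntriesFromNetwork is a hypothetical stand-in for the mdata_entries call:

```javascript
// Fetch the full entry list once (1 GET), keep it locally, and serve
// subsequent lookups from the cache - at the cost of possibly reading
// stale data if the MD changes underneath you.
class CachedEntries {
  constructor(fetchEntriesFromNetwork) {
    this.fetch = fetchEntriesFromNetwork;  // async () => Map of key -> value
    this.cache = null;
    this.networkGets = 0;
  }
  async get(key) {
    if (this.cache === null) {
      this.cache = await this.fetch();     // the single network GET
      this.networkGets += 1;
    }
    return this.cache.get(key);            // all further lookups are local
  }
  invalidate() { this.cache = null; }      // call if the MD may have changed
}
```

The trade-off is exactly the data-integrity vs performance one raised above: the cache is cheap to read but can go stale, so invalidate it around your own mutations.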
This is what it all looks like from the Rust point of view, let me know if you need any help matching this information over to the Javascript API. Did I answer all of your questions @happybeing?
So after getting the entries, does that cache stay available until the handle is released? In which case any get()s on the entries handle would, I assume, not cause a network GET?
What if anything (apart from releasing the handle) would invalidate the entries cache?
What happens to the cache on insert /update for example?
And with NFS, can I hang onto a File handle from NFS fetch() even after messing with the file entry (say open()/write()/close())? Can I just keep reusing the fetched File handle, or should I always do a new NFS fetch() each time I want to operate on a file entry?
I have a cache of NFS file state and am seeing some strange behaviour which might just be my buggy code, but it just occurred to me it could be due to my having cached File handles which later don’t work as I expect.
Also, I’m curious about this:
dir_update_file - 1 GET, 1 mut
If insert/update requires a GET, why does delete not?
We call it the object cache, but unlike many caches, memory is not dropped when it gets old. The “cache” is basically just local storage, though we provide a function, app_reset_object_cache, to invalidate the entire cache. Apart from that, the object cache persists for the lifetime of a session. Maybe this is something we can look into changing, e.g. dropping old unused objects so that users don’t run out of system memory. For now, it’s the developer’s responsibility to free up memory and drop object handles by calling the appropriate free functions.
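The "drop old unused objects" idea mentioned above is essentially a bounded LRU store over handles. This is a hypothetical sketch of what that could look like, not anything in Client Libs; freeFn stands in for calling the appropriate *_free function:

```javascript
// A bounded handle store that frees the least-recently-used handle when
// full, so long-running apps don't accumulate unbounded object-cache
// entries. A Map iterates in insertion order, which gives us LRU order
// if we re-insert a handle on every access.
class BoundedHandleCache {
  constructor(maxSize, freeFn) {
    this.maxSize = maxSize;
    this.freeFn = freeFn;                // e.g. a wrapper around a *_free call
    this.map = new Map();
  }
  put(handle, obj) {
    if (this.map.has(handle)) this.map.delete(handle);
    this.map.set(handle, obj);
    if (this.map.size > this.maxSize) {
      const oldest = this.map.keys().next().value;
      this.freeFn(oldest);               // release the underlying object
      this.map.delete(oldest);
    }
  }
  get(handle) {
    if (!this.map.has(handle)) return undefined;
    const obj = this.map.get(handle);
    this.map.delete(handle);             // refresh recency by re-inserting
    this.map.set(handle, obj);
    return obj;
  }
}
```

An app could wrap its own handle bookkeeping in something like this today, without waiting for the library to change.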
That’s a good question. Yes, the file handle is removed from the object cache when close is called. Is there a use case where you’d need to reuse it after closing?
dir_update_file needs to get the file version so it can increment the version properly on update. dir_delete_file just sends a delete mutation to the network.
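The asymmetry can be shown with a toy cost model. This is an illustration of the counting only, not the real NFS implementation:

```javascript
// Update is a read-modify-write (1 GET + 1 mut) because it must fetch
// the current version to increment it; delete just sends a mutation
// blind (0 GET + 1 mut). The gets/muts counters model network cost only.
class MockDirectory {
  constructor() { this.files = new Map(); this.gets = 0; this.muts = 0; }
  updateFile(name, content) {
    const existing = this.files.get(name);   // like dir_update_file fetching the version
    this.gets += 1;
    const version = existing ? existing.version + 1 : 0;
    this.files.set(name, { content, version });
    this.muts += 1;                          // one mutation carrying the new version
  }
  deleteFile(name) {
    this.files.delete(name);                 // like dir_delete_file: no read needed
    this.muts += 1;
  }
}
```

So if you already know a file’s version from an earlier fetch, an API that let you pass it in would save the extra GET on update; delete avoids it because it doesn’t need to bump anything.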