What in the API causes GET?


#1

My understanding is that newPublic() for example doesn’t cause a GET for the MD, but that calling GetVersion() triggers a GET.

I’m looking at how to minimise GETs for MutD and ImmD in my code, so first I’m interested in when the API will and won’t trigger a GET, and when it will return information held locally on a MutD or ImmD without touching the network.

So for example, does every MutD access API (mData.getEntries(), entries.get() etc.) cause a GET, and if not, which do and which don’t?

For ImmD, similar questions! And maybe these need to be considered between different file sizes (less than X, greater than X).

Please include access to metadata for both MutD and ImmD.

Finally, the NFS emulation: what network access happens in fetch(), open(), read() etc?

If somebody from @MaidSafe who knows this area could write a short guide it will help a lot, and I hope lead to faster SAFE apps! Mine will certainly benefit.

I’m about to start caching data structures in SafenetworkJs so this information will help a lot - and be of immediate benefit to the SAFE FUSE app which is now working for read, but too slooooow! :snail:


#2

Hi @happybeing, great question!

We have already started writing new documentation and guides describing the inner workings of Client Libs. What you’re asking for sounds like a very useful addition to that, and since it’s practical for SAFE app developers we’ll happily prioritize this topic :slight_smile: I’ll get back to you here with a full answer within a few days!


#3

That would be brilliant, thanks. If during the process you are OK letting me have a rough not 100% accurate draft that could also be useful as a guide. Some info sooner could help, but I’m also OK waiting - plenty of testing to do and that’s bound to throw up things to get on with :slight_smile:


#4

Sure, I’ll post a draft here soon.


#5

@happybeing Here’s some info about GETs/mutations in the API provided by Client Libs. It should closely mirror the JavaScript API we provide, but if there is any confusion let me know. It’s also not a complete list yet.

Also, feel free to give suggestions for improvement for when we release an official guide/reference. I’m thinking of moving this to table format but I don’t know if it’s possible on these forums.


MutableData

mdata_put - 1 mut
mdata_get_version - 1 GET
mdata_serialised_size - 1 GET
mdata_get_value - 1 GET
mdata_entries - 1 GET
mdata_list_keys - 1 GET
mdata_list_values - 1 GET
mdata_mutate_entries - 1 mut
mdata_list_permissions - 1 GET
mdata_list_user_permissions - 1 GET
mdata_set_user_permissions - 1 mut
mdata_del_user_permissions - 1 mut


MDataInfo

mdata_info_new_private - 0 GET
mdata_info_random_public - 0 GET
mdata_info_random_private - 0 GET

all mdata_info_* functions - 0 GET


ImmutableData

idata_new_self_encryptor - 0 GET
idata_write_to_self_encryptor - 0 GET, 0 mut, done locally until SE is closed
idata_close_self_encryptor - 1 mut
idata_fetch_self_encryptor - 1 GET
idata_serialised_size - 1 GET
idata_size - 0 GET
idata_read_from_self_encryptor - 0 GET
idata_self_encryptor_writer_free - 0 GET
idata_self_encryptor_reader_free - 0 GET

Notes

  • The actual number of GETs and mutations depends on the size of the data. Any idata at least 3 KB in size will be self-encrypted to at least 3 chunks. Smaller than that, and the entire idata is held in the datamap. Each chunk is a separate GET/mutation. The maximum size of a single chunk is 3 MB. So keep in mind that getting a whole idata will almost always be more expensive than e.g. getting the size.

Entry Actions

all mdata_entry_action_* functions - 0 GET

Note that all queued-up actions together perform a single mutation when mdata_mutate_entries is called.
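
As a rough sketch of the batching described above: this is plain Python, not the real SAFE API. `FakeMutableData` and `EntryActions` are hypothetical stand-ins that only count mutations, assuming that queuing actions is purely local and that one mutate call sends them all.

```python
class FakeMutableData:
    """Hypothetical stand-in for a MutableData on the network; counts mutations."""

    def __init__(self):
        self.entries = {}
        self.mutations = 0

    def mutate_entries(self, actions):
        """Apply every queued action, counted as a single network mutation."""
        for kind, key, value in actions:
            if kind in ("insert", "update"):
                self.entries[key] = value
            elif kind == "delete":
                self.entries.pop(key, None)
        self.mutations += 1


class EntryActions:
    """Local queue of pending actions; building it touches no network."""

    def __init__(self):
        self.queued = []

    def insert(self, key, value):
        self.queued.append(("insert", key, value))

    def update(self, key, value):
        self.queued.append(("update", key, value))

    def delete(self, key):
        self.queued.append(("delete", key, None))


def demo_batch():
    """Queue three actions locally, then send them in one mutation."""
    md = FakeMutableData()
    actions = EntryActions()
    actions.insert("a", "1")
    actions.insert("b", "2")
    actions.delete("a")
    md.mutate_entries(actions.queued)
    return md
```

In this model the cost is one mutation however many actions are queued, so grouping related changes before calling the mutate function is the cheap path.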


NFS

dir_fetch_file - 1 GET
dir_insert_file - 1 GET, 1 mut
dir_update_file - 1 GET, 1 mut
dir_delete_file - 1 mut

file_open - 0 or 1 GET (to get the datamap if reading or writing in APPEND mode), +3 GET for medium files (except in OVERWRITE mode)
file_size - 0 GET
file_read - 0 GET
file_write - 0 GET
file_close (writing only) - 1 mut (to put the new datamap), +3 mut for a medium file

Notes

  • file_open initially does 3 GETs for a medium file when writing in APPEND mode, same as reading. We need to get the existing data and decrypt so we can re-encrypt it again after writing – in OVERWRITE mode the existing data doesn’t matter.
  • The number of muts performed by file_close depends on the file size after writing, since a file that was small when opened can be medium or large by the time you’re done writing.

#6

Thanks @marcin this is great and before I needed it so thanks very much. Now, I have questions! :slight_smile:

I was thinking a GET would be a useful unit for performance estimation - so effectively a unit of latency - but you have differentiated between GET and number of chunks so I’m not sure how that will affect latency. So some guidance here would help - in how to interpret 1 GET, 1 mut etc.

For my immediate purposes, I’m wondering whether it affects things much if a GET results in one chunk or three? For example, would that happen in parallel and therefore best be regarded as just one unit of latency? A bit more than one? Or three?

Again, from a latency per API call point of view, what is the impact of “1 mut”?

If I want to look up a safe_app_nodejs API call to see which lib calls it makes, is there a single file or directory to look at?

It might be handy to include PUTs while you’re compiling this and I will be wanting that for performance before long myself.

BTW You can do tables on the forum using HTML tags:

One Heading
Two Columns
Line one
Line two
Line three

I’m not sure what the capabilities are and didn’t find the reference just now, but this is what I just did:

<table>
<tr><th>One Heading
<tr><th>Two<th>Columns
<tr><td>Line<td>one
<tr><td>Line<td>two
<tr><td>Line<td>three
</table>

#7

Good question. Client Libs and Routing are fully asynchronous so these operations can be done in parallel, and each chunk will have a different XorName and usually will be handled by a different DataManager. 3 GETs should not be much more expensive than 1 GET most of the time. As for mutations, they will be more expensive than GETs because they have to go through a MaidManager to make a PUT request which is then passed to a DataManager. A GET can go to a DataManager directly. I’m not sure if we have quantified the difference in latency though.
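
To illustrate why parallel GETs cost roughly one unit of latency rather than three, here is a small simulation in Python. It is not SAFE code: `fake_get` is a hypothetical stand-in that just sleeps, and the 0.1 s latency is an arbitrary assumption.

```python
import asyncio
import time


async def fake_get(chunk_name, latency_s):
    """Pretend to fetch one chunk; the sleep stands in for network latency."""
    await asyncio.sleep(latency_s)
    return chunk_name


async def fetch_all(chunk_names, latency_s):
    """Issue every GET at once and wait until all of them have finished."""
    return await asyncio.gather(*(fake_get(n, latency_s) for n in chunk_names))


def timed_fetch(chunk_names, latency_s=0.1):
    """Return the fetched names and the wall-clock time the batch took."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(chunk_names, latency_s))
    return results, time.perf_counter() - start
```

Calling `timed_fetch(["c1", "c2", "c3"])` should take close to one simulated latency, not three, because the three waits overlap. Real network behaviour will of course vary.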

You can look at the API definitions in src/api to find the underlying Client Libs functions. For example, newPrivate is a wrapper around mdata_info_new_private:

Now, the newPublic function you mentioned in your first post does not have a Client Libs equivalent, but you can rest assured that it does not cost any GETs as all the work is done locally :slight_smile:

Ah thanks :+1:


#8

NFS operations cost more than what you mentioned, because the datamap is no longer stored in the MD:

| Operation | Cost | Requests |
|---|---|---|
| Creation of small file | 2 units | 2 PUTs |
| Creation of medium file | 5 units | 5 PUTs |
| Update of small file | 2 units | 1 POST + 1 PUT |
| Update of medium file | 5 units | 1 POST + 4 PUTs |
| Read of small file | 0 units | 2 GETs |
| Read of medium file | 0 units | 5 GETs |

(A small file is less than 3 KB; a medium file is between 3 KB and 3 MB.)

See this post for more information.


#9

Thanks for the correction @tfa, the file_* functions were totally wrong. In my defense, Mark said the first draft didn’t have to be totally accurate :wink:

I’ll be editing my post with the following corrections:

file_open - 0 GET -> 0 or 1 GET (to get the datamap if reading or writing in APPEND mode), +3 GET for all medium files
file_size - 0 GET -> 0 GET
file_read - 0 GET -> 0 GET
file_write - 0 GET -> 0 GET
file_close (writing only) - 0 GET -> 1 mut (to put the new datamap), +3 mut for a medium file

If you look at reading a medium file, for example, you’ll find the numbers match up now (dir_fetch_file + file_open + file_read = 1 + 4 + 0 = 5 GET).

You’ll notice that file_open initially does 3 GETs for a medium file when writing in APPEND mode, same as reading. We need to get the existing data and decrypt it so we can re-encrypt it after writing; in OVERWRITE mode the existing data doesn’t matter. Also, the number of muts performed by file_close depends on the file size after writing, since a file that was small when opened can be medium or large by the time you’re done writing.
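
The arithmetic above can be written down as a small Python estimator. This is only a sketch built from the per-call figures in this thread. The function names are hypothetical, the use of 1024-byte kilobytes and the exact boundary handling are assumptions, and large files are left out because no figures for them are given here.

```python
KB = 1024                 # assumption: binary kilobytes
MB = 1024 * KB
SMALL_LIMIT = 3 * KB      # below this the whole file sits in the datamap
MEDIUM_LIMIT = 3 * MB     # upper bound of a "medium" file per this thread


def classify(size):
    """Classify a file size in bytes as small, medium, or large."""
    if size < SMALL_LIMIT:
        return "small"
    if size <= MEDIUM_LIMIT:
        return "medium"
    return "large"


def _chunk_requests(size):
    """Extra per-chunk requests: 0 for small files, 3 for medium files."""
    kind = classify(size)
    if kind == "large":
        raise ValueError("no figures for large files in this thread")
    return 3 if kind == "medium" else 0


def read_gets(size):
    """GETs to read a file: dir_fetch_file + file_open + file_read."""
    dir_fetch_file = 1
    file_open = 1 + _chunk_requests(size)  # datamap, plus chunks if medium
    file_read = 0
    return dir_fetch_file + file_open + file_read


def create_muts(size):
    """Mutations to create a file: dir_insert_file + file_close."""
    dir_insert_file = 1
    file_close = 1 + _chunk_requests(size)  # new datamap, plus chunks if medium
    return dir_insert_file + file_close
```

With these assumptions the results agree with the table @tfa posted: 2 requests for a small file and 5 for a medium one, for both reading and creation.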


#10

@marcin I want to check: is the NFS fetch really 1 GET? I tried examining the code but got lost looking for what implements dir_fetch_file.

I’m asking because I don’t understand how it gets the information about a file. So is that part of the value for the NFS file entry? If not, and the info has to be accessed from somewhere else I’d expect a fetch() to be two GETs, one to get the value, and another to lookup the file info. Just trying to understand what happens inside these functions so I can make a performant design.

A few things seem odd because I don’t have a grasp of how MD is implemented across the API to the network. For example, if you GET the MD, I would assume you now have all the keys and values, but if you then do a get(key) it costs an extra GET.

The same goes for most other operations: at least one GET each. So is there no caching even though we hold a reference to the MD? Is that reference just an address, so that every access related to the MD goes to the network?

This seems sensible, from a data integrity point of view, but at a cost to performance. So in some applications it might be useful to be able to get a cached MD in one GET and then work with that? Does that make sense, and if so, what’s the cheapest way to obtain that now?

I’d find it useful to have an explanation of how this operates, or if not a pointer to the relevant code might help. Cheers.


#11

Yes, it is (for reading the MD corresponding to the container).

Yes, the file metadata are part of the file entry in this MD.


#12

When you ask the network for an MD you actually get back an MDataInfo which then needs to be passed as a handle to other functions to get further information about the MD.

If you want all the entries, you simply call mdata_entries with an MDataInfo handle to get an MDataEntriesHandle. The full list of entries is now in the object cache locally (meaning there’s no performance hit for further access) and you can access it using the mdata_entries_get function, passing in the corresponding entries handle.
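
Here is a minimal Python model of the difference this makes. `FakeNetwork` is a hypothetical stand-in that only counts GETs. Its method names echo the Client Libs functions listed earlier, but the signatures are invented for illustration.

```python
class FakeNetwork:
    """Hypothetical stand-in for an MD on the network; counts GETs."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self.gets = 0

    def mdata_get_value(self, key):
        """Fetch a single value: one GET per call."""
        self.gets += 1
        return self._entries[key]

    def mdata_entries(self):
        """Fetch all entries at once: one GET, returns a local copy."""
        self.gets += 1
        return dict(self._entries)


def lookup_each_key(net, keys):
    """Ask the network for every key separately."""
    return [net.mdata_get_value(k) for k in keys]


def lookup_via_entries(net, keys):
    """Fetch the entries once, then read every key from the local copy."""
    entries = net.mdata_entries()
    return [entries[k] for k in keys]
```

In this model three lookups cost 3 GETs the first way and 1 GET the second. The local copy is a snapshot, so in this sketch it would not see changes made on the network afterwards.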

This is what it all looks like from the Rust point of view; let me know if you need any help mapping this information over to the JavaScript API. Did I answer all of your questions @happybeing?


#13

Very helpful @marcin thanks.

So after getting the entries, does that cache stay available until the handle is released, in which case any gets() on the entries handle would I assume not cause a network GET?

What if anything (apart from releasing the handle) would invalidate the entries cache?

What happens to the cache on insert /update for example?

And with NFS, can I hang onto a File handle from NFS fetch() even after messing with the file entry (say open()/write()/close())? Can I just keep reusing the fetched File handle, or should I always do a new NFS fetch() each time I want to operate on a file entry?

I have a cache of NFS file state and am seeing some strange behaviour which might just be my buggy code, but it just occurred to me it could be due to my having cached File handles which later don’t work as I expect.

Also, I’m curious about this:

dir_update_file - 1 GET, 1 mut

If insert/update require a GET, why does delete not:

dir_delete_file - 1 mut

What is the GET doing?

Thanks for your time.


#14

We call it the object cache but, unlike many caches, memory will not be dropped when it gets old. The “cache” is basically just local storage, though we provide a function app_reset_object_cache to invalidate the entire cache. Apart from that, the object cache is persistent for the lifetime of a session. Maybe this is something we can look into changing, e.g. we can drop old unused objects so that users don’t run out of system memory. For now, it’s the developer’s responsibility to free up memory and drop object handles by calling the appropriate free functions.
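
To make the handle lifetime concrete, here is a toy handle table in Python. It is an assumption-laden sketch, not how Client Libs implements the object cache: the `ObjectCache` class and its methods are hypothetical, with `reset` playing the role of app_reset_object_cache.

```python
import itertools


class ObjectCache:
    """Toy handle table: objects stay until freed explicitly or reset."""

    def __init__(self):
        self._objects = {}
        self._next_handle = itertools.count(1)

    def insert(self, obj):
        """Store an object and return an integer handle for it."""
        handle = next(self._next_handle)
        self._objects[handle] = obj
        return handle

    def get(self, handle):
        """Look up an object; raises KeyError if the handle was freed."""
        return self._objects[handle]

    def free(self, handle):
        """Drop one object; its handle is invalid from now on."""
        del self._objects[handle]

    def reset(self):
        """Drop everything, invalidating all outstanding handles."""
        self._objects.clear()

    def __len__(self):
        return len(self._objects)
```

Nothing ages out in this model, so an app that never frees its handles keeps every object in memory for the whole session.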

That’s a good question. Yes, the file handle is removed from the object cache when close is called. Is there a use case where you’d need to reuse it after closing?

dir_update_file needs to get the file version so it can increment the version properly on update. dir_delete_file just sends a delete mutation to the network.
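
A small Python model of that difference, under the assumption that an update must carry the current version plus one while a delete carries nothing. `FakeContainer` and its methods are hypothetical and only count requests.

```python
class FakeContainer:
    """Hypothetical container MD: maps file name to (content, version)."""

    def __init__(self):
        self.files = {}
        self.gets = 0
        self.muts = 0

    def get_version(self, name):
        """Read the current entry version: one GET."""
        self.gets += 1
        return self.files[name][1]

    def put_update(self, name, content, new_version):
        """Send an update: one mutation, rejected if the version is stale."""
        self.muts += 1
        current = self.files[name][1]
        if new_version != current + 1:
            raise ValueError("stale version")
        self.files[name] = (content, new_version)

    def put_delete(self, name):
        """Send a delete: one mutation, no version lookup in this model."""
        self.muts += 1
        del self.files[name]


def update_file(container, name, content):
    """GET the version first so the update can increment it."""
    version = container.get_version(name)
    container.put_update(name, content, version + 1)


def delete_file(container, name):
    """Delete without any prior GET."""
    container.put_delete(name)
```

In this model an update costs 1 GET + 1 mut and a delete costs 1 mut, matching the figures listed for dir_update_file and dir_delete_file.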