SAFE NFS - open() capabilities and restrictions

happybeing · November 23, 2018, 12:42pm

Mapping POSIX file handling (as used by FUSE, fs etc.) to SAFE NFS has some tricky bits. As a result I’m exploring the limits of the SAFE NFS API and my experiments with git have just pushed past what I assumed those limits might be, by appearing to be able to open files for read and write, and also being able to open the same file multiple times.

Some of this appears to work, or at least doesn’t throw errors in the SAFE API. But I’m not sure what is actually supported, or whether there are things I should do in a certain way.

I would find this very hard to discern from the code myself, so would appreciate someone who knows expanding on this, and maybe the answers could be fed into the API docs.

So, is it ok to open a file for read and write for example? Consider, the NFS open() mode flags we have are:

NFS_FILE_MODE_READ
NFS_FILE_MODE_APPEND
NFS_FILE_MODE_OVERWRITE

The docs show:

open(file: [File](https://docs.maidsafe.net/safe_app_nodejs/#file), openMode: ([Number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number) | CONSTANTS.NFS_FILE_MODE_OVERWRITE | CONSTANTS.NFS_FILE_MODE_APPEND | CONSTANTS.NFS_FILE_MODE_READ)): [Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<[File](https://docs.maidsafe.net/safe_app_nodejs/#file)>

But I don’t understand what the “(Number | …)” part means, because it seems different from the parameter description (next), so it would help to clarify which flag combinations are acceptable.

openMode ((Number | CONSTANTS.NFS_FILE_MODE_OVERWRITE | CONSTANTS.NFS_FILE_MODE_APPEND | CONSTANTS.NFS_FILE_MODE_READ) = CONSTANTS.NFS_FILE_MODE_OVERWRITE)

Unfortunately, POSIX options are subtly different, and rely on being able to specify the position of each read and write rather than just having this implied (as with NFS ‘overwrite’ (pos = 0) or ‘append’ (pos = EOF) modes). Here are the POSIX modes:

#define O_RDONLY  00
#define O_WRONLY  01
#define O_RDWR    02

Currently I’m getting away with this mapping:

O_RDONLY to NFS_FILE_MODE_READ
O_WRONLY to NFS_FILE_MODE_APPEND
O_RDWR   to NFS_FILE_MODE_APPEND | NFS_FILE_MODE_READ

Fingers crossed this seems to work fairly well with how FUSE does stuff on behalf of various applications, without too much hoop jumping in my code. Phew!
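The mapping above can be sketched as a small helper. This is only an illustration: the numeric NFS flag values below are stand-ins (real code should take them from the library's CONSTANTS), and `posixToNfsMode` and `NFS_MODE` are hypothetical names that are not part of any API.

```javascript
// POSIX access modes, using the values from the #defines above.
const O_RDONLY = 0o0;
const O_WRONLY = 0o1;
const O_RDWR = 0o2;
const O_ACCMODE = 0o3; // mask that isolates the access-mode bits

// Stand-in flag values, for illustration only. Real code should read
// them from the library's CONSTANTS object instead of hard-coding.
const NFS_MODE = { OVERWRITE: 1, APPEND: 2, READ: 4 };

// Translate POSIX open() flags into an NFS openMode using the mapping above.
function posixToNfsMode(flags) {
  switch (flags & O_ACCMODE) {
    case O_RDONLY:
      return NFS_MODE.READ;
    case O_WRONLY:
      return NFS_MODE.APPEND;
    case O_RDWR:
      return NFS_MODE.APPEND | NFS_MODE.READ;
    default:
      throw new RangeError(`unsupported access mode in flags: ${flags}`);
  }
}
```

Masking with O_ACCMODE first means other POSIX flags (O_CREAT, O_TRUNC and so on) don't disturb the lookup; they would need separate handling.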

But now I see git is pushing the boundaries, more threading (ok so far), and opening a file for O_RDWR, and then opening it again multiple times for O_RDONLY! God knows why, but it is doing this, and I’m passing most of that to SAFE NFS and it isn’t throwing errors. I’m not sure if it behaves as expected yet, but… what should I expect?

Some explanation of how SAFE NFS works would be useful, or more details of what is allowed / not allowed / any restrictions for:

opening a file for both NFS append and read
opening the same file multiple times simultaneously
opening a file for append, read, and opening it again for read etc

Thanks.

EDIT: Just remembered another question…

Another thing that happens is requesting file attributes after creation, normally just to check it is there which works ok. But git is also doing this after writing a load of stuff before closing the file, and I don’t save anything to the network until close().

So is there a way to get the actual bytes written (i.e. file size) from a File object before closing it? I need to check if I’m actually getting this atm, but if I am it is coming back as size zero. I will edit this when I’ve confirmed what the code is doing here.

joshuef · December 4, 2018, 2:27pm

On your pre-edit stuff:

I’m not really sure, to be honest @happybeing.

I don’t think the JS API would have any problem with your suggestions per se (although… maybe the FFI lib might… that’s going a bit beyond my realm of ken).

@bochaco can you see any issues from the JS side?

And even if FFI doesn’t have a problem with it, I’m not sure what might be acceptable on the client lib side.

@nbaksalyar or @marcin could you perhaps shed a bit of light on what’s theoretically possible on the Rust side?

And we can see if you come up against anything specific @happybeing that might be a JS specific issue.

I’m also not au fait enough to answer solidly, but I suspect that’s not possible. (Happy to be told I’m wrong though.) (Though perhaps there’s a way to spoof it if needs be…?)

happybeing · December 18, 2018, 5:09pm

Sorry to hassle you guys! No, really. It would help if you could shed light on what is supported, as asked at the bottom of the OP. Thanks.

@DGeddes please can you bump this. It would help to know if these things are supported. Thanks!

One additional specific question:
If I call nfs.open() without args to create a file, can I subsequently call nfs.open() on it before it is closed, so that I have one (the first) object open for write() and a second open for read?

This appears to be what git is doing under some conditions, unless it is due to a bug I’ve not spotted.

DGeddes · December 19, 2018, 7:07pm

@nbaksalyar / @marcin - bump!

marcin · December 20, 2018, 11:58am

Hey @happybeing, some interesting questions you raise here. I had a look at the code Rust-side to see what should happen in these “edge-cases”. I also took an hour to write a test to verify these cases since I noticed we didn’t have one.

On opening a file for both append and read: this is perfectly valid. How it works on the Rust side is like this: when you open a file you get a file context handle which points to either a Reader, a Writer, or both. These are stored locally. The Reader and Writer aren’t aware of each other, as they have their own internal state, so you can write new contents while still being able to read the original contents. When the file is closed, the Reader is dropped and the data written to the Writer is saved to a new file object which is returned.

Note: You can even open a file in both append and overwrite mode at once, in which case the mode defaults to append. I wonder if this shouldn’t result in an error instead…
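As a rough illustration of that rule, here is how a combined mode value could be decoded so that append wins over overwrite. The flag values are stand-ins and `decodeOpenMode` is a hypothetical helper, not something taken from the Rust or JS code.

```javascript
// Stand-in flag values, for illustration only; real code should use
// the library's CONSTANTS.
const MODE_OVERWRITE = 1;
const MODE_APPEND = 2;
const MODE_READ = 4;

// Split a combined openMode into its read part and its write part.
function decodeOpenMode(mode) {
  const read = (mode & MODE_READ) !== 0;
  let write = null;
  if (mode & MODE_APPEND) {
    write = 'append'; // append wins when both write flags are set
  } else if (mode & MODE_OVERWRITE) {
    write = 'overwrite';
  }
  return { read, write };
}
```

Turning the "both write flags" case into an error, as wondered above, would only need one extra check before the `if`.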

On opening the same file multiple times simultaneously: valid. Again, each Reader/Writer has its own internal state, which will not be affected by any changes to the file since being opened.

On opening a file for append and read, and then opening it again for read: do you mean concurrently or not? Either way, it is totally fine.

Let’s say that you have two Writers opened, A and B. You call A.close() and B.close(). You now have three distinct files: the original, the one returned by A.close(), and the one returned by B.close(). Any changes made in B will not affect A, etc.
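The behaviour described so far can be modelled in a few lines. This is a toy in-memory model, not the SAFE API: `ModelFile` and `ModelHandle` are hypothetical names, and the return value of a read-only close is not stated above, so the model simply returns null there.

```javascript
// Toy model of the semantics described above; every name is hypothetical.
class ModelFile {
  constructor(content) {
    this.content = content;
    Object.freeze(this); // a stored file never changes once created
  }
}

class ModelHandle {
  constructor(file, { read = false, write = null } = {}) {
    // Reader: a snapshot of the content as it was when opened.
    this.reader = read ? file.content : null;
    // Writer: starts from the original for append, empty for overwrite.
    if (write === 'append') this.writer = file.content;
    else if (write === 'overwrite') this.writer = '';
    else this.writer = null;
  }

  read() {
    if (this.reader === null) throw new Error('not opened for read');
    return this.reader;
  }

  write(text) {
    if (this.writer === null) throw new Error('not opened for write');
    this.writer += text;
  }

  // Drops the reader and returns a new file built from the writer.
  close() {
    this.reader = null;
    return this.writer === null ? null : new ModelFile(this.writer);
  }
}

const original = new ModelFile('v1');
const a = new ModelHandle(original, { write: 'append' });
const b = new ModelHandle(original, { read: true, write: 'overwrite' });
a.write('+A');
b.write('B');
const readDuringWrite = b.read(); // the reader still sees the original
const fileA = a.close();
const fileB = b.close();
```

After both closes there are three separate files in the model: `original`, `fileA` and `fileB`, and nothing written through one handle is visible through the other.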

On getting the bytes written before close: I don’t think it’s currently possible with the API, but there are no technical blockers. We can easily add a new function to the API – the only questions are whether we should and what to call it. Say we have a file opened for both read and write. file.size() will always get the original size, but we could maybe add something like file.written_size(). Thoughts?

Note also that file.size() currently returns an error if a file is only opened in write mode. We could modify size() to return the written size in this case, but then what would it do if the file is opened for both read/write? I think that’s why we would need two different functions.
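To make the two-function idea concrete, here is a sketch of how size() and a separate written-size call could sit side by side. `SizedHandle` and `writtenSize()` are hypothetical names for discussion only, and the sketch counts only bytes written through the handle; how an append-mode writer should count the original bytes is left open.

```javascript
// Hypothetical handle for discussion; not the SAFE API.
class SizedHandle {
  constructor(originalContent, { read = false, write = false } = {}) {
    this.original = read ? Buffer.from(originalContent) : null;
    this.pending = write ? [] : null; // chunks written but not yet saved
  }

  write(chunk) {
    if (this.pending === null) throw new Error('not opened for write');
    this.pending.push(Buffer.from(chunk));
  }

  // Size of the content as it was when opened; errors without read mode,
  // matching the current behaviour described above.
  size() {
    if (this.original === null) throw new Error('size() needs read mode');
    return this.original.length;
  }

  // Bytes written through this handle so far, before close().
  writtenSize() {
    if (this.pending === null) throw new Error('not opened for write');
    return this.pending.reduce((total, chunk) => total + chunk.length, 0);
  }
}

const rw = new SizedHandle('hello', { read: true, write: true });
rw.write(' world!');
const originalSize = rw.size();
const pendingSize = rw.writtenSize();
```

With two functions, a handle opened for both read and write can report both numbers without ambiguity.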

I’m not sure what you mean by “without args” – does this just create a brand new file? Wouldn’t the second open for read just be reading an empty file in this case? Maybe one of the FE guys can better answer this question as it sounds specific to the JS API.

Basically you can consider files, once opened, to exist as buffers locally. Any changes to the buffer do not affect existing buffers, and are only saved, to new files, when close() is called. This design helps minimize unnecessary network requests.

Of course, let us know if this explanation doesn’t match your experience. It’s possible that some things may not match up between Rust and JS.

marcin · December 20, 2018, 1:36pm

Thinking about this more, I wonder if this should even be allowed. It seems counterintuitive; a file opened for reading and writing will return a single file handle, but the Reader and Writer can have completely different contents. So you’d be using the same file handle for two different purposes which can be confusing and error-prone.

There is also no performance benefit to simultaneous read/write since it’s basically two separate fetches in one call to file.open(). I think it would be better if we disallow opening a file for both read/write (make it an error case) and instead encourage two separate calls to file.open(). Then we can make file.size() work for files in write mode. Would like to hear what others think.

Again, I’m referring to the Rust side of things here – it’s possible that a restriction on simultaneous read/write exists in the JS API.

happybeing · December 21, 2018, 9:09am

Thanks @marcin this is very helpful. I’ll have a think.

Regarding what the ideal behaviour should be, I think we should look at something like POSIX and try to get as close to that behaviour as we can, first at the SAFE API level, and secondly making it easy to account for differences above that (eg in a utility library).

I don’t know what POSIX mandates if anything, or what applications like git expect in these edge cases, but it helps to know what the limits and possibilities are on the Rust side. I’ll see if I can find out more about both and will feed that back to you.

If any of you folk looked at POSIX or similar when designing these areas and have useful sources on this level of behaviour, please let me know.

From @joshuef comments earlier it sounds like this should be ok, and any differences I find would likely be bugs.

So all good for now, just one clarification…

When I’m using File objects to open a file more than once simultaneously, does it matter which one I use? Should I always use the one returned by fetch()?

And what about the File of a new file (returned by NFS open() called with no args), can I call that to open it for read? I think it’s a bit academic, but the answer may help with understanding.

I think my work now is to look into what git expects in the latter case and figure out if I need to emulate that. I’ll be happy if git is not really trying to do that. It seems odd, so may be a bug in my code or even FUSE.

Thanks all, and happy holidays

marcin · December 21, 2018, 10:24am

I’m afraid I don’t understand this question. fetch() gives you the file, open() gives you file handles. It doesn’t matter which file handle you use (do you mean for reading or writing?). Do you have a specific scenario in mind?

I don’t know what open() does behind the scenes, but a brand new File that hasn’t been written to will not have a valid data map and trying to open it in read or append modes will fail to retrieve the data and error out. Of course, there’s no reason to be opening in these modes for a new file, so I suspect that open() doesn’t support this. Have you tried this yourself?

Thinking out loud… it’s possible to figure out the xor name of the data map for empty file data (since it is immutable data) and hard code it in the code for creating a new file… we would just have to keep it backwards compatible.

You’re welcome, happy holidays!

jlpell · December 21, 2018, 11:08am

Simultaneous read and write is a common use case. Consider the simple use of a text editor like Geany or Mousepad. I can open a file to view the contents in window A, then open the same file again in another window B and begin to make changes. Then let’s say I save those changes from B to disk. Opening the same file again in window C will show the change. However, let’s say I decided the edits were garbage. So now I go back to window A to click save. The file is now back to its original state. In this case three read/write file handles were active.

happybeing · December 21, 2018, 11:19am

The application doesn’t have to keep the first file open, so the handle which A has might be discarded after load and the scenario you describe would still work.
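The editor scenario can be reproduced on a local filesystem with Node's fs module, loading and saving whole files so that no handle stays open between steps. This is only a sketch of that sequence; the directory and file names are made up for the demo.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch directory so the demo doesn't touch real files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'editor-demo-'));
const file = path.join(dir, 'notes.txt');
fs.writeFileSync(file, 'original');

const windowA = fs.readFileSync(file, 'utf8'); // A loads, then holds no handle
let windowB = fs.readFileSync(file, 'utf8');   // B loads the same file
windowB += ' + edits';
fs.writeFileSync(file, windowB);               // B saves its changes
const windowC = fs.readFileSync(file, 'utf8'); // C sees B's version
fs.writeFileSync(file, windowA);               // A saves its stale copy
const finalContent = fs.readFileSync(file, 'utf8');

fs.rmSync(dir, { recursive: true, force: true });
```

Each step is a complete open, read or write, and close, so the last save wins without any two handles needing to be open at once.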

I’m really not sure either how useful these features I’m asking about are, or what the expected behaviour would be the more we dig into this.

I’m going to see if I can figure out what git is up to here, or at least why I’m seeing behaviour that doesn’t make sense to me yet.

Later we can come back to what is sensible etc., but if anyone can find references for what is expected (even tutorial style file system explanations could help), particularly wrt POSIX it could help a lot.

jlpell · December 21, 2018, 11:29am

In the above example, each new “window” is a new process and run of a different executable. There isn’t any IPC going on… just a posix compliant os/filesystem.

happybeing · December 21, 2018, 3:30pm

The relationship between File from fetch() and File from open() isn’t clear to me yet. They aren’t distinguished at all in the docs, but I have realised, from the debugger and from Hunter’s responses, that the latter has a different structure.

So fetch() returns a File object, which you are calling the file, and open() returns a File object which you are calling a file handle.

My question is really about if/how these can be used interchangeably and so on.

And my follow-on question was: if I have a handle for a newly created file (returned by NFS open() with no args), am I allowed to pass that to open() again in order to open the file for read? Your answer is that I can expect this to fail, so I’ll have to handle this in SafenetworkJs.

More generally, should open() only ever be passed the File object returned by fetch(), or where the file does have a datamap, is it ok to open() it using any related File object?

I’m just dotting i’s here. I think I’ve already got what I need so will not pester you over Christmas!

Thanks very much for spending time finding the answers to my questions @marcin.

EDIT…

I think it’s a useful addition and could simplify things later, assuming we continue to provide the NFS API as a basis for fs style access. So I submitted a feature request as safe_client_libs issue #719.

Coming back to this which potentially feeds into that issue:

I’m not sure whether this is good or not. I’d suggest if we can sensibly mimic POSIX behaviour we try our best to do that unless it is unhelpful (ie seems to do the right thing but doesn’t actually work in practice). I’ll add a note to the issue so this can be considered.

happybeing · December 24, 2018, 3:15pm

@marcin an initial response from the git mailing list confirms that git may well open the file for read after writing the content, but before closing it, and that this is POSIX supported functionality.

Is it feasible to implement SAFE NFS so a File object that is open for write, can be opened for read (with the ability to read the content written to the buffer before it is committed to the network - ie before close)?

That ability would make handling this case (and POSIX compliance) simpler if that can be supported.
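For reference, this is the sequence as it behaves on an ordinary local filesystem, shown with Node's fs module: write without closing, ask for the size, then open the same path again for read. It is a sketch of the POSIX-side expectation only and says nothing about what SAFE NFS does today; the file name and content are made up.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch directory so the demo doesn't touch real files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'posix-demo-'));
const file = path.join(dir, 'object.tmp');

const wfd = fs.openSync(file, 'w');   // open for write, creating the file
fs.writeSync(wfd, 'blob 5');          // write, but do not close yet

// Attributes requested before close reflect what has been written.
const sizeBeforeClose = fs.fstatSync(wfd).size;

// A second, independent open for read while the writer is still open.
const rfd = fs.openSync(file, 'r');
const buf = Buffer.alloc(16);
const bytesRead = fs.readSync(rfd, buf, 0, buf.length, 0);
const seen = buf.toString('utf8', 0, bytesRead);

fs.closeSync(rfd);
fs.closeSync(wfd);
fs.rmSync(dir, { recursive: true, force: true });
```

Both the size and the second reader reflect the unclosed write, which is the behaviour a layer over SAFE NFS would have to emulate.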

In the mean time I’m not sure how to implement this without implementing my own buffered file i/f on top of SAFE NFS which I’d like to avoid. Alternative suggestions welcome!

For now I might try forcing the file to be closed and re-opened, but that could cause other issues, so not a good solution.

marcin · January 3, 2019, 8:58pm

Hope you had a good holiday season @happybeing. I’ve been back at work but haven’t had the time to address your latest questions yet – will try to do so tomorrow or early next week.

happybeing · January 3, 2019, 9:12pm

Yes, good thanks, and the same to you.

There’s no hurry but thanks for tracking this, I’ll be interested to hear back when you can.

BTW I’ve also learned a bit more about POSIX semantics which I thought I’d posted here but now see I didn’t yet.

The following quote and the article it is from show some of the challenges of doing this with SAFE without the buffering layer I don’t want to implement!

That is, writes must be strongly consistent–that is, a write() is required to block application execution until the system can guarantee that any other read() call will see the data that was just written. While this is not too onerous to accomplish on a single workstation that is writing to a locally attached disk, ensuring such strong consistency on networked and distributed file systems is very challenging. (ref)

marcin · January 11, 2019, 4:01pm

In the Rust FFI, fetch and open return two different things (as I mentioned before), whereas in the documentation you linked they both return a Promise.<File>. I therefore don’t think that fetch and open in Rust behave the same as in JS, so I can’t really answer this.

Whether a file can be opened for Read depends on whether it’s been written to. I have no idea what the JS version of open does; in Rust there is no open without args.

Going to call for some help from Frontend for this one @hunterlester / @bochaco?

Thanks for writing it up on GitHub!

Currently Files opened both for Read and Write will have two different internal handles, one for reading and one for writing, both of which could point to entirely different data. What looks like one file handle is actually two! I think this is just confusing and it offers no performance benefit as opposed to opening two different Files for each mode, anyway. Keeping this behavior would only mimic POSIX in a superficial way, as POSIX only uses one file handle for both read and write, which is more flexible as you’ve mentioned.

So we either disallow opening a File in both modes to avoid this confusion, or we reimplement the internals to support this better, making simultaneous read/write work on the same data. I think the latter should be possible, as we use self-encryption beneath the NFS implementation and SE was designed to emulate POSIX as much as possible (see for example this comment).

If we want to be more like POSIX, there are some other deviations we would need to address – for example, write should return the number of bytes written, error handling would be different, etc. Some changes would be impossible with the design of the network, others would be fairly simple and get us closer to POSIX.

I believe it theoretically is feasible, yes. We would need to redesign the current NFS implementation, but I think the underlying self-encryption mechanisms support this.

To be fair, this might be more efficient than going through the NFS anyway, even if NFS supported simultaneous read/write of the same contents. By using a local buffer you would avoid the performance cost of self-encryption. It’s something you’d need to measure and get numbers on. (Of course, as you mentioned in the post above, local buffers would introduce their own problems with consistency etc.)
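A local buffer of the kind discussed here might look roughly like the sketch below. It is a hypothetical wrapper, not part of SAFE NFS or SafenetworkJs: `BufferedFile` is an invented name, and `commit` stands in for whatever would actually save the bytes to the network on close.

```javascript
// Hypothetical local buffering layer; `commit` is a stand-in for the
// real save-on-close step.
class BufferedFile {
  constructor(initial = Buffer.alloc(0), commit = () => {}) {
    this.data = Buffer.from(initial); // private copy of the starting bytes
    this.commit = commit;
    this.closed = false;
  }

  // Write at an explicit position (default: end of file), growing the
  // buffer when needed. Returns the number of bytes written, POSIX-style.
  write(chunk, position = this.data.length) {
    const bytes = Buffer.from(chunk);
    const end = position + bytes.length;
    if (end > this.data.length) {
      const grown = Buffer.alloc(end); // any gap is zero-filled
      this.data.copy(grown);
      this.data = grown;
    }
    bytes.copy(this.data, position);
    return bytes.length;
  }

  // Reads see everything written so far, even before close().
  read(position, length) {
    return Buffer.from(this.data.subarray(position, position + length));
  }

  size() {
    return this.data.length;
  }

  // Hand the final bytes to commit exactly once.
  close() {
    if (this.closed) return undefined;
    this.closed = true;
    return this.commit(this.data);
  }
}

let saved = null;
const f = new BufferedFile(Buffer.from('hello'), (bytes) => {
  saved = bytes.toString('utf8');
});
const written = f.write(' world'); // appended at the end
f.write('J', 0);                   // overwrite in place at position 0
const sizeBeforeClose = f.size();
const contentBeforeClose = f.read(0, f.size()).toString('utf8');
f.close();
```

This gives positioned writes, reads of uncommitted data and a size before close, at the cost of holding the whole file in memory and of the consistency problems already mentioned.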

Hope that’s all clear and that I answered all your questions. If I missed any, let me know.