Proposal to change implementation of SAFE NFS

#1

Having worked a bit with the NFS implementation, I’m finding that using the file path as the entry key is likely to be very wasteful of mutD entries, and to hurt both application performance and ease of implementation.

For example, say we have a bunch of files in an NFS mutD container with these paths:

/my-docs/invoices/may/03.txt
/my-docs/invoices/may/04.txt
/my-docs/invoices/may/05.txt
/my-docs/invoices/may/06.txt
/my-docs/invoices/may/07.txt
/my-docs/invoices/may/08.txt
/my-docs/invoices/may/09.txt
/my-docs/invoices/june/0124.txt
/my-docs/invoices/june/0125.txt
/my-docs/invoices/june/0126.txt
/my-docs/invoices/june/0127.txt
/my-docs/invoices/june/0128.txt

If I rename any individual file, it uses up a new entry and leaves the old one effectively unusable because it won’t ever be used again unless a file with the identical path and name is created in future.

Worse, if I rename any part of a path, every entry whose key includes the changed ‘directory’ takes up a new entry and leaves the old one unusable.

So in the above example, renaming ‘june’ to ‘June’ consumes five entries (one for every document under /my-docs/invoices/june). Renaming ‘invoices’ to ‘2018-invoices’ creates a new entry for every single invoice in 2018.
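To make the cost concrete, here is a rough sketch in Python of a path-keyed container. It is a toy model only, not the SAFE API: the dict stands in for a mutD, `CLEARED` stands in for an entry whose value has been reset, and `rename_prefix` is a hypothetical helper.

```python
# Toy model of a path-keyed container; not the SAFE API.
CLEARED = None  # stands in for an entry whose value has been reset/cleared


def rename_prefix(entries, old_prefix, new_prefix):
    """Rename a 'directory' by re-keying every live entry under it.

    Returns the number of new entries consumed by the rename.
    """
    consumed = 0
    # Iterate over a snapshot of the keys, since we add keys as we go.
    for key in list(entries):
        if entries[key] is not CLEARED and key.startswith(old_prefix):
            new_key = new_prefix + key[len(old_prefix):]
            if new_key not in entries:
                consumed += 1  # a brand new entry is used up
            entries[new_key] = entries[key]
            entries[key] = CLEARED  # old entry stays behind, cleared
    return consumed


may = [f"/my-docs/invoices/may/{n:02d}.txt" for n in range(3, 10)]
june = [f"/my-docs/invoices/june/{n:04d}.txt" for n in range(124, 129)]
entries = {path: f"file-data-for-{path}" for path in may + june}

consumed = rename_prefix(
    entries, "/my-docs/invoices/june/", "/my-docs/invoices/June/"
)
live = sum(1 for value in entries.values() if value is not CLEARED)
print(consumed, len(entries), live)  # 5 17 12
```

Twelve files now occupy seventeen entries, and any code that lists the container has to skip over the five cleared ones.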

This not only means the 1,000 entries per mutD are likely to be used up fast, but also that applications operating on those entries will have to search many more entries than the files they manage (because deleted entries still have to be checked somewhere).

I think this will make NFS impractical for fairly mundane and common use cases, because very large numbers of deleted entries still take time to check and sort through.

I understand that changes to the implementation, based on RDF, are being considered, so I’d like to suggest some discussion about ensuring that entries are reusable for most common file system operations (and other use cases), by placing mutable aspects such as the path in the value rather than the key.
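As a minimal sketch of what I mean, assuming each file gets a stable key that never changes and the path lives in the value: the key format (`file-1`, `file-2`, ...), the value layout and the helper names below are all made up for illustration, and the dict is again just a stand-in for a mutD.

```python
# Toy model with the path stored in the value; not the SAFE API.
import itertools

_next_id = itertools.count(1)


def add_file(container, path, data):
    """Store a file under a stable, path-independent key (hypothetical)."""
    key = f"file-{next(_next_id)}"
    container[key] = {"path": path, "data": data}
    return key


def rename_prefix_in_values(container, old_prefix, new_prefix):
    """Rename a 'directory' by updating the path inside each value.

    Returns the number of entries updated in place.
    """
    changed = 0
    for value in container.values():
        if value["path"].startswith(old_prefix):
            value["path"] = new_prefix + value["path"][len(old_prefix):]
            changed += 1
    return changed


container = {}
for n in range(124, 129):
    add_file(container, f"/my-docs/invoices/june/{n:04d}.txt", b"...")

before = len(container)
changed = rename_prefix_in_values(
    container, "/my-docs/invoices/june/", "/my-docs/invoices/June/"
)
print(changed, before, len(container))  # 5 5 5
```

The rename mutates five existing entries and uses up no new ones. The obvious trade-off in this sketch is that finding a file by path now means scanning the values or keeping some kind of index.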

If the NFS implementation is not part of the changes related to RDF, I think we should reconsider how it is implemented.

I’m not sure of the implications of this kind of change, and imagine there were good reasons to want the key to correspond to a file path, but I don’t know what they would be.

If you have any thoughts on why using the path as the key is important or useful, please post them so we can factor them into the discussion.

cc @bochaco @joshuef @krishna


#2

I think this is more related to how we are going to handle/support storing data in perpetuity than to deleted entries (as they wouldn’t be deleted :slight_smile:), since every version of a “mutated” piece of data will be kept and indexed. So yes, regardless of the data representation (current NFS convention or RDF), an efficient way of looking it up will need to be considered.


#3

Indeed, the problem arises because (for good reason) entries are not actually deleted, but reset/cleared.


#4

3 posts were split to a new topic: Should All MDs have all mutations recorded (Keep All Versions)