Self Encryption for large files


#1

self_encryption src/lib.rs:L187 says:

/// The maximum size of file which can be self-encrypted, defined as 1GB
pub const MAX_FILE_SIZE: usize = 1024 * 1024 * 1024;

This would seem to be a temporary limit.

What is the planned largest file size, if any?

The data map may become very large if there’s no largest file size.

For example, a 1 MB data map (assuming it contains only a list of chunk names, each 256 bits, i.e. 32 bytes) could hold 32768 chunk names. With chunks of at most 1 MB, that maps to 32 GB of data, which is still less than a dual-layer Blu-ray disc (50 GB).

Under the same assumptions, a 100 GB file would have a data map slightly larger than 3 MB (102400 chunk names × 32 bytes). This seems potentially problematic to me, since all chunks are supposed to be 1 MB at most.
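The arithmetic above can be checked with a short Rust sketch. It assumes, as this post does, that every chunk is exactly 1 MB (except possibly the last) and that a data map holds nothing but one 32-byte name per chunk. The names `CHUNK_SIZE`, `NAME_SIZE`, `chunk_count` and `data_map_size` are made up for illustration and are not part of self_encryption; a real data map probably carries more per-chunk metadata, so treat these numbers as a lower bound.

```rust
/// Assumed maximum chunk size: 1 MB (taken from the discussion, not from the library).
const CHUNK_SIZE: u64 = 1024 * 1024;
/// Assumed size of one data map entry: a single 256-bit chunk name and nothing else.
const NAME_SIZE: u64 = 32;

/// Number of chunks needed for `file_size` bytes, rounding up.
fn chunk_count(file_size: u64) -> u64 {
    (file_size + CHUNK_SIZE - 1) / CHUNK_SIZE
}

/// Estimated data map size in bytes: one name per chunk.
fn data_map_size(file_size: u64) -> u64 {
    chunk_count(file_size) * NAME_SIZE
}

fn main() {
    let gb: u64 = 1024 * 1024 * 1024;
    for &size in &[gb, 32 * gb, 100 * gb] {
        println!(
            "{} GB file -> {} chunks, {} byte data map",
            size / gb,
            chunk_count(size),
            data_map_size(size)
        );
    }
}
```

For a 32 GB file this gives 32768 chunks and a data map of exactly 1 MB; for 100 GB it gives 102400 chunks and a 3,276,800 byte (3.125 MB) data map.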

Some possible ideas for self-encrypting large files:

  • remove the 1 GB limit and accept data maps greater than 1 MB
  • data maps can be a list of chunk names or other data map names, ie large data maps may themselves be chunked
  • apps split large files into parts less than 1 GB before upload, then recombine them in the app (but this is ugly and will lead to many different incompatible implementations)

Related topic: RFC: Increase limits on file sizes in self_encryption

Just curious to understand the longer-term plans with large files and self encryption.


#2

Here is what I understand, which is not much but at least a little.

In all the talk from ages ago, the idea was always that datamaps could be more than one chunk in size.

I suppose that means a datamap for the datamap. Otherwise, in a directory structure, you would take up multiple directory entries to store one file’s datamap.


#3

More or less. The datamap is just a smaller file than the original, so applying self_encrypt recursively shrinks it each time. In the above case, the 3 MB datamap would itself encrypt to a very small datamap. So a very large file ends up only slightly larger once its datamaps are included.
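To put rough numbers on this, here is a small Rust sketch that computes the size of each successive datamap until one fits in a single chunk. It reuses the simplifying assumptions from post #1 (1 MB chunks, datamap entries of exactly 32 bytes); `data_map_levels` and the constants are hypothetical names for illustration, not anything from the library.

```rust
/// Assumed maximum chunk size: 1 MB.
const CHUNK_SIZE: u64 = 1024 * 1024;
/// Assumed datamap entry size: one 32-byte chunk name.
const NAME_SIZE: u64 = 32;

/// Estimated datamap size for `content_size` bytes of content.
fn data_map_size(content_size: u64) -> u64 {
    let chunks = (content_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
    chunks * NAME_SIZE
}

/// Sizes of the successive datamaps, stopping at the first one
/// that fits inside a single chunk.
fn data_map_levels(file_size: u64) -> Vec<u64> {
    let mut size = data_map_size(file_size);
    let mut levels = vec![size];
    while size > CHUNK_SIZE {
        // The datamap is treated as an ordinary file and chunked again.
        size = data_map_size(size);
        levels.push(size);
    }
    levels
}

fn main() {
    let gb: u64 = 1024 * 1024 * 1024;
    println!("{:?}", data_map_levels(100 * gb));
}
```

Under these assumptions a 100 GB file needs two levels: a 3,276,800 byte datamap, which splits into 4 chunks and is in turn described by a 128 byte datamap. The total overhead is a tiny fraction of the file size.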


#4

Having looked further, I can see that managing large data maps isn’t (and shouldn’t be) part of the self_encryption library.

safe_client_libs immutable_data.rs:L112 recursively calls pack() until the test for data.validate_size() passes. Large data maps will be ‘packed’ (i.e. self-encrypted) until they’re small enough. This is the part of the code which ensures all chunks on the SAFE Network will be less than 1 MB, which is what I was originally looking for.

data.validate_size() is defined in routing immutable_data.rs:L67 and checks the condition self.serialised_size() <= MAX_IMMUTABLE_DATA_SIZE_IN_BYTES.
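For future readers, the shape of that recursion can be sketched as below. This is a toy model and not the safe_client_libs code: `toy_self_encrypt` is a hypothetical stand-in that only splits data into chunks and returns a list of 32-byte placeholder names (no hashing, no encryption), `store` is an in-memory vector standing in for the network, and `max_size` plays the role of MAX_IMMUTABLE_DATA_SIZE_IN_BYTES without claiming its real value.

```rust
/// Length of a placeholder chunk name (256 bits), as assumed earlier in the thread.
const NAME_LEN: usize = 32;

/// Hypothetical stand-in for self-encryption: stores each chunk and returns
/// a "data map" made of one placeholder name per chunk. No real crypto here.
fn toy_self_encrypt(data: &[u8], max_size: usize, store: &mut Vec<Vec<u8>>) -> Vec<u8> {
    let mut map = Vec::new();
    for chunk in data.chunks(max_size) {
        // Placeholder name: the chunk's index in the store, zero-padded.
        let mut name = [0u8; NAME_LEN];
        name[..8].copy_from_slice(&(store.len() as u64).to_le_bytes());
        store.push(chunk.to_vec());
        map.extend_from_slice(&name);
    }
    map
}

/// Recursively re-encrypt the data map until it passes the size check.
fn pack(data: Vec<u8>, max_size: usize, store: &mut Vec<Vec<u8>>) -> Vec<u8> {
    // Below this bound the map might not shrink, so the recursion could loop.
    assert!(max_size >= 2 * NAME_LEN, "max_size too small for the map to shrink");
    if data.len() <= max_size {
        return data; // analogous to validate_size() passing
    }
    let map = toy_self_encrypt(&data, max_size, store);
    pack(map, max_size, store)
}

fn main() {
    let mut store = Vec::new();
    // Small sizes keep the demo cheap: 100,000 bytes with a 1024-byte limit.
    let top = pack(vec![7u8; 100_000], 1024, &mut store);
    println!("top-level map: {} bytes, stored chunks: {}", top.len(), store.len());
}
```

With these toy sizes the data is packed twice: 98 data chunks produce a 3136 byte map, which is split into 4 more chunks and ends as a 128 byte top-level map. Every stored piece stays within the limit, which is the property the real code is enforcing.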

Just wanted to post this clarification for future readers.


#5