Uploading data API

What are the options (specific APIs) for uploading data (i.e. not files stored on disk)?

From my research so far I have identified:

  • Data <1 MB: FilesApi::get_local_payment_and_upload_chunk() will pay for and store a single chunk (<~1 MB); see the sketch after this list.

  • Data >1 MB: I’m not aware of an API that will store data larger than a single chunk (other than data read from a file).
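
To make the sub-1 MB case concrete, here is a minimal sketch of storing an in-memory payload as a single chunk. It assumes Chunk lives in sn_protocol::storage, that a FilesApi instance is already constructed, and that get_local_payment_and_upload_chunk() takes the chunk plus a verify flag (the exact parameters may differ in the current sn_client):

```rust
use bytes::Bytes;
use sn_client::FilesApi;
use sn_protocol::storage::Chunk;

// Sketch only: wrap a small (<~1 MB) in-memory payload in a Chunk and hand it
// to the existing single-chunk pay/upload call. The parameters after `chunk`
// are assumptions; check the current sn_client signature.
async fn upload_small_blob(
    files_api: &FilesApi,
    data: Vec<u8>,
) -> Result<(), Box<dyn std::error::Error>> {
    let chunk = Chunk::new(Bytes::from(data));
    files_api
        .get_local_payment_and_upload_chunk(chunk, true /* verify (assumed) */)
        .await?;
    Ok(())
}
```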

Missing APIs
I believe we need some new APIs for encrypting data in memory: equivalents of the following that operate on a block of data rather than a file or file tree:

  • FilesUploader::start_upload()
  • ChunkManager::chunk_file()
  • encrypt_large()

Without these, developers will write data to a file and upload that instead, which is a security issue.
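
To illustrate the kind of API I mean, here is a hedged sketch of a hypothetical upload_bytes() that self-encrypts a block of memory and then reuses the existing single-chunk upload call. Only self_encryption::encrypt() is an existing function; upload_bytes itself, the EncryptedChunk field access, and the upload parameters are assumptions, not current APIs:

```rust
use bytes::Bytes;
use self_encryption::{encrypt, DataMap};
use sn_client::FilesApi;
use sn_protocol::storage::Chunk;

// Hypothetical "start_upload() for bytes": self-encrypt an in-memory blob and
// pay/upload each resulting chunk, never touching disk.
async fn upload_bytes(
    files_api: &FilesApi,
    data: Bytes,
) -> Result<DataMap, Box<dyn std::error::Error>> {
    // Chunk and self-encrypt entirely in memory.
    let (data_map, encrypted_chunks) = encrypt(data)?;

    for encrypted in encrypted_chunks {
        // `content` is assumed to be the encrypted bytes to store.
        let chunk = Chunk::new(encrypted.content);
        // Reuses the existing single-chunk pay/upload call (signature assumed).
        files_api
            .get_local_payment_and_upload_chunk(chunk, true)
            .await?;
    }

    // The DataMap is what the caller keeps to fetch/decrypt the data later.
    Ok(data_map)
}
```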

Issue raised:

I’d be interested in understanding this too.

Currently, in a POC for sn_httpd, I’m just storing data to a temp file, then uploading it. sn_httpd uses very similar code to sn_cli atm (as it was used for inspiration). FWIW, it does look like we should already be able to pay/upload chunk by chunk (sn_httpd/src/main.rs at 4bab513e7b43178e9fd7d16034d7c3e2d927e2e3 · traktion/sn_httpd) with a bit of splitting/re-ordering of calls.

It feels like we should be able to stream data uploads, much like we can with downloads already (e.g. sn_httpd/src/main.rs at 4bab513e7b43178e9fd7d16034d7c3e2d927e2e3 · traktion/sn_httpd).

It looks like self_encryption is happy to work with bytes (self_encryption/src/tests.rs at 431382c8ad2edd068d5cfbfa676af058c1ffd685 · maidsafe/self_encryption) and also streaming from a file (self_encryption/src/tests.rs at 431382c8ad2edd068d5cfbfa676af058c1ffd685 · maidsafe/self_encryption), but not from other streams.

So it looks like the self_encryption library would need to be improved to support streaming uploads from sources other than files. However, it should be fine if the total data is already held in memory.
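
For example, something like this should already work entirely in memory, assuming the encrypt()/decrypt_full_set() functions used in the linked tests:

```rust
use bytes::Bytes;
use self_encryption::{decrypt_full_set, encrypt};

// In-memory round trip, as in the linked tests: encrypt a Bytes buffer, then
// rebuild the original from the DataMap plus the encrypted chunks.
fn round_trip_in_memory() -> Result<(), self_encryption::Error> {
    let original = Bytes::from(vec![7u8; 5 * 1024 * 1024]); // 5 MB, all in memory
    let (data_map, encrypted_chunks) = encrypt(original.clone())?;
    let restored = decrypt_full_set(&data_map, &encrypted_chunks)?;
    assert_eq!(original, restored);
    Ok(())
}
```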

Given that the total (file) size is also needed to know how to chunk the data, that could pose challenges for streaming too, i.e. data with a total size < 3 * MAX_CHUNK_SIZE is chunked differently from larger data. So, for now at least, the total data length would need to be known in advance (at least enough to know which of those ranges applies).
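
A toy illustration of that dependency, using an assumed ~1 MB MAX_CHUNK_SIZE (this mirrors the behaviour described above, not the actual self_encryption internals):

```rust
// Illustrative only: why the total length has to be known before chunking.
const MAX_CHUNK_SIZE: usize = 1024 * 1024; // assumed ~1 MB

// Returns (chunk_count, chunk_size) for a given total length.
fn chunk_layout(total_len: usize) -> (usize, usize) {
    if total_len < 3 * MAX_CHUNK_SIZE {
        // Small data: split into exactly 3 roughly equal chunks, so the chunk
        // size itself depends on the total length.
        (3, total_len / 3)
    } else {
        // Large data: fixed-size chunks; only the chunk count depends on the
        // total length.
        (total_len.div_ceil(MAX_CHUNK_SIZE), MAX_CHUNK_SIZE)
    }
}
```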
