Writing a core library in Rust --- possibilities for SAFE Browser (WASM)

wasm
rust
safe-browser

#1

Last year there was quite some interest in a topic about WASM and SAFE, and now I’m starting to read more and more about WASM and its possibilities, also in the context of Rust and the SAFE Network. I’ve been thinking about secure, infrastructural ways to use Rust and make it useful in NodeJS and the browser.

As Rust seems more robust and secure in comparison to the NodeJS/NPM ecosystem, Rust would be ideal to create core libraries. (Just like MaidSafe builds the core libraries in Rust and uses all sorts of bindings to make them useful in other languages.)

Let’s say a Rust library is written that communicates with the SAFE Network (with the Rust safe_app dependency). Would there be a way to make use of that library within the SAFE Browser? The Rust library can be compiled to WASM, but the browser/JS is like a sandbox that blocks network calls and such. That renders the library useless. From Rust, JavaScript can be called, so in theory a Rust library could call the SAFE API in window.safe, but that JS API is a very limited API in that it abstracts away a lot.

Would anything like the above be possible and feasible for the SAFE Browser? I saw some mention of something like this in @joshuef’s topic about the SAFE Browser last year:

WASM could be great (using built rust code directly in the browser would remove a layer of abstraction / unify the API interface; instead of having safe-node-app and window.safeXXX interfaces which are inconsistent/ more things to maintain).

The future of WASM is promising, it opens more reliable libraries up to the web development world.


#2

Yes, I’d like to spend time again working on a safe_app_wasm experiment that I paused back in March.
What I’d like to solve next for it is to replace our memmap dependent functionality with some kind of native WASM memory mapping IO. Same for our fs2 dependent functionality.
Both of these crates require either Unix or Windows target families.

Not sure how long we’d be waiting for those future features. The other option suggested by Alex Crichton is to put effort into customising those crates for WASM.


#3

Rust + wasm is the only chance I have for being a safe app dev… I find most javascript to be incomprehensible…

interesting example and commentary…


#4

How far along did you get with that? I’ve been spending a pretty large amount of time with WASM and Rust.


#5

I came across your issue while reading up on all the various pieces relevant to our challenge. Finally I’m starting to understand the complexity of this puzzle. I’ll lay out my journey here. :world_map:

Briefly, if I understand correctly: safe_app relies on a few crates in its dependency tree that contain code written for specific targets, but none for the wasm32-unknown-unknown target. These are packages like fs2 and memmap that interact with low-level OS APIs.


Braindumping :brain: the possibilities for memmap:

  • I found a ‘WASM-enabled’ crate that also implements allocation for Unix and Windows. The SAFE library self_encryption would then use that instead of memmap I assume. (I’m no expert at Rust and the SAFE libraries so forgive me if I’m incorrect to assume anything here.)
  • Else you would need to write a patch (or submit a PR) for memmap to support a WASM target. (Like Crichton mentioned.)
  • Or wait for the WASM/Rust ecosystem to mature and expect everything to work in a few years.
  • Implement the sequencer without memmap if that is at all possible. @pierrechevalier83 (The sequencer seems the only piece of code that safe_app relies on that uses memmap)

For fs2 things are a little more confusing to me. For WASM, apparently the libc from Emscripten provides a virtual filesystem. fs2 relies on libc only for Unix, so it might be easy to fix fs2 to rely on libc for WASM too.


Then, from my understanding JavaScript (ECMAScript) does not support any networking natively. Of course, this is because browsers won’t allow direct interaction with TCP/UDP from within the JS sandbox (similar to the filesystem interaction described before). This means any WASM won’t natively have a sockets API.

Is this something you considered, @hunterlester? I’m not sure how this affects the feasibility of a safe_app WASM library…


Taking all the above into account I am a little less optimistic about the ease of what I described in the OP :thinking: .

But, I have thought of an alternative, albeit a bit complex. :bulb: This would be to have a library similar to safe_app_nodejs, but instead of abstracting away the safe_app::ffi API it would expose the API fully as a module. (This might be done with node-ffi like now, or as a Node.js Addon (perhaps with Neon).)

This module would be imported from JS (or WASM) as import safe_ffi from 'safe_ffi', where safe_ffi then equals the safe_app::ffi namespace from Rust.

From WASM this module can also be imported and called. If the module is a Node.js Addon the performance should be quite good.

Now, assume a simple Rust library is written that depends on safe_app. I assume targeting Linux/Windows would result in linking to safe_app statically or dynamically (not sure exactly). But, targeting WASM the library should ‘link’ to the JS module. This might be done in Rust with cfg_if like here in wee_alloc or other mechanisms (not sure exactly). If targeting WASM it would opt to rely on externally defined functions that are exactly like safe_app::ffi.
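The target-dependent ‘linking’ described above could be sketched roughly as follows, using plain cfg attributes rather than the cfg_if macro. Everything named here is hypothetical: the safe_ffi import module, the safe_ffi_version function and the native stand-in body are assumptions for illustration only, not symbols from safe_app::ffi.

```rust
// On wasm32 the function is declared but not defined: the host (for example
// a JS module passed in at instantiation time) has to supply it.
#[cfg(target_arch = "wasm32")]
mod backend {
    // Hypothetical import module and function name, for illustration only.
    #[link(wasm_import_module = "safe_ffi")]
    extern "C" {
        fn safe_ffi_version() -> u32;
    }

    pub fn version() -> u32 {
        // Calling an imported function is unsafe because Rust cannot check it.
        unsafe { safe_ffi_version() }
    }
}

// On every other target the call is answered by natively linked code.
#[cfg(not(target_arch = "wasm32"))]
mod backend {
    // Stand-in body: a real build would call into the safe_app crate here.
    pub fn version() -> u32 {
        1
    }
}

/// The call site looks the same regardless of the target.
pub fn ffi_version() -> u32 {
    backend::version()
}
```

Only one of the two backend modules is compiled for a given target, so the rest of the library never needs to know which one is in use.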


I welcome any feedback or correction on what I wrote. I would love to further pursue this as a side project. Rust excites me a lot more than I expected. :slight_smile:


#6

I attempted to create a Node.js Addon with Neon, but I could not get a proof-of-concept to compile. I run into a linking error when depending on safe_app:

  = note: /usr/bin/ld: /tmp/rustcSyC92a/librust_sodium_sys-16d6eaeb0b9b9a35.rlib(librdrand_la-randombytes_salsa20_random.o): relocation R_X86_64_TPOFF32 against `stream' can not be used when making a shared object; recompile with -fPIC
          /usr/bin/ld: /tmp/rustcSyC92a/librust_sodium_sys-16d6eaeb0b9b9a35.rlib(libsodium_la-randombytes.o): relocation R_X86_64_PC32 against symbol `randombytes_sysrandom_implementation' can not be used when making a shared object; recompile with -fPIC

I filed an issue here: https://github.com/maidsafe/safe_client_libs/issues/722

This is probably related to Neon. I hope someone from @maidsafe has a hint as to what might cause this and whether it’s solvable.


Edit: I found it to have nothing to do with Neon. That makes it a little less specific. So, the issue boils down to not being able to create a dylib crate that depends on safe_app on Arch Linux.

Edit: Issue was solved by providing a specific environment variable to disable ‘PIE’ so rust_sodium can be used within a shared library on Arch Linux. I’m no expert, but I sort of understand the implications of this.


#7

Wasm would be a much better alternative to native/neon i think, as there’s no need to download a precompiled platform-dependent binary; we could maybe even use the same wasm file in a web browser. It might be easier to use wasm with the rust wasm_bindgen crate than neon.

But we would still need a way to communicate with the vault or some kind of service/daemon, maybe by using a simple tcp connection and a simple text (or even binary) protocol to communicate with the safe browser?


#8

Yielding to correction from my seniors, my understanding of the purpose of anonymous memory mapping in self_encryption is to prevent memory leaks.

Referencing libc docs on freeing memory allocated with malloc:

Occasionally, free can actually return memory to the operating system and make the process smaller. Usually, all it can do is allow a later call to malloc to reuse the space. In the meantime, the space remains in your program as part of a free-list used internally by malloc.

Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation. This has the great advantage that these chunks are returned to the system immediately when they are freed. Therefore, it cannot happen that a large chunk becomes “locked” in between smaller ones and even after calling free wastes memory.

This is why a memory map is created for files over 50 MiB instead of using a Vec, which is allocated on the process’ heap segment. That way at most 50 MiB of heap memory, even when freed, would remain in the process.
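As a rough illustration of that size-based decision, here is a minimal sketch. It only models the choice and does not map any memory; the constant and type names are hypothetical and not taken from self_encryption.

```rust
/// Size above which a memory map is preferred, per the 50 MiB figure above.
/// The constant name is hypothetical.
pub const MAX_IN_MEMORY_SIZE: u64 = 50 * 1024 * 1024;

/// Where the data would be stored (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Backing {
    /// A Vec on the process heap.
    Heap,
    /// An anonymous memory map that goes back to the OS when released.
    AnonymousMap,
}

/// Chooses the backing store for `len` bytes of data.
pub fn choose_backing(len: u64) -> Backing {
    if len > MAX_IN_MEMORY_SIZE {
        Backing::AnonymousMap
    } else {
        Backing::Heap
    }
}
```

With this split, the heap never has to hold more than MAX_IN_MEMORY_SIZE bytes for a single file.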

Can WASM provide the same memory efficiency?
As far as I can tell, so far no, with possible future consideration.

According to wee-alloc:

wee_alloc will never return freed pages to the WebAssembly engine / operating system. Currently, WebAssembly can only grow its heap, and can never shrink it. All allocated pages are indefinitely kept in wee_alloc 's internal free lists for potential future allocations, even when running on unix targets.

@bzee has revived my interest in Neon and it looks to hold more promise than WASM for our needs and to remove our dependency on Node-FFI. Also significant performance gains over FFI, which would most likely be noticeable for file encryption.

@bochaco @Krishna Perhaps we can revisit this topic and look at how development could fit into our milestones and priorities.


#9

in addition to my last post: i, for some reason, thought that there is no way to impl the safe client in wasm, but it should be doable, given a few requirements:

  • websocket and/or webrtc support for the vaults (or the “proxy” vault), thus it couldn’t operate as a full vault, but there are other reasons to not do that, like no direct file access. But node could use tcp/udp, so it could even operate as a full vault.
  • that would require that the low level io primitives can be swappable
  • compiling and linking of c/c++ dependencies, i have no idea how hard it is to cross compile (clang?) and link (lld?) those libs to rust.
  • encryption will be slower (at least if you can use native CPU extensions for encryption on native builds) in wasm, but this shouldn’t be a hard problem for the client lib (?)
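To make the ‘swappable io primitives’ requirement a bit more concrete, here is a minimal sketch. The Transport trait, the Loopback type and round_trip are hypothetical names for illustration; they are not part of crust or safe_app. The idea is that client code is written against the trait, so a TCP/UDP implementation could be used natively and a websocket-backed one in wasm.

```rust
use std::collections::VecDeque;
use std::io;

/// Hypothetical abstraction over the byte transport a client would use.
pub trait Transport {
    fn send(&mut self, data: &[u8]) -> io::Result<()>;
    fn recv(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

/// In-memory transport that hands back whatever was sent (useful for tests).
#[derive(Default)]
pub struct Loopback {
    queue: VecDeque<u8>,
}

impl Transport for Loopback {
    fn send(&mut self, data: &[u8]) -> io::Result<()> {
        self.queue.extend(data.iter().copied());
        Ok(())
    }

    fn recv(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Copy out as many queued bytes as fit into the caller's buffer.
        let n = buf.len().min(self.queue.len());
        for slot in buf.iter_mut().take(n) {
            *slot = self.queue.pop_front().expect("length checked above");
        }
        Ok(n)
    }
}

/// Client logic only depends on the trait, not on sockets.
pub fn round_trip<T: Transport>(transport: &mut T, msg: &[u8]) -> io::Result<Vec<u8>> {
    transport.send(msg)?;
    let mut buf = vec![0u8; msg.len()];
    let n = transport.recv(&mut buf)?;
    buf.truncate(n);
    Ok(buf)
}
```

Calling round_trip(&mut Loopback::default(), b"ping") returns the same bytes back; a real implementation would replace Loopback with something that talks to a vault.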

but that’s just optimization, and as you’re writing, will be available in the future. the wee-alloc is also a very simplified allocator, its primary goal is to save on every bit of generated code, it shouldn’t be taken as a reference.


#10

Do you mean implementing the stubs Rust uses for std::net etc?

This is interesting. Do you mean compiling/linking to WASM instead of Rust?


#11

no, because we would need async io (crust is already using tokio for the network io, but this would also need async filesystem io), std::net is sync.

edit: we actually don’t need async io if we get multi threading instead.

i mean compiling the c deps (eg libsodium) to wasm and then linking it to the wasm-compiled rust, so that we end up with a statically linked (thus self contained) wasm file.


@hunterlester has a point, as this would need (i guess) quite some changes in the code base. Just compiling the code as it is now and adding the ffi layer to node on top might be a better/quicker solution for now.


#12

It’s a very important aspect you brought up. This is actually the challenge I’ve been meaning to understand. A lot of crates depend on the libc crate, which links to the OS’s C Library.

The libsodium example you mention is interesting, Alex Crichton (Mozilla) tried to get it working for wasm32: wasm-sodium. It’s interesting to read how he solved it, though it’s a bit of a hack. Especially the libc stub that’s somehow needed by the real libsodium library that still depends on libc.

Anyway, the above highlights one of the challenges of depending on these libraries.


#13

@torri I really appreciate your input. I’d be disappointed to put the nail in WASM’s coffin for our needs.
As it so happens, I’ve been informed (thank you @nbaksalyar) that Rust 1.32.0 removes jemalloc as the default allocator in favor of the default system allocator.

Next I need to research common system allocators, which hopefully behave like libc’s malloc: upon meeting a threshold, it automatically makes a system call for memory mapping instead of allocating on the heap segment. In particular I want to study Windows’ CreateFileMappingA.

We may be able to remove that logic altogether from self_encryption, which would be a great way to start paving the way for WASM.


#14

Not that I’ve experience with WASM, but there seems to be some truth in the following book :slight_smile:


#15

But this shouldn’t change anything for wasm, as

  1. there is no system allocator in wasm, the app needs to bring its own allocator
  2. the default allocator in wasm is https://crates.io/crates/dlmalloc
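For readers unfamiliar with how an app ‘brings its own allocator’ in Rust, here is a minimal sketch using the standard GlobalAlloc trait. CountingAlloc and live_bytes are hypothetical names for illustration; the wrapper simply forwards to std’s System allocator (which, as noted above, is dlmalloc on wasm32-unknown-unknown) and keeps a count of live bytes.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated through the wrapper.
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical allocator that forwards to `System` and counts live bytes.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = System.alloc(layout);
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Registers the wrapper as the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of bytes the program currently holds from the allocator.
pub fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
```

Note that this counts what the program has requested, not what the allocator has obtained from the runtime: memory that is freed here may still be kept by the underlying allocator for later reuse.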

there are no syscalls in wasm (at least not at the operating system level); the whole point of wasm is to have a sandboxing layer integrated into the “language”/ISA, but there are “syscalls” to the runtime environment like grow_memory. Also i don’t get why a mmap would be better than just “growing” the memory (particularly in the case that the runtime environment can use mmap-allocs as an optimization).

also: https://github.com/WebAssembly/design/blob/73cb0e6e379e473071533f14437e1516dd7e94c8/FAQ.md#what-about-mmap


#16

I’m not sure I’m understanding but we may be getting around to the same point.

What I’ve been focused on is the point that self_encryption depends on the memmap crate which makes it not possible at the moment to compile for wasm32-unknown-unknown, as memmap requires the target platform to either be *nix or Windows.

What I’m saying is that self_encryption may not need to depend on memmap anymore with the release of Rust 1.32.0.
So self_encryption may be able to get rid of this logic, leaving heap alloc vs mmap decisions to be made by default system allocator, removing the memmap dependency and making it, hopefully, possible to cross-compile for WASM.

Good question. I don’t know, I’m trying to think through it.
Say a CLI app whose only purpose is to run batch data (ImmutableData) uploads to the network and be a short-lived process; it would probably be fine to just grow memory.
However, a safe_app instance could be a part of a long-lived process, such as in a background service/daemon, a network browser, or a binary that serves a browser extension, as I’ve been researching lately.
I’d think we’d need smart memory mapping to avoid memory leaks in those cases.

According to the wee_alloc docs:

WebAssembly does not have any facilities for shrinking memory, at least right now.

I’m confused then, where is it you are reading that WASM can use mmap-allocs?


#17

I’m no expert, but I think memory-mapping and allocating are distinct concepts. Swapping jemalloc for the system default does not influence memory-mapping. Memory-mapping is about virtual memory that is mapped to a file in a file system. Allocating is for the heap. So, on embedded systems you would allocate, but wouldn’t be likely to have memory-mapping at your disposal.


I think Tokio uses std::net internally. It just adds futures (tasking) so the code is cleaner in a way. Also, std::net was just an example. There is a whole lot more of standard library stuff that is stubbed for WASM. I’m afraid one way or the other a lot of dependencies depend on code that will be stubbed for WASM.

That means that even if you get everything to compile, it would not work when executing the code.


#18

Ahh ok, so you’re referring to “real” file-mmapping, i have started writing a 2. post about that, then deleted it :smiley:

I think streaming the file in chunks should be enough for self_enc as it shouldn’t need to access the whole file at once? That would save some allocated memory. But we would still need to live with the copy-overhead, at least as long as there is no wasm OS like nebulet, which can inline and zero-copy IO.
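A minimal sketch of the chunked streaming idea, using only std::io. The function name for_each_chunk and the callback shape are assumptions for illustration; how self_encryption would actually consume the chunks is not shown here.

```rust
use std::io::{self, Read};

/// Reads `reader` to the end, passing the data to `on_chunk` in pieces of at
/// most `chunk_size` bytes, and returns the total number of bytes read.
/// Only one chunk is held in memory at a time.
pub fn for_each_chunk<R, F>(mut reader: R, chunk_size: usize, mut on_chunk: F) -> io::Result<u64>
where
    R: Read,
    F: FnMut(&[u8]),
{
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        match reader.read(&mut buf) {
            // A read of zero bytes signals the end of the input.
            Ok(0) => return Ok(total),
            Ok(n) => {
                // A read may return fewer bytes than requested.
                on_chunk(&buf[..n]);
                total += n as u64;
            }
            // Retry if the read was interrupted by a signal.
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Any Read implementation works as the source, for example a std::fs::File natively or an in-memory std::io::Cursor; the copy into buf is the copy-overhead mentioned above.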

And im confused too then ;D

Why shouldn’t it be able to use mmap-allocs? it just means that the memory is allocated lazily on memory access. Freeing is another part of the system, not the mmap by itself?


#19

It’s using mio: click

Why do you think that? If we stub everything for WASM, then there is nothing left to not work :smiley:

C deps can be a problem, as you already have pointed out in the other post, but how many c deps do we have? the c libsodium impl is critical in that regard, as there is no one with the knowledge to port it to rust (crypto is hard!).


#20

I’m just learning, so if I’ve got it right, in the case of self_encryption, anonymous memory mapping is used, so it’s not backed by a file on disk and therefore not dependent on an existent file system. It’s backed by a zero-allocated address space in-memory.

The way the allocator influences memory mapping is if it is programmed, like malloc, to create a memory mapping instead of allocating to heap segment, above a certain threshold, and second to return freed memory to the system. As far as I’ve read, jemalloc does not follow the same behavior, which is why self_encryption previously needed to contain that logic which jemalloc doesn’t handle.