Passing a parameter to FFI, 're-construct' from callback

ffi
rust
api

#1

I am trying to pass user data through a SAFE FFI function and re-use the data, but I can’t get it to work reliably. This is pretty specific code but it might not be about the details. Perhaps I am misunderstanding how the user data is supposed to work.

Basically, I am trying to move a parameter to the FFI and re-construct it in the callback the FFI calls for me.

use ffi_utils::FfiResult;
use neon::prelude::*;
use safe_app::App;
use safe_app::test_utils::create_auth_req;
use safe_core::btree_set;
use safe_core::ipc::Permission;
use std::collections::HashMap;
use std::ffi::CString;
use std::ffi::CStr;
use std::os::raw::c_void;

fn app_pub_enc_key(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    // Turn JS array buffer into App pointer
    let app = cx.argument::<JsArrayBuffer>(0)?;
    let app = cx.borrow(&app, |data| { data.as_slice::<u8>() });
    let app = u64::from_ne_bytes([app[0], app[1], app[2], app[3], app[4], app[5], app[6], app[7]]) as *const App;

    // Prevent dropping context, get pointer.
    let cx = Box::new(cx);
    let cx = Box::into_raw(cx) as *mut c_void;

    dbg!(cx); // check if pointer is same on the other end

    unsafe {
        safe_app::ffi::crypto::app_pub_enc_key(app, cx, o_cb);
    }
    extern "C" fn o_cb(
        user_data: *mut c_void,
        error: *const FfiResult,
        public_key_h: safe_app::ffi::object_cache::EncryptPubKeyHandle
    ) {
        // Construct the construct again
        let mut cx: Box<FunctionContext> = unsafe { Box::from_raw(user_data as *mut _) };

        dbg!(user_data); // Pointer is same as on other side! Woohoo!
        dbg!(cx.len());  // Sometimes correct, sometimes gibberish

        // SEGFAULT!!
        let f = cx.argument::<JsFunction>(1).unwrap();
    }

    Ok(JsUndefined::new())
}

Perhaps @nbaksalyar, @marcin or @Fraser could spot in a second what is wrong with this. If not, don’t worry, this is a very specific question and it’s probably because of my incompetence with Rust. I’ve spent a fair amount trying to get it to work, but my last resort for now is reaching out to some people that have more specific knowledge about Rust and FFI.


To explain a bit about the context of above code. I’m learning Rust and thought I’d try creating a Node.js addon that is a binding to safe_app::ffi. This is done with Neon.

The mechanism I used above does work with another function when I test it. But I think it’s because I’m lucky the memory is not overwritten or something (https://github.com/b-zee/safe_app_node/blob/master/native/src/lib.rs#L71). So, basically I have two functions made, both using the same mechanism to pass the context via FFI to the callback.

Running the function above from Node.js gives something likes this now:

[src/lib.rs:35] cx = 0x0000561cad3b34f0         # }
[src/lib.rs:47] user_data = 0x0000561cad3b34f0  # } same!
[src/lib.rs:48] cx.len() = -1388704320  # Sometimes correctly prints 2
Segmentation fault (core dumped)

#2

Hi @bzee! This happens because FunctionContext is a trait, and hence it is a fat pointer, so actually it carries some extra data with it – on 64-bit systems the pointer size will be 128 bits, and user_data is expected to fit in only the usual system pointer size.

What you’ll probably need to do here is to wrap it into 2 boxes: the first one will contain the fat pointer, and the second will point to it. I.e., something like this:

// Prevent dropping context, get pointer.
let fat_ptr = Box::new(cx);
let cx = Box::new(fat_ptr);
let cx = Box::into_raw(cx) as *mut c_void;

and then unpack it in a similar way.


#3

Thanks @nbaksalyar. Good catch! I have actually had that in before, but thought it wasn’t necessary. I took the code from others that were doing a similar thing. I read up on fat pointers, and it seems quite important to get this right.

Anyway, I changed the code, unpacking it like this:

let mut cx: Box<Box<FunctionContext>> = unsafe { Box::from_raw(user_data as *mut _) };

but it yields the same result. :frowning:


#4

Hmm, that’s weird :confused: Can I maybe look at the full source code?

I’ve made a simple example on the Rust Playground you might want to check – it seems to be working fine:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6adc8588298bad62821f910c6806ab8c


#5

The full source of the project is here: https://github.com/b-zee/safe_app_node

It uses Neon to compile (it calls Cargo internally). I included some information in the README.

There is an index.js file in the root, running it with Node.js: node index.js will call the app_pub_enc_key with the App from test_create_app.


#6

Interesting. That’s quite the same what I’m doing indeed. However, note that I have two functions, where only the second one is causing me trouble. Calling test_create_app works fine, basically using this method. Calling the second function app_pub_enc_key does not. I can’t figure out why.

By the way, I guess that the function call (say_hi) wouldn’t cause a segfault even if the underlying struct is freed/corrupt. I can be wrong about that. I say this because in my example calling the len() for FunctionContext actually does not cause a segfault (though it doesn’t give the right result 90% of the time). The segfault appears when calling another function (argument) that presumably tries to access memory that is freed/corrupt/inaccessible somehow.

Edit: I’m starting to think this might even be a bug in Neon. Perhaps the function context is freed by Neon and is not supposed to have a long live or something.


#7

Ah everything makes sense - my best guess is that the FunctionContext is dropped when you go out of the scope of the first function (now that we removed the part about Box::into_raw).

However, I tried to pack a fat pointer produced by Box::into_raw(Box::new(cx)) and then dereferencing it, and there was no luck – so it seems to be a thing with lifetimes in Neon, because when you move the code from the callback function into the upper scope, it all works without a hitch.

I’ll try to look into this more this evening.


#8

Dave Herman is really helpful and responsive. He may be able to help you quickly if this is a Neon-specific issue, over at the Slack channel.


#9

That was a great suggestion, @hunterlester. I immediately decided to ask around. These are the helpful responses I got:

@apendleton:

I think this: https://github.com/geovie/rfcs/blob/0f1963c1010253408229f5d0d3ee0cc7049765fa/text/0000-threadsafe-callback.md might be relevant, but isn’t done yet

@kjv:

@b-zee Have you looked at the neon Task API? You will need to handle the C ABI -> Rust callback, but that will handle the JS side of the callback:

https://github.com/neon-bindings/examples/blob/master/fibonacci-task/native/src/lib.rs

Unfortunately, there isn’t a Neon API to support async C callbacks. You would need to block and use up a LIB_UV_THREAD.
Non-libuv callbacks isn’t something that I’ve done before. I’m not sure off-hand how that works in node.

Perhaps unsurprisingly, the problem line in your code is the line with the unsafe
let mut cx: Box<FunctionContext> = unsafe { Box::from_raw(user_data as *mut _) };

That context is no longer valid as soon as neon returns control of the VM at the end of the method.

I think @apendleton is correct, that RFC would make this integration very nice and clear. However, it can still be accomplished.

You need to implement Task. You can create a one shot channel to block on while you wait for the callback.

If I get a chance, I’ll try to throw together a quick PoC to show what I’m thinking.

Actually, looks like the tests for safe_app show how you might do this synchronously, https://github.com/maidsafe/safe_client_libs/blob/9a2e4f4fe1fbbcac8d3d998eb3d122a60e4c188d/safe_app/src/ffi/crypto.rs#L960-L1005

So, perhaps with ‘Task’ and/or the use of mpsc to make this synchronous(?) it could work if I’m understand correctly.

Edit: using the call_1 helper wrapper I made it a blocking function. That at least works. Putting that blocking function into a ‘Task’ should allow me to make it ‘asynchronous’ from a JS perspective.
I guess my initial issue has to do with the single-threaded nature of Node.js.