Remote Node Control Experiment in Rust Using MaidSafe's qjsonrpc Crate

Hey all again!

So, as per this thread, it seems like the node cleanup policy from the CLI perspective is to just kill the node outright. This isn’t the most elegant, but it works. That said, @bochaco mentioned that, ideally, it might be useful to have some sort of interface over which a node could receive commands (from localhost or otherwise) to do tasks like shutting down, pinging, etc. They pointed to the current implementation of sn_authd as an example of how the qjsonrpc library is currently used to control the authenticator daemon in just such a fashion.

This seems to me like a good opportunity to play with and provide some feedback on the qjsonrpc API in a slightly different context (since authd happens to be the only consumer of the library at the moment). I figured I’d write a little program to implement a qjsonrpc client/server model for passing messages and report my findings.

The goal of this experiment is mainly to find and iron out any sticking points in the API and maybe end up with a chunk of example code for the library. But because I like to aim high, I’m also writing it so that, if it reaches sufficient maturity, it might be spliced into the existing code with relative ease. Maybe one day I can feel the sweet satisfaction of typing safe node shutdown and seeing my node elegantly shut down without being axed by my OS like a blood sacrifice to the computing gods.

Anyway, over the past few days I got a contrived example running, which you can see here. It’s still very much a work in progress, but it’s there if you’re curious.

Implementation

Since the node has no public API to speak of (e.g. you can’t make a node do much other than node.run() even if you do have a reference to node), I figured the best approach would be to develop a message passing system. The program right now consists of three actors: a client, the rpc interface manager, and a minimal/faked node service. The client sends a message to the manager, which fields the request, wraps it, and places it in a pipe for the faked node service to attend to. Once that query is serviced, the response is placed on a different pipe back to the manager, which fetches the cached connection and forwards the response back to the client. One can imagine we could replace the spoofed server/node process with an actual Safe node, and we’d have our desired interface. This approach also wouldn’t require any API changes for the node itself, which is nice and flexible.
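The three-actor flow described above can be sketched with standard-library channels standing in for the pipes. This is only an illustration under assumptions: the names NodeQuery, NodeResponse, Wrapped, fake_node and round_trip are made up here, the client connection is reduced to a plain conn_id, and no qjsonrpc types are involved.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Structured queries the manager forwards to the node (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum NodeQuery {
    Ping,
}

/// Structured responses the node sends back (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum NodeResponse {
    Ack,
}

/// A message tagged with the id of the client connection it belongs to,
/// so the manager can look up the cached connection for the reply.
pub struct Wrapped<T> {
    pub conn_id: u32,
    pub body: T,
}

/// Stand-in for the node: services queries from one pipe, answers on another.
pub fn fake_node(queries: Receiver<Wrapped<NodeQuery>>, responses: Sender<Wrapped<NodeResponse>>) {
    for query in queries {
        let body = match query.body {
            NodeQuery::Ping => NodeResponse::Ack,
        };
        let reply = Wrapped { conn_id: query.conn_id, body };
        if responses.send(reply).is_err() {
            break; // the manager went away
        }
    }
}

/// Stand-in for the manager handling one client request end to end.
pub fn round_trip(conn_id: u32, query: NodeQuery) -> Option<(u32, NodeResponse)> {
    let (query_tx, query_rx) = channel();
    let (resp_tx, resp_rx) = channel();
    let node = thread::spawn(move || fake_node(query_rx, resp_tx));

    // Wrap the request and place it on the pipe to the node.
    query_tx.send(Wrapped { conn_id, body: query }).ok()?;
    // Fetch the response from the other pipe.
    let reply = resp_rx.recv().ok()?;

    drop(query_tx); // closing the pipe lets the node loop end
    node.join().ok()?;
    Some((reply.conn_id, reply.body))
}
```

Calling round_trip(7, NodeQuery::Ping) returns the Ack tagged with connection 7. Swapping fake_node for a real node would leave the manager side unchanged, which is the point of the design.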

Currently, it only works for the single test case of a client sending a single ping and waiting for a response. I haven’t gotten around to implementing the exit logic yet, so any other configuration would probably lead to the manager exiting too early or never shutting down. It shouldn’t take too long to fix that, but I digress.

Update 1: Initial implementation is working (albeit maybe not pretty yet), and it works as follows. In a test case, a client pings the server, receives an ACK, then sends a shutdown, and receives a shutdown ACK, at which point we consider the test passed. It sounds straightforward, but a good bit goes on under the hood as you might imagine. At this stage, I think it’s a pretty nifty PoC!

Notes/Ideas I had While Using qjsonrpc

Here are a few things I noticed along the way. Nothing major, but I figured I’d start a list here so I don’t forget later. After I do a bit of tweaking here and there on the WIP code, I’ll get back to this and take a swing at some of these items.

  • Error codes are a bit cumbersome right now since each error code is hardcoded (e.g. my implementation and the current implementation in authd use just a single error code to report every error under the sun to the client). It might be cool to have a custom #[derive] statement which associates each item in an Error enum with a unique error code. Maybe allowing for custom prefixes (e.g. all the 4xx constitute one error code category, etc.). That way, when reporting errors to the client, it would be quick to convert your Error to a qjsonrpc error code. It would also save on a lot of hardcoded typing/maintenance effort.
  • Type names are awfully similar sometimes. For example, I personally found that IncomingConn, IncomingJsonRpcRequest, and similar formations weren’t intuitive. That said, after reading the source and writing my implementation I see why they’re called that. I don’t know if this is an “issue” (and I admittedly don’t have an alternative off the top of my head) but it’s worth pointing out.
  • Passing serde_json::Value parameters to send() and similar methods seems like it could be simplified perhaps? Maybe using some clever generics, but maybe it’s already been considered.
  • The Endpoint objects are still pretty low level. Calling bind(), listening for connections, and iterating requests feels like working with unix sockets. Since JSON-RPC 2.0 follows a client-server model, providing an abstracted Client or Server type similar to reqwest::Client or others would reduce boilerplate in a lot of non-specialized cases.
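To make the error-code idea from the first bullet concrete, here is a hand-written sketch of what such a derive might generate. Everything in it is an assumption for illustration: the RpcError variants, the 4xx/5xx category bases, and the method names are invented and are not part of qjsonrpc or authd.

```rust
/// Hypothetical application error type; the variants are made up for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RpcError {
    MalformedParams,
    UnknownMethod,
    NodeBusy,
    NodeShuttingDown,
}

/// Category prefixes in the spirit of a 4xx/5xx split (assumed values).
const CLIENT_ERROR_BASE: i32 = 400;
const NODE_ERROR_BASE: i32 = 500;

impl RpcError {
    /// What a derive macro might generate: one unique code per variant,
    /// grouped under the prefix of its category.
    pub fn code(self) -> i32 {
        match self {
            RpcError::MalformedParams => CLIENT_ERROR_BASE,
            RpcError::UnknownMethod => CLIENT_ERROR_BASE + 1,
            RpcError::NodeBusy => NODE_ERROR_BASE,
            RpcError::NodeShuttingDown => NODE_ERROR_BASE + 1,
        }
    }

    /// True when the code falls in the client-error category.
    pub fn is_client_error(self) -> bool {
        (CLIENT_ERROR_BASE..NODE_ERROR_BASE).contains(&self.code())
    }
}
```

Positive codes like these sit outside the range JSON-RPC 2.0 reserves for pre-defined errors (-32768 to -32000), so they are free for application use. A server could then report RpcError::UnknownMethod.code() instead of one catch-all constant.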

Conclusion

I’ll keep updating this post as I go. Feel free to offer feedback if you want. I just wanted a place to organize my thoughts a little and share a bit of toy code I wrote :slight_smile: .

8 Likes

I did a lot of cleaning up and got an async rpc service using qjsonrpc which abstracts out the gory details from the simple server process. I created a PR here that puts it as an example in the qjsonrpc library. Depending on feedback, maybe it’ll get pulled in, but we’ll see.

More thoughts on jsonrpc.

  • It looks like the type JsonRpcRequest doesn’t allow a request to have a null or absent id field, even though the JSON-RPC 2.0 spec allows for a request without an id in the form of a Notification, if I’m reading this properly. It shouldn’t be too hard to support, I think, and would allow a wider range of client types for qjsonrpc.
  • Similar to the above, serializing a “batch” of requests isn’t yet implemented from what I can tell.
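For reference, here is a dependency-free sketch of what the two missing wire shapes look like. The WireRequest struct is invented for this example and is not qjsonrpc's JsonRpcRequest; params is assumed to be already-serialized JSON and method is assumed to contain no characters that need escaping.

```rust
/// Minimal request shape for illustration only (not qjsonrpc's actual type).
pub struct WireRequest {
    pub method: String,
    /// Already-serialized JSON, to keep the sketch free of dependencies.
    pub params: String,
    /// `None` marks a Notification: the "id" member is left out entirely.
    pub id: Option<u32>,
}

impl WireRequest {
    /// Serialize a single request or notification object.
    pub fn to_json(&self) -> String {
        let id = match self.id {
            Some(id) => format!(",\"id\":{}", id),
            None => String::new(),
        };
        format!(
            "{{\"jsonrpc\":\"2.0\",\"method\":\"{}\",\"params\":{}{}}}",
            self.method, self.params, id
        )
    }
}

/// A batch is a JSON array of request objects.
pub fn batch_to_json(requests: &[WireRequest]) -> String {
    let items: Vec<String> = requests.iter().map(WireRequest::to_json).collect();
    format!("[{}]", items.join(","))
}
```

Per the spec, a server does not reply to a Notification, so a client sending one should not wait for a response.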

Finally, I took a cursory look into two of the points above.

I’m going to maybe toy around with some of the smaller improvement ideas first, but I think that, based on the example program I wrote, writing a generic implementation of the simple async qjsonrpc server should be possible, and may be a suitable abstraction for a Server.

The tricky part might be defining a sufficiently generic request/response type which could be used. I’ve not looked into it too deeply, but I think some traits and clever macros should make it doable. The tradeoff is maintaining macros isn’t always the simplest thing, so maybe best to wait for the API to mature a bit more before committing to maintaining a macro for it.

Anyway, that’s what I’ve got. Nothing major, mostly just my random musings, but it’s been a fun little side project. This could’ve been an update, but I guess there’s a limit on editing the OP.

7 Likes

This is very cool to see happening. It can hopefully be extended in the future to query stats from the node which should remove the need to parse them from the logs. The bitcoin jsonrpc api is pretty extensive and gives some indication of where this feature could go in the future.

4 Likes

I agree it’s a welcome and interesting innovation and at the same time I’m concerned!

What might be the security implications here? One might be DDoS but I think that isn’t specific to this case as nodes are exposed in other ways.

So I guess a risk might be in using whatever command processing is available to change node behaviour. That could be a threat to individual nodes or their owners’ devices, or en masse might open up a network attack. Again the same might be true in general, so I guess the issue is whether this capability is as secure or if there are ways it can open up vulnerabilities which aren’t otherwise present.

I honestly don’t know if these are real concerns but seems worth raising the possibility early on. I don’t want this to derail or discourage exploring and building though, because this can be a useful feature that makes it easier to manage nodes and enhance usability which is definitely desirable. I can’t help myself though!

Great work @Scorch!

4 Likes

As far as security is concerned, it’s using the same interface as authd does and connections are encrypted using TLS. The connection itself then, we can trust probably as far as we trust authd. Which I’d say is pretty far at this point, given that we trust authd already to manage app permissions, tokens, and safes, among other things. It’s also worth pointing out that it’s possible to include (and authd already does this) added information like passphrases etc. that could be used to seal the interface off.

More or less my thoughts exactly. The rest would depend on exactly what commands are available eventually. This would probably be a debate to be had on a per-feature basis, as people propose new functionality. Perhaps in the case of really sensitive information, like personal identifiers, I imagine something like conditional compilation would need to be used to ensure only certain binaries (e.g. maybe used in a test net for debug or something, where the stakes are low) would include more “dangerous” commands.

That last bit is a bit of speculation, but I guess I’m getting at the idea that having an interface is rather benign in and of itself (as pointed out by @mav earlier, bitcoin already has something similar). It’s the sort of API that’s exposed through it that would need to be scrutinized.

If this goes anywhere, it would be worth it to start a thread(s) to publicly ask for what types of functionality people can think of that would be useful/safe to expose, and debate it from there.

4 Likes

Resource usage and logging.
It’s supposed to be an autonomous network so unless there is a problem with resource hogging, we should just let it run - or stop it, gracefully for preference, especially if that would simplify code elsewhere.

I think we can secure the interface by requiring signatures on the commands, so perhaps when launching the sn_node you either set a PK or have the sn_node give you a keypair that you use to sign each command you send to it?? The connection is already secured with TLS as mentioned, so it’s just about using some sort of authentication to accept the commands, and invalid ones can be rejected quickly by the sn_node??

I always imagined being able to run my sn_node in a remote location, and be able to not only monitor it but also change its settings remotely, like the reward wallet or the amount of resources to share. If, when setting it up, I get a key pair that the instance will then use to verify that commands are signed, I can simply carry the corresponding SK with me to get access to it remotely. Moreover, perhaps one day even tunneling these commands through Safe so I can do it from truly anywhere, as I could use its Safe URL, even create an NRS URL for it, then I could open Safe browser with safe://<my-sn-node-xorurl>/get-stats…?!..and you see something like a validator page on the beaconcha.in Ethereum explorer (e.g. Validator 56628).

5 Likes

I had imagined that these are parameters that Jim Collinson would work his magic on as part of the setup

2 Likes

Just a random little update. So the PR isn’t merged yet, but some of the comments have already led to some more ideas for qjsonrpc.

More Example Code

Since the original PR was also intended to be a proof of concept for the node rpc interface, it was a bit more complicated. As per some of the comments, I wrote an even simpler example and opened a new PR. Between these two examples, I think it’s a pretty comprehensive overview of qjsonrpc usage :slight_smile:

Type Safety

While writing this example, I was trying my best to abstract out some of the connection details from the consumers of the RPC interface. This way we can do all of our type-checking at compile time, instead of at runtime (e.g. errors like when you mistype a hard-coded method name and don’t realize it until the server sends back an error).

The server process is pretty much entirely abstracted out, so it only needs to worry about structured data in the form of enum Query instead of working with the raw JsonRpcRequest and JsonRpcResponse types. Unfortunately, the client still has to do risky things like send("method-name", json!(value)).

In response, I’m in the process of playing with a new trait idea that would give access to a type-safe qjsonrpc API. The traits would be something like a StructuredRequest and StructuredResponse which provide reliable conversion to and from JsonRpc types, and the API might be something like send_structured() or send_checked() in addition to the existing send()/get_next() functions. It might add some more flexibility. Once the examples are merged, and I get a chance to do some testing/debugging, I’ll open a PR for this as well, I think.
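One possible shape for that trait pair is sketched below. Only the names StructuredRequest, StructuredResponse and send_structured come from the post. The method signatures, the NodeQuery/NodeReply enums, and the closure standing in for the raw stringly-typed send are all assumptions made so the example compiles on its own without qjsonrpc or serde_json.

```rust
/// Hypothetical trait: turn a typed request into a method name plus serialized params.
pub trait StructuredRequest {
    fn to_wire(&self) -> (&'static str, String);
}

/// Hypothetical trait: parse a serialized result back into a typed response.
pub trait StructuredResponse: Sized {
    fn from_wire(method: &str, result: &str) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub enum NodeQuery {
    Ping,
    Echo(u32),
}

#[derive(Debug, PartialEq)]
pub enum NodeReply {
    Pong,
    Echoed(u32),
}

impl StructuredRequest for NodeQuery {
    fn to_wire(&self) -> (&'static str, String) {
        match self {
            NodeQuery::Ping => ("ping", "null".to_string()),
            NodeQuery::Echo(n) => ("echo", format!("[{}]", n)),
        }
    }
}

impl StructuredResponse for NodeReply {
    fn from_wire(method: &str, result: &str) -> Result<Self, String> {
        match (method, result.trim()) {
            ("ping", "\"pong\"") => Ok(NodeReply::Pong),
            ("echo", raw) => raw
                .parse::<u32>()
                .map(NodeReply::Echoed)
                .map_err(|e| e.to_string()),
            (m, raw) => Err(format!("unexpected result {} for method {}", raw, m)),
        }
    }
}

/// Stand-in for a `send_structured()`-style call. `transport` plays the role of
/// the raw send: it takes a method name and params and returns the raw result.
pub fn send_structured<Q, R, T>(query: &Q, transport: T) -> Result<R, String>
where
    Q: StructuredRequest,
    R: StructuredResponse,
    T: FnOnce(&str, &str) -> String,
{
    let (method, params) = query.to_wire();
    let raw = transport(method, &params);
    R::from_wire(method, &raw)
}
```

With this shape the method-name strings live in exactly one place (the trait impl), so a mistyped name is a single bug to fix rather than something every call site can get wrong.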

3 Likes

I just reviewed, tested and merged this one. It’s very nice @Scorch, it shows very clearly and simply how to use the API. It’s so simple that we will even be able to put that code in the README moving forward, I believe.

3 Likes

Thanks, I’m glad it could be useful!

I’m almost thinking it’s not worth it to merge the other PR as an example (for the more complicated version). Instead, it could be rolled in with the Structured traits, and included as a convenient, library-provided abstraction. Something like RpcDaemon?

That would effectively move the RpcDaemon module out of the example and into the lib, and then leave the async_server example with just a main.rs, query.rs, and response.rs. Does this seem plausible you think?

1 Like

I was also thinking about that. I think we should draw the line of what this crate is and provides, which I guess is the foundations for JSON-RPC over QUIC. Then other layers like the RPC daemon could be part of the lib but, as you say, maybe more clearly as a second upper layer, perhaps within its own mod namespace and feature-gated? Or even a separate crate, keeping this crate focused on just the protocol and its messages rather than how the user creates a client or a server? …?..just thoughts as you can see

1 Like

That’s a good idea, agreed.

I’m wondering if this might even be two feature flags (actually, one flag and one crate).

The idea of a StructuredRequest and a StructuredResponse seems like a convenience for constructing & parsing JsonRpcRequest and JsonRpcResponse reliably and with some better compile-time guarantees. This doesn’t add any functionality or anything new, but it might stop you from shooting yourself in the foot. That sort of “feels” like a feature flag to me.

Especially because, even if there were no StructuredRequest trait, it could be easily recreated (and probably would be) by any consumer of qjsonrpc by implementing send_my_structured_type(client: &ClientEndpoint, req: MyStructuredType). Providing it would be pure convenience. Down the line, if we wanted to, it would also set the foundation for us to write a macro to automatically derive the trait, without changing the code of existing implementers.

On the other hand, RpcDaemon seems to be built strictly on top of the existing structure. Using RpcDaemon is a way to explicitly dodge using the underlying library entirely. That seems more like a new crate with qjsonrpc as a dependency.

Update

Was a great idea, but, as much as I would like to, trying to make JSON objects look even vaguely like a strongly-typed object is an uphill battle. It’d be easier at that point to just implement binary-encoded messages instead. Probably going to drop the new crate idea for a bit and just work directly with qjsonrpc::Endpoint inside a node for future experiments. Sunk cost sucks, but better to figure it out now I guess :man_shrugging:

1 Like

Here’s a fun little hack I got going. I was messing around with my local node binary and was able to embed a qjsonrpc endpoint into the node (actually, it’s wrapped inside an Option, so it only runs if --rpc-port is supplied), and my little client was able to communicate with a node in a locally-run baby fleming net.
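The "only runs if --rpc-port is supplied" part can be sketched roughly as below. The flag name comes from the post; the function names and the RpcService placeholder are assumptions, and the real node's argument parsing may look quite different.

```rust
/// Extract `--rpc-port <port>` from an argument list.
/// Returns Ok(None) when the flag is absent, so the interface stays disabled.
pub fn rpc_port_from_args<I>(args: I) -> Result<Option<u16>, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg == "--rpc-port" {
            let value = args.next().ok_or("--rpc-port needs a value")?;
            let port = value
                .parse::<u16>()
                .map_err(|e| format!("invalid port {:?}: {}", value, e))?;
            return Ok(Some(port));
        }
    }
    Ok(None)
}

/// Placeholder for whatever wraps the embedded endpoint (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct RpcService {
    pub port: u16,
}

/// The service only exists when a port was supplied.
pub fn maybe_rpc_service(port: Option<u16>) -> Option<RpcService> {
    port.map(|port| RpcService { port })
}
```

In a binary this would be fed with std::env::args().collect::<Vec<_>>(), and the node would simply skip all RPC setup when maybe_rpc_service returns None.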

To get it to be compatible with run-baby-fleming, I also had to hack the sn_launch_tool a bit and recompile the latest sn_cli release to supply the proper --rpc-port args.

There’s nothing fancy yet like identity verification, no special methods (other than ping), and it’s by no means “production-quality”… But it works, which is always cool to see after tinkering around with it for a bit!

Note on qjsonrpc SocketAddr binding

(This is mostly for my own reference, so I don’t forget later.)

It also turns out that you can bind a qjsonrpc::Endpoint to an IPv4 address, but the client can’t actually talk to an IPv4 address it seems, which took a hot second to realize. At least that’s what I think is going on there, but I haven’t verified yet. I’m just saying that because, when I initially tried to bind the node-side Endpoint to an IPv4 localhost address, they weren’t able to communicate. If it turns out that is the case, it might be a good idea to try and patch that in the qjsonrpc lib (probably in ClientEndpoint::connect()).

Anyway, it was cool to see and wanted to share. Happy hacking :smiley:

4 Likes

Having fun with sandboxing and I’ve got updates abound :wrench:

Updates

  • Nodes now generate RPC public/private keys on startup in a node subfolder called rpc
  • I’ve added a NodeRpcClient to sn_api similar to the AuthdClient, which issues remote procedure calls
  • On sending procedure calls, a one-time passphrase is generated and signed by the client to verify the sender’s identity. Message integrity itself is still managed by TLS, so I think that covers the bases
  • Nodes can take the RPC port on the command line and run the service on localhost on the specified port. If no port is provided, the interface is disabled entirely.
  • sn_launch_tool was modified to assign a random port number in the range [34005, 36133) and launches nodes with the rpc interface enabled by default. This is more of a temporary hack to test things out.
  • Added a node subcommand in sn_cli to test this out (more on that in a second).

Baby’s First CLI Command

Inspired by a thought I had a few months ago, here’s something basic that is not particularly easy to do yet. You can’t ask a running node for its node_id (e.g. its public key on the Safe Net) or its reward_key. You can check the logs, but that’s not easy to do programmatically, so it’s annoying to do for multiple nodes at once… So the first command I decided to try was get-id, which just queries the node via the RPC interface for its node_id and its public reward_key.

This leads to the following flow:

> safe node run-baby-fleming
...
...
Launching genesis node (#1)...
running node with rpc on port 33496
...
...
Done!
> safe node get-id --rpc-port 33496 --cert-base-path <genesis_node_base>/rpc/
node_id: PublicKey::Ed25519(bb7139..)
reward_key: PublicKey::Bls(81d50f..)

Interested in Playing Around?

I created a few new branches so if you want to try this locally, you can now!

To clone the sn_node, run:

> git clone --branch node-rpc https://github.com/Scorch-Dev/sn_node
> cd sn_node
> cargo build --release

And similarly for sn_api, run:

> git clone --branch node-rpc-client https://github.com/Scorch-Dev/sn_api
> cd sn_api
> cargo build --release

At this point, maybe rename your stable versions of the sn_node and safe executables to something like sn_node_stable and safe_stable. Then copy the built binaries to those locations:

> mv ~/.safe/safe ~/.safe/safe_stable
> mv ~/.safe/node/sn_node ~/.safe/node/sn_node_stable

> cp sn_api/target/release/safe ~/.safe/
> cp sn_node/target/release/sn_node ~/.safe/node/

Things Left to Do

This is still definitely a work in progress, so there are a few things left to iron out.

  • I mentioned this in my last post, but it wasn’t a bug in qjsonrpc that I had found, just something I failed to notice. My machine resolves localhost to IPv6, so that’s why I had trouble when binding the node to IPv4. That hasn’t been fixed yet. If your machine resolves localhost to IPv4, this in all probability won’t work for you.
  • Code style needs work. Some floating constants/magic numbers are still present here and there.
  • The get-id doesn’t return structured data yet. It’s just a formatted String for now.
  • Currently the NodeRpcClient isn’t cached, so we just build a new one each time we send a request, which is wasteful.
  • Command line parameters to safe node get-id are a little ugly. The problem is that each node potentially runs from a different directory and on a different port, so I’m not sure what the best way to streamline this is yet…
  • This only works on localhost, similar to authd.
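On the localhost IPv4/IPv6 item in the list above, one way to sidestep the mismatch is to look at every address localhost resolves to and pick the one in the same family as the address the node bound to. The sketch below uses only the standard library; the function names are made up, and it is not how the branch currently behaves.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// All socket addresses that `localhost:<port>` resolves to on this machine.
/// The order and the address families returned depend on the OS configuration.
pub fn resolve_localhost(port: u16) -> std::io::Result<Vec<SocketAddr>> {
    Ok(("localhost", port).to_socket_addrs()?.collect())
}

/// Pick the first candidate in the same address family (IPv4 or IPv6)
/// as the address the server side is bound to.
pub fn pick_same_family(candidates: &[SocketAddr], server: &SocketAddr) -> Option<SocketAddr> {
    candidates
        .iter()
        .copied()
        .find(|candidate| candidate.is_ipv4() == server.is_ipv4())
}
```

A client could call resolve_localhost(port) and pass the result to pick_same_family together with the node's bind address, rather than trusting whichever address the resolver happens to list first.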

As of now, I’m not sure how far I will take this, but for now I’m just having a good time making my node do fun things. Perhaps I can even patch it into my local node for the next test-net. Might be cool to try.

Maybe @bochaco or @joshuef would have some info about this, but do you think MaidSafe might be interested in eventually pulling a feature like this into the upstream repo? It’s still too rough now for that, but I’m wondering about looking forward. Like mentioned earlier, there are security implications. I also don’t know if MaidSafe is in a spot where they want to accept bigger community patches/features at any point in the semi-near future. Both for the sake of stability and resource bandwidth (e.g. reviewing and working with PRs). Beyond that, it was also mentioned that such a feature could be tunneled through safe in the future, so not sure how that would gel with using qjsonrpc on the backend.

Anyway, that’s what I’ve got for now. Ideas, improvements, etc. are welcomed, and stay “Safe” all :wink:

4 Likes

Super nice stuff @Scorch. I haven’t played with it yet I’m afraid, but conceptually it’s great to see. I can totally imagine us getting this sort of thing in. We absolutely need an API for interacting with the node :+1: :surfing_man: :tada:

3 Likes

Very nice @Scorch, it sounds like a very good feature set to me. Personally I believe we should try to incorporate these features in our CLI, API and node, sooner or later. Some ideas below.

I think we should split the concerns/challenges here. On one side, and probably the first step we should aim at, is talking to and monitoring just a single node, locally. If you think about it, that will probably be the most common use case for a user joining the network: a node farming on his/her PC and the CLI to manage it. So we could go for this feature first, and have everything needed to send queries and commands to a single node which runs locally.

Once that’s working, we can have an interface in the node which returns the infrastructure information, let’s say the list of all nodes that are in the same section as the node you are requesting the info from. We already have something similar for communications between clients and nodes when they need to join/connect to a section, so it’s just a matter of having a similar service on this sn_node interface. When the CLI receives the list of nodes from the infrastructure query, it can then send the queries/commands to all the nodes. In summary, you query one node to give you the list of nodes, and then send the actual query/command to all of them.
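The two-step flow (ask one node for the list, then send the real query to every node on it) can be sketched generically. The closures below stand in for the actual RPC calls, which don't exist yet; fan_out and its parameters are names invented for this example.

```rust
/// Ask `seed` for the members of its section, then send `query` to each of them.
/// `list_section` and `query` stand in for RPC calls that are not implemented yet.
pub fn fan_out<A, R, L, Q>(seed: &A, list_section: L, query: Q) -> Vec<(A, R)>
where
    L: FnOnce(&A) -> Vec<A>,
    Q: Fn(&A) -> R,
{
    list_section(seed)
        .into_iter()
        .map(|node| {
            let reply = query(&node);
            (node, reply)
        })
        .collect()
}
```

Here A would be whatever identifies a node's RPC endpoint (a socket address, say) and R the reply type. A real version would also need to decide what to do when individual nodes fail to answer.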

Note this was just an idea, and MaidSafe won’t necessarily end up implementing it in the core functionality; we will all need to evaluate how things evolve and whether it’s a good thing to have. Either way, I think it’s fair to say that if it’s tunneled through Safe, qjsonrpc perhaps won’t be needed, and the messages would need to be sent as an opaque payload to the node using the sn_client communication mechanism and protocol. I think we are far from that yet, and perhaps for local monitoring it’s still preferable to use local qjsonrpc messages, I have no idea. Thus the only thing I would do regarding this is keep it in mind, just to make sure the design can fit in easily if the time comes for such a feature. I hope I’m making sense and you get the idea.

1 Like

I think this can/should be exposed in the node’s API, along with many other things which I presume we will need to send queries and commands for, like changing the reward key.

1 Like

I think that’s a good way to go about it from the CLI perspective. Running only on localhost, we can arbitrate a default port like 34001 (following in the tradition that authd takes ports in the 33000 range, perhaps node rpc runs in the 34000 range), and a default base path like ~/.safe/node/rpc. CLI will fill that in by-default when sending queries and simplify the whole process. I think it still makes sense to allow for optional parameters to query on other ports so that it’s flexible enough that multiple local nodes could be talked to (which I think is valuable in the case of a localized test-net). In any case, it simplifies the majority of use cases and lets us focus on only worrying about localhost for now.
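The "defaults with optional overrides" idea could look roughly like this. The 34001 port and the ~/.safe/node/rpc path are the values floated above; the RpcTarget type, the resolve function, and passing the home directory in explicitly are assumptions made to keep the sketch testable.

```rust
use std::path::{Path, PathBuf};

/// Default node RPC port suggested in the post (34000 range).
pub const DEFAULT_RPC_PORT: u16 = 34001;

/// Where the CLI should send a node query (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct RpcTarget {
    pub port: u16,
    pub cert_base_path: PathBuf,
}

impl RpcTarget {
    /// Use the overrides when given, otherwise fall back to the defaults.
    /// `home` is the user's home directory, passed in rather than looked up.
    pub fn resolve(port: Option<u16>, cert_base_path: Option<PathBuf>, home: &Path) -> Self {
        RpcTarget {
            port: port.unwrap_or(DEFAULT_RPC_PORT),
            cert_base_path: cert_base_path
                .unwrap_or_else(|| home.join(".safe").join("node").join("rpc")),
        }
    }
}
```

With both arguments left as None the common single-node case needs no flags at all, while a local test-net can still address each node by passing its own port and base path.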

A Tentative CLI API

So trying to put something together based on comments so far, the CLI API might look something like the following. This takes some inspiration from git, in that a single subcommand can be an accessor or mutator, depending on arguments supplied:

  • keys : prints the node id/public key & reward key.
  • keys --set-reward-key <reward_key>: sets the rewards PK from the argument and prints out the new output of keys.
  • keys --set-reward-key [ -f, --from_file ] <reward_key_file>: Sets the rewards key from a file and prints out the new output of keys
  • storage [--detailed]: prints the current used space and max space. With --detailed, if the node is an elder or adult, also prints how the storage is spread among the chunk stores.
  • storage --set-max <maximum used space>: Sets the maximum capacity if it’s greater than the current used space (As of the last time I looked at those files, reducing beyond that is another beast).
  • netcfg: prints out some networking related information, like the address of the node and of the rpc interface. Could be expanded with more stats later presumably.
  • logs [--from-top] [--offset=0] <num_lines=10>: Get at most num_lines log entries. If --from-top, then fetches the oldest num_lines starting at index offset. Without --from-top, the last log fetched is offset by the offset amount from the bottom.
  • status: a convenience that wraps some of the above to quickly grok what’s going on with the node. Prints a bird’s-eye view of the age, node root path, node id, reward key, the current/max storage, the network address, and the rpc socket address.
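The windowing rules for the logs subcommand in the list above are easy to get off by one, so here is a small sketch of one reading of them. It assumes the entries are held oldest-first in a slice. select_logs is a made-up name, and this is an interpretation of the proposal, not an existing implementation.

```rust
/// Select at most `num_lines` entries from `lines` (assumed oldest first).
/// With `from_top`, the window starts at index `offset` from the oldest entry.
/// Without it, the window ends `offset` entries before the newest one.
pub fn select_logs<T>(lines: &[T], from_top: bool, offset: usize, num_lines: usize) -> &[T] {
    let len = lines.len();
    if from_top {
        let start = offset.min(len);
        let end = start.saturating_add(num_lines).min(len);
        &lines[start..end]
    } else {
        let end = len.saturating_sub(offset);
        let start = end.saturating_sub(num_lines);
        &lines[start..end]
    }
}
```

For five entries numbered 1 to 5, the default call with num_lines = 2 yields entries 4 and 5, an offset of 1 yields 3 and 4, and --from-top with an offset of 1 yields 2 and 3. Out-of-range offsets produce an empty window rather than an error.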
1 Like

Perhaps these can be children of a rewards subcommand, e.g. $ safe node rewards set-key <reward-key>, and as you point out the rewards subcommand alone shows info about pk and current balance maybe…?..

Not fully sure, but this is probably already covered by the networks subcommand. Perhaps we need some additional fields that can be saved/mapped to each of the networks in the CLI settings. Not sure if you are familiar with it, but especially with the upcoming release of the CLI, check https://github.com/maidsafe/sn_api/blob/7e00228a94a5e31b805e2f9d86f17bff4a4ac1b9/sn_cli/README.md#set-network-bootstrap-address

1 Like