Trying to build a global network


#1

I am trying to build a global network that I intend to be the seed of a future community network.

My vault is a fork of safe_vault and uses a fork of maidsafe_utilities, but my modifications are minimal and concern only the logging of vault data (I want to be able to collect some data while keeping my vault interoperable with the original safe_vault, so that people are not forced to use my fork).

But I came across several problems:

  • Setting disable_external_reachability_requirement to false doesn’t work.

  • Release mode doesn’t compile.

  • I cannot create an account when the number of vaults is not exactly the min section size.

The first 2 problems are not blocking because I can just leave disable_external_reachability_requirement set to true and compile in debug mode, but the last one is blocking. I have set min_section_size to 5: if 5 vaults are running then account creation works, but when 6 vaults are running it doesn’t.

I reproduce this problem both when I try to create a manager account and when I try to create a regular account:

When I try to create the manager account for invitations (./gen_invites --create) with 6 vaults I get this error:

Trying to create an account using given seed from file...
thread 'mainWARN 13:38:38.895597300 Core Event Loop [<unknown> <unknown>:188] Failed to receive response: Timeout
' panicked at 'WARN 13:38:38.896607000 Core Event Loop [<unknown> <unknown>:191] Could not put account to the Network: CoreError(Operation aborted - CoreError::OperationAborted)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!   unwrap! called on Result::Err                                              !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
safe_app/examples/gen_invites.rs:186,16 in gen_invites

Err(CoreError(Operation aborted - CoreError::OperationAborted))

', /home/tfa/.cargo/registry/src/github.com-1ecc6299db9ec823/unwrap-1.2.1/src/lib.rs:67:25
note: Run with `RUST_BACKTRACE=1` for a backtrace.

After having created the manager successfully (with 5 vaults), I put back the sixth vault and try to create a regular account with an invite. I get this error in the Peruse browser:

Core error: Blocking operation was cancelled

If I delete the last vault to get back to 5 vaults, then I can create an account with an invite (another one, because the network considers the previous invitation already claimed).

The global Peruse log file (with the failed creation, followed by the successful one) is:

T 18-12-05 15:15:08.293993 [<unknown> <unknown>:49] Creating unregistered client.
T 18-12-05 15:15:08.296993 [crust::main::service service.rs:556] Network name: Some("tfa")
T 18-12-05 15:15:08.327002 [crust::main::service service.rs:82] Event loop started
T 18-12-05 15:15:08.327993 [crust::main::config_refresher config_refresher.rs:44] Entered state ConfigRefresher
T 18-12-05 15:15:08.327993 [<unknown> <unknown>:537] Waiting to get connected to the Network...
T 18-12-05 15:15:09.359988 [crust::main::active_connection active_connection.rs:63] Entered state ActiveConnection: PublicId(name: ce06c7..) -> PublicId(name: 75744a..)
T 18-12-05 15:15:09.359988 [crust::main::active_connection active_connection.rs:110] Connection Map inserted: PublicId(name: 75744a..) -> Some(ConnectionId { active_connection: Some(Token(11)), currently_handshaking: 0 })
D 18-12-05 15:15:09.360978 [routing::states::bootstrapping bootstrapping.rs:266] Bootstrapping(ce06c7..) Received BootstrapConnect from 75744a...
D 18-12-05 15:15:09.360978 [routing::states::bootstrapping bootstrapping.rs:332] Bootstrapping(ce06c7..) Sending BootstrapRequest to 75744a...
D 18-12-05 15:15:09.366978 [routing::states::client client.rs:91] Client(ce06c7..) State changed to client.
T 18-12-05 15:15:09.366978 [<unknown> <unknown>:555] Connected to the Network.
T 18-12-05 15:34:37.964360 [<unknown> <unknown>:124] Attempting to log into an acc using client keys.
T 18-12-05 15:34:37.965361 [crust::main::service service.rs:556] Network name: Some("tfa")
T 18-12-05 15:34:38.011359 [crust::main::service service.rs:82] Event loop started
T 18-12-05 15:34:38.011359 [crust::main::config_refresher config_refresher.rs:44] Entered state ConfigRefresher
T 18-12-05 15:34:38.011359 [<unknown> <unknown>:537] Waiting to get connected to the Network...
T 18-12-05 15:34:39.066345 [crust::main::active_connection active_connection.rs:63] Entered state ActiveConnection: PublicId(name: 9cd623..) -> PublicId(name: 75744a..)
T 18-12-05 15:34:39.066345 [crust::main::active_connection active_connection.rs:110] Connection Map inserted: PublicId(name: 75744a..) -> Some(ConnectionId { active_connection: Some(Token(11)), currently_handshaking: 0 })
D 18-12-05 15:34:39.067346 [routing::states::bootstrapping bootstrapping.rs:266] Bootstrapping(9cd623..) Received BootstrapConnect from 75744a...
D 18-12-05 15:34:39.067346 [routing::states::bootstrapping bootstrapping.rs:332] Bootstrapping(9cd623..) Sending BootstrapRequest to 75744a...
D 18-12-05 15:34:39.073345 [crust::main::active_connection active_connection.rs:140] PublicId(name: 9cd623..) - Failed to read from socket: ZeroByteRead
I 18-12-05 15:34:39.073345 [routing::states::bootstrapping bootstrapping.rs:316] Bootstrapping(9cd623..) Connection failed: The chosen proxy node already has connections to the maximum number of clients allowed per proxy.
T 18-12-05 15:34:39.073345 [crust::main::active_connection active_connection.rs:227] Connection Map removed: PublicId(name: 75744a..) -> None
D 18-12-05 15:34:39.073345 [routing::states::bootstrapping bootstrapping.rs:365] Bootstrapping(9cd623..) Dropping bootstrap node PublicId(name: 75744a..) and retrying.
I 18-12-05 15:34:39.073345 [routing::states::bootstrapping bootstrapping.rs:141] Bootstrapping(9cd623..) Lost connection to proxy PublicId(name: 75744a..).
T 18-12-05 15:34:40.170331 [crust::main::active_connection active_connection.rs:63] Entered state ActiveConnection: PublicId(name: 9cd623..) -> PublicId(name: adaad7..)
T 18-12-05 15:34:40.170331 [crust::main::active_connection active_connection.rs:110] Connection Map inserted: PublicId(name: adaad7..) -> Some(ConnectionId { active_connection: Some(Token(20)), currently_handshaking: 0 })
D 18-12-05 15:34:40.170331 [routing::states::bootstrapping bootstrapping.rs:266] Bootstrapping(9cd623..) Received BootstrapConnect from adaad7...
D 18-12-05 15:34:40.170331 [routing::states::bootstrapping bootstrapping.rs:332] Bootstrapping(9cd623..) Sending BootstrapRequest to adaad7...
D 18-12-05 15:34:40.189330 [routing::states::client client.rs:91] Client(9cd623..) State changed to client.
T 18-12-05 15:34:40.189330 [<unknown> <unknown>:555] Connected to the Network.

Of course, running 5 vaults permanently isn’t a workaround, because I want the network to grow, and when it becomes public I won’t control the number of vaults. So what should I do to ensure that this kind of error doesn’t happen when the network is live?

I suppose my firewall configuration is correct, because the setup with 5 vaults works, but maybe not, so here are the ports allowed for inbound connections:

  • TCP/22 (for ssh)
  • TCP/2376, TCP/2377, UDP/4789, UDP/7946, TCP/7946 (for docker, I use it with an overlay network to log data and the internet network for safe exchanges on port 5483)
  • TCP/5483, UDP/5484 (for safe vault)
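Assuming a ufw-based host firewall (a hypothetical sketch — the actual firewall frontend in use isn’t stated in this thread), the allowances above would look like:

```shell
# Hypothetical ufw rules matching the inbound allowances listed above
ufw allow 22/tcp     # ssh
ufw allow 2376/tcp   # docker daemon (TLS)
ufw allow 2377/tcp   # docker swarm cluster management
ufw allow 4789/udp   # docker overlay network (VXLAN)
ufw allow 7946/tcp   # docker node discovery
ufw allow 7946/udp   # docker node discovery
ufw allow 5483/tcp   # safe_vault TCP acceptor
ufw allow 5484/udp   # safe_vault UDP
```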

#2

Just for more info:

What is it you are expecting it to do, and what do you observe instead?

I just did a fresh compilation in release mode: safe_vault master (at dd7ef5b6...) builds fine. Try cargo update just in case you have some stale files around. What compilation errors do you get?

Was that with safe_vault master? Also, which version/commit of safe_client_libs did you use?

Lastly, did you try completely disabling the firewall, just to see if that’s the issue?


#3

That did it. Thanks.

I am not expecting anything in particular; I don’t know what the disable_external_reachability_requirement parameter does. I thought this parameter was a security setting to be disabled in a local network (like many others), so I just inverted it for a global network.

Edit: The error I get (with the docker network) is:

E 18-12-08 18:33:01.480232 Bootstrapper has no active children left - bootstrap has failed
I 18-12-08 18:33:01.480494 Bootstrapping(b9bafc..) Failed to bootstrap. Terminating.

Edit 2: Now i get:

E 18-12-08 20:27:26.010797 Failed to Bootstrap: (FailedExternalReachability) Bootstrappee node could not establish connection to us.
I 18-12-08 20:27:26.012825 Bootstrapping(427e3c..) Failed to bootstrap. Terminating.

I reproduced the problem in a local network. This time I used these elements for the vaults:

  • no invites, no resource proof, … (standard setup for a local network; see config files at the end of the post)
  • the current MaidSafe safe_vault crate on the master branch (no forks of mine),
  • firewall completely disabled on the host
  • a docker bridge network (which exposes all ports to the containers connected to it).

Min section size is still 5 and results are (depending on the total number of vaults running in the local network):

  • first test: 5 nodes OK, 6 nodes NOK, 7 nodes OK, 8 nodes NOK
  • second test: 5 nodes OK, 6 nodes NOK, 7 nodes NOK, 8 nodes OK

5 nodes are always OK and 6 nodes are always NOK; above 6, the results vary.

I created a program that creates an account with a random seed. OK means the account was successfully created after a few seconds; NOK means the program seemed blocked and I stopped it after about 1 minute.

Here is the source code of the program:

extern crate maidsafe_utilities;
extern crate rand;
extern crate safe_authenticator;
#[macro_use]
extern crate unwrap;

use rand::{thread_rng, Rng};
use safe_authenticator::Authenticator;

fn main() {
    unwrap!(maidsafe_utilities::log::init(true));
    println!("\nTrying to create an account using a random seed...");
    let seed = generate_random_printable(32);
    let _ = unwrap!(Authenticator::create_acc_with_seed(seed.as_str(), || ()));
    println!("Success !");
}

fn generate_random_printable(len: usize) -> String {
    thread_rng().gen_ascii_chars().take(len).collect()
}

Note that I didn’t succeed in putting it in an independent crate referencing safe_client_libs; I had to add it as an example directly in safe_client_libs. I am not sure how a sub-crate of a multi-crate project should be referenced; I tried this Cargo.toml:

[package]
name = "create_account"
version = "0.1.0"

[dependencies]
rand = "~0.3.18"
maidsafe_utilities = "~0.16.0"
safe_authenticator = { git = "https://github.com/maidsafe/safe_client_libs" }

But I get two errors like this one:

error: cannot find macro `wait_for_response!` in this scope
   --> /root/.cargo/git/checkouts/safe_client_libs-e3345e9360262bab/ea8e3f7/safe_authenticator/src/client.rs:188:27
    |
188 |             .and_then(|_| wait_for_response!(routing_rx, Response::PutMData, msg_id))
    |                           ^^^^^^^^^^^^^^^^^

Is there a place where the way to use the authenticator from a Rust program is explained?

Appendix:

  • safe_vault.crust.config
{
  "hard_coded_contacts": [
    "172.19.0.2:5483",
    "172.19.0.3:5483",
    "172.19.0.4:5483",
    "172.19.0.5:5483",
    "172.19.0.6:5483"
  ],
  "whitelisted_node_ips": null,
  "whitelisted_client_ips": null,
  "tcp_acceptor_port": 5483,
  "force_acceptor_port_in_ext_ep": false,
  "service_discovery_port": null,
  "bootstrap_cache_name": null,
  "network_name": "local",
  "dev": {
    "disable_external_reachability_requirement": true
  }
}
  • safe_vault.routing.config
{
  "dev": {
    "allow_multiple_lan_nodes": true,
    "disable_client_rate_limiter": true,
    "disable_resource_proof": true,
    "min_section_size": 5
  }
}
  • safe_vault.vault.config
{
  "dev": {
    "disable_mutation_limit": true
  }
}

#4

I also reproduced the problems in a global network without using docker at all (but using my safe_vault fork).

I solved the problem of disable_external_reachability_requirement == false not working by setting force_acceptor_port_in_ext_ep to true, with this safe_vault.crust.config file (IP addresses masked):

{
  "hard_coded_contacts": [ "...:5483", "...:5483", "...:5483", "...:5483",
    "...:5483", "...:5483", "...:5483", "...:5483" ],
  "whitelisted_node_ips": null,
  "whitelisted_client_ips": null,
  "tcp_acceptor_port": 5483,
  "force_acceptor_port_in_ext_ep": true,
  "service_discovery_port": null,
  "bootstrap_cache_name": null,
  "network_name": "tfa",
  "dev": {
    "disable_external_reachability_requirement": false
  }
}

But I still get the problem of account creation working with 5 nodes but not working with 6 nodes.


#5

External reachability, if enabled (disabled == false), means that the proxy will try to connect back to the bootstrapping node, and only if that succeeds will it allow the bootstrap to succeed. It’s a way to ensure there are more nodes on the network that can be reached without hole-punching etc. (i.e., more nodes in the network that can be directly reached: public, port-forwarded, or …)


#6

Hey @nbaksalyar, can you see if you can reproduce this? Thanks!


#7

I have recorded a session with asciinema that demonstrates the problem on a local network, without Docker and with crates from MaidSafe exclusively.

asciicast


#8

Hi @tfa, we were able to reproduce this behaviour and we’re looking into it.

Thanks for the report!


#9

Hi @tfa,
Could you please try one thing: change the routing config file for the client apps/browser to be the same as on the vault side (i.e. change the dev section in <app name>.routing.config). This should do the trick, because clients use routing too, and min_section_size should be the same on both sides.
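For instance, assuming Peruse as the client app and the vault routing config shown earlier in the thread, the client-side file (e.g. peruse.routing.config, placed next to the binary) would carry the same dev section:

```json
{
  "dev": {
    "allow_multiple_lan_nodes": true,
    "disable_client_rate_limiter": true,
    "disable_resource_proof": true,
    "min_section_size": 5
  }
}
```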


#10

Probably worth updating the README.md in SCL to say that @nbaksalyar


#11

I was using "dev": null in the client routing config file, and copying the file from the vault corrects the problem. Thank you very much, and sorry for the disturbance.


#12

Cool use of asciinema!

Just some general things I observed:

  • In general (but not always) it’s probably more reliable to build from the latest tagged version, not from master (although in this case some of the master releases seemed to be the same as the tagged version), e.g. git checkout 0.17.2.
    I realise sometimes master (or some non-tagged commit) is needed though…

  • Using --release during the build is going to give an optimized binary and faster performance, rather than leaving it out and getting a debug build.


#13

I am sorry again, but I still have some problems.

Now, with the routing config file modified as you indicated, I can create accounts (both a manager account with gen_invites and regular accounts with invitations in Peruse), but I can’t get WHM working:

My configuration is the following:

  • min section size = 5
  • network has 8 nodes
  • Peruse version is 0.7.0
  • WHM version is 0.5.0
  • OS is Windows 10

I can authorize both Peruse and WHM but then I am stuck in this screen in WHM:

Then there is this final screen:

WHM log is:

T 18-12-16 20:54:32.292830 [<unknown> <unknown>:124] Attempting to log into an acc using client keys.
T 18-12-16 20:54:32.350830 [<unknown> <unknown>:537] Waiting to get connected to the Network...
D 18-12-16 20:54:33.439816 [routing::states::bootstrapping bootstrapping.rs:266] Bootstrapping(ef1dd8..) Received BootstrapConnect from d219e3...
D 18-12-16 20:54:33.439816 [routing::states::bootstrapping bootstrapping.rs:332] Bootstrapping(ef1dd8..) Sending BootstrapRequest to d219e3...
I 18-12-16 20:54:33.458815 [routing::states::bootstrapping bootstrapping.rs:316] Bootstrapping(ef1dd8..) Connection failed: The chosen proxy node already has connections to the maximum number of clients allowed per proxy.
D 18-12-16 20:54:33.458815 [routing::states::bootstrapping bootstrapping.rs:365] Bootstrapping(ef1dd8..) Dropping bootstrap node PublicId(name: d219e3..) and retrying.
I 18-12-16 20:54:33.459816 [routing::states::bootstrapping bootstrapping.rs:141] Bootstrapping(ef1dd8..) Lost connection to proxy PublicId(name: d219e3..).
D 18-12-16 20:54:34.560800 [routing::states::bootstrapping bootstrapping.rs:266] Bootstrapping(ef1dd8..) Received BootstrapConnect from 8147e5...
D 18-12-16 20:54:34.560800 [routing::states::bootstrapping bootstrapping.rs:332] Bootstrapping(ef1dd8..) Sending BootstrapRequest to 8147e5...
I 18-12-16 20:54:34.637801 [routing::states::bootstrapping bootstrapping.rs:316] Bootstrapping(ef1dd8..) Connection failed: The chosen proxy node already has connections to the maximum number of clients allowed per proxy.
D 18-12-16 20:54:34.636800 [crust::main::active_connection active_connection.rs:140] PublicId(name: ef1dd8..) - Failed to read from socket: ZeroByteRead
D 18-12-16 20:54:34.637801 [routing::states::bootstrapping bootstrapping.rs:365] Bootstrapping(ef1dd8..) Dropping bootstrap node PublicId(name: 8147e5..) and retrying.
I 18-12-16 20:54:34.637801 [routing::states::bootstrapping bootstrapping.rs:141] Bootstrapping(ef1dd8..) Lost connection to proxy PublicId(name: 8147e5..).
D 18-12-16 20:54:35.905782 [routing::states::bootstrapping bootstrapping.rs:266] Bootstrapping(ef1dd8..) Received BootstrapConnect from 56305b...
D 18-12-16 20:54:35.905782 [routing::states::bootstrapping bootstrapping.rs:332] Bootstrapping(ef1dd8..) Sending BootstrapRequest to 56305b...
D 18-12-16 20:54:36.023781 [crust::main::active_connection active_connection.rs:140] PublicId(name: ef1dd8..) - Failed to read from socket: ZeroByteRead
I 18-12-16 20:54:36.023781 [routing::states::bootstrapping bootstrapping.rs:316] Bootstrapping(ef1dd8..) Connection failed: The chosen proxy node already has connections to the maximum number of clients allowed per proxy.
D 18-12-16 20:54:36.023781 [routing::states::bootstrapping bootstrapping.rs:365] Bootstrapping(ef1dd8..) Dropping bootstrap node PublicId(name: 56305b..) and retrying.
I 18-12-16 20:54:36.024781 [routing::states::bootstrapping bootstrapping.rs:141] Bootstrapping(ef1dd8..) Lost connection to proxy PublicId(name: 56305b..).
D 18-12-16 20:54:37.289764 [routing::states::bootstrapping bootstrapping.rs:266] Bootstrapping(ef1dd8..) Received BootstrapConnect from 1211bf...
D 18-12-16 20:54:37.289764 [routing::states::bootstrapping bootstrapping.rs:332] Bootstrapping(ef1dd8..) Sending BootstrapRequest to 1211bf...
D 18-12-16 20:54:37.420763 [routing::states::client client.rs:91] Client(ef1dd8..) State changed to client.
T 18-12-16 20:54:37.420763 [<unknown> <unknown>:555] Connected to the Network.
T 18-12-16 20:54:37.423763 [<unknown> <unknown>:304] GetMDataValue for e5c028..
D 18-12-16 20:54:38.026309 [routing::states::client client.rs:209] Client(ef1dd8..) NotEnoughSignatures
D 18-12-16 20:54:38.274653 [routing::states::client client.rs:209] Client(ef1dd8..) NotEnoughSignatures
T 18-12-16 20:54:57.435422 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 1, timer_token: 4, expires_at: Some(Instant { t: 230712746786 }) }
T 18-12-16 20:55:17.435885 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 2, timer_token: 5, expires_at: Some(Instant { t: 230712746786 }) }
T 18-12-16 20:55:37.436635 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 3, timer_token: 6, expires_at: Some(Instant { t: 230712746786 }) }
T 18-12-16 20:55:57.438302 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 4, timer_token: 7, expires_at: Some(Instant { t: 230712746786 }) }
T 18-12-16 20:56:17.438839 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 5, timer_token: 8, expires_at: Some(Instant { t: 230712746786 }) }
T 18-12-16 20:56:37.453844 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 6, timer_token: 9, expires_at: Some(Instant { t: 230712746786 }) }
T 18-12-16 20:56:57.454602 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 7, timer_token: 10, expires_at: Some(Instant { t: 230712746786 }) }
T 18-12-16 20:57:17.461707 [routing::states::client client.rs:397] Client(ef1dd8..) Timed out waiting for Ack(791f..): UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 8, timer_token: 11, expires_at: Some(Instant { t: 230712746786 }) }
D 18-12-16 20:57:17.461707 [routing::states::client client.rs:406] Client(ef1dd8..) Message unable to be acknowledged - giving up. UnacknowledgedMessage { routing_msg: RoutingMessage { src: Client { client_name: ef1dd8.., proxy_node_name: 1211bf.. }, dst: NaeManager(name: e5c028..), content: UserMessagePart { 1/1, priority: 3, cacheable: false, fcb3a8.. } }, route: 8, timer_token: 11, expires_at: Some(Instant { t: 230712746786 }) }
D 18-12-16 20:57:37.425656 [<unknown> <unknown>:35] **ERRNO: -17** CoreError(Request has timed out - CoreError::RequestTimeout)


#14

This could be caused by the config files not being picked up for some reason. I’ll try to reproduce the same steps and will get back to you with the results soon.

And thanks for confirming that the routing config change worked for you!


#15

I have the same problem with min_section_size=8 and a network of 8 nodes.

But if I put back "dev": null in the Peruse config file, then WHM works again.

So I seem to be in an inextricable situation, because I needed to set "dev" to the same value as in the vault config file to correct the earlier problem of accounts that couldn’t be created in a network with 6 vaults and min_section_size = 5.

I want to check whether I get the same problem with 9 vaults and min_section_size = 8, but I haven’t succeeded in getting the ninth vault to start. I will investigate this later.


#16

This is interesting; I was going to chime in on this thread, and it seems I didn’t. I’m having some of the same issues, and I have considered scaling the network up to 16-20 nodes to see if the issues continue.
On a related note, re: “config files not being picked up”:
Which file does the browser officially use? When I download the latest release (yesterday, at least) it looks like:

/peruse
/peruse.crust.config
/resources/Peruse.crust.config

(I don’t recall which seemed to work; I generally add the files in both folders.)
I update them to look like:
/peruse
/peruse.crust.config
/peruse.routing.config
/resources/Peruse.crust.config
/resources/Peruse.routing.config

I assume one of these is correct, as the vault routing files follow the same format (in particular, appname.crust.config and appname.routing.config), but since the release doesn’t appear to include a default routing file, is there a chance that it is looking for another name?


#17

So, in the end, this wasn’t a good idea. The only configuration that works for me:

  • leave "dev": null in routing config files for clients (gen_invites and Peruse)
  • use "min_section_size": 8 for the network

Under these conditions I can create accounts and use WHM with a network having strictly more nodes than min_section_size (though I tested only up to 10 nodes, because VPSes begin to be costly to pass the resource proof).

To be precise:

  • 6 nodes with min section size 5 don’t work
  • 5 nodes with min section size 5 work
  • 8, 9 or 10 nodes with min section size 8 work.

Edit:
I also tested 8 nodes with min section size 7, and they don’t work either (same symptoms: cannot create an account, or WHM not working, depending on the content of the routing config file). This time I did the test with the newly released deliveries (SAFE Browser v0.11.0 and WHM v0.5.1).

The only workable configuration seems to be min section size 8. The problem with that value is that the resource proof becomes too expensive. There is a big threshold between min section size 7 and 8:

  • a 5 €/month VPS is enough to pass it with min section size 7
  • a 40 €/month VPS is needed to pass it with min section size 8

As I didn’t succeed in doing it myself, my question is: is there a way to get "min_section_size": 7 working?


#18

If this works, it’d actually point to the config files getting mucked up, likely as @nbaksalyar mentioned. For this permutation, you could essentially leave min_section_size out for the vaults as well.

The main issue here is that when the vaults’ and the clients’ min_section_size do not match, message accumulation goes for a toss, hence the timeouts/NotEnoughSignatures and so on: the vaults would consider a message accumulated while the clients do not see it the same way. Network startup conditions would also interfere here with a minimal number of nodes, giving you strange output specific to the number of nodes in the network (relative to the accumulation thresholds).

Leaving the config file override as null/not-present or setting min_section_size: 8 explicitly amounts to the same thing, as 8 is the default min_section_size when no override is provided (routing::MIN_SECTION_SIZE). If you altered that default and recompiled both clients and vaults, you’d not be relying on a config override at all, but I’d imagine that would be a pretty cumbersome and annoying process; I’d rather find out why the config files aren’t being detected.
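The mismatch can be illustrated with a minimal stdlib-only sketch (not the real config_file_handler or routing code; a naive string scan stands in for proper JSON parsing, and the default of 8 mirrors the routing::MIN_SECTION_SIZE behaviour described above):

```rust
// Default used when no config override provides min_section_size
// (i.e. "dev": null or the file is absent).
const DEFAULT_MIN_SECTION_SIZE: u64 = 8;

// Naively extract min_section_size from a routing config override,
// falling back to the default when no override is present.
fn effective_min_section_size(config: &str) -> u64 {
    config
        .find("\"min_section_size\"")
        .and_then(|i| {
            let rest = &config[i..];
            let colon = rest.find(':')?;
            let digits: String = rest[colon + 1..]
                .trim_start()
                .chars()
                .take_while(|c| c.is_ascii_digit())
                .collect();
            digits.parse().ok()
        })
        .unwrap_or(DEFAULT_MIN_SECTION_SIZE)
}

fn main() {
    // Vault side: explicit override, as in the config shown in this thread.
    let vault_cfg = r#"{ "dev": { "min_section_size": 5 } }"#;
    // Client side: "dev": null, so the compiled-in default of 8 applies.
    let client_cfg = r#"{ "dev": null }"#;
    let v = effective_min_section_size(vault_cfg);
    let c = effective_min_section_size(client_cfg);
    println!("vault: {}, client: {}", v, c);
    // v == 5 while c == 8: exactly the mismatch that breaks accumulation.
    assert_ne!(v, c);
}
```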

Some of these Electron apps package two sets of config files: one for the main app binary and one for the Electron Helper (AppName Helper.*.config) too. I’d assume that’s where, either in the browser (authentication related) or in WHM itself, a config override isn’t getting applied as expected by the config_file_handler crate when it looks for a config override. On mobile I recall another set_additional_search_paths API was also used to provide an extra lookup path for config/log configs; I’m assuming those aren’t used on desktop (@Krishna / @bochaco can hopefully confirm whether we do anything beyond the default lookups).

If you run find . -type f -iname "*config" in the browser package folder after an initial run, it should show you all the config files in the package (the browser, for example, should show the Helper binary ones too).

Can you maybe provide the binary packages you’re using for both WHM and the browser? We can then try them locally to see whether the overrides are getting applied, because with these awkward binary names and multiple occurrences of config files, I’m assuming something is amiss somewhere.


#19

Don’t worry about passing binaries. I could reproduce it with the latest release binaries themselves: Browser 0.11.0 and WHM 0.5.1.

The routing config override I used:

{
  "dev": {
    "allow_multiple_lan_nodes": true,
    "disable_client_rate_limiter": true,
    "disable_resource_proof": true,
    "min_section_size": 5
  }
}

min_section_size is what we’re interested in; the rest don’t really impact clients, they were just needed to keep the config schema intact, of course.

So I had to provide such a config to both the browser and WHM.

Browser (relative path from .app pkg): ./SAFE Browser.app/Contents/Frameworks/SAFE Browser Helper.app/Contents/MacOS/SAFE Browser Helper.routing.config

WHM (relative path from .app pkg): ./web-hosting-manager.app/Contents/Frameworks/web-hosting-manager Helper.app/Contents/MacOS/web-hosting-manager Helper.routing.config

Note that the paths might not match exactly on other platforms, and especially with these Electron “helper” binaries the names of the config files become a nightmare to monitor and maintain.

One approach I generally take is to strip the package/folder of any config files (moving them aside to preserve the originals), give the apps a dry run, and scan with a find command (find . -type f -iname "*config") to see where and under what names the default empty config files get created, then just modify them accordingly. Even this isn’t a great approach: with the browser you’d see it create an empty crust and an empty routing config, but for other apps such as WHM only the crust config gets created by default, since they use a BootstrapConfig provided by the auth via a different constructor; while those can be overridden, such apps don’t seem to create an empty routing config. Still, just from the default crust config I could deduce the expected config file name and path, and created the routing file myself.

That was it: providing the expected config override then sorts out the invalid accumulation issue, as clients use the same expected grp_size as the vaults.

Side notes from just testing this right now:

  • We really need a better config management process. Mucking about with the filenames and scanning for the presence of these files is a pain, made worse when needing to edit/work with local networks. It gets more convoluted with these bundled, signed packages and other lookup/override methods.
  • The client grp_size expectation could just be provided via the bootstrap response and removed from the overrides altogether. Clients can then choose to have their own limits for sigs and either pick a different proxy or not accordingly, as per previous discussions. Just noting this, as it’s something that wasn’t implemented at the A2 stage.
  • Release packages need better resource bundling. Currently the browser package has around 4 crust config files with various names in multiple locations, thereby needing other means to identify the valid config file. We should really be pruning dev config files/… before release packaging.
  • The various libraries should probably log config overrides to indicate when an override is successfully applied. This doesn’t seem to be part of the standard lib logs at the selected thresholds of the log.toml provided.
  • QA check scope needs to include dev config options too, as non-happy-path testing seems a bit off. I’m assuming such dev options aren’t tested as rigorously as mock/prod setups.

cc @StephenC @Krishna @ustulation ^^


#20

I raised the following issue for this, although on Linux I only see two crust.config files in the browser package and one of them could be removed: https://github.com/maidsafe/safe_browser/issues/497

I also gave running a local network a try (currently also with min_section_size 5 and 6 nodes running on the same PC), using the latest from master of the safe_vault repo. It seems to work fine, although some operations take several seconds or even a few minutes to complete, e.g.:

  • when launching each of the nodes, it takes a few minutes before I see the log message that a new node has joined the network; this happened with every node I launched so far.
  • when the browser is launched, it also takes a few minutes until it’s able to connect to the local network
  • creating the account is also stuck for a couple of minutes until it is finally created
  • I was also able to get WHM v0.5.1 to connect to the local network using the same routing.config, but it too was stuck for a moment after receiving the auth response before it finally connected to the network.
  • I was able to create a public ID and upload a website, but it wasn’t as fast as I expected either; it was taking several seconds for some of the files it was uploading.
  • having a webapp connect to the network also took a couple of minutes after it received the auth response

Does anyone know why I’m experiencing these delays?
Are you also seeing these delays when interacting with the local network?
I’m not seeing my CPU or memory being stressed at all when these delays are happening.