Trying to build a global network


#21

Not really, maybe you’re running debug versions?

not something I observed as a common occurrence locally. I have seen spikes for activity but not consistent delays like what you mention.


#22

I’m indeed running debug versions, I’ll try with building for release tomorrow and see what happens, thanks @Viv


#23

I’m seeing the same delays with the release builds, no difference really, at least with launching vaults.

When launching each vault I see the first two minutes pass after the following log entry before anything else is logged:

T 18-12-27 10:50:34.579878 send method=Get, url="http://192.168.1.254/RootDevice.xml", client=Client { redirect_policy: FollowAll, read_timeout: None, write_timeout: None, proxy: None }
vault log of first two minutes
Running safe_vault v0.18.0
==========================
T 18-12-27 10:50:34.573006 Network name: Some("bochaco")
T 18-12-27 10:50:34.579878 send method=Get, url="http://192.168.1.254/RootDevice.xml", client=Client { redirect_policy: FollowAll, read_timeout: None, write_timeout: None, proxy: None }
T 18-12-27 10:50:34.579962 host="192.168.1.254"
T 18-12-27 10:50:34.580001 port=80
D 18-12-27 10:50:34.580063 http scheme
T 18-12-27 10:52:45.737913 registering with poller
T 18-12-27 10:52:45.738008 registering with poller
T 18-12-27 10:52:45.738138 Event loop started
T 18-12-27 10:52:45.738227 wakeup thread: sleep_until_tick=18446744073709551615; now_tick=0
T 18-12-27 10:52:45.738294 Entered state ConfigRefresher
T 18-12-27 10:52:45.738310 sleeping; tick_ms=100; now_tick=0; blocking sleep
T 18-12-27 10:52:45.738385 setting timeout; delay=30.000512674s; tick=300; current-tick=0
T 18-12-27 10:52:45.738442 advancing the wakeup time; target=300; curr=18446744073709551615
T 18-12-27 10:52:45.738466 unparking wakeup thread
T 18-12-27 10:52:45.738515 inserted timeout; slot=44; token=Token(0)
T 18-12-27 10:52:45.738522 wakeup thread: sleep_until_tick=300; now_tick=0
T 18-12-27 10:52:45.738571 sleeping; tick_ms=100; now_tick=0; sleep_until_tick=300; duration=30000
T 18-12-27 10:52:45.738885 registering with poller
D 18-12-27 10:52:45.739693 Node(c7a3a9..()) State changed to node.

And I can see that when launching another vault the same type of delay and log is seen more than once before the vault finally joins the network:

another vault when trying to join also shows 2 mins delays in between some log entries
...
T 18-12-27 10:55:28.581990 registering with poller
T 18-12-27 10:55:28.582023 unlinking timeout; slot=210; token=Token(2)
T 18-12-27 10:55:28.582047 setting timeout; delay=21.052219323s; tick=211; current-tick=11
T 18-12-27 10:55:28.582063 inserted timeout; slot=211; token=Token(2)
T 18-12-27 10:55:28.582155 Graceful event loop exit.
T 18-12-27 10:55:28.582475 Network name: Some("bochaco")
T 18-12-27 10:55:29.587475 send method=Get, url="http://192.168.1.254/RootDevice.xml", client=Client { redirect_policy: FollowAll, read_timeout: None, write_timeout: None, proxy: None }
T 18-12-27 10:55:29.587550 host="192.168.1.254"
T 18-12-27 10:55:29.587591 port=80
D 18-12-27 10:55:29.587622 http scheme
T 18-12-27 10:57:40.649921 registering with poller
T 18-12-27 10:57:40.650024 registering with poller
T 18-12-27 10:57:40.650283 Event loop started
T 18-12-27 10:57:40.650285 wakeup thread: sleep_until_tick=18446744073709551615; now_tick=0
T 18-12-27 10:57:40.650423 sleeping; tick_ms=100; now_tick=0; blocking sleep
T 18-12-27 10:57:40.650590 Entered state ConfigRefresher
T 18-12-27 10:57:40.650779 setting timeout; delay=30.000875289s; tick=300; current-tick=0
... (more log entries here)
T 18-12-27 10:57:41.663378 setting timeout; delay=121.013483612s; tick=1210; current-tick=11
T 18-12-27 10:57:41.663397 inserted timeout; slot=186; token=Token(1)
T 18-12-27 10:57:41.664149 Node(55b35d..()) Got routing message RoutingMessage { src: Section(name: 55b35d..), dst: Client { client_name: 55b35d.., proxy_node_name: c7a3a9.. }, content: NodeApproval { {VersionedPrefix { prefix: Prefix(), version: 0 }: {PublicId(name: 55b35d..), PublicId(name: c7a3a9..)}} } }.
I 18-12-27 10:57:41.664284 Node(55b35d..()) Resource proof challenges completed. This node has been approved to join the network!
T 18-12-27 10:57:41.664338 Node(55b35d..()) Node approval completed. Prefixes: {Prefix()}
T 18-12-27 10:57:41.664207 registering with poller
T 18-12-27 10:57:41.664440 unlinking timeout; slot=210; token=Token(2)
T 18-12-27 10:57:41.664460 setting timeout; delay=21.014565346s; tick=210; current-tick=11
T 18-12-27 10:57:41.664480 inserted timeout; slot=210; token=Token(2)
T 18-12-27 10:57:41.664520 unlinking timeout; slot=186; token=Token(1)
T 18-12-27 10:57:41.664535 setting timeout; delay=121.014640592s; tick=1210; current-tick=11
T 18-12-27 10:57:41.664555 inserted timeout; slot=186; token=Token(1)
I 18-12-27 10:57:41.664735  -------------------------------------------- 
I 18-12-27 10:57:41.664757 | Node(55b35d..()) - Routing Table size:   1 |
I 18-12-27 10:57:41.664780 | Exact network size: 2                      |
I 18-12-27 10:57:41.664802  -------------------------------------------- 
T 18-12-27 10:57:41.664826 Node(55b35d..()) Scheduling a SectionUpdate for 30 seconds from now.
T 18-12-27 10:57:43.656207 wakeup thread: sleep_until_tick=30; now_tick=30
T 18-12-27 10:57:43.656303 setting readiness from wakeup thread
T 18-12-27 10:57:43.656373 wakeup thread: sleep_until_tick=18446744073709551615; now_tick=30
T 18-12-27 10:57:43.656420 tick_to; target_tick=30; current_tick=11
T 18-12-27 10:57:43.656458 sleeping; tick_ms=100; now_tick=30; blocking sleep
T 18-12-27 10:57:43.656488 ticking; curr=Token(18446744073709551615)

Another thing I’m seeing in the 5th vault I launched is that it was retrying to join at least more than once, I see the following logged after ~15mins that I launched it and still cannot join:

...
T 18-12-27 11:28:54.258668 ticking; curr=Token(18446744073709551615)
T 18-12-27 11:28:54.258684 unsetting readiness
T 18-12-27 11:28:54.258700 advancing the wakeup time; target=3610; curr=3901
T 18-12-27 11:28:54.258720 unparking wakeup thread
T 18-12-27 11:28:54.258746 wakeup thread: sleep_until_tick=3610; now_tick=3601
T 18-12-27 11:28:54.258782 sleeping; tick_ms=100; now_tick=3601; sleep_until_tick=3610; duration=900
I 18-12-27 11:28:55.128755 JoiningNode(4edb33..()) Failed to get relocated name from the network, so restarting.
D 18-12-27 11:28:55.128817 State::JoiningNode(4edb33..()) Terminating state machine
W 18-12-27 11:28:55.128844 Restarting Vault
T 18-12-27 11:28:55.131012 Graceful event loop exit.
T 18-12-27 11:28:55.131675 Network name: Some("bochaco")
T 18-12-27 11:28:55.134810 send method=Get, url="http://192.168.1.254/RootDevice.xml", client=Client { redirect_policy: FollowAll, read_timeout: None, write_timeout: None, proxy: None }
T 18-12-27 11:28:55.134849 host="192.168.1.254"
T 18-12-27 11:28:55.134859 port=80
D 18-12-27 11:28:55.134875 http scheme
T 18-12-27 11:31:05.641679 registering with poller
T 18-12-27 11:31:05.641747 registering with poller
T 18-12-27 11:31:05.641861 Event loop started
...

I’m not sure why that last one cannot join but the jumps in the timetamps of the logs seem to all be due to waiting a response to that even trying to get "http://192.168.1.254/RootDevice.xml" ?

I’ll share the logs of this run with the vault team.


#24

Ok, thanks a lot @Viv, just to confirm here that the problem I was having when experiencing those delays are definitely related to the LAN I was using with my home WiFi (something going on when I use that router), as soon as I switched to my cellphone hotspot LAN the problem was completely gone and I’m able to run 5 vaults and also 6 vaults, which join the network quickly, and everything else is also quick, i.e. creating an account, uploading a couple of webapps with the WHM, and using those webapps from the browser generating some data as well.

This is what I’m using for the vaults, browser and WHM as routing config:

{
  "dev": {
    "allow_multiple_lan_nodes": true,
    "disable_client_rate_limiter": true,
    "disable_resource_proof": true,
    "min_section_size": 5
  }
}

I will try to also run a network with more vaults as well as with min_section_size 8 as it was mentioned above to see if I encounter any issues.

EDIT: I can confirm that it works for me also with min_section_size 8, with 8, 9, and even 10 vaults, this is the crust config I’m using:

{
  "hard_coded_contacts": [],
  "tcp_acceptor_port": null,
  "force_acceptor_port_in_ext_ep": false,
  "service_discovery_port": null,
  "bootstrap_cache_name": null,
  "whitelisted_node_ips": null,
  "whitelisted_client_ips": null,
  "network_name": "bochaco",
  "dev": null
}