What can I do to help?

Do we have a bug where, when we have only a few nodes (<7), we get this error message consistently

Error: 
   0: ClientError: Problem finding sufficient elders. A supermajority of responses is unobtainable. 4 were known in this section, 7 needed. Section pk: PublicKey(0033..599d)
   1: Problem finding sufficient elders. A supermajority of responses is unobtainable. 4 were known in this section, 7 needed. Section pk: PublicKey(0033..599d)

despite the fact that the REAL number of nodes is probably between 4 and 7?

@Neik has three running, I have one (with more to come once I get the AWS security groups right, plus whatever I can manage later with podman locally), and you @folaht have at least one. Then there are the attempts by @JPL and @Josh.


Can you see how many individual addresses you have in your logs? I found 7, although two are only mentioned a couple of times each.
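
A rough way to count them, assuming the addresses show up as ip:port strings in the logs (the log path here is a guess - adjust it to wherever your nodes write theirs):

# List the distinct peer addresses (ip:port) seen in the node logs;
# pipe the output to `wc -l` if you only want the count.
grep -ohE '[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' ~/.safe/node/node_dir/sn_node.log* | sort -u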

I can only see four.
83.163.103.119:12000 through 83.163.103.119:12003, plus my own static IP 92.blah25:12000, out of logs sn_node.log.2022-01-05-16 through to -21.


Guess that’s the problem then. Nodes not being promoted to elders - or people dropping out


How many did we get?
I joined at age 68.
IIRC the genesis node is age 256, the next node is 96(?), and each subsequent node is 2 less. So if I was age 68, that's (96-68)/2 = 14, and by this possibly flawed logic I was the 14th node, so we should have had plenty of Elders.
That assumes the second node is always age 96, but I'm sure it's something like that. Wish I'd stored the old logs outside of ~/.safe though… or I could check this theory.
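
As a quick sanity check of that arithmetic (the age-96 second node and the 2-per-node drop are this post's assumptions, not confirmed values):

# Back-of-the-envelope node-index estimate from join age.
SECOND_NODE_AGE=96   # assumed starting age for the second node
MY_AGE=68            # from the log below: "Our AGE: 68"
echo "Roughly node number $(( (SECOND_NODE_AGE - MY_AGE) / 2 ))"   # (96 - 68) / 2 = 14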

I'm regretting rm -rf ~/.safe this afternoon.

➤ join {network_genesis_key=PublicKey(00ee..460f) target_section_key=PublicKey(00ee..460f) recipients=[Peer { name: ac3964(10101100).., addr: 83.163.103.119:12000, connection: Some(Connection { id: 93825010610288, remote_address: 83.163.103.119:12000, .. }) }]}
	 ➤ Aggregating received ApprovalShare from Peer { name: f7df98(11110111).., addr: 83.163.103.119:12002, connection: Some(Connection { id: 93825011281840, remote_address: 83.163.103.119:12002, .. }) }
 INFO 2022-01-05T16:21:52.528932Z [sn/src/routing/routing_api/mod.rs:L196]:
	 ➤ 47f5ce.. Joined the network!
 INFO 2022-01-05T16:21:52.528947Z [sn/src/routing/routing_api/mod.rs:L197]:
	 ➤ Our AGE: 68
 INFO 2022-01-05T16:21:52.528963Z [sn/src/routing/routing_api/dispatcher.rs:L87]:
	 ➤ Starting to probe network
 INFO 2022-01-05T16:21:52.528968Z [sn/src/routing/routing_api/dispatcher.rs:L115]:
	 ➤ Writing our PrefixMap to disk
 INFO 2022-01-05T16:21:52.528971Z [sn/src/routing/core/mod.rs:L212]:
	 ➤ Writing our latest PrefixMap to disk
 INFO 2022-01-05T16:21:52.529468Z [sn/src/node/node_api/mod.rs:L87]:
	 ➤ Node PID: 420446, prefix: Prefix(), name: 47f5ce(01000111).., age: 68, connection info: "92blah5:12000"
 DEBUG 2022-01-05T16:21:52.694292Z [sn/src/routing/core/msg_handling/mod.rs:L554]:
	 ➤ Relocation: Ignoring unexpected join response message: Approval { genesis_key: PublicKey(00ee..460f), section_auth: SectionAuth { value: SectionAuthorityProvider { prefix: Prefix(), public_key_set: PublicKeySet { public_key: PublicKey(0033..599d), threshold: 2 }, elders: {55257a(01010101)..: 83.163.103.119:12000, 8d1afb(10001101)..: 83.163.103.119:12001, 8f7393(10001111)..: 83.163.103.119:12003, f7df98(11110111)..: 83.163.103.119:12002} }, sig: KeyedSig { public_key: PublicKey(0033..599d), signature: Signature(0955..81ab) } }, node_state: SectionAuth { value: NodeState { name: 47f5ce(01000111).., addr: 92.blah5:12000, state: Joined, previous_name: None }, sig: KeyedSig { public_key: PublicKey(0033..599d), signature: Signature(018c..9a17) } }, section_chain: PublicKey(00ee..460f)->PublicKey(016f..ae0f)->PublicKey(0033..599d) }
 DEBUG 2022-01-05T16:21:52.787932Z [sn/src/routing/core/msg_handling/mod.rs:L554]:

I only got 19.1 MB of chunks.
I think we need to start with a critical mass of 7+ nodes for future testnets.
All credit to @folaht though, I am looking harder at podman now.


For Folaht…

This is not quite working yet, but I think it is heading in the right direction.

#!/bin/bash

usage()
{
  echo "Usage: $0 [-a address] [-p port] [-n network_name] [-d cli_dir] [-s storage_dir]"
  exit 1
}

LOC_IP='10.0.0.5'
LOC_PORT_BASE=15000
NUM_NODES=10
LOC_PORT_MAX=$(( $LOC_PORT_BASE + $NUM_NODES))
PUB_IP='52.215.175.60'
PUB_PORT=15000
LAN_IP='192.168.100.100'
LAN_PORT_BASE=19000
LAN_PORT_MAX=$(( $LAN_PORT_BASE + $NUM_NODES))
NETWORK_NAME=troon123

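# Debug-only pass: computes each node's port mapping; the echo lines below are commented out, so this loop currently has no effect.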
node_count=1
while [ $node_count -le $NUM_NODES ]
do 
  LOC_PORT=$(( $LOC_PORT_BASE + $node_count))
  LAN_PORT=$(( $LAN_PORT_BASE + $node_count))
  #echo  $LAN_IP:$LAN_PORT:$LOC_PORT/udp
  #echo "--log-dir /root/.safe/node/node_dir_"$node_count 

 
  ((node_count++))
done  
#echo  $LAN_IP:$LAN_PORT_BASE-$LAN_PORT_MAX:$LOC_PORT_BASE-$LOC_PORT_MAX/udp
#echo "--publish "$LAN_IP:$LAN_PORT_BASE-$LAN_PORT_MAX:$LOC_PORT_BASE-$LOC_PORT_MAX"/udp"



CLI_DIR=~/.local/share/safe/cli/

while getopts 'a:p:n:d:s:h' c   # every option that takes an argument needs a trailing ':'
do
  case $c in
    a) LOC_IP=$OPTARG ;;
    p) LOC_PORT=$OPTARG ;;
    n) NETWORK_NAME=$OPTARG ;;
    d) CLI_DIR=$OPTARG ;;
    s) STORAGE_DIR=$OPTARG ;;
    h|?) usage ;;
  esac
done

if [ ! -d "$CLI_DIR" ]
then
  mkdir -p "$CLI_DIR"
  podman unshare chown -R root:root ~/.local/share/safe/cli
fi

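# Start the container that hosts the network; sn_node instances are exec'd into it in the loop below.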
sudo podman run \
--name $NETWORK_NAME \
--restart unless-stopped \
--publish $LAN_IP:$LAN_PORT_BASE-$LAN_PORT_MAX:$LOC_PORT_BASE-$LOC_PORT_MAX/udp \
--env LOC_IP=$LOC_IP \
--env LOC_PORT=$LOC_PORT_BASE \
--env PUB_IP=$PUB_IP \
--env PUB_PORT=$PUB_PORT \
--mount type=bind,source=/home/$USER/.local/share/safe/cli/,destination=/root/.safe/cli/ \
--ip $LOC_IP \
-d ghcr.io/safenetwork-community/rf-rootnode-ipv4:main
# echo "we grabbed  ghcr.io/safenetwork-community/rf-rootnode-ipv4:main"
sudo podman exec $NETWORK_NAME safe networks add sjefolaht
sudo podman exec $NETWORK_NAME safe networks switch sjefolaht


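# Launch NUM_NODES sn_node instances inside the running container, each with its own port and its own log/root directories.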
node_count=1
while [ $node_count -le $NUM_NODES ]
do 
  echo $node_count
  sleep 1
  LOC_PORT=$(( $LOC_PORT_BASE + $node_count))
  LAN_PORT=$(( $LAN_PORT_BASE + $node_count))
  #echo  $LAN_IP:$LAN_PORT:$LOC_PORT/udp
  sudo podman exec -d $NETWORK_NAME sn_node \
  --idle-timeout-msec 5500 \
  --keep-alive-interval-msec 4000 \
  --skip-auto-port-forwarding \
  --local-addr $LAN_IP:$LAN_PORT \
  --public-addr $PUB_IP:$LAN_PORT \
  --log-dir /root/.safe/node/node_dir_$node_count \
  --root-dir /root/.safe/node/node_dir_$node_count
 
  ((node_count++))
done  


exit

Here is the error I get when running this script as above…

willie@gagarin:~/projects/maidsafe/ELK$ ./sjefolaht.sh 
ERRO[0000] error loading cached network config: network "podman" not found in CNI cache 
WARN[0000] falling back to loading from existing plugins on disk 
Error: error configuring network namespace for container 093f7468bbba0e4df2f3fee7e7a6154a35b93cbda5955ef2ffa01a7bff4bde68: error adding pod troon123_troon123 to CNI network "podman": failed to allocate all requested IPs: 10.0.0.5
Error: can only create exec sessions on running containers: container state improper
Error: can only create exec sessions on running containers: container state improper
1
Error: can only create exec sessions on running containers: container state improper
2
Error: can only create exec sessions on running containers: container state improper
3
Error: can only create exec sessions on running containers: container state improper
4
Error: can only create exec sessions on running containers: container state improper
5
Error: can only create exec sessions on running containers: container state improper
6
Error: can only create exec sessions on running containers: container state improper
7
Error: can only create exec sessions on running containers: container state improper
8
Error: can only create exec sessions on running containers: container state improper
9
Error: can only create exec sessions on running containers: container state improper
10
Error: can only create exec sessions on running containers: container state improper
willie@gagarin:~/projects/maidsafe/ELK$

As you can see, I have a bit to learn about podman syntax, so I am hoping between us we can crack this.


I believe LOC_PORT, or CON_PORT as I call it now, should be 10.88.x.x.
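
If it helps, you can check which subnet the default podman network allocates from - the --ip handed to podman run has to fall inside it (10.88.0.0/16 is the usual default, which would explain why 10.0.0.5 could not be allocated):

# Show the default "podman" network's configured subnet.
sudo podman network inspect podman | grep -i subnet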

I’ll update my shell script with your loop improvements tonight.


At least one of the errors you’re getting was due to a bug in the image Dockerfile after the name changes,
so you’ll need to pull the image again.
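
Something like this should pick up the rebuilt image (same tag as in the script above):

# Re-pull the fixed image referenced in sjefolaht.sh.
sudo podman pull ghcr.io/safenetwork-community/rf-rootnode-ipv4:main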