As do I. I feared that by mentioning the nitpick, I would overshadow how much I agreed with @mav and the good work he did.
And if I can mention, so it's not forgotten in all this: people adding nodes will want, and expect, their nodes to become a useful part of the network within a reasonable amount of time. You know, farming etc.
So there has to be consideration for security foremost, but also for the ability of new nodes to progress to the point where they are farming. Really a "keep the node supplier happy" consideration (assuming their node meets min specs).
Even though the network may not need new nodes, we should perhaps distinguish between "not needed" and "do not add more".
So there is a point where no more nodes are needed, and then a point where we cannot securely have any more nodes. This gives a range: after reaching the point where we do not need new nodes, we are still adding new nodes (more slowly the more are added) until we cannot securely add any more.
Thus the network is adding farmers (nodes) even though they are not strictly needed.
Yes, don’t get me wrong, the network will not stop peers being added per se. What it does (or should do) is limit the number of infants, to really slowly let adults join. So a sudden mass-join effort by bad actors will be thwarted; slowly accepting nodes means the network accepts them over a period and not all at once. So a bad actor cannot wait for a quiet time and flood the network, but gets peers added slowly. The longer the duration, the more likely it is that other peers are being added by either good folk or other bad actors (which is OK). So there is a balance, but the limit is really to prevent a mass join all at one time by a single entity.
@bart, I wanted to play with your simulator, but the results are not correct: there is always only one section. Is there anything special to do to make it work, or is your repository currently in a bad state?
@mav, you have entirely rewritten the test in Go, that’s very impressive.
I had originally developed the simulation for a few different purposes before the ageing changes came out. I have some big plans for the simulation codebase. Hopefully those results will come to fruition early next year. So far ageing has taken a good chunk of time away from the original intentions (but in the best possible way).
@tfa, Yes - that’s because it simulates an older set of rules that would cause such behaviour. We have changed the rules since then, but there wasn’t time to update the simulator. I’ll probably get to that at some point, but for now we have some tasks with higher priority. Sorry it’s not as useful as it could be!
It’s in the Block States > Dead(PeerId) > Trigger for voting:
“we take the Hash of the event H. Then if H % 2^age == 0 for any peer (sorted by age ascending) in our section, we relocate this node to the neighbour that has the lowest number of peers.”
The quoted text ("sorted by age ascending") indicates younger nodes are preferred for relocation. But rfc0045 says to prefer older nodes, so there is a conflict.
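A minimal sketch of how that trigger reads to me, assuming a SHA-256 event hash; the function names are mine, not the simulator's:

```python
import hashlib

def should_relocate(event_bytes: bytes, peer_age: int) -> bool:
    """Test the ageing trigger H % 2^age == 0, i.e. the event hash
    has at least `peer_age` trailing zero bits."""
    h = int.from_bytes(hashlib.sha256(event_bytes).digest(), "big")
    return h % (2 ** peer_age) == 0

def pick_peer_to_relocate(event_bytes, peers):
    # peers: list of (peer_id, age). Sorted by age ascending, so the
    # youngest peer that satisfies the trigger is chosen first --
    # which is exactly the conflict with rfc0045 noted above.
    for peer_id, age in sorted(peers, key=lambda p: p[1]):
        if should_relocate(event_bytes, age):
            return peer_id
    return None
```

Note that younger peers satisfy the trigger more often (a smaller `2^age` divides more hashes), which compounds the age-ascending ordering.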
I think I have found the problems in the algorithm:
Sections with a shorter prefix should be chosen in priority as the target of relocations, even if they have more nodes. This is the main problem, and it is only one line to modify.
There is a mysterious hard-coded rule that suppresses node ageing when the event concerns a non-adult node in a section with a prefix length > 4. It doesn't do any good.
With these modifications the simulation is beginning to fly (75 sections created). It is not yet perfect, because there are still a lot of young node rejections (70%), but the situation is unblocked.
I also think that relocations implementing a change of owning section during a split or a merge shouldn't trigger other relocations in cascade, to avoid destabilizing newly created sections. This means that the Gone and Live events generated by split or merge operations shouldn't count for node ageing. The result is almost the same: 71 sections, but there are fewer merge events (280 instead of 377), which demonstrates a more stable network. This one is debatable.
I have just uploaded my modifications to my fork, but I haven't analyzed all the consequences, in case you want to study them.
Thanks for the clarification. Instinctively, I would think that you would want to relocate an elder rather than a young node to an unhealthy neighbouring section. The reason is that if a neighbour is already in trouble, you don't want to send an untrusted adversary its way. This seems like the preferred option as long as the section the chosen elder is leaving has enough elders to remain healthy. I may have my thinking backward on this, but this is the sentiment that first comes to mind…
I didn't explain why this is needed: what we currently observe during the simulations are successive splits that create a completely unbalanced network, for example () => 0 & 1, 0 => 00 & 01, 01 => 010 & 011, 010 => 0100 & 0101, 0101 => 01010 & 01011. Here there are 6 sections: 1, 00, 011, 0100, 01010 and 01011, with distinct prefix lengths instead of similar ones. At some point the section with the shortest prefix needs a merge, which triggers a generalized merge of all the sections and recreates the section with the zero-length prefix ().
And then the cycle begins again. This creates an oscillating phenomenon between a single section and a few of them.
By selecting the section with the shortest prefix as the relocation target, we ensure that a section with a long prefix won't be split, and so an unbalanced network is not created.
Of course, if there are several sections with the shortest prefix, then the smallest section must be selected among them.
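The selection rule from the last two paragraphs could be sketched like this (hypothetical `choose_target_section`, taking a map of prefix string to node count; not the simulator's actual code):

```python
def choose_target_section(sections: dict[str, int]) -> str:
    """Pick the relocation target: shortest prefix first,
    and among equally short prefixes, the smallest section."""
    return min(sections, key=lambda prefix: (len(prefix), sections[prefix]))

# With the unbalanced network from the example above, section "1"
# (the shortest prefix) is chosen even though larger sections exist.
target = choose_target_section(
    {"1": 40, "00": 25, "011": 12, "0100": 9, "01010": 8, "01011": 7}
)
```

Relocating into the short-prefix section pushes it toward a split, which rebalances the prefix lengths instead of deepening an already-long branch.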
Nice catch, Thierry. Avoiding a cascading merge at all costs, the greatest fear, seems the most reasonable option.
And adding a step to the random relocation, forcing the new node to find a key in the neighbouring section with the shortest prefix, could be the best of both worlds.
Hey @tfa, thanks for this! @bart is away for a couple of days, so I’ll be looking into this in the meantime. Let me get back to you once I have something.
Your modifications certainly made things better, but it still isn't enough. For example, we are still seeing sections of ~1000 nodes, which is way too much (the ideal section size is between 20 and 30 nodes). We are currently trying a different approach and will update as soon as we have some results.
Yes, the algorithm must be improved because sections are still too big and there are too many node rejections, but I don't observe sections as big as you do. With default settings (no arguments given) I get 71 sections with 154 nodes on average (min size is 33 and max is 431).
This is the result with my fork, but you have made some modifications since I forked; I will try to merge them when I have time, to see how things evolve.
My random seed is: AGE_SEED="[1070056145, 2737779209, 2756500248, 3832683655]"
I have rebased my code on Maidsafe's new version and I got similar results. The number of sections is now only 65, which is not very far from the previous result (71). Maybe the difference is due to the new drop probability distribution, which I suppose is a better model of the outside world (not under the vaults' control).
I also noticed that the age penalty for a rejoining node is now less severe (age reduced by 1 instead of halved). Maybe this is a good thing for people who need to restart their vault.
After that I ran an experiment and added 1 to the number of trailing zeros, so that a relocation is done at each event and not just every other event. The results are better (but still not perfect):
57843 rejections (instead of 70874)
102 sections (instead of 65)
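A sketch of that modification, under the assumption that the original trigger relocates a peer when the event hash has at least `age` trailing zero bits; the names and the `bonus` parameter are mine, not the simulator's:

```python
def trailing_zeros(h: int) -> int:
    # Number of trailing zero bits of h (h > 0 assumed; a 256-bit
    # hash being exactly zero is negligible).
    return (h & -h).bit_length() - 1

def should_relocate(h: int, peer_age: int, bonus: int = 1) -> bool:
    # Original rule: trailing_zeros(h) >= peer_age, which an age-1
    # peer satisfies only every other event on average. Crediting
    # `bonus` extra zeros makes the trigger fire that much more often.
    return trailing_zeros(h) + bonus >= peer_age
```

For an age-1 peer, `bonus=1` makes the check succeed on every event, which matches the "each event and not just every other event" behaviour described above.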
Then I replaced the boolean norejectyoung parameter with a numerical one that indicates the number of allowed young peers in a section. Modifying this parameter can generate slightly better results:
| Max Young | Rejections | Sections |
|-----------|------------|----------|
| 1         | 57843      | 102      |
| 2         | 56780      | 119      |
| 3         | 56242      | 104      |
| 4         | 56002      | 83       |
We can observe that the optimum is to allow 2 young nodes (age = 1) in a section.
I have updated my repository with this code, and the specific command to generate this configuration is:
Cheers for the testing, just a small point.
I suspect an age start of 4 is actually where we want to be. Ages 1-3 are just work for the network, but 4 means a node does some work and the network does the minimum.
I tested with init_age = 4 and different values for the max_young parameter. The results were not good.
Then I realized that, in my modification to ensure a relocation is done for every event, I must add init_age instead of simply 1. With this modification the results are much better:
Rejections: 44357
Sections: 2073
Average section size: 19.33
But in return the simulation now takes much longer to execute (about 53 min), so I have only tested with max_young=1. I will test other values later. But I think a great step has already been made.
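Under the same trailing-zero reading of the trigger as before, the correction might look like this (a sketch; `INIT_AGE` and the function names are assumptions, not taken from the actual code):

```python
INIT_AGE = 4  # assumed starting age, per the init_age = 4 setting above

def trailing_zeros(h: int) -> int:
    # Number of trailing zero bits of h (h > 0 assumed).
    return (h & -h).bit_length() - 1

def should_relocate(h: int, peer_age: int) -> bool:
    # Credit INIT_AGE zeros (not just 1), so a brand-new node
    # (peer_age == INIT_AGE) is relocated on every event, while
    # older nodes still relocate progressively less often.
    return trailing_zeros(h) + INIT_AGE >= peer_age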