Data Chains - Deeper Dive


#61

Very very good results. With more than 94% in the same prefix, the tree created is very balanced.

I wonder if some small modification in the relocation could balance even more reducing the Min-Max difference. Something like:

If exist, in the neighbouring, a section with number_of_nodes < X choose that section
else choose the shorter prefix.

If we aid first the weakers section we would have a healthier network.


#62

We do already gave such a mechanism in place, however I think that as the network grows the balance happens naturally. So the option is more complex code to force/help balance or allow the natural balance which in some ways is nice as that is also how data distributes in a balanced manner.


#63

The aim of the “control system” implemented in the vaults is to insure the safety and efficiency of the network. This includes, among other things, the need to get a well-balanced prefix tree, which means few distinct values for the prefix length (ideally 1 or 2).

In this configuration (young_max=6) the result is good because we have 3 values but with few sections having the min or the max prefix length (1.33% of the sections with the min prefix length and 4% with the max prefix length).

Of course, the target is not specifically 12 and the value depends on the size of the network. For example, after 4000 iterations (instead of 100000), the node distribution is:

Prefix len Count Average Min Max Standard dev
7 123 21.37 18 28 1.71
8 10 16.40 12 23 3.34
All 133 21.00 12 28 2.28

And after 20000 iterations:

Prefix len Count Average Min Max Standard dev
9 222 26.75 18 31 1.98
10 580 13.30 8 23 2.13
All 802 17.02 8 31 6.37

The results with max_young=10 after 100000 iterations is:

Prefix len Count Average Min Max Standard dev
11 70 30.63 19 46 7.77
12 3898 17.99 10 32 3.30
13 116 14.01 11 26 2.54
All 4084 18.10 10 46 3.84

(not much different from max_young=6)


#64

Thanks @tfa I see now


#65

Amazing work again @tfa

One thing important for the limit of young nodes per section is to help with the situation when the network is relatively quiet. That is a possible attack vector for large adversaries who would try and flood the network at that time. So the limit effectively stops the flood from being too large and if there was growth then many other nodes would jump on the join queue (even other attackers, which is ok). This is the same protection really that states the network should only accept nodes when in needs them.

Both points for sure debatable and I think @mav has already made good arguments against that proposal. Personally I am on the fence, but slightly in favour of this protection.


#66

In your mind, what would be a good value for max_young parameter?

Here is a simulation with max_young=4 (which seems a safe candidate IMO, compared to 8 elders).

After 4000 iterations:

Prefix len Count Average Min Max Standard dev
7 124 19.90 17 25 1.44
8 8 17.50 13 24 3.46
All 132 19.75 13 25 1.70

After 20000 iterations:

Prefix len Count Average Min Max Standard dev
9 293 25.19 14 29 1.35
10 438 12.30 9 21 1.55
All 731 17.47 9 29 6.49

After 100000 iterations:

Prefix len Count Average Min Max Standard dev
11 39 27.33 15 40 7.84
12 3953 15.32 10 31 2.78
13 130 14.01 11 22 2.46
All 4122 15.39 10 40 3.09

With network summary at this stage:

Metrics Values
Adds 90015
Drops 6931
Rejoins 3054
Relocations 140885
Rejections 21000
Churns 386332
Sections 4122
Section nodes 63441
Left nodes 1521

The number of rejections is a bit high, but this is maybe the price to pay for more safety.


#67

I would not exceed 3 or even 2.

With 4125 Section and GROUP_SIZE = 8 we have 33000 Elders.

According to the distribution you have given, about 26412 elders with age>5 and 6588 with age=5. Maybe it’s too easy to be an elder.

By the way, is not it planned to reduce the number of Elders in each section to even only 4?


#68

I suspect 1 is the most secure but am not 100% convinced just yet, These types of security limitations are always pretty deep. I am running your code just now with 1000000 iterations. I think it may be wise to run forever but print out only the final results every 10,000 runs. Just working out the info we want printed on each 10,000 iters

[Edit most output should be logging and set to info, the distributions should be just min max and average, not a list I think that would work best. What do you think @tfa ?]


#69

With max_young=1 the results after 100000 iterations are:

Metrics Values
Adds 89795
Drops 7116
Rejoins 3089
Relocations 126415
Rejections 44357
Churns 355262
Sections 2073
Section nodes 40061
Left nodes 3083
Prefix len Count Average Min Max Standard dev
11 2023 19.51 13 28 0.88
12 50 11.86 9 17 2.17
All 2073 19.33 9 28 1.50

They are not as good (the number of sections is halved and the number of rejections has doubled).

I agree with this, but I don’t have time to do it right now. But next weekend will be OK for me.

When the relocation algorithm is correct, the list is limited to 2 or 3 elements (1 element = 1 prefix length). I think it is important to check the distribution of every prefix length and pollution is minimal.

Unless you were talking about the section list that was previously displayed? In that case I have already removed it in the Display trait that I have added to Network structure. I also generate markdown tables which are compatible with discourse forum. When I display the network summary, I use "{}" instead of "{:?}" to call this trait instead of Debug trait.


#70

As curiosity I build the @tfa repo with different GROUP_SIZE.

With GROUP_SIZE=7, max_young=1 after 100000 iterations.

Metrics Values
Adds 89936
Drops 6976
Rejoins 3088
Relocations 125356
Rejections 44543
Churns 353319
Sections 2295
Section nodes 40495
Left nodes 2859
Prefix len Count Average Min Max Standard dev
11 1802 19.70 10 22 0.86
12 491 10.12 8 16 0.96
13 2 11.00 11 11 0.00
All 2295 17.64 8 22 4.03

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

With GROUP_SIZE=6, max_young=1 after 100000 iterations.

Metrics Values
Adds 90105
Drops 6954
Rejoins 2941
Relocations 124502
Rejections 44159
Churns 352886
Sections 3480
Section nodes 40814
Left nodes 2944
Prefix len Count Average Min Max Standard dev
11 618 21.22 8 25 2.16
12 2858 9.68 6 17 1.49
13 4 9.25 7 12 2.22
All 3480 11.73 6 25 4.70

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

With GROUP_SIZE=5, max_young=1 after 100000 iterations.

Metrics Values
Adds 89938
Drops 7056
Rejoins 3006
Relocations 122841
Rejections 43845
Churns 350291
Sections 4184
Section nodes 41092
Left nodes 2831
Prefix len Count Average Min Max Standard dev
11 4 18.50 13 24 5.32
12 3997 9.81 6 22 1.98
13 181 9.90 6 17 1.91
14 2 10.00 10 10 0.00
All 4184 9.82 6 24 2.00

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

With Group_Size=9 max_young=1 after 100000 iterations.

Metrics Values
Adds 89985
Drops 7080
Rejoins 2935
Relocations 127295
Rejections 44534
Churns 357107
Sections 2079
Section nodes 39458
Left nodes 3251
Prefix len Count Average Min Max Standard dev
11 2017 19.13 13 33 1.10
12 62 13.98 10 27 3.11
All 2079 18.98 10 33 1.49

#71

Apparently the smaller the number of elders is, the more heterogenous the network is. In configuration with GROUP_SIZE=5, prefix length goes from 11 to 14. This doesn’t sound good for network stability. In conclusion, better keep 8 elders.

With GROUP_SIZE=4 the result is even worse:

Metrics Values
Adds 89856
Drops 7013
Rejoins 3131
Relocations 240781
Rejections 0
Churns 590476
Sections 8363
Complete 0
Section nodes 85020
Left nodes 3883
Prefix len Count Average Min Max Standard dev
12 1 17.00 17 17 None
13 8022 10.24 7 21 1.09
14 332 8.51 5 16 1.90
15 8 7.50 6 9 0.93
All 8363 10.17 5 21 1.19

Which means that no sections reach the complete state, hence 0 rejections.


#72

Your repo @tfa Just putting this here for the record. Results are nice to have in a single place.

Params { init_age: 4, split_strategy: Complete, max_young: 1, growth: (90, 7), structure_output_file: None, drop_dist: Exponential, inc_age: false }


Age distribution:
1	0
2	0
3	0
4	12888
5	50997
6	126065
7	94682
8	55324
9	30124
10	15765
11	7430
12	2824
13	730
14	114
15	7

Drops distribution by age:
1	0
2	0
3	0
4	9585
5	21589
6	25388
7	9663
8	2817
9	782
10	178
11	60
12	10

Here are the section numbers every 10,000

Seed: [854126829, 497889719, 278581660, 1742524827]
Iteration 0...
Number of sections: 1 (complete: 0)
Iteration 10000...
Number of sections: 259 (complete: 259)
Iteration 20000...
Number of sections: 517 (complete: 517)
Iteration 30000...
Number of sections: 778 (complete: 778)
Iteration 40000...
Number of sections: 1041 (complete: 1041)
Iteration 50000...
Number of sections: 1039 (complete: 1039)
Iteration 60000...
Number of sections: 1577 (complete: 1577)
Iteration 70000...
Number of sections: 2073 (complete: 2073)
Iteration 80000...
Number of sections: 2079 (complete: 2079)
Iteration 90000...
Number of sections: 2077 (complete: 2077)
Iteration 100000...
Number of sections: 2071 (complete: 2071)
Iteration 110000...
Number of sections: 2180 (complete: 2180)
Iteration 120000...
Number of sections: 3090 (complete: 3090)
Iteration 130000...
Number of sections: 3896 (complete: 3896)
Iteration 140000...
Number of sections: 4165 (complete: 4165)
Iteration 150000...
Number of sections: 4171 (complete: 4171)
Iteration 160000...
Number of sections: 4165 (complete: 4165)
Iteration 170000...
Number of sections: 4164 (complete: 4164)
Iteration 180000...
Number of sections: 4160 (complete: 4160)
Iteration 190000...
Number of sections: 4153 (complete: 4153)
Iteration 200000...
Number of sections: 4150 (complete: 4150)
Iteration 210000...
Number of sections: 4163 (complete: 4163)
Iteration 220000...
Number of sections: 4367 (complete: 4367)
Iteration 230000...
Number of sections: 5100 (complete: 5100)
Iteration 240000...
Number of sections: 6068 (complete: 6068)
Iteration 250000...
Number of sections: 6893 (complete: 6893)
Iteration 260000...
Number of sections: 7639 (complete: 7639)
Iteration 270000...
Number of sections: 8138 (complete: 8138)
Iteration 280000...
Number of sections: 8328 (complete: 8328)

Iteration 290000...
Number of sections: 8349 (complete: 8349)
Iteration 300000...
Number of sections: 8350 (complete: 8350)
Iteration 310000...
Number of sections: 8346 (complete: 8346)
Iteration 320000...
Number of sections: 8342 (complete: 8342)
Iteration 330000...
Number of sections: 8335 (complete: 8335)
Iteration 340000...
Number of sections: 8330 (complete: 8330)
Iteration 350000...
Number of sections: 8325 (complete: 8325)
Iteration 360000...
Number of sections: 8321 (complete: 8321)
Iteration 370000...
Number of sections: 8318 (complete: 8318)
Iteration 380000...
Number of sections: 8309 (complete: 8309)
Iteration 390000...
Number of sections: 8304 (complete: 8304)
Iteration 400000...
Number of sections: 8299 (complete: 8299)
Iteration 410000...
Number of sections: 8296 (complete: 8296)
Iteration 420000...
Number of sections: 8291 (complete: 8291)
Iteration 430000...
Number of sections: 8360 (complete: 8360)
Iteration 440000...
Number of sections: 8632 (complete: 8632)
Iteration 450000...
Number of sections: 9197 (complete: 9197)
Iteration 460000...
Number of sections: 9989 (complete: 9989)
Iteration 470000...
Number of sections: 11008 (complete: 11008)
Iteration 480000...
Number of sections: 12049 (complete: 12049)
Iteration 490000...
Number of sections: 12972 (complete: 12972)
Iteration 500000...
Number of sections: 13867 (complete: 13867)
Iteration 510000...
Number of sections: 14694 (complete: 14694)
Iteration 520000...
Number of sections: 15415 (complete: 15415)
Iteration 530000...
Number of sections: 15989 (complete: 15989)
Iteration 540000...
Number of sections: 16387 (complete: 16387)
Iteration 550000...
Number of sections: 16574 (complete: 16574)
Iteration 560000...
Number of sections: 16642 (complete: 16642)
Iteration 570000...
Number of sections: 16664 (complete: 16664)
Iteration 580000...
Number of sections: 16665 (complete: 16665)
Iteration 590000...
Number of sections: 16664 (complete: 16664)
Iteration 600000...
Number of sections: 16665 (complete: 16665)
Iteration 610000...
Number of sections: 16664 (complete: 16664)
Iteration 620000...
Number of sections: 16659 (complete: 16659)
Iteration 630000...
Number of sections: 16659 (complete: 16659)
Iteration 640000...
Number of sections: 16654 (complete: 16654)
Iteration 650000...
Number of sections: 16653 (complete: 16653)
Iteration 660000...
Number of sections: 16650 (complete: 16650)
Iteration 670000...
Number of sections: 16646 (complete: 16646)
Iteration 680000...
Number of sections: 16643 (complete: 16643)
Iteration 690000...
Number of sections: 16639 (complete: 16639)
Iteration 700000...
Number of sections: 16635 (complete: 16635)
Iteration 710000...
Number of sections: 16631 (complete: 16631)
Iteration 720000...
Number of sections: 16629 (complete: 16629)
Iteration 730000...
Number of sections: 16629 (complete: 16629)
Iteration 740000...
Number of sections: 16628 (complete: 16628)
Iteration 750000...
Number of sections: 16626 (complete: 16626)
Iteration 760000...
Number of sections: 16622 (complete: 16622)
Iteration 770000...
Number of sections: 16620 (complete: 16620)
Iteration 780000...
Number of sections: 16618 (complete: 16618)
Iteration 790000...
Number of sections: 16610 (complete: 16610)
Iteration 800000...
Number of sections: 16607 (complete: 16607)
Iteration 810000...
Number of sections: 16600 (complete: 16600)
Iteration 820000...
Number of sections: 16591 (complete: 16591)
Iteration 830000...
Number of sections: 16591 (complete: 16591)
Iteration 840000...
Number of sections: 16604 (complete: 16604)
Iteration 850000...
Number of sections: 16649 (complete: 16649)
Iteration 860000...
Number of sections: 16749 (complete: 16749)
Iteration 870000...
Number of sections: 16930 (complete: 16930)
Iteration 880000...
Number of sections: 17289 (complete: 17289)
Iteration 890000...
Number of sections: 17744 (complete: 17744)
Iteration 900000...
Number of sections: 18413 (complete: 18413)
Iteration 910000...
Number of sections: 19189 (complete: 19189)
Iteration 920000...
Number of sections: 20043 (complete: 20043)
Iteration 930000...
Number of sections: 20962 (complete: 20962)
Iteration 940000...
Number of sections: 21872 (complete: 21872)
Iteration 950000...
Number of sections: 22820 (complete: 22820)
Iteration 960000...
Number of sections: 23761 (complete: 23761)
Iteration 970000...
Number of sections: 24719 (complete: 24719)
Iteration 980000...
Number of sections: 25618 (complete: 25618)
Iteration 990000...
Number of sections: 26498 (complete: 26498)

#73

To complete must change this line too…


#74

Good catch. In my defense the hard coded constant is not in my code but in original MaidSafe code.


#75

David has been know to make devs eat Haggis for using “magic numbers” like that. Somebody may be lucky it wasn’t production code!


#76

Or somebody likes Haggis…


#77

In the first tests I made the same mistake. Only when the results seems so strange ,I dig deeper into the code.

Basically, changing the Group_size, the only significant difference, in the metrics, is the number of sectors.

For example, move from Group_size 8 to 5, double the number of sectors and the average prefix len goes from 11.02 to 12.04.

Of course all these data are with a strong growth. Will have to test in situations of more stability.


#78

Hi, @tfa , just raised a PR into your repo to sort out the print outs little bit.

And, just wondering maybe that https://github.com/Thierry61/ageing_sim/blob/master/src/network/section.rs#L191
shall be let node_to_age = self.choose_for_relocation(trailing_zeros + params.init_age - 1); ?
OR
https://github.com/Thierry61/ageing_sim/blob/master/src/network/section.rs#L164 to be .filter(|n| n.age() < age) ?
Otherwise a relocation will always happen on a churn?


#79

Thank you for this.

I have then uploaded a new version with better cells formatting so that tables are indented and justified also in the console window and not just in markdown editor.

Now, if you select a console height of about 23 lines, you see a fixed panel as long as you have 2 or 3 distinct prefix lengths (tested on a Cygwin terminal).

This is something I added 3 days ago to improve network growth:

With trailing_zeros + params.init_age – 1, init_age=4 and max_young=1, the result after 100_000 iterations is:

Metrics Values
Adds 89894
Drops 7012
Rejoins 3094
Relocations 57141
Rejections 64631
Churns 215930
Sections 1038
Section nodes 18901
Left nodes 2853
Prefix len Count Average Min Max Standard dev
10 1010 18.37 14 24 0.92
11 28 12.29 8 23 3.02
All 1038 18.21 8 24 1.43

Number of sections is 1038 instead of 2073 and number of rejections is 64631 instead of 44357. This is a lot, but I can do this if you confirm this is needed for the network security.


#80

The way this is designed is that older nodes will seldom relocate, (relatively) we will reject a lot, but I feel that is fine actually. I am not sure new nodes will always be waiting at every section. We have another sim in the background which Adam is working on. This will allow us to create many random numbers from each Live event by repeatedly hashing the event (settable). This will simulate and may replace using mutation events (like Put) as well as Live events. These nudge the network to grow as there is a need (when trigger is put especially).

So the Hash(Event) % 2^age == 0 applied to each peer in decreasing age and relocating the first one that matches the test will possibly slow down relocation (i.e. create more rejections) but is likely more secure but may delay nodes joining for perhaps even a few hours (or more). I am not sure this is bad as it is more a test of the nodes up time and is probably still faster to join than downloading a block chain (for now anyway).

Hope this helps.