Data Chains - Deeper Dive


#41

“Excellent” (with the finger tapping)


#42

@bart, I wanted to play with your simulator, but results are not correct with always only one section. Is there anything special to do to make it work or is your repository currently in a bad state?

@mav, you have entirely rewritten the test in Go, that’s very impressive.


#43

I had originally developed the simulation for a few different purposes before the ageing changes came out. I have some big plans for the simulation codebase. Hopefully those results will come to fruition early next year. So far ageing has taken a good chunk of time away from the original intentions (but in the best possible way).


#44

@tfa, Yes - that’s because it simulates an older set of rules that would cause such behaviour. We have changed the rules since then, but there wasn’t time to update the simulator. I’ll probably get to that at some point, but for now we have some tasks with higher priority. Sorry it’s not as useful as it could be!


#45

What part of the OP were you referring to? The “Split” section, “Static group and quorum sizes”, somewhere else?


#46

It’s in the Block States > Dead(PeerId) > Trigger for voting:

“we take the Hash of the event H. Then if H % 2^age == 0 for any peer (sorted by age ascending) in our section, we relocate this node to the neighbour that has the lowest number of peers.”

The bold text indicates younger nodes are preferred for relocation. But rfc0045 says prefer older nodes, so there is a conflict.


#47

I think I have found the problems in the algorithm:

  • The sections that have a shorter prefix should be chosen in priority as target of relocations, even if they have more nodes. This is the main problem and it is only one line to modify.

  • There is a mysterious hard coded rule that suppress node ageing when the event concerns a non-adult node in a section with a prefix length > 4 that doesn’t do any good.

With these modifications the simulation is beginning to fly (with 75 sections created). It is not yet perfect because there are still a lot of young node rejections (70%), but the situation is unblocked.

I also think that relocations implementing a change of owning sections during a split or a merge shouldn’t trigger other relocations in cascade to avoid destabilizing newly created sections. This means that the Gone and Live events generated by a split or merge operations shouldn’t count for node ageing. The result is almost the same: 71 sections but there are less merge events (280 instead of 377) which demonstrates a more stable network. This one is debatable.

I have just uploaded my modifications in my fork, but I haven’t analyzed all the consequences. If you want to study them.


#48

Thanks for the clarification. Instinctively, I would think that you would want to relocate an elder rather than a young node to an unhealthy neighbouring section. The reason for this is that the if a neighbour is already in trouble, you don’t want to send an untrusted adversary its way. This seems like a preferred option as long as the section from which the chosen elder is leaving has enough elders to remain healthy. I may have my thinking backward on this, but this is the sentiment that first comes to mind…


#49

Agreed. I would maybe rephrase ‘untrusted adversary’ to a ‘lesser trusted adversary’ but otherwise I think the reasoning is ok.

Also there is this reason:


#50

I didn’t explain why this is needed: what we observe currently during the simulations are successive splits that create a completely unbalanced network, for example () => 0 & 1, 0 => 00 & 01, 01 => 010 & 011, 010 => 0100 & 0101, 0101 => 01010 & 01011. Here there are 6 sections: 1, 00, 011, 0100, 01010 and 01011 with distinct prefix lengths instead of similar ones. At one time the shorter prefix needs a merge which triggers a generalized merge of all the sections that recreates the section with zero length prefix ().

And then the cycle begins again. This creates a oscillating phenomenon from 1 section to a few ones.

By selecting the target section for relocation with the shorter prefix, we insure that a section with a long prefix won’t be split and so an unbalanced network is not created.

Of course, if there are several sections with a short prefix then the smaller section must be selected among them.


#51

Nice catch, Thierry. Avoid, at all costs, a cascade merging, the greatest fear, seems the most reasonable option.

And adding a step in the random relocation, forcing the new node to find a key in the neighbouring section with shorter prefix, could be the best of both worlds.


#52

Hey @tfa, thanks for this! @bart is away for a couple of days, so I’ll be looking into this in the meantime. Let me get back to you once I have something.


#53

Hey @tfa,

Your modifications certainly made things better, but it still isn’t enough. For example, we are still seeing sections of ~1000 nodes, which is way too much (ideal section size is between 20 - 30 nodes). We are currently trying different approach and will update as soon as we have some results.


#54

Yes, the algorithm must be improved because sections are still too big and there are too many node rejections, but I don’t observe as big sections as you do. With default settings (no arguments given) I get 71 sections with 154 nodes on average (min size is 33 and max is 431).

This is the result with my fork, but you have made some modifications since the fork, I will try to merge them when I have time, to see how things evolve.

My random seed is: AGE_SEED="[1070056145, 2737779209, 2756500248, 3832683655]"

Complete result is:

Prefix Count
0000000 67
0000001 102
000001 201
000010 238
000011 222
000100 140
0001010 92
0001011 78
0001100 79
0001101 120
000111 224
001000 184
001001 247
001010 136
001011 153
0011000 55
0011001 121
001101 204
001110 153
0011110 120
0011111 99
010000 243
010001 212
010010 119
010011 120
010100 149
010101 177
010110 127
010111 126
011000 134
011001 169
011010 123
0110110 42
0110111 66
011100 132
0111010 33
0111011 88
01111 306
1000000 43
1000001 64
100001 137
1000100 96
1000101 136
100011 256
100100 164
1001010 122
1001011 132
100110 138
100111 146
101000 135
101001 138
10101 428
101100 136
101101 137
101110 215
101111 198
110000 168
110001 230
11001 431
110100 136
110101 136
110110 145
110111 204
111000 132
111001 136
111010 169
111011 219
111100 135
111101 134
111110 177
111111 213
Average 154,75

#55

I have rebased my code on Maidsafe new version and I got similar results. The number of sections is now only 65, which is not very far from previuos result (71). Maybe the difference is due to the new drop probability distribution, which I suppose is a better model of outside world (not under vaults control).

I also noticed that the age penalty for a rejoining node is now less severe (minus 1 instead of divided by 2). Maybe this is a good thing for people who need to restart their vault.

After that I made an experiment and added 1 to the number of trailing zeros so that a relocation is done at each event and not just every other event. The results are better (but still not perfect):

  • 57843 rejections (instead of 70874)

  • 102 sections (instead of 65)

Then I replaced boolean norejectyoung parameter by a numerical one that indicates the number of allowed young peers in a section. Modifying this parameter can generate slightly better results:

Max Young Rejections Sections
1 57843 102
2 56780 119
3 56242 104
4 56002 83

We can observe that the optimum is to allow 2 young nodes (age = 1) in a section.

I have updated my repository with this code and the specific command generate this configuration is:

AGE_SEED="[1070056145, 2737779209, 2756500248, 3832683655]" ./target/debug/ageing_sim --max_young=2

Complete results for this configuration are:

Prefix Count
0000000 129
0000001 260
0000010 193
0000011 286
0000100 139
0000101 269
000011 562
0001000 131
0001001 244
0001010 120
0001011 196
00011000 19
00011001 70
0001101 117
0001110 130
0001111 144
0010000 141
0010001 212
001001 468
0010100 120
0010101 245
0010110 226
0010111 236
0011000 178
0011001 203
0011010 155
0011011 168
001110 437
001111 562
01000000 78
01000001 112
0100001 244
0100010 151
0100011 244
010010 392
01001100 74
01001101 121
0100111 192
0101000 142
0101001 219
0101010 187
0101011 179
0101100 172
0101101 231
0101110 97
0101111 112
0110000 122
0110001 144
0110010 170
0110011 173
0110100 197
0110101 187
011011 440
011100 438
0111010 94
0111011 94
0111100 149
0111101 191
0111110 139
0111111 154
1000000 132
1000001 201
100001 560
10001000 56
10001001 77
1000101 173
1000110 130
1000111 136
100100 420
1001010 138
1001011 192
1001100 131
1001101 170
1001110 77
1001111 114
1010000 141
1010001 217
1010010 175
1010011 280
1010100 61
1010101 124
1010110 120
1010111 166
1011000 131
1011001 158
101101 467
1011100 192
1011101 332
101111 495
110000 392
1100010 168
1100011 192
1100100 177
1100101 163
11001100 13
11001101 48
1100111 69
1101000 131
1101001 160
110101 401
1101100 144
1101101 174
110111 468
111000 444
1110010 205
1110011 225
1110100 189
11101010 110
11101011 151
1110110 154
1110111 228
1111000 142
1111001 148
11110100 65
11110101 133
1111011 193
111110 438
1111110 206
1111111 292
Average 200

#56

Cheers for the testing, just a small point.
I suspect age start at 4 is actually here we want to be. 1-3 is just work for the network, but 4 means a node does some work and the network does minimum.


#57

No problem, just ask :smile:.

I tested with init_age = 4 and different values for max_young parameters. The results were not good.

Then I realized that, in my modification to insure a relocation is done for every event, I must add init_age instead of simply 1. With this modification the results are much better:

  • Rejections: 44357

  • Sections: 2073

  • Average section size: 19.33

But in return the simulation is now much longer to execute (about 53 mn), so I have only tested with max_young=1. I will test other values later. But I think that a great step has already been made.


#58

Good numbers and excellent work.

If possible, can list the nodes by final age?


#59

Some more numbers:

I have tested with different values for max_young parameter and the results are very good:

Max Young Sections Rejections
2360 0
1 2073 44357
2 3642 31845
3 4042 25093
4 4122 21000
5 4108 17795
6 4125 15389
7 4098 13608
8 4088 11936
9 4080 10745
10 4084 9651

There is large plateau with more than 4000 sections. This means we don’t need to be very precise on the number of young nodes allowed in a section, though we cannot completely remove the control, as demonstrated by the first line.

Then I have specifically analyzed the results with max_young=6 (meaning that 6 young nodes are allowed in a section). Here are different metrics computed from the number of nodes in each section, group by prefix length:

Prefix len Count Average Min Max
11 55 27.62 15 48
12 3902 16.58 10 32
13 168 14.10 10 22
All 4125 16.62 10 48

Count is the number of sections having the prefix length (for example there are 55 sections with prefix length 11), and average, min, max are aggregates values computed on the number of nodes of these sections (for example the average size of these 55 sections is 27.62 nodes).

Lastly, for @digipl, the nodes by final age in this configuration:

Age Count
1 0
2 0
3 0
4 14422
5 17373
6 16861
7 9551
8 5422
9 2702
10 1429
11 626
12 149
13 28
14 3
15 1

#60

Excellent work again :slight_smile:

Just wondering what the plateau situation is like in comparison. Is it all length 12? Is there some “control system” mechanism that drives the optional prefix length to be 12 and sections eventually become mostly prefix length 12.

I think it might be beneficial to see that analysis for all above max_young == 6 and at least for max_young == 10