David has been known to make devs eat Haggis for using “magic numbers” like that. Somebody may be lucky it wasn’t production code!
Or somebody likes Haggis…
In my first tests I made the same mistake. Only when the results seemed so strange did I dig deeper into the code.
Basically, when changing Group_size, the only significant difference in the metrics is the number of sections.
For example, moving from Group_size 8 to 5 doubles the number of sections, and the average prefix length goes from 11.02 to 12.04.
Of course, all these data come from a period of strong network growth. I will have to test in more stable situations.
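That ~1-bit jump matches the rough arithmetic: if sections cover the name space about evenly, the section count is roughly 2^(average prefix length), so doubling the number of sections should raise the average prefix length by about one bit (11.02 → 12.04 fits). A quick sanity check of that relation (a back-of-the-envelope model, not the simulator's code):

```rust
// Back-of-the-envelope model: with an evenly covered name space,
// sections ~ 2^(average prefix length), so
// average prefix length ~ log2(sections).
fn approx_prefix_len(sections: f64) -> f64 {
    sections.log2()
}
```

Doubling `sections` adds exactly 1.0 to the result, close to the observed 11.02 → 12.04.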
Hi @tfa, I just raised a PR against your repo to tidy up the printouts a little bit.
And, just wondering: maybe
https://github.com/Thierry61/ageing_sim/blob/master/src/network/section.rs#L191
should be `let node_to_age = self.choose_for_relocation(trailing_zeros + params.init_age - 1);`,
or
https://github.com/Thierry61/ageing_sim/blob/master/src/network/section.rs#L164
should be `.filter(|n| n.age() < age)`?
Otherwise a relocation will always happen on every churn?
Thank you for this.
I have now uploaded a new version with better cell formatting, so that tables are indented and justified in the console window as well, not just in a markdown editor.
Now, if you select a console height of about 23 lines, you see a fixed panel as long as you have 2 or 3 distinct prefix lengths (tested on a Cygwin terminal).
This is something I added 3 days ago to improve network growth:
With `trailing_zeros + params.init_age - 1`, init_age=4 and max_young=1, the result after 100_000 iterations is:
Metrics | Values |
---|---|
Adds | 89894 |
Drops | 7012 |
Rejoins | 3094 |
Relocations | 57141 |
Rejections | 64631 |
Churns | 215930 |
Sections | 1038 |
Section nodes | 18901 |
Left nodes | 2853 |
Prefix len | Count | Average | Min | Max | Standard dev |
---|---|---|---|---|---|
10 | 1010 | 18.37 | 14 | 24 | 0.92 |
11 | 28 | 12.29 | 8 | 23 | 3.02 |
All | 1038 | 18.21 | 8 | 24 | 1.43 |
Number of sections is 1038 instead of 2073 and number of rejections is 64631 instead of 44357. This is a lot, but I can do this if you confirm this is needed for the network security.
The way this is designed, older nodes will (relatively) seldom relocate and we will reject a lot, but I feel that is fine actually. I am not sure new nodes will always be waiting at every section. We have another sim in the background which Adam is working on. This will allow us to create many random numbers from each Live event by repeatedly hashing the event (settable). This will simulate, and may replace, using mutation events (like Put) as well as Live events. These nudge the network to grow as there is a need (especially when the trigger is a Put).
So the test `Hash(Event) % 2^age == 0`, applied to each peer in decreasing age and relocating the first one that matches, will possibly slow down relocation (i.e. create more rejections) but is likely more secure. It may delay nodes joining for perhaps even a few hours (or more). I am not sure this is bad, as it is more a test of the node's uptime, and it is probably still faster to join than downloading a block chain (for now anyway).
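A minimal sketch of that selection rule (the peer struct, hash function and event type here are assumptions for illustration, not the sim's actual code):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical peer record, carrying only what the test needs.
struct Peer {
    age: u32, // assumed < 64 so the shift below is valid
}

// Hash an event to a u64 (the real sim's hash may differ; this is a sketch).
fn hash_event<T: Hash + ?Sized>(event: &T) -> u64 {
    let mut h = DefaultHasher::new();
    event.hash(&mut h);
    h.finish()
}

// Walk peers from oldest to youngest and pick the first one where
// Hash(event) % 2^age == 0. A peer of age `a` passes with probability
// 2^-a, so older peers relocate exponentially less often.
fn choose_relocation<'a>(peers: &'a mut [Peer], event: &str) -> Option<&'a Peer> {
    peers.sort_by(|a, b| b.age.cmp(&a.age)); // decreasing age
    let h = hash_event(event);
    peers.iter().find(|p| h % (1u64 << p.age) == 0)
}
```

Note that an age-0 peer always passes the test (`h % 1 == 0`), which is one reason the young-node cap matters.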
Hope this helps.
David, have you tried tackling this optimisation using a Genetic Algorithm? I can see difficulties: you might need a Connection Machine to do it in a reasonable time, and designing the fitness function is non-trivial, but I wonder whether it is a suitable problem.
With a method like NSGA-II you can use multiple objectives directly rather than trying to cook up a meaningful scalar fitness. Things get a bit tough to visualize after 3 dimensions though.
Thanks. I am sooo out of date on this
Rust impl, worth forking.
Examples of grid compatibility
Comparisons
http://www.ias.ac.in/article/fulltext/sadh/037/06/0675-0694
It is hard to swallow because in the name of security we both set max young to 1 and relocate only on every other churn event. Globally these two elements divide the number of sections by 4 (from 4122 to 1038) and multiply the number of rejections by 3 (from 21000 to 64631).
But I think I understand why: the rejection rate is a security factor for a relatively small network. It can only be reduced if the network grows, which happens naturally because more sections means more available slots for young nodes.
In addition to the rejections (which are accumulated from the start of the network), I will add the instantaneous rejection rate to the displayed metrics. I guess it is the number of sections that cannot accept a young node divided by the total number of nodes.
Yes and I agree it feels a bit strict. It is more secure AFAIK and I feel this algo will develop over time where it should become more lenient. Ultimately we do want even very short lived nodes to be of value and earn. Initially though this is possibly safer.
This will be very interesting as well. Thanks again @tfa
The rejection rate has been added in my repo.
I have run a simulation with 890_000 iterations, but the result is disappointing: I had hoped that the rejection rate would slowly diminish with the size of the network, but that's not the case; it remains stable, with small oscillations around 70%.
Iteration | Sections | Rejection rate |
---|---|---|
0 | 1 | 0% |
10000 | 133 | 73% |
20000 | 262 | 70% |
30000 | 320 | 66% |
40000 | 530 | 73% |
50000 | 526 | 74% |
60000 | 615 | 68% |
70000 | 924 | 68% |
80000 | 1052 | 73% |
90000 | 1045 | 72% |
100000 | 1038 | 71% |
110000 | 1033 | 71% |
120000 | 1189 | 69% |
130000 | 1552 | 67% |
140000 | 1768 | 71% |
150000 | 2036 | 69% |
160000 | 2102 | 73% |
170000 | 2098 | 73% |
180000 | 2090 | 73% |
190000 | 2086 | 72% |
200000 | 2079 | 72% |
210000 | 2074 | 74% |
220000 | 2068 | 72% |
230000 | 2084 | 70% |
240000 | 2246 | 68% |
250000 | 2612 | 68% |
260000 | 3018 | 68% |
270000 | 3321 | 69% |
280000 | 3551 | 70% |
290000 | 3707 | 70% |
300000 | 3903 | 71% |
310000 | 4111 | 70% |
320000 | 4191 | 72% |
330000 | 4199 | 70% |
340000 | 4199 | 72% |
350000 | 4194 | 72% |
360000 | 4188 | 72% |
370000 | 4177 | 73% |
380000 | 4171 | 72% |
390000 | 4166 | 72% |
400000 | 4158 | 71% |
410000 | 4150 | 72% |
420000 | 4140 | 72% |
430000 | 4138 | 71% |
440000 | 4135 | 73% |
450000 | 4132 | 72% |
460000 | 4134 | 72% |
470000 | 4182 | 72% |
480000 | 4300 | 70% |
490000 | 4531 | 69% |
500000 | 4851 | 68% |
510000 | 5229 | 68% |
520000 | 5608 | 68% |
530000 | 5979 | 69% |
540000 | 6330 | 68% |
550000 | 6586 | 70% |
560000 | 6870 | 69% |
570000 | 7081 | 69% |
580000 | 7266 | 69% |
590000 | 7451 | 69% |
600000 | 7689 | 70% |
610000 | 7892 | 70% |
620000 | 8111 | 71% |
630000 | 8272 | 71% |
640000 | 8355 | 71% |
650000 | 8395 | 71% |
660000 | 8393 | 71% |
670000 | 8391 | 72% |
680000 | 8393 | 72% |
690000 | 8388 | 71% |
700000 | 8379 | 72% |
710000 | 8368 | 73% |
720000 | 8360 | 73% |
730000 | 8347 | 72% |
740000 | 8341 | 72% |
750000 | 8333 | 72% |
760000 | 8327 | 71% |
770000 | 8318 | 71% |
780000 | 8311 | 73% |
790000 | 8302 | 73% |
800000 | 8299 | 72% |
810000 | 8294 | 72% |
820000 | 8293 | 72% |
830000 | 8286 | 72% |
840000 | 8274 | 72% |
850000 | 8272 | 71% |
860000 | 8272 | 73% |
870000 | 8266 | 72% |
880000 | 8262 | 72% |
890000 | 8258 | 72% |
This is good to know. I really suspect, though, that we are trying very hard to add nodes. Initially this may be the case, but then I suspect fewer nodes will try to join. However, as the network needs more nodes it should get them. This is where the mutation events will help us in ageing nodes faster. So we will need to dive in there a little; the rate of trying to add will diminish over time.
We need to think about that a bit more. As nodes are needed due to the data getting large in a section, we want it to split, so more new nodes / sections will distribute the data better. I think we are missing a tiny bit here. Even with a 70% rejection rate, though, it is probably not too bad: it is just a few attempts to start (3 or 4), and requesting to join is not all that arduous, I suppose.
(Still thinking a bit about this one.) There are a few of us spending a weekend down at my place to get space to dive into this a lot deeper, but it's a full agenda for the weekend: republish, SGX etc. (https://github.com/baidu/rust-sgx-sdk is interesting for adding security, but limited as well; there is also ARM TrustZone etc.), secure gossip, upgrades and some more. We will spend a while on this as well in a focussed way. I am keen to look a bit more at the assumptions of this sim, and the updated one we should publish tomorrow as well; these probabilities of add/rejoin/join etc. I think are close but possibly not as real-life as they could be.
[edit, the join rate will also be affected by safecoin, when the network needs nodes it will increase payments and when it does not it will decrease them, so perhaps that also should be factored in somehow, well if it can]
I don’t want to send the Data Chains thread off topic… but just in case you haven’t seen it yet, this recent paper may provide some additional food for thought:
Malware Guard Extension: Using SGX to Conceal Cache Attacks
" Our attack is the first malware running on real SGX hardware, abusing SGX protection features to conceal itself. "
According to the whitepaper, one way to prevent an SGX attack is to use side-channel-resistant crypto implementations. Libsodium, fortunately, was designed from the beginning to be extremely resistant to these attacks.
But every day it is more evident that modern CPUs have more holes than a Gruyère cheese.
This factor is, in my view, fundamental. The success of bitcoin comes more from an intelligent application of game theory than from technical implementations.
In the same way, the design of both the farming rate and the initial distribution must be adjusted to reinforce security, especially at the beginning, when the network is most vulnerable.
From the point of view of an engineer this may seem a sin, but I am convinced that a good advertising campaign and a distribution that incites greed are, for the security of the network, much more important than an ultra-optimized algorithm.
I agree with this principle, but you need to lower the rejection rate to implement it. I also think that a greater number of sections is a better security measure than a greater rejection rate.
To facilitate experiments, I have added a parameter to accelerate or slow down the relocation rate. The base formula was `Hash(Event) % 2^(age - init_age + 1) == 0`. The parameter modifies it to `Hash(Event) % 2^(age - init_age + 1 - relocation_rate) == 0`, where relocation_rate = 0 for the standard rate and 1 for the aggressive rate.

- The standard rate keeps the base behavior, with a relocation triggered every other event and the following relocation probabilities: 50%: no relocation, 25%: init_age, 12.5%: init_age+1, 6.25%: init_age+2, …
- The aggressive rate triggers a relocation on every event, and each probability is shifted one step to the right: 50%: init_age, 25%: init_age+1, 12.5%: init_age+2, 6.25%: init_age+3, …
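A sketch of how those probabilities fall out of the modified test (my own reading of the description; the function name and the trailing-zeros formulation are assumptions): the number of trailing zero bits of the event hash decides which age, if any, gets relocated.

```rust
// Sketch (assumed equivalent of the modified test): for an event hash h,
// the test Hash(Event) % 2^(age - init_age + 1 - relocation_rate) == 0
// selects the age determined by h's trailing zero bits.
// Standard rate (0):   t = 0 -> no relocation (50%), t >= 1 -> age init_age + t - 1.
// Aggressive rate (1): every event relocates; each age shifts one step down.
fn age_to_relocate(hash: u64, init_age: u32, relocation_rate: u32) -> Option<u32> {
    let t = hash.trailing_zeros() + relocation_rate;
    if t == 0 {
        None // only reachable with the standard rate, for odd hashes (50%)
    } else {
        Some(init_age + t - 1)
    }
}
```

With init_age = 4 and the standard rate, an odd hash (probability 1/2) relocates nobody, a hash ending in exactly one zero bit (probability 1/4) relocates an age-4 node, and so on; the aggressive rate shifts each outcome one step, so every event relocates someone.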
IMO, there are 2 main models: maximizing the number of sections or not. To get a clear view of the big differences between them, I have launched 2 simulations with 890_000 iterations:

- the first simulation allows only one young node and uses the standard relocation rate
- the second one allows 4 young nodes and uses the aggressive relocation rate
Here are the results:
Metrics | Simulation 1 | Simulation 2 |
---|---|---|
Adds | 800814 | 810134 |
Drops | 62523 | 62976 |
Rejoins | 26663 | 26890 |
Relocations | 503815 | 1285360 |
Rejections | 578686 | 195233 |
Churns | 1912122 | 3507968 |
Sections | 8258 | 33405 |
Section nodes | 161447 | 564058 |
Left nodes | 26287 | 14370 |
Rejection rate | 72% | 26% |
4 times more sections makes it harder for an attacker to control a section, because he needs many more nodes. This is better than a greater rejection rate, because the attacker just has to retry connecting each node more often, while the number of nodes he needs is not impacted.
Another argument is that a greater rejection rate will deter some casual users from connecting a vault, which will lower the total number of nodes and so increase the proportion of attacking nodes.
Additional info: section distribution after 890_000 iterations for simulation 1 is:
Prefix len | Count | Average | Min | Max | Standard dev |
---|---|---|---|---|---|
13 | 8126 | 19.67 | 14 | 26 | 0.84 |
14 | 132 | 11.91 | 9 | 20 | 2.10 |
All | 8258 | 19.55 | 9 | 26 | 1.31 |
for simulation 2:
Prefix len | Count | Average | Min | Max | Standard dev |
---|---|---|---|---|---|
15 | 32141 | 16.96 | 10 | 37 | 2.31 |
16 | 1244 | 14.99 | 9 | 26 | 2.77 |
17 | 20 | 13.55 | 11 | 19 | 2.42 |
All | 33405 | 16.89 | 9 | 37 | 2.36 |
I suppose the downside of sim 2 and the greater number of sections is the much higher churn (ie network overhead)?
There are more churns in simulation 2 simply because the network is bigger, and it is bigger because fewer nodes are unable to connect. So, no downside in this area.
The number of churns is even smaller if we look at the figures when simulation 2 had a similar number of nodes:
Metrics | Simulation 1 at step 890000 | Simulation 2 at step 260000 |
---|---|---|
Adds | 800814 | 234114 |
Drops | 62523 | 18053 |
Rejoins | 26663 | 7833 |
Relocations | 503815 | 376067 |
Rejections | 578686 | 56978 |
Churns | 1912122 | 1021403 |
Sections | 8258 | 8358 |
Section nodes | 161447 | 163394 |
Left nodes | 26287 | 3997 |
Rejection rate | 72% | 27% |
Simulation 2 is much better than simulation 1, except in the number of distinct prefix lengths, which varies from 3 to 4: the values are better (I mean the prefix lengths are greater) but they are slightly more dispersed than in simulation 1.
So, I improved the program to make sure that when a node is relocated to a section, it is placed in the half of the section having fewer adults. The simulation is running right now; results are expected in a few hours.
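A sketch of that balancing rule as I understand it (the 64-bit name space, the bit convention and the function name are my assumptions, not the repo's actual code): count the adults on each side of the first bit not fixed by the section prefix, then force the relocated node's name into the emptier half.

```rust
// Relocated nodes get a name whose bit just past the section prefix points at
// the half with fewer adults, keeping both halves even so a future split
// produces two viable sections. Names are assumed 64-bit, prefix-first.
fn balance_relocation(name: u64, prefix_len: u32, adult_names: &[u64]) -> u64 {
    let bit = 1u64 << (63 - prefix_len); // first bit not fixed by the prefix
    let ones = adult_names.iter().filter(|&&n| n & bit != 0).count();
    let zeros = adult_names.len() - ones;
    if zeros <= ones {
        name & !bit // 0-half has fewer (or equal) adults: clear the bit
    } else {
        name | bit // 1-half has fewer adults: set the bit
    }
}
```

The node keeps an essentially random name; only one bit is overridden, which is enough to steer it to the emptier half.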
The result with the new algorithm to balance the sections surpasses all expectations: there is only one prefix length, except during the transition period from one prefix length to the next, when there are 2.
As a consequence, the number of sections is now a stepped curve, equaling exactly a power of 2 between transitions (in green):
The rejection rate has large oscillations with minimum values at transitions:
The number of nodes is slightly better:
But the real gain of the new algorithm is the homogeneity of the network.