Data Chains - Deeper Dive

Back in the days of bitcoin mining, when the total network hashrate was on the order of gigahashes, it was hard to believe that in a few years the network would be exahashing as it is now (I was part of mining when it was on the order of terahashes, and the continual and sustained growth I see in mining is mind-boggling).

So I don’t think 1B adults is unlikely. Not at all.

But what I do think is that if that happens then ASICs will be developed for these sorts of operations (assuming a delayed relocation actually somehow causes loss of income): for generating keypairs, and definitely for signature operations. But it may take some time, and maybe the network won’t become that large. It’s all speculation.

Sure, but if it goes the other way, up by a factor of ten beyond what was expected, then it becomes 5h; another factor of ten makes it 2d, and this sort of growth is common in these networks. So I think it warrants some consideration. 2d to relocate a node? I’d be worried.

I suppose the risks of delayed relocation need to be further explored too; maybe delays are relatively benign.

I suppose it should also be considered that the name must match, ie sha3(pubkey), so the hashing slows the search down considerably. I don’t think the /dev/random vs /dev/urandom performance difference matters in this case, so sourcing entropy shouldn’t be a factor. I’m not saying there is definitely a problem, but I do think it warrants careful thought since I could see it becoming an issue.
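To make the search cost concrete, here is a toy brute-force loop (my own sketch: random bytes stand in for real keypair generation, and `prefix_bits`/`target` are illustrative parameters) counting attempts until a name lands in a target slice of the address space, with and without the sha3 step:

```python
import hashlib
import os

def search_name_with_prefix(prefix_bits: int, target: int, use_hash: bool) -> int:
    """Count candidates tried until the name falls under the target prefix.
    Random 32-byte strings stand in for public keys; generating real
    keypairs (and proving ownership with signatures) costs more per try."""
    attempts = 0
    while True:
        attempts += 1
        pubkey = os.urandom(32)  # placeholder for a freshly generated public key
        name = hashlib.sha3_256(pubkey).digest() if use_hash else pubkey
        if int.from_bytes(name, "big") >> (256 - prefix_bits) == target:
            return attempts
```

Either way the expected number of attempts is about 2^prefix_bits; the sha3 step only adds per-attempt cost, which is the point being weighed up here.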

6 Likes

We have also tested the distribution of public keys and sha3 hashes; they are pretty much equivalent, so the name will be just the public key and not the hash of it. This means we can ignore the hash cost. Just more info, not disputing we do need to be careful. If it were to become an issue then I feel there are a lot of ways around this that we could adopt. I am looking at some tricks here regarding that very issue for a different reason right now, mind you. Anyway, hope this info helps.

4 Likes

This means a vault can be both an adult and an elder at the same time, so ‘non-elder adults’ is a distinct category from ‘adults’ according to these definitions. ie there are 4 distinct categories: infants, non-elder adults, elders, and adults (which is non-elder adults plus elder adults).

‘non-elder adults’ is a bit awkward so I propose using four category names

  • infants: age <4 (same as original)
  • adults: age >=4 (same as original)
  • elders: groupsize oldest peers in section (same as original, can be infants and/or adults)
  • bystanders: non-elder adults (open to suggestions for bystander!)

Vaults can be in multiple categories at the same time, eg adult bystander.
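A tiny classification helper (hypothetical, just to pin the proposed names down; age thresholds per the list above) showing how a vault can sit in more than one category at once:

```python
def categories(age: int, is_elder: bool) -> set:
    """Return the set of category names for a vault under the proposed
    four-way naming. A sketch of the proposal above, not network code."""
    cats = set()
    cats.add("infant" if age < 4 else "adult")
    if is_elder:
        cats.add("elder")
    elif age >= 4:
        cats.add("bystander")  # a non-elder adult
    return cats
```

For example an age-5 non-elder would be an adult bystander, while an age-2 elder would be an infant elder.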

I think ‘adults’ is used in a way to mean ‘non-elder adults’, right? ie bystanders in the above categories?

Please can the use of terms be settled and used consistently. Let’s not use ‘adult’ interchangeably for ‘non-elder adults’ and ‘non-elder adults plus elder adults’. This inconsistency is taking a lot of needless effort to interpret the ‘adult’ category when it’s used differently in context-specific ways.

If I’ve missed something and it’s simpler than I make out please let me know because I’m feeling like reading ‘adult’ is harder than it needs to be.

1 Like

Not sure what you mean here. An Adult is an Elder iff it is one of the GROUP SIZE oldest Adults, but it cannot be both at the same time. It may be the definition is worded in a confusing way though.

Here what we are saying is that the section has come down to GroupSize Elders and there is no Adult to promote if we were to lose an Elder, so we are in a danger zone and will merge now.

Hope that helps.

[Edit] I think I see the issue and yes, it is possibly unclear wording. The reason the definition is worded that way is that, when the network starts, there may only be Infants to make up the Elder group. At this stage we have an incomplete group and all peers may be Elders. If we have a complete group, though, then only non-infants would be Elders.

5 Likes

Ah I see so ‘adult’ is

Adult: in a section with a complete group, a peer with age >4 that is not an elder. They are only connected to their section Elders.

Thanks for the clarification.

There’s a difference between ‘adult’ and ‘vault age >4’… this is my misunderstanding. I had thought they were the same but they are not.

7 Likes

Some exploration of the ageing process…

Purpose

The purpose of ageing is to “ensure that nodes are relocated and therefore dilute their impact on the network across the address range” so “that an attacker or large participant in the network has an amount of work that must be carried out that is prohibitively expensive” (rfc-0045 node ageing).

I would paraphrase this as ‘balancing inclusiveness vs restrictions’.

There was an issue with the recently proposed ageing mechanism (ie the first post in this topic) that “made the network accept barely any new peers” (2017-12-14 update).

This post explores one potential cause and solution for the issue.

Cause

The simulation below indicates the issue is “it’s too difficult to age” rather than “too many nodes are disallowed”; the disallowing is a symptom, not a cause.

The ‘problem’ must be looked at from the perspective of ‘a prohibitively expensive amount of work’ required to ‘impact the network’. Extreme difficulty in ageing is counterproductive. Even though it makes it expensive to impact the network it also makes it expensive to participate in the first place. Ageing protections must not come at the cost of inclusiveness.

Making it expensive to join the network goes against the ‘everyone’ part of ‘Secure Access For Everyone’.

The aim of ageing is to make it expensive to impact the network without making it expensive to participate in the network. Maybe this is not the aim? Or not possible to achieve? Anyway, that’s a different discussion perhaps. Let’s look at the ageing mechanism.

Simulation

One of the major changes between the rfc and the new proposal is that originally the oldest possible vault would be aged, whereas now the youngest possible vault is aged.

The following tables show the difference between a preference toward younger vs older vaults.

A simulation of these two different preferences is shown below. It involves creating a network of 100K vaults (500K joins interleaved with 400K departures).

YOUNGEST PREFERRED

age vaults
  1  29712
  2  67461
  3   2827

100000 total vaults
  6186 total sections

OLDEST PREFERRED

age vaults
  1   4274
  2  16337
  3  38242
  4  21057
  5  10171
  6   4759
  7   2677
  8   1366
  9    691
 10    287
 11    113
 12     22
 13      3
 14      1

100000 total vaults
  1607 total sections
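The selection rule behind these tables can be sketched as a single-section toy model (my own simplification, not the actual simulator: relocations, departures and the disallow rule are ignored). On each event a new age-1 vault joins, the event is hashed, and the first vault matching `H % 2^age == 0` in the chosen ordering is aged:

```python
import hashlib
import random

def run(prefer_youngest: bool, events: int = 2000, seed: int = 1) -> dict:
    """Toy single-section ageing model. Returns {age: vault_count}."""
    rng = random.Random(seed)
    ages = []
    for _ in range(events):
        ages.append(1)  # a new vault joins on each event
        raw = rng.getrandbits(64).to_bytes(8, "big")
        h = int.from_bytes(hashlib.sha3_256(raw).digest(), "big")
        # Scan vaults youngest-first or oldest-first and age the first match.
        order = sorted(range(len(ages)), key=lambda j: ages[j],
                       reverse=not prefer_youngest)
        for idx in order:
            if h % (2 ** ages[idx]) == 0:
                ages[idx] += 1  # aged (relocation abstracted away)
                break
    dist = {}
    for a in ages:
        dist[a] = dist.get(a, 0) + 1
    return dist
```

With youngest-preferred an age-1 vault always exists and matches any even hash first, so older vaults effectively never age; oldest-preferred reproduces the roughly-halving counts per age seen in the second table.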

Caveats

Youngest-preferred never hits the disallow rule, since a Complete Section is never formed.

For oldest-preferred it was necessary to remove the disallow rule because the simulation got stuck at a single Complete Section which disallowed every new vault. I think the disallow rule needs tweaking or removing.

I suspect if the youngest-preferred eventually forms a Complete Section the disallow rule would also need to be tweaked or removed.

Thoughts

The rfc calls for an “exponentially increasing period between such relocations” which youngest-preferred does not achieve (but oldest-preferred does).

Youngest-preferred means an old vault can only age if its section has both a) no younger vaults and b) it ‘matches’ the hash of the network event. This is an almost-impossible condition to meet for old vaults (>3) because young vaults continue to accumulate in the section. It’s much harder than exponentially difficult.

Oldest-preferred seems to better match the intended difficulty of ageing and leads to an ‘expected’ age distribution. Ageing is exponentially more difficult as vaults get older, but is still common for younger vaults to age (ie exponentially easier the younger the vault is).

The key concept being affected by the youngest-preferred mechanism is ‘difficulty’. Old (>3) vaults are nearly impossible to age, rather than merely exponentially difficult to age.

I think perhaps most important is to keep the intended purpose in mind. Simulations show the intuitive approach to ‘age youngest first’ does not achieve the intended purpose. I’m not saying oldest-preferred is necessarily the best solution, but it does seem to me that youngest-preferred does not function as intended because it’s far too hard to become an adult.

Other possible things to try (though I think they would not replace the oldest-preferred mechanism) are

  • age all vaults in the section that match the network event rather than just one, but still only relocate one vault to avoid massive churn.

  • tweak the disallow algorithm to have a gradually-stricter-rate-of-disallowing rather than a strict wall. This allows more chances for relocations and thus more chances for ageing.
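The second idea might look something like this (purely hypothetical; the `soft_limit`/`hard_limit` parameters and the linear ramp are my own assumptions, not anything proposed so far):

```python
import random

def allow_new_infant(current_infants: int, soft_limit: int,
                     hard_limit: int, rng: random.Random) -> bool:
    """Gradual disallow sketch: always accept below soft_limit, always
    reject at or above hard_limit, and in between accept with a
    probability that falls off linearly toward zero."""
    if current_infants < soft_limit:
        return True
    if current_infants >= hard_limit:
        return False
    p = (hard_limit - current_infants) / (hard_limit - soft_limit)
    return rng.random() < p
```

A gradient like this keeps some relocations (and thus ageing chances) flowing instead of stopping them dead at a wall.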

It’s complex. But it always comes back to balancing inclusiveness vs restrictions. I think the simulations show youngest-preferred fails to achieve this balance, but oldest-preferred does a good job.

Just something to mull over during the holiday season… looking forward to seeing what comes from maidsafe in the new year.

14 Likes

Good work there again @mav

I might nitpick one sentence. The reason for pointing this out is that it can alter one’s view of things.

The S.A.F.E. network is geared for access: using the network and its facilities. Whereas being part of the network itself is not access but being a “worker”.

The reason for making the distinction is that it is valid to have some who cannot join as a node, since they would slow things down, and if there are too many the network becomes slow and not so useful.

Mind you, your post deals with a much greater problem and is certainly good.

Good work @mav

4 Likes

Nice work again @mav and good pick up @rob. Yes, this is more farming than participating, so a valid point. The network will, and probably should, refuse any new peer if it does not need it. This is really a security issue, to prevent a bad actor flooding the network with a bunch of new nodes at any point in time. Ageing really helps out there.

I agree with @mav’s simulation though; the age-youngest change slipped in recently and the rfc was forgotten about. It did need more thought before attempting to change the rfc. The answer, I feel, will be to balance age throughout the network/sections. So something along the lines of

  • Age oldest first.
  • Allow age number of peers of the same age per section

The second point is where the final pieces need to fit, really. It sounds like the network won’t allow many new nodes, but that is OK, unless it needs them. So then we need to consider this need. This is really defined (IMO) as “peers must be able to hold and process the data clients want to use”. This is actually subtle in node ageing, as the relocation process first ensures a peer can hold the data required for a section. It must also be able to process it somewhat, as it needs to keep supplying “stuff” to clients or be killed from a section (and the network), to start again.
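My reading of the “allow age number of peers of the same age per section” rule, as a sketch (not a spec; representing a section as a list of peer ages is my own assumption):

```python
from collections import Counter

def can_accept(section_ages, new_age: int) -> bool:
    """Balance-rule sketch: a section holds at most `age` peers of any
    given age, e.g. at most 1 age-1 peer, 2 age-2 peers, and so on."""
    return Counter(section_ages)[new_age] < new_age
```

This naturally throttles the influx of brand-new peers (only one slot at age 1) while leaving room for a balanced spread of older ones.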

So @mav nailed this one (again) and we should not change an RFC component as easily as we did during that last update. Seems obvious, but time is our enemy and we need to be 100% sure before accepting an RFC, and even more so if we wish to change a parameter. Nice one Ian, man we must owe you several beers by now :wink:

8 Likes

As do I. I feared that by mentioning the nitpick it would overshadow how much I agreed with @mav and the good work he did.

And if I can mention, so it’s not forgotten in all this: people adding nodes will desire and expect that their nodes will become a useful part of the network within a reasonable amount of time. You know, farming etc.

So there has to be consideration for security foremost, but also for the ability of new nodes to progress to the point where they are farming. Really, keep the node supplier happy (assuming their node meets min specs).

Even though the network may not need new nodes, we perhaps should distinguish between “not needed” and “do not add more”.

So there is a point where no more nodes are needed, and then a point where we cannot securely have any more nodes. This gives a range: after reaching the point where we do not need new nodes, we are still adding new nodes (more slowly the more are added) until we cannot securely add any more.

Thus the network is adding farmers (nodes) even though they are not strictly needed.

6 Likes

Yes, don’t get me wrong, the network will not stop peers being added per se; what it does (or should do) is limit the number of infants, to really slowly let adults join. So a sudden mass-join effort by bad actors will be thwarted; slowly accepting nodes means the network accepts them over a period and not all at once. So a bad actor cannot wait for a quiet time and flood the network, but slowly gets peers added. The longer the duration, the more likely other peers are being added by either good folk or other bad actors (which is OK). So there is a balance, but the limit is really to prevent a mass join all at one time by a single entity.

6 Likes

“Excellent” (with the finger tapping)

4 Likes

@bart, I wanted to play with your simulator, but the results are not correct: there is always only one section. Is there anything special to do to make it work, or is your repository currently in a bad state?

@mav, you have entirely rewritten the test in Go, that’s very impressive.

5 Likes

I had originally developed the simulation for a few different purposes before the ageing changes came out. I have some big plans for the simulation codebase. Hopefully those results will come to fruition early next year. So far ageing has taken a good chunk of time away from the original intentions (but in the best possible way).

6 Likes

@tfa, Yes - that’s because it simulates an older set of rules that would cause such behaviour. We have changed the rules since then, but there wasn’t time to update the simulator. I’ll probably get to that at some point, but for now we have some tasks with higher priority. Sorry it’s not as useful as it could be!

2 Likes

What part of the OP were you referring to? The “Split” section, “Static group and quorum sizes”, somewhere else?

It’s in the Block States > Dead(PeerId) > Trigger for voting:

“we take the Hash of the event H. Then if H % 2^age == 0 for any peer (sorted by age ascending) in our section, we relocate this node to the neighbour that has the lowest number of peers.”

The sorting by age ascending indicates younger nodes are preferred for relocation. But rfc0045 says to prefer older nodes, so there is a conflict.
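For concreteness, the quoted trigger as I read it (a sketch; representing peers as (name, age) pairs is my own assumption):

```python
import hashlib

def choose_relocation(event: bytes, peers):
    """Take H = sha3(event); scan peers sorted by age ascending and
    pick the first for which H % 2**age == 0. Returns the chosen
    (name, age) pair, or None if no peer matches."""
    h = int.from_bytes(hashlib.sha3_256(event).digest(), "big")
    for name, age in sorted(peers, key=lambda p: p[1]):
        if h % (2 ** age) == 0:
            return (name, age)
    return None
```

Since any hash matching an older peer is necessarily even, an age-1 peer (if present) always wins the scan, which is the youngest-preferred behaviour discussed above.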

I think I have found the problems in the algorithm:

  • The sections that have a shorter prefix should be chosen preferentially as the target of relocations, even if they have more nodes. This is the main problem and it is only one line to modify.

  • There is a mysterious hard-coded rule that suppresses node ageing when the event concerns a non-adult node in a section with a prefix length > 4; it doesn’t do any good.

With these modifications the simulation is beginning to fly (with 75 sections created). It is not yet perfect because there are still a lot of young node rejections (70%), but the situation is unblocked.

I also think that relocations which change the owning section during a split or a merge shouldn’t trigger other relocations in cascade, to avoid destabilizing newly created sections. This means that the Gone and Live events generated by split or merge operations shouldn’t count for node ageing. The result is almost the same: 71 sections, but there are fewer merge events (280 instead of 377), which demonstrates a more stable network. This one is debatable.

I have just uploaded my modifications to my fork, in case you want to study them, but I haven’t analyzed all the consequences.

5 Likes

Thanks for the clarification. Instinctively, I would think that you would want to relocate an elder rather than a young node to an unhealthy neighbouring section. The reason for this is that if a neighbour is already in trouble, you don’t want to send an untrusted adversary its way. This seems like a preferred option as long as the section from which the chosen elder is leaving has enough elders to remain healthy. I may have my thinking backward on this, but this is the sentiment that first comes to mind…

1 Like

Agreed. I would maybe rephrase ‘untrusted adversary’ as ‘less trusted adversary’, but otherwise I think the reasoning is ok.

Also there is this reason:

I didn’t explain why this is needed: what we currently observe during the simulations are successive splits that create a completely unbalanced network, for example () => 0 & 1, 0 => 00 & 01, 01 => 010 & 011, 010 => 0100 & 0101, 0101 => 01010 & 01011. Here there are 6 sections: 1, 00, 011, 0100, 01010 and 01011, with distinct prefix lengths instead of similar ones. At some point the shortest prefix needs a merge, which triggers a generalized merge of all the sections that recreates the section with the zero-length prefix ().

And then the cycle begins again. This creates an oscillating phenomenon, from 1 section to a few and back.

By selecting the section with the shortest prefix as the relocation target, we ensure that a section with a long prefix won’t be split, and so an unbalanced network is not created.

Of course, if there are several sections with the shortest prefix, then the smallest section must be selected among them.
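The combined selection could be sketched as follows (my reading of the rule; representing the candidate sections as a prefix-to-size map is an assumption):

```python
def pick_target(sections):
    """Choose the relocation target: shortest prefix first, and among
    equally short prefixes the section with the fewest peers.
    `sections` maps a prefix string like "010" to its peer count."""
    return min(sections, key=lambda p: (len(p), sections[p]))
```

Sorting by (prefix length, size) in one key keeps the priority order explicit: balance the prefix tree first, then balance section sizes.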

5 Likes