Secure Random Relocation

In point 4 of the drawbacks

You seem to be missing part of this sentence. “with the an amount of time”

1 Like

Excellent work again @mav

I think this is an area of your excellent proposal that needs consideration.

In stats, as you likely know, with a large enough sample of sections there will be some sections that end up getting very few relocations to them.

So I feel we would need to consider methods to quickly relocate vaults that end up in very healthy sections. So maybe after a short time (a certain number of section events) in a section with more than enough nodes, the vault is relocated again.

Thus there is a tendency for vaults to end up in the sections actually needing more nodes.

So there would need to be a mechanism to stop the vault being relocated too many times because of this.

3 Likes

HDKeypairs is an excellent idea, it satisfies another area we have not covered yet. So very cool indeed.

I think part 2 is extremely promising as a randomising step for relocate. I need to think further though (trying to sleep off the flu just now). The depth can actually be calculated based on the age (the age will increase by 1 on a relocate, potentially allowing age to be a depth factor).

The trigger is important. We already have the mod division trick with PeerLost type events and that is fine, but it does not allow the network to grow properly. Put and Update of data though is interesting. @Viv and I were discussing this earlier today and it does allow much more/faster ageing, while being less likely to be game-able, as long as we do not use mod division of an event hash for that part. I mean we must not allow a client to do an action knowing the trigger value will age one of its own peers. However for data we have all the items and versions already, so there is no need for mod division or traversing the chain.

Note: On that traversing part, just imagine for now that data floats to the top of the chain, so each node joining gets all the data and must sign it; thereby we know its resources are good enough and we have its sigs for the new elders (if it were to be an elder). So resource proof is actually testing if a peer can do the appropriate work, pretty nice. Anyway it means all the data is signed by the current elders and not spread through the chain. That allows many things like efficient pruning of the chain etc.

From our current perspective though it means we can say we have X data items and Y versions (which we can add, for instance) to figure out Put-based relocate triggers, which would be in addition to churn triggers (which are good for DOS mitigation amongst other things).

So with these triggers, the hash of a churn event or Put/Edit request (which costs safecoin, another nice effect) gives us a random relocate. However, as with the current routing master (i.e. tested in the field), it would perhaps be nice to also allow balanced relocate. This gives the best of both worlds for new/small networks, but matters less as the network grows.

This last part is the point of discussion here I think, whether to drop balanced relocate or not. There are strong arguments for doing so from this proposal, but we have seen issues with small networks in testnets. However perhaps we need to start much larger testnets before allowing churn (i.e. we invest and start several thousand nodes instead of 100 or so).

I hope these initial thoughts help, when my head is more clear I will make much more sense, honest :wink:

Anyway extremely good proposal Ian, I like it a lot and, as they say in Galway, it's not a million miles away from Mary Heurons

Great when the proposals come from folk who understand eventual consistency, it's like a breath of fresh air. I suspect this is very close to what is implemented actually. The HDkeys do need a bit of investigation, but it is defo something like this I was needing for another bit further down the road, so thanks a lot for that one.

14 Likes

I’m still not up to speed on all the details of how the network implements relocation.
However, based on my intuition, @mav's description along with @rob's comments sounds similar to Poisson disk sampling. In terms of geometry, Poisson disk sampling has a lot of nice properties, and it might be interesting to consider from an XOR space perspective.

https://www.jasondavies.com/maps/random-points/


6 Likes

Predictable depths such as age (or constant 0 as per the original post) are not going to be secure for this because they allow brute-forcing not just of the current section but of all future sections!

So there must be randomness in the depth. This adds a lot of strength since brute-forcing becomes much harder.

Predictable depth using age for depth (bad)
Age  Derivation Path     Possible Keys
1    m/1                 1
2    m/1/2               1
3    m/1/2/3             1
...
n    m/1/2/3/.../n       1

Random depth r between 0 and R (good)
Age  Derivation Path     Possible Keys
1    m/r1                R
2    m/r1/r2             R^2
3    m/r1/r2/r3          R^3
...
n    m/r1/r2/r3/.../rn   R^n

R can be chosen depending on how much work is desired to prevent brute-forcing the future possible sections.
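To make the growth of that search space concrete, here's a minimal sketch (not the real routing code, and assuming the sha2 crate) where child derivation is simulated as hash(parent || index); a real scheme would use proper HD public derivation, but the counting argument is the same.

```rust
use sha2::{Digest, Sha256};

// Simulated child derivation: hash(parent_key || index). Stand-in only.
fn derive_child(parent: &[u8; 32], index: u32) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(parent);
    hasher.update(index.to_be_bytes());
    hasher.finalize().into()
}

fn main() {
    let master = [0u8; 32]; // placeholder master key
    let r_max: u32 = 256;   // R: size of the random index range at each level
    let age: u32 = 4;       // n: number of relocations so far

    // Predictable path m/1/2/3/4 -> exactly one possible future key.
    let mut predictable = master;
    for i in 1..=age {
        predictable = derive_child(&predictable, i);
    }
    println!("deterministic final key starts with {:02x}", predictable[0]);

    // Random path m/r1/r2/r3/r4 -> R^n keys an attacker must cover.
    let future_keys = (r_max as u128).pow(age);
    println!("with random depth there are {} possible keys", future_keys);
}
```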

9 Likes

We can have both if the destination section generated from your proposal (either option 1 or 2) changes the relocation address to another one belonging to a neighbouring section of the destination section that is more in need of a new node. I think something similar is currently implemented in routing code.

So the choice is not limited to relocating a node to either a neighbouring unhealthy section or a random distant section. We can have both if the destination section changes the address to the least healthy of its neighbours (and itself).

6 Likes

Superb @mav, wish I had something relevant to add but I do love reading this stuff and I guess some of it sticks. Well done yet again.

It’s also reassuring to me - and others I’m sure - when we have this level of dialogue with David and the team. It’s more valuable than any white paper, in judging the health :wink: of the project.

7 Likes

So back out of bed for a wee while. From what I can see we have a few options; like much of routing design these days it's choosing one of a few, most of which reach a good level of security. I will try and break it down a wee bit to keep the convo on track. @mav I don't want to burden you here so don't feel any pressure to back off or get more involved, just be where you are comfy, but your input is superb, really superb.

Relocation Trigger

This should alert the section that a peer of a certain age should relocate. So far we have simulated using churn only and this is not really enough to allow growth of the network, as it ties membership to age and tends towards stagnation. So something else that is non-deterministic but random needs to be found. This is not a case of the group creating random data, as that is too hard and needs a lot of rounds of agreement, plus deciding who goes first creating it, etc. Basically it is much better to find randomness in normal operation; that is fast, secure and reliable.

For churn randomness we use a mod division trick: mod div the churn event's hash with node age (oldest to youngest). This is good and means we do not need to traverse huge chunks of the chain to look at how many events each node has been around for and make calculations on the exact count etc. So the mod div works a treat here, especially as nodes are unlikely to win anything by self sacrifice in favour of another node that is an attacking partner in a section. So nice and neat, but not enough events to allow age to grow.
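As a rough sketch (the exact rule here, hash mod 2^age == 0 checked oldest to youngest, is an assumption for illustration and may not match the routing code precisely):

```rust
use sha2::{Digest, Sha256};

struct Node {
    name: String,
    age: u32,
}

// Pick the relocation candidate for a churn event: the oldest node whose age
// satisfies hash(event) mod 2^age == 0, or None if no node qualifies.
fn churn_relocation_candidate<'a>(event: &[u8], nodes: &'a [Node]) -> Option<&'a Node> {
    let digest = Sha256::digest(event);
    // Interpret the first 16 bytes of the hash as an integer for the mod check.
    let h = u128::from_be_bytes(digest[..16].try_into().unwrap());

    // Oldest to youngest, so the oldest eligible node is chosen.
    let mut by_age: Vec<&Node> = nodes.iter().collect();
    by_age.sort_by(|a, b| b.age.cmp(&a.age));
    by_age
        .into_iter()
        .find(|n| n.age < 128 && h % (1u128 << n.age) == 0)
}
```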

So we add in another event series to allow this. The obvious one is Get, but that is game-able as an attacker's clients can Get stuff a lot to have age grow for attack nodes in a group (not as simple really, but it can happen). Therefore we choose mutation events. These are good as they cost safecoin to happen. Now though, if we use the mod div rule here the attacker could manipulate a data item to cause the relocate, so we cannot do that. Instead we actually count data items stored and the version numbers of them all. This gives us a nice, hard-to-game solution.

There are some small issues here though, as this count is ever growing, so it needs a small trick to deal with young nodes who will have an age less than the total data. The count has to run from when a node appears until it ages, and not simply be the total count of data, if you see what I mean. Anyway that is a small (but important) issue.
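A tiny sketch of that counting idea; the threshold of 2^age mutations used here is purely an assumption for illustration, the key point being that the count starts from when the node appeared rather than from the total data held:

```rust
// Sketch only: per-node mutation counting for a Put/Edit based relocation trigger.
struct TrackedPeer {
    age: u32,
    // Puts/Edits (items stored + version bumps) observed since this peer appeared.
    mutations_since_joining: u64,
}

impl TrackedPeer {
    fn record_mutation(&mut self) {
        self.mutations_since_joining += 1;
    }

    // Assumed rule: relocate once the node has witnessed 2^age mutations.
    fn ready_to_relocate(&self) -> bool {
        self.age < 64 && self.mutations_since_joining >= 1u64 << self.age
    }
}
```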

Relocation mechanism

This is where there are a few ideas that complete/complement each other, and I think we can break them down like this.

Use the relocation trigger hash as random input to the algorithm.

Algorithm A - Use HDkeys

Here the node will be given some data and has to use it to create (very easily) a new public key based on this input data. This removes work from the key generation.

Pro - less work for node [ but we are not sure the work required is all that difficult anyway ]
Con - Does not allow balanced relocate [ We are not certain balanced relocate is required in large enough networks ]

Algorithm B - Use normal keys and force a range

Here we tell the node to create a key in the range of a section we will relocate it to (a rough sketch of that search follows). As we are defining the destination, the network can target the section the node goes to.
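Sketch of what that key-in-range generation looks like, modelling a node's address as a random 32-byte value (a real vault would generate ed25519 keypairs and use the public key, and the rand crate is assumed); the expected number of attempts is about 2^(prefix length):

```rust
// Does the candidate address fall under the target section's bit prefix?
fn matches_prefix(name: &[u8; 32], prefix_bits: &[bool]) -> bool {
    prefix_bits.iter().enumerate().all(|(i, &bit)| {
        ((name[i / 8] >> (7 - (i % 8))) & 1 == 1) == bit
    })
}

// Keep generating random addresses until one lands in the target range.
fn brute_force_name_in_range(prefix_bits: &[bool]) -> ([u8; 32], u64) {
    let mut attempts = 0u64;
    loop {
        attempts += 1;
        let candidate: [u8; 32] = rand::random();
        if matches_prefix(&candidate, prefix_bits) {
            return (candidate, attempts);
        }
    }
}
```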

Pro: Can allow balanced relocate [ again, in a large enough network this may not be required ]
Con: May be difficult when the network is huge [ is this good or bad? ]. It also gives the attacker some “space” to create a key in; they may find a pattern there that benefits them, such as selecting keys that will all split together (but we defeat that right now by using the internal group range to locate the node to).

These last 2 points are the ones we really need to dig deeper into. I don't see either of them as weaker than the other right now, so we should dive deeper. I suspect that the more scope an attacker has to create keys, the worse it is, but that opinion could rightly be challenged.

I hope this helps.

[Edit, in terms of key generation work, if we do switch to dalek rust lib for this (which I want to) then we can create approx 32 million keys per second on a commodity pc, which I think we can improve on, so I do not think that the “work” is going to be a blocker in any case. ]

9 Likes

I have added step 3a to the HDKeypair proposal

3a (option targeting step). Instead of generating one key at the random depth (eg key 185) the section can generate ten keys starting at the random depth (eg keys 185, 186, 187… 194). The ten sections corresponding to these pubkeys can be checked for their health and the one with the most need for a relocation can be selected.
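A small sketch of step 3a, where derive_child and section_health are assumed stand-ins (not real routing APIs) for deriving the child pubkey at a given index and looking up the health of the section owning that key (lower score meaning less healthy):

```rust
// Derive ten consecutive candidate keys starting at the random depth and pick
// the one whose destination section most needs a new node.
fn pick_relocation_target<F, G>(start_index: u32, derive_child: F, section_health: G) -> [u8; 32]
where
    F: Fn(u32) -> [u8; 32],  // assumed: child pubkey at index i
    G: Fn(&[u8; 32]) -> u32, // assumed: health score, lower = more in need
{
    (start_index..start_index + 10)
        .map(derive_child)
        .min_by_key(|key| section_health(key))
        .expect("ten candidate keys were generated")
}
```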

Can Neighbours Be Removed?

The HDKeypair proposal requires eliminating the idea of neighbours. Is that even possible?

The good side of neighbours

Neighbours are a simple way to determine ‘the right to negotiate’. It’s easy for a section to check whether a proposed relocation is valid, since it will come from a section in that neighbourhood. This neighbourhood check is very simple and fast (one bit different in the prefix).
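For illustration, that 'one bit different in the prefix' test, with prefixes modelled as plain bit vectors (a simplification of the real prefix type):

```rust
// Two section prefixes are neighbours if they differ in exactly one bit over
// the length of the shorter prefix.
fn are_neighbours(a: &[bool], b: &[bool]) -> bool {
    let common = a.len().min(b.len());
    let differing_bits = a[..common]
        .iter()
        .zip(&b[..common])
        .filter(|(x, y)| x != y)
        .count();
    differing_bits == 1
}
```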

Relocation requests can’t be spoofed because they’re signed by the neighbouring section’s elders. Since all neighbour elders are already known by every neighbouring section, it’s easy to verify their signatures without needing any extra network activity.

The bad side of neighbours

The problem with neighbours is that relocations require brute-forcing a keypair that matches the neighbour section. There’s no simple operation on an existing privkey that allows it to produce a neighbouring pubkey. So brute force is required, which scales exponentially and may cause problems for relocations on large networks.

Replacing Neighbours with HDKeypairs

Determining ‘the right to negotiate’ is very important, since it filters out the need for a lot of network activity. Only valid requests need to undergo the negotiation process. Random relocation means the neighbour check for validity of a relocation is no longer relevant; the request may come from anywhere on the network. There must be a fast local-only way to judge whether an incoming relocation has the right to negotiate.

This can be achieved with HDkeypairs. The request to relocate will contain the target vault’s current pubkey, current hdpubkey, and random depth chosen by the section. This allows the new section to derive the new pubkey and confirm it belongs to the section. If that derivation doesn’t fit the section, ‘the right to negotiate’ is lost and the request is rejected. If the derivation matches, the section can confirm the random depth is chosen correctly (maybe involving a datachain lookup) and the elder signatures match the elders of that section (via network lookup) and this would confirm the relocation validity.
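A sketch of that local check, where derive_pubkey_at_depth is an assumed stand-in for real HD public derivation and the section prefix is again modelled as bits; only if this cheap check passes would the more expensive depth and elder-signature checks run:

```rust
// Contents of an incoming relocation request, as described above.
struct RelocationRequest {
    current_pubkey: [u8; 32], // carried for the later network lookups, not used locally here
    hd_pubkey: [u8; 32],
    random_depth: u32,
}

// Does a key fall under our section's bit prefix?
fn key_matches_prefix(key: &[u8; 32], prefix: &[bool]) -> bool {
    prefix.iter().enumerate().all(|(i, &bit)| {
        ((key[i / 8] >> (7 - (i % 8))) & 1 == 1) == bit
    })
}

// Local-only 'right to negotiate' check: derive the would-be new pubkey and
// confirm it belongs to this section before doing any network lookups.
fn has_right_to_negotiate<F>(
    req: &RelocationRequest,
    our_prefix: &[bool],
    derive_pubkey_at_depth: F, // assumed HD public derivation helper
) -> bool
where
    F: Fn(&[u8; 32], u32) -> [u8; 32],
{
    let new_key = derive_pubkey_at_depth(&req.hd_pubkey, req.random_depth);
    key_matches_prefix(&new_key, our_prefix)
}
```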

This makes it hard to submit fake relocation requests to a section, even though the section has no idea who the elders are in the requesting section. If an attacker does decide to brute force a relocation request that’s valid but for a non-existent vault, the request would still be discarded after some network lookups to discover whether the request contains data for vaults that don’t exist. The cost to create a fake request is much higher than the cost needed to invalidate it (i.e. the cost of network lookups).

So I think that whilst neighbour-based relocation is great because it makes spoofed relocation attacks impossible, random relocation via hdkeypairs makes them ‘very hard’ to the point where they should be unviable (this needs further investigation of the cost/benefit of the attack though).

Scaling performance as the network grows

I’ve been advocating very strongly to make it ‘easy to join’, but here I’m taking a step back and considering the other side (just leading with this disclaimer to avoid any confusion from mixed messages).

The number of neighbours grows with network size, and thus the number of connections to maintain also grows with network size.

There are also more hops required for every request to get to its destination. This takes more work to process each request.

So I think there is a strong case that as the network grows it should require more work before being able to join and participate, since there is more work required for completing the day-to-day operations. The added difficulty of brute-forcing longer and longer prefixes is actually not that bad, since vaults need to have higher performance just to do regular network operations.

Even though the individual actions that provide end-user operation don’t change in difficulty, the sum of those activities across the network does increase. So the individual actions must always be getting faster as the network grows to provide a consistent level of performance to users.

If vaults never improve their performance the network gets slower as it grows.

I do feel there’s a need for measured performance improvements of vaults as the network grows. This may come via punishment for not performing well enough, but maybe it can be a little more proactive than that.

That’s not to say performance needs to improve exponentially (as required to brute-force pubkeys). It’s also not to say the fundamental operations should have built-in unscalable design.

My current thinking is the best balance for this problem is to implement constant-time fundamentals (eg HD key generation) with adjustable difficulties pinned to those operations (eg adjustable-difficulty PoW or PoStake or PoSignature or proof-of-X).

I’m proposing a decoupling, but I haven’t yet decided on words for the concepts being decoupled. This is still a primitive idea in my mind. But I can agree performance improvements need to be considered alongside improving inclusiveness.

As @Viv said in the other forum, “Allowing everyone to join with ease while it sounds great might also not be the best for the network if these nodes didn’t help much but increase the hops for general operation.”

3 Likes

This is interesting and I agree. To add, the neighbour-based mechanism also does something else though: it makes secure message xfer … well, secure. As a message goes from group to group (log n groups) we can transfer messages across the network securely. This can include the members of a remote group amongst other things (basically anything). So just to note that.

100% agree.

This is something we are aware of and I advocate as well. We talk a lot about small nodes joining and I agree, but I also caveat that a small node in 5 years would be a supercomputer today (with Moore's law etc.). So I see your notion/dilemma/observation here, but just a note that this network cannot design or accommodate for today, but for tomorrow. So if the network grows huge in a few years the minimum requirement may seem high today, but tiny then. Like an RPi3 is very powerful today compared with the RPi1. Never mind the efficiency savings code will make. Again though, just a note.

So upgrades to hardware will follow network growth, if that makes sense.

In the near future I see dynamic group sizes with varying requirements on ability; for now though I think this decoupling would be something we strive for. Not to delay launch, as it won't, but to make the system simpler. I am hugely behind breaking things down and decoupling (even really enticing coupled algorithms).

Anyway, this is not so much an answer as a bunch of notions, that I hope helps.

4 Likes

Poisson in French means fish. :slight_smile: Adapted to the SAFE network, what might work is to think about “schooling”, i.e. containing the boundaries of the randomness generated by the Poisson random distribution with advice (the current and forward-trending/predicted state of areas of the network) gleaned from router information, which considers a larger statistical sample of current network sickness or health across a randomized sampling of areas. For example, a couple of newly defined areas which show up as sick would be included in the bigger school along with more than a couple of healthy areas, with all of them treated together as the overall collective target (school) of possible candidates for relocation of the vault. So depending on the degree of sickness or health, the network can randomly skew “the spew” of new vaults towards the newly created school of targets, then shortly thereafter randomly take another set of area samples of sickness/health (which also include nearby areas, both sick and healthy) to create a new set of targets for random vault relocations.

The key is to keep changing up the schools… at random time intervals.

We did this to great effect at Cloakware when I worked there: we essentially created new builds of target mobile devices with different hardware- and software-associated locks for every release, randomly distributing the devices to different geo locations and through different channels, all the time at random intervals, so that even if one device got cracked (and it never did) it would have had to be with a huge supercomputer… Cloakware is now Irdeto (a Dutch/Chinese firm with the brains in Ottawa, Canada) but they have kept the Cloakware brand name. Back in 2007 and 2008 we were doing this for the biggest company names that start with A and N, for millions of mobile devices (and collecting hack-free bonuses per quarter doing it). The basic premise is the same, only it would be done in real network time for SAFE, so this Poisson approach could definitely work for SAFE with some clever integration of what the routers know at the time vis-à-vis network sickness/health, to create a random, fair relocation of the vault, keeping anyone from dominating to win more Safecoin… again my 2 cents, nice suggestion! It surely would fit with what @mav is suggesting R2

1 Like

An interesting point for sure.

From my own perspective, I have a notion that this might better suit vaults with a track record rather than newly-joined ones. What I mean is that if we’re able to have the network maintain health metrics on sections of the network, it’d be well placed to identify optimal vaults to relocate so that the best balance is met (i.e. it doesn’t relocate an exceptionally healthy vault to a weaker area if that would cause the source section to become sicker than the target one)

Relocating vaults in any non-random way though always makes me twitchy! (in-house, I’m less a fan of targeted relocation than some of the rest of the team I think) It opens the door to attackers just that little bit wider :slight_smile:

5 Likes

Am I wrong, or are you proposing to completely eliminate the concept of neighbourhood in the Safe network? (Basically changing an essential part of the Disjoint Sections RFC.)

Because the concept of neighbourhood, as I see it, is more than “the right to negotiate”. It is the cornerstone that structures the entire network and, among other things, allows:

- A uniform distribution and a structure that allows knowing the distance between groups, the right directions to send messages, and the number of hops.
- A split or join of sections with clear rules.
- A growth of the routing table that is known in advance (one more section each time the network doubles in size).
- An approximate idea of the total size of the network.

I’m concerned about the possible negative consequences of such a fundamental change.
In fact, I still do not understand why we cannot use HDKeypairs for relocations (a brilliant idea) while maintaining the concept of neighbourhood.

1 Like

I saw it as removing the neighbourhood (limitation) for relocating a node. This is what the proposal is about.

I didn’t see any need to remove neighbourhood for the other functionality/workings.

1 Like

@mav : a superb post again :slight_smile:

Just wondering regarding the brute force key generation part.
If the generation time is a concern, a vault could generate key-pairs in a background thread during free time and pick one that fits the relocation requirement.
So basically, with a small-sized network the key-pairs are generated on demand; with a large-sized network, key-pairs are prepared in advance.
This way, the storage capacity also somehow gets evaluated.
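As a rough sketch of that background preparation, using a bounded std channel as the pool and random 32-byte values as stand-ins for real keypairs (the rand crate is assumed):

```rust
use std::sync::mpsc;
use std::thread;

// Spawn a worker that keeps a bounded pool of pre-generated candidates the
// vault can search when a relocation target prefix arrives.
fn spawn_keygen_pool(pool_size: usize) -> mpsc::Receiver<[u8; 32]> {
    let (tx, rx) = mpsc::sync_channel(pool_size); // bounded: blocks once the pool is full
    thread::spawn(move || loop {
        let candidate: [u8; 32] = rand::random();
        if tx.send(candidate).is_err() {
            break; // receiver dropped, stop generating
        }
    });
    rx
}
```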

Also, the vault needs to prepare a key-pair each time the prefix length changes, or after a relocation. So the vault is also undertaking measured performance improvements as the network grows while carrying out normal network duty?

4 Likes

HDKeypairs is news to me, so I have a bunch of research to do there, but on the face of it sounds like a great solution. I’m personally in favour of random relocation for much the same reasons as @mav (reduced target address space and key generation expense), but as with many topics there’s loads of debate in-house about it. I agree with the keygen calculations in Data Chains - Deeper Dive - #28 by mav and actually wrote a quick test to confirm these.

Perhaps as @tfa suggests, a combination of random then targeted would be OK, although any deviation from pure random makes me uneasy.

There was an idea a while back (not even sure if it was discussed in-house, let alone properly documented) to have the section agree a random 256 bit number and this would be XORed with the relocating peer’s public key to provide its new address. This has the drawback of probably requiring the random portion to be part of the peer’s name, doubling the name’s size, but otherwise is conceptually similar in many ways to @mav’s proposal.
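For reference, that XOR construction is tiny (sketch only):

```rust
// New address = relocating peer's public key XOR a section-agreed 256-bit random value.
fn xor_relocation_address(pubkey: &[u8; 32], section_random: &[u8; 32]) -> [u8; 32] {
    let mut new_addr = [0u8; 32];
    for i in 0..32 {
        new_addr[i] = pubkey[i] ^ section_random[i];
    }
    new_addr
}
```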

I’m less convinced about this though:

I think a single malicious vault would be able to wait for all its peers to present their random bytes, then craft its own bytes to suit whatever target address it wants. As @dirvine mentioned, the hash of a churn event or Put/Edit request is probably a better source.

8 Likes

We were discussing this in the other forum too a couple of days back, but in that post I was also meaning recursion for balance after the first hop.

Still think this might have to almost be an AND of those factors rather than OR personally to prevent stalling, but want to see how the simulation results come out to give that some context. Hopefully @bart gets those soon-ish :wink:

8 Likes

No. Sorry I missed a vital bit of context to the post:

“The HDKeypair proposal requires eliminating the idea of neighbours for relocations.”

Thanks for bringing up this ambiguity.

The current secure messaging using neighbours would still apply to all other messaging.

I admit to slightly overreaching on the relocation request idea. Using a different messaging mechanism is probably not necessary and it’s probably better to just use the existing messaging. But I wanted to put the idea out because who knows what it might influence in other ways.

Agreed. Key generation can happen in the background which is a nice property to have, especially since unused keys may be kept and used later if needed. No work need ever be discarded.

But there’s no bound on what keys are needed, since neighbours may be found recursively so it’s not enough just to generate a key for each current neighbour section.

This means there should always be a constant background thread finding keys. That seems a bit ugly.

It eventually leads to the unpleasant idea that if the network becomes very large there’ll probably be a ‘market for keys’ pop up where you’ll be able to buy a key matching the relocation section from someone who has brute forced it on powerful specialized hardware. This breaks security because the private key is no longer secret. Who would buy a key?! The same people who set their bitcoin transaction fees to 0 then wonder why ‘it doesn’t work’… I think someone just wanting to earn safecoin would gladly buy a precomputed key if it means they can remove a roadblock to earning more safecoin. Overall I think the pubkey-brute-force concept potentially comes with some very ugly incentives.

My main point with hdkeypairs is not so much ‘can it be made cheaper’ but ‘we should use constant time instead of linear or exponential time algorithms’ where we can. It’s a scalability thing, not an economics thing.

True, but the performance for ‘generate a keypair’ isn’t really meaningful to the day-to-day requirement of ‘managing data’. So I think separating ‘measuring performance’ from ‘network operations’ is a good idea. It’s a bit arbitrary to use keypair generation as measure for performance.

Good point. Agreed.

I have also done a quick test using iancoleman/ed25519_brute_force (which uses rust_sodium).

2014 laptop  i7-4500U @ 1.80GHz - 5851 keys per second

2017 desktop i7-7700  @ 3.60GHz - 9119 keys per second

Might tidy up the reporting a bit and put it on the main forum to crowdsource some data.

6 Likes

I suspect this point is critical in your proposal and not really understood.

We did a bit of testing today and 10,000 keys per second with dalek (nightly) seems reasonable. Be nice to test that on a rpi3, I may run one up.

Also, using the hash of mutation events and churn events will give us the random data for depth. This will be an agreed (consensus) random data element.

3 Likes

It's all about recycling the randomly generated distribution map for the vaults needing redeployment in a randomly timed way, and randomly determining the geography for the new vault deployment, so the layers of the onion skin are always changing…

1 Like