How are key collisions handled?

DaBrown95 · March 22, 2018, 6:27pm

I have a relatively simple question with most likely a very complicated answer… How does the SAFE Network handle address collisions?

Just to set the scene: when data goes through self-encryption, the datamap contains the post-encryption hashes of the chunks. These hashes are the addresses at which the chunks are stored on the network, and they are how the chunks are accessed. What happens if there is a hash collision between two different chunks of data? Where are the two chunks then stored?

Thanks in advance

hunterlester · March 22, 2018, 10:14pm

I’m under the impression that this is handled with data deduplication.

In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk.
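As a rough illustration of deduplication in a content-addressed store, here is a minimal Python sketch. `ChunkStore` and its methods are hypothetical names invented for this example, not the self_encryption API. Note that it covers only identical chunks mapping to the same address; it does not by itself address two different chunks sharing a hash.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: a chunk's address is the hash of its bytes.

    Hypothetical illustration only, not the SAFE Network implementation.
    """

    def __init__(self):
        self._chunks = {}  # address (hex digest) -> chunk bytes

    def put(self, chunk: bytes) -> str:
        address = hashlib.sha3_256(chunk).hexdigest()
        # Identical content always hashes to the same address, so a repeat
        # upload stores nothing new and just returns the existing address.
        self._chunks.setdefault(address, chunk)
        return address

    def get(self, address: str) -> bytes:
        return self._chunks[address]

    def __len__(self) -> int:
        return len(self._chunks)


store = ChunkStore()
a1 = store.put(b"hello world")
a2 = store.put(b"hello world")     # duplicate content, same address
a3 = store.put(b"something else")  # different content, different address
```

After these three calls the store holds two chunks, because the duplicate upload resolved to the address that already existed.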

See the self_encryption readme as well.

Someone please correct me here (@Nadia )

dirvine · March 22, 2018, 11:07pm

This is all about probabilities: how secure a hash is determines the likelihood of a collision. The more secure the hash, the less likely a collision. Avoidance is easy but expensive: it means hashing everything and comparing. It is not needed with secure hashing (as we use).
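To make "hash everything and compare" concrete, here is a small sketch under stated assumptions. `VerifyingStore`, `address_of`, and the `digest_bytes` parameter are hypothetical names for this example. The digest is deliberately truncated to one byte in the demo so that a collision can actually be found; with the full 32-byte digest the comparison branch would essentially never fire.

```python
import hashlib


def address_of(chunk: bytes, digest_bytes: int = 32) -> bytes:
    # digest_bytes is a hypothetical knob: truncating the digest makes
    # collisions easy to produce, for demonstration purposes only.
    return hashlib.sha3_256(chunk).digest()[:digest_bytes]


class VerifyingStore:
    """Store that compares bytes whenever an address is already occupied."""

    def __init__(self, digest_bytes: int = 32):
        self._digest_bytes = digest_bytes
        self._chunks = {}

    def put(self, chunk: bytes) -> bytes:
        address = address_of(chunk, self._digest_bytes)
        existing = self._chunks.get(address)
        if existing is None:
            self._chunks[address] = chunk
        elif existing != chunk:
            # Same address, different bytes: a genuine hash collision.
            raise ValueError("hash collision at address " + address.hex())
        return address


def find_collision(digest_bytes: int):
    """Brute-force two distinct inputs that share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        chunk = str(counter).encode()
        address = address_of(chunk, digest_bytes)
        if address in seen:
            return seen[address], chunk
        seen[address] = chunk
        counter += 1


# With a 1-byte digest there are only 256 addresses, so a collision
# turns up within at most 257 attempts.
first, second = find_collision(1)
weak_store = VerifyingStore(digest_bytes=1)
weak_store.put(first)
try:
    weak_store.put(second)
    collision_detected = False
except ValueError:
    collision_detected = True
```

The cost mentioned above is visible here: every write to an occupied address needs the stored bytes fetched and compared, which is the work a secure hash lets the network skip.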

tl;dr

We could if we wished, but we imagine it is not needed.

digipl · March 23, 2018, 10:43am

Let’s do some numbers…

SHA3-256 offers 128 bits of collision resistance, i.e. roughly 3.4e+38 hash evaluations before a collision becomes likely.

If each human stores 10,000 chunks of data each day for 100 years, the total number of chunks stored is about 3e+18, so you have, at most, roughly one chance in 1e+20 of finding a collision.
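These figures can be sanity-checked with the standard birthday approximation, p ≈ 1 − exp(−n² / 2^(b+1)) for n items and a b-bit hash. The sketch below assumes a world population of 7.5e9 (a value not stated in the thread) and an ideal hash. Because the probability scales with n², the result comes out far smaller than one in 1e+20, on the order of 1e-41.

```python
import math

# Assumption: roughly the 2018 world population; not given in the thread.
population = 7.5e9
chunks_per_day = 10_000
days = 365 * 100


def collision_probability(n: float, bits: int) -> float:
    """Birthday approximation for n items hashed to `bits` bits."""
    x = n * n / 2.0 ** (bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-x)


total_chunks = population * chunks_per_day * days
p_256 = collision_probability(total_chunks, 256)
p_512 = collision_probability(total_chunks, 512)

print(f"total chunks: {total_chunks:.2e}")
print(f"P(collision), 256-bit hash: {p_256:.2e}")
print(f"P(collision), 512-bit hash: {p_512:.2e}")
```

This treats SHA3 output as uniformly random and only models accidental collisions, not a deliberate attack.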

Good chance…

And if the network begins to grow exponentially, it can switch, at any moment, to SHA3-512, which offers 256 bits or about 1.16e+77 collision resistance. That is within a few orders of magnitude of common estimates of the number of atoms in the observable universe.

DaBrown95 · March 24, 2018, 11:14am

Thanks for your replies everyone.

It sure is.

jlpell · March 24, 2018, 11:37am

Is it really possible to switch to SHA3-512 at any moment later? I was under the impression that whatever is chosen at network launch would be fixed for all time (until SAFE 2.0). I tried to get some clarity on this in the regular forum, in the thread "Why are 256bits 'enough'?" (Development, Safe Network Forum), but maybe someone could address this here. I’m a big fan of SHA3-512 for the logical consistency alone. It seemed like MaidSafe was originally doing SHA512 and then switched to SHA256 at some point in time… for some reason I have yet to identify…

DaBrown95 · March 24, 2018, 11:59am

Yeah, I am curious too as to how feasible this is.

@jlpell, your post on the regular forum is fabulous. I am also very curious as to why the choice was made to stick with SHA3-256 vs a ‘stronger’ algorithm. NIST do recommend that SHA3-512 is the minimum for “Digital signatures and hash-only applications”.

rob · March 24, 2018, 3:13pm

Nah, more likely we don’t know for sure. I have a number of things asked with no answer too and I’ve been around a while. (edit: maybe I read you wrong here)

As to changing: I guess that, like any system running two hash sizes, there will be a version identifier, and which code is used will depend on that identifier. Maybe a conversion will even happen when data is recopied, so that eventually the shorter version will be phased out.

digipl · March 24, 2018, 5:04pm

According to an old post by Irvine from when the switch from 512 to 256 was made, yes. But how the network could handle this, and other upgrades, is unknown. At least, judging by the GitHub commit, the downgrade did not seem too complex to perform.
https://github.com/maidsafe/safe_client_libs/commit/c665c9ba393da9198d3959c4d12f66aa8be5f702

jlpell · March 24, 2018, 5:27pm

Wouldn’t it be far easier to just go with SHA3-512 from the beginning and forget about it? Given marketing optics alone, you end up with the situation of a 256-bit Safe Network and some competitive forker’s 512-bit “Safer Network”. I could be wrong, but unless the devs are doing XOR addressing differently from what I’m familiar with, it shouldn’t really matter for routing performance. The only concern I can think of with regard to SHA3-256 vs. SHA3-512 is software performance on mobile/ARM devices. However, there are ASICs now that implement all this in hardware at hyper speed, and mobiles are quite a bit faster now than ~2 years ago when the change was made. I think it is reasonable to prioritize/optimize for a hardware platform that consists of x86 desktop/laptop/server tech. I don’t see why you/we would want to reduce network capacity because of the shortcomings of a Raspberry Pi in 2016. Instincts tell me that network latency is going to be the biggest performance bottleneck for everyday mobile clients, not hash rate.

Thanks for the link to the github changes. Good to know when and where the change was made. Yes, seems like changes in central code were easy. However, I don’t think the same goes for a complete network upgrade in the future. Someone please point out if my apprehensions are misconceptions.

DaBrown95 · March 24, 2018, 5:30pm

I could totally see this being a factor that competitors could pick at; it’s quite low-hanging fruit.

Especially when NIST themselves recommend the usage of SHA3-512 I find it difficult to understand why it wouldn’t be used from the start.

jlpell · March 24, 2018, 5:45pm

Scenario A -

Evil genius waits for SAFE Network to launch.
Evil genius clones all code from MaidSafe repo on Github.
Evil genius runs the “find/replace ::sha256 → ::sha512” command in his/her Rust IDE of choice.
Evil genius launches The SAFER Network. (Secure Access For Everyone, Really)
Evil genius laughs/cackles in an evil way and asks their robot butler to warm some tomato soup with the laser cannon to celebrate.

If only I was good at “Austin Powers” memes…

Joking aside, there may have been a really good reason for the change. It would be great to get some clarification on this from the dev ( @AndreasF ? ) who actually made the changes.

dirvine · March 24, 2018, 5:58pm

Reasons right now are performance and message sizes. Prior to launch proper we will probably be using https://github.com/multiformats or something very similar. There are some issues with old data using the old hash, but if we see that the length or id bit corresponds to 256 or 512, we then use that algorithm. Right now all that concerns us is getting to launch, and these decisions become very small, but important.
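The idea of an id and length prefix selecting the algorithm can be sketched roughly as follows. This is a simplified, multihash-style layout written for illustration: `tagged_hash` and `verify` are hypothetical helpers, not the multiformats library API and not MaidSafe's design. The codes 0x16 and 0x14 are the values the multiformats table lists for SHA3-256 and SHA3-512. Real multihash encodes the code and length as varints; single bytes suffice here because both values are below 128.

```python
import hashlib

# code -> (hashlib algorithm name, digest length in bytes)
ALGORITHMS = {
    0x16: ("sha3_256", 32),
    0x14: ("sha3_512", 64),
}


def tagged_hash(data: bytes, code: int) -> bytes:
    """Return <code byte><length byte><digest> for the chosen algorithm."""
    name, length = ALGORITHMS[code]
    digest = hashlib.new(name, data).digest()
    return bytes([code, length]) + digest


def verify(data: bytes, tagged: bytes) -> bool:
    """Read the prefix to pick the algorithm, then recompute and compare."""
    code, length = tagged[0], tagged[1]
    if code not in ALGORITHMS or ALGORITHMS[code][1] != length:
        raise ValueError("unknown or malformed hash tag")
    if len(tagged) != 2 + length:
        raise ValueError("digest length does not match tag")
    return tagged_hash(data, code) == tagged


chunk = b"some chunk bytes"
old_style = tagged_hash(chunk, 0x16)  # 2-byte prefix + 32-byte digest
new_style = tagged_hash(chunk, 0x14)  # 2-byte prefix + 64-byte digest
```

Both tagged values verify against the same chunk, which is how old data hashed with the shorter algorithm could coexist with newer data, at the cost of two extra bytes per hash.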

rob · March 25, 2018, 1:43am

No disagreements since I don’t know the reasoning behind the switch. EDIT: just read @dirvine’s post