It appears that the answer to my question begins here, https://github.com/maidsafe/routing/blob/master/src/routing_table/mod.rs#L523, where all sections of a routing table (https://github.com/maidsafe/routing/blob/master/src/routing_table/mod.rs#L309) are iterated over to find the closest nodes to a given XorName.
Sections, https://github.com/maidsafe/routing/blob/master/src/routing_table/mod.rs#L132, are kept in a BTreeMap, where each section's Prefix key corresponds to a tuple value composed of a version and a BTreeSet of the section's member names.
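To make that shape concrete, here is a minimal sketch; the XorName and Prefix stand-ins below are my own simplifications, not the crate's actual definitions:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Simplified stand-ins for the crate's XorName and Prefix<XorName> types.
type XorName = [u8; 32];
type Prefix = Vec<bool>; // a bit-prefix of an XorName

// Roughly the shape described above: each section's prefix maps to a
// (version, set of member names) pair.
type Sections = BTreeMap<Prefix, (u64, BTreeSet<XorName>)>;
```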
I’m reading about BTreeMap performance:
> In theory, a binary search tree (BST) is the optimal choice for a sorted map, as a perfectly balanced BST performs the theoretical minimum amount of comparisons necessary to find an element (log_2(n)). However, in practice the way this is done is very inefficient for modern computer architectures.
>
> Currently, our implementation simply performs naive linear search. This provides excellent performance on small nodes of elements which are cheap to compare. However, in the future we would like to further explore choosing the optimal search strategy based on the choice of B, and possibly other factors. Using linear search, searching for a random element is expected to take O(B · log_B(n)) comparisons, which is generally worse than a BST. In practice, however, performance is excellent.
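The "naive linear search" mentioned there happens inside a single B-tree node, which is just a small sorted array of keys. A minimal sketch of that idea, assuming a plain slice of keys (this is not std's actual code):

```rust
// Within one node, scan the small sorted key array linearly instead of
// binary-searching it; for small, cheap-to-compare keys this is fast and
// cache-friendly.
fn linear_search_node(keys: &[u32], target: u32) -> Result<usize, usize> {
    for (i, &k) in keys.iter().enumerate() {
        if k == target {
            return Ok(i); // found at index i
        }
        if k > target {
            return Err(i); // not in this node; descend into child i
        }
    }
    Err(keys.len()) // descend into the last child
}

fn main() {
    let node = [3u32, 8, 15, 42];
    assert_eq!(linear_search_node(&node, 15), Ok(2));
    assert_eq!(linear_search_node(&node, 10), Err(2));
}
```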
It appears that the answer is that we are using linear search and that it performs excellently. I was originally assuming that all possible IDs in the 256-bit range were being traversed to find the closest nodes to a given XorName. On the contrary, only known names are being compared, section by section.
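A hedged sketch of that idea, reusing the simplified aliases from the sketch above (my own stand-ins, not the crate's real implementation): gather the names every section already knows about and sort them by XOR distance to the target.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Same simplified stand-ins as in the earlier sketch.
type XorName = [u8; 32];
type Prefix = Vec<bool>;
type Sections = BTreeMap<Prefix, (u64, BTreeSet<XorName>)>;

/// Collect every name the table already knows about, section by section,
/// and sort by XOR distance to `target`. Only known names are compared;
/// the full 256-bit space is never traversed.
fn closest_names(sections: &Sections, target: &XorName, count: usize) -> Vec<XorName> {
    let mut names: Vec<XorName> = sections
        .values()
        .flat_map(|(_version, members)| members.iter().copied())
        .collect();
    // XOR distance, compared byte-wise with the most significant byte first.
    names.sort_by_key(|name| {
        let mut dist = [0u8; 32];
        for (d, (a, b)) in dist.iter_mut().zip(name.iter().zip(target.iter())) {
            *d = a ^ b;
        }
        dist
    });
    names.truncate(count);
    names
}
```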
Further reading, not for deep research but just to become basically acquainted:
- Big O notation: this appears to express the performance cost of operations?
- Binary Search Tree: BSTs seem to be compared to BTreeMaps in the Rust std collections docs. Are they simply a different type of data structure?
Using this hash collision calculator, http://everydayinternetstuff.com/2015/04/hash-collision-probability-calculator/, based on the article that @oetyng posted above:
With a 256-bit hash ID range, we hit the first non-zero probability of a hash collision, 4.440892098500626e-16, when 10 nonillion IDs, 10_000_000_000_000_000_000_000_000_000_000, have been created.
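As a rough sanity check on that figure, the usual birthday-bound approximation p ≈ n² / (2 · 2^256) lands in the same ballpark (this is my own back-of-envelope check, not the calculator's exact formula):

```rust
fn main() {
    // Birthday-bound approximation: p ≈ n^2 / (2 * 2^256).
    let n: f64 = 1e31;               // ~10^31 IDs created
    let space: f64 = 2f64.powi(256); // size of the 256-bit name space
    let p = n * n / (2.0 * space);
    println!("approximate collision probability: {p:e}"); // ~4.3e-16
}
```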
It’s currently estimated that 2.5 exabytes of data are created on the internet every day. Assuming, incorrectly, that this rate remains constant, because I don’t understand the factors behind its increase over time, it would take about 4 quintillion days,
4_000_000_000_000_000_000, or roughly 10^16 years, to reach 10 nonillion IDs. The assumption is that each 1 MB chunk of data has its own ID. I’m sure there is something I’ve missed or some miscalculation, but at the very least I can see the vast amount of time it would take for hash collisions to even become a consideration.