Suggestion: implement range queries for keys


#1

SAFE currently just has a way to exactly get a value by the hash.

Since there is a structure to the network and how the keys are looked up, I was thinking it should be possible to utilize this to be able to get keys in a range somehow.

A pure key/value store only has support for getting a key by a hash. Column-stores typically allows you to have ordered keys.

There is really two things that would be very useful here. One is to have a method for getting all keys in a range, the other is to have a method for getting the n keys closest to a key.

These two methods would allow lots of new querying capabilities, semantic hashing, using SAFE more like a column-store rather than just key/value store. Space-filling curves can be used to take multiple values to construct keys where semantically close values will give similar keys. All in all, lots of interesting and useful querying capabilities could be built on top of this on the application layer.

These methods could be used for getting MDs and there should be a range of type tags that support this kind of queries, while type tags outside of this range would support only the standard get by key.

So the methods would be something like

GetMDKeysByRange(fromKey, toKey, typeTag)

GetClosestsMDKeys(key, numHits, typeTag)

I found a couple papers dealing with related topics.

http://www.cslab.ece.ntua.gr/~dtsouma/index_files/comcom_chaz.pdf

Is there any reason something like this wouldn’t work?


#2

Hi @intrz, there is no reason why it wouldn’t work, in principle. The problem is how to define the order of the keys though. In practice, the keys are often encrypted, so ordering them for example lexicographically is meaningless.


#3

For public mutable data it would make sense to make unencrypted keys that could be ordered lexicographically. These MDs would be specifically made for querying anyways and could also point to other MDs or or immutable data. An app could use different ways of ordering and defining keys with different type tags.

The GetMDKeysByRange method might not be needed though, I think it could just be implemented on the APP side by using GetClosestsMDKeys.

If you for example want to find all employees between 25 and 60 years old and you have the two keys employees-20 and employees-50, then employees-20 would be an MD with all employes between 20 and 50 years old, plus the key of the next MD in the range, i.e. employees-50, which would have all employees older than 50.

The query would just then be find the first range closest to 25, which would be 20, then in that MD find the first employee that’s 25 years old and then follow the link to the next MD in the range and find all employees until finishing with the ones that are 60 years old.

In reality it would be a bit more tricky than this though, since it’s an XOR distance. The XOR distance between 25 and 20 is 13 and the XOR distance between 20 and 50 is 38, so it should work in the example above. However, the XOR distance between 27 and 28 is 7, while the XOR distance between 27 and 25 is 2. In this case if your query included 27, you would want to get 28, but would instead get 25, so it might not work for all applications and the keys and ways of representing queries would need to be thought through thoroughly.

There would be a lot of stuff to figure out on the app side, but something like GetClosestMDKeys could at least open up for experimenting with many different ways to query data.