Content addressing RDF

happybeing · June 17, 2020, 10:55am

The following paper describes a way to efficiently create a canonical RDF serialisation so that any combination of RDF triples becomes inherently content addressable (which would increase deduplication).

https://openengiadina.net/papers/content-addressable-rdf.html

The author posted about this in the Solid forum.

The use case stated in the paper is to improve reliability by decentralising access to an RDF resource based on content addressing.

If SAFE core or client libraries were to use a method which canonicalises RDF, or provide this as a built in option, it would be interesting to understand the use cases and benefits. The obvious one is deduplication, but I wonder if there are others.

I’m not sure this would work with much of the current SAFE RDF scheme as is, because of the structured way RDF is currently stored, but I wonder if the approach from the paper could be worked into the SAFE libraries in a way that achieves the same ends. Thoughts?

Greater deduplication is a minimum benefit, and I hope there are further use cases I haven’t the foggiest about yet. Any thoughts?

danda · June 24, 2020, 3:33am

hmm, I looked over the proposal briefly. In general I like the idea of making RDF content addressable as it seems more compatible with the SAFE Network. Though I didn’t understand something fundamental in the proposal I think:

Given a Fragment Graph (a grouping of RDF statements), the canonical representation can be found. We can compute the hash of the canonical representation. This hash can be used as an identifier for the statements in the Fragment Graph.

Ok, so how is the grouping of statements chosen? And also those statements have their own identifiers, which in the examples seem to be regular URIs. So it seemed like we took regular RDF statements with regular URIs, (arbitrarily?) chose some subset in a turtle serialization, hashed it to come up with an identifier, and then applied more facts to the hashed identifier, also serialized in turtle.

Probably I need to re-read in greater detail.

Anyway, a thought I had while reading is that a SAFE XorUrl is already a content addressed URI. So we can make a SafeUrl from the same group of RDF statements (fragment graph) and then we don’t really need the blake2b hash…

Also, the SafeURL and related content should never go away, if SAFE Network is working as intended.

pukkamustard · June 30, 2020, 9:37am

Hi @danda,

Author of the write-up here.

Thanks! I’m very interested in learning more about how SAFE Network works.

You’re right, there is a bit of circularity involved. There are two use-cases for making RDF content-addressable. Differentiating them more clearly may make the idea of a Fragment Graph clearer. First, a short clarification on what a Fragment Graph is.

Given a set of RDF triples that form a graph, a Fragment Graph is defined for a base subject. The base subject can be an arbitrary URI that does not have a fragment part. Examples of possible base subjects are:

https://example.com/a/b
urn:sha256:abcd1231322

The following cannot be base subjects of a Fragment Graph (as they have a URI fragment part):

https://example.com/a/b#something
urn:sha256:abcd1231322#something-else

A Fragment Graph for a base subject consists of all triples where the base subject is the subject, plus all triples where a fragment of the base subject is the subject (i.e. triples with https://example.com/a/b#something in subject position are also part of the Fragment Graph).

So the definition of a Fragment Graph (the grouping of statements) depends on a base subject and the set of all triples available. It does not depend on the type of URI: it works for URLs or URNs.
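As a rough illustration of the definition above, here is a minimal Python sketch of selecting a Fragment Graph from a set of triples. It assumes triples are modelled as plain (subject, predicate, object) string tuples. The helper names `is_base_subject` and `fragment_graph` are hypothetical and are not taken from the write-up or from the js-eris code.

```python
def is_base_subject(uri):
    """A base subject is any URI (URL or URN) without a fragment part."""
    return "#" not in uri


def fragment_graph(triples, base_subject):
    """Return the triples whose subject is the base subject or one of its fragments.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    This tuple model is an assumption made for illustration only.
    """
    if not is_base_subject(base_subject):
        raise ValueError(f"base subject must not have a fragment part: {base_subject}")
    selected = []
    for subject, predicate, obj in triples:
        # Strip the fragment part (if any) and compare with the base subject.
        subject_base, _, _ = subject.partition("#")
        if subject_base == base_subject:
            selected.append((subject, predicate, obj))
    return selected


# Example data; the predicates and literals are placeholders.
triples = [
    ("https://example.com/a/b", "http://purl.org/dc/terms/title", "Example"),
    ("https://example.com/a/b#something", "http://www.w3.org/2000/01/rdf-schema#label", "A fragment"),
    ("https://example.com/other", "http://purl.org/dc/terms/title", "Unrelated"),
]
graph = fragment_graph(triples, "https://example.com/a/b")
```

Here `graph` contains the first two triples only. The third has a different base subject, so it belongs to a different Fragment Graph.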

The choice of grouping makes it easy to compute a hash-based URN for the base-subject (and thus also the fragment parts) that can replace the original URIs.

This already indicates the use-case “make existing RDF data content-addressable” - replace an existing URL.

The other use-case (which I believe is not so well described in the write-up) is new content.

When creating new RDF content that should be content-addressed we do not need to define an initial base-subject URI. With a suitable Fragment Graph container we can just add statements (without stating the base-subject). Once all statements have been added we finalize the Fragment Graph by computing the hash of the canonical representation and use this hash URN as base subject.
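The "add statements, then finalize" flow can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `FragmentGraphBuilder` is hypothetical, the "canonical representation" is a simple stand-in (sorted, tab-joined lines) rather than the serialisation defined in the write-up, and the `urn:blake2b:` prefix with a hex digest is an assumed URN layout. Only `hashlib.blake2b` is a real library call.

```python
import hashlib


class FragmentGraphBuilder:
    """Hypothetical container: collects statements without a base subject."""

    def __init__(self):
        # Each statement is (fragment, predicate, object); fragment "" means
        # the statement is about the (not yet known) base subject itself.
        self._statements = set()

    def add(self, predicate, obj, fragment=""):
        self._statements.add((fragment, predicate, obj))

    def canonical_bytes(self):
        # Stand-in canonical form (an assumption, not the paper's format):
        # one tab-joined line per statement, sorted so that insertion order
        # does not affect the result.
        lines = sorted("\t".join(statement) for statement in self._statements)
        return "\n".join(lines).encode("utf-8")

    def finalize(self):
        """Hash the canonical form and use the hash URN as base subject."""
        digest = hashlib.blake2b(self.canonical_bytes(), digest_size=32).hexdigest()
        base_subject = f"urn:blake2b:{digest}"  # assumed URN layout
        triples = [
            (base_subject + ("#" + fragment if fragment else ""), predicate, obj)
            for fragment, predicate, obj in sorted(self._statements)
        ]
        return base_subject, triples


builder = FragmentGraphBuilder()
builder.add("http://purl.org/dc/terms/title", "A note")
builder.add("http://www.w3.org/2000/01/rdf-schema#label", "part one", fragment="part-1")
base_subject, triples = builder.finalize()
```

Because the statements are sorted before hashing, two builders holding the same statements produce the same base subject regardless of insertion order, which is the property that makes the identifier content-addressed.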

An implementation (in JavaScript) of such a suitable Fragment Graph container: examples/web-demo/src/rdf/fragment-graph.js · master · openEngiadina / js-eris · GitLab

Thanks for pointing out this lack of clarity in the write-up. I will see how I can address this in a revised version. I hope this answers your question?

As mentioned I am very interested in technical details of how SAFE Network works. Unfortunately I have had a hard time finding the right resources to read into. Could you point me in the right direction? In particular I’m interested in the content-addressing part (SafeURL?) and how RDF is handled.

happybeing · June 30, 2020, 10:19am

Hi @pukkamustard, great to have you here. Thanks for explaining more.

Until @danda has a chance to respond I suggest trying the primer to get up to speed on the technical side of the network. Some implementation details have changed, but it’s still a good introduction to the technology and how it works.

The content addressing is fundamental to this, both for chunks of data and to refer to higher level structures such as an immutable file. The network provides a very large virtual address space, of ‘xor addresses’ which you can read about in the primer.

You may also be interested in the APIs, specifically around RDF. These are not documented to that level yet, but you can get a good idea of the functionality and how it is organised at the API level from the SAFE CLI documentation which is a thin layer on top of the native APIs.

You will though find some discussions on use and implementation of RDF within the APIs here and on the community forum, and there is at least one demo application by Maidsafe, but I think the ideas have crystallised quite a bit since that was built. I think that the FilesContainer is now implemented as RDF, and possibly also the NRS (name resolution system).

There have been discussions about implementing RDF in order to support SPARQL querying as a native capability, but that will be for the future, and is not specified yet.

So the FilesContainer is probably the best example of how SAFE uses RDF under the hood, although RDF or any other serialised data can of course be stored on the network. The plan as I understand it is to ensure that the SAFE API allows developers to select the data representation they wish to work with, whether JSON, JSON-LD, or perhaps Turtle, but I’m less clear on this.

Hope that helps.