Resource Description Framework implementations and 'Portable Data...': Some Links and learnings

joshuef · April 27, 2018, 5:30pm

Since @happybeing’s post I’ve been thinking for the last few weeks about how one could start fleshing out a SAFE implementation of this.

I was humming and hawing about replying over there, but this is (intended) to be a more general post on info about RDF implementation on SAFE (as opposed to SOLID). Focussing more on what’s out there and how we might best implement something RDF-y, that could then be used by other stacks (like SOLID) (in a similar way to the SAFE Plume implementation).

As for me, the RDF/turtle setup in Plume could be SAFEr (using MDs, for example, as opposed to ttl files and NFS emulation). But I’ve been struggling to figure out how/why you get RDF schemas, how you could easily reference them and how to do that in SAFE.

And soo, I’ve been reading deeper into things like IPLD and RDF/Turtle and now JSON-LD.

What follows is some learnings from my readings and some light opining.

@bochaco @Krishna @frabrunelle @happybeing FYIs. Feel free to correct me if I’m misunderstanding something here! And please opine! Just wanted to create a summary of some more useful content I’ve been reading.

IPLD:

IPLD is a way of ensuring references to specific ‘hash’ based content can be universally understood. (ipfs content, bitcoin blocks, git commits… anything).

Short takeaway:

We could have some parser to establish a way of linking to specific SAFE content (MD/ID, by xor address hash), as a way of referencing SAFE data on other platforms.
multiformats/hash/base/codec etc are cool. It’s a simple solution to avoiding problems in future that we should look into using throughout SAFE to future proof encodings/hashing etc. @nbaksalyar @ustulation, not sure if you have looked there/have thoughts? (Is multihash something we could use for the xor addresses, for eg?)
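To make the multihash idea concrete, here is a minimal sketch in Python. In the multihash format a digest is prefixed with a varint function code and a varint digest length, and sha2-256 uses code 0x12. The hash function here is an assumption for illustration only; which function SAFE uses for xor addresses is not stated in this thread.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash_sha2_256(data: bytes) -> bytes:
    """Return <varint code><varint length><digest> for a sha2-256 digest."""
    digest = hashlib.sha256(data).digest()
    return encode_varint(SHA2_256_CODE) + encode_varint(len(digest)) + digest


mh = multihash_sha2_256(b"hello SAFE")
print(mh.hex())
```

Because the prefix names the hash function and length, a reader of the address can tell how it was produced, which is the future-proofing benefit mentioned above.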

Longer:

It doesn’t (as I understand), offer much that we can add to SAFE (beyond using it to reference specific data, off of SAFE) once we have a URI structure in place for easily identifying content on the network (as we’ve been talking about). That would form part of the IPLD/SAFE implementation to make the data reference-able outwith of the network.

This talk was really informative for IPLD: https://www.youtube.com/watch?v=Bqs_LzBjQyk and I’d recommend giving it a watch for a decent overview.

RDF (Resource Description Framework).

https://www.w3.org/RDF/

This is the main way of ensuring that data you write can be interpreted easily across programs. And is one of the founding principles of SOLID.

Coupled with OWL, or something similar it enables you to describe your data with the aim of making it portable.

ie, saying "http://thing.com" is a link.

There are many ways to do this. And it’s a bit/lot overwhelming.

eg, @happybeing was using turtle as part of his presentation: Synergy - Project SOLID and SAFEnetwork

This is one option, and seems to be one of the more established ways of representing the data. However, it’s not the most human readable thing, and it requires specific parsing for use in web browsers, and to save to the network (or simply using NFS and storing the turtle files).

(more reading: https://stackoverflow.com/questions/1740341/what-is-the-difference-between-rdf-and-owl ; https://www.w3.org/standards/techs/rdf#w3c_all )

After digging in here, it was not clear to me even where to begin with proposing a data type for something on SAFE (or how to find something already existing, which we could reuse). Which, if we’re wanting to get devs going with some sort of RDF for data stored on SAFE, we’re going to need to make as easy as possible.*

[ * we as in community, not necessarily maidsafe.]

Enter JSON-LD

JSON-LD is another way of representing RDF schemas/content. But it seems like quite a clear one to me (and is fully compatible with the RDF standard).

An enlightening read on why it exists (in a world with RDF)
http://manu.sporny.org/2014/json-ld-origins-2/

and the basics are here:
https://www.w3.org/TR/json-ld/#basic-concepts

Being a JSON object, JSON-LD lends itself to a key:value storage system like mutable data, quite nicely.
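As a rough illustration of that fit, the sketch below maps each top-level key of a JSON-LD object to one key/value entry and back again. A plain Python dict stands in for a mutable data structure; this is not the SAFE API, and the `safe://` identifier is a made-up placeholder.

```python
import json
from typing import Dict


def jsonld_to_entries(doc: dict) -> Dict[bytes, bytes]:
    """Store each top-level JSON-LD key as one entry with a JSON-encoded value."""
    return {
        key.encode("utf-8"): json.dumps(value, sort_keys=True).encode("utf-8")
        for key, value in doc.items()
    }


def entries_to_jsonld(entries: Dict[bytes, bytes]) -> dict:
    """Rebuild the JSON-LD object from the stored entries."""
    return {
        key.decode("utf-8"): json.loads(value.decode("utf-8"))
        for key, value in entries.items()
    }


profile = {
    "@context": "https://json-ld.org/contexts/person.jsonld",
    "@id": "safe://example-profile",  # hypothetical identifier
    "name": "Example Person",
}

entries = jsonld_to_entries(profile)
restored = entries_to_jsonld(entries)
print(sorted(entries))
```

One consequence of this layout is that a single property can be read or updated without rewriting the whole document.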

To me it seems muuuuch simpler/clearer, AND it has some clearly defined resource descriptions (such as person: https://json-ld.org/contexts/person.jsonld ) ready for reuse. AAAand a handy-dandy tool for exploring the schemas https://json-ld.org/playground/

More reading: https://json-ld.org/learn.html

Soooooooo

I think, as a representation of RDF schemas, JSON-LD is leading the pack for me in terms of clarity / usability with the network.

We should probably look (if it’s desirable for the community), at hosting the schemas on the network once we have addressable content. To aid development (and prevent those nasty HTTP requests).

I’m not sure how to make things easier for devs though. How we encourage using such structures…? It seems to me something we want to incentivize in some form (maybe just by making it easy). I’ve no idea how though atm…

I think we should start spitballing around things that could be useful on SAFE. (Nothing gets the ball rolling like getting the ball rolling):

So I’m going to have a look at a User profile (as some sort of person extension). And what that might look like on SAFE (given that maybe you want to store your full profile, but not let everyone/app see your some parts of that (age or nationality, for example…).
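A minimal sketch of that split, assuming the profile is a flat JSON-LD object: properties named as private go into one document, everything else into another, and JSON-LD keywords (`@context`, `@id`) are copied into both so each part stands alone. The vocabulary URL and identifier are placeholders, and encrypting the private part is left out entirely.

```python
from typing import Iterable, Tuple


def split_profile(profile: dict, private_keys: Iterable[str]) -> Tuple[dict, dict]:
    """Split a flat JSON-LD profile into a public and a private document."""
    private_set = set(private_keys)
    public: dict = {}
    private: dict = {}
    for key, value in profile.items():
        if key.startswith("@"):
            # Keywords are kept in both parts so each is valid on its own.
            public[key] = value
            private[key] = value
        elif key in private_set:
            private[key] = value
        else:
            public[key] = value
    return public, private


profile = {
    "@context": {"@vocab": "https://example.org/vocab#"},  # placeholder vocabulary
    "@id": "safe://example-profile",  # hypothetical identifier
    "name": "Example Person",
    "age": 30,
    "nationality": "Scottish",
}

public_part, private_part = split_profile(profile, ["age", "nationality"])
print(public_part)
```

The private part could then be stored encrypted, or in a separate container that only selected apps are given access to.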

Anyway. I mostly wanted to share some learnings and links. As I feel like I have somewhere to start now!

happybeing · April 27, 2018, 7:34pm

Great work Josh, brilliant to see you digging into this and coming at it in a different way.

I confess I haven’t considered the merits of different representations much. I do recall reading some of the reasons why Turtle et al were favoured over XML and JSON-LD but most of it escapes me so it would be best to go over to the Linked Data gitter chat and ask there. It might be something to do with how triples are a fundamental of RDF and the other formats were not designed with them in mind, or maybe not!

The other thing to consider when assessing how to implement LD within SAFE is compatibility with what Tim et al refer to as the “Standard Web” - they pointed me at a spec of which I’ve only read a small part, but it guides everything according to Tim’s vision, and we should at least consider and discuss the implications of deviating from it (if indeed we are!). I don’t have the link to hand.

Really glad you’re jumping in here Josh. The water’s lovely

bochaco · May 14, 2018, 12:30pm

Very nice summary @joshuef, I just wanted to comment that I’ve been also researching around the multiformats and CID (one of the standards behind IPLD) for addressable content and they indeed look promising to me too. I was looking for a way to have the content type of the data and apparently you can have that with CID.

happybeing · May 14, 2018, 3:30pm

@joshuef:
An enlightening read on why it exists (in a world with RDF)
http://manu.sporny.org/2014/json-ld-origins-2/

I just read this and it occurs to me we’ll need to understand both the RDF/Semantic Web side (Turtle, SPARQL etc) and the Web Dev side (JSON-LD) before we can decide which to use where. Just reading that suggests the latter more for simplifying traditional Client - (complex) Server APIs, whereas RDF looks forward to new things assuming there’s Linked Data appearing from many sources. My impression is that the latter is happening so I’m not skeptical like Manu that there’s no opportunity here, on the contrary, I think it is a big opportunity based on already available SPARQL interfaces to Linked Data sources. But to me that’s not the main opportunity - I think separating apps from owning data is the killer feature that we want to get going on SAFE. Having the ability to switch apps without losing the data I create, or for any app to be able to pull in data I made with several other apps, and write it out again, creates enormous possibilities. Plus pulling in data from outside sources. So many possibilities

I do agree with Manu on W3C specs though - I used to think it was just me, but most of them seem impenetrable. The Solid / LDP ones excepted!

So maybe we should test out both technologies (and anything else if it seems important enough) in order to work out how best to apply them.

For example, maybe JSON-LD is good for providing an API layer of design patterns sitting directly on MD key-value stores, so for devs who might otherwise have chosen a solution using a MongoDB back end (as the author Manu Sporny talks about), while RDF/Turtle/SPARQL might be something supported mainly as a platform (ie Solid / LDP) for a class of apps which separate UI/App from data, and allow any app to read and write data of any other app.

I’m just speculating here, not having read anything about JSON-LD yet, but Manu is clear that he had no interest in the Semantic Web - which is what allows Solid to deliver independence of App and Data. So I’m seeing JSON-LD as traditional app architecture, rather than an alternative way of implementing Solid style apps.

So maybe we will have two threads of activity - one Solid focused, and the other making use of JSON-LD to build better and clearer APIs for more traditional style apps. I suspect there’s a role and a need for both.

There may be overlapping areas - for instance it would be great @joshuef if we could have a unified approach to user profiles on SAFE and on Solid, though I’m not sure this is possible. In SAFE we have a public ID, and on Solid WebID, which is a URI - so those are potentially equivalent. The latter is also a link to a profile. So far so good - if we can make a SAFE profile based on a public ID in a syntactically equivalent way to a WebID. That leaves only the content and format of a SAFE profile versus a Solid WebID profile, which is something I don’t know enough about but we can get input from Tim in things like this at any time. I can imagine difference in representation could be handled with requested format headers and conversions in a RESTful interface, but maybe that won’t be necessary. I digress!
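To illustrate the "requested format headers" idea, here is a small sketch that picks a serialisation from an HTTP Accept header. `text/turtle` and `application/ld+json` are the registered media types for Turtle and JSON-LD; the function itself is hypothetical, ignores wildcards such as `*/*`, and is not part of any SAFE or Solid library.

```python
from typing import Optional, Sequence


def choose_format(
    accept_header: str,
    supported: Sequence[str] = ("text/turtle", "application/ld+json"),
) -> Optional[str]:
    """Return the supported media type with the highest q-value, or None."""
    best: Optional[str] = None
    best_q = 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media_type in supported and q > best_q:
            best, best_q = media_type, q
    return best


print(choose_format("application/ld+json;q=0.8, text/turtle"))
```

A RESTful layer could call something like this first and then convert the stored profile into whichever representation the client asked for.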

Those are just some initial thoughts, so if I’m going off track based on some misconceptions, or anyone simply has another take, please share.

UPDATE:
Found some notes I made from Wikipedia in 2014:

RDF (Resource Description Framework)

Note: RDF triples are not entity-attribute-value (sky - colour - blue) but entity-relationship-value (sky - has - the colour blue)

SPARQL (SPARQL Protocol And RDF Query Language):

SPARQL allows users to write queries against data that can loosely be called “key-value” data, more specifically it is data that follows the RDF specification of the W3C. The entire database is thus a set of “subject-predicate-object” triples. This is analogous to some NoSQL databases’ usage of the term “document-key-value”, such as MongoDB.
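As a toy illustration of "the database is a set of triples", the sketch below keeps triples in a Python list and answers a single triple pattern, where `None` plays the role of a SPARQL variable. It is only a stand-in for the idea; real SPARQL supports joins across patterns, filters and much more, and the short names used here are made up rather than real IRIs.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the non-None parts of the pattern."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


db: List[Triple] = [
    ("sky", "hasColour", "blue"),
    ("sea", "hasColour", "blue"),
    ("grass", "hasColour", "green"),
]

# Roughly: SELECT ?s WHERE { ?s hasColour "blue" }
blue_things = [s for s, _, _ in match(db, p="hasColour", o="blue")]
print(blue_things)
```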

dirvine · May 14, 2018, 9:55pm

@Krishna can help here as well. In SAFE your public ID is supposed to be almost a URL, i.e. the location of a data type (7) that also shows the services of that ID. It can be your main ID or another of the 3 allowed. This should be able to be translated to any other ID type that is url like. I know Krishna has been looking into SOLID a wee bit so he may also have input here.

joshuef · May 23, 2018, 7:50am

@happybeing I’d understood something different about JSON-LD. What I took away from those pages was that JSON-LD is another RDF descriptor. Similar to Turtle, just a different format (for the reasons Manu gets into).

So Turtle / JSON-LD can be used interchangeably. All you need is something to parse that data.

And then with either of those (or the many other types of RDF) we can use those to perform SPARQL queries etc.
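A hand-rolled sketch of what "something to parse that data" means: it turns one flat JSON-LD object with an inline context into N-Triples lines (N-Triples being a simple line-based RDF syntax that is also valid Turtle). It only handles string literals and `{"@id": ...}` references, so a real application would want a proper JSON-LD processor instead. The example IRIs are placeholders apart from the FOAF property names.

```python
import json
from typing import List


def jsonld_to_ntriples(doc: dict) -> List[str]:
    """Convert a flat JSON-LD object with an inline term->IRI context to N-Triples."""
    context = doc.get("@context", {})
    subject = doc["@id"]
    lines = []
    for term, value in doc.items():
        if term.startswith("@"):
            continue  # keywords are not properties
        predicate = context[term]  # expand the short term to its full IRI
        if isinstance(value, dict) and "@id" in value:
            obj = "<%s>" % value["@id"]  # reference to another resource
        else:
            obj = json.dumps(str(value), ensure_ascii=False)  # quoted literal
        lines.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return lines


doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": "http://xmlns.com/foaf/0.1/homepage",
    },
    "@id": "http://example.org/people#josh",
    "name": "Josh",
    "homepage": {"@id": "http://example.org/"},
}

ntriples = jsonld_to_ntriples(doc)
print("\n".join(ntriples))
```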

Given that, I was suggesting JSON-LD as the RDF description for SAFE as it fits within mutable data so well. (One less abstraction than turtle which requires NFS emulation to store etc. Also could gain benefits for versioning [although there are arguments to be made for immutability of the data/ versions also… but that’s another thing]).

Tootally. I think if we have @bochaco’s suggestion for XOR accessible content on the network this becomes much easier. We can simply have our (whatever) form of RDF storage on the network and point to that for your WebID.

The question (for me) becomes how might we standardise that on the network to make this easy for users.

That could come in the form of a specific container, that might only house WebIDs, with a default one that might link to any other [unless otherwise specified by the app].

joshuef · May 23, 2018, 8:19am

edited ^ for the correct link re: xor urls

bzee · May 23, 2018, 8:27am

It seems that JSON-LD is not just another format: RDF AND JSON-LD UseCases - Data on the Web Best Practices

But it can be used as such. Apparently JSON-LD also extends RDF in some ways. I think it’s confusing because the terms are used to refer to both the standards and the actual data.

Looking at the table in the section I linked, JSON-LD and RDF are used depending on the application.

That could be solved by disowning the MutableData.

Standardised in what way exactly? Conventions or technical requirements?

joshuef · May 23, 2018, 8:40am

I’m thinking conventionally. You should be able to do whatever you want on the network re: data. I don’t think it would be wise / or work to try and force people into anything.

But if we can make it easy to do things in a way that works for RDF type data, with the aim for having your data be truly portable across apps. Then we’re taking the right steps IMO.

(also, conversely in my statement, I said ‘users’ but I’m thinking about ‘users’ being devs here… I should watch my language as that’s definitely confusing!)

Good link @bzee , which has some further clarification on what I was trying to get at in my previous post:

It is important to underline that there is a need to educate/inform user community about the difference between data model(i.e. RDF) and a serialization format (i.e. RDF/XML, JSON-LD, TriG etc). Recommending a graph based, clean, easy to use RDF serialization format would be a good step that can help users to select/no-select JSON-LD based on their requirements. Following table lists few application categories with suitable data model approach.

happybeing · May 23, 2018, 11:31am

Josh, this is still an open question to me, and it might be that we develop and offer more than one way so maybe the question is what to develop first, rather than only this or only that.

I realise that I’m neither clear on the pros and cons of JSON-LD v Turtle (as representation) or very clear yet about what you have in mind. I have questions about your thinking such as:

how you see JSON-LD + Mutable Data in terms of implementation?
whether you are thinking in terms of providing only a convention, i.e. how to use the SAFE MD API with JSON-LD, or providing one or more layers of protocol on top of that in client libraries?
if APIs in client libraries, then what would you like/not like to see (I think this is an important element because we need to make something that developers can understand and deploy in meaningful ways)?
if you have any use cases in mind to help us think about this?

For myself, I’m suggesting client libraries supporting RESTful interfaces that attempt to comply with the Standard Web (so Solid, LDP, WebDav, FTP and so on, and the ability for people to develop additional RESTful services), as prototyped in safenetwork-webapi. Also, that by sitting this on SAFE NFS we make the end-user’s data available to SAFE apps which understand SAFE NFS, not just those built on RESTful (or other supported APIs).

Much of that could sit on top of JSON-LD + Mutable Data but I’m not sure if that’s something you have in mind, and whether there are any particular API patterns or use cases which you envisage and would be better sitting on a JSON-LD + MD implementation? I understand that you would be doing away with SAFE NFS, while I see that as a useful feature to keep in place.

So in exploring these approaches we should consider that (with/without NFS) along with Turtle/JSON-LD representation, plus any differences in terms of what is built on top using the different approaches to the underlying structures, and example use cases - particularly if there are ones which might be better suited to JSON-LD on MD.

I’m not the guy to know the pros and cons of Turtle (and other RDF representations) versus JSON-LD but comments I’ve seen from Tim in the Solid chat and having read the article by Manu which shows he’s designed it for other purposes mean that we should look into that further, and once we have a clearer picture of the alternative you are thinking about, these ideas would be a good basis for discussions with Tim. I know he’s keen to talk to us so this would be a good thing to understand, and with some urgency because I am guessing he’ll prefer approaches that fit his models (such as Standard Web, Solid server, WebID, Turtle etc).

Clarifying your suggestions will also enable us to flesh out the pros and cons in various respects. BTW, there’s no reason both approaches can’t be supported, and others too if they add value and assist developers by providing useful options.

bzee · May 23, 2018, 8:54pm

I’ve looked more closely into Turtle and JSON-LD. As far as I can see, they’re both equally fit to hold RDF data —and they (and other formats) can be converted into each other. The main difference between the two is that Turtle is designed for RDF specifically, while JSON-LD happens to be able to serialize RDF data.

Personally, JSON-LD seems my preferred option as it looks familiar.

JSON-LD opens the possibility of using it with MDs, as Joshuef mentioned:

MDs would have potential advantages over IData: changing ownership, configuring permissions and mutability.

However, there is perhaps a more important reason for choosing MDs over IDatas: One of my main concerns is performance. I assume that IData would require the usage of the NFS emulation layer on top of MDs. That would mean that looking up a JSON-LD (or Turtle) file would require two sequential look-ups — one for the NFS MD and the next for the IData contents. By using MDs directly, there is only a single look-up. (Correct me if I’m wrong! My understanding of the SAFE internals is limited.)

The data will contain a lot of references to other data — which is of course the principle of Linked Data — resulting in multiple sequential queries I assume.

happybeing · May 24, 2018, 6:45am

Good analysis, we need more depth and breadth I think, so more of this

Maybe also look into Solid and node-solid-server to understand that side, for myself I still don’t have a clear picture of the desirability of JSON-LD for MD.

My concern, without going into detail, is that given the linked nature of RDF on the Semantic Web (which was not a design consideration for JSON-LD - it was designed for Web APIs) there will be other downsides, but maybe not. It needs thinking through though.

happybeing · May 24, 2018, 7:52am

Oh, another thing to factor in is how any solution handles other features of Solid, a major one being Access Control Lists (ACL).

For example one idea @Viv had was to use MData rather than IData to store the RDF resource/file (which might make JSON-LD even more attractive for representation), but at the cost of missing immutability, and so on.

All the pieces need to fit together.

happybeing · May 24, 2018, 7:59am

Something @bzee just picked up:

@bzee: Oh wait, the exact phrasing is also on json-ld.org, and has a reference to a RDF standard explanation: RDF 1.1 Concepts and Abstract Syntax

@happybeing: Ah good catch. People using ‘generalised’ tools/libraries to create JSON-LD that is incompatible with other (Solid) apps, without realising it, is an important consideration for me.

bzee · May 24, 2018, 8:32am

That is a very important area. I imagine a SAFE implementation would deviate the most from Solid here. Another one is authentication. Trying to fit those Solid specs into the SAFE Network architecture in a strict way might be pointless. Some of the permissions Insert, Update, Delete and ManagePermission might be mapped to HTTP verbs, but every user has ‘Read’ permission. The only way to make people unable to read it, is to encrypt it. Which would require an extra layer of complexity when a user wants to read a data structure.

About the immutability; I think that can be mimicked with MDs like I mentioned earlier:

(and removing permissions.)

bochaco · May 24, 2018, 8:56am

The drawback about using MD to mimic immutability is that you are either limited to the maximum size an MD can have, or you have to implement the indexing mechanism to store several chunks in several MD’s.
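The indexing mechanism could look something like the sketch below: data is cut into fixed-size chunks, each chunk is stored under the hash of its content, and an ordered index of those keys is what you would keep to reassemble it. The size limit is a made-up number, and a plain dict stands in for the network store; none of this reflects actual MD limits or the SAFE API.

```python
import hashlib
from typing import Dict, List

MAX_ENTRY_BYTES = 1024  # hypothetical per-entry size limit


def store_chunked(
    data: bytes, store: Dict[str, bytes], chunk_size: int = MAX_ENTRY_BYTES
) -> List[str]:
    """Split data into chunks, store each under its sha256 hex key, return the index."""
    index = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store[key] = chunk  # identical chunks share one entry
        index.append(key)
    return index


def load_chunked(index: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble the original data by following the index in order."""
    return b"".join(store[key] for key in index)


store: Dict[str, bytes] = {}
payload = bytes(range(256)) * 10 + b"tail"  # 2564 bytes
index = store_chunked(payload, store)
print(len(index), "chunks")
```

The index itself then has to live somewhere, which is the extra bookkeeping this approach adds compared with storing the data as a single immutable item.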

happybeing · May 24, 2018, 12:09pm

For reference, this is the document Tim and co use when talking about the ‘Standard Web’. I’ve read bits not that long ago but am not as familiar with this as I’d like. It would be good to have someone who knows this well look at the solutions we come up with, either in the dev team or by talking with Tim and his colleagues (online and off).

https://www.w3.org/TR/webarch/

bochaco · May 24, 2018, 5:04pm

I’m thinking that perhaps the data shall be stored in such a way to be serialisation format agnostic, and depending which API/library it’s used to access it you obtain Turtle or JSON-LD format. One of the things I’m trying to get clear in my head is if this is the way to make sure we separate what is part of the network as a platform and what really belongs to the application and/or presentation layer (if we see it as a stack).
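One way to picture a serialisation-agnostic store is to keep plain triples and only choose a format when reading them out. The sketch below does that for two outputs: N-Triples-style lines (which are also valid Turtle) and JSON-LD in expanded form. The tuple layout and function names are assumptions made for this example, and only IRIs and plain string literals are handled.

```python
import json
from typing import Dict, List, Tuple

# (subject IRI, predicate IRI, object, True if the object is an IRI)
Triple = Tuple[str, str, str, bool]


def to_turtle(triples: List[Triple]) -> str:
    """Serialise as one 'subject predicate object .' line per triple."""
    lines = []
    for s, p, o, is_iri in triples:
        obj = "<%s>" % o if is_iri else json.dumps(o, ensure_ascii=False)
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


def to_jsonld(triples: List[Triple]) -> List[dict]:
    """Serialise as expanded JSON-LD: one node object per subject."""
    nodes: Dict[str, dict] = {}
    for s, p, o, is_iri in triples:
        node = nodes.setdefault(s, {"@id": s})
        value = {"@id": o} if is_iri else {"@value": o}
        node.setdefault(p, []).append(value)
    return list(nodes.values())


stored: List[Triple] = [
    ("http://example.org/people#josh", "http://xmlns.com/foaf/0.1/name", "Josh", False),
    ("http://example.org/people#josh", "http://xmlns.com/foaf/0.1/homepage",
     "http://example.org/", True),
]

turtle_text = to_turtle(stored)
jsonld_doc = to_jsonld(stored)
print(turtle_text)
print(json.dumps(jsonld_doc, indent=2))
```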

intrz · May 24, 2018, 5:33pm

For RDF serialization it could also be worth looking into the binary RDF format HDT.

http://www.rdfhdt.org/what-is-hdt/

happybeing · May 24, 2018, 8:16pm

If we want to ensure RDF compliance, it may be wise to store in a compliant format, although if we support both compliant and non compliant formats, at some point we would need to handle conversions from non compliant data (eg ‘generalised’ RDF in JSON-LD) to a compliant format (eg Turtle). I guess the thing to do is to look for where others support both and see how they deal with this.