Resource Description Framework implementations and 'Portable Data...': Some Links and learnings

happybeing · May 23, 2018, 11:31am

Josh, this is still an open question to me, and it might be that we develop and offer more than one way so maybe the question is what to develop first, rather than only this or only that.

I realise that I’m clear neither on the pros and cons of JSON-LD v Turtle (as representation) nor on what you have in mind. I have questions about your thinking, such as:

how you see JSON-LD + Mutable Data in terms of implementation?
whether you are thinking in terms of providing only a convention, i.e. how to use the SAFE MD API with JSON-LD, or providing one or more layers of protocol on top of that in client libraries?
if APIs in client libraries, then what would you like/not like to see (I think this is an important element because we need to make something that developers can understand and deploy in meaningful ways)?
if you have any use cases in mind to help us think about this?

For myself, I’m suggesting client libraries supporting RESTful interfaces that attempt to comply with the Standard Web (so Solid, LDP, WebDav, FTP and so on, and the ability for people to develop additional RESTful services), as prototyped in safenetwork-webapi. Also, that by sitting this on SAFE NFS we make the end-user’s data available to SAFE apps which understand SAFE NFS, not just those built on RESTful (or other supported APIs).

Much of that could sit on top of JSON-LD + Mutable Data but I’m not sure if that’s something you have in mind, and whether there are any particular API patterns or use cases which you envisage and would be better sitting on a JSON-LD + MD implementation? I understand that you would be doing away with SAFE NFS, while I see that as a useful feature to keep in place.

So in exploring these approaches we should consider that (with/without NFS) along with Turtle/JSON-LD representation, plus any differences in terms of what is built on top using the different approaches to the underlying structures, and example use cases - particularly if there are ones which might be better suited to JSON-LD on MD.

I’m not the guy to know the pros and cons of Turtle (and other RDF representations) versus JSON-LD, but comments I’ve seen from Tim in the Solid chat, and the article by Manu which shows he designed JSON-LD for other purposes, mean that we should look into this further. Once we have a clearer picture of the alternative you are thinking about, these ideas would be a good basis for discussions with Tim. I know he’s keen to talk to us so this would be a good thing to understand, and with some urgency because I am guessing he’ll prefer approaches that fit his models (such as Standard Web, Solid server, WebID, Turtle etc).

Clarifying your suggestions will also enable us to flesh out the pros and cons in various respects. BTW, there’s no reason both approaches can’t be supported, and others too if they add value and assist developers by providing useful options.

bzee · May 23, 2018, 8:54pm

I’ve looked more closely into Turtle and JSON-LD. As far as I can see, they’re both equally fit to hold RDF data, and they (and other formats) can be converted into each other. The main difference between the two is that Turtle is designed for RDF specifically, while JSON-LD happens to be able to serialize RDF data.

Personally, JSON-LD is my preferred option as it looks familiar.

JSON-LD opens the possibility of using it with MDs, as Joshuef mentioned:

MDs would have potential advantages over IData: changing ownership, configuring permissions and mutability.

However, there is perhaps a more important reason for choosing MDs over IDatas: One of my main concerns is performance. I assume that IData would require the usage of the NFS emulation layer on top of MDs. That would mean that looking up a JSON-LD (or Turtle) file would require two sequential look-ups — one for the NFS MD and the next for the IData contents. By using MDs directly, there is only a single look-up. (Correct me if I’m wrong! My understanding of the SAFE internals is limited.)

The data will contain a lot of references to other data — which is of course the principle of Linked Data — resulting in multiple sequential queries I assume.
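The look-up counting above can be made concrete with a toy model. This is only a sketch under stated assumptions: `ToyNetwork`, the addresses and the stored values are all made up for illustration, a plain dict stands in for the network, and nothing here is the real SAFE API. Each `fetch` call counts as one network round trip.

```python
# Toy model of sequential look-ups. Nothing here is the real SAFE API:
# a plain dict stands in for the network, and every fetch() call is
# counted as one network round trip.

class ToyNetwork:
    def __init__(self, store):
        self.store = store      # address -> stored value
        self.lookups = 0        # number of round trips made so far

    def fetch(self, address):
        self.lookups += 1
        return self.store[address]


def read_via_nfs(net, container_addr, filename):
    """NFS-style read: fetch the directory container, then the file contents."""
    directory = net.fetch(container_addr)      # look-up 1: NFS container
    return net.fetch(directory[filename])      # look-up 2: file contents


def read_direct(net, md_addr):
    """Direct read: the RDF document lives in the MD itself."""
    return net.fetch(md_addr)                  # single look-up


store = {
    "nfs-container": {"profile.ttl": "idata-1"},
    "idata-1": "<#me> <knows> <#you> .",
    "md-1": {"@id": "#me", "knows": {"@id": "#you"}},
}

nfs_net = ToyNetwork(store)
read_via_nfs(nfs_net, "nfs-container", "profile.ttl")

direct_net = ToyNetwork(store)
read_direct(direct_net, "md-1")

print(nfs_net.lookups, direct_net.lookups)
```

In this model, following a link from one resource to another costs a further `fetch` per hop in either layout, so the saving from direct MD storage is one round trip per resource rather than per graph.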

happybeing · May 24, 2018, 6:45am

Good analysis, we need more depth and breadth I think, so more of this.

Maybe also look into Solid and node-solid-server to understand that side. For myself, I still don’t have a clear picture of the desirability of JSON-LD for MD.

My concern, without going into detail, is that because the linked nature of RDF on the Semantic Web was not a design consideration for JSON-LD (it was designed for Web APIs), there will be other downsides, but maybe not. It needs thinking through though.

happybeing · May 24, 2018, 7:52am

Oh, another thing to factor in is how any solution handles other features of Solid, a major one being Access Control Lists (ACL).

For example one idea @Viv had was to use MData rather than IData to store the RDF resource/file (which might make JSON-LD even more attractive for representation), but at the cost of missing immutability, and so on.

All the pieces need to fit together.

happybeing · May 24, 2018, 7:59am

Something @bzee just picked up:

@bzee: Oh wait, the exact phrasing is also on json-ld.org, and has a reference to an RDF standard explanation: RDF 1.1 Concepts and Abstract Syntax

@happybeing: Ah good catch. People using ‘generalised’ tools/libraries to create JSON-LD that is incompatible with other (Solid) apps, without realising it, is an important consideration for me.

bzee · May 24, 2018, 8:32am

That is a very important area. I imagine a SAFE implementation would deviate the most from Solid here. Another one is authentication. Trying to fit those Solid specs into the SAFE Network architecture in a strict way might be pointless. Some of the MD permissions (Insert, Update, Delete and ManagePermission) might be mapped to HTTP verbs, but every user has ‘Read’ permission. The only way to make people unable to read the data is to encrypt it, which would add an extra layer of complexity when a user wants to read a data structure.

About the immutability; I think that can be mimicked with MDs like I mentioned earlier:

(and removing permissions.)

bochaco · May 24, 2018, 8:56am

The drawback of using MD to mimic immutability is that you are either limited to the maximum size an MD can have, or you have to implement an indexing mechanism to store several chunks in several MDs.
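A minimal sketch of the kind of indexing mechanism described above, assuming a made-up size limit. `MAX_CHUNK`, the naming scheme and the dict-based `network` are hypothetical and do not reflect real MD limits or the SAFE API.

```python
# Hypothetical sketch: MAX_CHUNK and the dict-based "network" are invented
# for illustration and do not reflect real MD size limits or the SAFE API.

MAX_CHUNK = 16  # pretend one MD can hold at most 16 bytes of payload


def store_chunked(network, name, payload):
    """Split payload into chunk entries plus one index entry listing them."""
    chunk_names = []
    for offset in range(0, len(payload), MAX_CHUNK):
        chunk_name = f"{name}/chunk-{len(chunk_names)}"
        network[chunk_name] = payload[offset:offset + MAX_CHUNK]
        chunk_names.append(chunk_name)
    network[name] = chunk_names  # the index "MD" pointing at every chunk
    return chunk_names


def load_chunked(network, name):
    """Read the index, then fetch and join every chunk in order."""
    return b"".join(network[chunk] for chunk in network[name])


network = {}
data = b"<#me> <http://xmlns.com/foaf/0.1/knows> <#you> ."
chunks = store_chunked(network, "profile", data)
restored = load_chunked(network, "profile")
print(len(chunks), restored == data)
```

The cost is visible in the sketch: a read now needs the index plus one fetch per chunk, which is the extra work a single-MD layout avoids.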

happybeing · May 24, 2018, 12:09pm

For reference, this is the document Tim and co use when talking about the ‘Standard Web’. I’ve read bits not that long ago but am not as familiar with this as I’d like. It would be good to have someone who knows this well look at the solutions we come up with, either in the dev team or by talking with Tim and his colleagues (online and off).

https://www.w3.org/TR/webarch/

bochaco · May 24, 2018, 5:04pm

I’m thinking that perhaps the data should be stored in a way that is serialisation format agnostic, so that depending on which API/library is used to access it, you obtain Turtle or JSON-LD. One of the things I’m trying to get clear in my head is if this is the way to make sure we separate what is part of the network as a platform and what really belongs to the application and/or presentation layer (if we see it as a stack).

intrz · May 24, 2018, 5:33pm

For RDF serialization it could also be worth looking into the binary RDF format HDT.

http://www.rdfhdt.org/what-is-hdt/

happybeing · May 24, 2018, 8:16pm

If we want to ensure RDF compliance, it may be wise to store in a compliant format, although if we support both compliant and non compliant formats, at some point we would need to handle conversions from non compliant data (eg ‘generalised’ RDF in JSON-LD) to a compliant format (eg Turtle). I guess the thing to do is to look for where others support both and see how they deal with this.

bochaco · May 25, 2018, 8:58am

Yes, but I don’t mean to store it in a non-compliant form, but in a “raw” RDF form.

If we take that RDF is the data model we can store the data as subject–predicate–object entries/items which can then be serialised using any of the standard/compliant serialisation formats like Turtle or JSON-LD. Perhaps what I’m trying to say is to implement it as a triplestore.
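As a rough illustration of the triplestore idea, here is a sketch that stores subject–predicate–object entries once and serialises them on demand. The `TripleStore` class and the example IRIs are assumptions made for this sketch. It handles only IRI objects (no literals or blank nodes), and it emits N-Triples (a line-based subset of Turtle) and expanded-form JSON-LD.

```python
import json

# A tiny in-memory triple store. Terms are plain strings: IRIs are stored
# without angle brackets, and only IRI objects are handled to keep it short.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def to_ntriples(self):
        """Serialise as N-Triples (a line-based subset of Turtle)."""
        return "\n".join(
            f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(self.triples)
        )

    def to_jsonld(self):
        """Serialise as expanded-form JSON-LD: one node object per subject."""
        nodes = {}
        for s, p, o in sorted(self.triples):
            node = nodes.setdefault(s, {"@id": s})
            node.setdefault(p, []).append({"@id": o})
        return json.dumps(list(nodes.values()), indent=2)


store = TripleStore()
store.add("https://example.org/alice#me",
          "http://xmlns.com/foaf/0.1/knows",
          "https://example.org/bob#me")

print(store.to_ntriples())
print(store.to_jsonld())
```

The stored form is independent of either output, so adding another serialisation later would mean adding another method rather than converting existing data.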

This also seems natural, as on the clearnet servers can support multiple formats and the client can request a specific format using the Accept header, specifying a preference and its weight, e.g. Accept: text/turtle,application/rdf+xml,application/xhtml+xml;q=0.8,text/html;q=0.7

Also, I guess we cannot say JSON-LD is not compliant, but just broader perhaps, as per my understanding?
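A minimal sketch of how an abstraction layer might pick a serialisation from such an Accept header. The function names are hypothetical, and the parsing is deliberately simplified: it reads only the `q` parameter and ignores wildcards such as `*/*`.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0   # q defaults to 1 when omitted
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((media_type, q))
    # sorted() is stable, so equal weights keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_format(header, supported):
    """Pick the most preferred media type that the server can produce."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in supported:
            return media_type
    return None


accept = "text/turtle,application/rdf+xml,application/xhtml+xml;q=0.8,text/html;q=0.7"
print(choose_format(accept, {"application/ld+json", "text/turtle"}))
print(choose_format(accept, {"text/html"}))
```

With the header from the post, a layer that can produce Turtle and JSON-LD would return Turtle, since the client did not list `application/ld+json` at all.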

happybeing · May 25, 2018, 10:14am

I think here we should regard JSON-LD as non compliant but compatible or find some term that helps people realise that if they create some arbitrary JSON-LD it is quite likely not compatible with RDF unless they take pains to ensure that. The technical term is ‘generalised’ but I don’t think that conveys this point. On the other hand, if you generate Turtle, N3 etc, you definitely have RDF as I understand it.

I see your approach as cleaner than using a generalised, potentially non compliant format, but I assume you would lose compatibility with SAFE NFS because if we use SAFE NFS, then I think Turtle would be the sensible choice.

Why? Because then any Solid app can access RDF resources from a SAFE based file system (eg FUSE), or by a browser fetch that supports SAFE NFS, without recourse to a Solid / RDF API layer. (BTW I’m going to try making a SAFE FUSE mount as a side project of my side project. Fingers crossed that my skills have finally revived sufficiently to tackle that, almost four years since David first suggested I have a go at it. Thank god I couldn’t attempt it back then lol).

Maybe we later go native in a technically cleaner more efficient way as you are thinking, after initially focusing on building bridges to as many existing apps as we can, and with other teams to begin with? I think a good way to do the latter is to provide access to SAFE storage via existing popular APIs and protocols. The leverage that provides is something we need to factor in when developing and prioritising these options.

I would though like to also see a full and detailed exploration of the ideas you and @joshuef are developing because I do see the potential merit of them, and I also know that you both bring insight and understanding to this which I lack. So bring it on and let’s wrestle these solutions into something awesome. This is great!

EDIT: another option would of course be to ensure that SAFE NFS API presents your native RDF as part of a filesystem. That would kill both birds with one stone if it makes sense to do that.

bochaco · May 25, 2018, 1:15pm

I agree, but I wasn’t trying to mean that we should use one instead of the other, in fact I see them as complementary.

The NFS emulation/convention provides you with a way to organise the resources, but nothing is implied/inferred from it about the content/data stored in that hierarchy.

At the moment we can store only Files with our NFS emulation layer, but what if you could also store RDF resources using the NFS emulation?
In that way you still organise the RDF resources using a hierarchy which you may then want to use to create the links using a public ID. But you could also provide links using the XOR name of the RDF resource itself (if you weren’t interested in a public ID type of URL/link), so the RDF resource is a different MD.

Now, you could of course store RDF resources already simply as text or xml files (ImmutableData’s) using the serialisation you like, and this is where I think we could do better in helping users to follow the standard/convention of the semantic web, by giving an emulation/abstraction layer to create RDF resources which can always be retrieved using different serialisation formats (again by the abstraction layer), while making sure the RDF is stored in such a way that it can be easily serialised in any other format we still don’t provide, but we/anyone might support in the future.

Thus, you could read these RDF resources, by either providing the path to the NFS emulation or directly using the XOR name of the RDF resource, with the RDF emulation; the content is in any case retrieved to you in Turtle, JSON-LD, etc.

Edit: I’m just thinking that my thoughts are being driven by my belief (hopefully an understanding) that serialisation is only needed for a transition of existing apps that currently use HTTP, but in the future we won’t/shouldn’t really need a serialisation format; apps just read the entries from an RDF MD.

happybeing · May 25, 2018, 6:05pm

I’m only beginning to think about this, but I think it will take quite a bit of thought to figure out which approach is more efficient in practice, and it will depend quite a lot on the use cases.

Seems quite hard to figure out whether native (RDF inside MData etc), or Turtle+NFS IData will be more efficient given uncertainties about use cases. Large files versus small files, numbers of triples per file/resource, inter resource graph characteristics, frequencies of different kinds of access, and sensitivity to latency are going to be hard to model, so I’m thinking we’ll need to suck it and see (that’s the technical term for build and test different options).

joshuef · May 28, 2018, 12:59pm

Just to be clear: I’m with you Mark that we should probably seek to stick to RDF for interoperability.

I was imagining something along the lines of @bochaco’s RDF-MD.

For me, storage of RDF data as JSON-LD (in the non-generalised rdf-compatible sense), makes most sense in an MD for reasons as outlined by @bzee above (performance; one less layer of abstraction/emulation, permissions etc), as well as an MD being a key-value storage, which aligns well with JSON-LD as a representation of the data.

Something to consider for data formats is usability: I’ve been looking at JSON for years, and to me JSON-LD makes it much clearer to see what’s going on quickly than other RDF formats.

It is also just another way of writing JSON, for which there is a huuuuge tooling ecosystem across many many platforms/languages. As a web developer, whatever data I get from a server I’ll probably be converting that to JSON to work with, within my app… (Again, that can vary between languages, but it is another step for a dev to take.)

I think one of the biggest hurdles we’ll have on the network is getting devs to use these data structures.

bzee · May 28, 2018, 1:56pm

I think a good direction would be to fork something like the rdflib.js library, like @happybeing did, and attempt to make it work with a specific solution (like JSON-LD – MD).

happybeing · May 29, 2018, 5:32pm

I’m not convinced of these routes yet fellas

JSON-LD:

I haven’t looked at how it handles triples, so I need to see how it does that because they don’t map well to key/value IMO (being three elements rather than two), though I’m ready to be corrected. I just don’t know yet.
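One possible way a three-part triple could fit a two-part key/value store, sketched under assumptions: the container as a whole stands for the subject, each key is a predicate, and each value is a JSON list of objects. This mirrors how a JSON-LD node object groups properties under one `@id`, but the function names and the choice of one container per subject are hypothetical, not something the SAFE API or JSON-LD prescribes.

```python
import json

def triples_to_entries(subject, triples):
    """Keep triples about `subject`; key = predicate, value = JSON list of objects."""
    grouped = {}
    for s, p, o in triples:
        if s == subject:
            grouped.setdefault(p, []).append(o)
    return {p: json.dumps(objs) for p, objs in grouped.items()}


def entries_to_triples(subject, entries):
    """Rebuild the triples from the key/value entries."""
    return [
        (subject, p, o)
        for p, value in entries.items()
        for o in json.loads(value)
    ]


triples = [
    ("#me", "foaf:name", "Alice"),
    ("#me", "foaf:knows", "#bob"),
    ("#me", "foaf:knows", "#carol"),
]
entries = triples_to_entries("#me", triples)
print(entries)
print(entries_to_triples("#me", entries))
```

The round trip works because the subject is implied by which container is read, so only the predicate and its objects need to be stored as a pair.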

Also, it isn’t clear to me that other RDF representations directly in the MD will be less efficient so that’s something to investigate.

This then leaves the question of using NFS and therefore IData to hold the representation. I like NFS compatibility because it exposes RDF to non Solid aware apps and devs. I expect that to start with there will be a lot of apps using SAFE NFS (including as a virtual drive) so I like the idea of exposing them to files containing RDF, especially in the early days.

WRT non generalised JSON-LD, there’s a fundamental problem we create by directly supporting JSON, which is that people will create non compliant content in JSON-LD and store it without necessarily being aware that they’ve done this, or of the consequences. Whereas if we encourage use of a non generalised representation, this won’t happen unless people decide to go their own way and generate something non compliant, for example by converting RDF stored by another app to JSON-LD and then saving it back. Because that is a deliberate step, I think those folk are more likely to recognise the problems of non compliant content and deal sensibly with them.

The next issue here is that RDF is the way it is for reasons (which I don’t understand but think are probably important), and the tools and libraries designed for it will have features designed to exploit it and so generally be more suitable for the purposes intended for RDF. Whereas when using JSON-LD there will be some tools and libraries which are suited and some which are not - because they were designed for other purposes (eg APIs for which JSON-LD was designed in the first place). So I expect that with JSON-LD there’s increased likelihood of RDF compatibility issues. Some, perhaps most JSON-LD tools and libraries will be built by people who don’t realise this is an issue, or just don’t care because their use case is different. So I see potential for confusion and wasted time here.

Again I haven’t surveyed this, but it worries me, and if you are going to work with RDF I think there’s a benefit to learning to view it in an RDF representation designed for the purpose, rather than one which you may be familiar with but was designed for a different purpose.

Turtle is very readable IMO, but I need to look at some equivalent JSON-LD to know if there’s much at stake in this respect. So the wrestling continues.

Forking RDF Libraries

I have forked rdflib.js and have proposed we do something similar with Solid-auth-client so that adding support for SAFE to an existing Solid app is as close to just dropping in the SAFE js modules as possible.

Because my changes are tiny I’m hopeful that at some point they will be merged by the Solid team and that compatibility with SAFE will become standard in any rdflib.js apps because of that. This is because those changes are literally a few lines, which is feasible because all I’m doing is enabling us to intercept calls to fetch() so we handle any requests for a SAFE URI. This route also means Solid apps which use fetch() directly will work with SAFE, and even those using XMLHttpRequest are trivial to convert. Solid apps can and do mix all these methods together, so if we don’t support SAFE at the level of fetch(), porting a Solid app to SAFE is typically going to be much harder. It is also likely we would have to support many more libraries than rdflib.js. So it makes sense to start with a RESTful interface to SAFE based on intercepted fetch().
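The interception pattern described here can be sketched as a wrapper around a fetch-like function. This is not the actual change made to rdflib.js. It is a Python stand-in for what would be JavaScript in a browser, and `make_intercepting_fetch`, `fake_http_fetch` and `fake_safe_fetch` are hypothetical names, with the two fake transports returning strings so the sketch runs offline.

```python
from urllib.parse import urlparse

def make_intercepting_fetch(original_fetch, safe_fetch):
    """Wrap a fetch-like callable so that safe: URIs go to a SAFE handler."""
    def fetch(uri, **options):
        if urlparse(uri).scheme == "safe":
            return safe_fetch(uri, **options)      # handled by the SAFE layer
        return original_fetch(uri, **options)      # everything else unchanged
    return fetch


# Stand-ins for the real transports, so the sketch runs offline.
def fake_http_fetch(uri, **options):
    return f"HTTP {options.get('method', 'GET')} {uri}"

def fake_safe_fetch(uri, **options):
    return f"SAFE {options.get('method', 'GET')} {uri}"


fetch = make_intercepting_fetch(fake_http_fetch, fake_safe_fetch)
print(fetch("safe://alice/profile/card"))
print(fetch("https://example.org/card", method="PUT"))
```

Because callers keep using the same `fetch` signature, code that already calls it needs no changes, which is the property that makes this route attractive for existing Solid apps.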

If instead we fork rdflib.js and other Solid libraries in order to bypass the RESTful API and go directly to SAFE API (with or without JSON-LD + MD) we lose a lot of this.

Simplicity v Efficiency

Unless I can be sure of the benefits, I tend to go first for simplicity, compatibility, and in our case ease of adoption by as many people in the Solid space as possible, and handle performance and efficiency later. Better get people hooked on SAFE and demanding faster, better stuff, than make them jump through hoops in order to try it out.

So we can always provide libraries that talk directly to the SAFE API as well, but I don’t think we should skip the step of providing maximum exposure and compatibility first, which is why I’m keen on supporting a Solid RESTful API via fetch() and perhaps also SAFE NFS in the first instance.

joshuef · May 29, 2018, 6:06pm

@happybeing I think we should separate out the data representation vs solid integration.

WRT data representation formats:

While JSON-LD can be generalised, it can also conform to RDF. And I’m right there with you that conforming is a good move.

To say it was not designed for RDF is, I think, wrong. It was designed to be another format to serialise this data (with a couple of extra benefits that can make it more usable, but therefore non compliant), and to be usable for the web as it currently is.

But for the purposes of its use on the network, if we agree RDF compliance is paramount, it’s possible that we add tools to validate this (if RDF data were a specific tagtype or some such…).
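A sketch of what such a validation tool might check, assuming the JSON-LD has already been turned into triples. RDF 1.1 requires the subject to be an IRI or blank node and the predicate to be an IRI, while generalised RDF relaxes this, for example by allowing a blank node as a predicate. The `(kind, value)` term modelling and the function names are assumptions of this sketch, not an existing library API.

```python
# Terms are modelled as (kind, value) pairs, where kind is "iri",
# "bnode" or "literal". This modelling is an assumption of the sketch.

def is_standard_triple(triple):
    """True if the triple is allowed by RDF 1.1 proper, not only generalised RDF."""
    (s_kind, _), (p_kind, _), (o_kind, _) = triple
    return (
        s_kind in ("iri", "bnode")                 # subject: IRI or blank node
        and p_kind == "iri"                        # predicate: IRI only
        and o_kind in ("iri", "bnode", "literal")  # object: any term
    )


def non_compliant(triples):
    """Return the triples that only generalised RDF would accept."""
    return [t for t in triples if not is_standard_triple(t)]


triples = [
    (("iri", "https://example.org/alice#me"),
     ("iri", "http://xmlns.com/foaf/0.1/name"),
     ("literal", "Alice")),
    (("iri", "https://example.org/alice#me"),
     ("bnode", "_:b0"),                      # blank node used as a predicate
     ("literal", "not valid in standard RDF")),
]
print(len(non_compliant(triples)))
```

A check like this could run before data is written, so that generalised content is rejected or flagged at the point of storage rather than discovered later by another app.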

I think this could equally happen with any of the RDF formats… there’s nothing stopping anyone writing a turtle file themselves and getting it all wrong.

I really think you should have a look at JSON-LD Playground. You can easily view the same data in many of the different RDF formats, so you can get a good comparison of what’s going on.

happybeing · May 29, 2018, 6:12pm

You’re missing my points Josh so when I get time I will clarify. But while I believe I understand your response above it doesn’t IMO address the points I just made. Thanks for the link, I’ll look at that when I get time.