[preRFC] - Data API Ideas

joshuef · September 27, 2018, 3:01pm

Here I’m looking to get some feedback on ideas for improving the client-facing APIs. The idea is to build atop the RDF API implemented in the recent RDF proof of concept work, with the aim of simplifying the creation of data and abstracting away some of the boilerplate that comes with saving mutable/immutable data to the SAFE Network.

This scrappy wee chart aims to clarify just what would be built atop what: https://docs.google.com/drawings/d/1fbnE4tVPhfjfpXg4Ipzrkmzt12_3M7O61ShYpkbh2Ro/edit?usp=sharing

And while everything here is noted in JavaScript style, these APIs (I should think) would be applicable in many languages.

Does that make sense? What’s missing? What changes would you want to see beyond these?

And just to reiterate: I’m suggesting building atop current functionality, so we wouldn’t be losing anything. Just making life easier for app devs out there (I hope! :P)

Finally, to note, throughout:

MD = mutable data, ID = Immutable data.

Data

PUTting some ID on the network should not involve four lines of code (with a call for a cipher?!). I just want to save a file!

const writer = await app.immutableData.create()
await writer.write( data )
const cipher = await app.cipherOpt.newPlainText();
const address = await writer.close( cipher, true )

:(

An ideal may be something like:

await app.data(<data as a string>).save()

:)

It’s this sort of simplification (which we have abstracted in various apps in various ways) that we should aim to standardise.
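
A rough sketch of what such a wrapper might look like, assuming nothing beyond the four calls shown above. The `saveImmutable` and `createFakeApp` names are made up for illustration, and the fake app only exists so the helper can be tried without a network:

```javascript
// Hypothetical helper: hides the writer/cipher boilerplate behind one call.
// Only the four calls inside come from the snippet above.
async function saveImmutable(app, data) {
  const writer = await app.immutableData.create();
  await writer.write(data);
  const cipher = await app.cipherOpt.newPlainText();
  // The second argument is passed through unchanged from the original snippet.
  return writer.close(cipher, true);
}

// Minimal in-memory stand-in for `app` (an assumption, not the real client).
function createFakeApp() {
  const stored = [];
  return {
    stored,
    immutableData: {
      create: async () => {
        let buffer = '';
        return {
          write: async (chunk) => { buffer += chunk; },
          close: async () => {
            stored.push(buffer);
            return `fake-address-${stored.length}`;
          },
        };
      },
    },
    cipherOpt: { newPlainText: async () => ({ type: 'plain' }) },
  };
}
```

With that in place, an app would only need `const address = await saveImmutable(app, 'some data')`.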

I’m not suggesting where these live (at this moment), though if these abstractions are considered useful across all platforms, then I’d say there’s a strong argument for developing a Rust version of these and using bindgen to generate the abstractions across platforms.

Immutable Data

Creating ID should be as simple as something like:

data.add / data.add(encrypt)
data.save

Mutable

Mutable data could by default apply some RDF schemas (the application name/version, eg), which could be useful when the data is accessed by other apps. But it could also not return this by default, providing a simple way to work with objects/arrays.

mutable.add // with option to encrypt or not
mutable.schema // ?? this instantiates the graph… (built atop the RDF layer)
mutable.update // or create or update the data
mutable.save
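
To make the shape of that list a little more concrete, here is a minimal in-memory sketch. Everything in it is an assumption: `createMutable` is a made-up name, nothing touches the network, and the choice that `add` refuses to overwrite an existing key (while `update` creates or overwrites) is just one possible reading of the list above.

```javascript
// Hypothetical in-memory stand-in for the proposed `mutable` helper.
// `save` only returns what would be stored; no network is involved.
function createMutable() {
  const entries = new Map();
  let schemaName = null;
  const api = {
    // add: insert new keys, refusing to silently overwrite existing ones
    add(obj) {
      for (const [key, value] of Object.entries(obj)) {
        if (entries.has(key)) throw new Error(`key already exists: ${key}`);
        entries.set(key, value);
      }
      return api;
    },
    // update: create or overwrite
    update(obj) {
      for (const [key, value] of Object.entries(obj)) entries.set(key, value);
      return api;
    },
    // schema: remember which schema the entries are meant to follow
    schema(name) {
      schemaName = name;
      return api;
    },
    save() {
      return { schema: schemaName, entries: Object.fromEntries(entries) };
    },
  };
  return api;
}
```

Returning `api` from each method is what allows the chained style used later in this post.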

RDF

As found in current RDF POC branch, but abstracting away the need for creating/managing a MD reference. More clearly to offer:

rdf.add
rdf.setId // et al. All RDFLib apis would still be exposed as they are.
rdf.schema // ability to utilise a schema that the API offers (eg: WebId; see below for more).
rdf.save // when finished.

The data apis can be built on top of this (MD specifically, ID potentially).

I’m not looking into exactly how we’re storing the RDF data (that’s for an RDF RFC).

RDF Schemas

As a start, schema functionality would allow for easily creating an RDF object with vocabs from a known type, e.g. a WebId.

const profile = {
      uri: 'safe://mywebid.gabriel',
      name: 'Gabriel Viganotti',
      nick: 'bochaco',
      website: 'safe://mywebsite.gabriel',
      image: 'safe://mywebsite.gabriel/images/myavatar',
    };

The profile properties are all properties of a WebId. So we can automatically setup the RDF object using this mapping for a known schema type.
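
As an illustration of that mapping, the sketch below turns the profile object into subject/predicate/object triples. The FOAF terms used are real, but picking them for these keys is an assumption made for the example; the actual WebId implementation may well use different vocab terms. `profileToTriples` and `webIdMapping` are made-up names.

```javascript
// Assumed mapping from profile keys to FOAF predicates (illustrative only).
const FOAF = 'http://xmlns.com/foaf/0.1/';
const webIdMapping = {
  name: `${FOAF}name`,
  nick: `${FOAF}nick`,
  website: `${FOAF}homepage`,
  image: `${FOAF}img`,
};

// Turns a plain profile object into [subject, predicate, object] triples,
// using the profile's own uri as the subject.
function profileToTriples(profile, mapping) {
  const triples = [];
  for (const [key, predicate] of Object.entries(mapping)) {
    if (profile[key] !== undefined) {
      triples.push([profile.uri, predicate, profile[key]]);
    }
  }
  return triples;
}

const profile = {
  uri: 'safe://mywebid.gabriel',
  name: 'Gabriel Viganotti',
  nick: 'bochaco',
  website: 'safe://mywebsite.gabriel',
  image: 'safe://mywebsite.gabriel/images/myavatar',
};
const triples = profileToTriples(profile, webIdMapping);
```

The resulting triples could then be handed to whatever RDF layer sits underneath.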

I’d propose built in schemas be only those needed for SAFE to function:

safe/web-id (WebId)
safe/ResolutionMap
safe/ResolutionMap/FilesContainer

Though we should build this functionality in such a way that it could be extendable. For example, imagine importing schemas defined elsewhere to use in your app:

import {Album} from 'schema.org-schema'; //pull in schema from another project


// Built in 'safe' schema
mutable.add({
    name: 'josh',
    uri: 'safe://lalallaa',
    img: <xor url>
}).schema('safe/web-id').save()


// using an imported schema to easily create data:
mutable.add({
    title: 'Muswell Hillbillies',
    uri: 'safe://lalallaaasdadsaasdadad',
    img: <xor url>
}).schema(Album).save()
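
One way the built-in versus imported distinction could work, sketched under assumptions: built-in schemas are registered under a name and looked up by string, while an imported schema object is used as it is. The registry functions and the shape of a schema object (a type URI plus a key-to-predicate map) are hypothetical, and the `Album` object is only a stand-in for what an external package might export.

```javascript
// Hypothetical registry of built-in schemas, keyed by name.
const builtInSchemas = new Map();

function registerSchema(name, schema) {
  builtInSchemas.set(name, schema);
}

// Accepts either a built-in schema name or a schema object from elsewhere.
function resolveSchema(nameOrSchema) {
  if (typeof nameOrSchema !== 'string') return nameOrSchema;
  const schema = builtInSchemas.get(nameOrSchema);
  if (!schema) throw new Error(`unknown schema: ${nameOrSchema}`);
  return schema;
}

// Assumed schema shape: a type URI plus a key-to-predicate map.
registerSchema('safe/web-id', {
  type: 'http://xmlns.com/foaf/0.1/PersonalProfileDocument',
  properties: {
    name: 'http://xmlns.com/foaf/0.1/name',
    img: 'http://xmlns.com/foaf/0.1/img',
  },
});

// Stand-in for what an imported schema package might provide.
const Album = {
  type: 'https://schema.org/MusicAlbum',
  properties: {
    title: 'https://schema.org/name',
    img: 'https://schema.org/image',
  },
};
```

Both `resolveSchema('safe/web-id')` and `resolveSchema(Album)` then give back an object of the same shape, so the rest of the code would not need to care where a schema came from.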

Questions

Do we need to log all file updates/creation in an MD? (How could a file manager know where all your data is? [that wouldn’t work with a modified core that removed this, though…])
Should we add version / app info to RDF data by default?
Do these seem helpful?

bzee · September 27, 2018, 6:44pm

Great ideas @joshuef! I’ve mentioned recently to @happybeing that I was worried that the SAFE NodeJS API got too high level. I would recommend more separation of concerns; which is what seems to be suggested by your draft chart. (Meaning different NPM modules, Rust crates, etc.)

Also related is then to not integrate APIs that cover different aspects: I’d recommend not abstracting (or proxy-ing) the RDF API from within the SAFE API, but just expose as much of the original APIs as possible (e.g. rdflib).

The biggest advantage is that it should stay integration friendly with other projects that just build on top of rdflib.js. Also, it assures that the SAFE API does not need to integrate any changes when the rdflib.js projects develops new features.

Anyway, this as a more general aside that worried me.

Hadn’t really thought about this, but it seems extremely important to assure compatibility between SAFE apps, or at least to simplify making them compatible. I do think the more abstract stuff is going to be hard to put into Rust and then generate bindings for in other languages, as not every language has the same interfaces of course. (E.g. it would be cool for the SAFE APIs to be compatible with existing interfaces in NodeJS, like Stream.)

I think the idioms proposed are a nice beginning but they have to be carefully thought out. They might seem nice for a few specific use cases, though it’s important to make them as general as possible. There are a lot of choices that are to be made when designing such APIs. Idioms different from yours (which keep separation of concerns in mind as explained in the top of my post):

// Where to put type tag and XOR? What if we want to put to random XOR?
const md = new MutableData(rdfGraph, <type as integer>); // ?
await app.put(md);

// Putting a ID
const id = new ImmutableData(<data as string>);
await app.put(id);

// To rdflib.js object and to MutableData
const rdf = md.toRdf();
const mdFromRdf = MutableData.from(rdf);
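
For what it is worth, these idioms can be mocked up in a few lines. This is only a sketch: the class internals, the extra `typeTag` argument on `from`, and the `createApp` stand-in are assumptions, and the "graph" is a plain array of triples standing in for an rdflib.js store.

```javascript
// Plain data holders; names follow the idioms above, internals are assumed.
class ImmutableData {
  constructor(content) {
    this.content = content;
  }
}

class MutableData {
  constructor(entries, typeTag) {
    this.entries = entries;
    this.typeTag = typeTag;
  }

  // Conversion helpers keep RDF handling outside the storage object.
  static from(triples, typeTag) {
    return new MutableData({ triples }, typeTag);
  }

  toRdf() {
    return this.entries.triples || [];
  }
}

// Stand-in for `app.put`: one entry point that dispatches on the kind of data.
function createApp() {
  const log = [];
  return {
    log,
    async put(data) {
      if (data instanceof MutableData) log.push(['MD', data.typeTag]);
      else if (data instanceof ImmutableData) log.push(['ID', data.content.length]);
      else throw new TypeError('put expects MutableData or ImmutableData');
    },
  };
}
```

The point of the split is that the data objects know nothing about the network, and `put` knows nothing about RDF.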

I do not think so. At least it should be optional. Not every piece of data would need a version, and showing which app made the MD (which is what I assume you mean) might be sensitive data the user doesn’t want to expose.

I will keep playing with these ideas in my mind a little more. Great work so far!

joshuef · September 28, 2018, 8:23am

Yeh, this is a good point. Shepherding to/from this RDFlib object is a nice idea rather than baking it in.

Indeed. But perhaps there’s a basis that’ll be useful across platforms. I know we’ve already ended up doing some of this same functionality in different apps on the JS side. I’d imagine it’s similar on the mobile side too.

I’d imagine we formulate the commands to accept an options object with such params.

Although I guess one question is how deep to go. If the lower level libs are always available, how much of this do we want to cater to?

I’d say that if you’re putting data you’re worried about being sensitive… then you encrypt it. Anything else should be fair game (and indeed, a user might think they haven’t given away that they exclusively use tinder for messaging, but perhaps the application has fields / a specific set of conventions that make that obvious.)

I don’t see explicitly stating app/app version to be a bad thing, if it allows for more easily parsing the data. Here, I imagine an app was used to save some data, but has been abandoned. Maybe they didn’t use RDF… but if we know app/version it could be useful for another app to import the data…

bzee · September 28, 2018, 9:44am

Good point. However, looking into the future I can imagine that as apps aim for compatibility they have certain common conventions; e.g. a few shared MDs that make up a base layer that should be there before an app could function. Then it would still be uneasy when a certain app has to leave its details in the MD.

If the goal is to make it easier for apps to parse the data, then I think stating the app is not the right solution. Rather mention the convention or standard that the MD is supposed to be built for.
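
A small sketch of that idea, assuming the convention is recorded as an ordinary entry. `dcterms:conformsTo` is an existing Dublin Core term, but using it here is only one option; the convention identifier and the helper names are made up for the example.

```javascript
// dcterms:conformsTo exists in Dublin Core; choosing it here is an assumption.
const CONFORMS_TO = 'http://purl.org/dc/terms/conformsTo';

// Hypothetical convention identifier, invented for this example.
const BLOG_CONVENTION = 'safe://conventions/blog-post/v1';

// Attach the convention to the data instead of the name of the app that wrote it.
function describe(entries, convention) {
  return { ...entries, [CONFORMS_TO]: convention };
}

// A reading app only checks whether it understands the convention.
function canRead(entries, supportedConventions) {
  return supportedConventions.includes(entries[CONFORMS_TO]);
}

const post = describe({ title: 'Hello', body: 'First post' }, BLOG_CONVENTION);
```

Any app that supports the convention can read `post`, and nothing in it reveals which app created it.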

joshuef · September 28, 2018, 10:37am

Yeh, it would be nice for devs to do this (aka: use RDF), but when they don’t… ?

But you’re right, maybe adding app info for each MD is heavy-handed. Maybe it’s only added for MDs without RDF… or maybe there’s another solution (or no solution is needed!)

bzee · September 28, 2018, 10:45am

I thought we were assuming the use of RDF in this discussion; sorry if I misunderstood… Are you talking about basic key-value statements here, or about a more fundamental network metadata that would be added automatically based on the app that initiates the PUT?

If we assume the use of RDF then I think it’s up to conventions and nothing should be forced — only made optional/default so to make sure app developers follow good practices and stimulate compatibility.

That’s the point I wanted to make — that your goal doesn’t necessarily imply the solution you offered…

joshuef · September 28, 2018, 11:12am

I’m imagining that it could be done as part of the basic MD data APIs above. (So the basic key:value MD, yep). It could be added in, either as simple key:value, or as RDF data.

I’m with you here. Though I worry a lot about developers not using RDF data (or some similar format), about the data not being a user’s own and being effectively unusable between apps. So I’d lean towards defaulting it to ‘on’ (and you’d still have the lower level libs available too, if you want to get really specific with your MDs).

Yeh, I’d love for it to be easier to do RDF right than wrong. Whether that’s the above idea of app naming/version, or some other stuff. I really want adding a schema to be simple (while offering the lower libs for those who need more detailed data management).

bzee · September 28, 2018, 11:50am

The OSI model contains layers that are mostly agnostic to the data they carry. E.g. the Internet Protocol doesn’t care whether it contains TCP or UDP data. However, there is an IP header that contains a protocol number telling what data is inside of it.

Projecting this onto MD and RDF, then the MD might have a field that says something about its data format — which would be outside the data itself (i.e. not in the key-value map). This also adds to my argument against using the app name, as it’s not about the app but about the protocol the data is following.

Edit: My IP example might not be relevant on second thought. I realized other layers like TCP do not tell anything about what data they’re carrying. It’s unnecessary as the applications communicating already assume a common language (like HTTP). Applying this to MD and RDF: applications just assume that the data in there matches what they know and expect, with no metadata necessary for interpreting the data. Just as an HTTP server might send back a ‘bad request’ message, the application might not be able to interpret the data, and that’s the end of it.

Meaning: the metadata wouldn’t somehow allow compatibility. The knowledge that an MD follows a certain protocol doesn’t suddenly allow the application to interpret the data correctly unless the application is sort of a multi-protocol app. Is that common? I don’t think so. Just like most web servers only understand HTTP, most apps are written for a single layer.

joshuef · September 28, 2018, 12:38pm

No, it’s not. But it could be. RDF would allow apps to see if data they are pulling in is talking about the same thing, as the data they need. That’s where vocabs and schema come in.

If we have data with no descriptors at all… that’s much harder for the app to work out automatically. But if someone produces a mapping of data to schema to allow this, it could then be simpler.

joshuef · September 28, 2018, 12:40pm

But @bzee, you raise a good point with

make sure app developers follow good practices and stimulate compatibility.

This is what a lot of these ideas are about. Making schemas simpler. And removing the need to think about RDFlib and vocabs etc.

But how else might we encourage good practice? What even might those good practices be? Can it be done via the data interface? And how else might that look?

bzee · September 28, 2018, 1:20pm

So, besides RDF there is no other protocol within the MD that an app will understand (that’s what I referred to with apps not being multi-protocol; RDF is the base protocol an app assumes to be built on). Leaving the semantics aside we’re saying the same thing!

Then, within RDF applications can have their own protocols/conventions so they make sure they understand each other (which is where the vocabs and schema you mention come in).

I think it’s an unrealistic ambition to remove the need to think about rdflib.js and vocabs. That would mean a completely new layer that abstracts RDF away.

By setting the examples for NFS containers, DNS, WebID, etc. If all these are implemented with good RDF principles, app developers have to build upon those same principles.

Perhaps these are questions that are unrelated to SAFE, but more related to SOLID/RDF in general. Maybe @happybeing can chime in here.

joshuef · September 28, 2018, 1:33pm

Yeh. It would mean that indeed. I don’t think that’s unrealistic if all you’re doing is storing application data, but want to make sure it’s self describing and meets the standards of being open and compatible.

If you’ll really be working with RDF, then you can go direct to the libraries. But for general app development, I don’t see devs rushing towards the complexity and potential decision paralysis of RDF when they just want to build something.

Knowing what’s in a given schema (as I suggest above with WebId), we could easily validate for required keys, and do all the rdflib legwork (which is a lot of boilerplate-type code for a given schema).
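
The validation half of that could be as small as the sketch below. Which keys count as required for a WebId is an assumption here, as are the `webIdSchema` shape and the `validate` name.

```javascript
// Hypothetical schema description: required and optional keys (assumed split).
const webIdSchema = {
  name: 'safe/web-id',
  required: ['uri', 'name', 'nick'],
  optional: ['website', 'image'],
};

// Returns a list of problems instead of throwing, so callers can show them all at once.
function validate(data, schema) {
  const problems = [];
  for (const key of schema.required) {
    if (data[key] === undefined || data[key] === '') {
      problems.push(`missing required key: ${key}`);
    }
  }
  const known = new Set([...schema.required, ...schema.optional]);
  for (const key of Object.keys(data)) {
    if (!known.has(key)) problems.push(`unknown key: ${key}`);
  }
  return problems;
}
```

An empty list means the object can go straight on to the RDF legwork; anything else can be reported back to the dev before a single triple is written.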

To me, abstracting away some of the complexities of RDF is crucial to aiding adoption and it becoming the best practice to use it. As it stands it is overly complex to use, and confusing to even get started with (just what vocab should you use?).

There’s a nice writeup on the json-lz page about some of the problems (see the background reading), that’s well worth a read.

bzee · September 28, 2018, 1:49pm

You’re right! It might not be that unrealistic on second thought…

Though I was thinking about a random app developer that wants to make a blog. In that case he has to touch on vocabs, no? He has to think about how he’s structuring his data. That’s why I thought it’s unrealistic to not think about vocabs etc.

Or do you have other app developers in mind that only have to use pre-existing boilerplate?

joshuef · September 28, 2018, 2:06pm

From my limited WebId / Patter experience, choosing vocabs is a pain.

Mostly as there are so many.

So, for SAFE things, I’d say we have them built in, so making a WebId, or creating a Public Name, is as simple as passing your js object over (though it could be in any language). We have examples of this already.

What I’m leaning into with the OP is that we could construct some helper libraries for other schema sets. I’d start with http://schema.org/docs/full.html as it’s in use all over the web, it covers a heap of use cases, and it has a decent website (though there’s nothing blocking anyone else making a helper library for their favourite schemas).

But if we start with schema.org, your options are already limited. So a dev must just search there for something appropriate. If you want to make an Album, you’ll find https://schema.org/MusicAlbum.

And then, if we have our lib, it’s as simple as in the op:

import {Album} from 'schema.org-schema'; 
mutable.add({
    title: 'Muswell Hillbillies',
    uri: 'safe://lalallaaasdadsaasdadad',
    img: <xor url>
}).schema(Album).save()

It could even be set up to be self documenting, such that Album.man() or something would display all the props.
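
A sketch of how a self-documenting schema object might look. The property names follow the Album example above; the descriptions, the required/optional flags and the output format of `man()` are all made up for illustration.

```javascript
// Hypothetical schema object that can describe itself.
const Album = {
  type: 'https://schema.org/MusicAlbum',
  properties: {
    title: { required: true, description: 'Name of the album' },
    uri: { required: true, description: 'Where the album data lives' },
    img: { required: false, description: 'Cover image location' },
  },
  // Returns a plain-text summary: the type, then one line per property.
  man() {
    const lines = [this.type];
    for (const [name, info] of Object.entries(this.properties)) {
      const flag = info.required ? 'required' : 'optional';
      lines.push(`  ${name} (${flag}): ${info.description}`);
    }
    return lines.join('\n');
  },
};
```

Calling `console.log(Album.man())` would then list the props without a trip to the schema.org site.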

Just ideas, but if we had that working, to me that seems way easier. And you can just get on with devving!

bzee · September 28, 2018, 2:10pm

Sounds really helpful! I guess such helper libraries could be really useful outside of SAFE; a lot of non-SAFE developers will get into this issue too! So, if such helper libraries are going to be developed I hope they’re going to be SAFE agnostic…

joshuef · September 28, 2018, 2:11pm

I’d say that’s a pretty sensible call!

happybeing · September 28, 2018, 2:25pm

Thanks for the mention @bzee I have been reading but don’t have a chunk of time to join in properly with this.

Just reading where you are now, I don’t really understand how (if I understand the abstraction layer) hiding the RDF helps developers create apps which create and re-use Sem Web data.

I see we can provide simple APIs to do the common operations (make a basic WebID and profile for ex), but if we then build JSON APIs around that to avoid the need to touch RDF in order to work with (edit/update/search) that data, we have to do that for everything, which we can’t do because most apps will be doing something different and the dev will need to make those schema choices and work at the RDF level.

The Solid team are aware of these issues and actively working on them now with the advent of Inrupt. I think we could contribute to and benefit from that effort by engaging with them, putting our ideas to them for feedback and challenging them with our critiques of their approach etc. We began to do that through an initial conference call over WebID, and a little discussion of schemas, so I vote for much more of that and less working in isolation.

I’m not going to be able to work or write much for the next couple of days though, so apologies if I don’t respond to responses, queries etc. I’m very happy to see all these ideas too BTW, I hope that’s apparent!

joshuef · September 28, 2018, 3:13pm

If you want to work with RDF, then you can use the libs directly, that’s fine. And that would still be possible. Then this layer is not for you.

If you want to make a simpler webapp, or website… save some data. You’re not thinking about RDF… you’re not going to want to dive into all of that. (I know I wouldn’t want to if I’m learning about the SAFE Network. One thing at a time…). You simply want to build. The question is how do we get devs to build using RDF data? And having used rdflib, I would say, that is not it.

It’s the same idea of development experience that Paul Frazee was trying to solve with JSON-LZ.

We have to let devs dev. And if we want RDF data on the network for reusable apps, we have to make it as easy as possible to be creating that data.

Helping with decision paralysis and removing complexity of implementation are valid options there in my opinion.

But even then, @happybeing it’s not either/or… Once the data is there as RDF data, it can be retrieved and operated on as RDF data. There is nothing stopping that. All I’m trying to find is some way of making it easier to work with. (Such as we’ve found the need to do in safe-app-nodejs for webId/publicNames etc.)

Dev experience is a real issue with RDF (The background links on json-lz are really worth getting into.) It goes beyond just the APIs, though. Choosing vocabs and our data structures etc, is a huuuge drag. In my opinion it’s part of the reason you don’t see more people working with it, despite how long it’s been in existence. And it’s something that I’ve not seen tackled nor really sufficiently addressed on the SOLID chats.

I’m totally happy for more calls and discussions, if they’ve got some ideas on the go for this type of thing, I’d love to see it.

ah yeh, I also won’t be about much the next week(s), doing a spot of travelling. I will be about, though, just less

[Wow this is weird, I can edit your post - @happybeing]

happybeing · September 28, 2018, 9:08pm

Thanks for the explanation Josh. I understand better now, and I know we share the same goals here. I am not on board though with the approach, and wasn’t convinced by Paul or the parts of his reasoning I’ve read. I think it is valid to try, but I’m not convinced and would prefer a different approach.

I don’t think it is fair to point to the lack of RDF in the world and use that. They’ve been going a different route and with great success. Government and other large datasets are coming online every day in RDF, with SPARQL queries and with that expertise, awareness and skills among devs are growing.

They are only now turning to the app model we’re talking about, which we have largely got because of that work (ie Solid), and I favour us joining with them to solve it rather than taking Paul’s approach which is fine, but isn’t about working together with Tim et al. I was quite hopeful that Maidsafe would start to jump in there and get involved in that, but it hasn’t really got going.

My guess is that the approach is not to say we can’t expect devs to get their hands dirty, so we need to hide RDF away from those who want to write simpler apps. Instead I think we need to make working with RDF easy enough for people to build even simple apps that work with it. If not, I think you risk keeping the division between people who just do app owned data stuff using JSON, and those doing Sem Web stuff, or cross app data mashing but have to work at a low level with poorer tooling to do so.

I think it would be best to focus on making all the common things devs might want to do with RDF as simple and easy as possible, but without hiding it, its power, or the fact that this stuff is now pouring out all over the web.

Did you know that facebook profiles are (maybe less now, I’m not sure after their CA issues!) available in Turtle through their graph API? What is that, a billion profiles?

The latest initiative for data portability from Microsoft, Google, facebook and twitter is RDF (ie the Data Transfer Project).

UK parliament databases are all coming out with SPARQL endpoints.

Those are just three things off the top of my head. RDF is just getting going, but you can already pull a billion Turtle profiles out of facebook and mash it with Hansard and Google photos in five minutes. I follow people on twitter who regularly post ‘look at this data I just mashed up’ and I’d like SAFE devs to be doing the same.

Of course we all had problems getting to grips with it, and rdflib.js. So that’s the kind of area where I see the opportunity to improve. It is poorly documented and the libraries to use it are fairly piecemeal. That is all being worked on in different areas right now, but Maidsafe are not involved.

I would rather we were working with Tim’s team to show devs that RDF is massive and compelling, and finding ways to make using it as second nature as JSON.

davidpbrown · October 2, 2018, 4:54pm

Scrappy wee chart looks to use wee font in places that I cannae see!..

I’ve yet to get my head around [Rust client libs / APIs from bindgen] and whether those will be able to do everything or whether the Data API necessarily will be the route to certain functions… given it’s suggested the Data API is “For managing data on the network. MD/ID.”, perhaps that is the one route in for those core interesting functions??

My 2cents is the obvious - that any and all APIs need to be as simple as possible… documented to the point that stupid can engage… for everyone. As a noob with no experience, I could hack ye olde REST API … and looking then now for literal examples of “do this” to [authenticated/unauthorised].[read/write].[mutable/immutable] for each kind of function… zero homework!.. and ideally all available in Rust as base option.

It would be interesting to see a full list of functions set out… and in the case there is functionality that would be expected but is not available, list that too. Any absences then might be challenged by the community. So, no surprises about what is expected from the API and not… and then allowing for imagination to take hold, even if the development waits for the API to become confirmed. For example, will streaming of large files be an option?.. I guess there are application types that will be interested to know that.

Important is knowing what is stable and where to find that in repositories (assume users haven’t used Github to navigate to the latest); and then is the API itself simple for a noob to jump the hurdle and use - that in any language, by pairing simple examples of other common API instances… so, point being that REST is so common that there are lots of abc guides… will this API flavour now have the same - or an interface that needs no manual?..

brain over