XOR address URLs (XOR URLs)


#1

Introduction

The topic of being able to share files/data stored on the network with links, which don’t use the DNS system, has been discussed a few times in the past on our forums (see further below for some references to previous discussions).

Currently the safe-app-nodejs API, and our SAFE browsers, support fetching safesites and files published with the DNS system, from URLs like safe://<service name>.<public name>/<path>, but it’s not possible to fetch data using their address on the network, i.e. the XOR address of any data stored on the network, without publishing it under a public name.

Is this a CAS?

There is a concept of Content Addressable Storage (CAS), but from our understanding there is a subtle difference with our proposal here. A CAS seems to assume the address is derived from the content itself. In the case of SAFE Network data this is true for ImmutableData XOR addresses, but not for MutableData XOR addresses, which is why we hesitated to simply call this proposal a CAS. It would be good to hear other opinions in this regard. We called them XOR URLs in an attempt to highlight that they are not public-name URLs (DNS), and that they are based on the XOR addresses of the native SAFE Network data types.

Related discussions on the SAFE Network Forum:

Proposal

In order to be able to share a file/data with a URL without the need to publish it with the DNS system under a public name, it is herein proposed:

  • Extend the webFetch API to also accept XOR URLs which are constructed based on the XOR address of the file/data.
  • Be able to retrieve the content of native data objects, like the MutableData, as raw data. E.g. get the list of entries stored in a referenced MutableData.
  • Have our browser render the content in different ways if the content retrieved from a XOR URL is the raw data of a native object, effectively becoming an explorer for SAFE Network native data structures.

Specification and general considerations

XOR address encoding

The XOR address shall be encoded in the XOR URL. Base16 encoding seems to be a good choice as it is case insensitive, as opposed to case-sensitive encodings like base58btc or base64url.
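To illustrate the case-insensitivity point, here is a minimal Python sketch. The 32-byte address and the helper names are made up for illustration and are not part of any SAFE API.

```python
import base64

# Hypothetical 32-byte XOR address, used only for illustration.
xor_address = bytes(range(32))


def encode_base16(addr: bytes) -> str:
    """Encode an XOR address as a lowercase base16 (hex) string."""
    return addr.hex()


def decode_base16(text: str) -> bytes:
    """Decode a base16 string; upper and lower case are accepted alike."""
    return bytes.fromhex(text)


encoded = encode_base16(xor_address)

# A URL parser that changes the case of the string does not change
# the address it decodes to.
assert decode_base16(encoded.upper()) == decode_base16(encoded) == xor_address

# base64url, by contrast, is case-sensitive: lowercasing the encoded
# string makes it decode to different bytes.
b64 = base64.urlsafe_b64encode(xor_address).decode("ascii")
assert base64.urlsafe_b64decode(b64.lower()) != xor_address
```

The second assertion is the failure mode a case-sensitive encoding would be exposed to if any component in the chain normalised the URL to lowercase.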

ImmutableData XOR URL

XOR URLs for ImmutableData’s are the simplest ones since they don’t need any additional information to uniquely identify them on the network, as opposed to MutableData’s that also have a type tag. Therefore an ImmutableData XOR URL can be simply defined as safe://<encoded ImmutableData XOR addr>.

MutableData XOR URL

As already mentioned above, a string based on the XOR address along with a type tag is needed to uniquely identify a MutableData on the network, therefore the XOR URL for a MD needs to include the type tag in it.

When a MutableData is fetched, if it’s not an NFS container with an index.html file, the webFetch function can return the MutableData’s raw data so the browser (or any client app) can render it in a different/specific way (see below for more details of what’s being proposed here in this regard).

MutableData versions

MutableData’s are versioned, and therefore this shall also be accounted for in the XOR URL format to allow any MD URL to (optionally) reference a specific version.

If a version value is given, that specific version is retrieved, and the request fails if that version is not found. If the version value is omitted from the XOR URL, the latest version is retrieved.

Paths

Given that a referenced MutableData can effectively be the root NFS container of a safesite, any MD XOR URL can also specify a path which needs to be resolved, and the content retrieved, as when using the DNS system with public-name URLs.

Browsable content

Currently, when a public-name URL is resolved to an NFS container which doesn’t have an index.html file, the browser simply shows an error stating that the safesite content was not found.

For a MD XOR URL that doesn’t have a path, the webFetch function shall return the raw content of the MD (i.e. its key-value entries), and the browser can automatically generate an HTML page which makes the content browsable, generating links to other data when an entry’s key/value is a safe:// string, in an analogous/similar way to how web servers on the current internet allow browsing on folders.

A specific MD entry key could be supported (e.g. __non_browsable), that the owner of the MD can insert in the MD if the “browsable” feature should not be enabled. This cannot really be enforced, though; it can only be offered as a feature optionally supported by some clients, like our browser.
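As a rough sketch of the browsable listing described above, a client could turn the raw key-value entries into an HTML page along these lines. The function name, the entry values and the shape of the data are assumptions for illustration only; the __non_browsable key is the example name suggested above.

```python
from html import escape
from typing import Dict, Optional

# Example opt-out key from the proposal; purely a client-side convention.
NON_BROWSABLE_KEY = "__non_browsable"


def render_listing(entries: Dict[str, str]) -> Optional[str]:
    """Build an HTML page listing the key-value entries of an MD.

    Returns None when the owner opted out with the non-browsable key.
    Keys or values that are safe:// strings are turned into links.
    """
    if NON_BROWSABLE_KEY in entries:
        return None

    def cell(text: str) -> str:
        safe_text = escape(text, quote=True)
        if text.startswith("safe://"):
            return f'<a href="{safe_text}">{safe_text}</a>'
        return safe_text

    rows = "\n".join(
        f"<tr><td>{cell(key)}</td><td>{cell(value)}</td></tr>"
        for key, value in sorted(entries.items())
    )
    return f"<html><body><table>\n{rows}\n</table></body></html>"


# Hypothetical entries, as webFetch might return them for an MD without a path.
entries = {
    "notes": "plain text value",
    "photo": "safe://f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59",
}
page = render_listing(entries)
```

Escaping every key and value before it reaches the page matters here, since the entries are written by whoever owns the MD and not by the client rendering them.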

XOR URLs specification

The following are the main requirements we would like to have for the encoding we use to generate the XOR URLs:

  • Be able to support new and different types of base encodings and hash functions for the XOR addresses in the future.
  • Include the content type within the XOR URL which would allow the client app to correctly render the data to the user especially when referencing an ImmutableData.

We are considering the use of multiformats and CID, which allow us to cover the above requirements. We can use a CID identifier in our URL for specifying the XOR address part, and have additional parts to support MutableData’s type tag, version, as well as the path and query parts. We can then define our SAFE URLs in the following way (BNF-like):

<safe-url>            = 'safe://' ( <xor-uri> | <public-name-uri> )

<public-name-uri>     =  [<service> '.'] <public-name> <path-query-fragment>

<xor-uri>             = <immutable-data-uri> | <mutable-data-uri>

<immutable-data-uri>  = <cid> <query-fragment>

<mutable-data-uri>    = <cid> ':' <type-tag> ['+' <content-version>] <path-query-fragment>

<path-query-fragment> = ['/' <path>] <query-fragment>

<query-fragment>      = ['?' <query>] ['#' <fragment>]

Where:

  • <cid>: follows the CID format which self-describes and encodes:

    • the base encoding for the string, we propose to use base16 for the reasons explained above,
    • the version of CID for upgradability purposes (v1 now),
    • the content type or codec of the data being referenced,
    • the XOR address of the content being referenced
  • <content-version>: for future implementation to reference versionable content, using a single address with different versions

  • <type-tag>: the type tag value if the CID is referencing a MutableData. In the absence of this value, the CID will be assumed to be for an ImmutableData

  • <path>: the path of the file if the CID is referencing a MutableData which can be accessed through the NFS emulation convention (or other emulations/conventions in the future)

  • <query>: query arguments, to be used by the client app and not for retrieving the content

  • <fragment>: fragment of the content, to be used by the client app and not for retrieving the content

The webFetch function will simply attempt to decode the <cid> part, and if that fails, it will fall back to assuming it’s a public-name URL.
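The grammar and the fallback rule can be sketched in Python as follows. This is not the webFetch implementation: the names SafeUrl, looks_like_cid and parse_safe_url are hypothetical, the CID check is a simplified stand-in for real CID decoding (base16 only, single-byte fields), and the type tag and version are assumed to be decimal numbers.

```python
import re
from typing import NamedTuple, Optional


class SafeUrl(NamedTuple):
    kind: str                # "immutable-data", "mutable-data" or "public-name"
    host: str                # the <cid>, or [<service>.]<public-name>
    type_tag: Optional[int]
    version: Optional[int]
    path: Optional[str]
    query: Optional[str]
    fragment: Optional[str]


_URL = re.compile(
    r"^safe://"
    r"(?P<host>[^:+/?#]+)"
    r"(?::(?P<type_tag>\d+)(?:\+(?P<version>\d+))?)?"
    r"(?:/(?P<path>[^?#]*))?"
    r"(?:\?(?P<query>[^#]*))?"
    r"(?:#(?P<fragment>.*))?$"
)


def looks_like_cid(text: str) -> bool:
    """Simplified check: base16 CIDv1 whose digest length field is consistent."""
    if not text.lower().startswith("f"):
        return False
    try:
        payload = bytes.fromhex(text[1:])
    except ValueError:
        return False
    # Layout assumed: version, codec, hash function, digest length, digest.
    return len(payload) > 4 and payload[0] == 1 and payload[3] == len(payload) - 4


def parse_safe_url(url: str) -> SafeUrl:
    match = _URL.match(url)
    if match is None:
        raise ValueError(f"not a valid safe:// URL: {url!r}")
    host, type_tag, version = match["host"], match["type_tag"], match["version"]
    if not looks_like_cid(host):
        # CID decoding failed: fall back to treating it as a public name.
        if type_tag is not None:
            raise ValueError("a type tag is only valid after a CID")
        kind = "public-name"
    elif type_tag is None:
        if match["path"] is not None:
            raise ValueError("an ImmutableData XOR URL cannot have a path")
        kind = "immutable-data"
    else:
        kind = "mutable-data"
    return SafeUrl(
        kind,
        host,
        int(type_tag) if type_tag is not None else None,
        int(version) if version is not None else None,
        match["path"],
        match["query"],
        match["fragment"],
    )
```

For example, parsing safe://f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4db1cc:15001/myfolder/page.html with this sketch yields kind "mutable-data", type tag 15001, no version, and path myfolder/page.html.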

The following are examples of what would become valid XOR URLs:

  • ImmutableData XOR URL: safe://a078516207e36aa2371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59
  • MutableData XOR URL: safe://f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4db1cc:15001
  • NFS MutableData XOR URL: safe://2c481d6207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dba5f:15001/myfolder/page.html

[RFC] Public Name System: Resolution and RDF
#2

Great to see this vital feature is on the way, and the time spent thinking it through has clearly paid off. Very nice. Can’t wait for the PoC!

Any chance of some CID examples? I’m wondering how you would encode common MIME types for immutable files, for example?

Thanks.


#3

Not a big deal, but personally I’d prefer the shorter base 64 numbers. Seems unlikely that people would be typing them out anyway (usually acquire via qr code or link).


#4

The following is a valid CID, you can use this page to decode it: http://cid-utils.ipfs.team/#f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59
You will see that CID has raw as the codec/content-type, that’s because currently it seems that the list of codecs is not finalised and we may need to send PRs to it for adding them, as well as probably to the CID/multihash project/s to support them. This is actually part of the next steps I want to test with the PoC code, to validate this.
BTW that is a valid address on Alpha2 as well, there is a MD there I’m using for testing :wink:
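For anyone who prefers to see the decoding spelled out, here is a small Python sketch that splits that CID into its fields by hand. The lookup tables are short excerpts of the multibase and multicodec tables, just enough for this one example, and the code assumes every field fits in a single byte (true here, but not for CIDs in general).

```python
CID = "f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59"

# Excerpts of the multibase / multicodec tables, enough for this example.
MULTIBASE = {"f": "base16"}
CODECS = {0x55: "raw"}
HASHES = {0x16: "sha3-256"}


def decode_cid(cid: str) -> dict:
    """Split a base16 CIDv1 into its self-describing fields."""
    base = MULTIBASE[cid[0]]           # first character: multibase prefix
    payload = bytes.fromhex(cid[1:])
    # Single-byte fields assumed: CID version, codec, hash function, digest length.
    version, codec, hash_code, digest_len = payload[:4]
    digest = payload[4:]
    if len(digest) != digest_len:
        raise ValueError("digest length does not match the length field")
    return {
        "base": base,
        "version": version,
        "codec": CODECS[codec],
        "hash": HASHES[hash_code],
        "digest": digest.hex(),        # the XOR address
    }


fields = decode_cid(CID)
```

The fields come out as base16, CID version 1, codec raw, and a 32-byte digest, which is the XOR address itself.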

Yeah, I was also interested in shorter ones and that’s how we ended up looking at those other options. But I found that case sensitivity would be a problem not only for users but potentially also for devs (and us :slight_smile: ): I was trying to parse the XOR URLs with the parseURL function and it was already converting them to lowercase. That’s when I thought we may want to avoid the case-sensitivity issue altogether from the very beginning. As you said, it’s very likely people won’t type them anyway, so length shouldn’t be an issue, but case sensitivity apparently could be.


#5

There are more ways to encode without case sensitivity. base32 is also case insensitive and it usually does not include 0 and 1 to avoid confusion with the letters O and I. (It also has variants that avoid even more confusion like Crockford’s Base32.) There are probably more solutions that are case insensitive.


Had more critique on some points, but after more careful reading it looks well thought out! Well done.


#6

The base32 looks nice and it’s indeed shorter @bzee (59 characters as opposed to 73 with base16). I just checked the multibase library for what is supported, and the following are examples of how our URLs may look with base32, base32hex, and base32z (Crockford’s base32 is not supported/implemented at the moment):

base32: safe://bafkrmicq4xa5j4dfpr4n65vtgtwdp6f5fkjsxek5uqftymdpa62beqo37q

base32hex: safe://v05ahc80b6kvtvtcem1arpvqgc62mds8q87dgelkgvumrp8iumjissgj498

base32z: safe://hyfktced1wsgxzyxuiwrzrukgcagpxzowfy35zpo6hdd5jyhqxij84ri8uo

As per the author, z-base-32 was designed to be easier for human use and permutes the alphabet so that the easier characters are the ones that occur more frequently: http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
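The 59 vs. 73 figures follow from the size of the CID: 4 header bytes plus a 32-byte address is 36 bytes, i.e. 288 bits. Base16 carries 4 bits per character (72 characters) and base32 carries 5 (58 characters, rounding up), plus one multibase prefix character in each case. A quick Python check, using the CID posted earlier in the thread and the standard library’s RFC 4648 base32 (lowercased, padding stripped):

```python
import base64

# CID bytes from the example earlier in the thread: version, codec,
# hash function, digest length, then the 32-byte XOR address.
cid_bytes = bytes.fromhex(
    "01551620"
    "7e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59"
)

# 'f' and 'b' are the multibase prefixes for base16 and base32.
base16 = "f" + cid_bytes.hex()
base32 = "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")

print(len(base16), len(base32))  # 73 59
```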


#7

Of those, I’d lean towards human readable where available. Doesn’t look like there’s many negatives to that…


#8

This proposal has been formalised with an RFC which is already available for review and debate. Feel free to participate in the discussions happening in the PR itself, or otherwise adding questions/comments here as well.


#9

I like your suggestion of using xor:// as the header in the browser. I think it provides a nice means of partitioning the two methods for requesting data. I find it to be an analogy to using https:// vs. ftp:// in a standard web browser like firefox. Mixing the two under a single safe:// could muddy the water and be confusing to a lot of people IMO.

Being reminded of ftp made me think of another acronym that might be fun/familiar for people to use instead of xor://. How about this one?

xtp://

It would refer to your “XOR Transfer Protocol” :slight_smile:


#10

I think if we are going to use safe:// in the browser then it might be a lot better to keep the xor addressing associated/consistent with the use of safe://

So maybe safexor:// might be a better option.

If you use xor:// then for normal addressing it should be ID://

This is why xor:// does not match the usage of safe://. It would be inconsistent naming to use xor:// when using safe:// for sites, so I thought safexor:// might be better.

BTW, notice that ftp:// is consistent with http:// and https:// since all are individual protocols, whereas safe:// and xor:// is like pairing a network acronym with an address acronym. Just inconsistent. So if xor:// then ID:// is consistent, or safexor:// and safe:// are consistent like HTTP and HTTPS are.


I had thought of safex:// for xor addresses except safex is another project out there.


#11

Yeah, you’ve convinced me. Keeping everything as safe:// as specified in the RFC is a simpler interface. “In the end there can be only one… :wink:” Thinking about it a little more brought another analogy to mind, typing in ip addresses to a browser vs. domain names, which may have been what you guys had in mind from the start. Just like you can go to http://172.217.197.139 or http://www.google.com for example.


#12

Still, I think there is merit in safexor:// so as to prevent malicious manipulation where someone registers an ID name that is actually an XOR address. This would then prevent the actual XOR address from being accessed using the browser.


#13

slight preference here for safe-xor:// if that’s allowed, over safexor://

I think it helps with spoofing a bit, but it won’t stop it, so I think it is moot whether we do this or not.

Ideally the browser should warn about, or simply not serve, a Public Name that could be an xor address, e.g. one of the same length (assuming all xor addresses are a fixed length?). This will not inconvenience users, only those who try to mess with them, I think.


#14

Although I think the current proposal (safe://<xor-uri>) is sufficient, if we go for a distinct naming scheme for xor uri’s, I prefer xafe://<xor-uri>.


#15

So if a popular set of chunks is known by their XOR addresses, then all I need to do to prevent others from ever retrieving them via the browser using safe://<xor-url>, which uses the naming system, is register the XOR address as a public ID name. That prevents others from getting the real chunk(s); they get my spoofed site instead. I do think there is a need to specify whether it’s a public name we are after or an XOR address (e.g. safe-xor:// or safexor://).

Or am I missing something?


#16

I think so. It is up to the browser whether it interprets what follows safe:// as an xor address or a public name.

I’m suggesting it checks if that qualifies as an xor address (not actually looking it up, but by parsing it, checking length or whatever) and if so treats it as one.

So if you register a domain that looks like an xor address, the browser will not look up your spoofed xor site. It will attempt to retrieve the data at that address and that’s it.


#17

As @happybeing mentioned, isn’t the determination as simple as testing the string length? The network would just need to enforce a length limit for public names/domains so that they are less than the base32 fixed length hash, right? Keeping it all as safe:// is definitely the preferred choice then as proposed in the rfc. I feel silly for not reading the rfc more carefully.