XOR address URLs (XOR URLs)


#1

Introduction

The ability to share files/data stored on the network through links that don’t use the DNS system has been discussed a few times in the past on our forums (see further below for some references to previous discussions).

Currently the safe-app-nodejs API, and our SAFE browsers, support fetching safesites and files published with the DNS system, from URLs like safe://<service name>.<public name>/<path>, but it’s not possible to fetch data using their address on the network, i.e. the XOR address of any data stored on the network, without publishing it under a public name.

Is this a CAS?

There is a concept of Content Addressable Storage (CAS), but from our understanding there is a subtle difference with our proposal here. A CAS seems to assume the address is derived from the content itself. On the SAFE Network this is true for ImmutableData XOR addresses but not for MutableData XOR addresses, which is why we hesitated to simply call this proposal a CAS. It would be good to hear other opinions in this regard. We called them XOR URLs in an attempt to highlight that they are not public-name URLs (DNS), and that they are based on the XOR addresses of the native SAFE Network data types.

Related discussions on the SAFE Network Forum:

Proposal

In order to be able to share a file/data with a URL without the need to publish it with the DNS system under a public name, we propose to:

  • Extend the webFetch API to also accept XOR URLs which are constructed based on the XOR address of the file/data.
  • Be able to retrieve the content of native data objects, like the MutableData, as raw data. E.g. get the list of entries stored in a referenced MutableData.
  • Have our browser render the content in different ways if the content retrieved from a XOR URL is the raw data of a native object, effectively becoming an explorer for SAFE Network native data structures.

Specification and general considerations

XOR address encoding

The XOR address shall be encoded in the XOR URL. Base16 encoding seems to be a good choice because it is case insensitive, as opposed to case-sensitive encodings like base58btc or base64url.
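To illustrate the case-sensitivity point, here is a minimal Python sketch using only the standard library. The 32-byte address is an arbitrary placeholder rather than a real network address, and `survives_lowercasing` is a hypothetical helper name. It shows that a base16 string decodes to the same bytes regardless of case, whereas a base64url string generally does not survive being lowercased.

```python
import base64

# Placeholder 32-byte XOR address, used only for illustration.
xor_addr = bytes(range(32))

hex_str = xor_addr.hex()                               # base16, lowercase
b64_str = base64.urlsafe_b64encode(xor_addr).decode()  # base64url

# base16: changing the case does not change the decoded bytes.
assert bytes.fromhex(hex_str.upper()) == xor_addr
assert bytes.fromhex(hex_str.lower()) == xor_addr


def survives_lowercasing(encoded: str) -> bool:
    """True if a base64url string decodes to the same bytes once lowercased."""
    try:
        lowered = base64.urlsafe_b64decode(encoded.lower())
    except ValueError:
        return False
    return lowered == base64.urlsafe_b64decode(encoded)


print(len(hex_str), len(b64_str))     # 64 44
print(survives_lowercasing(b64_str))  # False
```

The trade-off is visible in the lengths: base16 is longer, but it tolerates any tool that normalises the case of a URL.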

ImmutableData XOR URL

XOR URLs for ImmutableData objects are the simplest ones, since they don’t need any additional information to uniquely identify them on the network, as opposed to MutableData objects, which also have a type tag. Therefore an ImmutableData XOR URL can be simply defined as safe://<encoded ImmutableData XOR addr>.

MutableData XOR URL

As already mentioned above, a string based on the XOR address along with a type tag is needed to uniquely identify a MutableData on the network, therefore the XOR URL for a MD needs to include the type tag in it.

When a MutableData is fetched, if it’s not an NFS container with an index.html file, the webFetch function can return the MutableData’s raw data so the browser (or any client app) can render it in a different/specific way (see below for more details of what’s being proposed here in this regard).

MutableData versions

MutableData objects are versioned, so the XOR URL format shall account for this and allow any MD URL to (optionally) reference a specific version.

The version value can be used to enforce a specific version to be retrieved, and otherwise fail if that specific version is not found. On the other hand, the latest version will be retrieved if the version value is omitted from the XOR URL.
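The rule above can be sketched in a few lines of Python. This is only an illustration: the `available` mapping (version number to content) and the names `select_version` and `VersionNotFound` are hypothetical, since how the network exposes specific versions is not specified here.

```python
from typing import Dict, Optional


class VersionNotFound(Exception):
    """Raised when the version pinned in the URL is not available."""


def select_version(available: Dict[int, bytes], requested: Optional[int]) -> bytes:
    """Return the content for `requested`, or the latest one when it is omitted."""
    if not available:
        raise VersionNotFound("no versions available")
    if requested is None:
        # No version in the URL: fall through to the latest one.
        return available[max(available)]
    if requested not in available:
        # A pinned version must match exactly, otherwise the fetch fails.
        raise VersionNotFound(f"version {requested} not found")
    return available[requested]


versions = {0: b"first", 1: b"second"}
print(select_version(versions, None))  # latest version
print(select_version(versions, 0))     # pinned version
```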

Paths

Given that a referenced MutableData can effectively be the root NFS container of a safesite, any MD XOR URL can also specify a path which needs to be resolved, and the content retrieved, as when using the DNS system with public-name URLs.

Browsable content

Currently, when a public-name URL is resolved to an NFS container which doesn’t have an index.html file, the browser simply shows an error stating that the safesite content was not found.

For a MD XOR URL that doesn’t have a path, the webFetch function shall return the raw content of the MD (i.e. its key-value entries), and the browser can automatically generate an HTML page which makes the content browsable, generating links to other data when an entry’s key/value is a safe:// string, in an analogous/similar way to how web servers on the current internet allow browsing on folders.

A specific MD entry key could be supported (e.g. __non_browsable), which the owner of the MD can insert if the “browsable” feature should not be enabled. This cannot really be enforced, but it can be offered as a feature optionally supported by some clients, like our browser.
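As a rough sketch of what such a generated page could look like, the Python snippet below renders a MutableData’s entries as an HTML table, links any key or value that is a safe:// string, and honours the opt-out key. It assumes the entries are already available as a plain string-to-string mapping; `render_listing` and `_cell` are hypothetical names, not part of any existing API.

```python
from html import escape
from typing import Dict, Optional

NON_BROWSABLE_KEY = "__non_browsable"  # opt-out key suggested in the text


def _cell(text: str) -> str:
    """Render a key or value, turning safe:// strings into links."""
    safe_text = escape(text, quote=True)
    if text.startswith("safe://"):
        return f'<a href="{safe_text}">{safe_text}</a>'
    return safe_text


def render_listing(entries: Dict[str, str]) -> Optional[str]:
    """Return an HTML listing of the entries, or None if the owner opted out."""
    if NON_BROWSABLE_KEY in entries:
        return None
    rows = "\n".join(
        f"<tr><td>{_cell(key)}</td><td>{_cell(value)}</td></tr>"
        for key, value in sorted(entries.items())
    )
    return f"<table>\n{rows}\n</table>"


print(render_listing({"home": "safe://abc:15001", "note": "<b>hi</b>"}))
```

Escaping every key and value before it goes into the page matters here, because the entries come from arbitrary data on the network.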

XOR URLs specification

The following are the main requirements we would like to have for the encoding we use to generate the XOR URLs:

  • Be able to support new and different types of base encodings and hash functions for the XOR addresses in the future.
  • Include the content type within the XOR URL, which would allow the client app to correctly render the data to the user, especially when referencing an ImmutableData.

We are considering the use of multiformats and CID, which allow us to cover the above requirements. We can use a CID identifier in our URL for specifying the XOR address part, and have additional parts to support MutableData’s type tag, version, as well as the path and query parts. We can then define our SAFE URLs in the following way (BNF-like):

<safe-url>            = 'safe://' ( <xor-uri> | <public-name-uri> )

<public-name-uri>     =  [<service> '.'] <public-name> <path-query-fragment>

<xor-uri>             = <immutable-data-uri> | <mutable-data-uri>

<immutable-data-uri>  = <cid> <query-fragment>

<mutable-data-uri>    = <cid> ':' <type-tag> ['+' <content-version>] <path-query-fragment>

<path-query-fragment> = ['/' <path>] <query-fragment>

<query-fragment>      = ['?' <query>] ['#' <fragment>]

Where:

  • <cid>: follows the CID format which self-describes and encodes:

    • the base encoding for the string, we propose to use base16 for the reasons explained above,
    • the version of CID for upgradability purposes (v1 now),
    • the content type or codec of the data being referenced,
    • the XOR address of the content being referenced
  • <content-version>: for future implementation to reference versionable content, using a single address with different versions

  • <type-tag>: the type tag value if the CID is referencing a MutableData. In the absence of this value, the CID will be assumed to be for an ImmutableData

  • <path>: the path of the file if the CID is referencing a MutableData which can be accessed through the NFS emulation convention (or other emulations/conventions in the future)

  • <query>: query arguments, to be used by the client app and not for retrieving the content

  • <fragment>: fragment of the content, to be used by the client app and not for retrieving the content

The webFetch function will simply attempt to decode the <cid> part and, if that fails, fall back to treating the URL as a public-name URL.
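The grammar and the fallback rule can be sketched as follows in Python. This is not the webFetch implementation: `parse_safe_url`, `looks_like_cid` and `SafeUrl` are hypothetical names, and the CID check is deliberately simplified (base16 multibase prefix only, CIDv1, single-byte varints) instead of using a real CID library.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SafeUrl:
    kind: str                      # "immutable", "mutable" or "public-name"
    host: str                      # <cid>, or [<service>.]<public-name>
    type_tag: Optional[int] = None
    version: Optional[int] = None
    path: str = ""
    query: str = ""
    fragment: str = ""


# <cid> [':' <type-tag> ['+' <content-version>]]
_HOST_RE = re.compile(r"^(?P<cid>[^:+]+)(?::(?P<tag>\d+)(?:\+(?P<ver>\d+))?)?$")


def looks_like_cid(text: str) -> bool:
    """Simplified check: 'f' prefix, hex body, CIDv1, digest length matches."""
    if not text.lower().startswith("f"):
        return False
    try:
        raw = bytes.fromhex(text[1:])
    except ValueError:
        return False
    return len(raw) > 4 and raw[0] == 1 and raw[3] == len(raw) - 4


def parse_safe_url(url: str) -> SafeUrl:
    if not url.startswith("safe://"):
        raise ValueError("not a safe:// URL")
    rest = url[len("safe://"):]
    rest, _, fragment = rest.partition("#")
    rest, _, query = rest.partition("?")
    host, slash, path = rest.partition("/")
    path = slash + path
    match = _HOST_RE.match(host)
    if match and looks_like_cid(match["cid"]):
        if match["tag"] is None:
            if path:
                raise ValueError("an ImmutableData XOR URL cannot have a path")
            return SafeUrl("immutable", match["cid"], query=query, fragment=fragment)
        version = int(match["ver"]) if match["ver"] is not None else None
        return SafeUrl("mutable", match["cid"], int(match["tag"]), version,
                       path, query, fragment)
    # Decoding the <cid> failed: fall back to a public-name URL.
    return SafeUrl("public-name", host, path=path, query=query, fragment=fragment)
```

For example, `parse_safe_url("safe://blog.alice/index.html")` falls through to the public-name branch, while a URL whose first part decodes as a CID and is followed by `:15001` is treated as a MutableData XOR URL.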

The following are examples of what would become valid XOR URLs:

  • ImmutableData XOR URL: safe://a078516207e36aa2371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59
  • MutableData XOR URL: safe://f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4db1cc:15001
  • NFS MutableData XOR URL: safe://2c481d6207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dba5f:15001/myfolder/page.html

#2

Great to see this vital feature is on the way, and the time spent thinking it through has clearly paid off. Very nice. Can’t wait for the PoC!

Any chance of some CID examples? I’m wondering how you would encode common MIME types for immutable files, for example.

Thanks.


#3

Not a big deal, but personally I’d prefer the shorter base 64 numbers. Seems unlikely that people would be typing them out anyway (usually acquire via qr code or link).


#4

The following is a valid CID, you can use this page to decode it: http://cid-utils.ipfs.team/#f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59
You will see that the CID has raw as the codec/content-type. That’s because the list of codecs currently doesn’t seem to be finalised, and we may need to send PRs to add ours to it, and probably also to the CID/multihash project/s to support them. Validating this is part of the next steps I want to test with the PoC code.
BTW that is a valid address on Alpha2 as well, there is a MD there I’m using for testing :wink:
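If you prefer to see how that string breaks down without the web page, here is a small Python sketch that decodes it by hand. It is not a full CID parser: it only handles the base16 multibase prefix, and `decode_base16_cid` and `read_varint` are hypothetical helper names. As far as I know, 0x55 is the multicodec code for raw and 0x16 is the multihash code for sha3-256.

```python
from typing import Tuple


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned varint starting at `pos`; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def decode_base16_cid(cid: str) -> dict:
    """Split a base16 CIDv1 string into version, codec, hash code and digest."""
    if cid[0] not in "fF":
        raise ValueError("only the base16 multibase prefix is handled here")
    raw = bytes.fromhex(cid[1:])
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    digest = raw[pos:]
    if len(digest) != digest_len:
        raise ValueError("digest length does not match the multihash header")
    return {"version": version, "codec": codec,
            "hash_code": hash_code, "digest": digest.hex()}


parts = decode_base16_cid(
    "f015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59"
)
print(parts["version"], hex(parts["codec"]), hex(parts["hash_code"]))  # 1 0x55 0x16
print(parts["digest"])  # the 32-byte XOR address, hex encoded
```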

Yeah, I was also interested in shorter ones, and that’s how we ended up looking at those other options. But I found case sensitivity would be a problem not only for users but potentially also for devs (and us :slight_smile: ): I was trying to parse the XOR URLs with the parseURL function and it was already converting them to lowercase, so that’s when I thought we may want to avoid the case-sensitivity issue altogether from the very beginning. As you said, it’s very likely people won’t type them anyway, so the length shouldn’t be an issue, but case sensitivity apparently could be.


#5

There are more ways to encode without case sensitivity. base32 is also case insensitive and it usually does not include 0 and 1 to avoid confusion with the letters O and I. (It also has variants that avoid even more confusion like Crockford’s Base32.) There are probably more solutions that are case insensitive.


Had more critique on some points, but after more careful reading it looks well thought out! Well done.


#6

The base32 looks nice and it’s indeed shorter @bzee (59 characters as opposed to 73 with base16). I just checked the multibase library for what is supported, and the following are examples of what our URLs may look like with base32, base32hex, and base32z (Crockford’s base32 is not supported/implemented at the moment):

base32: safe://bafkrmicq4xa5j4dfpr4n65vtgtwdp6f5fkjsxek5uqftymdpa62beqo37q

base32hex: safe://v05ahc80b6kvtvtcem1arpvqgc62mds8q87dgelkgvumrp8iumjissgj498

base32z: safe://hyfktced1wsgxzyxuiwrzrukgcagpxzowfy35zpo6hdd5jyhqxij84ri8uo

As per the author, z-base-32 was designed to be easier for human use, and it permutes the alphabet so that the easier characters are the ones that occur more frequently: http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
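The 59 vs 73 character counts are easy to reproduce. The Python sketch below re-encodes the bytes of the CID from my earlier post using only the standard library, assuming the multibase prefixes f (base16) and b (lowercase base32 without padding). The output is not expected to match the strings above character for character, since those appear to encode a different address.

```python
import base64

# Bytes of the CID from the earlier post, without its multibase prefix.
cid_hex = "015516207e36aa5371e17750c93276446bdb4867c027035531b89430aa8d3ae2fa4dbb59"
cid_bytes = bytes.fromhex(cid_hex)

# Multibase strings: one prefix character followed by the encoded bytes.
base16_str = "f" + cid_bytes.hex()
base32_str = "b" + base64.b32encode(cid_bytes).decode().rstrip("=").lower()

print(len(base16_str), len(base32_str))  # 73 59
print("safe://" + base32_str)
```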


#7

Of those, I’d lean towards human readable where available. Doesn’t look like there’s many negatives to that…