[RFC Discussion]: XOR-URLs

bochaco · January 28, 2019, 10:54pm

@bzee, I think this was just trying to say that if the resolver supports fetching and decrypting MD entries, it may need to try decrypting them first and, if that fails, simply assume that the entry is unencrypted; in private MDs some entries can be encrypted and others unencrypted. I’ll be looking at applying some of the corrections made above by @mav, so I’ll also try to improve this paragraph, as this bit is definitely not clear at all.
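The try-then-fall-back behaviour described above can be sketched roughly as follows. This is only an illustration: `read_entry_value` and `toy_decrypt` are hypothetical names, and the "cipher" is a toy stand-in that raises `ValueError` on anything it cannot decrypt, not the SAFE API.

```python
from typing import Callable


def read_entry_value(raw: bytes, decrypt: Callable[[bytes], bytes]) -> bytes:
    """Return the usable value of an MD entry that may or may not be encrypted."""
    try:
        return decrypt(raw)
    except ValueError:
        # Decryption failed, so assume the entry was stored unencrypted.
        return raw


def toy_decrypt(raw: bytes) -> bytes:
    """Toy stand-in for a real cipher: 'encrypted' values carry an ENC: marker."""
    if not raw.startswith(b"ENC:"):
        raise ValueError("not an encrypted value")
    return raw[4:]


print(read_entry_value(b"ENC:secret", toy_decrypt))
print(read_entry_value(b"plain text", toy_decrypt))
```

A real resolver would need a more reliable signal than "decryption threw", since some ciphers will happily produce garbage instead of failing.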

I think the confusion here has to do with the fact that the RFC is talking not just about the XOR-URL spec being proposed, but also how we can/want to support it with our resolver, i.e. our webFetch and fetch browser embedded functions.
E.g., if you are fetching a WebID with a fragment, I thought it would be desirable to have fetch/webFetch take care of it and return the corresponding graph (in cases where you have more than one foaf:Person graph in the same WebID), but it’s also true that an app may not want that and would prefer to obtain the complete WebID profile doc so it can process the fragment itself (or not). This could be an option passed in when invoking the resolver, stating whether the fragment shall be processed by the resolver; or, as you suggested, the resolver could simply not support fragment inspection/resolution.

When looking at how to encode the type tag I had the same kind of thoughts and point of view as @bzee. I don’t think MD URLs are literally querying a specific type and/or version of an MD, in the sense that querying is supposed to be something targeting the client app or website, with nothing to do with the resolver or how to route the request, just like the fragment. So we thought that since the type tag is used by the vault to retrieve the service/content (as bzee says, you cannot do it without a type tag), it fits quite well to make an analogy with port numbers.
I think along the same lines for the version. How it will work is not defined yet, so we don’t know if different versions of the “same” content would be stored/held by the same section/group; I presume it all depends on how this will be implemented. And even if they are in the same section/group, providing the version is/will still be part of the routing/locating mechanism rather than something dealt with by the client app.

Now, I have recently started to question the need for a type tag number, simply considering not having type tags at all to route and locate any type of content. From a user’s perspective, why do I need a “number” to locate content on a network which holds immutable versionable data, for which I already have a hash of the content I’m looking for? AFAIK, the idea behind the type tag was to allow the network (vaults) to act upon data in specific ways for predefined type tags, e.g. safecoin is type tag X so mutations/requests for such MDs can be treated in a special way. But what if we can do that in some other way and remove the type tag altogether? Wouldn’t that give us a much cleaner CAS and URLs?

(Note I have no clear understanding yet of whether it’s possible to get rid of type tags, but I’m throwing it out for consideration as well)… Thinking out loud: what if some bits of the XOR address are actually the type tag, and what if we even expand the XOR name address space, if we want to keep 256 bits for it, and add some more bits which are used for the type tag? @ustulation / @nbaksalyar does this sound too crazy/wrong?

I would be more inclined to this option than to any of the others where the querystring or fragment is used for the type tag or versioning. I’m also thinking of the suggestion from @hunterlester in the GitHub issue “Cannot fetch content with a XOR-URL which typeTag is greater than 65535” (Issue #429, maidsafe/sn_browser): what if we propose an addition to the multiformat protocol where you can append some custom data, which we use to encode the type tag number? I’ll have to think about this more thoroughly though.

mav · January 31, 2019, 1:36am

The use of the term website is not really ideal here (page was also used in an earlier post). I reckon this url scheme is closest to an api design, not a website or webpage. Querystrings are a way to refine the data returned from the specified endpoint. Querystrings are consumed by the endpoint, not by the client.

The host field in the url maps exactly in concept to the xor location of the data, which is used to route the request. Host/xorname is not used to refine data returned from the endpoint.

The type tag and / or version number maps almost exactly in concept to the querystring as a way to get the endpoint to refine the result from a larger set of data.

The point I want to make here is that type tag is not used for routing the request. This is my reason for wanting it not to be encoded as subdomain in the host field but instead as querystring. To my mind this is closer to the semantics of the current url rfc 3986. Type tag doesn’t affect routing thus it doesn’t belong in the host field.
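To make the two placements concrete, here is a small sketch using Python's standard URL parser as a stand-in for the resolver (an assumption; the browser's own parsing may differ). The `type_tag` querystring parameter name is hypothetical. It also shows the 65535 limit that the port position inherits, which is the problem raised in the issue referenced earlier in the thread.

```python
from urllib.parse import parse_qs, urlsplit

XOR_HOST = "hyfktcerxx9qag8oq6pxmx7djcshmbrk6cp11fe895kg8gjo3oq4odnbeuw"


def tag_from_port(url: str) -> int:
    """Read the type tag from the port position: safe://<host>:<tag>."""
    port = urlsplit(url).port  # raises ValueError when outside 0-65535
    if port is None:
        raise ValueError("no type tag in URL")
    return port


def tag_from_query(url: str) -> int:
    """Read the type tag from a hypothetical ?type_tag=<tag> querystring."""
    query = parse_qs(urlsplit(url).query)
    return int(query["type_tag"][0])


print(tag_from_port(f"safe://{XOR_HOST}:777"))
print(tag_from_query(f"safe://{XOR_HOST}?type_tag=18446744073709551615"))
```

With a generic URL parser the querystring form accepts the full 64-bit tag range, while the port form rejects anything above 65535 unless the resolver parses the authority part itself.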

In a similar vein, it might be good to also think toward the future when compute is on SAFE and how futureproofing that kind of feature might impact current URL design thinking. It’s an unknown for now but that doesn’t mean we can’t at least consider keeping some options open for future uses.

I disagree (to an extent). Type/version is only used by the endpoint to filter the potential results at that xor address down to the correct data, ie it’s not used for routing (I think I’m using a different definition of routing than you; I consider routing to stop when the request reaches the vault containing the data, operations internal to the vault such as picking the correct chunk off the disk are not routing). Type/version is not part of routing.

Anyway, I think I’ve already said more than necessary on this topic considering the relatively small magnitude of the point. Please carefully consider what is ‘routing’, ‘location’, ‘filtering’ etc. and apply those semantics consistently within the URL encoding. As IPv6 and BIP70 have shown, it can be very hard to change or extend a standard once it is in use.

rob · January 31, 2019, 7:10am

While I agree here, there is perhaps another way to consider it.

For instance, when the address is routed only the upper “x” number of bits is used. The rest are resolved within the section/group. Note that the number of bits will vary as the number of sections varies and as they split. But it will still be only a small portion of the address that is used for routing.

Now the tag when requesting a MD could be considered the lower 64 bits of a combined address.
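A rough sketch of that view, assuming a 256-bit XOR name and a 64-bit type tag (the function names and the 8-bit prefix in the usage lines are illustrative only):

```python
NAME_BITS = 256
TAG_BITS = 64


def combined_address(xor_name: bytes, type_tag: int) -> int:
    """Concatenate the XOR name (upper bits) and the type tag (lower 64 bits)."""
    if len(xor_name) * 8 != NAME_BITS:
        raise ValueError("expected a 32-byte XOR name")
    if not 0 <= type_tag < 2 ** TAG_BITS:
        raise ValueError("type tag must fit in 64 bits")
    return (int.from_bytes(xor_name, "big") << TAG_BITS) | type_tag


def routing_prefix(address: int, prefix_bits: int) -> int:
    """Return only the upper prefix_bits of the combined address."""
    return address >> (NAME_BITS + TAG_BITS - prefix_bits)


name = bytes.fromhex("8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d")
addr = combined_address(name, 777)
print(hex(routing_prefix(addr, 8)))   # the bits a section prefix would match on
print(addr & (2 ** TAG_BITS - 1))     # the tag, recovered from the low bits
```

In this picture the tag never influences the routing prefix unless a section prefix grows beyond 256 bits.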

bochaco · January 31, 2019, 4:12pm

It sounds like it just has to do with the fact there is no server, so we now have only the routing side of it, and the client side.
From my point of view the client side is everything that a traditional script would do on the client or server side, or even what the web server process itself would do; this includes the query string used by a script, and the path, which would be taken care of by the web server process. Then the type tag or version is kind of in the middle/gray area: traditionally it has to be resolved before a server-side script or web server process deals with the request, which in my view/model is still part of routing the request to the targeted content. So it’s a matter of where we draw the line, I guess.
EDIT: however for the version, as I said before, I think it all depends how it’s implemented, if different versions are kept in the same section or not, then it would go on one side or the other.

mav · January 31, 2019, 11:58pm

Agreed, this is a very reasonable perspective. The traditional role of querystring maps onto fairly ambiguous territory when used in SAFE.

riddim · March 20, 2019, 11:39am

Stupid question

Xor urls - how are they defined by now…? I saw a lot of discussion and this rfc

Sorry for not participating in this earlier and complaining now… But since the proposed z-base32 encoding is not a standard encoding, I wasn’t able to simply generate the string and try to see my uploaded data with the browser…

Since this was kind of a bummer for me (and the z-base32 implementation for Python that can simply be installed with pip is Python 2 only for now), I thought I would ask why you want to use a non-standard encoding which might cause trouble with other programming languages too…?

The argument with easier readability doesn’t count imho because we are talking about xor addresses which are hashes of data (at least for immutable data…?) which should result in kind of randomly distributed bits… So the characters should appear pretty much equally distributed (and with creating a new mutable data with the create new random it should be the same)

In addition to this I like the argument that base64 encoding is shorter (and nobody would type those links by hand anyway)

joshuef · March 20, 2019, 12:03pm

hey @riddim, not stupid questions at all!

The RFC is still on going so all comments / concerns / questions are very welcome!

Can’t say I agree here, I’m afraid.

z-base32 is case insensitive, which can be important when copy-pasting /passing around / normalizing urls to avoid issues (we’ve had issues with base64, already). It also strives to remove characters that could be easily confused, which should make them just that bit more useable.

In addition to this I like the argument that base64 encoding is shorter (and nobody would type those links by hand anyway)

If no one would type them, what does it matter if they are longer?

If you do type them out for whatever reason, and one or two or three characters can be confused, that’d be a huge pain (IMO).

That, at least, is part of the reasoning for the choice of z-base32 (@bochaco, maybe you can remember any more?)

The availability of parsers might be something worth having a look at, certainly.

Although as a developer you are able to use the SAFE APIs to decode the strings and retrieve the mutable data info if that’s what you’re after. Or are you imagining other situations to be using the XOR-URLs?

I’m not sure I follow.

You need to upload the data to the network and retrieve the given xor-url based on its location in the network. You cannot simply base32 some data and pass this link for the browser to decode / display data (Is that what you’re imagining?)

bochaco · March 20, 2019, 1:39pm

That was the reasoning about the choice of z-base32 so far @joshuef @riddim, for a URL having a case insensitive encoding seems a must to me.
Also, regarding different languages, you can take a look at the CID implementations for different languages if you don’t want to use the functions exposed by the SAFE API: see multiformats/cid on GitHub (“Self-describing content-addressed identifiers for distributed systems”). We are simply using the JS implementation of CID and multiformats, and whenever this is implemented in lower layers in Rust we will presumably also be using the available one, https://github.com/multiformats/rust-cid. Once it’s in our SCL layer you shouldn’t need any other package/lib, but just use what’s available in our API through the language binding you are using.

@riddim, please take a look at the snippets and little description in the following post, that’s how you can retrieve a XOR-URL (remember you need the experimental APIs enabled), this may help you in getting a sense of the current state of the proposal, this is part of https://safenetforum.org/t/release-safe-browser-v0-11-0/26792:

XOR-URLs can be obtained for any ImmutableData and MutableData on the Network

As mentioned above, applications can now fetch any MutableData/ImmutableData from the Network using its XOR-URL. The APIs are now capable of generating and returning the XOR-URL for any MutableData and ImmutableData the application has access to. For more details/information about the XOR-URLs you can refer directly to the proposed RFC, to the discussions taking place in our dev forum, or to this screencast. The following snippets are examples of how an application can obtain the XOR-URL for a MutableData and an ImmutableData respectively:

Get MutableData XOR-URL

// Create a public MutableData at a random address with the given type tag
const typeTag = 15015;
const md = await safeApp.mutableData.newRandomPublic(typeTag);
await md.quickSetup({ key1: 'value1', key2: 'value2' });
// The returned info carries the XOR-URL (experimental APIs must be enabled)
const info = await md.getNameAndTag();
console.log('MutableData\'s XOR-URL:', info.xorUrl);

Get ImmutableData XOR-URL

// Write unencrypted content to a new ImmutableData
const imdWriter = await safeApp.immutableData.create();
const cipherOpt = await safeApp.cipherOpt.newPlainText();
await imdWriter.write('my text');
// Passing true as the second argument asks close() to return the XOR-URL
const getXorUrl = true;
const info = await imdWriter.close(cipherOpt, getXorUrl);
console.log('ImmutableData\'s XOR-URL:', info.xorUrl);

riddim · March 20, 2019, 3:50pm

first of all - thanks for your fast feedback

true

@bochaco @joshuef is it possible that you both are suggesting to use the JS bindings in python? Oo

sorry - i might have not explained enough what i wanted to do - but this now is a bit frustrating…

i thought the whole point of xor-urls is that i can locate something in the network …

i have exactly 160 bytes of MDataInfo that enable me to find my mutable data from any computer where i put them on

somehow i thought that i would need to e.g. take the xor-name (32bytes), take the hex representation (or some widely used encoding)

put a safe:// in front of it - maybe a :777 at the end for the type_tag

and just view my uploaded mutable data in the safe-browser Oo …
…if i don’t know how the xor-url is defined, i.e. what i need to do with which bytes, i cannot share data uploaded from python in a simple way (and i don’t want to be rude - but the only thing i want to know is instructions on ‘doing what with what’ to generate a link to my data coming from the MDataInfo-object - i thought i wouldn’t be asking for a lot here … all in all we should be talking about a maximum of 160 bytes …)

bochaco · March 20, 2019, 4:23pm

Briefly, the reason we don’t simply encode (with baseX) a XOR name/addr and use just that is that we are trying to encode some additional information in the URL, and also to account for any future changes in some of that information. This is what the multiformats protocol provides: e.g. if in the future we want/need to use any other base encoding we can do it, and the format already specifies which base we are using. The same goes for other info we are trying to include, like the content type of the data being referenced by such a XOR-URL.

No, that’s not what we suggest exactly.

We are proposing that the XOR-URL be a standard type of URL for SAFE, which means our APIs should allow you to generate them and make use of them; therefore you wouldn’t be generating them yourself in any language but would just use the APIs provided. This would need to be provided in the lower Rust lib so it can then be exposed in any language binding, like the Python binding. At the moment the PoC has it implemented at the JS layer only, but that’s just the PoC and not where such a core API implementation should be; it should be in Rust instead.

Since you seem to be trying to implement this in Python already, then unfortunately you would need to implement it yourself, or wait till it’s available in safe_app FFI API and just create the binding functions.

Now, if you want to implement it yourself in Python right now, you will have to implement the same as is done here for MD XOR-URLs: safe_app_nodejs/src/api/mutable.js (master branch, maidsafe-archive/safe_app_nodejs on GitHub), which is what is specced out here in the RFC: https://github.com/maidsafe/rfcs/blob/357384147ae005e4061079b27a30f43cf379fda5/text/0000-xor-urls/0000-xor-urls.md#xor-urls-specification. The same goes for ImmD; you can see how it’s implemented here: safe_app_nodejs/src/api/immutable.js (master branch, maidsafe-archive/safe_app_nodejs on GitHub).

As you can see in the code, we use the CID/multihash implementations already available for JS, i.e. we don’t even deal with base32 encoding ourselves. So you could do the same by using the implementations available for Python: ipld/py-cid (“Self-describing content-addressed identifiers for distributed systems implementation in Python”) and ivilata/pymultihash (“Python implementation of the multihash specification”), both on GitHub.

riddim · March 20, 2019, 4:43pm

oooh - thank you well that was very helpful now =)

then i may be just too impatient - kk
i somehow didn’t expect it to be something this fancy … (i’ll read something about it later on … thanks for clarification =)

riddim · March 21, 2019, 1:00am

haHAAAAA

safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777

thank you very very much @bochaco - awesome xD

ps: side note - the encodings/hashes used are the following: (thank you for the comment with the hint further up in the JS code =)

okay - that’s cool with me then, i was just super confused that it didn’t look like a ‘standard procedure’ or a ‘standardized way’ of doing it … but since it is used by IPFS as well it seems to be a thing (and i didn’t have a lot of trouble reproducing it)

but just to mention it - it’s not z-base32 you are using but standard base32 as far as i can tell

bochaco · March 21, 2019, 3:20am

Nice!
We do use base32z, you can see the definition of the consts.CID_BASE_ENCODING set to base32z here: https://github.com/maidsafe/safe_app_nodejs/blob/master/src/consts.js#L114, but I’m not sure why the one you are generating with base32z is not finding the content. I’ll debug it tomorrow. I can only think that perhaps the content type dag-pb could be causing some problem, not sure, but if you look at the JS code we set raw as the content type for MDs. This is the equivalent base32z we generate for the same XOR name (0x8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d):

safe://hyfktcerxx9qag8oq6pxmx7djcshmbrk6cp11fe895kg8gjo3oq4odnbeuw:777
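For anyone trying to reproduce this string outside JS, the following is a minimal sketch using only the Python standard library. It assumes the byte layout that the hex URLs elsewhere in this thread suggest (CID version `0x01`, content type, hash code `0x16` for sha3-256, length, then the XOR name), the multicodec value `0x55` for `raw`, and the multibase prefix `h` for base32z. It is not the safe_app_nodejs implementation, and the helper names are made up.

```python
import base64

STD_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"   # RFC 4648 base32, lowercased
Z_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"     # z-base32
TO_ZBASE32 = str.maketrans(STD_ALPHABET, Z_ALPHABET)

RAW = 0x55       # multicodec code for the 'raw' content type
SHA3_256 = 0x16  # multihash code used to label the XOR name


def cid_bytes(xor_name: bytes, codec: int = RAW) -> bytes:
    """CIDv1 bytes: version, content type, hash code, length, then the name itself."""
    # Single-byte header fields are enough here because every value is below 128.
    return bytes([0x01, codec, SHA3_256, len(xor_name)]) + xor_name


def to_base32z(data: bytes) -> str:
    """Multibase base32z: prefix 'h' plus unpadded base32 mapped to the z-base32 alphabet."""
    std = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return "h" + std.translate(TO_ZBASE32)


name = bytes.fromhex("8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d")
print("safe://" + to_base32z(cid_bytes(name)) + ":777")
```

Passing `codec=0x70` (dag-pb) instead changes only the third and fourth characters of the result, which would be consistent with the content type being the difference between the two links discussed here.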

Anyhow, this already demonstrates that the CID format we use here allows you to encode it with another base (base32), and because that info is part of the URL you generated (safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777), the browser is able to decode it and find the content. So even though we are trying to make a decision on which base we want to use as the standard encoding, people could generate URLs with another encoding and the address would still be decoded from the URL by the browser and API.

riddim · March 21, 2019, 6:09am

Oooooh - beautiful!

Ps and sorry for not simply adding the other address as text for copying…

riddim · April 7, 2019, 2:39am

TL;DR: the current cid solution looks nice and is a very elegant way of doing it, but it is imho not very flexible, and since we need to provide only 24 byte of data to discover a piece of data (160 for including all the keys) this solution is overly complicated for a super simple task

current implementation/suggestion is this:

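import cid        # py-cid
import multihash  # py-multihash

def getXorAddresOfMutable(data, ffi):
    # copy the raw XOR name bytes out of the FFI struct
    xorName_asBytes = ffi.buffer(data.name)[:]
    # wrap the name as a multihash labelled 'sha3-256', then as a CIDv1 with content type 'dag-pb'
    myHash = multihash.encode(xorName_asBytes,'sha3-256')
    myCid = cid.make_cid(1,'dag-pb',myHash)
    encodedAddress = myCid.encode('base32z').decode()
    return 'safe://' + encodedAddress + ':' + str(data.type_tag)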

we sha3-256 the 24 bytes of the xor-name, then we twist it somehow into a cid and in the end we encode it into base32z-ish

if we look at the bytes we can see that the difference between the hashed and unhashed value is not large - and if we look closer then we see that the sha-3’d values are just patched with (hex) 1618 from 24 to 26 byte

then some magic with cid happens and we can somehow revert that to get the xor-name back.
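The "revert" step can be sketched with the standard library alone. This assumes the layout visible in the hex links in this thread: the XOR name appears verbatim after four header bytes (CID version, content type, hash code, length), so nothing is actually hashed and the hash code only labels the bytes. For the 32-byte name used in the browser examples those header bytes are `01 70 16 20`. `parse_base32_cid` is an illustrative helper, not part of py-cid.

```python
import base64
from typing import NamedTuple


class CidParts(NamedTuple):
    version: int
    codec: int
    hash_code: int
    digest: bytes


def parse_base32_cid(text: str) -> CidParts:
    """Split a multibase base32 ('b' prefix) CIDv1 string into its fields."""
    if not text.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32)")
    body = text[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase strips
    raw = base64.b32decode(body)
    # Assumes every header field fits in one byte, which holds for these values.
    version, codec, hash_code, length = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:]
    if len(digest) != length:
        raise ValueError("length byte does not match the payload")
    return CidParts(version, codec, hash_code, digest)


parts = parse_base32_cid("bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu")
print(parts.version, hex(parts.codec), hex(parts.hash_code), parts.digest.hex())
```

The recovered `digest` is the XOR name itself, which is why the link resolves regardless of which hash label is written into the header.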

if we want to be case-insensitive and use a base32-encoding we are only slightly shorter than just using the hex-value. - do we plan on integrating additional information into the cid? or is it just a fancy way to do it …? if the latter … then why would we do this instead of the simple hex-value that is easy, well-known/understood and not calculation-intense …?

in addition to this i saw somewhere the question/idea (i think by @happybeing) that we could even make xor-urls for private data; while the argument against it was that we cannot encode the additional keys in the cid …

if we just take the hex value of the MdataInfo we can have 160 byte in hex-representation that are describing our piece of data perfectly (the following piece of data is unencrypted so all the keys are 0); yes nobody wants to type those by hand - but as clickable link or qr-code that’s an absolute valid solution imho

if we want to give someone-read-only-access to a file we mask the key that enables modification of the file and only provide the decryption key … simple as that …

and since the MDataInfo is exactly what a program needs to handle a piece of data; and hex representation is super easy to implement and clearly defined (and easy to adapt in case something changes with appendable data/any future data format change) i really really really don’t see why we would go for something unnecessarily complicated like cids …

ps: and in addition to this the currently selected encoding base32z is definitely not a super-widely used encoding; it has a different implementation in e.g. python than in Javascript, so the python-generated base32z-links don’t work with the browser and i cannot decode in python a link that i get from the browser/JS API (!)… (the good thing is that “standard encoded” links do work in the browser too - but this means again that we have redundantly defined many links to the same location in the network - while the redundancy doesn’t provide any additional benefits, because it’s not like check-bits that show you at least whether the link is correct or wrong; it’s just randomly spread links that suddenly end up in the same location …)

so i guess my suggestion would be to just use the hex-values of the name or the whole Info*-object as xor-urls where one can mask the properties which should not be shared … simple to integrate for private data too and in the ‘easy case’ only 5 characters longer than the current proposal

(and for e.g. the type-tag 18446744073709551615 (largest possible if i’m not mistaken) the difference is [including separator] exactly 1 character vs. included in the hex string:

)
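The length comparison can be checked with a few lines of Python. This sketch assumes a 32-byte XOR name and the four-byte CID header seen in the hex links in this thread; lengths exclude `safe://` and the type tag, and each CID figure includes the one-character multibase prefix.

```python
import base64


def encoded_lengths(xor_name: bytes) -> dict:
    """Character counts for a plain hex name versus CID encodings of the same name."""
    cid = bytes([0x01, 0x70, 0x16, len(xor_name)]) + xor_name
    b32 = base64.b32encode(cid).decode("ascii").rstrip("=")
    b64 = base64.b64encode(cid).decode("ascii").rstrip("=")
    return {
        "plain hex name": len(xor_name.hex()),
        "CID base16": 1 + len(cid.hex()),
        "CID base32 / base32z": 1 + len(b32),
        "CID base64": 1 + len(b64),
    }


name = bytes.fromhex("8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d")
for label, length in encoded_lengths(name).items():
    print(f"{label}: {length}")
```

For a 32-byte name the plain hex form comes out at 64 characters against 59 for the base32 CID, which matches the "5 characters longer" figure above.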

riddim · April 7, 2019, 2:57pm

pps: okay - and if i really missed something about the multiformats-thing that would make it very useful and cool for the future…

…you don’t want base64 because of upper/lowercase and special characters … you don’t want base32 because of similar looking characters … why not simply choose base16 (hex) as default, because it’s standardized and well known …?

(and if someone just wants to generate a link in a random language that doesn’t implement cids yet [maybe e.g. rust for exposing this through the client libs?] he can just hex the xor-name, put e.g. ‘safe://f01701620’ (description: sha3-encoded + first patch-bytes because the xor-address is shorter than the checksum, bytes then hex-encoded) in front of it and have a working link to the newly generated mutable:

safe://f017016202976d0fbd38b8d2d29bde345379b4d541b4c76c4df33920171ef20fa70f33ac8:777

because of the self describing nature of the multicodec-thing this already works anyway - it’s just not obvious to someone wanting to do it … or if you want to have your address to be keccak-512-encoded you put ‘f01701d20’ in front of it and magically end up still in the same place

safe://f01701d202976d0fbd38b8d2d29bde345379b4d541b4c76c4df33920171ef20fa70f33ac8:777
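Spelled out as code, the recipe is just string concatenation. A minimal sketch, where `hex_xor_url` is a made-up helper and the header bytes are read off the two links above (`01` CID version, `70` dag-pb, then the hash label and the length `20` = 32 bytes):

```python
SHA3_256 = 0x16    # hash label used in the first link
KECCAK_512 = 0x1D  # hash label used in the second link


def hex_xor_url(xor_name: bytes, type_tag: int, hash_code: int = SHA3_256) -> str:
    """Build a base16 ('f' prefix) XOR-URL by prepending the CID header to the name."""
    header = bytes([0x01, 0x70, hash_code, len(xor_name)])
    return "safe://f" + (header + xor_name).hex() + ":" + str(type_tag)


name = bytes.fromhex("2976d0fbd38b8d2d29bde345379b4d541b4c76c4df33920171ef20fa70f33ac8")
print(hex_xor_url(name, 777))
print(hex_xor_url(name, 777, hash_code=KECCAK_512))
```

Only the label byte differs between the two outputs; the name bytes that locate the data are identical.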

(the trick with the cids seems to be that they choose a hashing function to patch the data to a working size - and then they put the information about the used hashing function + the used encoding for the following string at the front of the string of the output)

if you personally prefer your base32z encoded strings because they are a bit shorter and easier to identify/type in then you can still generate them on demand and the browser will accept them and show you the location you want it to show …?

but please don’t use a non-standard-encoding as default behaviour in your api …

yes it’s nice that you can choose the representation of your liking for the data you encode:

safe://mAXAWII9/3YMeDvNet/RpZbiwkV5jZSKg/9qMcyYZg7UBiCid:777
safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
safe://zdjCkrCPzAj3i5AoLGmfsjvpoz1tU9LXEinMkPrf2QzD4t9cg:777
safe://f017016208f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d:777

some of them are implemented in JS, some not … but i don’t see the value in it as of now … would be nice to be able to use the base58btc because of the length … but that again isn’t implemented in JS as of now … and if you choose to do it then i would tend to just append the type-tag in encoded form instead of doing the :777 thing … i don’t know where the real value is there [to encode it base10 and introduce a separator]
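That several spellings land on the same data can be verified offline. The sketch below decodes three of the four links above with the Python standard library (base58btc is left out because the standard library has no decoder for it) and assumes the four-byte CID header discussed in this thread; `xor_name_from_url` is a hypothetical helper.

```python
import base64


def _pad(text: str, block: int) -> str:
    """Restore the '=' padding that multibase strings omit."""
    return text + "=" * (-len(text) % block)


# Multibase prefix character -> decoder for the rest of the string.
DECODERS = {
    "f": lambda body: bytes.fromhex(body),                       # base16
    "b": lambda body: base64.b32decode(_pad(body.upper(), 8)),   # base32
    "m": lambda body: base64.b64decode(_pad(body, 4)),           # base64
}


def xor_name_from_url(url: str) -> bytes:
    """Return the XOR name bytes carried by a safe://<cid>:<tag> link."""
    if not url.startswith("safe://"):
        raise ValueError("not a safe:// URL")
    host = url[len("safe://"):].rsplit(":", 1)[0]
    decode = DECODERS.get(host[:1])
    if decode is None:
        raise ValueError("unsupported multibase prefix")
    cid = decode(host[1:])
    return cid[4:]  # skip version, content type, hash code and length bytes


urls = [
    "safe://mAXAWII9/3YMeDvNet/RpZbiwkV5jZSKg/9qMcyYZg7UBiCid:777",
    "safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777",
    "safe://f017016208f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d:777",
]
for url in urls:
    print(xor_name_from_url(url).hex())
```

Note that the base64 form contains `/` characters, so a generic URL parser would cut the host short; the sketch sidesteps that by slicing the string directly.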

other examples for working links for the interested reader - all ends up at the same place

bin::          safe://bafkrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
base1::        safe://baearmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
base8::        safe://baedrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
base10::       safe://baeermiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
cbor::         safe://bafirmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
protobuf::     safe://bafibmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
rlp::          safe://bafqbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
bencode::      safe://bafrrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multicodec::   safe://baeybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multihash::    safe://baeyrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multiaddr::    safe://baezbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multibase::    safe://baezrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha1::         safe://baeirmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha2-256::     safe://baejbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha2-512::     safe://baejrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
dbl-sha2-256:: safe://baflbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-224::     safe://baelrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-256::     safe://baelbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-384::     safe://baekrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-512::     safe://baekbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
shake-128::    safe://baembmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
shake-256::    safe://baemrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-224::   safe://baenbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-256::   safe://baenrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-384::   safe://baeobmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-512::   safe://baeormiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
murmur3::      safe://baerbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
blake2b-8::    safe://baga6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-16::   safe://bagboiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-24::   safe://bagb6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-32::   safe://bagcoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-40::   safe://bagc6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-48::   safe://bagdoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-56::   safe://bagd6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-64::   safe://bageoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-72::   safe://bage6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-80::   safe://bagfoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-88::   safe://bagf6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-96::   safe://baggoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-104::  safe://bagg6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-112::  safe://baghoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-120::  safe://bagh6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-128::  safe://bagioiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-136::  safe://bagi6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-144::  safe://bagjoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-152::  safe://bagj6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-160::  safe://bagkoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-368::  safe://bagxoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-376::  safe://bagx6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-384::  safe://bagyoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-392::  safe://bagy6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-400::  safe://bagzoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-408::  safe://bagz6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-416::  safe://bag2oiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-424::  safe://bag26iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-432::  safe://bag3oiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-440::  safe://bag36iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-448::  safe://bag4oiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-456::  safe://bag46iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
ipfs::         safe://bagsqgfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
http::         safe://bahqagfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
https::        safe://bag5qgfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
quic::         safe://bahgagfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
ws::           safe://bahoqgfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
onion::        safe://bag6agfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
p2p-circuit::  safe://bagraefrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
dag-pb::       safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
dag-cbor::     safe://bafyrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
git-raw::      safe://baf4bmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
eth-block::    safe://bagiacfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
eth-block-list:: safe://bagiqcfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777

ps:

if we would append a checksum instead of the encoding-information-stuff we could not only check for ‘incorrect characters in the encoded string’ but we could do offline-checks for typos …

as of now e.g. both of the strings are valid cids … while the last one has the last character mis-typed if you want to end up at the address of the mutable …
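A sketch of what such an offline check could look like. The scheme here is purely hypothetical and not part of the RFC or of CID: a CRC-32 of the address bytes is appended as 8 hex characters (the post suggests a shorter checksum; CRC-32 is used only because it ships with Python and detects any single mistyped hex character).

```python
import zlib


def with_checksum(hex_address: str) -> str:
    """Append a CRC-32 of the address bytes as 8 hex characters."""
    crc = zlib.crc32(bytes.fromhex(hex_address))
    return hex_address + format(crc, "08x")


def is_well_formed(candidate: str) -> bool:
    """Check the trailing checksum without touching the network."""
    body, check = candidate[:-8], candidate[-8:]
    try:
        return zlib.crc32(bytes.fromhex(body)) == int(check, 16)
    except ValueError:
        # Non-hex characters or a truncated string also count as a typo.
        return False


address = "8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d"
good = with_checksum(address)
typo = good[:10] + "1" + good[11:]   # one character changed ('0' became '1')
print(is_well_formed(good), is_well_formed(typo))
```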

bochaco · April 8, 2019, 8:02pm

The key and main reason is simply to allow us to evolve without breaking things, as an example, your next statement:

Let’s imagine in the future we decide to change the encoding, or even the hash function we use for our immutable data XOR addrs, we will have to make sure that whatever new format we adopt we don’t break backward compatibility with older URLs.

Remember we are after the perpetual web; we don’t want to break old URLs just because we are moving away from one encoding to another, or even from one hash function to another. So using the hex-encoded XOR addr wouldn’t be enough if we want to accomplish that. We need some other way to make sure that if I give you a URL to an immutable data on SAFE, it’s immutable and perpetual regardless of what the most used encoding is at any moment.

I get your point, although just FYI Rust seems to be covered already: GitHub - multiformats/cid: Self-describing content-addressed identifiers for distributed systems

This is just another type of CID you are creating. What they are trying to achieve with CID is to have something standard that can be used to encode additional information alongside the content address. Where is the Rust implementation for that CID you are creating, or the golang one? Just kidding ofc, I hope you understand what I mean.

riddim · April 8, 2019, 8:32pm

So we patch the data we want to encode up a couple of bytes for the possibility that we might at some point randomly decide to change the format…? Hex is the 1:1 representation of the bytes (the information we want to encode); altering it (except for changing the base) will always be less efficient… Why would we want to become less efficient?
(and hex has been around since the beginning of computing - hard to believe it won’t be understood or will be hard to handle at any point in time)
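For a rough sense of the sizes involved, a small Python calculation. It assumes a 32-byte address and a CIDv1 overhead of 5 bytes (1 version byte, a 2-byte codec varint, 1 hash function byte, 1 digest length byte; codecs with a 1-byte code need one byte less), plus one character for the encoding prefix. The helper names are made up for this example.

```python
import math

XOR_ADDR_BYTES = 32
# Assumption: 1 version byte + 2-byte codec varint + 1 hash code byte
# + 1 digest length byte (the larger of the two common cases).
CID_OVERHEAD_BYTES = 5


def hex_chars(n_bytes: int) -> int:
    """Hex uses two characters per byte."""
    return 2 * n_bytes


def base32_chars(n_bytes: int) -> int:
    """Unpadded base32 length: 5 bits per character, rounded up."""
    return math.ceil(n_bytes * 8 / 5)


cid_bytes = XOR_ADDR_BYTES + CID_OVERHEAD_BYTES
sizes = {
    "raw hex": hex_chars(XOR_ADDR_BYTES),
    "raw base32": base32_chars(XOR_ADDR_BYTES),
    "cid hex": 1 + hex_chars(cid_bytes),        # +1 for the encoding prefix
    "cid base32": 1 + base32_chars(cid_bytes),  # +1 for the encoding prefix
}
for label, n in sizes.items():
    print(f"{label:>10}: {n} characters")
```

Within the same base the CID wrapper costs a handful of extra characters; across bases, the choice of base dominates the length.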

riddim · April 8, 2019, 9:37pm

And imho the multiformat thing just creates the impression of a future-proof format…

… It can handle different encodings and different patching algorithms (hash functions)… But can it handle it if we decide to add 2 additional leading bytes as a checksum for offline validation? How does it handle it if we decide to move from cid:typeTag to [cidWithTypeTagCodedInForNotHavingASeparator]? How does it handle it if we expand the address space from 32 bytes to 64 bytes?

All those cases cannot simply be coded into the cid; we would need to extract the bytes from the cid and then do a case decision on the length of the bytes… just as we would do without cid… only that with cid we need to extract the bytes from it before we can use them (so one additional step with cid)… wasted resources in my opinion…
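To make the "extract the bytes" step concrete, here is a minimal Python sketch, assuming the CIDv1 layout (an encoding prefix character, then version, codec, hash function code and digest length as varints, then the digest). `parse_cid` and `read_varint` are names made up for this illustration, and only the base32 `b` prefix is handled.

```python
import base64
from typing import NamedTuple, Tuple


class Cid(NamedTuple):
    version: int
    codec: int
    hash_code: int
    digest: bytes


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def parse_cid(text: str) -> Cid:
    if not text.startswith("b"):
        raise ValueError("only the base32 'b' prefix is handled in this sketch")
    body = text[1:]
    raw = base64.b32decode(body.upper() + "=" * (-len(body) % 8))
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    length, pos = read_varint(raw, pos)
    digest = raw[pos:pos + length]
    if len(digest) != length:
        raise ValueError("digest shorter than declared length")
    return Cid(version, codec, hash_code, digest)


# The https example from the list above, with the ':777' type tag split off.
cid = parse_cid("bag5qgfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq")
print(cid.version, hex(cid.codec), hex(cid.hash_code), len(cid.digest))
```

For that string the header decodes to version 1, codec 0x1bb, hash function code 0x16 and a 32-byte digest. The `:777` type tag is not part of the CID, so it has to be handled separately either way.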

riddim · April 9, 2019, 4:31am

Let me say it differently.

CIDs are an elegant way of encoding arbitrary data bytes into one data encoding of your choosing (taking care of the issues that arise if the data you want to encode doesn’t fit the alignment of the encoding you want to use… you will always get back the exact byte string length you wanted to encode initially). [The chosen hashing function is just an arbitrary property of the cid to identify the length of the encoded byte string and to patch/unpatch it in the process…] They don’t take care of any data format changes.

So… unless you plan on changing the base for encoding the xor URL, cids can only solve a problem they create themselves… That’s why I’m against cids for xor urls! We know the length of our xor URL, we don’t need to solve issues we cannot have - cids are answering the wrong question and are not the proper tool for this… Why would you add complexity you don’t need? But we could add some additional bits as a checksum to verify the validity of a xor URL [and maybe add a byte to describe the encoding if you want… so we’d have self-description again if at some point we arbitrarily decided that it makes way more sense to move from base32 to base16 or base58] (!)

Ps:

Oh well - you know what… I don’t care anymore… If you love your cids that much then go with them… Imho it’s a bad decision because it makes things overly complicated - but I don’t want to waste more of my lifetime on this issue that doesn’t matter anyway in the long run… Just please don’t use the non-standardised base32z encoding in your official api, as that possibly creates incompatibilities (or at least tell me which characters I need to replace by which others to get back to standard base32 encoding, to be able to decode urls you created)! (And why not append a checksum after the address too, for typo recognition…? Even the IBAN comes with check digits nowadays… But please (!) not a complex solution this time, just e.g. counting the ones in the bytes and taking the last 1 or 2 digits or so…)
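For reference, a minimal Python sketch of the character replacement in question, assuming "base32z" means the commonly used z-base-32 alphabet. For whole-byte input the two encodings group bits the same way and differ only in the alphabet, so a character-for-character translation is enough. The function names are made up for this example.

```python
import base64

# Position i in one alphabet corresponds to position i in the other.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"
RFC4648 = "abcdefghijklmnopqrstuvwxyz234567"

Z_TO_STD = str.maketrans(ZBASE32, RFC4648)
STD_TO_Z = str.maketrans(RFC4648, ZBASE32)


def zbase32_decode(text: str) -> bytes:
    """Translate z-base-32 to standard base32, then decode with the stdlib."""
    std = text.translate(Z_TO_STD).upper()
    return base64.b32decode(std + "=" * (-len(std) % 8))


def zbase32_encode(data: bytes) -> str:
    """Encode with the stdlib, drop padding, then translate to z-base-32."""
    std = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    return std.translate(STD_TO_Z)
```

So, for example, `y` maps to `a`, `b` stays `b`, `n` maps to `c`, and the digits `8`, `1` and `9` map to `h`, `s` and `7`.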