[RFC Discussion]: XOR-URLs

Unless there was a problem in an implementation…

True - while you could argue that using those types as an encoding function + adding the info on how you encoded the data has pretty much the same result as just adding this encoding function as an identifier for the mimetype in the bytes of the payload… So while you are of course correct, I don’t think the message of my statement is wrong… (and if you leave out the encoding option and just go with one - like with mutable data - or probably safecoin - it’s just encoded bytes, as I said)

Yeah - right - sorry but thinking that even a small child would implement hex wrong is kind of ridiculous - while implementing base32z is obviously a very different beast

On the one hand, you argue for standardisation of one thing (base32), on another, you argue to create our own version of the well tested CID system.

I understand that bugs are frustrating. But one bad implementation doesn’t negate the benefits of z-base32, IMO. Especially considering all that’s outlined above (re: API to avoid the need for going manual, human readability etc).

Equally, I don’t see a fleshed out benefit to doing away with CIDs, which give us the flexibility to update the XOR scheme as we go, in a standardised, tested, and community supported fashion. If we have issues with it, or additions we want to make, we can put them forward and improve this standardised system for everyone. From @bochaco’s PR they seem very open to this.

Where did I do that? I suggested to just include all relevant bytes in a base32 encoded cid

(or just use the bytes and encode them hex - my first suggestion and still my favourite one…)

Ps:
Hey - but just go with what you decided for - people will use it or complain/use a different way to share addresses - no reason to waste many hours just because we disagree on importance of some aspects - we’ll just see and react to what happens

:+1: (Thanks. I couldn’t actually find where I’d read that.)


But aye, I think there’s certainly merit to getting typetags in there. And error correction could be useful too :+1: (did you have an example of a URL structure with that?).


That’s me for the night now. Gotta think about :taco: some :bowing_man:.


Hmm - actually I was just thinking about a simple 16 bit checksum for error detection - no error correction (because that would be at least 32 bits to correct a one-bit error in one xor name without any additional data, from memory… Details on block codes can be found here: https://en.m.wikipedia.org/wiki/Block_code - but all that comes with additional complexity - not sure if that is worth the hassle - just giving immediate feedback on error occurrence without network calls might be good enough, I would guess… And a wrong character is most probably not just one wrong bit but a couple of them…)
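To make the 16 bit error detection idea concrete, here is a minimal Python sketch. It assumes CRC-16/CCITT from the standard library (`binascii.crc_hqx`) and appends the checksum after the 32-byte xor name. The choice of CRC, the byte layout and the function names are assumptions for illustration only, not something agreed in this thread.

```python
import binascii

CHECKSUM_BYTES = 2  # 16 bit checksum, detection only (no correction)


def append_checksum(payload: bytes) -> bytes:
    """Return payload followed by its CRC-16 (assumed layout: big-endian, at the end)."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + crc.to_bytes(CHECKSUM_BYTES, "big")


def verify_checksum(data: bytes) -> bool:
    """Check locally, without any network call, whether the trailing CRC matches."""
    if len(data) <= CHECKSUM_BYTES:
        return False
    payload, stored = data[:-CHECKSUM_BYTES], data[-CHECKSUM_BYTES:]
    expected = binascii.crc_hqx(payload, 0xFFFF).to_bytes(CHECKSUM_BYTES, "big")
    return expected == stored


# Placeholder 32-byte xor name, purely for demonstration.
xor_name = bytes(range(32))
protected = append_checksum(xor_name)

# Simulate a typo that flips one bit somewhere in the name.
corrupted = bytearray(protected)
corrupted[5] ^= 0x04

print(verify_checksum(protected))         # intact address
print(verify_checksum(bytes(corrupted)))  # damaged address
```

A CRC of this size catches every single-bit error and most multi-bit ones, which would be enough for immediate feedback on a mistyped address. It cannot tell you which character was wrong.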

Part of the confusion or disagreement I think has to do with your suggestion @riddim to encode several other things in the CID, like checksum and typetag, and you claim that would still be a CID, that’s what I believe is incorrect, the CID spec is very specific to what goes in each part of it:

So if you, let’s say, put concat(<xor addr>, <type tag>, <checksum>) as the string for the <multihash-content-address> part, then what you have, strictly speaking, is not a CID anymore. As simple as that: it’s another type of content id we define using the multiformats and multicodecs. This is what I think @joshuef means by “create our own version of the well tested CID system”.

Am I against creating our own CID so we can incorporate type tag and possibly checksum (which makes sense to me only if we can extend XOR-URLs to use them for safecoin wallets)? No, not necessarily. Yes, I was trying to avoid it if possible, so as not to come up with my own (non-standard) encoding spec. But if we have to, and we do it, then:

  • We shouldn’t claim we have a CID anymore but our own-CID spec and therefore some of the implementations already available need to be forked and adapted to our own-CID spec (which is not a big deal since we will do it anyway by embedding in the SCL API)
  • We may want to actually propose that to multiformats project as an enhancement to the CID, and work with them on having that to be part of the CID spec maintained in that project.
  • We would be contradicting (no doubt) any argument where we say we don’t like baseX because it’s not “standard”, so we would have to leave that type of argument out of the equation

I never tried to study this or understand how studies around this were made (I guess you just measure the average occurrence of letters in words of a dictionary), but by only looking at the worn off keys on my own keyboard I have to disagree or be at least skeptical about this statement. In any case, he is saying that since those are most used by humans, they are easier to read or write than other less used ones. Who knows, but I do like those replacements like removing the 0, L, etc. to avoid confusion

We are talking about xor names (+stuff) … If they are not roughly equally distributed the network is not in balance…


That’s correct @riddim, but I guess you see that the proposal that guy is making is that since those letters are more used in the human vocabulary (assuming English), they are potentially easier to read and write than others when used in any string you are encoding with them, regardless of whether you are encoding xornames.


Since we touched on standards a bit, perhaps it’s a nice chance for me to share some thoughts I was having about it in the last few months. Not that important for this discussion maybe, but why not share it here. This is all my own personal speculation and how I perceive it.

I think standards have in many cases in the past been designed and worked out by big organisations which were able to not only invest/spend the money for having people in many long meetings where those standards were defined and documented, but which were also monopolising many fields in some way with their products (well, not exactly of course, as they’d be a group of organisations). So if you were a small company or an individual you simply didn’t have the chance to participate in there, and you didn’t have much choice but to follow those standards, with no vote, if you wanted to sell anything you produced.

Nowadays, and I think more so as we move forward with decentralising several things, I believe small companies or just individuals have more chances to compete with these companies and organisations since they can reach the end users directly, and when that happens they are in a good position to start pushing for any new “standard” that perhaps wasn’t available or defined before. Other projects, companies and individuals may follow that new spec almost immediately, to be able to participate in a potentially new wave of a successful type of application/service/product, and they won’t wait for any committee or organisation to gather and agree on the new “standard” they can use, they will just move forward. So I guess I see a decentralisation in this regard as well: who defines a standard? …just some random thoughts :slight_smile:


[ignoring how questionable this thesis by itself is] while they for sure won’t appear more often when encoding xor names (or all compressed or binary data)

So the one and only advantage of base32z is leaving out L (using 1 instead), V and 2 (0 and 1 are not part of base32 either)
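For reference, the actual difference between the two character sets can be checked with a few lines of Python. This is just a sketch comparing the RFC 4648 base32 alphabet (lowercased here for comparison) with the z-base32 alphabet. The variable names are made up for illustration.

```python
# RFC 4648 base32 alphabet, lowercased so it can be compared with z-base32.
RFC4648_BASE32 = "abcdefghijklmnopqrstuvwxyz234567"
# z-base32 alphabet as published in the z-base-32 description.
Z_BASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"

# Characters that appear in one alphabet but not in the other.
only_in_base32 = sorted(set(RFC4648_BASE32) - set(Z_BASE32))
only_in_zbase32 = sorted(set(Z_BASE32) - set(RFC4648_BASE32))

print(only_in_base32)   # ['2', 'l', 'v']
print(only_in_zbase32)  # ['1', '8', '9']
```

So besides dropping l, v and 2, z-base32 brings in 1, 8 and 9, and neither alphabet contains 0.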

Ah but there is enough left anyway… nm ec nh vvw rv S5 dq hk ft 1f 1t qg pa yx… (ofc always depending on your hand writing - with printed it shouldn’t be a problem either way… )

If you really would want to prevent character mixups you’d go with hex


edit/PS - about standards and encodings/readability

Ps: @bochaco I do agree with what you said about standards - and without someone starting with it no new standards would appear… But I can only support standards that ‘make sense’ imo… Base32z to me just looks like a random definition by one guy who had an idea… The real problem with bitcoin keys for example is that they use upper and lower case, and oO0(Q) Il1 look very similar depending on the font you choose… The rest above is more for hand written stuff, and base32 / base32z seem to solve the super problematic characters I mentioned there similarly well. When it comes to hand writing there would only be marginally different results in readability + the real solution that would make a difference would be reducing the character set to 16…

… The difference in readability between base32 and base32z looks to me more like a philosophical question and therefore I would definitely go with the wider spread one … (the base32z description is missing any proof for what is claimed imo …)

And that’s also why I really am bored of this topic here by now tbh … At the end of the day I don’t care too much which encoding you choose - I will just work with what I get (and implement base58+base32(+safeBrowser)+hex for python - it doesn’t make sense to loop such a little task through the api and need to take care of forwarded errors returned from rust…) and if many others share my opinion you will get asked the same questions again and again, and it’s you who will need to defend your decision again and again :wink: I just want one clear definition that is consistent in itself, and if I get that it’s okay for my world :wink:

I said what I had to say about this topic and if you consider implementing block codes for data validation/correction that is a way simpler task than it looks when reading the Wikipedia article (really pretty simple) and I can support you in implementing them too if you want - but I personally wouldn’t aim for too much because it becomes a pretty big overhead for little benefit if you want error correction…


I always have a hard time with 1, lowercase l and capital I. None of these please!


… yes you really need the right font for them to be easily identifiable…

But that’s never a problem with encodings to base 32 - it’s either all uppercase or all lowercase

(base32 using l+i / L+I and base32z using 1+i / 1+I… - none uses all 3 of l, i and 1)

IMO Base32z would be preferable to read then.

I’m not sure it matters. When do you think reading these will be useful or necessary? I don’t think I’ve ever typed a bitcoin address for example, or a public key etc.

And the American 1 doesn’t have the little hat and looks exactly like an I again - it’s probably all a bit of a matter of taste, and readability depends hugely on the context and cannot really be discussed without specifying it precisely …

Folks, as you probably saw on some of the recent dev updates, we went for trying out our own encoding format for the XOR-URL string, as an alternative to using multiformats CID, at least to begin with and see how that evolves and how it works for us.

We therefore have a PR to update the RFC which is ready for review and comments: https://github.com/maidsafe/rfcs/pull/337
The changes are substantial so I’d advise simply reading the new version rather than trying to see the diffs: https://github.com/maidsafe/rfcs/blob/c68925b1d5760b5b1a1b7159b295af7709851ae8/text/0053-xor-urls/0053-xor-urls.md

The encoding format we designed is now implemented on CLI and it’s in summary like this:

  • 1 byte for XOR-URL encoding format version. This is version 1
  • 2 bytes for content type, e.g. Wallet, FilesContainer, MIME types, etc.
  • 1 byte for SAFE native data type, e.g. PublishedImmutableData, PubseqAppendOnlyData, etc.
  • 32 bytes for the content’s XoR name
  • 0 to 8 bytes for type tag value
  • The string is then encoded with z-base32 encoding
  • v=<content version> query arg for specifying a particular version of the content, or the latest if this is omitted from the URL.
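The layout in the list above can be sketched roughly like this in Python. It is not the CLI implementation. The byte order (big-endian), the way the type tag is trimmed to 0 to 8 bytes (leading zero bytes dropped), the zero-bit padding in the z-base32 step, the numeric values passed for content type and data type, and all function names are assumptions made for illustration.

```python
from typing import Optional

Z_BASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32_encode(data: bytes) -> str:
    """Encode bytes in 5-bit groups, padding the last group with zero bits."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)
    return "".join(Z_BASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def encode_xorurl(xor_name: bytes, content_type: int, data_type: int,
                  type_tag: int = 0, content_version: Optional[int] = None,
                  encoding_version: int = 1) -> str:
    """Build a safe:// URL string from the fields listed above."""
    if len(xor_name) != 32:
        raise ValueError("xor name must be exactly 32 bytes")
    # Assumption: the type tag is stored big-endian without leading zero bytes,
    # which gives the 0 to 8 bytes mentioned in the list.
    tag_bytes = type_tag.to_bytes(8, "big").lstrip(b"\x00")
    raw = (
        bytes([encoding_version])          # 1 byte: encoding format version
        + content_type.to_bytes(2, "big")  # 2 bytes: content type
        + bytes([data_type])               # 1 byte: SAFE native data type
        + xor_name                         # 32 bytes: XoR name
        + tag_bytes                        # 0 to 8 bytes: type tag
    )
    url = "safe://" + zbase32_encode(raw)
    if content_version is not None:
        url += f"?v={content_version}"     # omitted means "latest"
    return url


# Hypothetical values, only to show the shape of the result.
print(encode_xorurl(bytes(32), content_type=0, data_type=1))
print(encode_xorurl(bytes(32), content_type=0, data_type=1,
                    type_tag=15000, content_version=3))
```

With no type tag the raw string is 36 bytes, which comes out as 58 z-base32 characters after `safe://`. Each extra type tag byte adds one or two characters.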

I’ll be posting this also on main forum, so feel free to discuss/comment either here or in the other forum.


Not read the RFC yet but want to query [cough] use of the URL query for version in case it interferes with web apps which want to use query URLs. I guess any app author can cater for it, but I wonder if there are things that become difficult to achieve because of this. I can imagine ported web apps could fall foul of this, but am also wondering if it is problematic in other ways. Do you think there are any issues with apps?

I do see pros for this approach versus encoding within the XOR-URI itself for example, but if you have already thought about that and want to explain more that would be good.

I guess the big plus is that you can immediately see that it is a given version from the URI, and all versions have the same root URI. That could be made visible in other ways though. For example, a version indication shown by the browser adjacent to or as part of the address bar might actually be clearer and could be used to provide version selection UI (e.g. a drop down). Such a UI might be desirable regardless.

Also, I think the proposal is probably consistent with WWW recommendations (do you think? I think I read it in one of Tim’s notes about URIs from the early days but don’t have a ref).


We definitely thought about this, and as you say there could be a collision with apps using a v= query param. As you also say, the main idea is that you can see (well, explicitly) the version you are fetching and be sure whether it is the latest version or a specific version, which would be hidden in the URL if we encoded it into it. This is ofc important as we will have safe:// URLs for things like Wallets and SAFE-IDs, and even just a website could be turned into a scam from one version to another, so you want to know which version you are using. We even considered using the port part for it, but there we hit the issue of it being assumed to be only 16 bits in many implementations/libs, even though the URI syntax itself doesn’t define a size for it.

The main aspect I personally consider with this is that the version is as important as the xorname or the type tag because, as said above, safe:// URLs will be used for many things. So if we have the version encoded in the string, we will always need an app or API to be able to see and/or choose a different version of it.
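To illustrate both points (the v= param sitting next to an app's own query params, and the 16 bit port assumption in libraries), here is a small Python sketch using only `urllib.parse` from the standard library. The URL strings and the helper names `split_version` and `port_or_none` are made up for the example.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def split_version(url: str) -> Tuple[str, Optional[int]]:
    """Return (name part, content version); version is None when v= is absent."""
    parts = urlsplit(url)
    values = parse_qs(parts.query).get("v")
    version = int(values[0]) if values else None
    return parts.netloc, version


def port_or_none(url: str) -> Optional[int]:
    """Return the port if the library accepts it, otherwise None."""
    try:
        return urlsplit(url).port
    except ValueError:
        # Python's urllib rejects ports outside 0-65535, i.e. it assumes 16 bits.
        return None


# The app's own "page" param and the version param share the same query string.
print(split_version("safe://hnyynyw?v=5&page=2"))
# No v= means the latest version is wanted.
print(split_version("safe://hnyynyw"))
# A version number above 65535 would not survive as a "port" in this library.
print(port_or_none("safe://hnyynyw:70000"))
```

An app that already uses a query param named v for something else would clash with this, which is the collision discussed above.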

One thing I also thought was that if we split the version from the URL, not only in the UIs/apps, but in the APIs, then it becomes confusing since we want to use versioned links in the data itself, versioned links should be ofc supported by linked-data, so we definitely want the version to be part of the URL and not just a separate value we can pass to an API or choose from a UI component.

Another aspect, perhaps a minor one but important from a UX perspective, is allowing us to have a homogeneous form for NRS URLs. If we encode the version in the XOR-URL string, the versioned NRS URL will not only have a different form, but also the same issues raised about the query param.

So, I would personally love to have it somewhere else, as explicit as the query param or as the port number part, but I don’t see another clean option based on these considerations.


Thanks, and yes I agree it’s probably the least bad option, although I think the browser UI may be worth trying alongside it. I think few people will realise the significance of the query parameters, so to guard against bad versions it probably needs a UI to draw attention and give guidance.

Going off topic to give an idea of what I mean, the browser could show a version button that is normally neutral in colour but shows the version as a number. Clicking could show a popup with some info. When an unknown version is encountered it might turn amber, or green for a version we’ve used many times. The popup can explain or link to more information, and offer alternative versions using the same colour coding. Perhaps space can be provided in the XOR encoding for the dev to also provide a short form (eg numeric) URI that could link to more info about versions of the particular data/app?

Just some thoughts and maybe over the top. It all depends on how this idea develops and how users can best make use of it. I can imagine it becoming a really useful part of the SAFE web, based on ‘trust’ of certain devs/vendors who have signed their version for example. cc @jimcollinson
