That’s the one and only reason CIDs exist - to be able to represent the same sequence of bytes in the encoding of your liking (hex, base32, base32z, base2, base64, …). It prepends information about the data that follows and then presents the string - no error detection, no other benefits… (that’s why I said it doesn’t make sense to exclude any data - like the type tag - from the CID, because that defeats the only reason one would use it…)
Which you wouldn’t even need to think of if you used base16 (hex) or base32…
Well - being widely used and clearly and reasonably specified (and therefore unlikely to change) - for example hex is widely used and everyone knows which number is represented by which character (and it’s easy to read - easy to type with the numpad plus the left hand at a-f)… So a standard… base32 takes a-z + 2-7 in the sequence you’d expect (alphabetic, then small to large)… base32z takes the alphabet (without l and v) plus the digits 1 and 3-9, and shuffles them around…
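To make this concrete, here’s a small Python sketch showing the same bytes under the different encodings being discussed. The z-base-32 helper is my own approximation (translating RFC 4648 output into the z-base-32 alphabet), not anyone’s reference code:

```python
import base64

# z-base-32 alphabet (no l, v, 0 or 2; digits 1 and 3-9, shuffled)
ZB32 = "ybndrfg8ejkmcpqxot1uwisza345h769"
# RFC 4648 base32 alphabet (what base64.b32encode uses)
RFC32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def zbase32_encode(data: bytes) -> str:
    """Translate RFC 4648 base32 output into the z-base-32 alphabet
    and drop the padding. For whole-byte inputs the 5-bit grouping
    is the same, so this matches z-base-32 for byte strings."""
    std = base64.b32encode(data).decode().rstrip("=")
    return std.translate(str.maketrans(RFC32, ZB32))

payload = bytes.fromhex("deadbeef")
print(payload.hex())                       # hex
print(base64.b32encode(payload).decode())  # base32 (RFC 4648)
print(zbase32_encode(payload))             # z-base-32
```

All three strings decode back to the same four bytes - the encodings differ only in alphabet and density, exactly as described above.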
Okay - yes - unambiguously defined - that meets my requirement…
While this is still super subjective and arbitrary imho (and we are talking about more or less random bytes… there is no such thing as ‘more commonly occurring characters’ in them…)
Not entirely true.
It allows us to string together a variety of information in a standardised fashion, which can be useful (at least IPFS has found it to be useful): providing a means to change encodings/hash functions (giving us a way to improve URLs in the future), as you note, and providing more info to boot (mimetypes, e.g.).
You’re right though, maybe it doesn’t go far enough? Typetag certainly won’t work in the port number, eg.
I think I’ve not really followed your error detection point before. But aye, that could be an interesting property to have.
Unless there was a problem in an implementation…
True - though you could argue that using those types as the encoding function, plus adding the info about how you encoded the data, has pretty much the same result as just adding this encoding function as an identifier for the mimetype in the bytes of the payload… So while you are of course correct, I don’t think the message of my statement is wrong… (and if you leave out the encoding option and just go with one - like with mutable data - or probably safecoin - it’s just encoded bytes, as I said)
Yeah - right - sorry, but thinking that even a small child would implement hex wrongly is kind of ridiculous - while implementing base32z is obviously a very different beast
On the one hand, you argue for standardisation of one thing (base32), on another, you argue to create our own version of the well tested CID system.
I understand that bugs are frustrating. But one bad implementation doesn’t negate the benefits of z-base32, IMO. Especially considering all that’s outlined above (re: an API to avoid the need for going manual, human readability, etc.).
Equally, I don’t see a fleshed-out benefit to doing away with CIDs, which give us the flexibility to update the XOR scheme as we go in a standardised, tested, and community-supported fashion. If we have issues with it, or additions we want to make, we can put them forward and improve this standardised system for everyone. From @bochaco’s PR they seem very open to this.
Where did I do that? I suggested just including all the relevant bytes in a base32-encoded CID
(or just use the bytes and encode them as hex - my first suggestion and still my favourite one…)
Hey - just go with whatever you decide on - people will use it, or complain/use a different way to share addresses - no reason to waste many hours just because we disagree on the importance of some aspects - we’ll just see and react to what happens
(Thanks. I couldn’t actually find where I’d read that.)
But aye, I think there’s certainly merit to getting typetags in there. And error correction could be useful too (did you have an example of a URL structure with that?).
That’s me for the night now. Gotta think about some .
Hmm - actually I was just thinking about a simple 16-bit checksum for error detection - no error correction (because that would take at least 32 bits for single-bit error correction on one xor name, without any additional data… from memory… details on block codes can be found here: https://en.m.wikipedia.org/wiki/Block_code - but all that comes with additional complexity - not sure if it’s worth the hassle - just giving immediate feedback on the occurrence of an error, without network calls, might be good enough I would guess…). And a wrong character is most probably more than one wrong bit but a couple of them…
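As a sketch of what I mean by “immediate feedback without network calls”: append a 16-bit CRC to the address bytes and verify it locally before doing anything else. The CRC-16 here (`binascii.crc_hqx` from the stdlib) is just one convenient choice, not a proposed spec:

```python
import binascii
import os

def with_checksum(xorname: bytes) -> bytes:
    """Append a 16-bit CRC so a typo in a shared address can be
    caught locally, before any network call is made."""
    crc = binascii.crc_hqx(xorname, 0)
    return xorname + crc.to_bytes(2, "big")

def verify(payload: bytes) -> bool:
    """Recompute the CRC over the body and compare with the trailer."""
    body, crc = payload[:-2], int.from_bytes(payload[-2:], "big")
    return binascii.crc_hqx(body, 0) == crc

name = os.urandom(32)        # stand-in for a 32-byte XoR name
tagged = with_checksum(name)
assert verify(tagged)        # intact address passes
corrupted = bytes([tagged[0] ^ 0xFF]) + tagged[1:]
assert not verify(corrupted) # a corrupted byte is detected
```

A CRC-16 is guaranteed to catch any error burst up to 16 bits, which covers the “one wrong character” case, while costing only about three extra base32 characters in the URL.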
Part of the confusion or disagreement, I think, has to do with your suggestion @riddim to encode several other things in the CID, like a checksum and the typetag; you claim that would still be a CID, and that’s what I believe is incorrect - the CID spec is very specific about what goes in each part of it:
So if you let’s say put:
concat(<xor addr>, <type tag>, <checksum>) as the string for the
<multihash-content-address> part, then what you have, strictly speaking, is not a CID anymore, as simple as that; it’s another type of content id we define using the multiformats and multicodecs. This is what I think @joshuef means by “create our own version of the well tested CID system”.
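Just to make the objection concrete (field widths here are hypothetical, purely for illustration):

```python
# Hypothetical field values, just to make the concatenation concrete.
xor_addr = bytes(32)                    # 32-byte XoR name
type_tag = (10_000).to_bytes(8, "big")  # example type tag
checksum = (0xBEEF).to_bytes(2, "big")  # example 16-bit checksum

# What the proposal would put in the <multihash-content-address> slot:
blob = xor_addr + type_tag + checksum

# A multihash must be <hash-fn-code><digest-length><digest>. This blob
# is 42 bytes of mixed fields rather than a tagged digest, so a strict
# CID parser has no way to interpret it as a valid multihash.
assert len(blob) == 32 + 8 + 2
```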
Am I against creating our own CID so we can incorporate the type tag and possibly a checksum (which makes sense to me only if we can extend XOR-URLs to use them for safecoin wallets)? No, not necessarily; yes, I was trying to avoid it if possible, so as not to come up with my own (non-standard) encoding spec. But if we have to, and we do it, then:
- We shouldn’t claim we have a CID anymore but our own CID spec, and therefore some of the implementations already available would need to be forked and adapted to it (which is not a big deal since we will do that anyway by embedding it in the SCL API)
- We may want to actually propose that to the multiformats project as an enhancement to the CID, and work with them on having it become part of the CID spec maintained in that project.
- We would (no doubt) be contradicting any argument where we say we don’t like baseX because it’s not “standard”, so we would have to leave that type of argument out of the equation
I never tried to study this or understand how studies around it were made (I guess you just measure the average occurrence of letters in the words of a dictionary), but just by looking at the worn-off keys on my own keyboard I have to disagree, or at least be skeptical about this statement. In any case, he is saying that since those characters are the most used by humans, they are easier to read or write than less-used ones - who knows - but I do like those replacements, like removing the l, etc., to avoid confusion
We are talking about xor names (+ other stuff)… If they are not roughly equally distributed, the network is not in balance…
That’s correct @riddim, but I guess you see that the proposal that guy is making is that, since those letters are used more in human vocabulary (assuming English), they are potentially easier to read and write in any string you encode with them, regardless of whether you are encoding xornames.
Since we touched on standards a bit, perhaps it’s a nice chance for me to share some thoughts I’ve been having about them over the last few months. Not that important for this discussion maybe, but why not share it here; this is all my own personal speculation and how I perceive it.
I think standards, in many cases in the past, have been designed and worked out by big organisations which were able not only to invest/spend the money to have people in many long meetings where those standards were defined and documented, but which were also monopolising many fields with their products (well, not exactly of course, as they’d be a group of organisations); so if you were a small company or an individual you simply didn’t have the chance to participate, and you didn’t have much choice but to just follow those standards with no vote if you wanted to sell anything you produced.
Nowadays, and I think more so as we move forward with decentralising several things, I believe small companies or individuals have a better chance to compete with these companies and organisations, since they can reach end users directly; and when that happens they are in a good position to start pushing for any new “standard” that perhaps wasn’t available or defined before. Other projects, companies, and individuals may follow that new spec almost immediately, to be able to participate in a potentially successful new wave of applications/services/products, and they won’t wait for any committee or organisation to gather and agree on the new “standard” they can use - they will just move forward. So I guess I see decentralisation in this regard as well: who defines a standard? …just some random thoughts
[ignoring how questionable this thesis is by itself] while they for sure won’t appear more often when encoding xor names (or any compressed or binary data)
So the one and only advantage of base32z is leaving out l (using 1 instead), v and 2 (0 and 1 are not part of base32 either)
Ah, but there are enough confusable pairs left anyway… nm ec nh vvw rv s5 dq hk ft 1f 1t qg pa yx… (of course always depending on your handwriting - with printed text it shouldn’t be a problem either way…)
If you really would want to prevent character mixups you’d go with hex
edit/PS - about standards and encodings/readability
PS: @bochaco I do agree with what you said about standards - without someone starting one, no new standards would appear… But I can only support standards that ‘make sense’ imo… Base32z to me just looks like a random definition by one guy who had an idea… The real problem with bitcoin keys, for example, is that they use upper and lower case, and oO0(Q) and Il1 look very similar depending on the font you choose… Everything beyond that matters more for hand-written stuff, and base32 / base32z seem to handle the really problematic characters I mentioned there similarly well; when it comes to handwriting there would only be marginally different results in readability - the real solution that would make a difference would be reducing the character set to 16…
… The difference in readability between base32 and base32z looks to me more like a philosophical question, and therefore I would definitely go with the more widespread one… (the base32z description is missing any proof for what it claims, imo…)
And that’s also why I’m really bored of this topic by now, tbh… At the end of the day I don’t care too much which encoding you choose - I will just work with what I get (and implement base58 + base32 (+ safeBrowser) + hex for python - it doesn’t make sense to loop such a little task through the API and have to take care of forwarded errors returned from rust…). And if many others share my opinion, you will get asked the same questions again and again, and it’s you who will need to defend your decision again and again. I just want one clear definition that is consistent in itself, and if I get that, it’s okay for my world
I’ve said what I had to say about this topic, and if you consider implementing block codes for data validation/correction, that is a much simpler task than it looks when reading the Wikipedia article (really pretty simple) and I can support you in implementing them too if you want - but I personally wouldn’t aim for too much, because it becomes a pretty big overhead for little benefit if you want error correction…
I always have a hard time with 1,lowercase l and capital I. None of these please!
… yes you really need the right font for them to be easily identifiable…
But that’s never a problem with base32 encodings - it’s either all uppercase or all lowercase
(base32 uses l+i / L+I and base32z uses 1+i / 1+I… - neither uses all three of l, i and 1)
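You can verify this directly from the two alphabets (a quick check, nothing more):

```python
# Lowercase RFC 4648 base32 vs z-base-32: which of the commonly
# confused characters does each alphabet actually contain?
base32  = "abcdefghijklmnopqrstuvwxyz234567"  # RFC 4648, lowercased
base32z = "ybndrfg8ejkmcpqxot1uwisza345h769"  # z-base-32

for ch in "li1":
    print(ch, ch in base32, ch in base32z)
# base32 has l and i but never 1; base32z has 1 and i but never l -
# so neither alphabet mixes all three of l, i and 1.
```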
IMO Base32z would be preferable to read then.
I’m not sure it matters. When do you think reading these will be useful or necessary? I don’t think I’ve ever typed a bitcoin address for example, or a public key etc.
And the American 1 doesn’t have the little hat and looks exactly like an I again - it’s probably all a bit a matter of taste, and readability depends hugely on the context and cannot really be discussed without specifying it precisely…
Folks, as you probably saw in some of the recent dev updates, we went with trying out our own encoding format for the XOR-URL string, as an alternative to using a multiformats CID, at least to begin with, to see how that evolves and how it works for us.
We therefore have a PR to update the RFC which is ready for review and comments: https://github.com/maidsafe/rfcs/pull/337
The changes are substantial, so I’d advise simply reading the new version rather than trying to go through the diffs: https://github.com/maidsafe/rfcs/blob/c68925b1d5760b5b1a1b7159b295af7709851ae8/text/0053-xor-urls/0053-xor-urls.md
The encoding format we designed is now implemented in the CLI and, in summary, it’s like this:
- 1 byte for XOR-URL encoding format version. This is version 1
- 2 bytes for content type, e.g. Wallet, FilesContainer, MIME types, etc.
- 1 byte for SAFE native data type, e.g. PublishedImmutableData, PubseqAppendOnlyData, etc.
- 32 bytes for the content’s XoR name
- 0 to 8 bytes for type tag value
- The string is then encoded with z-base32 encoding
A v=<content version> query arg for specifying a specific version of the content; the latest is used if this is omitted from the URL.
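For anyone implementing this outside the CLI, the byte layout above can be sketched roughly like so. The field values and the z-base-32 helper are my own illustration (an alphabet translation over RFC 4648 base32), not the canonical implementation:

```python
import base64

ZB32 = "ybndrfg8ejkmcpqxot1uwisza345h769"   # z-base-32 alphabet
RFC32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 alphabet

def zbase32(data: bytes) -> str:
    """Approximate z-base-32: re-alphabet RFC 4648 output, no padding."""
    std = base64.b32encode(data).decode().rstrip("=")
    return std.translate(str.maketrans(RFC32, ZB32))

def encode_xorurl(content_type: int, data_type: int,
                  xorname: bytes, type_tag: int) -> str:
    assert len(xorname) == 32
    payload = bytes([1])                        # 1 byte: format version 1
    payload += content_type.to_bytes(2, "big")  # 2 bytes: content type
    payload += data_type.to_bytes(1, "big")     # 1 byte: native data type
    payload += xorname                          # 32 bytes: XoR name
    if type_tag:                                # 0-8 bytes: type tag
        n = (type_tag.bit_length() + 7) // 8
        payload += type_tag.to_bytes(n, "big")
    return "safe://" + zbase32(payload)

# Hypothetical codes for content/data type, just to show the shape:
url = encode_xorurl(0x0001, 0x03, bytes(32), 10_000)
print(url)
```

With a 2-byte type tag the payload is 38 bytes, which z-base-32 encodes into 61 characters after the scheme prefix.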
I’ll be posting this also on main forum, so feel free to discuss/comment either here or in the other forum.