[RFC Discussion]: XOR-URLs

#63

I’m not sure that’s completely accurate (and, forgive me, a little unfair). I think everyone here is happy to hear new views on things, but that doesn’t mean that we all have to agree on everything.

Equally, as we’ve noted in here a couple of times, we’re not ignoring this… it’s just that we’re not always able to reply straight away (or necessarily fast, especially when you’re providing food for thought). I’m sorry if that’s frustrating, I get it, but sometimes we’re going to have to ask for a little patience.


For example, there’s some solid reasoning for keeping z-base32: readability and avoiding confusion.

Your argument re: ‘it’s not standard’ is not a strong one, given a) you can use any hashing function you want with CID, as @bochaco has pointed out, that’s part of the purpose (it seems the issues you have are with a Python implementation… which can be PR’d to help everyone out there), and b) it’s proposed that this hashing function should be available in the Safe Client Libs API. So it’s going to be language independent, which is what we’re considering here. (Though props to you for diving into a language where we don’t have an impl yet! That is massively appreciated :+1: :bowing_man:)

#64

Regarding links w/encryption keys:

A link + key is not secure. If it’s passed all together, it may as well not be encrypted. (Which is about as effective as not encrypting, but just not publishing the link anywhere… the link should be effectively un-guessable.)

That’s where ‘out of band’ comes in. Which means: passed, but not in the same medium. ie: share your encrypted photos XOR link with friends and family, but tell them the password to decrypt it over the phone.

But they could only do this on the decrypting computer (where a decrypt key is entered). Whereas encoding into the link, anyone could do this without needing your decryption keys… as it’s in the link.

#65

Absolutely, I know I’m being a bit unfair there… But you need to realize that by not responding with just your opinion and an explanation of your motivation, but instead withdrawing into your company, discussing among yourselves and working out a strategy for how to react to my response, you don’t exactly motivate me to do it again in the future, and you don’t give me a chance to react to the thoughts you have when confronted with my input… It’s not a discussion then, just you taking into account some comments that may not be thought through and therefore look odd…

Then the user would get a string encoded exactly the way you specify it in the API…? Why do you use cids then, if you want to dictate the format anyway? (Yes - future-proofing… But if you expose it, it would be nice to let the user specify encodings if needed… And here again I need to guess how you plan to integrate it, and whether this would be okay for my world or not, if you don’t tell me what your plan is…)

Absolutely - and it’s not meant to be secure in the context where I would want to share it this way - why would I want to protect it when it is meant to be easy to consume by the people I send exactly this link to (without the need to call them and tell them 35 characters)

I don’t see how this solution solves the issue you described

… Ps: and yes I know what out of band means… But sharing xor name and key on different channels is not the only way to ensure this security… If an app wants to secure the data it could share most of the (address +key containing) cid on one channel and the one missing character +position as number on a second channel… So I would only need to share one character +a number through phone which wouldn’t be a problem at all (in contrast to sharing a secure password) and my data is as secure as data can be…

Providing the possibility for a link to contain the password comes with all the upsides of cids and just comes with additional possibilities… You can share a link without password too if you want and tell someone the password through a second channel as well if you insist…

#66

Okay - last post on this matter here - you suggested that I should open a pull request for the python implementation of base32z because it’s different to the one in JS. (which I didn’t do because I really don’t like this random encoding and would prefer nobody in this world would use and nobody gets motivated to use it… )

If I opened this pull request and it were accepted, all Python programs using base32z which have stored data would immediately lose this data… Because even the self-describing cid would in both cases say base32z, but you wouldn’t know if it’s the old or the new generation…

So by offering base32z resolution in the libs you can say you offer a solution for ‘same link representation independent of the programming language’s base32z implementation’, but how do you react if the JS/Rust implementation of base32z changes (because it’s not a standard, and this has obviously happened and could happen again)? Then suddenly either you implement your own ‘not official’ version of base32z and are using an unnamed custom encoding (and working with ‘invalid cids’), or all links on safe sites stop working…

Ps: but maybe there was a very good reason to not only leave out L, V and 2 but also do exactly this re-ordering/assignment of characters in base32z

base32: a -> y base32z
base32: b -> b base32z
base32: c -> n base32z
base32: d -> d base32z
base32: e -> r base32z
base32: f -> f base32z
base32: g -> g base32z
base32: h -> 8 base32z
base32: i -> e base32z
base32: j -> j base32z
base32: k -> k base32z
base32: l -> m base32z
base32: m -> c base32z
base32: n -> p base32z
base32: o -> q base32z
base32: p -> x base32z
base32: q -> o base32z
base32: r -> t base32z
base32: s -> 1 base32z
base32: t -> u base32z
base32: u -> w base32z
base32: v -> i base32z
base32: w -> s base32z
base32: x -> z base32z
base32: y -> a base32z
base32: z -> 3 base32z
base32: 2 -> 4 base32z
base32: 3 -> 5 base32z
base32: 4 -> h base32z
base32: 5 -> 7 base32z
base32: 6 -> 6 base32z
base32: 7 -> 9 base32z

and nobody will feel the need to change it again ever :slightly_smiling_face:
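For reference, the table above is a plain character-for-character substitution between the two alphabets. A minimal Python sketch of that substitution, assuming the input is lowercase RFC 4648 base32 with the `=` padding already stripped; the names `BASE32_ALPHABET`, `ZBASE32_ALPHABET`, `base32_to_zbase32` and `zbase32_to_base32` are made up for illustration and are not taken from any existing library:

```python
# RFC 4648 base32 alphabet (lowercased) and the z-base-32 alphabet,
# listed in the same index order as the table above.
BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"

# Translation tables in both directions (hypothetical helper names).
TO_ZBASE32 = str.maketrans(BASE32_ALPHABET, ZBASE32_ALPHABET)
FROM_ZBASE32 = str.maketrans(ZBASE32_ALPHABET, BASE32_ALPHABET)


def base32_to_zbase32(text: str) -> str:
    """Re-letter an unpadded, lowercase base32 string into z-base-32."""
    return text.translate(TO_ZBASE32)


def zbase32_to_base32(text: str) -> str:
    """Inverse of base32_to_zbase32."""
    return text.translate(FROM_ZBASE32)
```

Since both alphabets have 32 distinct characters, the mapping is a permutation and the round trip is lossless as long as both sides agree on the same table.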

#67

I think it’s been checked against more than the JS one, but I’m not sure to be honest. (@bochaco do you know more there?). You may be right, perhaps the JS version is off and we should patch it. Worth checking for sure. Either way, whichever implementation is broken should be fixed, otherwise it could lead to problems for folk :+1: .

This is the only spec I can find: http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt (I’m not sure what the Python impl follows?). What would make it ‘standard’? Surely that depends on your favourite standards body, no? (Much as there seem to be a few different URL standards.) Interoperability is key.

If an implementation fails to meet the spec, then there’s an issue there… Same as if an implementation fails with base64 too. Similar consequences I’d imagine. But I’m sorry, I don’t see the possibility of bad implementations as an argument for or against any given spec.


All of which is beside the point, since with CID, you can use your favourite encoding.

If there’s a pressing need for many different encodings to be used via SCL, then perhaps the client lib implementation could allow for passing in your own functionality for the hashing function…?

(Or in general: if the client lib API implementation isn’t something you like, then you’re free to not use it and implement a new API for CID creation, eg. As long as it’s using CID then the urls should be decodable, I believe.)

So far though, I personally still don’t see a compelling reason not to use z-base32 right now.

#68

Do I understand that the CID encoding scheme leads to multiple possible URIs for the same resource?

If so, that seems undesirable and I think might go against the ‘standard web’. I think I posted a link to this address ages ago in a related discussion. I think we should follow web standards and conventions unless there’s a good reason not to, so need to consider this and examine any implications - if the answer to my question is yes!

#69

No, I haven’t dug into the implementation of it at all. For sure we’ll need to double check this with some automated tests whenever this becomes the official impl in SCL (I mean official in the sense of non-PoC), e.g. compare some hard-coded XOR names with the encoded XOR-URL output and make sure it’s as expected. As you say, we can send PRs to whichever impl we find bugs in; this is where open source shows its powers, I believe.
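A rough sketch of what such a pinned-vector check could look like in Python. Everything here is an assumption for illustration: `encode_zbase32`, `PINNED_VECTORS` and `check_pinned_vectors` are hypothetical names, and the encoder simply re-letters the standard library’s base32 output, which should agree with z-base-32 for whole-byte input such as 32-byte XOR names. The two expected strings were worked out by hand from the alphabets (all-zero bits give index 0 = `y`; all-one bits give index 31 = `9`, with the last character carrying one data bit plus four zero pad bits = index 16 = `o`):

```python
import base64

# RFC 4648 base32 letters re-mapped to the z-base-32 alphabet.
_TO_Z = str.maketrans("abcdefghijklmnopqrstuvwxyz234567",
                      "ybndrfg8ejkmcpqxot1uwisza345h769")


def encode_zbase32(data: bytes) -> str:
    """Encode whole bytes as z-base-32 via unpadded, lowercased base32."""
    plain = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return plain.translate(_TO_Z)


# Hard-coded (XOR name, expected encoding) pairs that every implementation
# should reproduce exactly.
PINNED_VECTORS = [
    (bytes(32), "y" * 52),
    (b"\xff" * 32, "9" * 51 + "o"),
]


def check_pinned_vectors(encode=encode_zbase32):
    """Return the vectors the given encoder gets wrong (empty list means OK)."""
    failures = []
    for name, expected in PINNED_VECTORS:
        actual = encode(name)
        if actual != expected:
            failures.append((name, expected, actual))
    return failures
```

Running the same list against the JS, Rust and Python encoders would show quickly which one (if any) disagrees with the others.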

That’s correct, but I don’t see a problem with that (my humble point of view); you also have more than one URL referencing the same resource if you create public name URLs. I can admit at the beginning it also sounded bad to me, but if you are using a URL perhaps as part of a contract which references an ImmD, then that will be ok even if there are aliases or other URLs to the same resource.


On a side note, I’m trying to understand a bit more about several other aspects of the network, like the new Appendable data, to see if this is impacted somehow, or perhaps safecoin and how public keys for transferring safecoins could also be used and/or impact this RFC.

#70

That’s the one and only reason for cid’s existence - to be able to represent the same sequence of bytes in the encoding of your liking (hex, base32, base32z, base2, base64,…). It adds information about the data that follows in front of it and then presents the string - no error detection, no other benefits… (that’s why I said it doesn’t make sense to exclude any data - like the type tag - from the cid, because that defeats the only reason one would use them…)
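To make the ‘same bytes, several strings’ point concrete, here is a small Python sketch. It assumes the multibase prefix characters `f` (base16), `b` (base32, lowercase, no padding) and `h` (base32z) as listed in the multiformats multibase table; `encode_variants` and `decode_any` are hypothetical helper names, not part of any CID library:

```python
import base64

_B32 = "abcdefghijklmnopqrstuvwxyz234567"
_Z32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def encode_variants(payload: bytes) -> dict:
    """Return the same bytes as three self-describing strings."""
    b32 = base64.b32encode(payload).decode("ascii").lower().rstrip("=")
    return {
        "base16": "f" + payload.hex(),
        "base32": "b" + b32,
        "base32z": "h" + b32.translate(str.maketrans(_B32, _Z32)),
    }


def decode_any(text: str) -> bytes:
    """Decode any of the three variants using only the leading prefix."""
    prefix, body = text[0], text[1:]
    if prefix == "f":
        return bytes.fromhex(body)
    if prefix == "h":
        # Map z-base-32 letters back to the RFC 4648 alphabet first.
        body = body.translate(str.maketrans(_Z32, _B32))
    elif prefix != "b":
        raise ValueError("unsupported multibase prefix: " + prefix)
    # The standard library decoder wants uppercase input with '=' padding.
    padded = body.upper() + "=" * (-len(body) % 8)
    return base64.b32decode(padded)
```

All three strings decode to the identical byte sequence, which is also why one resource can end up with several equivalent links.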

#71

Which you wouldn’t even need to think of if you used base16 (hex) or base32…

Well - being widely used and clearly + reasonably specified (and therefore unlikely to change) - for example hex is widely used and everyone knows which number is represented by which character (and it’s easy to read, and easy to type with numpad + left hand on abcdef)… So a standard… base32 takes a-z + 2-7 in the sequence you’d expect (alphabetic, then small to large)… base32z takes the alphabet (without L and V) + 1 and 3-9 and shuffles them around…

Okay - yes - unambiguously defined - that meets my requirement…

While this is still super subjective and random imho (and we are talking about more or less random bytes… There is no such thing as ‘more commonly occurring characters’…)

#72

Not entirely true.

It allows us to string together a variety of information in a standardised fashion, which can be useful. (At least IPFS have found it to be useful). Providing a means to change encodings/hash functions (giving us a means to improve URLs in the future) as you note, and provide more info to boot (mimetypes eg).

You’re right though, maybe it doesn’t go far enough? Typetag certainly won’t work in the port number, eg.

I think I’ve not really followed your error detection point before. But aye, that could be an interesting property to have.

#73

Unless there was a problem in an implementation…

#74

True - while you could argue that using those types as an encoding function + adding the info on how you encode the data has pretty much the same result as just adding this encoding function as an identifier for the mimetype as bytes of the payload… So while you are of course correct, I don’t think the message of my statement is wrong… (and if you leave out the encoding option and just go with one - like with mutable data - or probably safecoin - it’s just encoded bytes, as I said)

Yeah - right - sorry but thinking that even a small child would implement hex wrong is kind of ridiculous - while implementing base32z is obviously a very different beast

#75

On the one hand, you argue for standardisation of one thing (base32), on another, you argue to create our own version of the well tested CID system.

I understand that bugs are frustrating. But one bad implementation doesn’t negate the benefits of z-base32, IMO. Especially considering all that’s outlined above (re: API to avoid the need for going manual, human readability etc).

Equally, I don’t see a fleshed out benefit to doing away with CIDs, which give us the flexibility to update the XOR scheme as we go, in a standardised, tested, and community supported fashion. If we have issues with it, or additions we want to make, we can put them forward and improve this standardised system for everyone. From @bochaco’s PR they seem very open to this.

#76

Where did I do that? I suggested to just include all relevant bytes in a base32-encoded cid

(or just use the bytes and encode them hex - my first suggestion and still my favourite one…)

Ps:
Hey - but just go with what you decided for - people will use it or complain/use a different way to share addresses - no reason to waste many hours just because we disagree on importance of some aspects - we’ll just see and react to what happens

#77

:+1: (Thanks. I couldn’t actually find where I’d read that.)


But aye, I think there’s certainly merit to getting typetags in there. And error correction could be useful too :+1: (did you have an example of a URL structure with that?).


That’s me for the night now. Gotta think about :taco: some :bowing_man:.

#78

Hmm - actually I was just thinking about a simple 16 bit checksum for error detection - no error correction (because that would be at least 32 bit for one-bit error correction on one xor name without any additional data… (from memory… Details on block codes can be found here: https://en.m.wikipedia.org/wiki/Block_code - but all that comes with additional complexity - not sure if that is worth the hassle - just giving immediate feedback on error occurrence without network calls might be good enough I would guess…) And a wrong character is most probably not just one wrong bit but a couple of them…)
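A minimal Python sketch of that kind of local check, under stated assumptions: the post does not name a checksum algorithm, so CRC-16 via the standard library’s `binascii.crc_hqx` is swapped in here as one possible choice, and `add_checksum`, `verify_checksum` and `CHECKSUM_LEN` are hypothetical names:

```python
import binascii

CHECKSUM_LEN = 2  # 16 bits, as suggested in the post


def add_checksum(xor_name: bytes) -> bytes:
    """Append a 16-bit CRC (assumed algorithm) to the XOR name bytes."""
    crc = binascii.crc_hqx(xor_name, 0)
    return xor_name + crc.to_bytes(CHECKSUM_LEN, "big")


def verify_checksum(payload: bytes) -> bool:
    """Check the trailing CRC locally, without any network call."""
    if len(payload) <= CHECKSUM_LEN:
        return False
    body, tail = payload[:-CHECKSUM_LEN], payload[-CHECKSUM_LEN:]
    expected = binascii.crc_hqx(body, 0).to_bytes(CHECKSUM_LEN, "big")
    return expected == tail
```

A mistyped base32 character changes at most five neighbouring bits, and a 16-bit CRC catches any error burst that short, so a single typo would be flagged before the address is ever looked up. This only detects errors; it does not correct them.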

#79

Part of the confusion or disagreement, I think, has to do with your suggestion @riddim to encode several other things in the CID, like checksum and typetag, and your claim that it would still be a CID. That’s what I believe is incorrect; the CID spec is very specific about what goes in each part of it:

So if you, let’s say, put: concat(<xor addr>, <type tag>, <checksum>) as the string for the <multihash-content-address> part, then what you have, strictly speaking, is not a CID anymore, as simple as that. It’s another type of content id we define using the multiformats and multicodecs; this is what I think @joshuef means by “create our own version of the well tested CID system”.

Am I against creating our own CID so we can incorporate type tag and possibly checksum (which makes sense to me only if we can extend XOR-URLs to use them for safecoin wallets)? No, not necessarily. Yes, I was trying to avoid it if possible, so as not to come up with my own (non-standard) encoding spec. But if we have to, and we do it, then:
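For readers who haven’t looked at the layout: as I understand the multiformats CID spec, a CIDv1 string is `<multibase-prefix><cid-version><multicodec-content-type><multihash-content-address>`. A minimal Python sketch of building one by hand, assuming the multicodec code `0x55` (raw) and the multihash code `0x12` (sha2-256); `cid_v1_raw_sha256` is a hypothetical helper name, and whether an XOR-URL would hash content this way or place the XOR address in the digest slot is not something this sketch settles:

```python
import base64
import hashlib

CID_VERSION = 0x01     # CIDv1
CODEC_RAW = 0x55       # multicodec: raw binary
HASH_SHA2_256 = 0x12   # multihash: sha2-256


def cid_v1_raw_sha256(content: bytes) -> str:
    """Build a base32 CIDv1 string for raw content hashed with sha2-256."""
    digest = hashlib.sha256(content).digest()
    # multihash = <hash function code><digest length><digest bytes>
    multihash = bytes([HASH_SHA2_256, len(digest)]) + digest
    # All codes used here are below 0x80, so each varint is a single byte.
    cid_bytes = bytes([CID_VERSION, CODEC_RAW]) + multihash
    body = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + body  # 'b' is the multibase prefix for lowercase base32
```

The multihash part declares both the hash function and the digest length, which is why appending a type tag or checksum to the digest bytes would no longer describe a plain sha2-256 digest.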

  • We shouldn’t claim we have a CID anymore but our own-CID spec and therefore some of the implementations already available need to be forked and adapted to our own-CID spec (which is not a big deal since we will do it anyway by embedding in the SCL API)
  • We may want to actually propose that to multiformats project as an enhancement to the CID, and work with them on having that to be part of the CID spec maintained in that project.
  • We would be contradicting (no doubt) any argument where we say we don’t like baseX because it’s not “standard”, so we would have to leave that type of arguments out of the equation
#80

I never tried to study this or understand how studies around this were made (I guess you just measure the average occurrence of letters in the words of a dictionary), but just from looking at the worn-off keys on my own keyboard I have to disagree with, or at least be skeptical about, this statement. In any case, he is saying that since those are the most used by humans, they are easier to read or write than other less used ones. Who knows, but I do like those replacements, like removing the 0, L, etc. to avoid confusion

#81

We are talking about xor names (+stuff) … If they are not roughly equally distributed the network is not in balance…

#82

That’s correct @riddim, but I guess you see that the proposal that guy is making is that since those letters are used more in the human vocabulary (assuming English), they are potentially easier to read and write than others when used in any string you are encoding with them, regardless of whether you are encoding xornames.


Since we touched on standards a bit, perhaps it’s a nice chance for me to share some thoughts I’ve been having about it in the last few months. Not that important for this discussion maybe, but why not share it here; this is all my own personal speculation and how I perceive it.

I think standards, in many cases in the past, have been designed and worked out by big organisations which were able not only to invest/spend the money for having people in many long meetings where those standards were defined and documented, but which were also monopolising (well, not exactly of course, as they’d be a group of organisations) in some way many fields with their products; so if you were a small company or an individual you simply didn’t have the chance to participate in there, and you didn’t have much choice but to follow those standards, with no vote, if you wanted to sell anything you produced.

Nowadays, and I think more so as we move forward with decentralising several things, I believe small companies or just individuals have more chances to compete with these companies and organisations since they can reach the end users directly, and when that happens they are in a good position to start pushing for any new “standard” that perhaps wasn’t available or defined before. Other projects, companies and individuals may follow that new spec almost immediately, to be able to participate in a potentially new wave of successful applications/services/products, and they won’t wait for any committee or organisation to gather and agree on the new “standard” they can use; they will just move forward. So I guess I see a decentralisation in this regard as well: who defines a standard? …just some random thoughts :slight_smile:
