[RFC Discussion]: XOR-URLs

#54

Like the following definition, which includes clear definitions of what each of these parts exactly are and encode:

If we change any of that, we cannot say our URL contains a CID.

#55

If we hashed all the bytes of the xor-name and the type tag, it would be a valid CID, wouldn’t it? What would speak against this? It would indeed be simpler than splitting a link into CID + type tag, and it would benefit from the properties of a CID. (What would be the motivation to exclude something from the encoding algorithm?)
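To make the suggestion concrete, here is a minimal sketch of what hashing the two fields into one CID could look like. It assumes a 32-byte XOR name, a type tag serialised as 8 big-endian bytes, sha2-256 as the hash and the "raw" codec; none of these choices are fixed by the RFC, and `cid_bytes_for` is a hypothetical helper name.

```python
import hashlib

# Single-byte codes from the multiformats tables.
CID_VERSION_1 = 0x01
CODEC_RAW = 0x55        # "raw" binary content
MH_SHA2_256 = 0x12      # multihash code for sha2-256
MH_SHA2_256_LEN = 0x20  # digest length: 32 bytes


def cid_bytes_for(xorname: bytes, type_tag: int) -> bytes:
    """Hash xorname || type_tag and wrap the digest as CIDv1 bytes.

    Assumption: the type tag is serialised as 8 big-endian bytes.
    """
    if len(xorname) != 32:
        raise ValueError("expected a 32-byte XOR name")
    digest = hashlib.sha256(xorname + type_tag.to_bytes(8, "big")).digest()
    multihash = bytes([MH_SHA2_256, MH_SHA2_256_LEN]) + digest
    return bytes([CID_VERSION_1, CODEC_RAW]) + multihash


xorname = bytes(range(32))           # placeholder XOR name
cid = cid_bytes_for(xorname, 15001)  # placeholder type tag
print(len(cid), cid[:4].hex())
```

Since the digest is one-way, the XOR name itself cannot be read back out of a CID built this way; only someone who already knows both inputs can recompute it.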

#56

Yes, that’s an option, but just to be very critical and strict (not saying it’s correct): can a hash of xorname+typetag strictly be considered a SAFE address? Probably yes, but in SAFE the xorname currently is the address. It is the only thing routing knows; the type tag is something else that is still needed to locate the data. That’s all I mean.

#57

Sorry, ignore me (often best :stuck_out_tongue_winking_eye:) I misread the page I linked. I see it is not part of CID but showing the generic case of which the CID <mc><hash> is an implementation. Dang it!

#58

Okay, since we’re talking now: what speaks against adding a small checksum, and against using this scheme to share private data together with its keys?

Or which other alternatives can you think of (what are their upsides?)

#59

I’m not sure; I don’t see the need for a checksum. If you cannot fetch the content, the URL is probably invalid. And even if the XOR-URL passed its checksum but the content couldn’t be fetched, what is it that you can conclude from that?

Having decryption keys in the URL to private content… hold on… isn’t that contradictory? If you do that, the decryption keys become just “an additional encoding” of your URL: everyone with such a URL can see the content, so it is not private. In fact, some tools out there already handle sharing data by just providing a difficult-to-guess URL, but if you have the URL you have access, so it’s pseudo-private and shared.

I think encryption keys or any key needed to decrypt/fetch a piece of data needs to be out of band, with some other type of sharing mechanism at the application layer (I wouldn’t disagree at all we should provide those utilities though)

Edit: an example of application layer solution is safe://<XOR-URL>?key=<keys to decrypt>
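A minimal sketch of how an application (rather than the resolver) could separate the two parts of such a link. The `key` query parameter comes from the example above; `split_link` and the placeholder values are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def split_link(link: str):
    """Return (address_part, key_or_None) from a safe:// link."""
    parts = urlsplit(link)
    if parts.scheme != "safe":
        raise ValueError("not a safe:// link")
    keys = parse_qs(parts.query).get("key")
    return parts.netloc, (keys[0] if keys else None)


# The address part goes to the resolver; the key stays in the application.
address, key = split_link("safe://examplexorurl?key=examplekey")
print(address, key)
```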

#60

True, but you need an Internet connection to validate this (and need to wait for the ‘not available’ response), so it is only possible online and is slower.

Well, simple sharing of encrypted data with a group? For example, my holiday pictures with my family and my grandma, who can click a link in a messenger but for sure can’t operate many programs.

safe://<XOR-URL>?key=<keys to decrypt>

how is this different except for the missing upsides that come with cids?

#61

That is something to be handled by the application and not by the routing/resolver, otherwise it’s like putting encryption keys encoded in a domain name in existing internet

#62

And…? What’s the problem with that? The same mutable would have multiple links then that grant different access rights…?

The application can just take the link as it is and hand it to the api => has the rights that it got granted without the programmer needing to examine the link for maybe or not added keys and creating in those cases other mutables with additional key information added to them…? (and since you need to provide a function for this functionality anyway… Where is the difference for you to process 1 longer link vs. Link +key…?)

edit/ps:

do you realize that you “requested for comments” here but as i comment on it and give technical/usability arguments + examples i only get an answer after asking repeatedly +only get opinions back but no arguments…?

pps: and the difference between ‘putting encryption keys encoded in a domain name in existing internet’ and my suggestion is that the current DNS system is a public list and therefore it’s 100% different to my suggestion … of course you wouldn’t want to have your keys in plain text in a public list … that’s why you can’t do it currently … but we don’t have this limit on safe and names need to be resolved locally anyway so why would we opt for the clumsy work-around that needs to be used by the clearnet…?

Ppps: maybe there is a hidden argument behind the opinion… ‘out of band’ of what do you mean…? Since sharing a link with additional key info or sharing link including the key info is always the same context/band… Out of band of the name resolution? If someone can read the lib calls for name resolution he can read the lib call for retrieving the piece of data with key as argument too…? So I don’t see additional security by splitting it up…? (while if you split up the cid containing both key and xor name and transfer both parts through 2 different channels none of the 2 by themselves can make sense at all if caught by a 3rd party and only put together again will reveal the name+rights at the same time)

#63

I’m not sure that’s completely accurate (and, forgive me, a little unfair). I think everyone here is happy to hear new views on things but that doesn’t mean that we all have to agree on everything.

Equally, as we’ve noted in here a couple of times, we’re not ignoring this… it’s just that we’re not always able to reply straight away (or necessarily fast, especially when you’re providing food for thought). I’m sorry if that’s frustrating, I get it, but sometimes we’re going to have to ask for a little patience.


For example, there’s some solid reasoning for maintaining z-base32: readability and avoiding confusion.

Your argument re: ‘it’s not standard’ is not a strong one, given that a) you can use any hashing function you want with CID; as @bochaco has pointed out, that’s part of the purpose (it seems the issues you have are with a Python implementation… which can be PR’d to help everyone out there), and b) it’s proposed that this hashing function should be available in the Safe Client Libs API, so it’s going to be language independent, which is what we’re considering here. (Though props to you for diving in to a language where we don’t have an impl yet! That is massively appreciated :+1: :bowing_man:)

#64

Regarding links w/encryption keys:

A link + key is not secure. If it’s all passed together, the data may as well not be encrypted. (Which is about as effective as not encrypting and simply not publishing the link anywhere… the link should be effectively un-guessable.)

That’s where ‘out of band’ comes in. Which means: passed, but not in the same medium. ie: share the XOR link to your encrypted photos with friends and family, but tell them the password to decrypt it over the phone.

But they could only do this on the decrypting computer (where a decrypt key is entered). Whereas encoding into the link, anyone could do this without needing your decryption keys… as it’s in the link.

#65

Absolutely, I know I’m being a bit unfair there… But you need to realize that by not responding with your opinion and your motivation, and instead retreating into your company, discussing among yourselves and working out a strategy for how to react to my response, you don’t exactly motivate me to do it again in the future, and you don’t give me a chance to react to the thoughts you have when confronted with my input… It’s not a discussion then, just you taking into account some comments that may not be thought through and therefore look odd…

Then the user would get a string encoded exactly the way you specify it in the API…? Why do you use CIDs then, if you want to dictate the format anyway? (Yes, future-proofing… But if you expose it, it would be nice to let the user specify encodings if needed… And here again I need to guess how you plan to integrate it, and whether this would be okay for my world or not, if you don’t tell me what your plan is…)

Absolutely, and it’s not meant to be secure in the context where I would want to share it this way. Why would I want to protect it when it is meant to be easy to consume by the people I send exactly this link to (without the need to call them and tell them 35 characters)?

I don’t see how this solution solves the issue you described

… Ps: and yes, I know what out of band means… But sharing the xor name and the key on different channels is not the only way to ensure this security… If an app wants to secure the data, it could share most of the CID (containing address + key) on one channel, and the one missing character plus its position as a number on a second channel… So I would only need to share one character + a number over the phone, which wouldn’t be a problem at all (in contrast to sharing a secure password), and my data is as secure as data can be…
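The mechanics of that split can be sketched in a few lines. `withhold_char` and `restore_char` are hypothetical helper names and the link is a placeholder; the sketch only shows the splitting and rejoining, and makes no claim about how much secrecy a single withheld character provides.

```python
def withhold_char(link: str, position: int):
    """Split a link into (link without one character, (character, position))."""
    if not 0 <= position < len(link):
        raise IndexError("position outside link")
    partial = link[:position] + link[position + 1:]
    return partial, (link[position], position)


def restore_char(partial: str, secret) -> str:
    """Re-insert the withheld character at its original position."""
    char, position = secret
    return partial[:position] + char + partial[position:]


link = "safe://examplecidwithkey"             # placeholder link
partial, secret = withhold_char(link, 12)     # partial -> channel 1, secret -> channel 2
assert restore_char(partial, secret) == link
```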

Providing the possibility for a link to contain the password comes with all the upsides of cids and just comes with additional possibilities… You can share a link without password too if you want and tell someone the password through a second channel as well if you insist…

#66

Okay, last post on this matter here: you suggested that I should open a pull request for the Python implementation of base32z because it’s different to the one in JS. (Which I didn’t do, because I really don’t like this random encoding and would prefer that nobody in this world used it or was motivated to use it…)

If I opened this pull request and it was accepted, all Python programs using base32z that have stored data would immediately lose this data… Because even the self-describing CID would say base32z in both cases, but you wouldn’t know whether it is the old or the new generation…

So by offering a base32z resolution in the libs you can say you offer a solution for ‘same link representation independently from the base32z implementation of the programming language’ but how do you react if the JS/rust implementation of base32z changes (because it’s no standard and this obviously happened and could happen again) then suddenly either you implement your ‘not official’ version of base32z and are using an unnamed custom encoding (and working with ‘invalid cids’) or all links on safe sites don’t work anymore…

Ps: but maybe there was a very good reason to not only leave out L, V and 2 but also do exactly this re-ordering/assignment of characters in base32z

base32: a -> y base32z
base32: b -> b base32z
base32: c -> n base32z
base32: d -> d base32z
base32: e -> r base32z
base32: f -> f base32z
base32: g -> g base32z
base32: h -> 8 base32z
base32: i -> e base32z
base32: j -> j base32z
base32: k -> k base32z
base32: l -> m base32z
base32: m -> c base32z
base32: n -> p base32z
base32: o -> q base32z
base32: p -> x base32z
base32: q -> o base32z
base32: r -> t base32z
base32: s -> 1 base32z
base32: t -> u base32z
base32: u -> w base32z
base32: v -> i base32z
base32: w -> s base32z
base32: x -> z base32z
base32: y -> a base32z
base32: z -> 3 base32z
base32: 2 -> 4 base32z
base32: 3 -> 5 base32z
base32: 4 -> h base32z
base32: 5 -> 7 base32z
base32: 6 -> 6 base32z
base32: 7 -> 9 base32z

and nobody will feel the need to change it again ever :slightly_smiling_face:
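For reference, the table above boils down to a plain alphabet substitution. A minimal sketch, assuming whole-byte input and no padding characters (the z-base-32 spec also covers bit lengths that are not multiples of 8, which this does not handle); the function names are hypothetical:

```python
import base64

BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"  # same order as the table above
TO_Z = str.maketrans(BASE32_ALPHABET, ZBASE32_ALPHABET)
FROM_Z = str.maketrans(ZBASE32_ALPHABET, BASE32_ALPHABET)


def zbase32_encode(data: bytes) -> str:
    # Standard base32, lower-cased and unpadded, then re-lettered.
    plain = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return plain.translate(TO_Z)


def zbase32_decode(text: str) -> bytes:
    # Undo the re-lettering, restore padding, decode as standard base32.
    plain = text.translate(FROM_Z).upper()
    plain += "=" * (-len(plain) % 8)
    return base64.b32decode(plain)


sample = bytes(range(32))  # placeholder 32-byte value
assert zbase32_decode(zbase32_encode(sample)) == sample
```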

#67

I think it’s been checked against more than the JS one, but I’m not sure to be honest. (@bochaco do you know more there?). You may be right, perhaps the JS version is off and we should patch it. Worth checking for sure. Either way, whichever implementation is broken should be fixed, otherwise it could lead to problems for folk :+1: .

This is the only spec I can find: http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt (I’m not sure what the Python impl follows?). What would make it ‘standard’? Surely that depends on your favourite standards body, no? (Much as there seem to be a few different URL standards.) Interoperability is key.

If an implementation fails to meet the spec, then there’s an issue there… Same as if an implementation fails with base64 too. Similar consequences I’d imagine. But I’m sorry, I don’t see the possibility of bad implementations as an argument for or against any given spec.


All of which is beside the point, since with CID, you can use your favourite encoding.

If there’s a pressing need for many different encodings to be used via SCL, then perhaps the client lib implementation could allow for passing in your own functionality for the hashing function…?

(Or in general: if the client lib API implementation isn’t something you like, then you’re free to not use it and implement a new API for CID creation, eg. As long as it’s using CID then the urls should be decodable, I believe.)

So far though, I personally still don’t see a compelling reason not to use z-base32 right now.

#68

Do I understand that the CID encoding scheme leads to multiple possible URIs for the same resource?

If so, that seems undesirable and I think might go against the ‘standard web’. I think I posted a link to this address ages ago in a related discussion. I think we should follow web standards and conventions unless there’s a good reason not to, so need to consider this and examine any implications - if the answer to my question is yes!

#69

No, I haven’t dug into the implementation of it at all. For sure we’ll need to double check this with some automated tests whenever this becomes the official impl in SCL (I mean official in the sense of non-PoC), e.g. compare some hard-coded XOR names with the encoded XOR-URL output and make sure it’s what we expect. As you say, we can send PRs to whichever impl we find bugs in; this is where open source shows its powers, I believe.

That’s correct, but I don’t see a problem with that (my humble point of view): you also have more than one URL referencing the same resource if you create public name URLs. I can admit that at the beginning it also sounded bad to me, but if you are using a URL as part of, say, a contract which references an ImmD, then that will be OK even if there are aliases or other URLs to the same resource.
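A rough sketch of what such an automated check could look like: two independently written encoders are run over a few fixed inputs and must agree. The sample inputs are placeholders rather than real XOR names, and the function names are hypothetical.

```python
import base64

ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def encode_bitwise(data: bytes) -> str:
    """Encode by walking the input five bits at a time, most significant first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)  # zero-fill the final 5-bit group
    return "".join(ZBASE32_ALPHABET[int(bits[i:i + 5], 2)]
                   for i in range(0, len(bits), 5))


def encode_via_base32(data: bytes) -> str:
    """Encode with the stdlib base32 codec, then swap the alphabet."""
    table = str.maketrans("abcdefghijklmnopqrstuvwxyz234567", ZBASE32_ALPHABET)
    plain = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return plain.translate(table)


SAMPLES = [b"", b"\x00", b"\xff", bytes(range(32)), bytes([0xAB] * 32)]

for sample in SAMPLES:
    assert encode_bitwise(sample) == encode_via_base32(sample), sample.hex()
print("all samples agree")
```

Once the outputs of the reference implementation are agreed on, the same loop could compare against hard-coded expected strings instead of a second encoder.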


On a side note, I’m trying to understand a bit more about several other aspects of the network, like the new Appendable data, to see if this is impacted somehow, or perhaps safecoin and how public keys for transferring safecoins could also be used by and/or impact this RFC.

#70

That’s the one and only reason CIDs exist: to be able to represent the same sequence of bytes in the encoding of your liking (hex, base32, base32z, base2, base64, …). It puts information about the data that follows in front of it and then presents the string. No error detection, no other benefits… (That’s why I said it doesn’t make sense to exclude any data, like the type tag, from the CID, because that defeats the only reason one would use them…)
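As a small illustration of that point, the sketch below renders the same bytes with three multibase prefixes (`f` for base16, `b` for base32, `h` for base32z) and decodes each string back by looking at its first character. The payload is a placeholder, not a real CID, and the helper names are hypothetical.

```python
import base64

BASE32 = "abcdefghijklmnopqrstuvwxyz234567"
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"
TO_Z = str.maketrans(BASE32, ZBASE32)
FROM_Z = str.maketrans(ZBASE32, BASE32)


def b32(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii").lower().rstrip("=")


def unb32(text: str) -> bytes:
    text = text.upper()
    return base64.b32decode(text + "=" * (-len(text) % 8))


# Multibase prefix character -> (encoder, decoder).
CODECS = {
    "f": (lambda d: d.hex(), bytes.fromhex),
    "b": (b32, unb32),
    "h": (lambda d: b32(d).translate(TO_Z),
          lambda t: unb32(t.translate(FROM_Z))),
}


def encode(prefix: str, data: bytes) -> str:
    return prefix + CODECS[prefix][0](data)


def decode(text: str) -> bytes:
    # The first character says which encoding the rest of the string uses.
    return CODECS[text[0]][1](text[1:])


payload = bytes([0x01, 0x55, 0x12, 0x20]) + bytes(32)  # placeholder CID bytes
links = [encode(prefix, payload) for prefix in "fbh"]
for link in links:
    print(link)
assert all(decode(link) == payload for link in links)
```

All three strings differ, yet each decodes to the same bytes, which is also why one resource can end up with several textual URLs.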

#71

Which you wouldn’t even need to think of if you used base16 (hex) or base32…

Well, by being widely used and clearly and reasonably specified (and therefore unlikely to change). For example, hex is widely used and everyone knows which number is represented by which character (and it is easy to read, and easy to type with the numpad plus the left hand on abcdef)… So, a standard… base32 takes a-z + 2-7 in the sequence you’d expect (alphabetic, then small to large)… base32z takes the alphabet (without L and V) + 1 and 3-9 and shuffles them around…

Okay - yes - unambiguously defined - that meets my requirement…

While this is still super subjective and random imho (and we are talking about more or less random bytes… There is no such thing as ‘more commonly occurring characters’…)

#72

Not entirely true.

It allows us to string together a variety of information in a standardised fashion, which can be useful. (At least IPFS have found it to be useful.) It provides a means to change encodings/hash functions (giving us a way to improve URLs in the future), as you note, and can carry more info to boot (mimetypes, eg).

You’re right though, maybe it doesn’t go far enough? Typetag certainly won’t work in the port number, eg.

I think I’ve not really followed your error detection point before. But aye, that could be an interesting property to have.

#73

Unless there was a problem in an implementation…