And this, absolutely! We’re getting there, and I can’t wait to see some really powerful apps on SAFE.
Nice, good finding. I think you should send a PR to their repo to fix it.
Not trying to defend any base encoding; in fact I think it’s not that important, for two reasons:
- any base encoding will be decodable (that’s the point of the CID spec), so if some people use base32, base64, etc. they will all work fine and can fetch the same content. I think the one that most people use would become the standard.
- remember that both JS and your python implementation won’t be needed (and would go away) if this becomes natively supported by SAFE libs, note the following from our previous discussions:
And now, in an attempt to bring back the main discussion: yesterday we had a talk with @joshuef about considering the type tag to be encoded in the CID itself rather than being the port number of the URL. This would not only solve issues we already face with some browsers or libs not supporting port numbers larger than 65535 (even though the spec doesn’t mention such a limit IIRC), but also in the future, if we need additional parameters for our data types, we can embed them in the CID string, i.e. the <multihash-content-address> of the CID spec. After all, the type tag is part of our addressing system, and it would still make sense for it to be part of the content-address part. Just bringing it up for discussion and feedback about this alternative.
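The port limitation mentioned above is easy to reproduce with a general-purpose URL parser. The following is only a sketch using Python's standard `urllib.parse`; the host name `example` and the helper `type_tag_from_port` are made up for illustration and are not part of any SAFE library.

```python
from urllib.parse import urlsplit


def type_tag_from_port(url):
    """Return the port of `url` as an int.

    Returns None when the URL has no port or when the parser rejects the
    port (urllib raises ValueError for ports outside the 16-bit range).
    """
    try:
        return urlsplit(url).port
    except ValueError:
        return None


# A small type tag fits in the port field, a large one does not.
small = type_tag_from_port("safe://example:15000")
large = type_tag_from_port("safe://example:1500000")
```

Here `small` comes back as an integer while `large` is rejected, which is the kind of failure that moving the type tag into the CID would avoid.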
The only problem is that strictly speaking that wouldn’t be just a hash but a hash+number, which wouldn’t be a valid CID anymore…?..
Could the multicodec <key> be used for the tag_type in order to leave the hash as is (see here)?
This is the CID string:
<cidv1> ::= <multibase-prefix><cid-version><multicodec-content-type><multihash-content-address>
<varint hash function code><varint digest size in bytes><hash function output>
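To make the layout concrete, here is a minimal Python sketch that assembles a CIDv1 string by hand from those parts. It assumes sha2-256 (multihash code 0x12), the `raw` multicodec (0x55) and the multibase prefix `b` (lowercase base32 without padding); which codes SAFE would actually pick is not decided in this thread, and the helper names are hypothetical.

```python
import base64
import hashlib


def varint(n):
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cidv1(content, codec=0x55):
    """Build <multibase-prefix><cid-version><multicodec><multihash>."""
    digest = hashlib.sha256(content).digest()
    # <varint hash function code><varint digest size in bytes><hash function output>
    multihash = varint(0x12) + varint(len(digest)) + digest
    raw = varint(1) + varint(codec) + multihash
    # 'b' is the multibase prefix for lowercase, unpadded base32.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


cid = cidv1(b"hello")
```

Every field is self-describing, so a decoder can read the prefix, version, codec and hash parameters before it ever looks at the digest bytes.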
So do you mean to use a multicodec in the <hash function output>? Or where exactly?
valid CID as in ‘a cid is a multiformat-encoded hash’ and we would be using it as ‘multiformat encoded hash+additional data’ - or did i misunderstand you?
maybe not as intended by the creators but as we don’t only use the hash for data discovery/recovery and it works well with the cid format that makes great sense imho
Like the following definition, which includes clear definitions of what each of these parts exactly are and encode:
If we change any of that we cannot say our URL contains a CID
if we’d hash all the bytes of xor-name and typetag it would be a valid cid - wouldn’t it? what would speak against this? - as it indeed would be simpler than splitting up a link into cid + type-tag … +it would benefit from the properties of a cid …? (what would be the motivation to exclude something from the encoding algorithm?)
Yes, that’s an option, but just trying to be very critical and strict (not saying it’s correct): can hashing xorname+typetag strictly be considered a SAFE address? Probably yes, but in SAFE currently the xorname is the address, that’s all routing knows; the type tag is something else that is still needed to locate the data… so that’s all I mean.
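The two options being weighed here can be put side by side in a short Python sketch. The 32-byte name, the 8-byte big-endian type tag and the value 15001 are assumptions made for illustration only, not a description of how SAFE serialises them.

```python
import hashlib
import struct

# Stand-in 32-byte XOR name and an example type tag (both assumptions).
xorname = hashlib.sha256(b"example content").digest()
type_tag = 15001
tag_bytes = struct.pack(">Q", type_tag)  # 8 bytes, big-endian

# Option A: append the type tag to the name. Both parts stay recoverable,
# but the payload is 40 bytes of "hash + number" rather than a plain hash.
option_a = xorname + tag_bytes

# Option B: hash name and type tag together. The result is again a plain
# 32-byte digest, but the name routing needs cannot be read back from it.
option_b = hashlib.sha256(xorname + tag_bytes).digest()


def recover(payload):
    """Split an Option A payload back into (xorname, type_tag)."""
    return payload[:32], struct.unpack(">Q", payload[32:])[0]
```

Option A keeps the xorname readable for routing at the cost of no longer being a bare hash; Option B keeps the hash shape but makes the digest a different address from the one routing knows today.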
Sorry, ignore me (often best). I misread the page I linked. I see it is not part of CID, but shows the generic case of which the CID <mc><hash> is an implementation. Dang it!
Okay - since we’re talking now - what speaks against adding a small checksum, and against this scheme for sharing private data that comes with keys?
Or which other alternatives can you think of (what are their upsides?)
I’m not sure I see the need for a checksum: if you cannot fetch the content, the URL is probably invalid. Even if the XOR-URL passed its checksum but you couldn’t fetch the content, what is it that you can conclude from that?
Having decryption keys in the URL to a private content… hold on… isn’t that contradictory? If you do that, then decryption keys become just like “an additional encoding” of your URL, as everyone with such a URL can see the content, which is therefore not private. In fact, some tools out there already handle sharing data by just providing a difficult-to-guess URL, but if you have the URL you have access, so it’s pseudo-private and shared.
I think encryption keys or any key needed to decrypt/fetch a piece of data needs to be out of band, with some other type of sharing mechanism at the application layer (I wouldn’t disagree at all we should provide those utilities though)
Edit: an example of application layer solution is
safe://<XOR-URL>?key=<keys to decrypt>
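As a rough illustration of that application-layer approach, an app could split such a link into the address it hands to the resolver and the key it keeps for itself. This is only a sketch with Python's standard `urllib.parse`; the function name `split_link` and the sample values are invented for the example.

```python
from urllib.parse import parse_qs, urlsplit


def split_link(link):
    """Return (address, key) for a link of the form safe://<XOR-URL>?key=<key>.

    `key` is None when the link carries no key parameter.
    """
    parts = urlsplit(link)
    keys = parse_qs(parts.query).get("key", [])
    address = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return address, (keys[0] if keys else None)


address, key = split_link("safe://abc123?key=secret")
```

Only `address` would go to name resolution; what happens with `key` stays entirely inside the application.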
True - but you need an Internet connection for validating this (and need to wait for the response ‘not available’), so it’s only possible online, and slower
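For reference, the kind of offline check being argued for could look like the sketch below: a few bytes of a hash are appended to the address bytes before encoding, and a client can verify them without any network round trip. The 4-byte length and the use of sha256 are assumptions for the example, not something proposed in this thread.

```python
import hashlib


def add_checksum(payload, size=4):
    """Append the first `size` bytes of sha256(payload) to the payload."""
    return payload + hashlib.sha256(payload).digest()[:size]


def verify_checksum(data, size=4):
    """Check the trailing checksum locally, without touching the network."""
    payload, check = data[:-size], data[-size:]
    return hashlib.sha256(payload).digest()[:size] == check


tagged = add_checksum(bytes(32))  # 32 zero bytes stand in for an address
ok = verify_checksum(tagged)
```

A mistyped or truncated link would then be rejected immediately, instead of after a failed fetch.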
Well - simple sharing of encrypted data with a group? (for example my holiday pictures with my grandma (who can click a link in a messenger but for sure can’t operate many programs) and family)
safe://<XOR-URL>?key=<keys to decrypt>
how is this different except for the missing upsides that come with cids?
That is something to be handled by the application and not by the routing/resolver, otherwise it’s like putting encryption keys encoded in a domain name in existing internet
And…? What’s the problem with that? The same mutable would have multiple links then that grant different access rights…?
The application can just take the link as it is and hand it to the api => has the rights that it got granted without the programmer needing to examine the link for maybe or not added keys and creating in those cases other mutables with additional key information added to them…? (and since you need to provide a function for this functionality anyway… Where is the difference for you to process 1 longer link vs. Link +key…?)
do you realize that you “requested for comments” here but as i comment on it and give technical/usability arguments + examples i only get an answer after asking repeatedly +only get opinions back but no arguments…?
pps: and the difference between ‘putting encryption keys encoded in a domain name in existing internet’ and my suggestion is that the current DNS system is a public list and therefore it’s 100% different to my suggestion … of course you wouldn’t want to have your keys in plain text in a public list … that’s why you can’t do it currently … but we don’t have this limit on safe and names need to be resolved locally anyway so why would we opt for the clumsy work-around that needs to be used by the clearnet…?
Ppps: maybe there is a hidden argument behind the opinion… ‘out of band’ of what do you mean…? Since sharing a link with additional key info or sharing link including the key info is always the same context/band… Out of band of the name resolution? If someone can read the lib calls for name resolution he can read the lib call for retrieving the piece of data with key as argument too…? So I don’t see additional security by splitting it up…? (while if you split up the cid containing both key and xor name and transfer both parts through 2 different channels none of the 2 by themselves can make sense at all if caught by a 3rd party and only put together again will reveal the name+rights at the same time)
I’m not sure that’s completely accurate (and, forgive me, a little unfair). I think everyone here is happy to hear new views on things but that doesn’t mean that we all have to agree on everything.
Equally, as we’ve noted in here a couple of times, we’re not ignoring this… it’s just that we’re not always able to reply straight away (or necessarily fast, especially when you’re providing food for thought). I’m sorry if that’s frustrating, I get it, but sometimes we’re going to have to ask for a little patience.
For example, there’s some solid reasoning for maintaining a z-base32 : readability and avoiding confusion.
Your argument re: ‘it’s not standard’ is not a strong one, given a) you can use any hashing function you want with CID, as @bochaco has pointed out, that’s part of the purpose (it seems the issues you have are issues with a python implementation… which can be PR’d to help everyone out there), and b) it’s proposed that this hashing function should be available in the Safe Client Libs API. So it’s going to be language independent, which is what we’re considering here. (Though props to you for diving into a language where we don’t have an impl yet! That is massively appreciated.)
Regarding links w/encryption keys:
A link + key is not secure. If it’s passed all together, it may as well not be encrypted. (Which is about as efficient as not encrypting, but just not publishing the link anywhere… the link should be effectively un-guessable.)
That’s where ‘out of band’ comes in. Which means: passed but not in the same medium. ie: share your encrypted photos XOR link with friends and family, but tell them the password to decrypt it on the phone.
But they could only do this on the decrypting computer (where a decrypt key is entered). Whereas encoding into the link, anyone could do this without needing your decryption keys… as it’s in the link.
Absolutely I know I’m being a bit unfair there… But you need to realize by not responding with just your opinion and just with explaining your motivation but hiding in your company just discussing with yourselves and working out a strategy on how to react on that response from me you don’t exactly motivate to do it again in the future & and don’t give me a chance to react to those thoughts you have when being confronted with my input… It’s not a discussion then but just you taking into account some maybe not thought through and therefore odd looking comments…
Then the user would get a string encoded exactly the way you specify it in the api…? Why do you use cids then if you want to dictate the format anyway? (yes - future-proofness… But if you expose it it would be nice to enable the user to specify encodings if needed… And here again I need to guess how you plan to integrate it and if this would be okay for my world or not if you don’t tell me what your plan is… )
Absolutely - and its not meant to be secure in the context where I would want to share it this way - why would I want to protect it when it is meant to be easy to consume by the people I sent exactly this link (without the need to call them and tell them 35 characters)
I don’t see how this solution solves the issue you described
… Ps: and yes I know what out of band means… But sharing xor name and key on different channels is not the only way to ensure this security… If an app wants to secure the data it could share most of the (address +key containing) cid on one channel and the one missing character +position as number on a second channel… So I would only need to share one character +a number through phone which wouldn’t be a problem at all (in contrast to sharing a secure password) and my data is as secure as data can be…
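The split described above is simple to express in code. The sketch below only illustrates the mechanics of removing one character and restoring it; the function names and the sample link are hypothetical, and it makes no claim about how much secrecy a single withheld character provides.

```python
def split_secret(link, position):
    """Remove the character at `position`; return (remainder, (char, position))."""
    remainder = link[:position] + link[position + 1:]
    return remainder, (link[position], position)


def join_secret(remainder, piece):
    """Re-insert the withheld character at its original position."""
    char, position = piece
    return remainder[:position] + char + remainder[position:]


link = "safe://hyexamplelinkwithkey"
remainder, piece = split_secret(link, 12)
restored = join_secret(remainder, piece)
```

`remainder` would travel over the first channel and `piece` (one character plus a number) over the second.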
Providing the possibility for a link to contain the password comes with all the upsides of cids and just comes with additional possibilities… You can share a link without password too if you want and tell someone the password through a second channel as well if you insist…
Okay - last post on this matter here - you suggested that I should open a pull request for the python implementation of base32z because it’s different to the one in JS. (which I didn’t do because I really don’t like this random encoding and would prefer that nobody in this world used it or got motivated to use it… )
If I’d open this pull request and it would be accepted all python programs using base32z which have stored data would immediately lose this data… Because even the self describing cid would in both cases say base32z but you wouldn’t know if old or new generation…
So by offering a base32z resolution in the libs you can say you offer a solution for ‘same link representation independently from the base32z implementation of the programming language’ but how do you react if the JS/rust implementation of base32z changes (because it’s no standard and this obviously happened and could happen again) then suddenly either you implement your ‘not official’ version of base32z and are using an unnamed custom encoding (and working with ‘invalid cids’) or all links on safe sites don’t work anymore…
Ps: but maybe there was a very good reason to not only leave out L, V and 2 but also do exactly this re-ordering/assignment of characters in base32z
base32: a -> y base32z
base32: b -> b base32z
base32: c -> n base32z
base32: d -> d base32z
base32: e -> r base32z
base32: f -> f base32z
base32: g -> g base32z
base32: h -> 8 base32z
base32: i -> e base32z
base32: j -> j base32z
base32: k -> k base32z
base32: l -> m base32z
base32: m -> c base32z
base32: n -> p base32z
base32: o -> q base32z
base32: p -> x base32z
base32: q -> o base32z
base32: r -> t base32z
base32: s -> 1 base32z
base32: t -> u base32z
base32: u -> w base32z
base32: v -> i base32z
base32: w -> s base32z
base32: x -> z base32z
base32: y -> a base32z
base32: z -> 3 base32z
base32: 2 -> 4 base32z
base32: 3 -> 5 base32z
base32: 4 -> h base32z
base32: 5 -> 7 base32z
base32: 6 -> 6 base32z
base32: 7 -> 9 base32z
and nobody will feel the need to change it again ever
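The table above is exactly what you get by lining up the two alphabets position by position, so an implementation can be checked against it mechanically. The sketch below does this in Python on top of the standard `base64` module; it assumes whole-byte input, where z-base-32 amounts to unpadded base32 with a substituted alphabet, and the function names are made up for the example.

```python
import base64

BASE32 = "abcdefghijklmnopqrstuvwxyz234567"   # RFC 4648 alphabet, lowercased
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"  # z-base-32 alphabet

TO_Z = str.maketrans(BASE32, ZBASE32)
FROM_Z = str.maketrans(ZBASE32, BASE32)

# Position-by-position mapping, e.g. MAPPING["a"] is "y" as in the table.
MAPPING = dict(zip(BASE32, ZBASE32))


def zbase32_encode(data):
    """Encode whole bytes: unpadded base32, then swap the alphabet."""
    plain = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return plain.translate(TO_Z)


def zbase32_decode(text):
    """Swap the alphabet back, restore the padding, then decode as base32."""
    plain = text.translate(FROM_Z).upper()
    return base64.b32decode(plain + "=" * (-len(plain) % 8))
```

Running the same inputs through the JS and Python libraries and comparing against a reference like this would show which of the two deviates from the alphabet.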
I think it’s been checked against more than the JS one, but I’m not sure to be honest. (@bochaco do you know more there?). You may be right, perhaps the JS version is off and we should patch it. Worth checking for sure. Either way, whichever implementation is broken should be fixed, otherwise it could lead to problems for folk.
This is the only spec I can find: http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt (I’m not sure what the python impl follows?). What would make it ‘standard’? Surely that depends on your favourite standards body, no? (Much as there seem to be a few different URL standards.) Interoperability is key.
If an implementation fails to meet the spec, then there’s an issue there… Same as if an implementation fails with base64 too. Similar consequences I’d imagine. But I’m sorry, I don’t see the possibility of bad implementations as an argument for or against any given spec.
All of which is beside the point, since with CID, you can use your favourite encoding.
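A small sketch of why the choice of encoding does not affect decodability: the first character of a multibase string names the encoding, so a decoder can dispatch on it. Only two prefixes are handled here, `f` (lowercase base16) and `b` (lowercase unpadded base32); the function name is hypothetical and a real implementation would cover the full multibase table.

```python
import base64


def multibase_decode(text):
    """Decode a multibase string by dispatching on its one-character prefix."""
    prefix, body = text[0], text[1:]
    if prefix == "f":  # base16, lowercase
        return bytes.fromhex(body)
    if prefix == "b":  # base32, lowercase, no padding
        padding = "=" * (-len(body) % 8)
        return base64.b32decode(body.upper() + padding)
    raise ValueError(f"unsupported multibase prefix: {prefix!r}")


raw = bytes(range(36))  # stand-in for the binary CID bytes
as_hex = "f" + raw.hex()
as_b32 = "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

Both strings look nothing alike, yet `multibase_decode` returns the same bytes for each, so they point at the same content.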
If there’s a pressing need for many different encodings to be used via SCL, then perhaps the client lib implementation could allow for passing in your own functionality for the hashing function…?
(Or in general: if the client lib API implementation isn’t something you like, then you’re free to not use it and implement a new API for CID creation, eg. As long as it’s using CID then the urls should be decodable, I believe.)
So far, I personally still don’t see a compelling reason not to use z-base32 right now.