Hey @riddim, I don’t think everyone is really in love with CID or completely sold on it; this was just a proposal, one way of achieving the goal. I think there are valid points in your critique, and they are not being ignored, if that’s what you feel. I’m personally waiting for others to chime in here with their opinions and perspectives too; I’m aware some other people are trying to catch up with these discussions. I’m trying to explain what the decisions were and the reasoning behind what has been done, but it’s good that we are reviewing them from different angles.
Aye, @riddim, please don’t be disheartened.
It’s awesome that you’re raising your concerns. This is how we progress on this front.
I’d been of the opinion that flexibility and future-proofing are worthwhile additions, but you raise some good points. I need to digest and re-read some of your posts above before I can say more, though.
But aye, please don’t mistake a lack of immediate response for a lack of interest in your posts/points.
All good, just mega busy.
tbh I’m not 100% against CID… Probably the point where I got a bit upset was when I realised that CID is just a way of representing bytes, and that you chose a non-standard encoding for it as the default… I don’t care that much about a couple of bytes more or less… But I would rather discuss whether it would make sense to include a check byte for offline typo detection, and to include the type tag simply as bytes instead of the :typeTag thing, which looks a bit pointless to me tbh…
PS: since I found out that I can indeed just hex the name and put something in front of it, I had the impression you don’t know precisely what CID is and think it’s an all-in-one package for all data. But it’s not: it’s just one way to represent data. A smart way, actually, because you first state how the following data will be structured, but it is really no more than a representation of bytes (which can even be base8 or base2, as pure zeros and ones…). If you exclude the type tag from the CID, you make it complicated instead of simple to transfer an XOR address in such an environment… The same goes for an environment where someone in the future wants to use pure base64 or base128 data…
Note: base64 or base128 would, for example, be possible if someone used 6 or 7 parallel binary data transfer channels (one bit per channel gives 2^6 = 64 or 2^7 = 128 symbols per step; this means moving away from the visual data representation layer and looking at the technical level).
So there might indeed be future use cases with different encodings where CIDs would be natively at home, and one wouldn’t need to decode and re-encode the data by hand (if the type tag is included in the XOR address CID; otherwise you need to split it up again, treat the type tag differently from the rest and re-encode it).
Finally had the chance to briefly scroll through the new primer…
That’s something I didn’t pay attention to earlier…
Plus, while making screenshots for this post, I got a bit confused…
Immutable data name: 32-byte array
I thought for mutable data it would be 32 bytes too (24-byte name + 8-byte type tag). Did this change…?
Anyway, to sum it up, the current proposal is:
- cid(name):typeTag for mutable public
- cid(name+mime type) for immutable public
- David said 32-byte array for safecoin (which is just data)
- format unclear for private data
… Looks like a bunch of different formats arising…
I know I sound a bit like a broken record now, but I would vote for cid(relevantBytes+checksum) for simply everything…
You can just return all three of base58, base32 and base16 through the API, and people can decide for themselves which one they want to use (the browser should be able to decode at least base32 and base16).
So it would be:
- cid(name+mimeType+checksum) for immutables
- cid(name+checksum) for safecoin
- cid(name+typeTag+checksum) for public mutable
- cid(name+keys+typeTag+checksum) for private mutable
… Still many different lengths of data but a bit more homogeneous…
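To make the cid(relevantBytes+checksum) idea concrete, here is a minimal sketch of appending a checksum and verifying it offline before the bytes are CID-encoded. The proposal doesn’t name a checksum algorithm: the first 4 bytes of SHA-256 used here are an assumption for illustration, and add_checksum, verify_checksum and the example type tag value are hypothetical names/values.

```python
import hashlib

# Assumption for illustration: the proposal does not fix a checksum
# algorithm; here it is the first 4 bytes of SHA-256 over the payload.
CHECKSUM_LEN = 4


def add_checksum(payload: bytes) -> bytes:
    """Append a short checksum to the relevant bytes (name, type tag, ...)."""
    return payload + hashlib.sha256(payload).digest()[:CHECKSUM_LEN]


def verify_checksum(data: bytes) -> bytes:
    """Return the payload if its checksum matches, else raise ValueError.

    Works fully offline, so a mistyped link can be rejected before any
    network request is made.
    """
    if len(data) <= CHECKSUM_LEN:
        raise ValueError("data too short to contain a checksum")
    payload, checksum = data[:-CHECKSUM_LEN], data[-CHECKSUM_LEN:]
    if hashlib.sha256(payload).digest()[:CHECKSUM_LEN] != checksum:
        raise ValueError("checksum mismatch: probably a typo")
    return payload


if __name__ == "__main__":
    name = bytes(range(32))                # placeholder 32-byte XorName
    type_tag = (15000).to_bytes(8, "big")  # hypothetical 8-byte type tag
    data = add_checksum(name + type_tag)   # these bytes would then be CID-encoded
    assert verify_checksum(data) == name + type_tag

    typo = bytearray(data)
    typo[5] ^= 0x01  # flip one bit to simulate a typo
    try:
        verify_checksum(bytes(typo))
    except ValueError as err:
        print("rejected:", err)
```

The cost is 4 extra bytes per link, which fits the "a couple of bytes more or less" trade-off mentioned above.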
[I cannot post more than 3 times in a row, so here’s an EDIT]
… Okay, more on encoding and the differences between base32 and base32z, just so everyone knows what we are talking about.
I had a look at the JS implementations of those two and can now make a base32 string from a base32z string produced by the JS lib…
So this here:
is, as base32:
As we can see, base32z leaves out l, v and 2 (and uses 1, 8 and 9 instead), and it reorders all the other characters.
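Since both alphabets encode the same 5-bit groups of the input, converting between base32 and base32z is a pure character substitution once padding is stripped. A minimal Python sketch, assuming unpadded lowercase base32 as used after the multibase "b" prefix; the multibase prefix itself ("b" or "h") is not handled, and the function names are illustrative:

```python
import base64

# RFC 4648 base32 alphabet, lowercase (multibase prefix "b")
BASE32 = "abcdefghijklmnopqrstuvwxyz234567"
# z-base-32 alphabet (multibase prefix "h"): no l, v, 0 or 2
BASE32Z = "ybndrfg8ejkmcpqxot1uwisza345h769"

TO_Z = str.maketrans(BASE32, BASE32Z)
FROM_Z = str.maketrans(BASE32Z, BASE32)


def base32_to_base32z(s: str) -> str:
    """Re-encode an unpadded lowercase base32 string as z-base-32.

    Both encodings split the bytes into the same 5-bit groups; only the
    symbol assigned to each group differs, so a character map is enough.
    """
    return s.translate(TO_Z)


def base32z_to_base32(s: str) -> str:
    return s.translate(FROM_Z)


def encode_base32z(data: bytes) -> str:
    plain = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    return base32_to_base32z(plain)


def decode_base32z(s: str) -> bytes:
    plain = base32z_to_base32(s).upper()
    plain += "=" * (-len(plain) % 8)  # b32decode expects padded input
    return base64.b32decode(plain)
```

Under this assumption, a multibase "b…" string becomes the "h…" form by dropping the "b", translating the rest and prepending "h".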
Hey @riddim, thanks for your valuable comments! I’m still reading through this discussion and will chime in soon. For now I’ve got just a quick remark:
We never had 24-byte XorNames: for Mutable Data it’s a 32-byte XorName + an 8-byte (64-bit) type tag.
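A sketch of the 40-byte Mutable Data address described above. The big-endian encoding of the type tag and the helper names md_address/split_md_address are assumptions for illustration, not necessarily how the SAFE libs serialise it.

```python
import struct
from typing import Tuple

XOR_NAME_LEN = 32  # bytes
TYPE_TAG_LEN = 8   # 64-bit type tag


def md_address(xor_name: bytes, type_tag: int) -> bytes:
    """Concatenate a 32-byte XorName and a 64-bit type tag (40 bytes).

    Assumption: the tag is packed as an unsigned 64-bit big-endian integer.
    """
    if len(xor_name) != XOR_NAME_LEN:
        raise ValueError("XorName must be exactly 32 bytes")
    return xor_name + struct.pack(">Q", type_tag)


def split_md_address(address: bytes) -> Tuple[bytes, int]:
    """Inverse of md_address: return (xor_name, type_tag)."""
    if len(address) != XOR_NAME_LEN + TYPE_TAG_LEN:
        raise ValueError("expected a 40-byte address")
    (type_tag,) = struct.unpack(">Q", address[XOR_NAME_LEN:])
    return address[:XOR_NAME_LEN], type_tag
```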
Then I need to check this in pySafe! Thx!
PS: all good, don’t know why I thought it would be 24 bytes…
Okay – playing with immutables now …
And here again there is the question of how XOR links are supposed to work for me:
I uploaded a JPG to this XOR name (hex):
and the same file as a PNG to this XOR name (hex):
I know they’re really there, because I downloaded them on a different PC and both downloads succeeded without error.
Since the prefix you used for your PNG link at safe://hygjdkftyx3k7kr51q9mxapy418zk3stdsss8suyqcim3b56jcten8d4j9emo is not in the Python implementation of multicodec, I “added it manually” to my local copy (just smuggled it into the source code).
Then I used safe://toolbox.dapp to analyse the picture link you provided and extract the XOR name. I can download the lamp and get the data…
When I convert it to a CID, I get as base32:
which seems to be fine (the page loads and the toolbox analyses it).
For base32z, though, I suddenly only get ‘roughly’ your link (note the 2 additional y’s), and safe://hygkdrftyexmueqbwimmyjnbkfm5hdbex8eukqt7j34t6mosdxuk3xobgi34y then analyses fine again with the toolbox and loads the picture. So it’s definitely not ‘just base32z-encoded’; somehow there are additional characters that were not there before [and IMO are not supposed to be there, since they are 2 additional characters that obviously don’t contain any information… otherwise the base32-encoded data wouldn’t analyse and work…].
If I try the same with my uploaded PNG, I get:
which doesn’t let me view the PNG in the browser and doesn’t analyse with the toolbox;
fails too, and with the 2 additional y’s (as in the example link)
it fails as well…
So what am I doing wrong with my PNG?
If I messed something up, what is the precise specification of the XOR-URL of an immutable? Why are there those 2 additional y’s? And why don’t we just append the mime type to the bytes and encode it the same way we did before…?
As it is now, in Python I need to manually patch the multicodec implementation to have the required mime types, and then I can generate a link (which, as it seems, only works in some cases)… I’m not sure how standardised those codecs are, or how widely used. The last update of the hash constants on GitHub for Python was in 2015… Where do those codec numbers come from anyway? I didn’t see them in the IANA link from the GitHub issue, and the codec used for the PNG is not 0x1910, as mentioned in the issue, but 0x1914…? Might we suddenly run into collisions with the definitions?
PS: oh sorry! My mistake with the y’s! I think I made a copy & paste error in the base32z declaration dict… On second look, the lengths of the links looked fishy.
So your base32z link is perfect; it’s just that I fail to generate the right link to my uploaded PNG.
It’s just the CID, and as you already know, we use the multicodec-content-type part for the mime types, as suggested.
As you can see, I worked on a PR against the multicodec repo, which hasn’t been merged yet; they suggested some minor changes that I/we will need to work on to presumably get it accepted.
Now, the Python implementation is perhaps not using the master list of codecs as it should, which, from my understanding, is the one in the multicodec repo: https://github.com/multiformats/multicodec/blob/master/table.csv. This is why you and I had to patch the table to have them in there until they are approved and become part of the master list (our SAFE experimental API uses my patch to the table: https://github.com/bochaco/js-multicodec/tree/mime-types-as-codecs, which is used from https://github.com/bochaco/js-cid/tree/temp-use-bochaco-multicodec, which in turn is a dependency of safe_app_nodejs).
Therefore the CID implementation (in any language) should follow the spec at https://github.com/multiformats/cid. The issues you have had so far all seem to be due to some tiny difference in the CID implementations and/or the encodings used within them. Do I know whether the JS one is correct and the Python one is wrong? No, I don’t, since the browser and tools all use the same implementation. In any case, if we use CID and multiformats, we should be able to send PRs to those implementations; in fact, the problems you are seeing could make a good issue to report in the Python implementation repo.
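Following the CIDv1 layout from the multiformats spec, a binary CID is the version, the content-type codec and a multihash, each number written as an unsigned varint, and the result is then multibase-encoded. A minimal sketch: using 0x16 (sha3-256 in the multicodec table) as the hash code for an XorName and 0x1914 as the PNG codec (the value seen in the patched table discussed here) are assumptions, the XorName is a placeholder, and the function names are illustrative.

```python
import base64


def encode_uvarint(n: int) -> bytes:
    """Multiformats unsigned varint (LEB128): 7 bits per byte, least
    significant group first, high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("uvarint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash(hash_code: int, digest: bytes) -> bytes:
    """<varint hash function code><varint digest size in bytes><digest>"""
    return encode_uvarint(hash_code) + encode_uvarint(len(digest)) + digest


def cid_v1(content_type: int, mh: bytes) -> bytes:
    """Binary CIDv1: <cid-version><multicodec-content-type><multihash>"""
    return encode_uvarint(1) + encode_uvarint(content_type) + mh


def multibase_base32(data: bytes) -> str:
    """Multibase prefix "b" followed by lowercase, unpadded RFC 4648 base32."""
    return "b" + base64.b32encode(data).decode("ascii").rstrip("=").lower()


if __name__ == "__main__":
    SHA3_256 = 0x16       # assumption: hash code used for the XorName
    PNG_CODEC = 0x1914    # assumption: PNG codec from the patched table
    xor_name = bytes(32)  # placeholder XorName (all zeros)
    cid = cid_v1(PNG_CODEC, multihash(SHA3_256, xor_name))
    print(multibase_base32(cid))
```

Note that a codec like 0x1914 needs two varint bytes (0x94 0x32), so the mime type adds only a couple of bytes to the link. Swapping the "b" prefix and alphabet for z-base-32 would give the "h…" form used in the safe:// links.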
Ah, sorry for my impatience =D… Thx, I’ll dig a bit further into why my link doesn’t generate correctly as soon as I have some time again. You wouldn’t happen to be able to generate a working PNG link to
@bochaco? That would be super nice and might speed up finding my mistake.
These are the XOR-URLs I’m manually generating with JS:
I cannot fetch the files with any of them, even though the safe-URL Analyser decodes them to the correct XOR name and mime type (you can try them). Perhaps that’s because they are private, i.e. encrypted. Are they?
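Decoding a XOR-URL the way the safe-URL Analyser does needs no network access and no decryption, which is why it can succeed while the fetch fails. A minimal sketch that handles only bare ImmD XOR-URLs (no port, path or query) and only the multibase prefixes discussed in this thread (f, b, h); analyse_xor_url and its helpers are hypothetical names. Running it on the base32z link quoted earlier in the thread is a quick way to see which codec and hash code it carries.

```python
import base64
from typing import Dict, Tuple

BASE32 = "abcdefghijklmnopqrstuvwxyz234567"
BASE32Z = "ybndrfg8ejkmcpqxot1uwisza345h769"


def decode_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one unsigned varint at pos; return (value, position after it)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def _b32decode_unpadded(s: str) -> bytes:
    s = s.upper()
    return base64.b32decode(s + "=" * (-len(s) % 8))  # restore padding


def multibase_decode(s: str) -> bytes:
    """Decode only the multibase prefixes discussed in this thread."""
    prefix, body = s[0], s[1:]
    if prefix == "f":  # base16 (lowercase)
        return bytes.fromhex(body)
    if prefix == "b":  # base32 (lowercase, unpadded)
        return _b32decode_unpadded(body)
    if prefix == "h":  # z-base-32: map back to the RFC 4648 alphabet first
        return _b32decode_unpadded(body.translate(str.maketrans(BASE32Z, BASE32)))
    raise ValueError(f"unsupported multibase prefix {prefix!r}")


def analyse_xor_url(url: str) -> Dict[str, object]:
    """Split a bare safe:// XOR-URL into its CIDv1 fields, offline."""
    if url.startswith("safe://"):
        url = url[len("safe://"):]
    raw = multibase_decode(url)
    version, pos = decode_uvarint(raw, 0)
    if version != 1:
        raise ValueError("not a CIDv1")
    content_type, pos = decode_uvarint(raw, pos)  # e.g. the mime-type codec
    hash_code, pos = decode_uvarint(raw, pos)
    size, pos = decode_uvarint(raw, pos)
    if len(raw) - pos != size:
        raise ValueError("digest length does not match the multihash header")
    return {
        "content_type": content_type,
        "hash_code": hash_code,
        "xor_name": raw[pos:].hex(),
    }
```

Because the multibase prefix says how the rest is encoded, the same CID bytes decode identically whether they arrive as base16, base32 or base32z.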
I just pushed a commit you can use to generate ImmD XOR-URLs with different mime types, as long as you didn’t encrypt the targeted ImmD files (we are missing an API that allows you to generate the XOR-URL of an ImmD without trying to fetch it, since fetching fails to decrypt it if you don’t own it).
Just clone the repo https://github.com/bochaco/safe-tools, run npm i && npm start, then open localhost:1357 in the browser; you’ll see a new tab for obtaining ImmD XOR-URLs.
Yep, it’s encrypted, that’s correct.
I’ll test it ASAP (sorry, not possible before tomorrow) =)
Oh, you mean owned by the app and not by my account, don’t you? And the browser doesn’t own my PNG…?
PS: oh nice, and when I don’t accidentally damage the link before re-encoding it to base32z, I get exactly your link! @bochaco
No, I just mean that if the ImmD you are trying to fetch was encrypted by another app, the call to app.immutableData.fetch(iDataAddress) fails because it cannot decrypt the ImmutableData, so you don’t get the Reader object on which you could call getXorUrl(mimeType). So I’m thinking we should have an API where you can call app.immutableData.genXorUrl(iDataAddress, mimeType), which doesn’t try to fetch the data but just generates the XOR-URL for you.
That’s really good news!! So, to clarify, are you saying there is no issue between the Python and JS CID implementations after all, contrary to what we suspected before?
So you mean… yes, I wasn’t aware that encrypted means it can only be decrypted by the app that created it Oo… How can that be…? How do I get the decryption key? Do I always need to re-upload an immutable unencrypted if I want to share it? Oo
No, absolutely not. There is an issue, and it’s all down in the ‘standard libs for CID’: I use the character table I found in the JS implementation of base32z to re-encode base32 to JS-base32z ‘by hand’, because the ‘regular Python functions’ for encoding to base32z lead to different strings…
(My opinion regarding using base32z in your API didn’t change… I think you should definitely go with standard base32, because it comes with less trouble in all languages. Skipping certain ambiguous characters might be something to consider, but this weird reordering to map ‘more frequent characters’ to easier-to-identify characters doesn’t make sense at all. We are talking about immutable data identifiers, i.e. some hash of encrypted data, which is (at least pseudo-)random by design. For mutables, only those that are not randomly generated will be non-uniformly distributed (standard containers, and anything using them as a starting point, will again be randomly distributed because the account is a random one…). And I would prefer base16 even more…)
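The argument that a frequency-based symbol order buys nothing for hash-derived names can be checked quickly: encode pseudo-random bytes (a stand-in for hashes of encrypted data) as base32 and count the symbols. This sketch uses a seeded PRNG; treating real XorNames as behaving like random bytes is an assumption.

```python
import base64
import random
from collections import Counter

# Stand-in for XorNames derived from hashes of encrypted data:
# seeded pseudo-random bytes (assumption: real names look similarly random).
NUM_BYTES = 320_000  # 2,560,000 bits = exactly 512,000 base32 symbols, no padding

rng = random.Random(0)
blob = rng.getrandbits(8 * NUM_BYTES).to_bytes(NUM_BYTES, "big")
symbols = base64.b32encode(blob).decode("ascii")

counts = Counter(symbols)
expected = len(symbols) / 32  # uniform share per symbol

for symbol, n in sorted(counts.items()):
    # a ratio close to 1.0 means no symbol is "more frequent" than another
    print(symbol, n, f"{n / expected:.3f}")
```

Every symbol should show up close to 1/32 of the time, so there are no "more frequent characters" to map onto easier-to-read ones.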
…and the fact that I (again) made a mistake with the base32z-encoding thing, so that it took me longer to come up with the correct way of doing it, shows why it would be better to just use standardised base32. I lost hours finding the correct way to re-encode data, and hours reverse-engineering base32z-encoded safe links to base32 to see where the difference was and what the correct code for the image link you used is; you lost hours reading my posts and answering my questions… and in the end we are both slower at working on the SAFE Network…
And this, absolutely! We’re getting there. Can’t wait to see some really powerful apps on SAFE.
Nice, good find. I think you should send a PR to their repo to fix it.
I’m not trying to defend any base encoding; in fact, I think it’s not that important, for two reasons:
- any base encoding will be decodable (that’s the point of the CID spec), so if some people use base32, base64, etc., they will all work fine and fetch the same content. I think the one most people use would become the standard.
- remember that both the JS and your Python implementations won’t be needed (and would go away) if this becomes natively supported by the SAFE libs; note the following from our previous discussions:
And now, in an attempt to bring back the main discussion: yesterday we had a talk with @joshuef about encoding the type tag in the CID itself rather than using it as the port number of the URL. This would not only solve issues we already face with some browsers or libs not supporting port numbers larger than 65535 (even though the spec doesn’t mention such a limit, IIRC), but would also let us embed additional parameters for our data types in the CID string in the future, i.e. in the <multihash-content-address> part of the CID spec. After all, the type tag is part of our addressing system, so it would make sense for it to be part of the content-address part. Just bringing it up for discussion and feedback about this alternative.
The only problem is that, strictly speaking, that wouldn’t be just a hash but a hash + number, which wouldn’t be a valid CID anymore…?
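The concern can be illustrated with a strict multihash parser: it reads the hash code and the digest size, so anything appended after the digest shows up as unexpected trailing bytes. A sketch, assuming sha3-256 (0x16) for the XorName and a hypothetical layout where the type tag is appended as a varint after the digest; strict_multihash_check is an illustrative name.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Multiformats unsigned varint (LEB128)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)


def decode_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one unsigned varint at pos; return (value, position after it)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def strict_multihash_check(mh: bytes) -> bool:
    """True only if mh is exactly <code><size><digest of that size>."""
    _code, pos = decode_uvarint(mh, 0)
    size, pos = decode_uvarint(mh, pos)
    return len(mh) - pos == size


SHA3_256 = 0x16       # multicodec code for sha3-256 (assumed hash for XorNames)
xor_name = bytes(32)  # placeholder XorName
mh = encode_uvarint(SHA3_256) + encode_uvarint(len(xor_name)) + xor_name

# Hypothetical layout: type tag appended as a varint after the digest.
mh_with_tag = mh + encode_uvarint(15000)

print(strict_multihash_check(mh))           # True
print(strict_multihash_check(mh_with_tag))  # False: trailing bytes
```

Declaring a digest size of 40 bytes (name + 8-byte tag) instead would pass this structural check, but it would contradict the hash function code, since a sha3-256 digest is always 32 bytes.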
Could the multicodec <key> be used for the tag_type, in order to leave the hash as is (see here)?
This is the CID string:
<cidv1> ::= <multibase-prefix><cid-version><multicodec-content-type><multihash-content-address>
where <multihash-content-address> is itself a multihash:
<varint hash function code><varint digest size in bytes><hash function output>
So do you mean to use a multicodec in the <hash function output>? Or where exactly?