The checksum is there to ensure the url does not contain typos or transmission errors. It is a hash of the url itself, not of the content at that url.
If there are typos in the url then the checksum has a very low chance of still matching, so an error can be shown, e.g. ‘do not send funds to this url since the checksum has failed and there’s probably a typo or transmission error’.
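To make the idea concrete, here is a rough sketch of a checksum over the url text rather than its content. Everything specific in it is an assumption for illustration only: the use of SHA-256, the 4 byte truncation, the `#` separator, and the `example://` url are not part of any proposal in this thread.

```python
import hashlib

CHECKSUM_BYTES = 4  # assumed length; the right size is what is being debated here


def url_checksum(url: str) -> str:
    """Hash the url text itself (not the content it points to)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return digest[:CHECKSUM_BYTES].hex()


def append_checksum(url: str) -> str:
    """Attach the checksum using a hypothetical '#' separator."""
    return f"{url}#{url_checksum(url)}"


def verify(url_with_checksum: str) -> bool:
    """Return True only if the trailing checksum matches the url before it."""
    url, sep, checksum = url_with_checksum.rpartition("#")
    if not sep:
        return False  # no checksum present at all
    return url_checksum(url) == checksum.lower()


good = append_checksum("example://host/path")
typo = good.replace("host", "hsot")  # simulate a transposition typo
print(verify(good), verify(typo))
```

A wallet or client would refuse to proceed when `verify` returns False and show a warning like the one above.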
BIP39 has 1 bit of checksum for every 32 bits of data. So a 12 word mnemonic is 132 bits (12 words × 11 bits per word), which is 128 bits of data + 4 bits of checksum.
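The BIP39 arithmetic can be checked with a few lines. This is only a sketch of the sizing rule and the checksum bits (the first ENT/32 bits of the SHA-256 of the entropy); the function names are my own and it does not do the wordlist lookup.

```python
import hashlib


def bip39_layout(entropy_bits: int) -> tuple:
    """Return (checksum_bits, total_bits, word_count) for a given entropy size."""
    checksum_bits = entropy_bits // 32       # 1 checksum bit per 32 data bits
    total_bits = entropy_bits + checksum_bits
    word_count = total_bits // 11            # each word encodes 11 bits
    return checksum_bits, total_bits, word_count


def bip39_checksum(entropy: bytes) -> int:
    """Return the checksum as an integer: the leading bits of SHA-256(entropy).

    BIP39 entropy is 128 to 256 bits, so the checksum is at most 8 bits
    and always fits inside the first byte of the digest.
    """
    checksum_bits = len(entropy) * 8 // 32
    first_byte = hashlib.sha256(entropy).digest()[0]
    return first_byte >> (8 - checksum_bits)


print(bip39_layout(128))  # (4, 132, 12)
print(bip39_layout(256))  # (8, 264, 24)
```

So even the longest BIP39 mnemonic carries only 8 bits of checksum, far less than 4 bytes.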
It seems to me several fields of the url are acting as an equivalent of the HTTP headers Content-Type and Content-Encoding… not sure if that’s exactly true but it’s what comes to mind. Is the url an appropriate place for that metadata? I think it’s necessary for it to be there, but it does seem a little strange to me for some reason.
The current state-of-the-art for checksum encoding (in bitcoin anyhow) is bech32m, well worth a look to understand why they moved to this encoding:
Not saying we should use bech32m but we should at least be aware of it and what problems it aims to solve.
Another interesting checksum is ethereum’s, which uses capitalization as specified in EIP-55. But this won’t work with base64 encoded urls, since base64 is case-sensitive and the capitalization already carries data.
Broadly, I feel that urls are so long anyway that another 4 bytes isn’t problematic. But 4 bytes (32 bits) does seem like a lot when we look at other standards.
bech32 addresses have a 30 bit checksum (6 checksum characters, each from an alphabet of 32 chars, which is 5 bits per char). From BIP173:
We pick 6 checksum characters as a trade-off between length of the addresses and the error-detection capabilities, as 6 characters is the lowest number sufficient for a random failure chance below 1 per billion. For the length of data we’re interested in protecting (up to 71 bytes for a potential future 40-byte witness program), BCH codes can be constructed that guarantee detecting up to 4 errors.
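The ‘below 1 per billion’ figure is easy to reproduce. The sketch below assumes a checksum that behaves uniformly at random for undetected errors, so the failure chance is 2^-bits; it says nothing about the guaranteed detection of up to 4 errors, which comes from the BCH construction. The function names are my own.

```python
import math


def random_failure_chance(checksum_bits: int) -> float:
    """Chance that a random corruption still passes a checksum of this size."""
    return 2.0 ** -checksum_bits


def min_chars_for(target: float, bits_per_char: int = 5) -> int:
    """Fewest checksum characters giving a random failure chance at or below target."""
    bits_needed = math.ceil(-math.log2(target))
    return math.ceil(bits_needed / bits_per_char)


print(min_chars_for(1e-9))  # 6 characters of 5 bits each

# Compare 5 bech32 chars, 6 bech32 chars, and a 4 byte checksum.
for label, bits in [("5 chars", 25), ("6 chars", 30), ("4 bytes", 32)]:
    print(label, bits, "bits ->", random_failure_chance(bits))
```

By this measure 30 bits already clears the 1 per billion bar, and a 4 byte (32 bit) checksum is only about 4 times stronger against random failures.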