Uint8Array Buffer / String conversion in JavaScript

In order to store buffer values returned from the API as strings, I’m searching for a way to properly convert Uint8Array buffers to String and vice versa. I’ve tried many approaches, but for some arrays I always get different values when converting back from string to buffer.

Here’s my code to see / reproduce my problem:

I’ve two functions for conversion:

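// Encode a string as UTF-8 bytes. TextEncoder always produces UTF-8 and takes no encoding argument.
function stringToArray(bufferString) {
	return new TextEncoder().encode(bufferString);
}
// Decode bytes as UTF-8. Byte sequences that are not valid UTF-8 are replaced with U+FFFD.
function arrayToString(bufferValue) {
	return new TextDecoder("utf-8").decode(bufferValue);
}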

Here is my test code:

let encKeyPairHandle = await window.safeCrypto.generateEncKeyPair(appHandle);
let pubEncKeyHandle = await window.safeCryptoKeyPair.getPubEncKey(encKeyPairHandle);
let rawPubEncKey = await window.safeCryptoPubEncKey.getRaw(pubEncKeyHandle);
console.log(rawPubEncKey.buffer);
let rawPubEncKeyStr = arrayToString(rawPubEncKey.buffer);
rawPubEncKey = stringToArray(rawPubEncKeyStr);
console.log(rawPubEncKey);

As you can see, I convert the buffer into a string and then back again. However the two values are not the same. The second one is longer and different from the first one. Here is my console output:

Uint8Array(32) [228, 60, 188, 226, 76, 55, 247, 204, 126, 72, 202, 159, 6, 112, 162, 47, 188, 255, 124, 94, 25, 172, 141, 111, 0, 239, 119, 67, 128, 155, 116, 52]

Uint8Array(58) [239, 191, 189, 60, 239, 191, 189, 239, 191, 189, 76, 55, 239, 191, 189, 239, 191, 189, 126, 72, 202, 159, 6, 112, 239, 191, 189, 47, 239, 191, 189, 239, 191, 189, 124, 94, 25, 239, 191, 189, 239, 191, 189, 111, 0, 239, 191, 189, 119, 67, 239, 191, 189, 239, 191, 189, 116, 52]

I’ve searched and found that many people seem to struggle converting such arrays in javascript. For instance I’ve also tried this implementation here: https://gist.github.com/joni/3760795 but it also doesn’t produce equal arrays. Interestingly, if I try my functions on a simple string (like “Hello”) it works without problems. It seems to be a problem with the utf-8 chars.

Has anyone found a solution for this that works for all kinds of strings / buffers?

The chaty and listy apps show an implementation of this; look for the uintToString(uintArray) function (only one way, though).
(Discourse won’t let me paste it correctly here; it messes with the &'s.)

Actually, both functions above work, but only if the original input was a string containing valid UTF-8 characters. Because window.safeCryptoPubEncKey.getRaw returns a “raw” byte array, it won’t work, I guess.
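To make the point above concrete, here is a minimal sketch using the first four bytes of the key from the console output in the question. It shows that the default TextDecoder replaces each invalid byte with U+FFFD, which TextEncoder then writes out as the three bytes 239, 191, 189. The helper name isValidUtf8 is made up for this example; the `fatal` option is part of the standard TextDecoder API.

```javascript
// First four bytes of the raw key from the console output in the question.
const rawBytes = new Uint8Array([228, 60, 188, 226]);

// Lenient decoding (the default): every invalid byte becomes U+FFFD.
const lossy = new TextDecoder("utf-8").decode(rawBytes);

// Re-encoding writes each U+FFFD as the three bytes 239, 191, 189,
// which is why the second array in the question is longer than the first.
const reencoded = new TextEncoder().encode(lossy);

// Strict decoding: with { fatal: true }, decode() throws a TypeError
// on invalid input instead of silently substituting U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(Array.from(reencoded));
console.log(isValidUtf8(rawBytes));
```

A check like isValidUtf8 could be used to decide whether a buffer is safe to store as text or needs a binary-safe representation.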

For my chat app I’ve settled on using a comma-separated string where each position represents a byte. I don’t know if that’s the most efficient way, but it allows me to easily convert it back to a byte array. I think that’s also what MaidSafe is doing with their mobile chat app here (but I’m not sure):

var iDataNameBytes = iDataNameEncoded.ToUtfString().Split(',').Select(val => Convert.ToByte(val)).ToList();
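A JavaScript counterpart of the comma-separated approach might look like the sketch below. The function names bytesToCsv and csvToBytes are hypothetical and not taken from any of the apps mentioned.

```javascript
// Join the decimal value of each byte with commas, e.g. "228,60,188".
function bytesToCsv(bytes) {
  return Array.from(bytes).join(",");
}

// Split on commas and convert each decimal string back into a byte.
function csvToBytes(csv) {
  if (csv === "") {
    return new Uint8Array(0); // "".split(",") would yield [""], not []
  }
  return Uint8Array.from(csv.split(","), (s) => Number(s));
}

const csv = bytesToCsv(new Uint8Array([228, 60, 0, 255]));
const restored = csvToBytes(csv);
console.log(csv, restored);
```

This round-trips any byte value, at the cost of up to four characters per byte.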

What’s the reason you want to convert it to a string?

Actually, I thought I needed it in order to convert the JSON object (which stores the byte array) into a string, but I just tested it and it works. The only drawback is that when I parse the stringified object back, the array is a normal Array instead of a Uint8Array. But it turns out this isn’t an issue, because window.safeCrypto.pubEncKeyKeyFromRaw also accepts a normal Array as input.
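For anyone who does need a Uint8Array back after parsing, a minimal sketch could convert explicitly in both directions. How a typed array serializes depends on its type (a plain Uint8Array passed directly to JSON.stringify comes out as an object keyed by index), so this example copies the bytes into a normal Array first. The field names `name` and `key` are made up for illustration.

```javascript
const keyBytes = new Uint8Array([228, 60, 188, 0, 255]);

// Copy the bytes into a plain Array so they serialize as a JSON array.
const json = JSON.stringify({ name: "demo", key: Array.from(keyBytes) });

// JSON.parse gives back a plain Array; wrap it again to get a Uint8Array.
const parsed = JSON.parse(json);
const restoredKey = Uint8Array.from(parsed.key);

console.log(json);
console.log(restoredKey);
```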

TL;DR:
Thanks @bzee, you’re right, it’s unnecessary. :smiley:

Glad you found a solution.

I’m still highly curious about what’s going on with TextEncoder. I’ve not used it before.

I ran some experiments and see the same problem you described: the decoder returns a 30-character string instead of the expected 32 for a raw public key.

I’ve been consistently using the following:

function strToBuffer (string) {
  // One byte per UTF-16 code unit; code units above 255 keep only their low 8 bits.
  let newUint = new Uint8Array(string.length);
  for (let i = 0; i < string.length; i++) {
    newUint[i] = string.charCodeAt(i);
  }
  return newUint;
}

let bufferAsString = String.fromCharCode.apply(null, );

However, I do see that this has shortcomings, especially when I try to decode and encode GBK2312, for example. TextEncoder / TextDecoder works nicely for GBK characters.
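The trade-off described above can be sketched as follows. Mapping each byte to one character code round-trips arbitrary bytes, but characters above code 255 are silently truncated when going from string to bytes. The function names here are hypothetical and written out in full so the example is self-contained.

```javascript
// One character per byte (char codes 0..255). Lossless for arbitrary bytes.
function bytesToBinaryString(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// One byte per UTF-16 code unit. Only safe for strings produced by the
// function above, because a Uint8Array keeps just the low 8 bits of each value.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i);
  }
  return bytes;
}

const key = new Uint8Array([228, 60, 188, 226, 0, 255]);
const roundTrip = binaryStringToBytes(bytesToBinaryString(key));

// "中" is U+4E2D; only the low byte 0x2D (45) survives, so the text is lost.
const truncated = binaryStringToBytes("中");

console.log(roundTrip, truncated);
```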

I’ll be further researching…

When a byte array with arbitrary content is converted into a UTF-8 string, the string will probably no longer represent the original bytes. As far as a developer should be concerned, UTF-8 is unsuitable for storing arbitrary binary data that must later be retrieved exactly.

The keyword here is arbitrary. If the MutableData is expected to hold UTF-8 encoded data, then it’s fine of course.

I just came across this popular write-up about it: Hazards of Converting Binary Data To A String.

Good stuff! Thank you. I’m going to play with hexadecimal and base64.
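For reference, a minimal sketch of both encodings is shown below. The helper names are hypothetical. It assumes an environment where btoa and atob are available as globals (browsers and recent Node.js versions), and hexToBytes assumes a well-formed, even-length hex string.

```javascript
// Hex: two lowercase hex digits per byte.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

function hexToBytes(hex) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Base64: btoa expects a string whose char codes are all 0..255,
// so each byte is first mapped to one character.
function bytesToBase64(bytes) {
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

function base64ToBytes(b64) {
  const binary = atob(b64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const sample = new Uint8Array([228, 60, 0, 255]);
console.log(bytesToHex(sample), bytesToBase64(sample));
```

Hex doubles the size of the data, while base64 adds roughly a third; both survive storage as plain text and convert back to the exact same bytes.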

Aaaaah, this is home. Honored to be here enjoying this with you all.

