Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

In the following:

 scala> (new String(Array[Byte](1, 2, 3, -1, -2, -127))).getBytes
 res12: Array[Byte] = Array(1, 2, 3, -1, -2, 63)

Why is -127 converted to 63, and how do I get it back as -127?

[EDIT:] Java version below (to show that it's not just a "Scala problem")

c:\tmp>type Main.java
public class Main {
    public static void main(String [] args) {
        byte [] b = {1, 2, 3, -1, -2, -127};
        byte [] c = new String(b).getBytes();
        for (int i = 0; i < 6; i++){
            System.out.println("b:"+b[i]+"; c:"+c[i]);
        }
    }
}
c:\tmp>javac Main.java
c:\tmp>java Main
b:1; c:1
b:2; c:2
b:3; c:3
b:-1; c:-1
b:-2; c:-2
b:-127; c:63

1 Answer

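The constructor you're calling, String(byte[] bytes), makes it non-obvious that binary-to-string conversions use a decoding: it behaves like String(byte[] bytes, Charset charset) with the platform's default charset. A byte that is not valid in that charset (here -127, i.e. 0x81) is typically decoded to a replacement character, and getBytes then writes that character out as '?', which is byte 63. What you want is to use no decoding at all.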

Fortunately, there's a constructor for that: String(char[] value).

Now you have the data in a string, but you want it back exactly as it was. But guess what: getBytes() behaves like getBytes(Charset charset) with the default charset, so an encoding is applied automatically on the way out as well. Fortunately, there is a toCharArray() method, which applies no encoding.
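To make the decode and encode steps visible, here is a minimal Java sketch that passes the charset explicitly. It uses US-ASCII only because that charset is guaranteed to exist on every JVM; the asker's machine presumably had a different default, which is why only -127 was affected there. The class and method names (DecodeDemo, roundTrip) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecodeDemo {
    // Decode the bytes with an explicit charset, then encode the
    // resulting string again with the same charset.
    static byte[] roundTrip(byte[] bytes, Charset charset) {
        // Bytes that are not valid in the charset are replaced by U+FFFD.
        String decoded = new String(bytes, charset);
        // Characters the charset cannot encode are replaced by '?' (63).
        return decoded.getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] b = {1, 2, 3, -1, -2, -127};
        // US-ASCII only covers 0..127, so all three negative bytes are lost.
        System.out.println(Arrays.toString(roundTrip(b, StandardCharsets.US_ASCII)));
        // prints [1, 2, 3, 63, 63, 63]
    }
}
```

With US-ASCII every negative byte is invalid, so -1 and -2 turn into 63 as well; in the question's output they survived, which suggests they happened to be valid in that machine's default charset.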

If you must start with bytes and end with bytes, you then have to map the char arrays to bytes:

(new String(Array[Byte](1,2,3,-1,-2,-127).map(_.toChar))).toCharArray.map(_.toByte)
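For the Java version of the question, the same idea might look like the sketch below. It mirrors the Scala one-liner: each byte is cast to a char on the way in and each char is cast back to a byte on the way out, so no charset is involved. The names CharRoundTrip, bytesToRawString and rawStringToBytes are hypothetical helpers, not library methods.

```java
import java.util.Arrays;

class CharRoundTrip {
    // Put each byte into one char without any charset decoding.
    // Negative bytes are sign-extended, e.g. -1 becomes the char 0xFFFF.
    static String bytesToRawString(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) bytes[i];
        }
        return new String(chars);
    }

    // Read the chars back and keep only the low 8 bits of each one.
    static byte[] rawStringToBytes(String s) {
        char[] chars = s.toCharArray();
        byte[] bytes = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            bytes[i] = (byte) chars[i];
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] b = {1, 2, 3, -1, -2, -127};
        byte[] c = rawStringToBytes(bytesToRawString(b));
        System.out.println(Arrays.toString(c));
        // prints [1, 2, 3, -1, -2, -127]
    }
}
```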

So, to summarize: converting between String and Array[Byte] involves encoding and decoding. If you want to put binary data in a string, you have to do it at the level of characters. Note, however, that this will give you a garbage string: the characters are not meaningful text, and some of them are not even valid characters (-1, for example, becomes U+FFFF, a Unicode noncharacter). So you'd better read it out as characters and convert it back to bytes, and never pass it through anything that encodes or decodes it.

You could shift the bytes up by, say, adding 512; then you'd get a bunch of valid single Char code points. But this is using 16 bits to represent every 8, a 50% encoding efficiency. Base64 is a better option for serializing binary data (8 bits to represent 6, 75% efficient).
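Both options can be sketched in Java roughly as follows. The shift helpers (shiftEncode, shiftDecode) and the class name are hypothetical and use the offset of 512 suggested above; the Base64 part relies on java.util.Base64, which is in the standard library since Java 8.

```java
import java.util.Arrays;
import java.util.Base64;

class ShiftAndBase64 {
    static final int OFFSET = 512; // the "say, 512" from the text above

    // Map each byte (-128..127) to a char in the range 384..639.
    static String shiftEncode(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] + OFFSET);
        }
        return new String(chars);
    }

    // Undo the shift to recover the original bytes.
    static byte[] shiftDecode(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            bytes[i] = (byte) (s.charAt(i) - OFFSET);
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] b = {1, 2, 3, -1, -2, -127};

        // One char (16 bits) per byte.
        System.out.println(Arrays.toString(shiftDecode(shiftEncode(b))));

        // Base64: four ASCII characters per three bytes.
        String encoded = Base64.getEncoder().encodeToString(b);
        System.out.println(encoded); // prints AQID//6B
        System.out.println(Arrays.toString(Base64.getDecoder().decode(encoded)));
    }
}
```

The Base64 string is plain ASCII, so it also survives being written out and read back in any common charset, which the shifted string does not necessarily do.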

