Category: Problems

  • Encoding.Unicode.GetBytes/GetString inappropriate for encoding/decoding non-textual data

    You see a few examples around the net using Encoding.Unicode.GetBytes and Encoding.Unicode.GetString as a pair to encode/decode some data. I just found out today that the two methods are not entirely complementary: GetBytes does not always give back the bytes that were passed to GetString. This F# example shows why:

    open System.Text
    let bytes = [|0uy; 216uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]
    let str = Encoding.Unicode.GetString(bytes)
    let bytes' = Encoding.Unicode.GetBytes(str)
    printfn "%A\r\n%A\r\n%A" bytes str bytes'

    It outputs something like:

    [|0uy; 216uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]
    "�튌य "
    [|253uy; 255uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]

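    Note the difference in the first two bytes of the two byte arrays. Read as little-endian UTF-16, the bytes 0 and 216 form the code unit 0xD800, a high surrogate that is not followed by a low surrogate. GetString replaces that invalid code unit with U+FFFD (the replacement character), which GetBytes then writes out as 253, 255. This is fine for encoding/decoding actual Unicode text, but if you’re using the methods to, say, serialize an object, you may find your code breaks.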
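
    For arbitrary binary data, one alternative is a byte-to-text encoding such as Base64. The sketch below is only an illustration and is not from the original example: it reuses the same byte array, round-trips it through Convert.ToBase64String and Convert.FromBase64String, and then builds a strict UnicodeEncoding (throwOnInvalidBytes set to true) to show that the invalid input can be rejected instead of silently altered. The names encoded, roundTripped, strict and rejected are made up for this sketch.

    ```fsharp
    open System
    open System.Text

    // Same bytes as in the example above; the first two form a lone high surrogate (0xD800).
    let bytes = [|0uy; 216uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]

    // Base64 maps any byte sequence to text and back without loss.
    let encoded = Convert.ToBase64String(bytes)
    let roundTripped = Convert.FromBase64String(encoded)
    printfn "%s" encoded
    printfn "%b" (roundTripped = bytes)

    // A strict UTF-16 decoder: little-endian, no byte order mark, throw on invalid bytes.
    let strict = UnicodeEncoding(false, false, true)
    let rejected =
        try
            strict.GetString(bytes) |> ignore
            false
        with
        | :? ArgumentException -> true
    printfn "%b" rejected
    ```

    If the round trip works as expected, roundTripped equals the original array, and the strict decoder raises an exception on the lone surrogate rather than substituting U+FFFD.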