Encoding.Unicode.GetBytes/GetString inappropriate for encoding/decoding non-textual data

You see a few examples around the net using Encoding.Unicode.GetBytes and Encoding.Unicode.GetString as a pair to encode and decode arbitrary data. I just found out today that the two methods are not exact inverses of each other. This F# example shows why:

  open System.Text
  let bytes = [|0uy; 216uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]
  let str = Encoding.Unicode.GetString(bytes)
  let bytes' = Encoding.Unicode.GetBytes(str)
  printfn "%A\r\n%A\r\n%A" bytes str bytes'

It outputs something like:

  [|0uy; 216uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]
  "�튌य "
  [|253uy; 255uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]

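Note the difference in the first two bytes of the two byte arrays. The original bytes 0, 216 form the UTF-16 code unit 0xD800, a high surrogate with no low surrogate after it, so GetString replaces it with U+FFFD (the replacement character), which GetBytes then writes back out as 253, 255. This is probably OK for encoding/decoding actual Unicode text, but if you're using the methods to serialize an object, say, you may find your code breaks.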
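
If what you actually need is a string form of arbitrary bytes, one option (my suggestion here, not something the examples around the net necessarily use) is Base64 via System.Convert. A minimal sketch, using the same byte array as above; the names original, text and restored are just for illustration:

```fsharp
open System

// Same bytes as in the example above, starting with the unpaired surrogate.
let original = [|0uy; 216uy; 140uy; 210uy; 47uy; 9uy; 0uy; 0uy|]

// Base64 maps any byte sequence to plain ASCII text and back without loss.
let text = Convert.ToBase64String(original)
let restored = Convert.FromBase64String(text)

// F# compares arrays structurally, so this prints true.
printfn "%b" (restored = original)
```

The resulting string is longer than the UTF-16 one (roughly four characters for every three bytes), but it survives the round trip whatever the bytes happen to be.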
