Ruby - Using binary data (strings in UTF-8) from an external file


I have a problem using strings in UTF-8 format, e.g. "\u0161\u010d\u0159\u017e\u00fd". When such a string is defined as a variable in my program, it works fine. But when I read the same string from an external file, I get the wrong output (not what I want/expect). I'm probably missing some necessary encoding stuff...

My code:

file  = "c:\\...\\vlmlist_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io| io.read.split(/\t/) }
puts data
data_var = "\u306b\u3064\u3044\u3066"
puts data_var

Output:

\u306b\u3064\u3044\u3066 # don't want
について # want

I'm trying to read the file in binary form by specifying 'rb', but there seems to be some other problem... I run my code in NetBeans 7.3.1 with the built-in JRuby 1.7.3 (I also tried Ruby 2.0.0, without effect).

Since I'm new to the Ruby world, any ideas are welcome...

If your file contains the literal escaped string:

\u306b\u3064\u3044\u3066 

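then you will need to unescape it after reading. Ruby does this for you only for string literals in the source code, which is why the second case worked for you. Taken from the answer to "Is this the best way to unescape unicode escape sequences in Ruby?", you can use this: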

file  = "c:\\...\\vlmlist_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
  # Replace each literal \uXXXX sequence with the character it encodes
  contents = io.read.gsub(/\\u([\da-fA-F]{4})/) { |m|
    [$1].pack("H*").unpack("n*").pack("U*")
  }
  contents.split(/\t/)
}
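If it helps to see why the literal in the program and the text read from the file behave differently, and what each step of the pack/unpack chain does, here is a minimal sketch. The helper name `decode_hex_code_point` is made up for illustration and is not part of the original answer; the single-quoted string only imitates what reading the file would return.

```ruby
# Double quotes: Ruby turns \uXXXX into characters while parsing the source.
literal = "\u306b\u3064\u3044\u3066"

# Single quotes keep the backslashes, which mimics what io.read returns
# when the file stores the escape sequences as plain text.
from_file = '\u306b\u3064\u3044\u3066'

puts literal.length    # 4 characters
puts from_file.length  # 24 characters (4 sequences of 6 characters each)

# Hypothetical helper (illustration only): the conversion chain from the
# answer, split into its three steps for one 4-digit hex code.
def decode_hex_code_point(hex)
  bytes = [hex].pack("H*")          # "306b" -> the two raw bytes 0x30 0x6B
  code_points = bytes.unpack("n*")  # read them as one 16-bit big-endian integer
  code_points.pack("U*")            # encode that code point as UTF-8
end

puts decode_hex_code_point("306b") == "\u306b"  # true
```

Note that the uppercase directives matter here: `"H*"` reads hex digits high nibble first and `"U*"` produces UTF-8, while the lowercase `"h*"` and `"u*"` mean something different.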

Alternatively, if you want to make it more readable, extract the substitution into a new method and add it to the String class:

class String
  # Replace each literal \uXXXX sequence with the character it encodes
  def unescape_unicode
    self.gsub(/\\u([\da-fA-F]{4})/) { |m|
      [$1].pack("H*").unpack("n*").pack("U*")
    }
  end
end

Then you can call:

file  = "c:\\...\\vlmlist_unicode.txt" #\u306b\u3064\u3044\u3066
data = File.open(file, 'rb') { |io|
  io.read.unescape_unicode.split(/\t/)
}
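To try the whole thing without touching the real file, something like the following sketch should work. It is only an illustration under a few assumptions: the temporary file and its name `vlm_demo` are stand-ins for the real path, the file content is made up, and the file is opened with `"r:UTF-8"` instead of `'rb'` on the assumption that it holds plain ASCII/UTF-8 text.

```ruby
require "tempfile"

# Same method as in the answer, repeated so this sketch runs on its own.
class String
  def unescape_unicode
    gsub(/\\u([\da-fA-F]{4})/) { [$1].pack("H*").unpack("n*").pack("U*") }
  end
end

# Hypothetical stand-in for the real file: two tab-separated fields that
# contain literal backslash-u sequences (note the single quotes).
tmp = Tempfile.new("vlm_demo")
begin
  tmp.write('\u306b\u3064' + "\t" + '\u3044\u3066')
  tmp.close

  # Assumption: the file is ASCII/UTF-8 text, so read it as UTF-8.
  data = File.open(tmp.path, "r:UTF-8") { |io|
    io.read.unescape_unicode.split(/\t/)
  }

  puts data == ["\u306b\u3064", "\u3044\u3066"]  # true
  puts data.first.encoding                       # UTF-8
ensure
  tmp.unlink  # remove the temporary file
end
```

Whether the characters then show up correctly in the NetBeans output window also depends on the console encoding, which this sketch does not cover.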
