GHC truncating Unicode character output

Posted by 烂漫一生 on 2019-12-12 15:15:04

Question


I can't get GHCi or GHC to print Unicode code point U+221A (the square root symbol: √).

I don't think it's my shell, because I can get Ruby to do it:

irb> puts "\u221A"
√

GHC/GHCi is a different story:

ghci> putStrLn "\8730"

ghci> withFile "temp.out" WriteMode $ flip hPutStrLn "\8730"
ghci> readFile "temp.out"
"\SUB\n"

So what am I doing wrong?

(GHC v6.10.3)


Answer 1:


GHC's handling of Unicode changed in GHC 6.12.1, which made text IO "do the right thing" with Unicode strings by encoding and decoding according to the locale. Earlier versions truncate each character to 8 bits on IO, forcing the use of an encoding library.

That is, '\8730' is 0x221a, while '\SUB' is 0x1a -- the high byte is gone.
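
The truncation can be reproduced in pure code on any GHC version. This is only a sketch of the arithmetic, not the old runtime's actual code path, and truncateTo8Bits is a made-up helper name:

```haskell
import Data.Bits ((.&.))
import Data.Char (chr, ord)

-- Hypothetical helper: keep only the low 8 bits of a code point,
-- mimicking what pre-6.12 GHC did to each Char on output.
truncateTo8Bits :: Char -> Char
truncateTo8Bits c = chr (ord c .&. 0xFF)

main :: IO ()
main = do
  print (ord '\8730')                    -- 8730, i.e. 0x221A
  print (truncateTo8Bits '\8730')        -- '\SUB', i.e. 0x1A
  print (map truncateTo8Bits "\8730\n")  -- "\SUB\n"
```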

Here with GHC 7:

Prelude> print "√\n"
"\8730\n"
Prelude> putStr "√\n"
√
Prelude> putStr "\8730√\n"
√√

But I get your result with GHC 6.8:

Prelude> writeFile "/tmp/x" "√\n"
Prelude> readFile "/tmp/x"
"\SUB\n"

because each character is truncated to its low 8 bits on output.

File IO with GHC 7 works as expected:

Prelude> writeFile "/tmp/x" "\8730√\n"
Prelude> readFile "/tmp/x"
"\8730\8730\n"
Prelude> s <- readFile "/tmp/x"
Prelude> putStr s
√√
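
Note that GHC 6.12 and later pick the encoding from the locale, so the session above assumes a UTF-8 locale. If that can't be relied on, the encoding can be set per Handle with hSetEncoding from System.IO. A minimal sketch, where writeUtf8 and readUtf8 are made-up helper names and "temp.out" is just an example path:

```haskell
import System.IO

-- Hypothetical helper: write a String as UTF-8 regardless of the locale.
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- Hypothetical helper: read a file as UTF-8, forcing the lazy
-- contents before withFile closes the handle.
readUtf8 :: FilePath -> IO String
readUtf8 path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  length s `seq` return s

main :: IO ()
main = do
  writeUtf8 "temp.out" "\8730\n"
  s <- readUtf8 "temp.out"
  print s                   -- "\8730\n"
  hSetEncoding stdout utf8  -- emit UTF-8 on stdout as well
  putStr s
```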

Can you upgrade to GHC 7 (in the Haskell Platform) to get full Unicode support? If that isn't possible, you can use one of the encoding libraries, such as utf8-string.
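
To show what such a library does under the hood, here is a hand-rolled UTF-8 encoder using only base. It is an illustrative sketch, not utf8-string's implementation: utf8Char and utf8Bytes are made-up names, and surrogate code points are not rejected. In practice you would use the library's own functions (for example, encode from Codec.Binary.UTF8.String).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO (hSetBinaryMode, stdout)

-- Hypothetical helper: encode one code point as 1 to 4 UTF-8 bytes.
utf8Char :: Char -> [Word8]
utf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Payload bits of the leading byte.
    lead k = fromIntegral (n `shiftR` k)
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- Hypothetical helper: encode a whole String.
utf8Bytes :: String -> [Word8]
utf8Bytes = concatMap utf8Char

main :: IO ()
main = do
  print (utf8Bytes "\8730")  -- [226,136,154], i.e. E2 88 9A
  -- Every encoded byte fits in 8 bits, so writing the bytes as Chars
  -- survives 8-bit truncation. Binary mode makes newer GHCs write the
  -- bytes untouched too.
  hSetBinaryMode stdout True
  putStr (map (chr . fromIntegral) (utf8Bytes "\8730\n"))
```

Because the encoded string contains only characters below 256, the old truncating IO writes it out byte for byte, which is why encoding before output works on GHC 6.10.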



Source: https://stackoverflow.com/questions/5655544/ghc-truncating-unicode-character-output
