Read file with UTF-8 in Haskell as IO String

Posted by 时间秒杀一切 on 2021-02-18 21:13:32

Question


I have the following code, which works fine unless the file contains UTF-8 characters:

module Main where
import Ref
main = do
    text <- getLine
    theInput <- readFile text
    writeFile ("a"++text) (unlist . proc . lines $ theInput)

With UTF-8 characters I get this error: hGetContents: invalid argument (invalid byte sequence)

Since the file I'm working with has UTF-8 characters, I would like to handle this exception in order to reuse the functions imported from Ref if possible.

Is there a way to read a UTF-8 file as IO String so I can reuse my Ref functions? What modifications should I make to my code? Thanks in advance.

Here are the function declarations from my Ref module:

unlist :: [String] -> String
proc :: [String] -> [String]

From the Prelude:

lines :: String -> [String]

Answer 1:


This can be done with just GHC's System.IO module (basic, but extended beyond the Haskell standard), although it takes a few more function calls:

module Main where

import Ref
import System.IO

main = do
    text <- getLine
    inputHandle <- openFile text ReadMode 
    hSetEncoding inputHandle utf8
    theInput <- hGetContents inputHandle
    outputHandle <- openFile ("a"++text) WriteMode
    hSetEncoding outputHandle utf8
    hPutStr outputHandle (unlist . proc . lines $ theInput)
    hClose outputHandle -- Probably not strictly required here, but closing explicitly guarantees the output is flushed.
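
If you would rather keep the call sites as short as the original readFile/writeFile, the same steps can be wrapped in two small helpers. This is only a sketch: the names readFileUtf8 and writeFileUtf8 are made up for illustration (they are not part of System.IO), and the demo file name is arbitrary. It uses withFile so the handles get closed even if an exception is thrown.

```haskell
import System.IO

-- Hypothetical helper: read a whole file as a String, decoding it as UTF-8.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    contents <- hGetContents h
    -- hGetContents is lazy, so force the whole string
    -- before withFile closes the handle.
    length contents `seq` return contents

-- Hypothetical helper: write a String to a file, encoding it as UTF-8.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path text =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h text

main :: IO ()
main = do
  -- "\233" is 'é'; escapes keep this source file plain ASCII.
  let sample = "h\233llo w\246rld\n"
  writeFileUtf8 "demo-utf8.txt" sample
  s <- readFileUtf8 "demo-utf8.txt"
  putStrLn (if s == sample then "round trip ok" else "mismatch")
```

With these in scope, the body of the asker's main could presumably stay as it was, with readFile and writeFile swapped for the two helpers.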



Answer 2:


Thanks for the answers, but I found the solution myself. The file I was working with actually has this encoding:

ISO-8859 text, with CR line terminators

So for my Haskell code to work with that file, it should have this encoding instead:

UTF-8 Unicode text, with CR line terminators

You can check a file's encoding with the file utility, like this:

$ file filename

To change the file's encoding, follow the instructions from this link!
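
The conversion can also be done from Haskell with System.IO alone. The sketch below is not the method from the link; it assumes the file is specifically ISO-8859-1 (Latin-1), which the file utility's generic "ISO-8859" output does not confirm. The function name latin1ToUtf8 and the sample file names are invented for the example.

```haskell
import System.IO

-- Hypothetical helper: re-encode a file from ISO-8859-1 (Latin-1) to UTF-8.
-- Assumption: the input really is Latin-1. For another ISO-8859 variant,
-- mkTextEncoding may provide a suitable TextEncoding, depending on the platform.
latin1ToUtf8 :: FilePath -> FilePath -> IO ()
latin1ToUtf8 src dst =
  withFile src ReadMode $ \hin ->
    withFile dst WriteMode $ \hout -> do
      hSetEncoding hin latin1
      hSetEncoding hout utf8
      -- hPutStr consumes the lazily read contents completely
      -- before either handle is closed.
      hGetContents hin >>= hPutStr hout

main :: IO ()
main = do
  -- Create a small Latin-1 sample file; "\233" is 'é' (byte 0xE9 in Latin-1).
  withFile "sample-latin1.txt" WriteMode $ \h -> do
    hSetEncoding h latin1
    hPutStr h "caf\233\n"
  latin1ToUtf8 "sample-latin1.txt" "sample-utf8.txt"
```

Alternatively, calling hSetEncoding with latin1 on the input handle in the original program would let it read such a file directly, without converting it first.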




Answer 3:


Use System.IO.Encoding (from the encoding package).

The lack of Unicode support is a well-known problem with the standard Haskell IO library.

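-- ImplicitParams is needed for the ?enc binding in main.
{-# LANGUAGE ImplicitParams #-}
module Main where

import Prelude hiding (readFile, getLine, writeFile)
import Ref
import System.IO.Encoding
import Data.Encoding.UTF8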

main = do
    let ?enc = UTF8
    text <- getLine
    theInput <- readFile text
    writeFile ("a" ++ text) (unlist . proc . lines $ theInput)
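
For comparison, here is a sketch that needs only base. It relies on setLocaleEncoding from GHC.IO.Encoding, which changes the default encoding for handles opened afterwards, so the Prelude's readFile and writeFile can stay as they were in the question. The definitions of unlist and proc below are placeholder stand-ins so the example compiles on its own; they are not the asker's real Ref functions.

```haskell
import GHC.IO.Encoding (setLocaleEncoding, utf8)

-- Placeholder stand-ins for the functions from the asker's Ref module.
unlist :: [String] -> String
unlist = unlines

proc :: [String] -> [String]
proc = id

main :: IO ()
main = do
  -- Files opened after this call default to UTF-8,
  -- regardless of the system locale.
  setLocaleEncoding utf8
  text <- getLine
  theInput <- readFile text
  writeFile ("a" ++ text) (unlist . proc . lines $ theInput)
```

Note that stdin and stdout are already open when main starts, so this call should not be expected to change their encoding; hSetEncoding can be used on them if that matters.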


Source: https://stackoverflow.com/questions/33444796/read-file-with-utf-8-in-haskell-as-io-string
