Haskell read/write binary files complete working example

Submitted by 一个人想着一个人 on 2019-12-01 08:07:33

If you're doing binary I/O, you almost certainly want ByteString for the actual input/output part. Have a look at the hGet and hPut functions it provides. (Or, if you only need strictly linear access, you can try using lazy I/O, but it's easy to get that wrong.)

Of course, a byte string is just an array of bytes; your next problem is interpreting those bytes as characters, integers, doubles, or whatever else they're supposed to be. There are a couple of packages for that, but Data.Binary (from the binary package) seems to be the most mainstream one.

The documentation for binary seems to want to steer you towards using the Binary class, where you write code to serialise and deserialise whole objects. But you can use the functions in Data.Binary.Get and Data.Binary.Put to deal with individual items. There you will find functions such as getWord32be (get Word32 big-endian) and so forth.
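As a rough illustration of handling individual items with Data.Binary.Put and Data.Binary.Get, here is a minimal round-trip sketch. The record layout (a big-endian Word32 followed by a little-endian Word16) and the names putRecord and getRecord are made up for this example; they are not part of the library.

```haskell
module Main where

import Data.Binary.Get (Get, getWord16le, getWord32be, runGet)
import Data.Binary.Put (Put, putWord16le, putWord32be, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16, Word32)

-- Hypothetical record layout, invented for this illustration:
-- a big-endian Word32 followed by a little-endian Word16.
putRecord :: Word32 -> Word16 -> Put
putRecord a b = do
  putWord32be a
  putWord16le b

-- Reads the same two fields back, in the same order and byte order.
getRecord :: Get (Word32, Word16)
getRecord = do
  a <- getWord32be
  b <- getWord16le
  return (a, b)

main :: IO ()
main = do
  let bytes = runPut (putRecord 0x01020304 0x0506)
  print (BL.unpack bytes)         -- prints [1,2,3,4,6,5]
  print (runGet getRecord bytes)  -- prints (16909060,1286)
```

Note how the byte order only shows up in the choice of the be/le variant; the rest of the code does not care.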

I don't have time to write a working code example right now, but basically look at the functions I mention above and ignore everything else, and you should get some idea.

Now with working code:

module Main where

import Data.Word
import qualified Data.ByteString.Lazy as BIN
import Data.Binary.Get
import Data.Binary.Put
import Control.Monad
import System.IO

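main :: IO ()
main = do
  -- Binary-mode handles: no newline or encoding translation.
  h_in  <- openBinaryFile "Foo.bin" ReadMode
  h_out <- openBinaryFile "Bar.bin" WriteMode
  -- replicateM_ discards the results instead of collecting 1000 units in a list.
  replicateM_ 1000 (process_chunk h_in h_out)
  hClose h_in
  hClose h_out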

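-- Number of bytes read per chunk; must be a multiple of int_size.
chunk_size :: Int
chunk_size = 1000

-- Size of one Word32 in bytes.
int_size :: Int
int_size = 4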

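process_chunk :: Handle -> Handle -> IO ()
process_chunk h_in h_out = do
  bin1 <- BIN.hGet h_in chunk_size
  -- Decode the chunk as little-endian Word32 values.
  -- runGet throws if the chunk is shorter than chunk_size (e.g. at EOF).
  let ints1 = runGet (replicateM (chunk_size `div` int_size) getWord32le) bin1
  let ints2 = map (\ x -> if x < 1000 then 2*x else x) ints1
  -- Encode the results back to little-endian bytes and write them out.
  let bin2 = runPut (mapM_ putWord32le ints2)
  BIN.hPut h_out bin2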

This, I believe, does what you asked for. It reads 1000 chunks of chunk_size bytes, converts each one into a list of Word32 (so it only ever has chunk_size / 4 integers in memory at once), does the calculation you specified, and writes the result back out again.

Obviously if you did this "for real" you'd want EOF checking and such.
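For what it's worth, one way the EOF checking could look is sketched below. It relies on the fact that hGet from Data.ByteString returns fewer bytes than requested (and finally an empty string) once the end of the file is reached. The names decodeChunk, encodeChunk, transform and processAll are invented for this sketch, and dropping trailing bytes that don't make up a full Word32 is an assumption about what you'd want, not something the question specifies.

```haskell
module Main where

import Control.Monad (replicateM, unless)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (getWord32le, runGet)
import Data.Binary.Put (putWord32le, runPut)
import Data.Word (Word32)
import System.IO

-- Bytes requested per read; a multiple of 4 so only the last chunk can be ragged.
chunkSize :: Int
chunkSize = 1000

-- Decode as many complete little-endian Word32 values as the chunk holds.
-- Assumption: up to 3 trailing bytes that do not form a full word are dropped.
decodeChunk :: BS.ByteString -> [Word32]
decodeChunk chunk =
  runGet (replicateM (BS.length chunk `div` 4) getWord32le) (BL.fromStrict chunk)

-- Encode a list of Word32 values as little-endian bytes.
encodeChunk :: [Word32] -> BL.ByteString
encodeChunk = runPut . mapM_ putWord32le

-- The same calculation as in the code above.
transform :: Word32 -> Word32
transform x = if x < 1000 then 2 * x else x

-- Keep reading until hGet returns an empty chunk, which signals end of file.
processAll :: Handle -> Handle -> IO ()
processAll hIn hOut = do
  chunk <- BS.hGet hIn chunkSize
  unless (BS.null chunk) $ do
    BL.hPut hOut (encodeChunk (map transform (decodeChunk chunk)))
    processAll hIn hOut

main :: IO ()
main =
  -- withBinaryFile closes the handles even if an exception is thrown.
  withBinaryFile "Foo.bin" ReadMode $ \hIn ->
    withBinaryFile "Bar.bin" WriteMode $ \hOut ->
      processAll hIn hOut
```

Unlike the fixed count of 1000 chunks, this version handles files of any length, because a short final chunk is decoded according to its actual size.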

The best way to work with binary I/O in Haskell is to use bytestrings. Lazy bytestrings provide buffered I/O, so you don't even need to take care of buffering yourself.

The code below assumes that the size of each chunk is a multiple of 4 bytes (32 bits), which it is.

module Main where

import Data.Word
import Control.Monad
import Data.Binary.Get
import Data.Binary.Put
import qualified Data.ByteString.Lazy as BS
import qualified Data.ByteString as BStrict

-- Convert one bytestring chunk to a list of integers and put it
-- in front of the numbers decoded from the later chunks.
-- Thanks to laziness, `rest` is only a closure that will decode
-- the next block of numbers on demand.
toNumbers :: BStrict.ByteString -> [Word32] -> [Word32]
toNumbers chunk rest = chunkNumbers ++ rest
    where
    getNumberList = replicateM (BStrict.length chunk `div` 4) getWord32le
    chunkNumbers = runGet getNumberList (BS.fromStrict chunk)

main :: IO()
main = do
    -- every operation below is done lazily, consuming input as necessary
    input <- BS.readFile "in.dat"
    let inNumbers = BS.foldrChunks toNumbers [] input
    let outNumbers = map (\x -> if x < 1000 then 2*x else x) inNumbers
    let output = runPut (mapM_ putWord32le outNumbers)
    -- Here the lazy bytestring output is evaluated and saved chunk
    -- by chunk, pulling data from the input file, decoding, processing
    -- and encoding it back one chunk at a time
    BS.writeFile "out.dat" output

Here is a loop to process one line at a time from stdin:

import System.IO

loop = do b <- hIsEOF stdin
          if b then return ()
               else do str <- hGetLine stdin
                       let str' = ...process str...
                       hPutStrLn stdout str'
                       loop  -- recurse to handle the next line

Now just replace hGetLine with something that reads 4 bytes, etc.
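A possible shape for that replacement is sketched here, reading 4 bytes at a time with hGet from Data.ByteString. The helper names (process, decodeWord, encodeWord) are invented for the sketch, the doubling in process is just a stand-in for your real processing, and treating the input as little-endian is an assumption. Instead of hIsEOF it checks the length of what was read, so a trailing partial word also ends the loop.

```haskell
module Main where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (getWord32le, runGet)
import Data.Binary.Put (putWord32le, runPut)
import Data.Word (Word32)
import System.IO

-- Stand-in for the real per-value processing; doubling is arbitrary.
process :: Word32 -> Word32
process = (* 2)

-- Interpret exactly 4 bytes as a little-endian Word32.
decodeWord :: BS.ByteString -> Word32
decodeWord = runGet getWord32le . BL.fromStrict

-- Encode a Word32 as 4 little-endian bytes.
encodeWord :: Word32 -> BS.ByteString
encodeWord = BL.toStrict . runPut . putWord32le

loop :: IO ()
loop = do
  bytes <- BS.hGet stdin 4
  if BS.length bytes < 4
    then return ()  -- end of input; a trailing partial word is ignored
    else do
      BS.hPut stdout (encodeWord (process (decodeWord bytes)))
      loop

main :: IO ()
main = do
  -- Switch both standard handles to binary mode before doing byte I/O.
  hSetBinaryMode stdin True
  hSetBinaryMode stdout True
  loop
```

Reading 4 bytes per call is simple but not fast; for large inputs, reading bigger chunks and decoding many values per read is likely the better trade-off.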

Here is the I/O section for Data.ByteString:

https://hackage.haskell.org/package/bytestring-0.10.6.0/docs/Data-ByteString.html#g:29
