hSeek and SeekFromEnd in Haskell

Submitted by 自闭症网瘾萝莉.ら on 2019-12-10 18:21:46

Question


I'm trying to retrieve just the last line of a file quickly in Haskell (starting from the end, not the beginning), and I'm having difficulty using hSeek correctly.

It seems that hSeek with SeekFromEnd N behaves differently from finding the file size sz and using AbsoluteSeek to go to offset (sz - N).

outh <- openFile "test.csv" ReadMode

λ> hIsSeekable outh
True

λ> hFileSize outh
81619956
λ> hSeek outh AbsoluteSeek 1000
λ> hTell outh
1000

λ> hSeek outh SeekFromEnd 1000
λ> hTell outh
81620956

λ> hSeek outh AbsoluteSeek 0
λ> hGetLine outh
"here's my data"

λ> hSeek outh SeekFromEnd 10000
λ> hGetLine outh
*** Exception: test.csv: hGetLine: end of file

Hm, that's weird.

So I wrote a function that does this with AbsoluteSeek instead. First, SeekFromEnd again for comparison, then my function:

λ> hSeek outh SeekFromEnd 100000
λ> hTell outh
81719956

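-- Seek to 100000 bytes before the end of the file using an absolute offset.
fromEnd :: Handle -> IO ()
fromEnd outh = do
  sz <- hFileSize outh
  hSeek outh AbsoluteSeek (sz - 100000)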

λ> fromEnd outh

λ> hTell outh
81519956

So the two approaches report different positions, which is weird. Additionally, hGetLine now works, whereas it failed after SeekFromEnd:

λ> hGetLine outh
"partial output"
λ> hGetLine outh
"full output, lots of fields, partial output"

It's not clear to me what's going on here. Why does my fromEnd behave differently from SeekFromEnd, and why does it permit hGetLine?

Part II of the question: what /would/ be the right strategy for starting at the end of the file and seeking backwards to the first newline (the first \n after the EOF newline)?

In this question, I'm looking specifically for an answer using SeekFromEnd.


Answer 1:


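The offset to SeekFromEnd is expected to be negative. hSeek hdl SeekFromEnd i sets the position to offset i from the end of the file, so a positive i lands past the end: in your session, SeekFromEnd 1000 put the handle at 81619956 + 1000 = 81620956, where hGetLine has nothing left to read. Use hSeek outh SeekFromEnd (-1000) to land 1000 bytes before the end.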
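As a minimal sketch of that fix, assuming a seekable handle and an n no larger than the file size: seekBackFromEnd is a hypothetical helper name (it is not part of System.IO) that negates the offset before seeking and reports where the handle ended up.

```haskell
import System.IO

-- Hypothetical helper (not from the original post): move the handle to
-- n bytes before the end of the file and return the resulting position.
seekBackFromEnd :: Handle -> Integer -> IO Integer
seekBackFromEnd h n = do
  hSeek h SeekFromEnd (negate n)  -- negative offset: n bytes before EOF
  hTell h
```

With the 81619956-byte file from the question, seekBackFromEnd outh 100000 should report the same position as fromEnd, namely 81619956 - 100000 = 81519956.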

As for getting the last line of a file, the annoyance is that we have to scan backwards from the end one character at a time, resetting the position on every step. That said, it can be done: we just keep moving back until we encounter the first \n character.

import System.IO

-- | Given a file handle, find the last line. There are no guarantees as to the 
-- position of the handle after this call, and it is expected that the given
-- handle is seekable.
hGetLastLine :: Handle -> IO String
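hGetLastLine hdl = go "" (negate 1)
  where
  -- s accumulates the characters read so far (built back to front);
  -- i is the (negative) offset from the end of the file.
  -- Assumes the file contains at least one '\n'; otherwise the seek
  -- eventually runs past the start of the file.
  go s i = do
    hSeek hdl SeekFromEnd i
    c <- hGetChar hdl
    if c == '\n'
      then pure s
      else go (c:s) (i-1)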

You may want to adjust the starting offset by one (for example, start at -2 instead of -1), since most files end in a \n and the empty line after it is probably not what you want.
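One way to handle that adjustment is sketched below. This is only an illustration under stated assumptions: hGetLastLineSkippingEOL is a hypothetical name, the handle is switched to binary mode so that each Char is exactly one byte (so multi-byte encodings such as UTF-8 come back undecoded), and the scan also stops at the start of the file so that a file without any \n does not seek out of range.

```haskell
import System.IO

-- | Sketch (assumption: single-byte characters). Returns the last line of a
-- seekable handle, ignoring one trailing '\n' if present. The handle's
-- position afterwards is unspecified.
hGetLastLineSkippingEOL :: Handle -> IO String
hGetLastLineSkippingEOL hdl = do
  hSetBinaryMode hdl True            -- one Char per byte, no newline translation
  sz <- hFileSize hdl
  if sz == 0
    then pure ""
    else do
      lastC <- charAt (-1)
      -- Skip a single trailing newline so we do not return an empty line.
      let start = if lastC == '\n' then (-2) else (-1)
      go "" start sz
  where
    -- Read the character at offset i (negative) from the end of the file.
    charAt :: Integer -> IO Char
    charAt i = hSeek hdl SeekFromEnd i >> hGetChar hdl

    -- Walk backwards until a '\n' or the beginning of the file is reached.
    go :: String -> Integer -> Integer -> IO String
    go s i sz
      | negate i > sz = pure s       -- moved past the first byte: stop
      | otherwise = do
          c <- charAt i
          if c == '\n' then pure s else go (c : s) (i - 1) sz
```

It could be used as withFile "test.csv" ReadMode hGetLastLineSkippingEOL. It still reads one character per seek, so it keeps the simplicity of the version above rather than trying to be fast.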



Source: https://stackoverflow.com/questions/41654849/hseek-and-seekfromend-in-haskell
