Split ByteString on a ByteString (instead of a Word8 or Char)

Posted by 不打扰是莪最后的温柔 on 2019-12-06 11:34:30

Question


I know that Haskell's Data.ByteString.Lazy already provides a function for splitting CSV data on a single character:

split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

I have multi-character separators in a csv-like text file that I need to parse, and the individual characters themselves appear in some of the fields, so choosing just one separator character and discarding the others would contaminate the data import.

I've had some ideas on how to do this, but they seem kind of hacky (e.g. take three Word8s, test if they're the separator combination, start a new field if they are, recurse further), and I imagine I would be reinventing a wheel anyway. Is there a way to do this without rebuilding the function from scratch?


Answer 1:


The documentation for breakSubstring in Data.ByteString includes a function that does what you are asking for:

tokenise x y = h : if null t then [] else tokenise x (drop (length x) t)
    where (h,t) = breakSubstring x y
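The snippet above uses unqualified names, as it would appear inside the library itself. A minimal standalone sketch, assuming strict ByteStrings and a non-empty separator (an empty separator would never make progress), might look like the following. The `tokeniseLazy` wrapper is a hypothetical helper for the lazy type from the question; it simply converts to strict and back, so it holds the whole input in memory.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- | Split a strict ByteString on every occurrence of a multi-byte separator.
-- Assumption: the separator is non-empty.
tokenise :: B.ByteString -> B.ByteString -> [B.ByteString]
tokenise sep str = h : if B.null t then [] else tokenise sep (B.drop (B.length sep) t)
  where
    -- h is everything before the first match; t starts at the match (or is empty)
    (h, t) = B.breakSubstring sep str

-- | Hypothetical wrapper for lazy ByteStrings: converts to strict, splits,
-- and converts each field back. This forces the entire input into memory.
tokeniseLazy :: BL.ByteString -> BL.ByteString -> [BL.ByteString]
tokeniseLazy sep = map BL.fromStrict . tokenise (BL.toStrict sep) . BL.toStrict
```

For example, `tokenise (BC.pack "||") (BC.pack "a|b||c")` keeps the lone `|` inside the first field, because only the full two-byte separator is matched.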



Answer 2:


There are a few functions in bytestring for splitting on subsequences:

breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
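Note that the second component of the pair returned by breakSubstring still begins with the separator when a match is found, and is empty otherwise. As a rough illustration, a hypothetical helper `splitFirst` (not part of bytestring) that splits only at the first occurrence, assuming a non-empty separator, could be sketched as:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper: split at the first occurrence of the separator only.
-- Assumption: the separator is non-empty.
-- Returns Nothing when the separator does not occur in the input.
splitFirst :: B.ByteString -> B.ByteString -> Maybe (B.ByteString, B.ByteString)
splitFirst sep str
  | B.null rest = Nothing                                    -- no match found
  | otherwise   = Just (before, B.drop (B.length sep) rest)  -- drop the separator itself
  where
    (before, rest) = B.breakSubstring sep str
```

Here `splitFirst (BC.pack "::") (BC.pack "key::a::b")` would leave the later `::` untouched in the second component.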

There's also:

  • the bytestring-csv package: http://hackage.haskell.org/package/bytestring-csv
  • the split package: http://hackage.haskell.org/package/split (it works on lists such as String, though, not on ByteString).


Source: https://stackoverflow.com/questions/1398322/split-bytestring-on-a-bytestring-instead-of-a-word8-or-char
