Split ByteString on a ByteString (instead of a Word8 or Char)

Question

I know Haskell's Data.ByteString.Lazy already provides a function for splitting a CSV on a single character:

split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

I have multi-character separators in a CSV-like text file that I need to parse. The individual separator characters also appear inside some of the fields, so picking just one of them as the separator and discarding the others would contaminate the data import.

I've had some ideas on how to do this, but they seem kind of hacky (e.g. take three Word8s, test if they're the separator combination, start a new field if they are, recurse further), and I imagine I would be reinventing a wheel anyway. Is there a way to do this without rebuilding the function from scratch?

Answer 1:

The documentation for breakSubstring in the bytestring package (it is exported by the strict Data.ByteString module) includes a function that does what you are asking for:

tokenise x y = h : if null t then [] else tokenise x (drop (length x) t)
    where (h,t) = breakSubstring x y
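
For reference, here is a minimal self-contained sketch of that snippet with the imports it needs. It assumes the strict Data.ByteString module. The names splitOnBS, splitOnLazy and example, as well as the empty-separator guard, are my own additions for illustration and are not part of the library. The lazy wrapper simply converts with toStrict and fromStrict, which forces the whole input into memory, so it may not suit very large files.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- | Split a strict ByteString on every occurrence of a multi-byte separator.
-- Hypothetical name; the recursion is the same as in 'tokenise' above.
splitOnBS :: B.ByteString -> B.ByteString -> [B.ByteString]
splitOnBS sep input
    | B.null sep = [input]  -- assumption: an empty separator would otherwise recurse forever
    | otherwise  = go input
  where
    go s = h : if B.null t then [] else go (B.drop (B.length sep) t)
      where (h, t) = B.breakSubstring sep s

-- | Lazy variant (hypothetical): convert to strict, split, convert back.
splitOnLazy :: BL.ByteString -> BL.ByteString -> [BL.ByteString]
splitOnLazy sep = map BL.fromStrict . splitOnBS (BL.toStrict sep) . BL.toStrict

-- A field may contain a lone ':' without being split on it.
example :: [B.ByteString]
example = splitOnBS (BC.pack "::") (BC.pack "a::b::c:d")
-- evaluates to ["a","b","c:d"]
```

Note that a trailing separator yields a final empty field, so splitting "a::" on "::" gives ["a",""].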

Answer 2:

There are a few functions in bytestring for splitting on subsequences:

breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
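
To illustrate what breakSubstring returns, here is a small sketch using the strict Data.ByteString module. The example values and the names found and notFound are made up for illustration. The function breaks at the first occurrence of the pattern: the first component is everything before the match, and the second component starts with the match itself, or is empty if the pattern does not occur.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- First occurrence found: the remainder still begins with the separator.
found :: (B.ByteString, B.ByteString)
found = B.breakSubstring (BC.pack "||") (BC.pack "id||name||a|b")
-- evaluates to ("id","||name||a|b")

-- No occurrence: the whole input comes back as the first component.
notFound :: (B.ByteString, B.ByteString)
notFound = B.breakSubstring (BC.pack "||") (BC.pack "a|b")
-- evaluates to ("a|b","")
```

Because the remainder keeps the separator at its front, a full split has to drop it before continuing with the rest of the input.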

There are also:

the bytestring-csv package: http://hackage.haskell.org/package/bytestring-csv
the split package: http://hackage.haskell.org/package/split (it works on lists such as String, though, not on ByteString)

Source: https://stackoverflow.com/questions/1398322/split-bytestring-on-a-bytestring-instead-of-a-word8-or-char

Tags: string, text, haskell, csv, bytestring