bytestring | 易学教程

Is it possible to use Text or ByteString on HXT in Haskell?

Question: I think HXT, an XML/HTML processing library for Haskell, has really flexible and powerful methods for traversing and manipulating DOM trees with Arrows (http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell.html). However, HXT seems to offer only a String representation for DOM node contents (http://hackage.haskell.org/packages/archive/hxt/9.1.6/doc/html/Text-XML-HXT-DOM-TypeDefs.html#t:XNode). Is it possible to use either ByteString or Text with HXT? Text is preferred since I am using HXT with …

Segfault reading lazy bytestring past 2^18 bytes

Question: Consider the following code: http://hpaste.org/90394 I am memory-mapping a large 460 MB file to a lazy ByteString; the length of the ByteString reports 471053056. When the node ID in nxNodeFromID file 110000 is changed to a lower one, i.e. 10000, it works perfectly. However, as soon as I try to serialize anything past exactly 2^18 bytes (262144) of the ByteString, the program terminates with "Segmentation fault/access violation in generated code". I'm running Windows and using GHC 7.4.2. Please advise whether …

Purity of functions generating ByteString (or any object with ForeignPtr component)

Question: Since a ByteString is built around a ForeignPtr:

    data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                         {-# UNPACK #-} !Int                -- offset
                         {-# UNPACK #-} !Int                -- length

if I have a function that returns a ByteString, then given an input, say a constant Word8, the function will return a ByteString with a non-deterministic ForeignPtr value; what that value will be is determined by the memory manager. So, does that mean that a function that returns a ByteString is not pure? That …
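A short illustration of why the pointer does not break purity, assuming a reasonably recent bytestring package: the public Data.ByteString API never exposes the buffer address, and the Eq and Ord instances compare only the bytes. Two independently allocated ByteStrings with the same contents are therefore indistinguishable to pure code. The helper names fourOf, sharedSlice and sameContents are hypothetical; load the module in GHCi to try them.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Allocates a fresh buffer on every call; the address differs between
-- calls, but nothing in the public API lets pure code observe it.
fourOf :: Word8 -> B.ByteString
fourOf w = B.replicate 4 w

-- A slice that shares the buffer of a longer ByteString:
-- B.drop only adjusts offset/length, it does not copy the bytes.
sharedSlice :: Word8 -> B.ByteString
sharedSlice w = B.drop 1 (B.pack [0, w, w, w, w])

-- Equality looks only at the bytes, so all three values agree even
-- though they live in different (or shared) buffers.
sameContents :: Word8 -> Bool
sameContents w = fourOf w == sharedSlice w && fourOf w == B.pack [w, w, w, w]
```

Only internal or unsafe modules, such as Data.ByteString.Internal with its toForeignPtr, hand out the underlying pointer, which is exactly why they are kept out of the main API.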

Where is Network.Socket.ByteString.Lazy's sendTo?

Both Network.Socket.ByteString and Network.Socket.ByteString.Lazy have a send function. Network.Socket.ByteString has a sendTo function, but Network.Socket.ByteString.Lazy doesn't. How can I use Network.Socket.ByteString's sendTo with a lazy ByteString, or use Network.Socket.ByteString.Lazy's send function with a destination (i.e. how do I tell it where to send the packet)? Can anyone recommend a good tutorial on Haskell's Strings, ByteStrings, lazy ByteStrings, etc.? I find them very confusing (coming from a Java/Python background). Note that sendTo is strict in the data sent, and so there's no real logic to …
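One way to get a destination address with lazy data, sketched under the assumption of the network package (3.x) and bytestring 0.10 or later (for BL.toStrict): since sendTo and sendAllTo in Network.Socket.ByteString take a strict ByteString plus a SockAddr, convert the lazy string first. The helper sendLazyTo is hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Network.Socket as NS
import qualified Network.Socket.ByteString as NSB

-- | Hypothetical helper: send a lazy ByteString as ONE datagram.
-- NSB.sendAllTo takes a strict ByteString and a destination address,
-- so the lazy chunks are concatenated with BL.toStrict first. Sending
-- the chunks one by one with sendTo would instead produce several
-- separate UDP datagrams.
sendLazyTo :: NS.Socket -> BL.ByteString -> NS.SockAddr -> IO ()
sendLazyTo sock payload addr = NSB.sendAllTo sock (BL.toStrict payload) addr
```

Forcing the whole payload is not a real cost here: a datagram has to exist in full before it can be sent, which matches the observation that sendTo is strict in the data anyway.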

Converting character offsets into byte offsets (in Python)

Question: Suppose I have a bunch of UTF-8 files whose contents I send to an external API as Unicode. The API operates on each Unicode string and returns a list of (character_offset, substr) tuples. The output I need is the begin and end byte offsets of each found substring. If I'm lucky, the input text contains only ASCII characters (making character offsets and byte offsets identical), but this is not always the case. How can I find the begin and end byte offsets for a known begin character offset and …

Bytestring linking in ghc

Question: Consider the following simple code:

    import Crypto.Hash.SHA1 (hashlazy)
    import qualified Data.ByteString as BS
    main = return ()

I ran cabal install --global bytestring, and then I get the following (on a freshly installed Ubuntu 12.04 machine using GHC 7.4.1):

    GHCi runtime linker: fatal error: I found a duplicate definition for symbol fps_minimum
    whilst processing object file /usr/local/lib/bytestring-0.10.0.1/ghc-7.4.1/HSbytestring-0.10.0.1.o
    This could be caused by:
       * Loading two different object …

In Haskell, will calling length on a Lazy ByteString force the entire string into memory?

Question: I am reading a large data stream using lazy ByteStrings, and while parsing it I want to know whether at least X more bytes are available. That is, I want to know if the ByteString is at least X bytes long. Will calling length on it result in the entire stream getting loaded, defeating the purpose of using a lazy ByteString? If yes, the follow-up would be: how can I tell if it has at least X bytes without loading the entire stream? EDIT: Originally I asked in the context of reading files but …
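A minimal sketch of the usual answer: Data.ByteString.Lazy.length has to walk to the last chunk, so it does force the whole stream (and keeps it in memory as long as the ByteString is still referenced). Taking at most n bytes first and measuring that prefix only demands the chunks covering those bytes. The name atLeast is hypothetical.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- | Hypothetical helper: does the lazy ByteString hold at least n bytes?
-- BL.take n stops after the first n bytes, so only the chunks that cover
-- them are evaluated; the rest of the stream is left untouched.
atLeast :: Int64 -> BL.ByteString -> Bool
atLeast n s = n <= 0 || BL.length (BL.take n s) == n
```

Because only a bounded prefix is inspected, this even works on an infinite stream such as BL.repeat 0, where calling BL.length directly would never return.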

Split ByteString on a ByteString (instead of a Word8 or Char)

Question: I know the Haskell Data.ByteString.Lazy module already has a function to split a CSV on a single character:

    split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

    split :: ByteString -> ByteString -> [ByteString]

I have multi-character separators in a CSV-like text file that I need to parse, and the individual characters themselves appear in some of the fields, so choosing just one separator character …
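A sketch of a multi-byte split built on Data.ByteString.breakSubstring, which finds the first occurrence of a pattern in a strict ByteString. It assumes strict input; for a lazy ByteString one option is to convert with Data.ByteString.Lazy.toStrict first, at the cost of holding the whole input in memory. The function splitOn is hypothetical, not part of the bytestring package.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Hypothetical helper: split a strict ByteString on every occurrence
-- of a (possibly multi-byte) separator. Adjacent or trailing separators
-- produce empty fields.
splitOn :: B.ByteString -> B.ByteString -> [B.ByteString]
splitOn sep input
  | B.null sep = error "splitOn: empty separator"
  | otherwise  = go input
  where
    go s =
      let (field, rest) = B.breakSubstring sep s
      in if B.null rest
           then [field]                                 -- no separator left
           else field : go (B.drop (B.length sep) rest) -- skip the separator
```

For example, splitting BC.pack "a|b||c||||d" on BC.pack "||" keeps the single "|" inside the first field and yields the fields "a|b", "c", "" and "d".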

Matching bytestrings in Parsec

Question: I am currently trying to use the Full CSV Parser presented in Real World Haskell. I tried to modify the code to use ByteString instead of String, but there is a string combinator that only works with String. Is there a Parsec combinator similar to string that works with ByteString, without having to convert back and forth? I've seen that there is an alternative parser that handles ByteString, attoparsec, but I would prefer to stick with Parsec, since I'm just learning how …
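A sketch assuming Parsec 3: its string combinator is typed over any Stream s m Char, and Parsec ships such an instance for strict Data.ByteString.Char8 ByteStrings, so string (and noneOf, many1, etc.) already run directly on ByteString input via Text.Parsec.ByteString. Only the literal and the matched result need converting. The names byteString and pair are hypothetical.

```haskell
import qualified Data.ByteString.Char8 as BC
import Text.Parsec
import Text.Parsec.ByteString (Parser)

-- | Hypothetical helper: match a ByteString literal on a ByteString stream.
-- try makes a partial match fail without consuming input.
byteString :: BC.ByteString -> Parser BC.ByteString
byteString lit = BC.pack <$> try (string (BC.unpack lit))

-- | Example: parse "key=>value", where "=>" is a multi-character separator.
pair :: Parser (BC.ByteString, BC.ByteString)
pair = do
  k <- BC.pack <$> many1 (noneOf "=")
  _ <- byteString (BC.pack "=>")
  v <- BC.pack <$> many1 anyChar
  return (k, v)
```

The conversions here touch only the short literal and each matched token, not the whole input, which stays a ByteString throughout parsing.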

How do I convert a unicode header to byte string in Flask?

I have a Flask app that I've been able to get running on my development server. However, when I try to run the same app under mod_wsgi, I get an error:

    TypeError: expected byte string object for header name, value of type unicode found

I've tried to convert the headers many different ways, but I keep getting the same error:

    for k, v in dict(request.headers).iteritems():
        response.headers[k.encode('latin-1')] = v.encode('latin-1')

I've also tried .encode('utf-8'), .decode('utf-8'), .decode('latin-1') and str(), but get the exact same error. Am I doing something wrong? EDIT (the real stacktrace …