Can we write a function in Haskell to measure the number of bytes that a string of n characters occupies?

血红的双手。 提交于 2019-12-10 16:07:09

问题


Suppose I have the string

"abc"

and I want to calculate the number of bytes it occupies in memory.

I could do:

import Data.Bits
finiteBitSize ("abc" :: [Char])

but that breaks because [Char] is not a type supported by the function. (Also it is bits not bytes, but the point was to paint a picture of what I'm looking for).

My question is: Can we write a function in Haskell to measure the number of bytes that a string of n characters occupies?


回答1:


There's a Haskell package called ghc-datasize which allows you to calculate real occupied memory by every Haskell value. You can even see an example of how to calculate string size in the official Hackage documentation:

  • http://hackage.haskell.org/package/ghc-datasize

To calculate the length of the String you need to use the following function:

recursiveSize :: a -> IO Word

It looks like this:

ghci> recursiveSize $!! "foobar"
240



回答2:


It's complicated.

Let's talk about GHC, and about String specifically, and let's assume the thing has been fully evaluated, so we didn't get to use it iteratively in a GC-friendly way and we didn't get to delay evaluation and store a tiny thunk representing an enormous data structure.

After making all those simplifying assumptions, we're gonna need to know some definitions.

type String = [Char]
data [a] = [] | a : [a] -- pseudosyntax
data Char = C# Char# -- I'm guessing, couldn't find a canonical source

Now we're going to use a few rules of thumb. First: unboxed stuff (like Char#) is generally stored in a machine word. We live in a world of 64-bit machines, so Char# is probably 8 bytes, even though it likely only uses the bottom 4 bytes of it. Second: data constructors are a word to say which constructor, plus a word to point at each of the fields.

Now we're ready to tot it up.

Empty strings are [], one word for the constructor, no words for fields, so one word total.

Nonempty strings are c : cs, so one word for the : constructor, one word to point to c, one word to point to cs, one word for the C# constructor, one word for the Char#. That's 5 words plus however many we need for cs.

So for a String of length n, we've got 5*n words to represent the body of the String and one extra for the terminating []. Practically speaking that's 40 bytes per character! Yikes.

And now you know why packed representations like Text (or, when appropriate, ByteString) are such a big deal.



来源:https://stackoverflow.com/questions/57739123/can-we-write-a-function-in-haskell-to-measure-the-number-of-bytes-that-a-string

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!