Count frequency of each element in a list

日久生厌 2020-12-09 20:02

I'm trying to write a program that counts the frequency of each element in a list.

    In: "aabbcabb"
    Out: [("a",3),("b",4),("c",1)]
2 Answers
  • 2020-12-09 20:41

    You didn't say whether you want to write it entirely on your own, or whether it's OK to compose it from standard functions.

    import Data.List
    
    g s = map (\x -> ([head x], length x)) . group . sort $ s
    
    -- g = map (head &&& length) . group . sort     -- without the [...]; (&&&) is from Control.Arrow
    

    is the standard quick-n-dirty way to code it.
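For reference, here is a minimal, self-contained sketch of both forms, with type signatures added. The name g' for the commented-out variant is made up for this example; everything else uses Data.List and Control.Arrow as above.

```haskell
import Control.Arrow ((&&&))
import Data.List (group, sort)

-- Each element is wrapped in a singleton list, matching the output
-- shape asked for in the question.
g :: Ord a => [a] -> [([a], Int)]
g = map (\x -> ([head x], length x)) . group . sort

-- The variant from the comment: plain elements instead of singleton lists.
-- (g' is a hypothetical name used only in this sketch.)
g' :: Ord a => [a] -> [(a, Int)]
g' = map (head &&& length) . group . sort
```

In GHCi, g "aabbcabb" evaluates to [("a",3),("b",4),("c",1)]. Using head is safe here because group never produces an empty group.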


    OK, so your original idea was to Code it Point-Free Style (certain tune playing in my head...):

    frequencyOfElt :: (Eq a) => [a] -> [(a,Int)]
    frequencyOfElt xs = countElt (unique xs) xs     -- change the result type
      where 
        unique [] = []
        unique (x:xs) = x : unique (filter (/= x) xs)  
    
        countElt ref target =   -- Code it Point-Free Style  (your original idea)
          zip 
            ref $               -- your original type would need (map (:[]) ref) here
            map length $
              zipWith ($)       -- ((filter . (==)) c) === (filter (== c))
                (zipWith ($) (repeat (filter . (==))) ref)  
                (repeat target)
    

    By the way, I've changed the type here to the more reasonable [a] -> [(a,Int)]. Note that

    zipWith ($) fs (repeat z) === map ($ z) fs
    zipWith ($) (repeat f) zs === map (f $) zs === map f zs
    

    hence the code simplifies to

        countElt ref target =  
          zip 
            ref $              
            map length $
              map ($ target)      
                (zipWith ($) (repeat (filter . (==))) ref)  
    

    and then

        countElt ref target =  
          zip 
            ref $              
            map length $
              map ($ target) $
                map (filter . (==)) ref
    

    but map f $ map g xs === map (f.g) xs, so

        countElt ref target =  
          zip 
            ref $              
            map (length . ($ target) . filter . (==)) ref      -- (1)
    

    which is a bit clearer (to my taste) when written as a list comprehension,

        countElt ref target =  
            [ (c, (length . ($ target) . filter . (==)) c) | c <- ref] 
         == [ (c,  length ( ($ target) ( filter (== c))))  | c <- ref]     
         == [ (c,  length $ filter (== c) target)          | c <- ref]     
    

    Which gives us an idea to re-write (1) further as

        countElt ref target =  
          zip <*> map (length . (`filter` target) . (==)) $ ref
    

    but this obsession with point-free code becomes pointless here.
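The rewriting steps above can be spot-checked by giving each stage its own name and comparing results. This is only a sketch: the names countElt0 to countElt3 are made up here, and agreement on a few inputs is a sanity check, not a proof.

```haskell
-- The original zipWith formulation.
countElt0 :: Eq a => [a] -> [a] -> [(a, Int)]
countElt0 ref target =
  zip ref $
    map length $
      zipWith ($)
        (zipWith ($) (repeat (filter . (==))) ref)
        (repeat target)

-- Version (1): the three maps fused into one.
countElt1 :: Eq a => [a] -> [a] -> [(a, Int)]
countElt1 ref target =
  zip ref $ map (length . ($ target) . filter . (==)) ref

-- The list comprehension.
countElt2 :: Eq a => [a] -> [a] -> [(a, Int)]
countElt2 ref target = [ (c, length $ filter (== c) target) | c <- ref ]

-- The version using the Applicative instance of functions,
-- where (f <*> h) x == f x (h x).
countElt3 :: Eq a => [a] -> [a] -> [(a, Int)]
countElt3 ref target =
  zip <*> map (length . (`filter` target) . (==)) $ ref
```

With ref = "abc" and target = "aabbcabb", all four return [('a',3),('b',4),('c',1)].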


    So going back to the readable list comprehensions, using a standard nub function which is equivalent to your unique, your idea becomes

    import Data.List
    
    frequencyOfElt xs = [ (c, length $ filter (== c) xs) | c <- nub xs]
    

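    This algorithm is quadratic in the worst case (~ n^2, when most elements are distinct), so it is worse than the first version above, which is dominated by sort and hence linearithmic (~ n log(n)). The first version does need an Ord instance, though, where this one needs only Eq.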
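To make the comparison concrete, here is a small sketch with the two approaches side by side. The names freqNub and freqSort are made up for this example. Apart from the cost, the two differ in the order of the result: the nub version keeps elements in order of first occurrence, while the sort version returns them sorted.

```haskell
import Data.List (group, nub, sort)

-- Quadratic in the worst case; needs only Eq and keeps
-- first-occurrence order. (Hypothetical name.)
freqNub :: Eq a => [a] -> [(a, Int)]
freqNub xs = [ (c, length $ filter (== c) xs) | c <- nub xs ]

-- Dominated by the sort; needs Ord and returns the elements
-- in sorted order. (Hypothetical name.)
freqSort :: Ord a => [a] -> [(a, Int)]
freqSort = map (\g -> (head g, length g)) . group . sort
```

For "bcabba", freqNub gives [('b',3),('c',1),('a',2)] and freqSort gives [('a',2),('b',3),('c',1)]; sorting the first result yields the second.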


    This code can be manipulated further, though, by applying a series of equivalent transformations:

      = [ (c, length . filter (== c) $ sort xs) | c <- nub xs]
    

    ... because searching in a list is the same as searching in the same list, sorted. Doing more work here -- will it pay off?..

      = [ (c, length . filter (== c) $ sort xs) | (c:_) <- group $ sort xs]
    

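    ... right? (The pairs now come out in sorted order rather than in order of first occurrence, but the counts are the same.) But now, group has already grouped them by (==), so there's no need for the filter call to repeat the work already done by group: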

      = [ (c, length . get c . group $ sort xs) | (c:_) <- group $ sort xs]
                where get c gs = fromJust . find ((== c).head) $ gs
    
      = [ (c, length g) | g@(c:_) <- group $ sort xs]
    
      = [ (head g, length g) | g <- group (sort xs)]
    
      = (map (head &&& length) . group . sort) xs
    

    isn't it? And here it is, the same linearithmic algorithm from the beginning of this post, actually derived from your code by factoring out its hidden common computations, making them available for reuse and code simplification.
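The derivation above is written as a chain of equalities rather than as compilable code, so here is a sketch that turns the main steps into separate functions that can be loaded and compared. The names step0 to step4 are made up for this example, and the Ord constraint is used throughout for uniformity even though step0 needs only Eq.

```haskell
import Control.Arrow ((&&&))
import Data.List (find, group, nub, sort)
import Data.Maybe (fromJust)

-- Starting point: nub plus filter (first-occurrence order).
step0 :: Ord a => [a] -> [(a, Int)]
step0 xs = [ (c, length $ filter (== c) xs) | c <- nub xs ]

-- Candidates taken from group . sort instead of nub
-- (sorted order from here on).
step1 :: Ord a => [a] -> [(a, Int)]
step1 xs = [ (c, length . filter (== c) $ sort xs) | (c:_) <- group $ sort xs ]

-- filter replaced by a lookup of the group that starts with c.
-- fromJust is safe because c was drawn from those same groups.
step2 :: Ord a => [a] -> [(a, Int)]
step2 xs = [ (c, length . get c . group $ sort xs) | (c:_) <- group $ sort xs ]
  where
    get c gs = fromJust . find ((== c) . head) $ gs

-- The group is already at hand, so the lookup disappears.
step3 :: Ord a => [a] -> [(a, Int)]
step3 xs = [ (c, length g) | g@(c:_) <- group $ sort xs ]

-- The point-free form from the beginning of the post.
step4 :: Ord a => [a] -> [(a, Int)]
step4 = map (head &&& length) . group . sort
```

On "aabbcabb", step1 through step4 all return [('a',3),('b',4),('c',1)], and step0 returns the same pairs (for this input, in the same order as well, since the first occurrences happen to be sorted).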

  • 2020-12-09 20:59

    Using multiset-0.1:

    import Data.MultiSet     -- the multiset package spells the module with a capital S
    
    freq = toOccurList . fromList 
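If adding the multiset package is not an option, a similar one-liner can be sketched with Data.Map from the containers package, which ships with GHC. This swaps in a different technique (a map of counters instead of a multiset), and the name freqMap is made up for this example.

```haskell
import qualified Data.Map.Strict as Map

-- Insert every element with a count of 1 and add the counts up
-- when the same key is inserted again. The result is ordered by key.
-- (freqMap is a hypothetical name.)
freqMap :: Ord a => [a] -> [(a, Int)]
freqMap = Map.toList . Map.fromListWith (+) . map (\x -> (x, 1))
```

For "aabbcabb" this gives [('a',3),('b',4),('c',1)].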
    