Merge sort in Haskell

泄露秘密 提交于 2019-12-03 06:57:40

问题


I am new to Haskell and I am trying to implement a few known algorithms in it.

I have implemented merge sort on strings. I am a bit disappointed with the performance of my Haskell implementation compared to C and Java implementations. On my machine (Ubuntu Linux, 1.8 GHz), C (gcc 4.3.3) sorts 1 000 000 strings in 1.85 s, Java (Java SE 1.6.0_14) in 3.68 s, Haskell (GHC 6.8.2) in 25.89 s. With larger input (10 000 000 strings), C takes 21.81 s, Java takes 59.68 s, Haskell starts swapping and I preferred to stop the program after several minutes.

Since I am new to Haskell, I would be interested to know if my implementation can be made more time / space efficient.

Thank you in advance for any hint Giorgio

My implementation:

merge :: [String] -> [String] -> [String]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) = if x < y
                        then x : (merge xs (y:ys))
                        else y : (merge (x:xs) ys)

mergeSort :: [String] -> [String]
mergeSort xs = if (l < 2)
                 then xs
                 else merge h t
               where l = length xs
                     n = l `div` 2
                     s = splitAt n xs
                     h = mergeSort (fst s)
                     t = mergeSort (snd s)

回答1:


Try this version:

mergesort :: [String] -> [String]
mergesort = mergesort' . map wrap

mergesort' :: [[String]] -> [String]
mergesort' [] = []
mergesort' [xs] = xs
mergesort' xss = mergesort' (merge_pairs xss)

merge_pairs :: [[String]] -> [[String]]
merge_pairs [] = []
merge_pairs [xs] = [xs]
merge_pairs (xs:ys:xss) = merge xs ys : merge_pairs xss

merge :: [String] -> [String] -> [String]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
 = if x > y
        then y : merge (x:xs)  ys
        else x : merge  xs    (y:ys)

wrap :: String -> [String]
wrap x = [x]
  1. Bad idea is splitting list first. Instead of it just make list of one member lists. Haskell is lazy, it will be done in right time.
  2. Then merge pairs of lists until you have only one list.

Edit: Someone who down-vote this answer: above merge sort implementation is same algorithm as used in ghc Data.List.sort except with cmp function removed. Well ghc authors are may be wrong :-/




回答2:


In Haskell, a string is a lazy list of characters and has the same overhead as any other list. If I remember right from a talk I heard Simon Peyton Jones give in 2004, the space cost in GHC is 40 bytes per character. For an apples-to-apples comparation you probably should be sorting Data.ByteString, which is designed to give performance comparable to other languages.




回答3:


Better way to split the list to avoid the issue CesarB points out:

split []             = ([], [])
split [x]            = ([x], [])
split (x : y : rest) = (x : xs, y : ys)
                       where (xs, ys) = split rest

mergeSort []  = []
mergeSort [x] = [x]
mergeSort xs  = merge (mergesort ys) (mergesort zs)
                where (ys, zs) = split xs

EDIT: Fixed.




回答4:


I am not sure if this is the cause of your problem, but remember that lists are a sequential data structure. In particular, both length xs and splitAt n xs will take an amount of time proportional to the length of the list (O(n)).

In C and Java, you are most probably using arrays, which take constant time for both operations (O(1)).

Edit: answering your question on how to make it more efficient, you can use arrays in Haskell too.



来源:https://stackoverflow.com/questions/1215432/merge-sort-in-haskell

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!