Question
I am doing a school task where I am given a small bit of sample code which I can use later. I understand 90% of this code but there is one little line/function that I for the life of me can't figure out what it does (I am very new to Haskell btw).
Sample code:
data Profile = Profile {matrix::[[(Char,Int)]], moleType::SeqType, nrOfSeqs::Int, nm::String} deriving (Show)

nucleotides = "ACGT"
aminoacids = sort "ARNDCEQGHILKMFPSTWYVX"

makeProfileMatrix :: [MolSeq] -> [[(Char, Int)]]
makeProfileMatrix [] = error "Empty sequence list"
makeProfileMatrix sl = res
  where
    t = seqType (head sl)
    defaults =
      if (t == DNA) then
        zip nucleotides (replicate (length nucleotides) 0) -- Row 1
      else
        zip aminoacids (replicate (length aminoacids) 0)   -- Row 2
    strs = map seqSequence sl                              -- Row 3
    tmp1 = map (map (\x -> ((head x), (length x))) . group . sort)
               (transpose strs)                            -- Row 4
    equalFst a b = (fst a) == (fst b)
    res = map sort (map (\l -> unionBy equalFst l defaults) tmp1)
{-Row 1: 'replicate' creates a list of zeros that is equal to the length of the 'nucleotides' string.
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the nucleotides-}
{-Row 2: 'replicate' creates a list of zeros that is equal to the length of the 'aminoacids' string.
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the aminoacids-}
{-Row 3: The function 'seqSequence' is applied to each element in the 'sl' list and then returns a new altered list.
In other words 'strs' becomes a list that contains all the sequences in 'sl' (sl contains MolSeq objects, not strings)-}
{-Row 4: (transpose strs) creates a list that has each 'column' of sequences as a element (the first element is made up of each first element in each sequence etc.).
--}
I have written an explanation for each marked Row in the code (which I think so far is correct) but I get stuck when I try to figure out what Row 4 does. I understand the 'transpose' bit but I can't at all figure out what the inner map function does. As far as I know a 'map' function needs a list as a second parameter to function but the inner map function only has an anonymous function but no list to operate on. To be perfectly clear I don't understand what the entire inner line map (\x -> ((head x), (length x))) . group . sort does. Please help!
Bonus!:
Here is another piece of sample code that I can't figure out (never worked with classes in Haskell):
class Evol object where
  name :: object -> String
  distance :: object -> object -> Double
  distanceMatrix :: [object] -> [(String, String, Double)]
  addRow :: [object] -> Int -> [(String, String, Double)]
  distanceMatrix [] = []
  distanceMatrix object =
    addRow object 0 ++ distanceMatrix (tail object)
  addRow object num -- Adds row to distance matrix
    | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
    | otherwise = []
    where
      a = head object
      b = object !! num

-- Determines the name and distance of an instance of "Evol" if the instance is a "MolSeq".
instance Evol MolSeq where
  name = seqName
  distance = seqDistance

-- Determines the name and distance of an instance of "Evol" if the instance is a "Profile".
instance Evol Profile where
  name = profileName
  distance = profileDistance
Especially this part:
addRow object num -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = []
  where
    a = head object
    b = object !! num
You don't have to explain this one if you don't want to I am just slightly confused as to what 'addRow' actually is trying to do (in detail).
Thanks!
Answer 1:
map (\x -> (head x, length x)) . group . sort is an idiomatic way of generating a histogram. When you see something like this that you don’t understand, try breaking it down into smaller pieces and testing them on sample inputs:
(\x -> (head x, length x)) "AAAA"
-- ('A', 4)
(group . sort) "CABABA"
-- ["AAA", "BB", "C"]
(map (\x -> (head x, length x)) . group . sort) "CABABA"
map (\x -> (head x, length x)) (group (sort "CABABA"))
-- [('A', 3), ('B', 2), ('C', 1)]
It’s written in point-free style as a composition of 3 functions, map (…), group, and sort, but could also be written as a lambda:
\row -> map (…) (group (sort row))
For each row in the transposed matrix, it produces a histogram of the data in that row. You could get a more visual representation of this by formatting it and printing it out:
let
  showHistogramRow row = concat
    [ show $ head row
    , ":\t"
    , replicate (length row) '#'
    ]
  input = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
putStr
  $ unlines
  $ map showHistogramRow
  $ group
  $ sort input
-- 1: ##
-- 2: #
-- 3: ##
-- 4: #
-- 5: ###
-- 6: #
-- 9: #
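To tie this back to makeProfileMatrix in the question, here is a minimal self-contained sketch of the same pipeline: transpose the sequences, build a histogram per column, then pad each histogram with zero counts for the letters that did not occur. The names histogram and columnCounts are made up for this illustration, and plain Strings stand in for the MolSeq values, whose definition is not shown in the question.

```haskell
import Data.List (group, sort, transpose, unionBy)

-- Count how often each character occurs in one column.
histogram :: String -> [(Char, Int)]
histogram = map (\x -> (head x, length x)) . group . sort

-- Hypothetical helper: per-column counts, padded with a zero entry
-- for every letter of the alphabet that is missing from the column.
columnCounts :: String -> [String] -> [[(Char, Int)]]
columnCounts alphabet strs = map (sort . fill . histogram) (transpose strs)
  where
    defaults = zip alphabet (repeat 0)
    equalFst a b = fst a == fst b
    -- unionBy keeps every counted pair and adds only the defaults
    -- whose letter is not already present.
    fill counts = unionBy equalFst counts defaults

main :: IO ()
main = mapM_ print (columnCounts "ACGT" ["ACGT", "AAGT", "ACGA"])
-- [('A',3),('C',0),('G',0),('T',0)]
-- [('A',1),('C',2),('G',0),('T',0)]
-- [('A',0),('C',0),('G',3),('T',0)]
-- [('A',1),('C',0),('G',0),('T',2)]
```

Each output row describes one position across all sequences, which is what the Row 4 line followed by the res line computes in the sample code.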
As for this:
addRow object num -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = []
  where
    a = head object
    b = object !! num
addRow makes a list of the distances from the first element in object to each of the other elements. It uses indexing into the list in a sort of non-obvious way, when a simpler and more idiomatic map would suffice:
addRow object = map (\ b -> (name a, name b, distance a b)) object
  where a = head object
Ordinarily it’s good to avoid partial functions such as head because they can throw an exception on some inputs (e.g. head []). Here it’s fine, however, because if the input list is empty, then a will never be used, and so head will never be called.
distanceMatrix could be expressed with a map as well, because it’s just calling a function (addRow) on all the tails of the list and concatenating them together with ++:
distanceMatrix object = concatMap addRow (tails object)
This could be written in point-free style too. \x -> f (g x) can be written as just f . g; here, f is concatMap addRow and g is tails:
distanceMatrix = concatMap addRow . tails
Evol just describes the set of types for which you can generate a distanceMatrix, including MolSeq and Profile. Note that addRow and distanceMatrix don’t need to be members of this class, because they’re implemented entirely in terms of name and distance, so you could move them to the top level:
distanceMatrix :: (Evol object) => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails

-- The map-based addRow no longer takes an index, so its type has no Int argument.
addRow :: (Evol object) => [object] -> [(String, String, Double)]
addRow object = map (\ b -> (name a, name b, distance a b)) object
  where a = head object
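If you want to try the top-level version without the rest of the assignment, something like the following should compile on its own. Point is a hypothetical stand-in for MolSeq and Profile, invented only so that there is an instance to test with. In this sketch addRow takes just the list, since the map-based version has no use for an index.

```haskell
import Data.List (tails)

class Evol object where
  name     :: object -> String
  distance :: object -> object -> Double

-- Hypothetical stand-in for MolSeq/Profile: a named point on a line.
data Point = Point String Double

instance Evol Point where
  name (Point n _) = n
  distance (Point _ x) (Point _ y) = abs (x - y)

-- Distances from the first element to every element, itself included.
addRow :: Evol object => [object] -> [(String, String, Double)]
addRow object = map (\b -> (name a, name b, distance a b)) object
  where a = head object  -- only evaluated when the list is non-empty

-- One row per suffix of the list, concatenated.
distanceMatrix :: Evol object => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails

main :: IO ()
main = mapM_ print (distanceMatrix [Point "p" 0, Point "q" 3, Point "r" 5])
-- ("p","p",0.0)
-- ("p","q",3.0)
-- ("p","r",5.0)
-- ("q","q",0.0)
-- ("q","r",2.0)
-- ("r","r",0.0)
```

The last suffix produced by tails is the empty list, and addRow returns [] for it without ever touching head.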
Answer 2:
"the inner map function only has an anonymous function but no list to operate on"
Suppose there is a function f of type a -> b -> c, which takes two arguments and returns a value of type c. If f is applied to only one argument, it returns another function of type b -> c, which takes the remaining argument and returns the value. This works because functions in Haskell are curried; supplying only some of the arguments is called partial application.
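A small sketch of this idea may help. The names add, addFive and firstAndLength are invented for the illustration and do not come from the sample code.

```haskell
-- A two-argument function; its type reads as Int -> (Int -> Int).
add :: Int -> Int -> Int
add x y = x + y

-- Supplying only the first argument yields a function awaiting the second.
addFive :: Int -> Int
addFive = add 5

-- The same applies to map: given only the function argument,
-- it yields a function from lists to lists.
firstAndLength :: [String] -> [(Char, Int)]
firstAndLength = map (\x -> (head x, length x))

main :: IO ()
main = do
  print (addFive 10)                    -- 15
  print (firstAndLength ["AAA", "BB"])  -- [('A',3),('B',2)]
```

So a map with "no list" is not an error; it is simply a function that is still waiting for its list.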
This line:
map (map (\x -> ((head x), (length x))) . group . sort) (transpose strs)
can be transformed into:
map (\str -> (map (\x -> ((head x), (length x))) . group . sort) str) (transpose strs)
In this form it may be clearer that there is actually a list to operate on.
This function
(map (\x -> ((head x), (length x))) . group . sort)
is just a composition of sort, group and map (\x -> ((head x), (length x))).
Let's see how it works on [2,1,1,1,4]:
sort [2, 1, 1, 1, 4] => [1, 1, 1, 2, 4]
group [1, 1, 1, 2, 4] => [[1,1,1],[2],[4]]
map (\x -> ((head x), (length x))) [[1,1,1],[2],[4]] => [(1,3),(2,1),(4,1)]
It just returns a list of tuples. Each tuple contains an element as its first component and the number of times that element occurs as its second.
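If you would rather check the steps yourself, the following sketch names each intermediate result and compares the step-by-step value with the composed function. The names input, sorted, grouped, counted and composed are only labels chosen for this example.

```haskell
import Data.List (group, sort)

input :: [Int]
input = [2, 1, 1, 1, 4]

-- Step 1: bring equal elements next to each other.
sorted :: [Int]
sorted = sort input

-- Step 2: split into runs of equal elements.
grouped :: [[Int]]
grouped = group sorted

-- Step 3: replace each run by (element, run length).
counted :: [(Int, Int)]
counted = map (\x -> (head x, length x)) grouped

-- The same three steps written as one composed function.
composed :: [(Int, Int)]
composed = (map (\x -> (head x, length x)) . group . sort) input

main :: IO ()
main = do
  print sorted                -- [1,1,1,2,4]
  print grouped               -- [[1,1,1],[2],[4]]
  print counted               -- [(1,3),(2,1),(4,1)]
  print (counted == composed) -- True
```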
Source: https://stackoverflow.com/questions/46131310/haskell-having-trouble-understanding-a-small-bit-of-code