问题
I'm exploring a csv file in an interactive ghci session (in a jupyter notebook):
import Text.CSV
import Data.List
import Data.Maybe
dat <- parseCSVFromFile "/home/user/data.csv"
headers = head dat
records = tail dat
-- define a way to get a particular row by index
indexRow :: [[Field]] -> Int -> [Field]
indexRow csv index = csv !! index
indexRow records 1
-- this works!
-- Now, define a way to get a particular column by index
indexField :: [[Field]] -> Int -> [Field]
indexField records index = map (\x -> x !! index) records
While this works if I know in advance the type of column 3:
map (\x -> read x :: Double) $ indexField records 3
How can I ask read
to infer what the type might be when for example my columns could contain strings or num? I'd like it to try for me, but:
map read $ indexField records 3
fails with
Prelude.read: no parse
I don't care whether they are string or num, I just need that they are all the same and I am failing to find a way to specify that generally with the read function at least.
Weirdly, if I define a mean function like so:
mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean [x] = Just x
mean xs = Just (sum(xs) / (fromIntegral (length xs)))
This works:
mean $ map read $ indexField records 2
Just 13.501359655240003
But without the mean, this still fails:
map read $ indexField records 2
Prelude.read: no parse
回答1:
Unfortunately, read
is at the end of its wits when it comes to situations like this. Let's revisit read
:
read :: Read a => String -> a
As you can see, a
doesn't depend on the input, but solely on the output, and therefore of the context of our function. If you use read a + read b
, then the additional Num
context will limit the types to Integer
or Double
due to default
rules. Let's see it in action:
> :set +t
> read "1234"
*** Exception: Prelude.read: no parse
> read "1234" + read "1234"
2468
it :: (Num a, Read a) => a
Ok, a
is still not helpful. Is there any type that we can read without additional context? Sure, unit:
> read "()"
()
it :: Read a => a
That's still not helpful at all, so let's enable the monomorphism restriction:
> :set -XMonomorphismRestriction
> read "1234" + read "1234"
2468
it :: Integer
Aha. In the end, we had an Integer
. Due to +
, we had to decide on a type. Now, with the MonomorphismRestriction
enabled, what happens on read "1234"
without additional context?
> read "1234"
<interactive>:20:1
No instance for (Read a0) arising from a use of 'read'
The type variable 'a0' is ambiguous
Now GHCi doesn't pick any (default) type and forces you to chose one. Which makes the underlying error much more clear.
So how do we fix this? As CSV can contain arbitrary fields at run-time and all types are determined statically, we have to cheat by introducing something like
data CSVField = CSVString String | CSVNumber Double | CSVUnknown
and then write
parse :: Field -> CSVField
After all, our type needs to cover all possible fields.
However, in your case, we can just restrict read's
type:
myRead :: String -> Double
myRead = read
But that's not wise, as we can still end up with errors if the column doesn't contain Double
s to begin with. So instead, let's use readMaybe
and mapM
:
columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe
That way, the type is fixed, and we're forced to check whether we have Just
something or Nothing
:
mean <$> columnAsNumbers (indexFields records 2)
If you find yourself often using columnAsNumbers
create an operator, though:
(!!$) :: [[Field]] -> Maybe [Double]
records !!$ index = columnAsNumbers $ indexFields records index
来源:https://stackoverflow.com/questions/53956462/get-column-in-haskell-csv-and-infer-the-column-type