Get Column in Haskell CSV and infer the column type

Submitted by 左心房为你撑大大i on 2020-01-02 13:33:32

Question


I'm exploring a csv file in an interactive ghci session (in a jupyter notebook):

import Text.CSV
import Data.List
import Data.Maybe

-- parseCSVFromFile returns Either ParseError CSV, so match on the Right case
Right dat <- parseCSVFromFile "/home/user/data.csv"
headers = head dat
records = tail dat

-- define a way to get a particular row by index
indexRow :: [[Field]] -> Int -> [Field]
indexRow csv index = csv !! index

indexRow records 1
-- this works! 

-- Now, define a way to get a particular column by index
indexField :: [[Field]] -> Int -> [Field]
indexField records index = map (\x -> x !! index) records

While this works if I know in advance the type of column 3:

map (\x -> read x :: Double) $ indexField records 3

How can I ask read to infer the type when, for example, my columns could contain either strings or numbers? I'd like it to try for me, but:

map read $ indexField records 3

fails with

Prelude.read: no parse

I don't care whether they are strings or numbers; I just need them all to be the same type, and I can't find a way to express that generally, at least not with the read function.

Weirdly, if I define a mean function like so:

mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean [x] = Just x
mean xs = Just (sum(xs) / (fromIntegral (length xs)))

This works:

mean $ map read $ indexField records 2
Just 13.501359655240003

But without the mean, this still fails:

map read $ indexField records 2
Prelude.read: no parse

Answer 1:


Unfortunately, read is at the end of its wits when it comes to situations like this. Let's revisit read:

read :: Read a => String -> a

As you can see, a doesn't depend on the input but solely on the output, and therefore on the context in which read is used. If you write read a + read b, the additional Num constraint limits the candidate types to Integer or Double because of the defaulting rules. Let's see it in action:

> :set +t
> read "1234"
*** Exception: Prelude.read: no parse
> read "1234" + read "1234"
2468
it :: (Num a, Read a) => a

Ok, a is still not helpful. Is there any type that we can read without additional context? Sure, unit:

> read "()"
()
it :: Read a => a

That's still not helpful at all, so let's enable the monomorphism restriction:

> :set -XMonomorphismRestriction
> read "1234" + read "1234"
2468
it :: Integer

Aha. In the end, we had an Integer. Due to +, we had to decide on a type. Now, with the MonomorphismRestriction enabled, what happens on read "1234" without additional context?

> read "1234"
<interactive>:20:1
   No instance for (Read a0) arising from a use of 'read'
   The type variable 'a0' is ambiguous

Now GHCi doesn't pick any (default) type and forces you to choose one, which makes the underlying error much clearer.
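To make the role of context concrete, here is a minimal sketch (not part of the original answer) that reads the same string at different types. The names asInt, asDouble and halved are made up for illustration:

```haskell
-- The result type of read comes from the surrounding context,
-- never from the contents of the string.

-- An explicit signature fixes the type.
asInt :: Int
asInt = read "1234"

asDouble :: Double
asDouble = read "1234"

-- No signature here: (/) adds a Fractional constraint, and the
-- standard defaulting rules then settle on Double.
halved = read "1234" / 2
```

The same mechanism explains why mean $ map read $ indexField records 2 worked in the question: the Fractional constraint on mean lets defaulting choose Double, whereas a bare map read leaves nothing numeric to default on.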

So how do we fix this? As CSV can contain arbitrary fields at run-time and all types are determined statically, we have to cheat by introducing something like

data CSVField = CSVString String | CSVNumber Double | CSVUnknown

and then write

parse :: Field -> CSVField

After all, our type needs to cover all possible fields.
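One possible way to fill in parse is sketched below. It is only an illustration: treating an empty field as CSVUnknown is an assumption of this sketch, and Field is redeclared locally (it is a String synonym in Text.CSV) so that the block stands on its own.

```haskell
import Text.Read (readMaybe)

-- Same definition as the Field synonym in Text.CSV.
type Field = String

data CSVField = CSVString String | CSVNumber Double | CSVUnknown
  deriving (Show, Eq)

-- Classify a single field.
-- Assumption of this sketch: an empty field counts as unknown.
parse :: Field -> CSVField
parse field
  | null field = CSVUnknown
  | otherwise  = maybe (CSVString field) CSVNumber (readMaybe field)
```

With this, map parse (indexField records 3) would give a [CSVField], so a column may mix numbers and strings and the caller decides what to do with each case.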

However, in your case, we can just restrict read's type:

myRead :: String -> Double
myRead = read

But that's not wise, as we can still end up with errors if the column doesn't contain Doubles to begin with. So instead, let's use readMaybe (from Text.Read) and mapM:

columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

That way, the type is fixed, and we're forced to check whether we have Just something or Nothing:

mean <$> columnAsNumbers (indexField records 2)
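As a self-contained sketch of how this behaves, the block below runs columnAsNumbers on made-up sample rows (sampleRecords is hypothetical and stands in for the parsed file). Since mean itself returns a Maybe, mean <$> yields a nested Maybe (Maybe Double); the sketch uses >>= to flatten it, which is a choice made here rather than something the answer prescribes.

```haskell
import Text.Read (readMaybe)

-- Same definition as the Field synonym in Text.CSV.
type Field = String

indexField :: [[Field]] -> Int -> [Field]
indexField records index = map (!! index) records

columnAsNumbers :: [Field] -> Maybe [Double]
columnAsNumbers = mapM readMaybe

mean :: Fractional a => [a] -> Maybe a
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- Hypothetical sample rows standing in for the parsed CSV file.
sampleRecords :: [[Field]]
sampleRecords =
  [ ["alice", "1.5"]
  , ["bob",   "2.5"]
  , ["carol", "5"]
  ]

-- Every field in column 1 parses as a Double, so this is Just 3.0.
numericMean :: Maybe Double
numericMean = columnAsNumbers (indexField sampleRecords 1) >>= mean

-- Column 0 holds names, so mapM stops at the first failed parse: Nothing.
textMean :: Maybe Double
textMean = columnAsNumbers (indexField sampleRecords 0) >>= mean
```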

If you find yourself often using columnAsNumbers, create an operator, though:

(!!$) :: [[Field]] -> Int -> Maybe [Double]
records !!$ index = columnAsNumbers $ indexField records index
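If the goal is only that a whole column ends up with one type, the two ideas above can be combined: try to read every field as a number and fall back to plain strings otherwise. The following is a rough sketch under that assumption; the Column type and inferColumn are hypothetical names, not part of Text.CSV or the original answer.

```haskell
import Text.Read (readMaybe)

-- Same definition as the Field synonym in Text.CSV.
type Field = String

-- A column is either entirely numeric or kept as raw text.
data Column = NumberColumn [Double] | TextColumn [String]
  deriving (Show, Eq)

-- Use Double if every field parses; otherwise keep the strings.
-- Note that an empty column is reported as NumberColumn [].
inferColumn :: [Field] -> Column
inferColumn fields =
  maybe (TextColumn fields) NumberColumn (mapM readMaybe fields)
```

The "inference" still happens at run time on values; the static type is always Column, and pattern matching on it tells you which case you got.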


Source: https://stackoverflow.com/questions/53956462/get-column-in-haskell-csv-and-infer-the-column-type
