Question
Yesterday I tried to write a simple RSS downloader in Haskell with the help of the Network.HTTP and Feed libraries. I want to download the link from each RSS item and name the downloaded file after the title of the item.
Here is my short code:
import Control.Monad
import Control.Applicative
import Network.HTTP
import Text.Feed.Import
import Text.Feed.Query
import Text.Feed.Types
import Data.Maybe
import qualified Data.ByteString as B
import Network.URI (parseURI, uriToString)

getTitleAndUrl :: Item -> (Maybe String, Maybe String)
getTitleAndUrl item = (getItemTitle item, getItemLink item)

downloadUri :: (String,String) -> IO ()
downloadUri (title,link) = do
  file <- get link
  B.writeFile title file
  where
    get url =
      let uri = case parseURI url of
            Nothing -> error $ "invalid uri" ++ url
            Just u -> u
      in simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples =
  fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString
    <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)
I have reached a state where I get a list of tuples, each containing a name and the corresponding link. I also have a downloadUri function which correctly downloads the given link to a file named after the RSS item title.
I already tried to modify downloadUri to work on (Maybe String, Maybe String) by fmap-ing over get and writeFile, but failed horribly.
How can I apply my downloadUri function to the result of the getTuples function? I want to implement the following main function:
main :: IO ()
main = some magic incantation downloadUri more incantation getTuples
The character encoding of the result of getItemTitle is broken: it puts code points in place of the accented characters. The feed is UTF-8 encoded, and I thought that all Haskell string manipulation functions default to UTF-8. How can I fix this?
Edit:
Thanks for your help, I successfully implemented my main and helper functions. Here is the code:
downloadUri :: (Maybe String,Maybe String) -> IO ()
downloadUri (Just title,Just link) = do
  item <- get link
  B.writeFile title item
  where
    get url =
      let uri = case parseURI url of
            Nothing -> error $ "invalid uri" ++ url
            Just u -> u
      in simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = print "Somewhere something went Nothing"

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples =
  fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString
    <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
downloadAllItems _ = error "feed does not get parsed"

main = getTuples >>= downloadAllItems
The character encoding issue has been partially solved: I put decodeString before the feed parsing, so the files get named properly. But if I want to print the result out, the issue still happens. Minimal working example:
main = getTuples
Answer 1:
It sounds like it's the Maybes that are giving you trouble. There are many ways to deal with Maybe values, and some useful library functions like fromMaybe and fromJust. However, the simplest way is to pattern match on the Maybe value. We can tweak your downloadUri function to work with the Maybe values. Here's an example:
downloadUri :: (Maybe String, Maybe String) -> IO ()
-- Both the title and the link are present: download and save.
downloadUri (Just title, Just link) = do
  file <- get link
  B.writeFile title file
  where
    get url =
      let uri = case parseURI url of
            Nothing -> error $ "invalid uri: " ++ url
            Just u -> u
      in simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
-- Any other combination means at least one value was Nothing.
downloadUri _ = error "One of my parameters was Nothing"
Or maybe you can let the title default to blank, in which case you could insert this just before the last line in the previous example:
downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
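The same kind of default can also be expressed with fromMaybe from Data.Maybe, which the question already imports. The following is only a minimal sketch: the helper name fileNameFor and the fallback name "untitled" are made up for illustration. A non-empty placeholder is assumed here because writing to an empty file name would most likely fail.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper (not part of the original code): choose the file
-- name for an item, falling back to a placeholder when the title is missing.
fileNameFor :: Maybe String -> FilePath
fileNameFor = fromMaybe "untitled"
```

With such a helper, a clause like downloadUri (mTitle, Just link) could write to fileNameFor mTitle instead of matching on the title separately.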
Now the only Maybe you need to work with is the outer one, applied to the list of tuples. Again, we can pattern match. It might be clearest to write a helper function like this:
downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...
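One possible way to fill in the hint is to strip both layers of Maybe in a pure function first and run the downloads afterwards. This is a rough sketch, not the only option: completeItems is a hypothetical name, and items with a missing title or link are silently skipped, which is an assumption (reporting them may be preferable).

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper: keep only the items that have both a title and a
-- link. A feed that failed to parse (Nothing) is treated as an empty list.
completeItems :: Maybe [(Maybe String, Maybe String)] -> [(String, String)]
completeItems feed =
  [ (title, link) | (Just title, Just link) <- fromMaybe [] feed ]
```

Given a downloadUri that takes a (String, String) pair, as in the question's first version, downloadAllItems could then be written as mapM_ downloadUri . completeItems.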
As for your encoding issue, my guesses are:
- You're reading the information from a file that isn't UTF-8 encoded, or your system doesn't realise that it's UTF-8 encoded.
- You are reading the information correctly, but it gets messed up when you output it.
In order to help you with this problem, I need to see a full code example, which shows how you're reading the information and how you output it.
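If the second guess applies, one thing worth checking is how the titles are printed. print (and GHCi's display of a result) goes through show, which escapes non-ASCII characters in a String as numeric code points, whereas putStrLn writes the characters themselves using the output handle's encoding. The sketch below illustrates the difference; the sample title and the names sampleTitle, shownTitle and printTitle are made up for illustration.

```haskell
-- Made-up sample title containing an accented character (U+00E9).
sampleTitle :: String
sampleTitle = "caf\233"

-- What print would display: show escapes the non-ASCII character,
-- giving the ten-character text "caf\233" including the quotes.
shownTitle :: String
shownTitle = show sampleTitle

-- Print the title itself rather than its escaped representation.
-- The result depends on the terminal and locale encoding.
printTitle :: IO ()
printTitle = putStrLn sampleTitle
```

So a title that looks broken when the whole list of tuples is shown may still be a correctly decoded String; printing each title with putStrLn is one way to check.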
Answer 2:
Your main could look something like the code shown below. There may be a more concise way to compose these two operations, though:
main :: IO ()
main = getTuples >>= process
  where
    process (Just lst) = foldl (\s v -> do {t <- s; download v}) (return ()) lst
    process Nothing = return ()
    download (Just t, Just l) = downloadUri (t,l)
    download _ = return ()
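A more concise variant replaces the fold with mapM_, which runs an action on each list element in order and discards the results. In this sketch the download action is passed in as a parameter so that the traversal can be tried without network access; processWith is a hypothetical name, and in the program above the action would be downloadUri.

```haskell
-- Run the given action on every item that has both a title and a link.
-- Incomplete items, and a feed that failed to parse, are skipped.
processWith :: Monad m
            => ((String, String) -> m ())
            -> Maybe [(Maybe String, Maybe String)]
            -> m ()
processWith _      Nothing    = return ()
processWith action (Just lst) = mapM_ step lst
  where
    step (Just t, Just l) = action (t, l)
    step _                = return ()
```

With the original downloadUri :: (String, String) -> IO (), main would become getTuples >>= processWith downloadUri.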
Source: https://stackoverflow.com/questions/17038947/simple-rss-downloader-in-haskell