Simple RSS downloader in Haskell

Anonymous (unverified), submitted 2019-12-03 01:05:01

Question:

Yesterday I tried to write a simple RSS downloader in Haskell with the help of the Network.HTTP and Feed libraries. I want to download the link from each RSS item and name the downloaded file after the item's title.

Here is my short code:

    import Control.Monad
    import Control.Applicative
    import Network.HTTP
    import Text.Feed.Import
    import Text.Feed.Query
    import Text.Feed.Types
    import Data.Maybe
    import qualified Data.ByteString as B
    import Network.URI (parseURI, uriToString)

    getTitleAndUrl :: Item -> (Maybe String, Maybe String)
    getTitleAndUrl item = (getItemTitle item, getItemLink item)

    downloadUri :: (String,String) -> IO ()
    downloadUri (title,link) = do
      file <- get link
      B.writeFile title file
        where
          get url = let uri = case parseURI url of
                          Nothing -> error $ "invalid uri" ++ url
                          Just u -> u in
                    simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody

    getTuples :: IO (Maybe [(Maybe String, Maybe String)])
    getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

I have reached a state where I get a list of tuples, each containing a name and the corresponding link. I also have a downloadUri function which properly downloads the given link to a file named after the RSS item's title.

I already tried to modify downloadUri to work on (Maybe String, Maybe String) by fmap-ing over get and writeFile, but failed horribly.

  • How can I apply my downloadUri function to the result of the getTuples function? I want to implement the following main function:

    main :: IO ()
    main = some magic incantation downloadUri more incantation getTuples

  • The character encoding of the result of getItemTitle is broken: it puts code points in place of the accented characters. The feed is UTF-8 encoded, and I thought that all Haskell string manipulation functions default to UTF-8. How can I fix this?

Edit:

Thanks for your help, I successfully implemented my main and helper functions. Here is the code:

    downloadUri :: (Maybe String,Maybe String) -> IO ()
    downloadUri (Just title,Just link) = do
      item <- get link
      B.writeFile title item
        where
          get url = let uri = case parseURI url of
                          Nothing -> error $ "invalid uri" ++ url
                          Just u -> u in
                    simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
    downloadUri _ = print "Somewhere something went Nothing"

    getTuples :: IO (Maybe [(Maybe String, Maybe String)])
    getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

    downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
    downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
    downloadAllItems _ = error "feed does not get parsed"

    main = getTuples >>= downloadAllItems

The character encoding issue has been partially solved: I put decodeString before the feed parsing, so the files get named properly. But if I print the titles out, the issue still occurs. Minimal working example:

    main = getTuples

Answer 1:

It sounds like it's the Maybes that are giving you trouble. There are many ways to deal with Maybe values, and some useful library functions like fromMaybe and fromJust. However, the simplest way is to do pattern matching on the Maybe value. We can tweak your downloadUri function to work with the Maybe values. Here's an example:

    downloadUri :: (Maybe String, Maybe String) -> IO ()
    downloadUri (Just title, Just link) = do
      file <- get link
      B.writeFile title file
        where
          get url = let uri = case parseURI url of
                          Nothing -> error $ "invalid uri" ++ url
                          Just u -> u in
                    simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
    downloadUri _ = error "One of my parameters was Nothing"

Or maybe you can let the title default to blank, in which case you could insert this just before the last line in the previous example:

    downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
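
The same defaulting can also be written with fromMaybe, which was mentioned above. The following is only a sketch: the fallback name "untitled" and the helper fileNameFor are hypothetical and not part of the original code. A fallback is used instead of a blank title here because an empty file name is likely to make B.writeFile fail.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical fallback name; not part of the original code.
defaultTitle :: String
defaultTitle = "untitled"

-- Choose the file name for an item: the title when it is present and
-- non-empty, otherwise the fallback.
fileNameFor :: Maybe String -> FilePath
fileNameFor mTitle =
  case fromMaybe "" mTitle of
    ""    -> defaultTitle
    title -> title

main :: IO ()
main = mapM_ (putStrLn . fileNameFor) [Just "news", Nothing, Just ""]
```

With such a helper, downloadUri would only need to pattern match on the link and could pass `fileNameFor title` to B.writeFile.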

Now the only Maybe you need to work with is the outer one, applied to the list of tuples. Again, we can pattern match. It might be clearest to write a helper function like this:

    downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
    downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...
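
One possible way to fill in the hint is sketched below, using mapM_ (the variant of mapM that discards the results). To keep the sketch runnable without network access, downloadUri is replaced by a stand-in that only prints what it would do; describeItem and the example.com URLs are made up for illustration.

```haskell
-- Pure description of what would happen to one item (hypothetical helper).
describeItem :: (Maybe String, Maybe String) -> String
describeItem (Just title, Just link) = "would save " ++ link ++ " as " ++ title
describeItem _                       = "skipping item with a missing title or link"

-- Stand-in for the real downloadUri: prints instead of downloading.
downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri = putStrLn . describeItem

-- Run downloadUri on every item if the feed was parsed; report otherwise.
downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just ts) = mapM_ downloadUri ts
downloadAllItems Nothing   = putStrLn "the feed could not be parsed"

main :: IO ()
main = downloadAllItems (Just
  [ (Just "a", Just "http://example.com/a")
  , (Nothing,  Just "http://example.com/b")
  ])
```

Whether the Nothing case should print a message, do nothing, or raise an error is a design choice; printing is used here only to make the behaviour visible.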

As for your encoding issue, my guesses are:

  1. You're reading the information from a file that isn't UTF-8 encoded, or your system doesn't realise that it's UTF-8 encoded.
  2. You are reading the information correctly, but it gets messed up when you output it.

In order to help you with this problem, I need to see a full code example, which shows how you're reading the information and how you output it.
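
If the second guess applies, one likely explanation is the output function rather than the data: print (and GHCi when it displays a result) goes through show, which escapes non-ASCII characters in a String as numeric code points, while putStrLn writes the characters themselves. The sketch below illustrates the difference; the sample title is made up, and forcing UTF-8 on stdout is an assumption about what the terminal expects.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

main :: IO ()
main = do
  -- Assumption: the terminal expects UTF-8, so set it explicitly
  -- instead of relying on the locale.
  hSetEncoding stdout utf8
  let title = "Magyar h\237rek"  -- \237 is the code point of 'í'
  print title     -- uses show, so the accented letter appears as \237
  putStrLn title  -- writes the accented letter itself
```

In this reading, the decoded titles are already correct, and only the way they are displayed needs to change.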



Answer 2:

Your main could be something like the one shown below. There may be a more concise way to compose these two operations, though:

    main :: IO ()
    main = getTuples >>= process
           where
               process (Just lst) = foldl (\s v -> do {t <- s; download v}) (return ()) lst
               process Nothing = return ()
               download (Just t, Just l) = downloadUri (t,l)
               download _ = return ()
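
A possibly more concise composition is sketched below. It treats an unparsed feed as an empty list, keeps only the items where both title and link are present, and runs the download on each with mapM_. The helper complete is hypothetical, and downloadUri and getTuples are replaced by stand-ins (with made-up example.com URLs) so that the sketch runs on its own.

```haskell
import Control.Applicative (liftA2)
import Data.Maybe (fromMaybe, mapMaybe)

-- Keep an item only when both the title and the link are present.
complete :: (Maybe String, Maybe String) -> Maybe (String, String)
complete (mTitle, mLink) = liftA2 (,) mTitle mLink

-- Stand-in for the question's downloadUri: prints instead of downloading.
downloadUri :: (String, String) -> IO ()
downloadUri (title, link) = putStrLn (link ++ " -> " ++ title)

-- Stand-in for the question's getTuples, returning fixed sample data.
getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = return (Just
  [ (Just "a", Just "http://example.com/a")
  , (Nothing,  Just "http://example.com/b")
  ])

main :: IO ()
main = getTuples >>= mapM_ downloadUri . mapMaybe complete . fromMaybe []
```

Like the version above, this silently skips incomplete items and does nothing when the feed fails to parse; if those cases should be reported, explicit pattern matching is probably clearer.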

