HXT getting first element: refactor weird arrow

Posted by 我是研究僧i on 2020-01-15 06:43:23

Question


I need to get the text content of the first <p> that is a child of <div class="about">. I wrote the following code:

tagTextS :: IOSArrow XmlTree String
tagTextS = getChildren >>> getText >>> arr stripString

parseDescription :: IOSArrow XmlTree String
parseDescription =
  (
   deep (isElem >>> hasName "div" >>> hasAttrValue "id" (== "company_about_full_description"))
   >>> (arr (\x -> x) /> isElem  >>> hasName "p") >. (!! 0) >>> tagTextS
  ) `orElse` (constA "")

Note the arr (\x -> x): without it I wasn't able to get a result.

  • Is there a better way to write parseDescription?
  • Also, why do I need the parentheses before arr and after hasName "p"? (I actually found this solution here.)

Answer 1:


Here is another proposal that uses only HXT core, as you asked.

You cannot pick out the first child by post-processing the output of getChildren, because HXT's (>>>) applies each subsequent arrow to every item in the preceding arrow's output, not to the output list as a whole. This is explained on the HaskellWiki HXT page, although the definition shown there is an old one; (>>>) is actually derived from Category's (.) composition.
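To make the point about (>>>) concrete, here is a minimal sketch that models a list arrow as a plain function returning a list. It is a toy model only: ListArrow, Node, andThen, overResults and the other names are made up for illustration and are not part of HXT. In HXT itself the role of overResults is played by the (>>.) operator from Control.Arrow.ArrowList.

```haskell
-- Toy model of a list arrow. All names here are illustrative, not HXT's API.
newtype ListArrow a b = ListArrow { runListArrow :: a -> [b] }

-- A tiny XML-like tree: an element with children, or a text leaf.
data Node = Elem String [Node] | Text String
  deriving (Eq, Show)

-- Models (>>>): the second arrow is applied to every result of the first.
andThen :: ListArrow a b -> ListArrow b c -> ListArrow a c
andThen (ListArrow f) (ListArrow g) = ListArrow (concatMap g . f)

-- Models (>>.): a plain list function is applied to the whole result list.
overResults :: ListArrow a b -> ([b] -> [c]) -> ListArrow a c
overResults (ListArrow f) h = ListArrow (h . f)

-- Models getChildren: one result per child node.
children :: ListArrow Node Node
children = ListArrow go
  where
    go (Elem _ cs) = cs
    go (Text _)    = []

-- A per-item step never sees its siblings, so "take 1" keeps every child.
perItemFirst :: ListArrow Node Node
perItemFirst = children `andThen` ListArrow (\c -> take 1 [c])

-- Working on the whole result list is what selects a single child.
wholeListFirst :: ListArrow Node Node
wholeListFirst = children `overResults` take 1

-- Hypothetical sample tree with two <p> children.
sample :: Node
sample = Elem "div" [Elem "p" [Text "one"], Elem "p" [Text "two"]]
```

In GHCi, runListArrow perItemFirst sample returns both <p> nodes, while runListArrow wholeListFirst sample returns only the first one.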

A getNthChild arrow can be adapted from the definition of getChildren in Control.Arrow.ArrowTree:

import Control.Arrow.ArrowList (ArrowList, arrL)
import Data.Tree.Class (Tree)
import qualified Data.Tree.Class as T

-- If the nth child does not exist, the arrow yields no results.

getNthChild :: (ArrowList a, Tree t) => Int -> a (t b) (t b)
getNthChild n = arrL (take 1 . drop n . T.getChildren)

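Then your parseDescription could take this form:

-- isElem, hasName, hasAttrValue and getText come from
-- Text.XML.HXT.Arrow.XmlArrow (re-exported by Text.XML.HXT.Core)

-- The explicit signature avoids the monomorphism restriction.
parseDescription :: ArrowXml a => a XmlTree String
parseDescription =
    deep (isElem >>> hasName "div" >>> hasAttrValue "class" (== "about")
          >>> getNthChild 0 >>> hasName "p"
          )
    >>> getChildren >>> getText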

Update. I found another way using changeChildren:

getNthChild :: (ArrowTree a, Tree t) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n) >>> getChildren

Update: to skip the whitespace text nodes that sit between elements, filter out non-element children:

import qualified Text.XML.HXT.DOM.XmlNode as XN

getNthChild :: (ArrowTree a, Tree t, XN.XmlNode b) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren
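As a rough illustration of why the last update matters, the sketch below runs both changeChildren variants on a small document whose <div> starts with a newline and indentation. It assumes the hxt package and parses the string with xread under runLA. The names getNthElemChild, sampleHtml and firstParagraph, and the sample input itself, are made up for this example. The unfiltered variant sees the whitespace text node as child 0, so hasName "p" rejects it; the filtered variant counts only element children.

```haskell
import Text.XML.HXT.Core
import Data.Tree.Class (Tree)
import qualified Text.XML.HXT.DOM.XmlNode as XN

-- Variant without the element filter: text nodes count as children too.
getNthChild :: (ArrowTree a, Tree t) => Int -> a (t b) (t b)
getNthChild n = changeChildren (take 1 . drop n) >>> getChildren

-- Variant from the last update (renamed here): only element children count.
getNthElemChild :: (ArrowTree a, Tree t, XN.XmlNode b) => Int -> a (t b) (t b)
getNthElemChild n =
    changeChildren (take 1 . drop n . filter XN.isElem) >>> getChildren

-- Hypothetical input: note the newline and indentation before each <p>.
sampleHtml :: String
sampleHtml = "<div class=\"about\">\n  <p>first</p>\n  <p>second</p>\n</div>"

-- Returns the text of the child chosen by `pick`, provided it is a <p>.
firstParagraph :: LA XmlTree XmlTree -> [String]
firstParagraph pick = runLA
    ( xread
      >>> deep (isElem >>> hasName "div" >>> hasAttrValue "class" (== "about")
                >>> pick >>> hasName "p")
      >>> getChildren >>> getText
    ) sampleHtml
```

Compare firstParagraph (getNthChild 0) with firstParagraph (getNthElemChild 0) in GHCi to see the difference on your own input.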



Answer 2:


With XPath it could look something like this:

{-# LANGUAGE PackageImports #-}  -- needed for the package-qualified import below
import "hxt-xpath" Text.XML.HXT.XPath.Arrows (getXPathTrees)

...

xp = "//div[@class='about']/p[1]"

parseDescription = getXPathTrees xp >>> getChildren >>> getText


Source: https://stackoverflow.com/questions/23310769/hxt-getting-first-element-refactor-weird-arrow
