Discard html tags within custom tags while getting text in XHTML using SAX Parser in Groovy

萌比男神i 2021-01-24 21:56

I am trying to get the text between the tags, and so far I have been successful. But sometimes, when there are special characters or HTML tags inside my custom tags, I am unable to

2 answers
  •  萌比男神i
    2021-01-24 22:19

    Since you have now asked this question for several different libraries, here is a solution using XmlParser. The author of this XML perhaps did not have the best understanding of how XML works. If I were you, I would rather put some filtering in place to make this sane again (e.g. rewriting each ...Begin/...End marker pair as a proper element that wraps its text).

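    // Sample input. The ...Begin/...End marker names match the keys printed below;
    // the root element name and the inline <b> tag are illustrative assumptions.
    def xml = '''\
    <records>
        <car>
            <ae_definedTermTitleBegin/>Australia<ae_definedTermTitleEnd/>
            <ae_clauseTitleBegin/>1.02 <b>Accounting Terms</b>.<ae_clauseTitleEnd/>
        </car>
        <car>
            <ae_definedTermTitleBegin/>Isle of Man<ae_definedTermTitleEnd/>
            <ae_clauseTitleBegin/>Smallest Street-Legal Car at 99cm wide and 59 kg in weight<ae_clauseTitleEnd/>
        </car>
        <car>
            <ae_definedTermTitleBegin/>France<ae_definedTermTitleEnd/>
            <ae_clauseTitleBegin/>Most Valuable Car at $15 million<ae_clauseTitleEnd/>
        </car>
    </records>
    '''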
    
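    // Collect the text found between each <nameBegin/> and <nameEnd/> marker, keyed by name.
    def underp = { l ->
        l.inject([texts: [:]]) { r, it ->
            if (it.respondsTo('name') && it.name().endsWith('Begin')) {
                // Opening marker: remember the key and start with an empty string.
                r.texts[(r.last=it.name().replaceFirst(/Begin$/,''))] = ''
            } else if (it.respondsTo('name') && it.name().endsWith('End')) {
                // Closing marker: stop collecting.
                r.last = null
            } else if (r.last) {
                // Between markers: append plain text as is, or only the text of a nested element.
                r.texts[r.last] += (it instanceof String) ? it : it.text()
            }
            r
        }.texts
    }

    // Groovy 3+ needs "import groovy.xml.XmlParser"; Groovy 2.x imports groovy.util.XmlParser by default.
    def root = new XmlParser().parseText(xml)
    // The children of each <car> are a flat mix of marker nodes, strings, and nested elements.
    root.car.each{
        println underp(it.children()).inspect()
    }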
    

    prints

    ['ae_definedTermTitle':'Australia', 'ae_clauseTitle':'1.02 Accounting Terms.']
    ['ae_definedTermTitle':'Isle of Man', 'ae_clauseTitle':'Smallest Street-Legal Car at 99cm wide and 59 kg in weight']
    ['ae_definedTermTitle':'France', 'ae_clauseTitle':'Most Valuable Car at $15 million']
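
The filtering idea mentioned at the top of this answer could look roughly like the sketch below. It is not part of the original solution: the `normalize` closure, the regular expression, and the sample input (including the inline `<b>` tag) are assumptions made for illustration. It also assumes the marker pairs are never nested and always carry matching names, and it uses XmlSlurper from Groovy 3 or later. Once the markers are real elements, `text()` alone is enough to drop the inline tags.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x use groovy.util.XmlSlurper

// Hypothetical sample input; the element names are illustrative.
def xml = '''\
<records>
    <car>
        <ae_definedTermTitleBegin/>Australia<ae_definedTermTitleEnd/>
        <ae_clauseTitleBegin/>1.02 <b>Accounting Terms</b>.<ae_clauseTitleEnd/>
    </car>
</records>
'''

// Turn each <nameBegin/>...<nameEnd/> marker pair into <name>...</name>.
// (?s) lets '.' match newlines; \1 requires the same name on both markers.
def normalize = { String s ->
    s.replaceAll(/(?s)<(\w+)Begin\s*\/>(.*?)<\1End\s*\/>/, '<$1>$2</$1>')
}

def root = new XmlSlurper().parseText(normalize(xml))

// text() concatenates all nested text, so inline tags such as <b> drop out.
def result = root.car.collect { car ->
    car.children().collectEntries { [(it.name()): it.text()] }
}
println result.inspect()
```

A regex rewrite like this is only reasonable while the markers stay this regular. If they can nest or overlap, walking the children as in the code above is the safer route.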
    
