Parse HTML with an Ant script

Happy的楠姐 2020-12-06 14:05

I need to retrieve some values from an HTML file. I need to use Ant so I can use these values in other parts of my script.

Can this even be achieved in Ant?

4 answers
  •  予麋鹿
    予麋鹿 (original poster)
    2020-12-06 14:22

    As the other answers say, you can't do this in "pure" XML. You need to embed a programming language. My personal favourite is Groovy; its integration with Ant is excellent.

    Here's a sample that retrieves the logo URL from the Groovy homepage:

    parse:
    
    print:
         [echo] 
         [echo]         Logo URL: http://groovy.codehaus.org/images/groovy-logo-medium.png
         [echo]     
    

    build.xml

    The build uses the Ivy plug-in to retrieve all third-party dependencies.
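
    A minimal sketch of how a build file along these lines might be wired, assuming the target names parse and print that appear in the output above. The project name, the build.path id and the file name parse.groovy (a file holding the Groovy script from this answer) are illustrative assumptions.

    ```xml
    <project name="html-parse-demo" default="print" xmlns:ivy="antlib:org.apache.ivy.ant">

        <!-- Resolve the dependencies declared in ivy.xml and expose them as an Ant path.
             The path id "build.path" is an illustrative name. -->
        <target name="resolve">
            <ivy:resolve/>
            <ivy:cachepath pathid="build.path"/>
        </target>

        <!-- Define the Groovy task from the resolved jars, then run the parsing script.
             parse.groovy is a hypothetical file holding the Groovy script from this answer. -->
        <target name="parse" depends="resolve">
            <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>
            <groovy src="parse.groovy"/>
        </target>

        <!-- Print the property that the script stored via properties["logo"]. -->
        <target name="print" depends="parse">
            <echo>
            Logo URL: ${logo}
            </echo>
        </target>

    </project>
    ```

    Keeping the script in its own file is only one option; the same Groovy code can also be placed directly inside the groovy element.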

    
    
        
            
            
        
    
        
            
    
            
            import org.htmlcleaner.*
    
            def address = 'http://groovy.codehaus.org/'
    
            // Clean any messy HTML
            def cleaner = new HtmlCleaner()
            def node = cleaner.clean(address.toURL())
    
            // Convert from HTML to XML
            def props = cleaner.getProperties()
            def serializer = new SimpleXmlSerializer(props)
            def xml = serializer.getXmlAsString(node)
    
            // Parse the XML into a document we can work with
            def page = new XmlSlurper(false,false).parseText(xml)
    
            // Retrieve the logo URL
            properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
            
        
    
        
            
            Logo URL: ${logo}
            
        
    
    
    

    The parsing logic is pure Groovy. I love the way you can easily walk the page's DOM tree:

    // Retrieve the logo URL
    properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
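
    A positional path like this one is tied to the page's current layout. As a sketch of an alternative, the build file below searches the tree by element name and attribute instead. It assumes a Groovy 2.x jar is already on Ant's classpath (for example in $HOME/.ant/lib), and the inline HTML and the id "logo" are made up for illustration.

    ```xml
    <project name="gpath-demo" default="demo">

        <target name="demo">
            <!-- Assumes the Groovy jar is visible to Ant without an explicit classpath -->
            <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy"/>

            <groovy><![CDATA[
                // Illustrative document; a real build would parse the cleaned page instead
                def page = new XmlSlurper(false, false).parseText(
                    '<html><body><div><div><img id="logo" src="images/logo.png"/></div></div></body></html>')

                // Search the whole tree for the first img whose id is "logo",
                // rather than relying on its exact position
                def img = page.depthFirst().find { it.name() == 'img' && it.@id.text() == 'logo' }

                // Store the attribute value as an Ant property
                properties['logo'] = img.@src.text()
            ]]></groovy>

            <echo>Logo URL: ${logo}</echo>
        </target>

    </project>
    ```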
    

    ivy.xml

    Ivy is similar to Maven: it manages your dependencies on third-party software. Here it's used to pull down Groovy and the HtmlCleaner library that the Groovy logic uses:
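
    A minimal sketch of an ivy.xml that declares these two dependencies. The organisation and module names are placeholders, and the versions are assumptions rather than the ones originally used.

    ```xml
    <ivy-module version="2.0">
        <!-- organisation and module are placeholder names -->
        <info organisation="com.example" module="html-parse-demo"/>

        <dependencies>
            <!-- Versions are assumptions; pin whichever releases you have tested -->
            <dependency org="org.codehaus.groovy" name="groovy-all" rev="2.4.21" conf="default"/>
            <dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2" conf="default"/>
        </dependencies>
    </ivy-module>
    ```

    Mapping to the default configuration keeps Ivy from also downloading source and javadoc artifacts.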

    
        
        
            
        
        
            
            
        
    
    

    How to install ivy

    Ivy is a standard Ant plugin. Download its jar and place it in one of the following directories:

    $HOME/.ant/lib
    $ANT_HOME/lib
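
    The download can also be scripted. A sketch of a small stand-alone build file that fetches the jar into the per-user directory; the Ivy version and the Maven Central URL are assumptions, so adjust them to the release you want.

    ```xml
    <project name="ivy-bootstrap" default="bootstrap">

        <!-- Version and download location are assumptions -->
        <property name="ivy.version" value="2.5.0"/>
        <property name="ivy.jar.dir" value="${user.home}/.ant/lib"/>

        <target name="bootstrap" description="Download the Ivy jar into the per-user Ant lib directory">
            <mkdir dir="${ivy.jar.dir}"/>
            <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"
                 dest="${ivy.jar.dir}/ivy-${ivy.version}.jar"
                 usetimestamp="true"/>
        </target>

    </project>
    ```

    Ant picks the jar up the next time it starts, so run this once before the main build.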
    

    I don't know why the Ant project doesn't ship with Ivy.
