How to parse an HTML string in Google Apps Script without using XmlService?

清歌不尽 2020-12-14 09:12

I want to create a scraper using Google Spreadsheets with Google Apps Script. I know it is possible and I have seen some tutorials and threads about it.

The main ide

8 Answers
  •  既然无缘
    2020-12-14 09:39

    I had some good luck today just by massaging the html:

    // close unclosed tags
    html = html.replace(/(<(?=link|meta|br|input)[^>]*)(?<!\/)>/ig, '$1/>')
    // force script / style content into cdata
    html = html.replace(/(<(script|style)[^>]*>)/ig, '$1<![CDATA[').replace(/(<\/(script|style)[^>]*>)/ig, ']]>$1')
    // change & to &amp;
    html = html.replace(/&(?!amp;)/g, '&amp;')
    // now it works! (tested with original url)
    let document = XmlService.parse(html)
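
To see what the three replacements do before handing the string to XmlService, here is a minimal sketch that wraps them in one function and runs them on a small sample. The function name `massageHtml` and the sample markup are assumptions made for illustration; the regular expressions are the ones from the answer above. The lookbehind `(?<!\/)` needs a JavaScript engine that supports it, which is assumed here.

```javascript
// Hypothetical helper: applies the answer's three clean-up steps in order.
function massageHtml(html) {
  return html
    // 1. self-close void tags (<link>, <meta>, <br>, <input>) unless already closed
    .replace(/(<(?=link|meta|br|input)[^>]*)(?<!\/)>/ig, '$1/>')
    // 2. wrap script / style bodies in CDATA so "<" inside them is not parsed as markup
    .replace(/(<(script|style)[^>]*>)/ig, '$1<![CDATA[')
    .replace(/(<\/(script|style)[^>]*>)/ig, ']]>$1')
    // 3. escape every "&" that is not already the start of "&amp;"
    .replace(/&(?!amp;)/g, '&amp;');
}

// Made-up sample input, not taken from the original question.
const sample = '<p>a & b<br><script>if (a < b) {}</script></p>';
const cleaned = massageHtml(sample);
// cleaned is now:
// <p>a &amp; b<br/><script><![CDATA[if (a < b) {}]]></script></p>
```

In Apps Script the result would then go to `XmlService.parse(cleaned)`. Note that step 3 also rewrites other entities, so `&nbsp;` becomes `&amp;nbsp;`; that keeps the XML parser from rejecting undefined entities, but the text you read back will contain the literal entity name.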
    
