How to parse an HTML string in Google Apps Script without using XmlService?

清歌不尽 2020-12-14 09:12

I want to create a scraper using Google Spreadsheets with Google Apps Script. I know it is possible and I have seen some tutorials and threads about it.

The main ide

8 answers
  •  长情又很酷
    2020-12-14 09:37

    Please be aware that some websites do not permit automated scraping of their content, so consult their terms of service before using Apps Script to extract it.

    The XmlService only works against valid XML documents, and most HTML (especially HTML5) is not valid XML. A previous version of the XmlService, simply called Xml, allowed for "lenient" parsing, which would allow it to parse HTML as well. This service was sunset in 2013, but for the time being still functions. The reference docs are no longer available, but this old tutorial shows its usage.
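    To see the limitation in practice, here is a minimal sketch that wraps XmlService.parse in a try/catch and reports failure instead of throwing. The helper name tryParseAsXml and its optional second parameter are assumptions made for illustration (the parameter only exists so the function can be exercised outside Apps Script); XmlService.parse itself is the real Apps Script call.

```javascript
// Hypothetical helper (not part of Apps Script): returns the parsed
// Document, or null when the markup is not well-formed XML.
// `xmlService` is an illustrative parameter; inside Apps Script it can be
// omitted and the global XmlService is used.
function tryParseAsXml(markup, xmlService) {
  const service = xmlService || XmlService;
  try {
    return service.parse(markup);
  } catch (e) {
    // XmlService.parse throws on input that is not valid XML,
    // which is the usual outcome for real-world HTML.
    return null;
  }
}
```

    Typical HTML such as '<p>one<br>two</p>' leaves the br element unclosed, so a call like tryParseAsXml('<p>one<br>two</p>') would be expected to return null, while well-formed XHTML should parse.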

    Another alternative is to use a service like Kimono, which handles the scraping and parsing parts and provides a simple API you can call via UrlFetchApp to retrieve the structured data.
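    Calling such a service from Apps Script could look roughly like the sketch below. It assumes the service returns JSON. The function name fetchStructuredData, the apiUrl argument, and the optional fetcher parameter are all hypothetical; no specific Kimono endpoint or response shape is implied. UrlFetchApp.fetch, getResponseCode, and getContentText are the standard Apps Script calls.

```javascript
// Hypothetical helper: fetches JSON from a scraping service's API and
// returns it as a plain object. `apiUrl` is whatever endpoint the service
// gives you; `fetcher` is an illustrative parameter that defaults to the
// Apps Script global UrlFetchApp.
function fetchStructuredData(apiUrl, fetcher) {
  const client = fetcher || UrlFetchApp;
  // muteHttpExceptions lets us inspect non-200 responses ourselves
  // instead of having fetch() throw.
  const response = client.fetch(apiUrl, { muteHttpExceptions: true });
  const status = response.getResponseCode();
  if (status !== 200) {
    throw new Error('Request failed with HTTP status ' + status);
  }
  return JSON.parse(response.getContentText());
}
```

    The returned object can then be written to the spreadsheet with the usual SpreadsheetApp range methods.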
