Question
I am struggling to put together a script that scrapes a JavaScript-rendered web page with Apps Script. I found the question "How to scrape Javascript rendered websites using Javascript?", but I don't know how to put the pieces together, for example how to load puppeteer. Any help would be appreciated.
Answer 1:
You can try scraping the initial HTML. Scraping the rendered HTML is extremely hard to do, because you would have to use a headless browser.
There is this library, https://github.com/tautologistics/node-htmlparser, which you can use to parse HTML from JavaScript. It is written for Node, but because it doesn't use any dependencies, you can just copy and paste the functions you need.
Parsing is not an easy task, I'm afraid.
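As a rough illustration of the "scrape the initial HTML" approach, here is a minimal sketch. It assumes the page is publicly reachable and that the data you want is already present in the HTML the server returns. `fetchInitialHtml` and `extractTagText` are hypothetical helper names, not part of Apps Script or of node-htmlparser, and the regex-based extraction is only a stand-in for a real parser that will break on nested or malformed markup.

```javascript
// Fetches the raw HTML the server sends, before any page JavaScript runs.
// Runs only inside Apps Script, where UrlFetchApp is a built-in service.
function fetchInitialHtml(url) {
  var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
  if (response.getResponseCode() !== 200) {
    throw new Error('Request failed with status ' + response.getResponseCode());
  }
  return response.getContentText();
}

// Hypothetical helper: returns the text of every <tagName>...</tagName>
// element found in the HTML string. This is a naive regex scan, not a parser.
function extractTagText(html, tagName) {
  var pattern = new RegExp(
    '<' + tagName + '\\b[^>]*>([\\s\\S]*?)</' + tagName + '>', 'gi');
  var results = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Drop any nested tags, then collapse runs of whitespace.
    var text = match[1].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    results.push(text);
  }
  return results;
}
```

Inside Apps Script this could be called as `extractTagText(fetchInitialHtml('https://example.com'), 'h2')`. Anything the page inserts with JavaScript after loading will not appear in the result, which is the limitation described above.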
Answer 2:
If you're trying to scrape content generated by JavaScript, I suggest respecting the site's terms of use or trying to find an API.
Source: https://stackoverflow.com/questions/50124981/using-apps-script-to-scrape-javascript-rendered-web-page