Question
I'm trying to fetch all the visible text from a website, and I'm using python-scrapy for this. However, from what I observe, Scrapy only works with HTML tags such as div, body, head, etc., and not with AngularJS tags such as ng-view. If there is any element inside the ng-view tags and I right-click the page and choose View Source, the content inside the tag doesn't appear; it is displayed as <ng-view> </ng-view>. So how can I use Python to scrape the elements within these ng-view tags? Thanks in advance.
Answer 1:
To answer your question
how can I use Python to scrape the elements within these ng-view tags
You can't, at least not with Scrapy alone.
The content you want to scrape is rendered on the client side (in the browser). What Scrapy gets you is just the static content from the server. Your browser then interprets the HTML and runs the JS code, and that JS code then fetches further content from the server and builds the page from it.
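To make the difference concrete, here is a minimal sketch using only the Python standard library. The two HTML strings are made-up stand-ins (not taken from any real site) for what the server sends and for what the DOM might look like after the JS has run; the helper name ng_view_text is also just illustrative.

```python
from html.parser import HTMLParser


class NgViewText(HTMLParser):
    """Collects any text found between <ng-view> and </ng-view>."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while we are inside an <ng-view> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "ng-view":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "ng-view":
            self.inside = False

    def handle_data(self, data):
        text = data.strip()
        if self.inside and text:
            self.chunks.append(text)


def ng_view_text(html):
    """Return the text inside <ng-view>, or '' if there is none."""
    parser = NgViewText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical server response: this is all a plain HTTP fetch gets to see.
STATIC_HTML = "<html><body><h1>Shop</h1><ng-view> </ng-view></body></html>"

# Hypothetical DOM after the browser has executed the page's JS.
RENDERED_HTML = (
    "<html><body><h1>Shop</h1>"
    "<ng-view><p>Blue widget</p><p>$5</p></ng-view>"
    "</body></html>"
)

print(repr(ng_view_text(STATIC_HTML)))    # ''
print(repr(ng_view_text(RENDERED_HTML)))  # 'Blue widget $5'
```

The parser is the same in both cases; only the input differs. A selector cannot find text that was never in the downloaded markup.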
Can it be done?
Yes!
One way is to use some sort of headless browser, like http://phantomjs.org/, to fetch all the content. Once you have the content, you can save it and scrape it as you wish. The catch is that this kind of web scraping is not as easy and straightforward as scraping regular HTML. This is also why search engine crawlers have historically had trouble with pages that render their content via JS.
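A rough sketch of the headless-browser approach in Python follows. Note that it swaps in headless Chrome driven by the third-party selenium package rather than PhantomJS, so it assumes both selenium and Chrome are installed. The function names, the "ng-view *" wait selector, and the 10-second timeout are illustrative choices, not anything prescribed by Scrapy or AngularJS.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, wait_css="ng-view *", timeout=10):
    """Load url in headless Chrome and return the DOM after JS has run.

    Assumption: the selenium package and a local Chrome are available.
    """
    # Imported here so the text helper below still works without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one element exists inside <ng-view>.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class VisibleText(HTMLParser):
    """Collects text nodes, skipping tags whose content is never displayed."""

    SKIP = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside one of the SKIP tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def visible_text(html):
    """Approximate the visible text of a page (ignores CSS-hidden elements)."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Usage, with a URL of your own (left commented out because it needs a browser):
# print(visible_text(fetch_rendered_html("https://your-angular-site.example/")))
```

Once the rendered HTML is in hand, it can also be passed to whatever selectors you already use for regular pages, since at that point it is ordinary markup.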
Source: https://stackoverflow.com/questions/30673447/fetch-text-from-web-with-angular-js-tags-such-as-ng-view