Should I use Yahoo-Pipes to scrape the contents of a div?

Given:

Wanted:

The innerHTML of div id='foo' must be fetched by the client (i.e. JavaScript).
- It will be split into discrete items (i.e. div id='page1' to div id='pageN').
API throttling prevents server-side code from pre-fetching the data, so the parsing and manipulation burden must be placed on the client.

Question:

Could Yahoo-Pipes help format the data for easier consumption?
- The lack of a DOM parser gives me pause.
Are there any existing pipes that could serve as an example?

You can use the YQL module, which allows you to fetch arbitrary URLs and then parse them with XPath. A sample YQL query:

select * from html where url="http://finance.yahoo.com/q?s=yhoo" and
  xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'
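As a rough illustration of how the client could issue a query like the one above, here is a minimal JavaScript sketch that only builds the request URL. The endpoint in `YQL_ENDPOINT` is an assumption (the historical public YQL endpoint, which the answer does not name and which may no longer be reachable), and `buildYqlQuery` and `buildYqlUrl` are hypothetical helper names.

```javascript
// ASSUMPTION: historical public YQL endpoint; not stated in the answer above
// and possibly no longer available.
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: assembles the YQL statement in the same shape as the
// sample query (page URL in double quotes, XPath in single quotes).
function buildYqlQuery(pageUrl, xpath) {
  return `select * from html where url="${pageUrl}" and xpath='${xpath}'`;
}

// Hypothetical helper: URL-encodes the statement and asks for JSON output.
function buildYqlUrl(pageUrl, xpath) {
  const params = new URLSearchParams({
    q: buildYqlQuery(pageUrl, xpath),
    format: "json",
  });
  return `${YQL_ENDPOINT}?${params.toString()}`;
}

const query = buildYqlQuery(
  "http://finance.yahoo.com/q?s=yhoo",
  '//div[@id="yfi_headlines"]/div[2]/ul/li/a'
);
const requestUrl = buildYqlUrl(
  "http://finance.yahoo.com/q?s=yhoo",
  '//div[@id="yfi_headlines"]/div[2]/ul/li/a'
);
console.log(query);
console.log(requestUrl);
```

For the question's case the XPath would presumably be something like `//div[@id="foo"]`, so the service does the DOM selection and the client receives only the matched nodes.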

Yes, it's doable with Y! Pipes. You only need two modules from the 'Operators' section:

First, use the "Sub Element" module to keep only the content.

Then use the "Regex" module to extract the div content, and fetch the result as JSON from your site:

Search:

^.*?<div id="foo">(.*?)</div>.*?$

Replace:
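
To see what the search pattern captures, the sketch below applies it in plain JavaScript rather than in Pipes. Two things here are assumptions, not part of the answer: the `s` (dotAll) flag, added so that `.` also matches newlines, and the use of the first capture group as the result. `extractFoo` is a hypothetical helper name.

```javascript
// The answer's search pattern, with the "s" flag added (ASSUMPTION) so that
// "." matches across line breaks in multi-line HTML.
const DIV_FOO = /^.*?<div id="foo">(.*?)<\/div>.*?$/s;

// Hypothetical helper: returns the text of capture group 1, or null when the
// pattern does not match.
function extractFoo(html) {
  const match = DIV_FOO.exec(html);
  return match ? match[1] : null;
}

// A div without nested divs is captured in full.
const flat = '<html><body>\n<div id="foo">hello <b>world</b></div>\n</body></html>';
console.log(extractFoo(flat)); // hello <b>world</b>

// With nested divs, the non-greedy group stops at the FIRST closing </div>.
const nested = '<div id="foo"><div id="page1">one</div><div id="page2">two</div></div>';
console.log(extractFoo(nested)); // <div id="page1">one
```

The second call shows the limit of the regex approach for the layout described in the question: because div id='foo' contains the page1 to pageN divs, the capture ends at the first inner `</div>` and the remaining pages are lost. An XPath-based selection does not have this problem.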

Source: https://stackoverflow.com/questions/1095557/should-i-use-yahoo-pipes-to-scrape-the-contents-of-a-div
