How do I screen scrape a website and get data within div?

Posted by 橙三吉。 on 2019-12-13 03:24:25

Question


How can I screen scrape a website using cURL and show the data within a specific div?


Answer 1:


Download the page using cURL (there are a lot of examples in the documentation). Then use a DOM parser, for example Simple HTML DOM or PHP's DOM extension, to extract the value from the div element.
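As a rough illustration of this approach, here is a minimal sketch that assumes PHP with the cURL and DOM extensions enabled. The helper names `fetchPage` and `divTextById`, the example URL, and the div id `content` are all placeholders invented for this example, not something from the original answer.

```php
<?php
// Hypothetical helper: download a page with cURL and return its body as a string.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $html;
}

// Hypothetical helper: return the trimmed text of the first <div> whose id
// attribute equals $id, or null if there is no such div.
function divTextById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('id') === $id) {
            return trim($div->textContent);
        }
    }
    return null;
}

// Example usage (placeholder URL and id):
// echo divTextById(fetchPage('http://www.example.com/'), 'content');
```

Keeping the download and the parsing in separate functions means the parsing step can be tried on a saved HTML string without touching the network.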




Answer 2:


After downloading the page with cURL, use XPath to select the div and extract its content.
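A minimal sketch of the XPath step, assuming the page has already been downloaded into a string and that PHP's DOM extension is available. The helper name `xpathTexts` and the sample expressions are illustrative assumptions only.

```php
<?php
// Hypothetical helper: return the trimmed text of every node matched by an
// XPath expression in the given HTML string.
function xpathTexts(string $html, string $expression): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        throw new InvalidArgumentException("Invalid XPath expression: $expression");
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Example usage with a small inline document:
// xpathTexts($html, '//div[@class="price"]');  // divs with class "price"
// xpathTexts($html, '(//div)[3]');             // the third div in document order
```

XPath is convenient here because one expression can select by attribute, by position, or by nesting, without writing the traversal loop by hand.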




Answer 3:


A possible alternative.

# We will store the web page in a string variable.
var string page

# Read the page into the string variable.
cat "http://www.abczyx.com/path/to/page.ext" > $page

# Output the portion in the third (3rd) instance of "<div...</div>"
stex -r -c "^<div&</div\>^3" $page

This code is written in biterscripting. The 3 is only a sample value; it extracts the third div. If you want to extract the div that contains a particular string, say "ABC", use this command syntax instead.

stex -r -c "^<div&ABC&</div\>^" $page

Take a look at this script: http://www.biterscripting.com/helppages/SS_ExtractTable.html. It shows how to extract an element (div, table, frame, etc.) when the elements are nested.




Answer 4:


Fetch the website content using a cURL GET request. There's a code sample on the curl_exec manual page.

Use a regular expression to search for the data you need. There's a code sample on the preg_match manual page, but you'll need to read up on regular expressions to build the pattern you need. As Yacoby mentioned (and I hadn't thought of), a better idea may be to examine the DOM of the HTML page using PHP's SimpleXML or DOM parser.

Output the information you've found with the regex or parser in the HTML of your own page (within the required div).
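The regular-expression variant of these steps might look like the sketch below. It assumes the HTML is already in a string (for example from curl_exec with CURLOPT_RETURNTRANSFER set) and that the target div has no other div nested inside it, because the pattern stops at the first closing tag. The helper name `divInnerHtmlByRegex` and the id `price` are made up for illustration.

```php
<?php
// Hypothetical helper: return the trimmed inner HTML of the first <div> whose
// id attribute equals $id, using preg_match. Returns null when nothing matches.
// Caveat: the lazy (.*?) stops at the first </div>, so nested divs are cut short.
function divInnerHtmlByRegex(string $html, string $id): ?string
{
    $pattern = '~<div\b[^>]*\bid=(["\'])' . preg_quote($id, '~') . '\1[^>]*>(.*?)</div>~is';
    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[2]);
    }
    return null;
}

// Example usage: print the result inside a div on your own page, escaping it
// so that it is shown as text rather than interpreted as markup.
// $found = divInnerHtmlByRegex($html, 'price');
// echo '<div>' . htmlspecialchars((string) $found) . '</div>';
```

For anything beyond a flat, predictable div, the DOM or XPath route is usually more robust than a pattern like this.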



Source: https://stackoverflow.com/questions/2523096/how-do-i-screen-scrape-a-website-and-get-data-within-div
