Extract data from website via PHP


难免孤独

I am trying to create a simple alert app for some friends.

Basically, I want to be able to extract data ("price" and "stock availability") from a webpage like the f


6 Answers

无人共我

2020-12-23 16:13

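```
// Fetch the product page (requires allow_url_fopen to be enabled).
$content = file_get_contents('http://www.sparkfun.com/commerce/product_info.php?products_id=9279');

// These patterns depend on the page's exact markup and stop matching if it changes.
preg_match('#<tr><th>(.*?)</th> <td><b>price</b></td></tr>#', $content, $match);
$price = $match[1] ?? 'n/a';   // fall back when the pattern does not match

preg_match('#<input type="hidden" name="quantity_on_hand" value="(.*?)">#', $content, $match);
$in_stock = $match[1] ?? 'n/a';

echo "Price: $price - Availability: $in_stock\n";
```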


失恋的感觉

2020-12-23 16:15

Whatever you do, don't use regular expressions to parse HTML, or bad things will happen. Use a parser instead.
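As a rough illustration of the difference, here is a minimal sketch that reads a hidden input with PHP's built-in DOM extension. The markup and the helper name `hidden_input_value` are made up for this example; the attribute order is deliberately different from what a hand-written regex would expect, and the parser handles it anyway.

```php
<?php
// Hypothetical markup: same field as in the question, attributes in another order.
$html = '<form><input value="42" name="quantity_on_hand" type="hidden"></form>';

// Hypothetical helper: returns the value of the named <input>, or null if absent.
function hidden_input_value(string $html, string $name): ?string
{
    $doc = new DOMDocument();
    // Collect markup warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === $name) {
            return $input->getAttribute('value');
        }
    }
    return null; // no input with that name on the page
}

$stock = hidden_input_value($html, 'quantity_on_hand');
echo "Availability: ", $stock ?? 'unknown', "\n";
```

The lookup goes through the attribute API rather than the raw text, so whitespace, quoting style, and attribute order no longer matter.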

死守一世寂寞

2020-12-23 16:17
It's called screen scraping, in case you need to google for it.

I would suggest that you use a DOM parser and XPath expressions instead. Feed the HTML through HTML Tidy first, to ensure that it's valid markup.

For example:
```
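// Requires the tidy and dom extensions.
$html = file_get_contents("http://www.example.com");
$html = tidy_repair_string($html);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// Now query the document. "//th" matches header cells at any depth,
// since <th> is normally nested inside <tr> rather than directly in <table>.
foreach ($xpath->query('//table[@class="pricing"]//th') as $node) {
  // A DOMNode cannot be echoed directly; print its text content instead.
  echo $node->nodeValue, "\n";
}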
```
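To try the XPath approach without hitting a live site, the same idea can be run against an inline fixture. This is only a sketch: the table markup, the price values, and the helper name `extract_prices` are assumptions made up for illustration, not the real page's structure.

```php
<?php
// Hypothetical page fragment standing in for a real product page.
$html = <<<'HTML'
<table class="pricing">
  <tr><th>1-9 units</th><td>$19.95</td></tr>
  <tr><th>10-99 units</th><td>$17.96</td></tr>
</table>
HTML;

// Hypothetical helper: maps each row's <th> label to its first <td> value.
function extract_prices(string $html): array
{
    $doc = new DOMDocument();
    // Keep parser warnings about sloppy markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $prices = [];
    foreach ($xpath->query('//table[@class="pricing"]//tr') as $row) {
        // Relative queries are evaluated against the current row.
        $label = $xpath->query('th', $row)->item(0);
        $value = $xpath->query('td', $row)->item(0);
        if ($label !== null && $value !== null) {
            $prices[trim($label->nodeValue)] = trim($value->nodeValue);
        }
    }
    return $prices;
}

foreach (extract_prices($html) as $tier => $price) {
    echo "$tier: $price\n";
}
```

Once the query works on a saved copy of the page, swapping the fixture for the fetched HTML is a one-line change.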

情深已故

2020-12-23 16:17

This is the simplest method to extract data from a website. I found that all the data I needed was contained in <h3> tags, so I wrote the following.

```
<?php
// Requires the PHP Simple HTML DOM Parser library (simple_html_dom.php).
include('simple_html_dom.php');

// URL of the page to scrape; replace it with your target page.
$page = 'http://facebook4free.com/category/facebookstatus/amazing-facebook-status/';

// Load the page into a DOM object for further processing.
$html = new simple_html_dom();
$html->load_file($page);

// Collect every <h3> element; change the selector passed to find() as required.
$headings = array();
foreach ($html->find('h3') as $element) {
    $headings[] = $element;
}

// Echoing an element prints its HTML, including the enclosing tag.
foreach ($headings as $out) {
    echo $out;
}
?>
```


感动是毒

2020-12-23 16:20
First, answering this question fully would mean going into a lot of detail. Second, extracting data from a website might not be legitimate. That said, here are some hints:
1. Use Firebug or the Chrome/Safari Inspector to explore the HTML and find the pattern around the information you are interested in.
2. Test your regular expression to see whether it matches. You may need to do this in several passes (multi-pass parsing/extraction).
3. Write a client with cURL or, even simpler, use file_get_contents (note that some hosting providers disable loading URLs with file_get_contents).

Personally, I would rather use Tidy to convert the page to valid XHTML and then use XPath to extract the data, instead of regular expressions. Why? Because (X)HTML is not a regular language, so regular expressions handle it poorly, while XPath is very flexible. You can also learn XSLT to transform the result.
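The cURL client from hint 3 could look roughly like the sketch below. It assumes the curl extension is installed; the function name `fetch_page`, the timeout, and the user-agent string are placeholders chosen for this example.

```php
<?php
// Hypothetical helper: fetches a URL with cURL and returns the body, or null on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up on slow servers
        CURLOPT_USERAGENT      => 'price-alert-sketch/0.1', // placeholder name
    ]);
    $body = curl_exec($ch);

    // curl_exec() returns false on transport errors (DNS failure, timeout, ...).
    return $body === false ? null : $body;
}
```

A call such as `fetch_page('http://www.example.com')` then yields the HTML as a string, which can be handed to Tidy and XPath as described above. Checking for `null` before parsing avoids feeding an empty document to the parser.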

Good luck!
北海茫月

2020-12-23 16:25

You are probably best off loading the HTML code into a DOM parser like this one and searching for the "pricing" table. However, any kind of scraping you do can break whenever they change their page layout, and is probably illegal without their consent.

The best way, though, would be to talk to the people who run the site, and see whether they have alternative, more reliable forms of data delivery (Web services, RSS, or database exports come to mind).
