simple-html-dom | 易学教程

getting an error reading simpleDomObject


Question: I have the following template file, named 'test.html': <div class='title'>TEST</div> And I have the following PHP code: <?php include "simplehtmldom/simple_html_dom.php"; $dom = file_get_html( "test.html" ); echo $dom->outertext; ?> So far so good: this displays the file test.html. But when I try to change something, I get an error: <?php include "simplehtmldom/simple_html_dom.php"; $dom = file_get_html( "test.html" ); $dom->find('.title')->innertext = "changed"; echo $dom->outertext; ?> Warning :

PHP Simple HTML DOM Scrape External URL


Question: I'm trying to build a personal project, but I'm a bit stuck when using the Simple HTML DOM class. What I'd like to do is scrape a website and retrieve all the content, including its inner HTML, that matches a certain class. My code so far is: <?php error_reporting(E_ALL); include_once("simple_html_dom.php"); //use curl to get html content $url = 'http://www.peopleperhour.com/freelance-seo-jobs'; $html = file_get_html($url); //Get all data inside the <div class="item-list"> foreach(

Parsing html page that has two different format on the same elements


Question: The same HTML page contains two different formats of the same content. The first is: <div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a> The second is: <div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a> How can I get the links and titles with one piece of code that handles both formats using simple_html_dom? I've tried this code, but it doesn't work: foreach($html->find('h3[class=gsr]') as $docLink){ $link =
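
One way to handle both formats is to select the anchor as a descendant of the heading, so that an optional <span> sibling does not matter. The sketch below illustrates the idea with PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, so that it runs without the library; the sample markup (including the closing tags) and the variable names are assumptions added for illustration. With simple_html_dom, a descendant selector such as 'h3.gsr a' should play the same role.

```php
<?php
// Hypothetical sample markup reproducing the two formats from the question.
$markup = '<div class="gs"><h3 class="gsr"><a href="http://www.example1.com/">title1</a></h3></div>'
        . '<div class="gs"><h3 class="gsr"><span class="gsc"></span><a href="http://www.example2.com/">title2</a></h3></div>';

// Parse the fragment; suppress libxml warnings about the missing <html>/<body>.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

// Select every <a> that is a descendant of <h3 class="gsr">,
// regardless of what else precedes it inside the heading.
$xpath = new DOMXPath($doc);
$results = [];
foreach ($xpath->query('//h3[@class="gsr"]//a') as $anchor) {
    $results[] = [
        'link'  => $anchor->getAttribute('href'),
        'title' => trim($anchor->textContent),
    ];
}

foreach ($results as $row) {
    echo $row['title'], ' => ', $row['link'], "\n";
}
```

Because the query targets the anchor itself instead of the first child of the heading, the same loop covers both layouts.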

PHP - Simple HTML DOM Parser - Table Issue


Question: I'm receiving some data from cURL and want to extract information so I can save it in another database. The result of the cURL request is a whole HTML page, so I'm using Simple HTML DOM Parser to get what I want. The problem is that I want the values of a table, but I'm getting just the titles. Here's the page: <div id="conteudo"> <body> <div id="tab"> <ul> <li><a href="#tabs-a">Test1</a></li> <li><a href="#tabs-b">Test2</a></li> <li><a href="#tabs-c">Test3</a></li> </ul> <div id="tabs-1"> <div> <table id="d1

Extract doctype with simple_html_dom


Question: I am using simple_html_dom to parse a website. Is there a way to extract the doctype? Answer 1: You can use the file_get_contents function to get all the HTML data from the website. For example: <?php $html = file_get_contents("http://google.com"); $html = str_replace("\n","",$html); $get_doctype = preg_match_all("/(<!DOCTYPE.+\">)<html/i",$html,$matches); $doctype = $matches[1][0]; ?> Answer 2: You can use $html->find('unknown'). This works, at least, in version 1.11 of the simplehtmldom library. I use it as
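
The regular expression in answer 1 requires a `">` immediately before `<html`, so it would miss the short HTML5 form `<!DOCTYPE html>`. A minimal sketch of a more tolerant variant follows; the function name extract_doctype and the sample string are hypothetical, and the pattern assumes the declaration contains no `>` before its end (so a doctype with an internal subset would not be handled).

```php
<?php
/**
 * Return the first <!DOCTYPE ...> declaration found in $html,
 * or null when the markup has none. (Hypothetical helper.)
 */
function extract_doctype(string $html): ?string
{
    // Case-insensitive match from "<!DOCTYPE" up to the first ">".
    if (preg_match('/<!DOCTYPE[^>]*>/i', $html, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

// Sample input for illustration; in practice the string could come
// from file_get_contents(), as in answer 1.
$sample = "<!DOCTYPE html>\n<html><head><title>t</title></head><body></body></html>";
$doctype = extract_doctype($sample);
echo $doctype === null ? 'no doctype' : $doctype, "\n";
```

Working on the raw string avoids depending on how a particular parser version exposes the doctype node.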

simplehtmldom - SSL operation failed with code 1. OpenSSL Error messages


Question: I'm using http://simplehtmldom.sourceforge.net/ and file_get_contents() in my web app. file_get_contents() works fine on localhost, but when I upload the web app to the server (Windows Server 2012 R2) I get this error. How can I fix it? > Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed in E:\cfnic.com\includes\class\PHP_Simple_HTML_DOM_Parser.php on line 75 Warning: file_get
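
A "certificate verify failed" warning commonly means PHP cannot find a CA bundle on that machine. One possible approach, sketched below, is to pass an explicit bundle through a stream context instead of turning verification off. The helper name build_verified_context and the bundle path are hypothetical; whether a missing bundle is the actual cause on this server is an assumption.

```php
<?php
/**
 * Build a stream context that verifies the peer certificate against
 * the given CA bundle file. (Hypothetical helper.)
 */
function build_verified_context(string $caBundlePath)
{
    return stream_context_create([
        'ssl' => [
            'verify_peer'      => true,          // check the certificate chain
            'verify_peer_name' => true,          // check the host name
            'cafile'           => $caBundlePath, // CA bundle to trust
        ],
    ]);
}

// Hypothetical location of a downloaded CA bundle on the server.
$context = build_verified_context('C:/php/extras/ssl/cacert.pem');

// The context is then passed as the third argument, for example:
// $raw = file_get_contents('https://example.com/', false, $context);
// $dom = str_get_html($raw); // requires simple_html_dom to be included
```

Setting openssl.cafile in php.ini is an alternative that applies the same bundle to every request without code changes.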

Simple HTML Dom - find text between divs


Question: I need to extract the text between the divs here ("The third of four...") using the Simple HTML DOM PHP library. I have tried everything I can think of! next_sibling() returns the comment, and next_sibling()->next_sibling() returns the <br/> tag. Ideally I would like to get all the text from the end of the first comment to the next </div> tag. <div class="left"> Bla-bla.. <div class="float">Bla-bla... </div> <br />The third of four performances in the Society's Morning

Can't separate cells properly with simplehtmldom


Question: I am trying to write a web scraper. I want to get all the cells in a row. The row before the one I want has THOROUGHBRED MEETINGS as its plain text value. I can successfully get this row, but I can't figure out how to get the next row's children, which are the cells (<td> tags). if ($foundTag = FindTagByText("THOROUGHBRED MEETINGS", $html)) { $cell = $foundTag->parent(); $row = $cell->parent(); $nextRow = $row->next_sibling(); echo "Row: ".$row->plaintext."<br />\n"; echo "Next Row: ".

How to workaround PHP advanced html dom's conversion of entities?


Question: How can I work around the conversion of HTML entities by advanced_html_dom.php's str_get_html, short of applying htmlentities() to every element's content? This happens despite the statement at http://archive.is/YWKYp#selection-971.0-979.95: "The goal of this project is to be a DOM-based drop-in replacement for PHP's simple html dom library. ... If you use file/str_get_html then you don't need to change anything." I find that with include 'simple_html_dom.php'; $set = str_get_html('<html><title> </title></html>'); echo ($set->find('title',0)

Removing unwanted elements from table simple_html_dom


Question: I am fetching a page that contains some style tags, a table, and other non-vital content. I'm storing this in a transient and fetching it all with AJAX: $result_match = file_get_contents( 'www.example.com' ); set_transient( 'match_results_details', $result_match, 60 * 60 * 12 ); $match_results = get_transient( 'match_results_details' ); if ( $match_results != '') { $html = new simple_html_dom(); $html->load($match_results); $out = ''; $out .= '<div class="match_info_container">'; if (