Crawling an HTML page using PHP?

Submitted by 青春壹個敷衍的年華 on 2019-11-27 15:47:25

Regular expressions work well.

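// Get the page; $url is a placeholder for the address of the page to crawl.
$page = file_get_contents($url);
// Split the page into lines so each <td> row can be matched on its own.
$lines = preg_split("/\n/", $page);
foreach ($lines as $text) {
    $matches = array();
    // Only lines that consist of exactly one <td>...</td> cell will match.
    if (preg_match("/^<td>(.*)<\/td>$/", $text, $matches)) {
        // insert $matches[1] into the database
    }
}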

See the documentation for preg_match.
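
As a rough illustration of what the preg functions fill in, the sketch below runs a similar pattern over a small inline HTML string instead of a downloaded page. The sample markup and the variable names are made up for the example, and preg_match_all is swapped in for preg_match so that every cell is collected in one call.

```php
<?php
// Sample markup standing in for the downloaded page (invented for this example).
$page = "<tr>\n<td>Accounting</td>\n<td>Biology</td>\n</tr>";

// Non-greedy (.*?) so each <td>...</td> pair is captured separately.
$count = preg_match_all('/<td>(.*?)<\/td>/', $page, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
$titles = $matches[1];

foreach ($titles as $title) {
    echo $title, "\n";
}
```

A pattern like this only copes with cells that carry no attributes and no nested markup, which is one reason a parser is usually the safer choice.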

You can use this PHP HTML parsing library to achieve this: http://simplehtmldom.sourceforge.net/

Gordon

How to parse HTML has been asked and answered countless times before. While regular expressions will work for your specific use case, it is generally better and more reliable to use a proper parser for this task. Below is how to do it with DOM:

$dom = new DOMDocument;
$dom->loadHTMLFile('http://courses.westminster.ac.uk/CourseList.aspx');
foreach($dom->getElementsByTagName('td') as $title) {
    echo $title->nodeValue;
}
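
The same DOM approach can be tried without a network connection by loading markup from a string. The sketch below is only an illustration: the sample table, the class="title" attribute, and the XPath query are assumptions and are not taken from the Westminster page. libxml_use_internal_errors(true) keeps malformed real-world HTML from flooding the output with warnings.

```php
<?php
// Sample markup invented for this example; a real page would come from
// loadHTMLFile() as shown above.
$html = '<table>'
      . '<tr><td class="title">Accounting</td><td>BA</td></tr>'
      . '<tr><td class="title">Biology</td><td>BSc</td></tr>'
      . '</table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

// XPath narrows the selection to the cells of interest.
$xpath = new DOMXPath($dom);
$titles = array();
foreach ($xpath->query('//td[@class="title"]') as $cell) {
    $titles[] = trim($cell->nodeValue);
}

print_r($titles);
```

Selecting by attribute avoids picking up every td on the page, which getElementsByTagName('td') would do.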

For inserting the data into MySQL, you should use the mysqli extension. Examples are plentiful on Stack Overflow, so please use the search function.
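
A minimal sketch of the insert step with a mysqli prepared statement might look like the following. The table name courses, the column title, the function name insertCourseTitles, and the connection details in the usage comment are all hypothetical; adjust them to the actual schema.

```php
<?php
/**
 * Insert each title as one row using a prepared statement.
 * Table `courses` and column `title` are hypothetical names.
 * Returns the number of rows inserted.
 */
function insertCourseTitles(mysqli $db, array $titles)
{
    $stmt = $db->prepare('INSERT INTO courses (title) VALUES (?)');
    if ($stmt === false) {
        throw new RuntimeException('Prepare failed: ' . $db->error);
    }

    // bind_param() binds by reference, so $title is bound once
    // and reassigned on every iteration.
    $title = '';
    $stmt->bind_param('s', $title);

    $inserted = 0;
    foreach ($titles as $title) {
        if ($stmt->execute()) {
            $inserted++;
        }
    }
    $stmt->close();

    return $inserted;
}

// Usage (placeholder credentials, not run here):
// $db = new mysqli('localhost', 'user', 'password', 'database');
// echo insertCourseTitles($db, array('Accounting', 'Biology'));
```

Using a prepared statement rather than string concatenation means the scraped text is never interpreted as SQL.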

I encountered the same problem. A good class library for this is Simple HTML DOM: http://simplehtmldom.sourceforge.net/. It works much like jQuery.

Just for fun, here's a quick shell script to do the same thing.

curl http://courses.westminster.ac.uk/CourseList.aspx \
| sed '/<td>\(.*\)<\/td>/ { s/.*">\(.*\)<\/a>.*/\1/; b }; d;' \
| uniq > courses.txt