Extract data from a Google Chrome bookmarks export with PHP

Posted by 流过昼夜 on 2019-12-04 15:23:09
Yahel

Don't use regex, since HTML, even if provided by Chrome, isn't a regular language.

Use an XML Parser, like SimpleXML.

If the string above was $s,

$bookmarks = simplexml_load_string($s);

echo $bookmarks["HREF"]; //URL
echo $bookmarks[0]; //Name

object(SimpleXMLElement)#1 (2) {
  ["@attributes"]=>
  array(3) {
    ["HREF"]=>
    string(31) "http://snipt.net/public/tag/css"
    ["ADD_DATE"]=>
    string(10) "1271801059"
    ["ICON"]=>
    string(1026) "data:image/png;base64,iVBh....="
  }
  [0]=>
  string(64) "Snipt - public - css | Share and store code or command snippets."
}
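As a self-contained sketch of the same idea: the sample string below is a hypothetical, shortened bookmark element (the ICON attribute is left out and the title is trimmed), not the exact markup from the question. It assumes the input is a single well-formed element; the variable names are illustrative only.

```php
<?php
// Hypothetical sample: one <A> element in the style of a bookmarks export.
$s = '<A HREF="http://snipt.net/public/tag/css" ADD_DATE="1271801059">Snipt - public - css</A>';

$bookmark = simplexml_load_string($s);
if ($bookmark === false) {
    // simplexml_load_string() returns false when the input is not well-formed XML.
    throw new RuntimeException('Input is not well-formed XML');
}

// Attribute and text accessors return SimpleXMLElement objects,
// so cast them to get plain PHP values.
$url   = (string) $bookmark['HREF'];
$name  = (string) $bookmark;
$added = (int) $bookmark['ADD_DATE']; // Unix timestamp

echo $url, "\n", $name, "\n", gmdate('Y-m-d', $added), "\n";
```

The explicit casts matter once the values are compared or used as array keys, since an uncast SimpleXMLElement is an object rather than a string.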

In general, this PHP-based tutorial on data extraction from HTML will probably help you:

XPath is definitely worth being proficient at if you work much with HTML or XML. W3Schools has good reference material:
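To illustrate XPath on a whole file rather than a single element, here is a minimal sketch using PHP's DOMDocument and DOMXPath. It swaps SimpleXML for the HTML parser because bookmark exports typically leave tags such as <DT> and <p> unclosed, so they are not well-formed XML. The fragment and the array keys are hypothetical, chosen only for the example.

```php
<?php
// Hypothetical fragment in the style of a bookmarks export; note the
// unclosed <DT> and <p> tags, which an HTML parser tolerates.
$html = <<<HTML
<DL><p>
    <DT><H3>Snippets</H3>
    <DL><p>
        <DT><A HREF="http://snipt.net/public/tag/css" ADD_DATE="1271801059">Snipt - css</A>
        <DT><A HREF="http://example.com/" ADD_DATE="1271801060">Example</A>
    </DL><p>
</DL><p>
HTML;

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$bookmarks = [];
// The HTML parser lowercases element and attribute names, so query in lowercase.
foreach ($xpath->query('//a[@href]') as $a) {
    $bookmarks[] = [
        'name'  => trim($a->textContent),
        'url'   => $a->getAttribute('href'),
        'added' => (int) $a->getAttribute('add_date'),
    ];
}

print_r($bookmarks);
```

The expression //a[@href] selects every anchor that has an href attribute, wherever it sits in the folder hierarchy. A narrower expression could restrict the match to one folder.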

Another option (forgoing PHP) is to use jQuery and CSS selectors. I much prefer CSS selectors to XPath for most purposes, and this method lets you take advantage of the wonderful SelectorGadget tool.

Here's a recent guide: http://blog.dtrejo.com/scraping-made-easy-with-jquery-and-selectorga
Note: The guide links to the original jQuerify. There is also an actively maintained jQuerify Chrome extension and a newer, better jQuerify.

SelectorGadget is demonstrated starting at about 5:35 in this screencast.
