Question
I'm reasonably comfortable selecting all sorts of HTML content. So, confident that I could write some code to scrape content from a site, I stumbled across some strange JavaScript code where the page stores its prices.
<script>
var productConfig = {"attributes":{"178":{"id":"178","code":"bp_flavour","label":"Smaak","options":[{"id":"28","label":"Aardbeien","oldPrice":"0","products":["2292","2294","2296","2702"]}
.... more gibberish, and then 4 of each product variation (so about 80 different lines like this):
,"childProducts":{
"2292":{"price":"64.99","finalPrice":"64.99","no_of_servings":"166","178":"27","179":"34"},
"2292":{"price":"17.99","finalPrice":"17.99","no_of_servings":"33","178":"28","179":"25"}
}
</script>
Apparently 2292 is the id of the product at hand. I would like to read out the "finalPrice".
My PHP code:
$file = $this->curl_get_file_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($file);
$doc->preserveWhiteSpace = false;
$finder = new DomXPath($doc);
$price_query = $finder->query("//script[contains(.,'finalPrice')]");
$price_raw = $price_query->item(0)->nodeValue;
However, my query //script[contains(.,"finalPrice")] returns the whole script, and I can't find a way to dig deeper and more specifically into the JavaScript. Does anyone know more, or could anyone give me a hint?
Answer 1:
You could try a regular expression:
preg_match_all('/"finalPrice":"([0-9.]{1,10})"/', $page_html, $output_array);
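To see what the capture group returns, here is a minimal sketch that runs this kind of pattern against a small made-up string. The sample markup and the second product id (2294) are assumptions for illustration; a real page would supply $page_html.

```php
<?php
// Hypothetical sample modelled on the snippet in the question (ids and values are illustrative).
$page_html = '<script>var productConfig = {"childProducts":{'
    . '"2292":{"price":"64.99","finalPrice":"64.99"},'
    . '"2294":{"price":"17.99","finalPrice":"17.99"}}};</script>';

// Single-quoted pattern, so the double quotes inside it need no escaping.
preg_match_all('/"finalPrice":"([0-9.]{1,10})"/', $page_html, $output_array);

// $output_array[0] holds the full matches, $output_array[1] the captured prices.
$prices = $output_array[1];
print_r($prices);
```

Note that this collects every finalPrice on the page without telling you which product id each one belongs to.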
Answer 2:
In JavaScript, you can read properties from the object like this:
var obj = {"2292":{"price":"64.99","finalPrice":"64.99","no_of_servings":"166","178":"27","179":"34"}};
obj['2292']['finalPrice']
Answer 3:
So here is what I did: I read out the script with the provided XPath query, then used strstr until I had the JSON part I wanted. Next up was PHP's json_decode function, which puts it into an array, and then I searched the array for what I wanted. This is my code for the parsing:
$price_query = $finder->query("//script[contains(.,'finalPrice')]");
$price_raw = $price_query->item(0)->nodeValue;
$price_1 = strstr($price_raw, "childProducts");
$price_2 = str_replace('childProducts":', '', $price_1);
$price_3 = strstr($price_2, ',"priceFromLabel"', true);
$price_data = json_decode($price_3, true);
It looks like crap with all the strstr calls, but it works. Thanks all for your thoughts. json_decode ftw!
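As a possible alternative to trimming the string by hand, the whole productConfig object could be captured with one regular expression and decoded in a single step. The following is only a sketch under assumptions: extractProductConfig is a hypothetical helper, the sample script text (including the second id 2294 and the priceFromLabel value) is made up, and it assumes the script block holds just this one object literal and that the literal is valid JSON.

```php
<?php
// Hypothetical helper, not part of the original answers.
function extractProductConfig(string $script): ?array
{
    // Capture from the first "{" after "var productConfig =" to the last "}" in the script.
    // Assumption: no other object literal follows in the same script block.
    if (!preg_match('/var\s+productConfig\s*=\s*(\{.*\})/s', $script, $m)) {
        return null;
    }
    $config = json_decode($m[1], true);
    return is_array($config) ? $config : null;
}

// Made-up script content shaped like the question's snippet.
$script = 'var productConfig = {"attributes":{},"childProducts":{'
    . '"2292":{"price":"64.99","finalPrice":"64.99","no_of_servings":"166"},'
    . '"2294":{"price":"17.99","finalPrice":"17.99","no_of_servings":"33"}},'
    . '"priceFromLabel":"From"};';

$config = extractProductConfig($script);

// Look the price up by product id instead of scanning the array.
$finalPrice = $config['childProducts']['2292']['finalPrice'] ?? null;
echo $finalPrice, PHP_EOL;
```

In the scraper from the question, $script would be the nodeValue returned by the XPath query. Decoding the full object keeps each price tied to its product id, at the cost of failing (and returning null) if the page's literal is not strict JSON.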
Source: https://stackoverflow.com/questions/31718783/can-xpath-be-used-to-search-a-script-block