How to read JavaScript object with XPath/HTMLAgilityPack

Submitted by ﹥>﹥吖頭↗ on 2019-12-02 06:38:39

Question


For my crawler project, I need to get product details from a JavaScript object embedded in the page.

How can I reliably extract the object's properties from the following JavaScript? I use XPath and HtmlAgilityPack.

<script type="text/javascript">
    var product = {
        identifier: '2051189775',     //PRODUCT ID
        fn: 'Fit- Whiskered Dark Wash Skirt',
        category: ['sale'],
        brand: 'Brand Name',
        price: '22.90',  // this would be the discount price
        amount: '31.80',  // this would be the original price
        currency: 'USD',
        // The list can be even longer.
    };
</script>

I haven't extracted data from JavaScript objects before. In my other crawlers I read the details directly from the HTML.


Answer 1:


Since the HTML Agility Pack doesn't evaluate any of the contents of the HTML, the JavaScript code should be treated as plain text. Use the SelectSingleNode method to find the script element that holds the JavaScript, then read its InnerHtml to get the contents.
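As a rough sketch of that first step, something like the following should work. The class name ScriptLocator, the method name FindProductScript, and the XPath predicate on 'var product' are my own choices for illustration, and the sample HTML is a trimmed copy of the snippet from the question.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ScriptLocator
{
    // Returns the raw source of the first <script> element whose content
    // mentions "var product", or null when no such element exists.
    public static string FindProductScript(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The script body is plain text to the parser, so an ordinary
        // XPath string test is enough to pick the right element.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//script[contains(., 'var product')]");

        return node?.InnerHtml;
    }

    public static void Main()
    {
        const string html = @"<html><head>
<script type=""text/javascript"">
    var product = {
        identifier: '2051189775',
        price: '22.90'
    };
</script></head><body></body></html>";

        // Prints the JavaScript source, still unparsed.
        Console.WriteLine(FindProductScript(html));
    }
}
```

SelectSingleNode returns null when nothing matches, so check the result before using it on pages that may not contain the script.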

Then either use a JavaScript parser for C# (IronJS, for example) or write your own parser using standard text manipulation techniques (String.* methods or Regex) to extract the bits you're after.
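For the text-manipulation route, a minimal Regex sketch could look like this. The names ProductTextParser and Parse are hypothetical, and the patterns assume the object contains no nested braces and that the values you want are single-quoted strings without embedded apostrophes, as in the question's example. Array values such as category are deliberately skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProductTextParser
{
    // Captures the text between "var product = {" and the closing "};".
    // Assumption: the object literal has no nested braces.
    private static readonly Regex ObjectBody = new Regex(
        @"var\s+product\s*=\s*\{(?<body>.*?)\}\s*;",
        RegexOptions.Singleline);

    // Matches simple  key: 'value'  pairs; array values are not matched.
    private static readonly Regex Pair = new Regex(
        @"(?<key>\w+)\s*:\s*'(?<value>[^']*)'");

    public static Dictionary<string, string> Parse(string script)
    {
        var result = new Dictionary<string, string>();

        Match body = ObjectBody.Match(script);
        if (!body.Success)
            return result; // no product object found

        foreach (Match pair in Pair.Matches(body.Groups["body"].Value))
            result[pair.Groups["key"].Value] = pair.Groups["value"].Value;

        return result;
    }

    public static void Main()
    {
        const string script = @"
    var product = {
        identifier: '2051189775',     //PRODUCT ID
        category: ['sale'],
        price: '22.90',  // this would be the discount price
    };";

        Dictionary<string, string> product = Parse(script);
        Console.WriteLine(product["identifier"]); // prints 2051189775
        Console.WriteLine(product["price"]);      // prints 22.90
    }
}
```

This is quick to write but brittle: if the site starts nesting objects or escaping quotes inside values, a real parser is the safer choice.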

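Once you have the text between the curly braces, you can parse it with the aforementioned parser or with a library such as Json.NET. The object literal is not strict JSON (unquoted keys, single-quoted strings, comments), but it is close enough that a lenient parser like Json.NET can usually read it.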
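A sketch of the Json.NET route, under the assumption that the script contains only the single object declaration, so the first '{' and the last '}' delimit it. ProductJsonParser is a made-up name. As far as I know, JObject.Parse accepts single-quoted strings, unquoted property names, and comments, but it is worth verifying this against the Json.NET version you use.

```csharp
using System;
using System.Globalization;
using Newtonsoft.Json.Linq; // NuGet package: Newtonsoft.Json

public static class ProductJsonParser
{
    // Cuts the object literal out of the script text and hands it to Json.NET.
    // Assumption: the first '{' and the last '}' delimit the product object.
    public static JObject Parse(string script)
    {
        int start = script.IndexOf('{');
        int end = script.LastIndexOf('}');
        if (start < 0 || end <= start)
            throw new FormatException("No object literal found in script.");

        string literal = script.Substring(start, end - start + 1);
        return JObject.Parse(literal);
    }

    public static void Main()
    {
        const string script = @"
    var product = {
        identifier: '2051189775',     //PRODUCT ID
        category: ['sale'],
        price: '22.90',  // this would be the discount price
        currency: 'USD'
    };";

        JObject product = Parse(script);

        // Values arrive as strings; convert explicitly where a number is needed.
        decimal price = decimal.Parse(
            (string)product["price"], CultureInfo.InvariantCulture);
        string firstCategory = (string)product["category"][0];

        Console.WriteLine(price);
        Console.WriteLine(firstCategory);
    }
}
```

Compared with hand-written regular expressions, this keeps arrays and any additional properties intact without extra patterns.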



Source: https://stackoverflow.com/questions/17740821/how-to-read-javascript-object-with-xpath-htmlagilitypack
