Get a JSON string from within JavaScript on an HTML page using a shell script

刺人心 2021-01-24 07:13

There's valid JSON inside a script on an HTML page, and I want to parse it with a shell script. First of all, I would like to get the entire JSON string, from { to }.

2 Answers
  • 2021-01-24 07:57

    Parsing HTML with Unix command-line tools is generally not recommended. But if you know your marker string, foo.bar.Processor.message, you can use this sed + jq solution:

    sed -n 's/.*foo\.bar\.Processor\.message(\([^)]*\).*/\1/p' file.html |
    jq -r '.head.url | split(";")[1] | split("=")[1]'
    

    347EDAFA2B136D7825745B0A490DE32
    
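The sed step of the pipeline above is the part that answers the original question: it prints only the text between foo.bar.Processor.message( and the next ), i.e. the JSON argument. Below is a minimal sketch of that step on its own, run against a hypothetical one-line sample page (the question's real HTML is not shown, so the page content, the url value and the function name extract_json are assumptions). The pattern starts with .* so that anything before the marker on the same line, such as a <script> tag, is dropped as well:

```shell
#!/bin/sh
# Hypothetical sample page; the whole JSON argument is assumed to be on one line.
page=$(mktemp)
trap 'rm -f "$page"' EXIT
printf '%s\n' \
  '<html><body>' \
  '<script>foo.bar.Processor.message({"head":{"url":"/path;barid=347EDAFA2B136D7825745B0A490DE32"}});</script>' \
  '</body></html>' > "$page"

# Print only the text between "foo.bar.Processor.message(" and the next ")".
# The leading ".*" also discards whatever precedes the marker on that line.
extract_json() {
  sed -n 's/.*foo\.bar\.Processor\.message(\([^)]*\).*/\1/p' "$1"
}

extract_json "$page"
```

On that sample it prints {"head":{"url":"/path;barid=347EDAFA2B136D7825745B0A490DE32"}}, which can then be piped into jq.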

    If jq is not available, you can use this sed + GNU grep solution instead:

    sed -n 's/.*foo\.bar\.Processor\.message(\([^)]*\).*/\1/p' file.html |
    grep -oP ';barid=\K\w+'
    
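The -P flag is a GNU grep extension; BSD and macOS grep, for example, may not support it. A minimal sketch of a variant that needs neither jq nor grep -P, using a second plain POSIX sed substitution to pull out the value after ;barid= (the sample page and the function name extract_barid are hypothetical):

```shell
#!/bin/sh
# Hypothetical sample page (the question's real HTML is not shown).
page=$(mktemp)
trap 'rm -f "$page"' EXIT
printf '%s\n' \
  '<script>foo.bar.Processor.message({"head":{"url":"/path;barid=347EDAFA2B136D7825745B0A490DE32"}});</script>' \
  > "$page"

# Step 1: keep only the JSON argument of foo.bar.Processor.message(...).
# Step 2: keep only the letters, digits and underscores that follow ";barid=".
# Both steps use POSIX sed, so neither jq nor grep -P is required.
extract_barid() {
  sed -n 's/.*foo\.bar\.Processor\.message(\([^)]*\).*/\1/p' "$1" |
    sed -n 's/.*;barid=\([[:alnum:]_]*\).*/\1/p'
}

extract_barid "$page"
```

Unlike jq, this treats the JSON as plain text, so it only works as long as ;barid= appears exactly where the sample assumes.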
  • 2021-01-24 08:07

    One option might be to use pup, at least for parsing the HTML:

    < input.html pup 'script:not(:empty) text{}' |
      grep -F 'foo.bar.Processor.message' | grep -o '{.*}' |
      jq -r '.head.url
             | split(";")[]
             | select(test("barid="))
             | sub("barid=";"")'
    

    With your HTML (adjusted to ensure the JSON in the HTML is valid), this produces:

    347EDAFA2B136D7825745B0A490DE32
    

    Of course there are many caveats. YMMV.
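One of those caveats: sed and grep work line by line, so the patterns above only match when the whole foo.bar.Processor.message(...) call sits on a single line. A minimal sketch of one workaround, assuming the argument is valid JSON (JSON strings cannot contain raw newlines), is to delete every newline before running the same sed extraction. The multi-line sample and the function name extract_json_multiline are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample where the JSON argument spans several lines.
page=$(mktemp)
trap 'rm -f "$page"' EXIT
printf '%s\n' \
  '<script>' \
  'foo.bar.Processor.message({' \
  '  "head": {"url": "/path;barid=347EDAFA2B136D7825745B0A490DE32"}' \
  '});' \
  '</script>' > "$page"

# Join the file into a single line (echo restores one trailing newline so
# sed sees a complete line), then apply the same extraction as before.
extract_json_multiline() {
  { tr -d '\n' < "$1"; echo; } |
    sed -n 's/.*foo\.bar\.Processor\.message(\([^)]*\).*/\1/p'
}

extract_json_multiline "$page"
```

The [^)]* capture still stops at the first ), so a ) inside a JSON string value would cut the result short; a real HTML/JSON parser such as pup plus jq avoids that class of problem.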
