how to use a regular expression to extract json fields?

前端 未结 4 1728
天涯浪人
天涯浪人 2020-12-03 02:20

Beginner RegExp question. I have lines of JSON in a textfile, each with slightly different Fields, but there are 3 fields I want to extract for each line if it has it, ignor

4条回答
  •  夕颜
    夕颜 (楼主)
    2020-12-03 02:28

    Why does it have to be a Regular Expression object?

    Here we can just use a Hash object first and then go search it.

    mh = {"url":"http://www.netcharles.com/orwell/essays.htm","domain":"netcharles.com","title":"Orwell Essays & Journalism Section - Charles' George Orwell Links","tags":["orwell","writing","literature","journalism","essays","politics","essay","reference","language","toread"],"index":2931,"time_created":1345419323,"num_saves":24}
    

    The output of which would be

    => {:url=>"http://www.netcharles.com/orwell/essays.htm", :domain=>"netcharles.com", :title=>"Orwell Essays & Journalism Section - Charles' George Orwell Links", :tags=>["orwell", "writing", "literature", "journalism", "essays", "politics", "essay", "reference", "language", "toread"], :index=>2931, :time_created=>1345419323, :num_saves=>24}
    

    Not that I want to avoid using Regexp but don't you think it would be easier to take it a step at a time until your getting the data you want to further search through? Just MHO.

    mh.values_at(:url, :title, :tags)
    

    The output:

    ["http://www.netcharles.com/orwell/essays.htm", "Orwell Essays & Journalism Section - Charles' George Orwell Links", ["orwell", "writing", "literature", "journalism", "essays", "politics", "essay", "reference", "language", "toread"]]
    

    Taking the pattern that FrankieTheKneeman gave you:

    pattern = /"(url|title|tags)":"((\\"|[^"])*)"/i
    

    we can search the mh hash by converting it to a json object.

    /#{pattern}/.match(mh.to_json)
    

    The output:

    => #
    

    Of course this is all done in Ruby which is not a tag that you have but relates I hope.

    But oops! Looks like we can't do all three at once with that pattern so I will do them one at a time just for sake.

    pattern = /"(title)":"((\\"|[^"])*)"/i
    
    /#{pattern}/.match(mh.to_json)
    
    #
    
    pattern = /"(tags)":"((\\"|[^"])*)"/i
    
    /#{pattern}/.match(mh.to_json)
    
    => nil
    

    Sorry about that last one. It will have to be handled differently.

提交回复
热议问题