Unique combinations of different values in json using jq

帅比萌擦擦* submitted on 2019-12-12 03:44:04

Question


I have a JSON file (input.json) that looks like this:

{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"} (repeat of line 5)

I want to keep only the unique combinations of the values, using jq. The results should look like:

{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"}

I tried grouping by header1 together with the other headers, but it didn't generate unique results. I've also used unique, but that didn't generate the proper results.

How can I get this? I'm new to jq and am not finding many tutorials on it.

Thanks


Answer 1:


  1. The sample lines you give are not valid JSON. Since your preamble introduces them as JSON, the following will assume that you intended to present an array of JSON objects.

  2. The question is unclear in several respects, but from the example, it looks as though unique might be what you're looking for, so consider:

Invocation: jq -c 'unique[]' input.json

Output:

{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}

  3. If you need the output in some other format, you could do that using jq as well, but the requirements are not so clear, so let's leave that as an exercise :-)
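The output above comes back sorted rather than in input order because jq's unique sorts the array before dropping duplicates. As a rough illustration of that behaviour outside jq, here is a minimal Python sketch. The names HEADERS, rows and unique_sorted are made up for this example, and the sort key is only adequate for flat objects that share the same keys and hold string values; jq's real ordering rules are more general.

```python
# Sample rows from the question, with the values quoted so they are valid JSON strings.
HEADERS = ("header1", "header2", "header3", "header4")
rows = [dict(zip(HEADERS, vals)) for vals in [
    ("a", "1a", "1a", "apple"),
    ("b", "2a", "2a", "orange"),
    ("c", "1a", "2a", "banana"),
    ("d", "2a", "1a", "apple"),
    ("a", "2a", "1a", "banana"),
    ("b", "1a", "2a", "orange"),
    ("b", "1a", "1a", "orange"),
    ("d", "1a", "1a", "apple"),
    ("a", "2a", "1a", "banana"),  # duplicate of the fifth row
]]


def unique_sorted(items):
    """Sort the objects, then drop adjacent duplicates (hypothetical helper)."""
    # Assumption: every object has the same keys and string values, so comparing
    # the (key, value) pairs in key order is enough to order the objects.
    ordered = sorted(items, key=lambda d: sorted(d.items()))
    result = []
    for item in ordered:
        if not result or result[-1] != item:
            result.append(item)
    return result


for row in unique_sorted(rows):
    print(row)
```

For the sample data this yields eight rows ordered by header1, then header2, and so on, so the original input order is lost.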



Answer 2:


Since, as peak indicated, your input isn't legal JSON, I've taken the liberty of correcting it and converting it to a list of individual objects:

{"header1":"a","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"b","header2":"2a", "header3":"2a", "header4":"orange"}
{"header1":"c","header2":"1a", "header3":"2a", "header4":"banana"}
{"header1":"d","header2":"2a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
{"header1":"b","header2":"1a", "header3":"2a", "header4":"orange"}
{"header1":"b","header2":"1a", "header3":"1a", "header4":"orange"}
{"header1":"d","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}

If this data is in data.json and you run

jq -M -s -f filter.jq data.json

with the following filter.jq

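# The state is a nested object that records every value path seen so far.
# If the path for the current row is already set, nothing is emitted;
# otherwise the path is marked and the row is emitted.
foreach .[] as $r (
  {}
; ($r | map(.)) as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)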

it will generate the following output, in the original order and with no duplicates:

{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}

Note that the

($r | map(.))

is used to generate an array containing just the values from each row, which is assumed to always produce a unique key path. This is true for the sample data but may not be true for more complex values.
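One way the values-only key can go wrong: two rows that hold the same values under different keys produce the same key and would be treated as duplicates. The Python sketch below illustrates the idea only. The functions values_key and json_key and the rows a and b are hypothetical and not part of the answer.

```python
import json


def values_key(row):
    # Mirrors the idea of ($r | map(.)): keep only the values, discard the keys.
    return tuple(row.values())


def json_key(row):
    # Mirrors the idea of a whole-row key: serialize keys and values together.
    return json.dumps(row)


# Two different rows that happen to hold the same values.
a = {"x": "1", "y": "2"}
b = {"p": "1", "q": "2"}

print(values_key(a) == values_key(b))  # True: the rows collide on a values-only key
print(json_key(a) == json_key(b))      # False: the whole-row key tells them apart
```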

A slower but more robust filter.jq is

foreach .[] as $r (
  {}
; [$r | tojson] as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)

which uses the JSON representation of the entire row as a unique key to determine whether a row has been previously seen.
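The same "remember what has been seen, emit only new rows" idea can be sketched in Python for comparison. This is a minimal illustration, not a translation of the filter: dedupe_keep_order, HEADERS and rows are names invented here, and sort_keys=True is an added choice of this sketch so that key order does not affect the comparison.

```python
import json

# Sample rows from the corrected input, in their original order.
HEADERS = ("header1", "header2", "header3", "header4")
rows = [dict(zip(HEADERS, vals)) for vals in [
    ("a", "1a", "1a", "apple"),
    ("b", "2a", "2a", "orange"),
    ("c", "1a", "2a", "banana"),
    ("d", "2a", "1a", "apple"),
    ("a", "2a", "1a", "banana"),
    ("b", "1a", "2a", "orange"),
    ("b", "1a", "1a", "orange"),
    ("d", "1a", "1a", "apple"),
    ("a", "2a", "1a", "banana"),  # duplicate of the fifth row
]]


def dedupe_keep_order(items):
    """Return the first occurrence of each row, preserving input order."""
    seen = set()   # plays the role of the foreach state
    result = []
    for item in items:
        # Serialized row used as the lookup key (sort_keys is this sketch's choice).
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


for row in dedupe_keep_order(rows):
    print(json.dumps(row, separators=(",", ":")))
```

On the sample data this prints eight rows in input order, with only the repeated ninth row dropped.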



Source: https://stackoverflow.com/questions/42479103/unique-combinations-of-different-values-in-json-using-jq
