Question
I have a JSON file (input.json) which looks like this:
{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"} (repeat of line 5)
I want to keep only the unique combinations of the values, using jq. The results should look like:
{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"}
I tried doing a group by of header1 with the other headers, but it didn't generate unique results. I've also used unique, but that didn't generate the proper results either.
How can I get this? I'm new to jq and not finding many tutorials on it.
Thanks
Answer 1:
The sample lines you give are not valid JSON. Since your preamble introduces them as JSON, the following will assume that you intended to present an array of JSON objects.
The question is unclear in several respects, but from the example, it looks as though unique might be what you're looking for, so consider:
Invocation: jq -c 'unique[]' input.json
Output:
{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
If you need the output in some other format, you could do that using jq as well, but the requirements are not so clear, so let's leave that as an exercise :-)
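jq's unique returns the distinct elements in sorted order, which is why the output above is ordered differently from the input. As a rough cross-check outside jq, here is a minimal Python sketch of the same idea. The names ROWS and sorted_unique are made up for this illustration, and the sort key only mirrors jq's ordering for flat objects that all share the same keys, as in the corrected sample.

```python
import json

# Corrected sample rows (values quoted so that each row is valid JSON).
ROWS = [
    {"header1": "a", "header2": "1a", "header3": "1a", "header4": "apple"},
    {"header1": "b", "header2": "2a", "header3": "2a", "header4": "orange"},
    {"header1": "c", "header2": "1a", "header3": "2a", "header4": "banana"},
    {"header1": "d", "header2": "2a", "header3": "1a", "header4": "apple"},
    {"header1": "a", "header2": "2a", "header3": "1a", "header4": "banana"},
    {"header1": "b", "header2": "1a", "header3": "2a", "header4": "orange"},
    {"header1": "b", "header2": "1a", "header3": "1a", "header4": "orange"},
    {"header1": "d", "header2": "1a", "header3": "1a", "header4": "apple"},
    {"header1": "a", "header2": "2a", "header3": "1a", "header4": "banana"},
]


def sorted_unique(rows):
    """Return the distinct rows in sorted order (hypothetical helper)."""
    # Canonical JSON text with sorted keys acts as a hashable identity per row.
    distinct = {json.dumps(r, sort_keys=True): r for r in rows}
    # Sort by each row's values taken in key order. This is an assumption that
    # only matches jq's ordering for flat objects with identical key sets.
    return sorted(distinct.values(), key=lambda r: [r[k] for k in sorted(r)])


for row in sorted_unique(ROWS):
    print(json.dumps(row, separators=(",", ":")))
```

Run on the nine sample rows, this prints eight rows, starting with the a/1a/1a/apple row and ending with the d/2a/1a/apple row.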
Answer 2:
Since, as peak indicated, your input isn't legal JSON, I've taken the liberty of correcting it and converting it to a list of individual objects:
{"header1":"a","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"b","header2":"2a", "header3":"2a", "header4":"orange"}
{"header1":"c","header2":"1a", "header3":"2a", "header4":"banana"}
{"header1":"d","header2":"2a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
{"header1":"b","header2":"1a", "header3":"2a", "header4":"orange"}
{"header1":"b","header2":"1a", "header3":"1a", "header4":"orange"}
{"header1":"d","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
If this data is in data.json and you run
jq -M -s -f filter.jq data.json
with the following filter.jq
foreach .[] as $r (
{}
; ($r | map(.)) as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)
it will generate the following output, in the original order and with no duplicates:
{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}
Note that ($r | map(.)) is used to generate an array containing just the values from each row, which is assumed to always produce a unique key path. This is true for the sample data but may not be true for more complex values.
A slower but more robust filter.jq is
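To make the caveat concrete: a key built only from the values cannot tell apart two rows that hold the same values under different field names. The Python sketch below illustrates that collision; it is not jq, and dedupe, values_key, json_key and the two-row input are all invented for this example.

```python
import json


def dedupe(rows, key):
    """Keep the first row seen for each key, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


def values_key(row):
    # Values only: the field names are not part of the key.
    return tuple(row.values())


def json_key(row):
    # Whole row as canonical JSON text: field names and values both count.
    return json.dumps(row, sort_keys=True)


# Two different rows that happen to carry the same values.
rows = [
    {"header1": "a", "header2": "b"},
    {"other1": "a", "other2": "b"},
]

print(len(dedupe(rows, values_key)))  # second row is wrongly dropped
print(len(dedupe(rows, json_key)))    # both rows are kept
```

With the values-only key the second row is treated as a duplicate and only one row survives, while the whole-row key keeps both.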
foreach .[] as $r (
{}
; [$r | tojson] as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)
which uses the JSON representation of the entire row as a unique key to determine if a row has been previously seen.
Source: https://stackoverflow.com/questions/42479103/unique-combinations-of-different-values-in-json-using-jq
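The same "serialize the whole row and remember what has been seen" idea can be sketched in Python, for readers who want to sanity-check the expected order-preserving result. ROWS and first_seen_unique are names invented here, and sort_keys=True is a choice of this sketch so that key order does not affect the comparison.

```python
import json

# Corrected sample rows; the ninth row repeats the fifth.
ROWS = [
    {"header1": "a", "header2": "1a", "header3": "1a", "header4": "apple"},
    {"header1": "b", "header2": "2a", "header3": "2a", "header4": "orange"},
    {"header1": "c", "header2": "1a", "header3": "2a", "header4": "banana"},
    {"header1": "d", "header2": "2a", "header3": "1a", "header4": "apple"},
    {"header1": "a", "header2": "2a", "header3": "1a", "header4": "banana"},
    {"header1": "b", "header2": "1a", "header3": "2a", "header4": "orange"},
    {"header1": "b", "header2": "1a", "header3": "1a", "header4": "orange"},
    {"header1": "d", "header2": "1a", "header3": "1a", "header4": "apple"},
    {"header1": "a", "header2": "2a", "header3": "1a", "header4": "banana"},
]


def first_seen_unique(rows):
    """Return rows with duplicates removed, keeping first-seen order."""
    seen = {}
    for row in rows:
        # The serialized row is the lookup key; dicts preserve insertion order,
        # so the first occurrence of each row fixes its position in the result.
        seen.setdefault(json.dumps(row, sort_keys=True), row)
    return list(seen.values())


for row in first_seen_unique(ROWS):
    print(json.dumps(row, separators=(",", ":")))
```

On the sample this yields the first eight rows in their original order and drops the repeated ninth row.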