I'm working with jq and I absolutely love it so far. I've run into an issue I haven't found a solution to anywhere else, though, and wanted to see if the community had a way to do this.
Let's presume we have a JSON file that looks like so:
{"author": "Gary", "text": "Blah"}
{"author": "Larry", "text": "More Blah"}
{"author": "Jerry", "text": "Yet more Blah"}
{"author": "Barry", "text": "Even more Blah"}
{"author": "Teri", "text": "Text on text on text"}
{"author": "Bob", "text": "Another thing to say"}
Now, we want to select rows where the value of author is equal to either "Gary" OR "Larry", but no other case. In reality, I have several thousand names I'm checking against, so simply stating the direct or conditional (e.g. cat blah.json | jq -r 'select(.author == "Gary" or .author == "Larry")') isn't sufficient. I'm trying to do this via the inside function like so, but I get an error:
cat blah.json | jq -r 'select(.author | inside(["Gary", "Larry"]))'
jq: error (at <stdin>:1): array (["Gary","La...) and string ("Gary") cannot have their containment checked
What would be the best method for doing something like this?
inside and contains are a bit weird. Here are some more straightforward solutions:
index/1
select( .author as $a | ["Gary", "Larry"] | index($a) )
any/2
["Gary", "Larry"] as $whitelist
| select( .author as $a | any( $whitelist[]; . == $a) )
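To try the two filters above end to end, here is a minimal sketch that recreates the question's sample data in a scratch directory (the mktemp location and the variable names are assumptions for illustration) and runs each filter with -c so every match prints on one line. Note that index/1 returns 0 for the first list entry, which jq still treats as truthy, since only false and null are falsy.

```shell
# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/blah.json" <<'EOF'
{"author": "Gary", "text": "Blah"}
{"author": "Larry", "text": "More Blah"}
{"author": "Jerry", "text": "Yet more Blah"}
{"author": "Barry", "text": "Even more Blah"}
{"author": "Teri", "text": "Text on text on text"}
{"author": "Bob", "text": "Another thing to say"}
EOF

# index/1: look the author up in the list; a miss yields null, which select drops.
by_index=$(jq -c 'select( .author as $a | ["Gary", "Larry"] | index($a) )' "$workdir/blah.json")

# any/2: true as soon as one list entry equals the author.
by_any=$(jq -c '["Gary", "Larry"] as $whitelist
  | select( .author as $a | any( $whitelist[]; . == $a) )' "$workdir/blah.json")

printf '%s\n' "$by_index"
printf '%s\n' "$by_any"
```

Both commands should print the same two records, the ones authored by Gary and Larry.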
Using a dictionary
If performance is an issue and if "author" is always a string, then a solution along the lines suggested by @JeffMercado should be considered. Here is a variant (to be used with the -n command-line option):
["Gary", "Larry"] as $whitelist
| ($whitelist | map( {(.): true} ) | add) as $dictionary
| inputs
| select($dictionary[.author])
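As a rough illustration of the dictionary variant, the sketch below first prints the lookup object on its own and then runs the whole filter over a two-record stream piped in on stdin (the two records and the shell variable names are chosen here just for the demonstration). The -n option matters: without it, jq would bind the first record to . before inputs runs, and that record would never be tested.

```shell
# Build the lookup object on its own to see its shape.
dictionary=$(jq -nc '["Gary", "Larry"] | map( {(.): true} ) | add')
printf '%s\n' "$dictionary"

# Run the full filter over a small stream; -n leaves every record to `inputs`.
matches=$(printf '%s\n' \
    '{"author": "Gary", "text": "Blah"}' \
    '{"author": "Bob", "text": "Another thing to say"}' \
  | jq -nc '["Gary", "Larry"] as $whitelist
      | ($whitelist | map( {(.): true} ) | add) as $dictionary
      | inputs
      | select($dictionary[.author])')
printf '%s\n' "$matches"
```

Authors missing from the dictionary look up as null, so select drops them.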
IRC user gnomon answered this on the jq channel as follows:
jq 'select([.author] | inside(["Larry", "Gary", "Jerry"]))'
The intuition behind this approach, as stated by the user, was: "Literally your idea, only wrapping .author as [.author] to coerce it into being a single-item array so inside() will work on it." This answer produces the desired result of filtering for a series of names provided in a list, as the original question desired.
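One caveat worth checking before relying on this: inside and contains need both sides to be the same type (which is why the bare string failed against an array), and when the array elements are strings they are compared by substring rather than by equality. The sketch below uses three made-up records (not from the question) to show that a partial name such as "Gar" slips through the inside version, while the index/1 filter from the earlier answer keeps only exact matches.

```shell
# Three hypothetical records: an exact match, a partial name, and a non-match.
records='{"author": "Gary"}
{"author": "Gar"}
{"author": "Bob"}'

# Wrapped in an array, inside/1 runs, but string elements match by substring.
loose=$(printf '%s\n' "$records" \
  | jq -c 'select([.author] | inside(["Gary", "Larry"]))')

# index/1 on the array compares whole elements, so only exact names pass.
strict=$(printf '%s\n' "$records" \
  | jq -c 'select( .author as $a | ["Gary", "Larry"] | index($a) )')

echo "inside:"; printf '%s\n' "$loose"
echo "index:";  printf '%s\n' "$strict"
```

If the list of names can contain one name that is a prefix or fragment of another, the exact-match filters are the safer choice.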
You can use objects as if they're sets to test for membership. Methods operating on arrays scan the list again for every input record, which is inefficient, especially if the array may be huge; an object lookup is a single key access.
You can build up a set of values prior to reading your input, then use the set to filter your inputs.
$ jq -n --argjson names '["Larry","Gary","Jerry"]' '
(reduce $names[] as $name ({}; .[$name] = true)) as $set
| inputs | select($set[.author])
' blah.json
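With several thousand names, passing the list on the command line gets unwieldy. One possible variation, sketched here under the assumption that the names live in a file of JSON strings, one per line (the file name names.json and the scratch directory are hypothetical), is to load them with jq's --slurpfile option, which binds $names to an array of all JSON values in that file. The rest of the filter is unchanged.

```shell
# Scratch directory with a hypothetical names file and the sample data.
workdir=$(mktemp -d)
printf '%s\n' '"Gary"' '"Larry"' > "$workdir/names.json"
cat > "$workdir/blah.json" <<'EOF'
{"author": "Gary", "text": "Blah"}
{"author": "Larry", "text": "More Blah"}
{"author": "Jerry", "text": "Yet more Blah"}
{"author": "Bob", "text": "Another thing to say"}
EOF

# --slurpfile collects every JSON value in names.json into the array $names.
from_file=$(jq -nc --slurpfile names "$workdir/names.json" '
  (reduce $names[] as $name ({}; .[$name] = true)) as $set
  | inputs | select($set[.author])
' "$workdir/blah.json")

printf '%s\n' "$from_file"
```

The set is built once before any record is read, so the cost per record stays a single key lookup regardless of how many names the file holds.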
Source: https://stackoverflow.com/questions/44704404/select-entries-based-on-multiple-values-in-jq