jq --stream filter on multiple values of same key

醉梦人生 2020-12-18 15:01

I am processing a very large JSON file and need to filter its inner JSON objects by the value of one key. My JSON looks like this:

{"userActivities":


        
1 Answer
  • 2020-12-18 15:41

    From the jq Cookbook, let's borrow def atomize(s):

    # Convert an object (presented in streaming form as the stream s) into
    # a stream of single-key objects
    # Examples:
    #   atomize({a:1,b:2}|tostream)
    #   atomize(inputs) (used in conjunction with "jq -n --stream")
    def atomize(s):
      fromstream(foreach s as $in ( {previous:null, emit: null};
          if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
          then {emit: [[.previous]], previous: $in|.[0][0]}
          else { previous: ($in|.[0][0]), emit: null}
          end;
          (.emit // empty), $in) ) ;
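
To see why atomize has to synthesize events, it can help to look at the raw stream first. The sketch below uses a made-up two-entry document (the question's real JSON is not shown in full, so the keys a1 and a2 and the date values are assumptions) and prints the events produced by --stream, then the same events after 1|truncate_stream.

```shell
# Hypothetical sample: the key names and dates are invented for illustration.
sample='{"userActivities":{"a1":{"localDate":"2018-08-03"},"a2":{"localDate":"2018-07-29"}}}'

# Raw events: [path, leaf] for each scalar, [path] whenever an object closes.
events=$(printf '%s' "$sample" | jq -c --stream .)
printf '%s\n' "$events"

# The same events with the top-level key stripped from every path.
# -n stops jq from consuming the first event itself, so `inputs` sees all of them.
truncated=$(printf '%s' "$sample" | jq -n -c --stream '1|truncate_stream(inputs)')
printf '%s\n' "$truncated"
```

The truncated stream ends with a closing event [["a2"]] but contains no such event for a1, so fromstream on its own would emit a single combined object. atomize inserts the missing [[key]] event each time the first path component changes, which is what splits the output into one object per key.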
    

    Since the top-level object described by the OP contains just one key, we can select the August 2018 objects as follows:

    atomize(1|truncate_stream(inputs))
    | select( .[].localDate[0:7] == "2018-08")
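
The command line is not spelled out above, so here is one possible way to run the filter end to end. It is a sketch under assumptions: the sample file, its keys (a1, a2, a3) and the steps field are invented, and the only thing taken from the question is that each inner object has a localDate string.

```shell
workdir=$(mktemp -d)

# Hypothetical sample data standing in for the very large input.
cat > "$workdir/sample.json" <<'EOF'
{"userActivities": {
  "a1": {"localDate": "2018-08-03", "steps": 4200},
  "a2": {"localDate": "2018-07-29", "steps": 1300},
  "a3": {"localDate": "2018-08-17", "steps": 8800}
}}
EOF

# atomize is copied unchanged from the answer; the last two lines are the filter.
cat > "$workdir/filter.jq" <<'EOF'
def atomize(s):
  fromstream(foreach s as $in ( {previous:null, emit: null};
      if ($in | length == 2) and ($in|.[0][0]) != .previous and .previous != null
      then {emit: [[.previous]], previous: $in|.[0][0]}
      else { previous: ($in|.[0][0]), emit: null}
      end;
      (.emit // empty), $in) ) ;

atomize(1|truncate_stream(inputs))
| select( .[].localDate[0:7] == "2018-08")
EOF

# -n: let `inputs` read every event; --stream: parse incrementally;
# -c: one compact object per line; -f: read the program from a file.
selected=$(jq -n -c --stream -f "$workdir/filter.jq" "$workdir/sample.json")
printf '%s\n' "$selected"

rm -r "$workdir"
```

With this sample, only the a1 and a3 entries should be printed, each as its own single-key object on one line.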
    

    Collecting these into a single composite object may need a lot of memory, so you might prefer to pipe the selected objects to another program (e.g. awk or jq) and process them one at a time. Otherwise, I'd go with:

    def add(s): reduce s as $x (null; .+$x);
    
    {"userActivities": add(
        atomize(1|truncate_stream(inputs | select(.[0][0] == "userActivities")))
        | select( .[].localDate[0:7] == "2018-08") ) }
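
For the other route mentioned above, piping the selected objects to a second program, a second ordinary (non-streaming) jq can handle the single-key objects one line at a time. This is only a sketch: the two input lines stand in for the output of the streaming filter and use invented values, and the choice of a tab-separated key/date listing is just an example of downstream processing.

```shell
# Stand-in for the streaming filter's output (hypothetical values).
selected='{"a1":{"localDate":"2018-08-03","steps":4200}}
{"a3":{"localDate":"2018-08-17","steps":8800}}'

# Each line is a small object, so no --stream is needed at this stage.
# to_entries turns {"a1": {...}} into [{"key": "a1", "value": {...}}].
rows=$(printf '%s\n' "$selected" \
  | jq -r 'to_entries[] | [.key, .value.localDate] | @tsv')
printf '%s\n' "$rows"
```

Because every object is handled and discarded before the next one is read, memory use in the second stage stays proportional to one entry rather than to the whole selection.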
    

    Variation

    If the top-level object has more than one key, then the following variation would be appropriate:

    atomize(1|truncate_stream(inputs | select(.[0][0] == "userActivities")))
    | select( .[].localDate[0:7] =="2018-08")
    