Question
I'm reading in an array from a JSON file because I need to perform a reduce on it before turning it into a DataFrame for further manipulation. For the sake of argument, let's say this is it:
a = [Dict("A" => 1, "B" => 1, "C" => "a")
     Dict("A" => 1, "B" => 2, "C" => "b")
     Dict("A" => 2, "B" => 1, "C" => "b")
     Dict("A" => 2, "B" => 2, "C" => "a")]
Now, the reduce I'm performing would be greatly simplified if I could group the array by one or more keys (say, A and C), perform a simpler reduce on each group, and recombine the rows later into a larger array of Dicts that I can then easily turn into a DataFrame.
One solution would be to turn this into a DataFrame, split it into groups, turn individual groups into matrices, do the reduce (with some difficulty, because now I've lost the ability to refer to elements by their name), turn the reduced matrices back into (Sub?)DataFrames (with some more difficulty because names), and hope it all comes together nicely into one giant DataFrame.
Any easier and/or more practical way of doing this?
EDIT: Before somebody suggests I look at Query.jl, the reduce I'm running returns an array, with fewer rows because I'm squashing certain pairs of subsequent rows. If I can do such a thing with Query.jl, could somebody hint at how, because the documentation isn't exactly clear on how to "aggregate" with anything that doesn't return a single value. Example:
A B C
-----------
1   a
2 1 a
3   b
4 2 b
should group by "C" and turn that table into something like
A B C
-----------
1 1 a
3 2 b
To clarify, the reduce is working; I only want to simplify it by not having to check whether a row belongs to the same group as the previous row before doing the squashing.
Answer 1:
It's still experimental, but SplitApplyCombine.jl might do the trick. You can group arbitrary iterables using any key function you want, and get a key -> group dict out at the end.
julia> ## Pkg.clone("https://github.com/JuliaData/SplitApplyCombine.jl.git")
julia> using SplitApplyCombine
julia> group(x->x["C"], a)
Dict{Any,Array{Dict{String,Any},1}} with 2 entries:
"b" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b")), Dict{String,Any}(Pair{String,Any}("…
"a" => Dict{String,Any}[Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a")), Dict{String,Any}(Pair{String,Any}("…
Then you can use standard [map]reduce operations (here using the SAC @_ macro for piping):
julia> @_ a |> group(x->x["C"], _) |> values(_) |> reduce(vcat, _)
4-element Array{Dict{String,Any},1}:
Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 1),Pair{String,Any}("C", "b"))
Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 2),Pair{String,Any}("C", "b"))
Dict{String,Any}(Pair{String,Any}("B", 1),Pair{String,Any}("A", 1),Pair{String,Any}("C", "a"))
Dict{String,Any}(Pair{String,Any}("B", 2),Pair{String,Any}("A", 2),Pair{String,Any}("C", "a"))
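If you would rather not depend on an experimental package, the same group / reduce-per-group / recombine pattern can be sketched in plain Julia. This is only a rough illustration: groupby_key and squash are hypothetical names made up here, not part of SplitApplyCombine.jl or Base, and squash is just a stand-in for the real reduce (it assumes "squashing" means keeping the first row of a group and taking B from the last row).

```julia
# Hypothetical helper (not from any package): group the elements of `xs`
# by the value returned by `keyfn`, keeping groups in first-seen order.
function groupby_key(keyfn, xs)
    order = Any[]                                # keys in order of first appearance
    groups = Dict{Any,Vector{eltype(xs)}}()
    for x in xs
        k = keyfn(x)
        if !haskey(groups, k)
            push!(order, k)
            groups[k] = eltype(xs)[]
        end
        push!(groups[k], x)
    end
    return [k => groups[k] for k in order]       # vector of key => rows pairs
end

a = [Dict("A" => 1, "B" => 1, "C" => "a"),
     Dict("A" => 1, "B" => 2, "C" => "b"),
     Dict("A" => 2, "B" => 1, "C" => "b"),
     Dict("A" => 2, "B" => 2, "C" => "a")]

# Stand-in for the real per-group reduce (an assumption for illustration):
# keep the first row of the group, but take "B" from the last row.
# It returns an array, so a group may collapse to fewer rows.
squash(rows) = [merge(first(rows), Dict("B" => last(rows)["B"]))]

# Group by a single key, reduce each group, and concatenate the results.
groups = groupby_key(x -> x["C"], a)
result = reduce(vcat, [squash(rows) for (key, rows) in groups])

# Grouping by several keys only needs a tuple-valued key function.
groups_ac = groupby_key(x -> (x["A"], x["C"]), a)
```

Because the per-group function only ever sees rows from one group, it no longer has to check whether a row belongs to the same group as the previous one. The result is still an array of Dicts, so it can be handed to whatever DataFrame conversion you were already using.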
Source: https://stackoverflow.com/questions/47460140/can-i-group-by-an-array-of-dictionaries-in-julia