Question
as per the following example:
    |  A   |    B    |  C  |  D  |  E  |  F  |  G  | ∞
----+------+---------+-----+-----+-----+-----+-----+---
 1  |      | AVERAGE |     |     |     |     |     |
 2  | xx 1 |         |  1  |  2  | 0.5 | 10  |     |
 3  | xx 2 |         |  7  |  1  |     |     |     |
 4  |      |         |  0  |     |     |     |     |
 5  | xx 3 |         |  9  |  8  |  7  |  6  |     |
 6  | xx 4 |         |  0  |  1  |  2  |  1  |     |
 7  |      |         |  1  |     |  4  |     |     |
 8  | xx 5 |         |     |     |     |     |     |
 9  |      |         |     |     |     |     |  5  |
 ∞  |      |         |     |     |     |     |     |
what's the most efficient way of getting the AVERAGE of every valid row, in the dynamic sense of the term (unknown number of rows and unknown number of columns)?
Answer 1:
QUERY
level 1:
if all 5 cells in range C2:G have values:
=QUERY(QUERY(C2:G, "select (C+D+E+F+G)/5"), "offset 1", )
if not, then those rows are skipped.
if empty cells are considered as zeros:
=INDEX(QUERY(QUERY({C2:G*1}, "select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))
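The following is a rough Python model of this formula. Python is used only to mirror the spreadsheet arithmetic. `DATA` and `level1_averages` are made-up names, and an empty cell is represented as `None`. Multiplying by 1 turns every empty cell into 0, so every row is divided by the same fixed column count. As a result, rows that are empty or hold only zeros come out as 0:

```python
# C2:G9 from the example above; None stands for an empty cell
DATA = [
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, None],
    [None, None, None, None, 5],
]


def level1_averages(rows):
    """Model of {C2:G*1} with "select (Col1+...+Col5)/5": blanks count as 0, divisor is fixed."""
    n_cols = len(rows[0])
    return [sum(0 if v is None else v for v in row) / n_cols for row in rows]


print(level1_averages(DATA))
# [2.7, 1.6, 0.0, 6.0, 0.8, 1.0, 0.0, 1.0]
```

Row 4 (a single 0) and row 8 (empty) both give 0. Row 9 averages to 1 even though its only value is 5, because its four empty cells are counted as zeros.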
to remove zero values we wrap it in IFERROR(1/(1/...)):
=INDEX(IFERROR(1/(1/QUERY(QUERY({C2:G*1},
"select (Col1+Col2+Col3+Col4+Col5)/5"), "offset 1", ))))
to make the Col references dynamic we can do:
=INDEX(IFERROR(1/(1/QUERY(QUERY({C2:G*1},
"select "&
"("&JOIN("+", "Col"&ROW(INDIRECT("1:"&COLUMNS(C:G))))&")/"&COLUMNS(C:G)),
"offset 1", ))))
level 2:
if empty cells are not considered as zeros and shouldn't be skipped:
=INDEX(TRANSPOSE(QUERY(TRANSPOSE(C2:G),
"select "&TEXTJOIN(",", 1, IF(A2:A="",,
"avg(Col"&ROW(A2:A)-ROW(A2)+1&")")))),, 2)
note that this is column A dependent, so missing values in column A will offset the results
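Here is a Python illustration of that offset on the example, assuming column A holds the labels shown in the table. `COL_A` and `build_avg_select` are hypothetical names. The code builds the same select string as the TEXTJOIN and shows which rows actually get averaged:

```python
# Column A of the example (A2:A9); "" stands for an empty cell
COL_A = ["xx 1", "xx 2", "", "xx 3", "xx 4", "", "xx 5", ""]


def build_avg_select(labels):
    """Mirror of "select "&TEXTJOIN(",", 1, IF(A2:A="",, "avg(Col"&ROW(A2:A)-ROW(A2)+1&")"))."""
    # ColN is the N-th column of TRANSPOSE(data), i.e. the N-th data row
    parts = [f"avg(Col{n})" for n, label in enumerate(labels, start=1) if label != ""]
    return "select " + ",".join(parts)


query = build_avg_select(COL_A)
print(query)  # select avg(Col1),avg(Col2),avg(Col4),avg(Col5),avg(Col7)

# Only these rows are averaged; their 5 results spill into consecutive cells B2:B6,
# so from B4 on each average sits next to the wrong row (B4 shows row 5's average)
rows_averaged = [2 + i for i, label in enumerate(COL_A) if label != ""]
print(rows_averaged)  # [2, 3, 5, 6, 8]
```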
fun fact!! we can swap avg for max or min.
to free it from its dependence on column A and make it work for any valid row:
=INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
IF(TRIM(TRANSPOSE(QUERY(TRANSPOSE(C2:G),,9^9)))="", C2:G*0, C2:G)),
"select "&TEXTJOIN(",", 1,
"avg(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)
if 0s present in the range shouldn't be averaged, we can add a small IF statement:
=INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
IF(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
IF(C2:G>0, C2:G, )),,9^9)))="", C2:G*0,
IF(C2:G>0, C2:G, ))),
"select "&TEXTJOIN(",", 1,
"avg(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)
here we used the so-called "vertical query smash", which takes all values in a given range and concentrates them into a single column, where all cells of each row are joined with a space as a byproduct:
=FLATTEN(QUERY(TRANSPOSE(C2:G),,9^9))
apart from this, there is also the "horizontal query smash":
=QUERY(C2:G,,9^9)
and also the "ultimate 360° double query smash", which puts all cells from the range into a single cell:
=QUERY(FLATTEN(QUERY(TRANSPOSE(C2:G),,9^9)),,9^9)
and finally "the infamous negative 360° reverse double query smash", which prioritizes columns over rows:
=QUERY(FLATTEN(QUERY(C2:G,,9^9)),,9^9)
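To make the four variants concrete, here is a Python model of how they combine cells. It assumes, as described above, that QUERY with a header count of 9^9 joins every cell of a column with a single space, and that empty cells still take part in the join. The function names are invented for this sketch:

```python
def cell_text(v):
    """Text a cell contributes to the join; an empty cell contributes an empty string."""
    return "" if v is None else str(v)


def vertical_smash(rows):
    """FLATTEN(QUERY(TRANSPOSE(rng),,9^9)): one space-joined string per row."""
    return [" ".join(cell_text(v) for v in row) for row in rows]


def horizontal_smash(rows):
    """QUERY(rng,,9^9): all rows become header rows, so each column collapses into one string."""
    return [" ".join(cell_text(v) for v in col) for col in zip(*rows)]


def double_smash(rows):
    """QUERY(FLATTEN(vertical smash),,9^9): the whole range in one cell, row by row."""
    return " ".join(vertical_smash(rows))


def reverse_double_smash(rows):
    """QUERY(FLATTEN(horizontal smash),,9^9): the whole range in one cell, column by column."""
    return " ".join(horizontal_smash(rows))


grid = [[1, 2], [3, None]]
print(vertical_smash(grid))              # ['1 2', '3 ']
print(horizontal_smash(grid))            # ['1 3', '2 ']
print(repr(double_smash(grid)))          # '1 2 3 '
print(repr(reverse_double_smash(grid)))  # '1 3 2 '
```

The trailing spaces come from the empty cell. That is exactly the leftover whitespace that TRIM has to clean up.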
all query smash names are copyrighted of course
back to the topic... as mentioned above, all cells of each row in the range are joined with a space, even the empty ones, so we end up with double or multiple spaces between values. to fix this we use TRIM and introduce a simple IF statement to assign 0 values to empty rows in the given range, in order to counter the offset.
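Here is a Python model of the whole column-A-free level 2 formula on the example data. `level2_averages` and `sheets_trim` are illustrative names, and `None` is an empty cell. An empty row is smashed into a string of spaces, TRIM reduces it to "", and the row is replaced by zeros, so it still yields a value (0) and keeps its position. The IFERROR(1/(1/...)) wrapper then blanks every 0. A side effect is that a row whose values genuinely average to 0 (row 4) is blanked as well:

```python
DATA = [  # C2:G9 from the example; None stands for an empty cell
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, None],
    [None, None, None, None, 5],
]


def sheets_trim(text):
    """TRIM: drop leading/trailing spaces and collapse inner runs of spaces into one."""
    return " ".join(text.split())


def level2_averages(rows):
    averages = []
    for row in rows:
        # vertical query smash of a single row: cells joined with spaces
        smashed = " ".join("" if v is None else str(v) for v in row)
        if sheets_trim(smashed) == "":
            row = [0] * len(row)  # IF(TRIM(...)="", C2:G*0, C2:G): empty row becomes zeros
        values = [v for v in row if v is not None]  # QUERY avg() skips empty cells
        avg = sum(values) / len(values)
        # IFERROR(1/(1/avg)): a 0 average hits a division by zero and is shown as blank
        averages.append(None if avg == 0 else avg)
    return averages


print(level2_averages(DATA))
# [3.375, 4.0, None, 7.5, 1.0, 2.5, None, 5.0]
```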
MMULT
level 3:
MMULT is a kind of heavy-class formula that is able to perform addition, subtraction, multiplication, division and even running totals on arrays/matrices... however, the bigger the dataset, the slower the calculation (because in MMULT even empty rows take time to perform the + - × ÷ operations) ...unless we use a truly dynamic range, infinite in both directions...
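For readers new to MMULT, here is a minimal Python sketch of the idea behind the row-sum trick. Multiplying a data matrix by a column of ones yields one sum per row. `mmult` is a plain hand-written matrix product standing in for MMULT, not a library call. Empty cells are already turned into 0, as N() does. The last comment shows that every cell, zeros included, goes through a multiplication:

```python
def mmult(a, b):
    """Plain matrix product like MMULT: a is m x n, b is n x p, result is m x p."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# First two data rows of the example with empty cells already turned into 0
rows = [[1, 2, 0.5, 10, 0],
        [7, 1, 0, 0, 0]]
ones = [[1] for _ in range(5)]  # 5x1 column of ones, like ROW(INDIRECT("C1:C5"))^0

print(mmult(rows, ones))  # [[13.5], [8]]
# m x n x p = 2 x 5 x 1 = 10 products are computed, even for the zero cells
```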
to get the last row with values of a given range:
=INDEX(MAX(IF(TRIM(FLATTEN(QUERY(TRANSPOSE(
INDIRECT("C2:"&ROWS(A:A))),,9^9)))="",,ROW(A2:A))))
to get the last column with values of a given range:
=INDEX(MAX(IF(TRIM(QUERY(INDIRECT("C2:"&ROWS(A:A)),,9^9))="",,COLUMN(C2:2))))
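The same MAX trick in Python, assuming the "infinite" range is modelled by padding the example with empty rows below and an empty column to the right. `PADDED` and `last_row_and_col` are made-up names. Multiplying a row or column number by the "is not empty" flag zeroes out the empty cells, so MAX picks the last filled one:

```python
DATA = [  # C2:G9 from the example; None stands for an empty cell
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, None],
    [None, None, None, None, 5],
]
# Pad with one empty column (H) and three empty rows (10-12) to mimic an open-ended range
PADDED = [row + [None] for row in DATA] + [[None] * 6 for _ in range(3)]


def last_row_and_col(rows, first_row=2, first_col=3):
    """MAX((rng<>"")*ROW(...)) and MAX((rng<>"")*COLUMN(...)) for a grid anchored at C2."""
    last_row = max((first_row + i) * (v is not None)
                   for i, row in enumerate(rows) for v in row)
    last_col = max((first_col + j) * (v is not None)
                   for row in rows for j, v in enumerate(row))
    return last_row, last_col


print(last_row_and_col(PADDED))  # (9, 7), i.e. row 9 and column G
```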
now we can construct the range in a simple way:
=INDIRECT("C2:"&ADDRESS(9, 7))
which is the same as:
=INDEX(INDIRECT("C2:"&ADDRESS(MAX(IF(TRIM(FLATTEN(QUERY(TRANSPOSE(
INDIRECT("C2:"&ROWS(A:A))),,9^9)))="",,ROW(A2:A))),
MAX(IF(TRIM(QUERY(INDIRECT("C2:"&ROWS(A:A)),,9^9))="",,COLUMN(C2:2))))))
or a shorter alternative:
=INDEX(INDIRECT("C2:"&ADDRESS(
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)),
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2)))))
therefore the simplified MMULT formula would be:
=ARRAYFORMULA(IFERROR(
MMULT(N( C2:G9), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)/
MMULT(N(IF(C2:G9<>"", 1, )), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)))
in case we want to exclude zero values from the range, the formula would be:
=ARRAYFORMULA(IFERROR(
MMULT(N( C2:G9), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)/
MMULT(N(IF(C2:G9>0, 1, )), ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0)))
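Here are both MMULT variants in Python for the example's C2:G9, as a sketch assuming `None` for empty cells. `mmult_averages` and `n` are hypothetical helpers, and `mmult` is a hand-written stand-in for MMULT. The numerator sums the row, the denominator counts the qualifying cells, and a zero count plays the role of the #DIV/0! that IFERROR blanks:

```python
DATA = [  # C2:G9 from the example; None stands for an empty cell
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, None],
    [None, None, None, None, 5],
]


def n(v):
    """N(): empty cell -> 0, number -> itself."""
    return 0 if v is None else v


def mmult(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mmult_averages(rows, exclude_zeros=False):
    ones = [[1] for _ in rows[0]]  # ROW(INDIRECT("C1:C"&COLUMNS(C:G)))^0
    sums = mmult([[n(v) for v in row] for row in rows], ones)
    # IF(C2:G9<>"", 1, ) or IF(C2:G9>0, 1, ); note >0 also stops negatives from being
    # counted while they are still summed, so this variant assumes non-negative data
    flags = [[1 if v is not None and (v > 0 or not exclude_zeros) else 0 for v in row]
             for row in rows]
    counts = mmult(flags, ones)
    # IFERROR(sum/count): a count of 0 would be #DIV/0!, shown as a blank (None)
    return [s / c if c else None for (s,), (c,) in zip(sums, counts)]


print(mmult_averages(DATA))
# [3.375, 4.0, 0.0, 7.5, 1.0, 2.5, None, 5.0]
print(mmult_averages(DATA, exclude_zeros=True))
# row 4 becomes None and row 6 becomes 4/3
```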
level 4:
putting all of the above together to make it infinitely dynamic while still restricted to the valid dataset:
=INDEX(IFERROR(
MMULT(N( INDIRECT("C2:"&ADDRESS(
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)),
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))))), ROW(INDIRECT("C1:C"&
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))-(COLUMN(C2)-1)))^0)/
MMULT(N(IF(INDIRECT("C2:"&ADDRESS(
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*ROW(A2:A)),
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))))<>"", 1, )), ROW(INDIRECT("C1:C"&
MAX((INDIRECT("C2:"&ROWS(A:A))<>"")*COLUMN(C2:2))-(COLUMN(C2)-1)))^0)))
again, to exclude cells with zeros in the range, change the <>"" condition inside the denominator's IF to >0 (the same change as in level 3).
honorable mentions:
@Erik Tyler level:
the polar opposite of the previous formula would be to run MMULT on:
- the total area of C2:? (all rows, all columns)
instead of:
- the valid area of C2:? (excluding empty rows and columns, which avoids mass calculations of 0 × 0 = 0)
including zeros:
=INDEX(IFERROR(
MMULT( INDIRECT("C2:"&ROWS(C:C))*1, SEQUENCE(COLUMNS(C2:2))^0)/
MMULT(IF(INDIRECT("C2:"&ROWS(C:C))<>"", 1)*1, SEQUENCE(COLUMNS(C2:2))^0)))
excluding zeros:
=INDEX(IFERROR(
MMULT( INDIRECT("C2:"&ROWS(C:C))*1, SEQUENCE(COLUMNS(C2:2))^0)/
MMULT(IF(INDIRECT("C2:"&ROWS(C:C))>0, 1)*1, SEQUENCE(COLUMNS(C2:2))^0)))
@kishkin level:
for a fixed range C2:G9 the MMULT average would be:
=INDEX(IFERROR(
MMULT( C2:G9*1, FLATTEN(COLUMN(C:G))^0)/
MMULT((C2:G9>0)*1, FLATTEN(COLUMN(C:G))^0)))
or, grouping the values by row number with QUERY instead of using MMULT:
=INDEX(IFNA(VLOOKUP(ROW(C2:C),
QUERY(SPLIT(FLATTEN(ROW(C2:C)&"×"&C2:J), "×"),
"select Col1,avg(Col2)
where Col2 is not null
group by Col1"), 2, )))
Answer 2:
You put a ton of time into this. I hope people appreciate it, all the more so because you did it for everyone else and not for yourself.
Looking at your final formulas, these should produce the same results (given data in C2:? as in your examples):
In B2 (include zeros):
=ArrayFormula(IFERROR(MMULT(INDIRECT("C2:"&ROWS(C:C))*1,SEQUENCE(COLUMNS(C1:1),1,1,0))/ MMULT(IF(INDIRECT("C2:"&ROWS(C:C))<>"",1,0),SEQUENCE(COLUMNS(C1:1),1,1,0))))
In B2 (exclude zeros):
=ArrayFormula(IFERROR(MMULT(INDIRECT("C2:"&ROWS(C:C))*1,SEQUENCE(COLUMNS(C1:1),1,1,0))/ MMULT(IF(INDIRECT("C2:"&ROWS(C:C))<>0,1,0),SEQUENCE(COLUMNS(C1:1),1,1,0))))
Answer 3:
I will try to make a little addition to @player0's answer, and I would really appreciate any comments on optimizing this.
In case there are a lot of empty rows and columns inside the data range, those might as well be excluded from MMULT.
Step 1 - Filter out empty rows
We've got a data range: from C2 down to the last row and right to the last column (which is J:J). I will use C2:K; see the details below for an explanation.
This formula will give us an array of row numbers where there is at least one non-empty cell. It will also contain a 0 for the empty cells, but that won't matter when searching in this array, and we will filter it out where it does matter:
=ARRAYFORMULA(
UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K)))
)
So, to filter out empty rows from the data range we use FILTER, which checks whether a row is in our array from above and keeps it if so:
=ARRAYFORMULA(
FILTER(
C2:K*1,
MATCH(
ROW(C2:K),
UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
0
)
)
)
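Here is a Python model of Step 1 on the example data. `None` is an empty cell, and `nonempty_row_keys` and `filter_empty_rows` are illustrative names, not Sheets functions. The key list keeps the order of first occurrence, as UNIQUE does, and includes the 0 produced by the empty cells:

```python
DATA = [  # C2:G9 from the example; None stands for an empty cell
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, None],
    [None, None, None, None, 5],
]


def nonempty_row_keys(rows, first_row=2):
    """UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))): row numbers of filled cells, 0 for blanks."""
    keys = []
    for i, row in enumerate(rows):  # FLATTEN walks the grid row by row
        for v in row:
            key = (first_row + i) if v is not None else 0
            if key not in keys:  # UNIQUE keeps first occurrences in order
                keys.append(key)
    return keys


def filter_empty_rows(rows, first_row=2):
    """FILTER(C2:K*1, MATCH(ROW(C2:K), keys, 0)): keep rows whose number is among the keys."""
    keys = nonempty_row_keys(rows, first_row)
    return [[0 if v is None else v for v in row]  # C2:K*1 turns blanks into 0
            for i, row in enumerate(rows) if first_row + i in keys]


print(nonempty_row_keys(DATA))       # [2, 0, 3, 4, 5, 6, 7, 9]
print(len(filter_empty_rows(DATA)))  # 7 (row 8 is gone)
```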
Step 2 - Filter out empty columns
To get an array of only non-empty column numbers we can use almost the same formula:
=ARRAYFORMULA(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2))))
)
For why SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)) is used instead of COLUMN(C2:K), see the details at the end.
To filter out empty columns we also use FILTER, with a MATCH condition to search for column numbers in our array:
=ARRAYFORMULA(
FILTER(
C2:K*1,
MATCH(
SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
0
)
)
)
And to filter out both empty rows and empty columns we just use two FILTERs:
=ARRAYFORMULA(
FILTER(
FILTER(
C2:K*1,
MATCH(
ROW(C2:K),
UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
0
)
),
MATCH(
SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
0
)
)
)
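Here is a Python sketch of Steps 1 and 2 combined. The grid is padded with one always-empty column to mimic C2:K reaching past the last used column. `GRID`, `nonempty_col_keys` and `filter_rows_and_cols` are hypothetical names. The row test uses `any()`, which keeps the same rows as the MATCH against the row keys:

```python
DATA = [  # C2:G9 from the example; None stands for an empty cell
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, None],
    [None, None, None, None, 5],
]
GRID = [row + [None] for row in DATA]  # one extra empty column, like C2:K


def nonempty_col_keys(rows, first_col=3):
    """UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2))))."""
    keys = []
    for row in rows:
        for j, v in enumerate(row):
            key = (first_col + j) if v is not None else 0
            if key not in keys:
                keys.append(key)
    return keys


def filter_rows_and_cols(rows, first_col=3):
    """Inner FILTER drops empty rows, outer FILTER drops empty columns."""
    kept_rows = [row for row in rows if any(v is not None for v in row)]
    col_keys = nonempty_col_keys(rows, first_col)
    return [[0 if v is None else v for j, v in enumerate(row) if first_col + j in col_keys]
            for row in kept_rows]


print(nonempty_col_keys(GRID))  # [3, 4, 5, 6, 0, 7]
compact = filter_rows_and_cols(GRID)
print(len(compact), len(compact[0]))  # 7 5
```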
The original data range is thus internally reduced to just its non-empty rows and columns.
Step 3 - Do the MMULT
Now we can use MMULT with that data set to calculate the average:
=ARRAYFORMULA(
MMULT(
FILTER(
FILTER(
C2:K*1,
MATCH(
ROW(C2:K),
UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
0
)
),
MATCH(
SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
0
)
),
SEQUENCE(
ROWS(
QUERY(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
"WHERE Col1 <> 0"
)
),
1,
1,
0
)
) /
MMULT(
FILTER(
FILTER(
(C2:K <> "")*1,
MATCH(
ROW(C2:K),
UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
0
)
),
MATCH(
SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
0
)
),
SEQUENCE(
ROWS(
QUERY(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
"WHERE Col1 <> 0"
)
),
1,
1,
0
)
)
)
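Here is the MMULT part on its own, modelled in Python on the already-compacted example (rows 2 to 7 and 9, columns C:G). The `COMPACT` grid is written out by hand here, not computed, and `mmult` is a hand-written stand-in for MMULT. Only 7 averages come back, one per non-empty row, so the last one belongs to row 9 even though it would land next to row 8:

```python
# Result of the two FILTERs for the example, with None kept for empty cells
COMPACT = [
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, 5],
]


def mmult(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


ones = [[1] for _ in range(len(COMPACT[0]))]  # SEQUENCE(<number of non-empty columns>, 1, 1, 0)
numer = mmult([[0 if v is None else v for v in row] for row in COMPACT], ones)  # C2:K*1
denom = mmult([[0 if v is None else 1 for v in row] for row in COMPACT], ones)  # (C2:K <> "")*1
averages = [s / c for (s,), (c,) in zip(numer, denom)]
print(averages)  # [3.375, 4.0, 0.0, 7.5, 1.0, 2.5, 5.0]
```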
The result is a bit off relative to the original data rows, since the empty rows were removed.
Step 4 - Fill the AVERAGE column
To make the averages line up with the original data rows we can use VLOOKUP like this:
=ARRAYFORMULA(
IFNA(VLOOKUP(
SEQUENCE(MAX((C2:K <> "") * ROW(C2:K)) - 1, 1, ROW(C2)),
{
QUERY(UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))), "WHERE Col1 <> 0"),
MMULT(
...
) /
MMULT(
...
)
},
2,
0
))
)
Where:
- SEQUENCE(MAX((C2:K <> "") * ROW(C2:K)) - 1, 1, ROW(C2)) is an array of row numbers from the 2nd one to the last non-empty one, so we won't be filling all the rows down with empty strings;
- QUERY(UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))), "WHERE Col1 <> 0") is the array of non-empty row numbers, with that 0 filtered out, used as the search keys;
- IFNA returns an empty string to put alongside an empty data row.
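Here is a small Python model of that realignment. It assumes the non-empty row keys (with the 0 removed) and the per-row averages from Step 3 are already available as plain lists, and `realign` is a hypothetical name. A dictionary lookup plays the role of VLOOKUP, and the empty-string default plays the role of IFNA:

```python
def realign(keys, averages, first_row=2):
    """Map averages of non-empty rows back onto every row from first_row to the last key."""
    lookup = dict(zip(keys, averages))  # the {keys, MMULT(...)/MMULT(...)} array
    last = max(keys)  # MAX((C2:K <> "") * ROW(C2:K))
    # SEQUENCE(last - 1, 1, ROW(C2)) gives rows 2..last when ROW(C2) = 2
    return [lookup.get(r, "") for r in range(first_row, last + 1)]  # IFNA(VLOOKUP(...)) -> ""


keys = [2, 3, 4, 5, 6, 7, 9]
averages = [3.375, 4.0, 0.0, 7.5, 1.0, 2.5, 5.0]
print(realign(keys, averages))
# [3.375, 4.0, 0.0, 7.5, 1.0, 2.5, '', 5.0]
```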
FINAL FORMULA
Putting it all together:
=ARRAYFORMULA(
IFNA(VLOOKUP(
SEQUENCE(MAX((C2:K <> "") * ROW(C2:K)) - 1, 1, ROW(C2)),
{
QUERY(UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))), "WHERE Col1 <> 0"),
MMULT(
FILTER(
FILTER(
C2:K*1,
MATCH(
ROW(C2:K),
UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
0
)
),
MATCH(
SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
0
)
),
SEQUENCE(
ROWS(
QUERY(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
"WHERE Col1 <> 0"
)
),
1,
1,
0
)
) /
MMULT(
FILTER(
FILTER(
(C2:K <> "")*1,
MATCH(
ROW(C2:K),
UNIQUE(FLATTEN((C2:K <> "") * ROW(C2:K))),
0
)
),
MATCH(
SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)),
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
0
)
),
SEQUENCE(
ROWS(
QUERY(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
"WHERE Col1 <> 0"
)
),
1,
1,
0
)
)
},
2,
0
))
)
A few details
- INDEX could be used instead of ARRAYFORMULA for brevity (thanks @player0, who taught me that a few months ago), but I like the unambiguity of ARRAYFORMULA.
- I use SEQUENCE to construct a column or a row of 1s to be explicit, for clarity. For example, this one
SEQUENCE(
ROWS(
QUERY(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
"WHERE Col1 <> 0"
)
),
1,
1,
0
)
could be replaced with
SIGN(
QUERY(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
"WHERE Col1 <> 0"
)
)
which is a bit shorter. There is also the approach demonstrated here by @player0 of raising to the power of 0:
QUERY(
UNIQUE(FLATTEN((C2:K <> "") * SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)))),
"WHERE Col1 <> 0"
)^0
but (this is just my speculation) I think SEQUENCE's internal implementation should be simpler than the operation of raising to a power.
- I use the range C2:K, which is one column more than actually exists on the sheet. Not only does it give a range of all the columns to the right of C2 and all the rows down from it, but it also updates when another column is added to the right of the sheet: a demo. It does not get highlighted, though. This C2:K can almost perfectly (there will be a problem if a ZZZ column is actually present on the sheet) replace these approaches:
INDIRECT("C2:" & ROWS(C:C))
OFFSET(C2,,, ROWS(C2:C), COLUMNS(C2:2))
- There is a small drawback in using C2:K: =ARRAYFORMULA(COLUMN(C2:K)) will return an array of column numbers even for non-existing ones, so we need to use =SEQUENCE(1, COLUMNS(C2:K), COLUMN(C2)) instead.
Answer 4:
I think there is a simple answer for a row-wise average using VLOOKUP and QUERY.
This one is in B2:
=ARRAYFORMULA(
IFNA(
VLOOKUP(
ROW(B2:B),
QUERY(
{
FLATTEN(ROW(C2:J) + SEQUENCE(1, COLUMNS(C2:J),,)),
FLATTEN(C2:J)
},
"SELECT Col1, AVG(Col2)
WHERE Col2 IS NOT NULL
GROUP BY Col1"
),
2,
0
)
)
)
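Here is a Python model of what that QUERY plus VLOOKUP does on the example data. `grouped_row_avg` and its `agg` parameter are hypothetical. Every (row number, value) pair is flattened, blanks are dropped, values are grouped by row number and aggregated, and then each row number is looked up, with a missing key becoming a blank. Passing a different `agg` corresponds to swapping AVG for MAX, MIN and so on:

```python
from collections import defaultdict

DATA = [  # C2:G9 from the example; None stands for an empty cell
    [1, 2, 0.5, 10, None],
    [7, 1, None, None, None],
    [0, None, None, None, None],
    [9, 8, 7, 6, None],
    [0, 1, 2, 1, None],
    [1, None, 4, None, None],
    [None, None, None, None, None],
    [None, None, None, None, 5],
]


def grouped_row_avg(rows, first_row=2, agg=lambda xs: sum(xs) / len(xs)):
    groups = defaultdict(list)
    for i, row in enumerate(rows):  # FLATTEN of {row number, value} pairs
        for v in row:
            if v is not None:  # WHERE Col2 IS NOT NULL
                groups[first_row + i].append(v)  # GROUP BY Col1 (the row number)
    table = {r: agg(vs) for r, vs in groups.items()}  # SELECT Col1, AVG(Col2)
    # IFNA(VLOOKUP(ROW(B2:B), ...)): rows without values come back blank (None)
    return [table.get(first_row + i) for i in range(len(rows))]


print(grouped_row_avg(DATA))           # [3.375, 4.0, 0.0, 7.5, 1.0, 2.5, None, 5.0]
print(grouped_row_avg(DATA, agg=max))  # [10, 7, 0, 9, 2, 4, None, 5]
```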
- This could easily be changed to max, min, sum or count: just change the aggregation function inside the QUERY statement.
- The same approach could be used for column-wise aggregation.
- FLATTEN(C2:J) could be changed to:
FLATTEN(--C2:J) to treat empty cells as 0s;
FLATTEN(IFERROR(1/(1/C2:J))) to exclude 0s from the average.
- If there are no intermediate empty rows, VLOOKUP could be removed from the formula, as well as Col1 from the SELECT statement.
I use the C2:J range while the sheet has columns up to I:I; some details on that:
- The range C2:J is one column more than actually exists on the sheet. Not only does it give a range of all the columns to the right of C2 and all the rows down from it, but it also updates when another column is added to the right of the sheet: a demo. It does not get highlighted, though. This C2:J can almost perfectly (there will be a problem if a ZZZ column is actually present on the sheet) replace these approaches:
INDIRECT("C2:" & ROWS(C:C))
OFFSET(C2,,, ROWS(C2:C), COLUMNS(C2:2))
- There is a small drawback in using C2:J: =ARRAYFORMULA(0 * COLUMN(C2:J)) will return an array of column numbers even for non-existing ones (multiplied by 0), so we need to use =SEQUENCE(1, COLUMNS(C2:J),,) instead.
@player0, any thoughts on this?
Source: https://stackoverflow.com/questions/65435313/arrayformula-of-average-on-infinite-truly-dynamic-range-in-google-sheets