Get Large and Small for every group where a column value is equal and return a 3rd column value

房东的猫 posted on 2021-02-11 06:57:22

Question


I have a large set of data (approx. 500,000 rows) with water level information. There are 3 columns.

A: the water level (e.g. 0.7)
B: the date (e.g. 03/01/16)
C: the time (e.g. 6:06:00)

I need to get the 2 largest and 2 smallest values from A for every day B and return A & C.

So basically, for all the rows where the date is the same, find the largest and smallest values, tell me those values, and give me the associated times.

Here is a bit of data:

2.730 | 03/04/16 | 3:54:00
2.734 | 03/04/16 | 3:36:00
2.735 | 03/04/16 | 3:48:00
2.736 | 03/04/16 | 3:42:00
0.046 | 03/05/16 | 10:30:00
0.047 | 03/05/16 | 10:36:00
0.048 | 03/05/16 | 10:24:00
0.050 | 03/05/16 | 10:42:00
0.052 | 03/05/16 | 10:18:00
0.056 | 03/05/16 | 10:48:00

There are approximately 240 rows for every day for 5 years. In the end I want a table with just the highs and lows from every day with the time.

I have tried various solutions like

=LARGE(A2:A241,1)

on column A and

=VLOOKUP(F2,A2:C241,2,FALSE)

to grab the associated data, but I have no idea how to do it for multiple days without manually selecting each day and entering the formula 1826 times. Please help. Thanks.


Answer 1:


With such a large data set you want to avoid array formulas, so an approach that limits the lookup range for a Small() or Large() is better.

Consider the screenshot and the results of the formulas. I entered the first date in E2 and used the fill handle to drag down and auto-increment. My dates display in DMY order.

The formulas are

F2 =LARGE(INDEX($A:$A,MATCH($E2,$B:$B,0)):INDEX($A:$A,MATCH($E2,$B:$B,1)),1)

G2 =LARGE(INDEX($A:$A,MATCH($E2,$B:$B,0)):INDEX($A:$A,MATCH($E2,$B:$B,1)),2)

H2 =SMALL(INDEX($A:$A,MATCH($E2,$B:$B,0)):INDEX($A:$A,MATCH($E2,$B:$B,1)),1)

I2 =SMALL(INDEX($A:$A,MATCH($E2,$B:$B,0)):INDEX($A:$A,MATCH($E2,$B:$B,1)),2)

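... copied down. MATCH with match type 0 finds the first row for the date and MATCH with match type 1 finds the last one, so this approach requires that the data is sorted ascending by the dates in column B.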
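For readers who prefer code, the range-limiting idea can be sketched in Python. This is only an illustration, not part of the formulas above: the names `day_slice` and `highs_and_lows` are made up, and the dates are written as ISO strings (an assumption) so that they sort correctly as text. The two bisect calls play the role of the two MATCH calls on sorted data.

```python
from bisect import bisect_left, bisect_right
from heapq import nlargest, nsmallest

# Sample levels and dates from the question, sorted ascending by date.
# ISO date strings are an assumption of this sketch so that text order = date order.
levels = [2.730, 2.734, 2.735, 2.736, 0.046, 0.047, 0.048, 0.050, 0.052, 0.056]
dates = ["2016-03-04"] * 4 + ["2016-03-05"] * 6


def day_slice(day):
    """Return (first, last_exclusive) row indexes of one day's block."""
    first = bisect_left(dates, day)   # first row of the day, like MATCH(..., 0)
    last = bisect_right(dates, day)   # one past the last row, like MATCH(..., 1) + 1
    return first, last


def highs_and_lows(day, k=2):
    """Return the k largest and k smallest levels, looking only at the day's rows."""
    first, last = day_slice(day)
    block = levels[first:last]
    return nlargest(k, block), nsmallest(k, block)


print(highs_and_lows("2016-03-05"))  # ([0.056, 0.052], [0.046, 0.047])
```

Only the rows of the requested day are ever scanned, which is why the sort order matters.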

To return the matching value from column C, wrap the formula in an Index on column C with a Match on column A. For example, the time for the 2nd smallest value is

=INDEX($C:$C,MATCH(SMALL(INDEX($A:$A,MATCH($E2,$B:$B,0)):INDEX($A:$A,MATCH($E2,$B:$B,1)),2),$A:$A,0))
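One thing to watch with this lookup: the outer MATCH searches the whole of column A, so if the same level also occurs on an earlier day, the time of that earlier row is returned. The Python sketch below illustrates the difference between a whole-column lookup and one restricted to the day. The function names are hypothetical, and the first row is an invented reading added only to create a duplicate level.

```python
# (level, date, time) rows. The first row is hypothetical and exists only to
# repeat the level 0.050 on an earlier day; the others come from the question.
rows = [
    (0.050, "03/04/16", "3:30:00"),
    (2.736, "03/04/16", "3:42:00"),
    (0.046, "03/05/16", "10:30:00"),
    (0.050, "03/05/16", "10:42:00"),
    (0.056, "03/05/16", "10:48:00"),
]


def time_whole_column(rows, level):
    """Time of the first row with this level anywhere, like MATCH on all of column A."""
    for row_level, _date, time in rows:
        if row_level == level:
            return time
    return None


def time_within_day(rows, day, level):
    """Time of the first row with this level on the given day only."""
    for row_level, date, time in rows:
        if date == day and row_level == level:
            return time
    return None


print(time_whole_column(rows, 0.050))            # 3:30:00 (wrong day)
print(time_within_day(rows, "03/05/16", 0.050))  # 10:42:00
```

If levels never repeat across days, both lookups agree and the formula as written is fine.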



Answer 2:


Try this:

In E2, the only array formula:

=IFERROR(INDEX($B$1:$B$10,MATCH(0,IF(COUNTIF($E$1:$E1,$B$1:$B$10)=0,0,1),0)),"")

This one formula must be confirmed with Ctrl-Shift-Enter.

In F2:

=IF(E2<>"",AGGREGATE(15,6,$A$1:$A$10/($B$1:$B$10=E2),1),"")

In G2:

=IF(E2<>"",INDEX($C$1:$C$10,MATCH(AGGREGATE(15,6,$A$1:$A$10/($B$1:$B$10=E2),1),$A$1:$A$10,0)),"")

In H2:

=IF(E2<>"",AGGREGATE(14,6,$A$1:$A$10/($B$1:$B$10=E2),1),"")

In I2:

=IF(E2<>"",INDEX($C$1:$C$10,MATCH(AGGREGATE(14,6,$A$1:$A$10/($B$1:$B$10=E2),1),$A$1:$A$10,0)),"")

Then copy the formulas down as far as needed. In the picture the formulas are copied down to row 15. The AGGREGATE function was introduced in Excel 2010.

This method does not care if the list is ordered or not.
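As a rough Python analogue of this order-independent approach (a sketch only; `summarize` is a made-up name and the rows are the question's sample, deliberately shuffled), the dates can be collected in order of first appearance and the lowest and highest reading picked per date, giving the same columns as E to I:

```python
# Rows from the question's sample, listed out of date order on purpose.
rows = [
    (0.052, "03/05/16", "10:18:00"),
    (2.734, "03/04/16", "3:36:00"),
    (0.046, "03/05/16", "10:30:00"),
    (2.736, "03/04/16", "3:42:00"),
    (2.730, "03/04/16", "3:54:00"),
    (0.056, "03/05/16", "10:48:00"),
]


def summarize(rows):
    """Return one (date, low, low_time, high, high_time) tuple per distinct date."""
    by_day = {}  # dicts keep insertion order, so dates appear in first-seen order
    for level, date, time in rows:
        by_day.setdefault(date, []).append((level, time))

    table = []
    for date, readings in by_day.items():
        low = min(readings, key=lambda r: r[0])    # smallest level of the day
        high = max(readings, key=lambda r: r[0])   # largest level of the day
        table.append((date, low[0], low[1], high[0], high[1]))
    return table


for line in summarize(rows):
    print(line)
```

Because each reading is matched to its own date before the minimum and maximum are taken, the input order has no effect on the result.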

If it is ordered, then I believe that @teylyn's answer is quicker to compute.




Answer 3:


I am adding this as another answer, not trying to provide a solution to the question, but because I want to show my findings about the calculation speed comparison between the Index and the Aggregate approaches provided in previous answers.

Setup:

Excel "data" sheet with 500,000 rows of data, columns "value", "date", "time". Each date is represented multiple times. The data ranges from 1/Jan/2000 to 5/Apr/2014. There are 96 entries for each day.

On another sheet Column E has the 5209 dates, one date per row. Two different formulas are used in columns F and G to find the largest value for each date in the "data" sheet.

Column H compares the output of the two formulas and cell K1 counts the differences in the results. There is no difference. Both formulas provide exactly the same result.

The sheet has 5209 rows with formulas. Each of these formulas evaluates 500,000 rows of data.

The formula in the "Index" columns is

=LARGE(INDEX(data!$A:$A,MATCH($E3,data!$B:$B,0)):INDEX(data!$A:$A,MATCH($E3,data!$B:$B,1)),1)

Note: This approach depends on the source data being sorted ascending by the date column. Chances are that the data is generated by some monitoring system that will put one reading after the other. Unless there is human or programmatic intervention, I will assume that the data is sorted by date.

The formula in the "Aggregate" column is

=AGGREGATE(14,6,data!$A$2:$A$500000/(data!$B$2:$B$500000=E2),1)

The goal is to work out which formula is more efficient, i.e. calculates faster. I am using code written by Charles Williams, who specializes in performance of formulas and VBA. I have used his Range Timer, as published in this MSDN article.

Here is a screenshot of my setup:

I selected the column with the Index formula and ran the timer three times. The results were in the range of 19 seconds.

I then selected the column with the Aggregate formula and ran the timer. The first pass took 411 seconds, which is 6:51 minutes. The second pass took 425 seconds (7:05 mins).

I did not run a third pass on the Aggregate formula results, because my laptop fan was going into overdrive and getting quite hysterical and high-pitched.

Why am I posting this?

I would like to draw attention to a few things:

  • data samples in questions often contain only a few rows of data
  • hence, formulas suggested here are often only tested on a small data sample, but when used in the real environment may not perform as well as expected.
  • the suggested Index formula looks complicated and is quite a mouthful. The suggested Aggregate formula is much shorter and looks much neater, BUT the formula with Index performs a LOT better than the formula with Aggregate. Hence: a shorter formula does not always result in a faster calculation.
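The same effect can be reproduced outside Excel. The Python sketch below is not the Range Timer used above and says nothing about Excel's internals. It only contrasts a lookup that touches every row for every date with one that first locates the day's block in sorted data. The sizes, the random levels and all names are assumptions chosen so it runs quickly.

```python
import random
import time
from bisect import bisect_left, bisect_right

random.seed(0)
DAYS, PER_DAY = 200, 96  # hypothetical sizes, far smaller than the 500,000-row sheet
dates = [day for day in range(DAYS) for _ in range(PER_DAY)]  # day numbers, sorted
levels = [random.random() for _ in dates]                     # synthetic readings


def max_full_scan(day):
    """Test every row against the date, as an expression over the whole range does."""
    return max(level for level, d in zip(levels, dates) if d == day)


def max_sliced(day):
    """Find the day's block by binary search, then look only at those rows."""
    return max(levels[bisect_left(dates, day):bisect_right(dates, day)])


def timed(fn):
    """Run fn once per day and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fn(day) for day in range(DAYS)]
    return results, time.perf_counter() - start


scan_result, scan_seconds = timed(max_full_scan)
slice_result, slice_seconds = timed(max_sliced)
print("same results:", scan_result == slice_result)
print(f"full scan: {scan_seconds:.3f}s, sliced: {slice_seconds:.3f}s")
```

The full scan does DAYS × len(dates) comparisons while the sliced version looks at roughly PER_DAY rows per date, so the gap widens as the data grows.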

My sample file can be accessed here, if you want to give it a spin. Just be aware that it's around 20 MB, since it has so much data. To run the macro, select a range and click the blue button.



Source: https://stackoverflow.com/questions/35215590/get-large-and-small-for-every-group-where-a-column-value-is-equal-and-return-a-3
