aggregate

Pandas groupby(),agg() - how to return results without the multi index?

天大地大妈咪最大 submitted on 2019-12-18 03:17:52
Question: I have a dataframe:

    pe_odds[ [ 'EVENT_ID', 'SELECTION_ID', 'ODDS' ] ]
    Out[67]:
        EVENT_ID  SELECTION_ID   ODDS
    0  100429300       5297529  18.00
    1  100429300       5297529  20.00
    2  100429300       5297529  21.00
    3  100429300       5297529  22.00
    4  100429300       5297529  23.00
    5  100429300       5297529  24.00
    6  100429300       5297529  25.00

When I use groupby and agg, I get results with a multi-index:

    pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ] )[ 'ODDS' ].agg( [ np.min, np.max ] )
    Out[68]:
                            amin  amax
    EVENT_ID  SELECTION_ID
    100428417 5490293          1
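A minimal sketch of two common ways to get flat columns back from this kind of aggregation. The sample rows are adapted from the question, with a second, made-up event added so that there is more than one group; the names `flat` and `named` are illustrative only.

```python
import pandas as pd

# Sample data: the first four rows come from the question, the last two
# (EVENT_ID 100429301) are invented so there are two groups to aggregate.
pe_odds = pd.DataFrame({
    "EVENT_ID": [100429300] * 4 + [100429301] * 2,
    "SELECTION_ID": [5297529] * 4 + [5297530] * 2,
    "ODDS": [18.0, 20.0, 21.0, 22.0, 3.5, 4.0],
})

# Option 1: aggregate as in the question, then move the group keys
# out of the index and back into ordinary columns.
flat = (
    pe_odds.groupby(["EVENT_ID", "SELECTION_ID"])["ODDS"]
    .agg(["min", "max"])
    .reset_index()
)

# Option 2: ask groupby not to build a group index in the first place,
# using named aggregation to choose the output column names.
named = pe_odds.groupby(["EVENT_ID", "SELECTION_ID"], as_index=False).agg(
    min_odds=("ODDS", "min"),
    max_odds=("ODDS", "max"),
)

print(flat)
print(named)
```

Both results carry `EVENT_ID` and `SELECTION_ID` as regular columns with a default integer index, so they can be filtered or merged like any other dataframe.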

Aggregate Relational Algebra (Maximum)

寵の児 submitted on 2019-12-17 22:59:29
Question: I am currently working on a homework assignment that requires a selection that pulls out the element whose attribute has the maximum value compared to all other records. I've read a number of sources online that reference an "aggregate" relational algebra function called maximum, but they don't describe how it works using the basic operators. How does one select the attribute containing a maximum value? Answer 1: You can very well express aggregate functions with only basic
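One well-known way to express a maximum with only basic operators is to take the product of the relation with a renamed copy of itself, select the pairs where the left value is smaller than the right, project the left side (these are the non-maximal tuples), and subtract that set from the original relation. The answer above is truncated, so this is not necessarily the construction it goes on to give. A small sketch with Python sets, where the relation `R(name, score)` and its contents are hypothetical:

```python
from itertools import product

# Hypothetical relation R(name, score), modelled as a set of tuples.
R = {("a", 3), ("b", 7), ("c", 5), ("d", 7)}

# Cartesian product of R with a renamed copy of itself: R x rho_S(R).
pairs = product(R, R)

# Selection (left score < right score) followed by projection onto the
# left tuple: every tuple that is beaten by some other tuple.
not_max = {r for r, s in pairs if r[1] < s[1]}

# Set difference: whatever is never beaten holds the maximum score.
max_rows = R - not_max

print(sorted(max_rows))
```

Ties are preserved: every tuple that shares the maximum value survives the difference.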

Aggregate R sum

瘦欲@ submitted on 2019-12-17 20:26:53
Question: I'm writing my first program in R and, as a newbie, I'm having some trouble; I hope you can help me. I've got a data frame like this:

    > v1<-c(1,1,2,3,3,3,4)
    > v2<-c(13,5,15,1,2,7,4)
    > v3<-c(0,3,6,13,8,23,5)
    > v4<-c(26,25,11,2,8,1,0)
    > datos<-data.frame(v1,v2,v3,v4)
    > names(datos)<-c("Position","a1","a2","a3")
    > datos
      posicion a1 a2 a3
    1        1 13  0 26
    2        1  5  3 25
    3        2 15  6 11
    4        3  1 13  2
    5        3  2  8  8
    6        3  7 23  1
    7        4  4  5  0

What I need is to sum the data in a1, a2 and a3 (in my real case from a1 to a51)

Performance of COUNT SQL function

风格不统一 submitted on 2019-12-17 18:42:06
Question: I have two choices when writing an SQL statement with the COUNT function.

    SELECT COUNT(*) FROM <table_name>
    SELECT COUNT(some_column_name) FROM <table_name>

In terms of performance, which is the better SQL statement? Can I obtain some performance gain by using option 1?

Answer 1: Performance should not matter, because they compute two different aggregates: COUNT(*) counts all rows, including NULLs; COUNT(some_column_name) excludes rows where "some_column_name" is NULL. See the "Count(*) vs Count(1)" question for more.

Answer 2:
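The difference described in Answer 1 is easy to check on a small table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `t` and its rows are made up for illustration, and it shows only the difference in results, not anything about performance.

```python
import sqlite3

# In-memory database with a hypothetical table containing two NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, some_column_name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (2, None), (3, "y"), (4, None)],
)

# COUNT(*) counts rows; COUNT(column) counts rows where the column is not NULL.
count_all, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(some_column_name) FROM t"
).fetchone()
conn.close()

print(count_all, count_col)
```

Because the two expressions can return different numbers, the choice between them should usually follow from which count is actually needed.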

How do I concatenate strings in Entity Framework Query?

…衆ロ難τιáo~ submitted on 2019-12-17 18:23:22
Question: How do I concatenate strings in Entity Framework 4? I have data from a column that I want to save as a single comma-separated string, like "value1, value2, value3". Is there a method or an operator to do this in EF4? Example: let's say that I have two columns, Fruit and Farms, with the following values: Apples Bananas Strawberries. If I do it like this:

    var dataSource = this.context
        .Farms
        .Select(f => new { f.Id, Fruits = string.Join(", ", f.Fruits) });

I will of course get this error: LINQ to Entities

pandas' transform doesn't work sorting groupby output

蓝咒 submitted on 2019-12-17 17:32:11
Question: Another pandas question. Reading Wes McKinney's excellent book about data analysis and pandas, I encountered the following thing that I thought should work. Suppose I have some info about tips.

    In [119]: tips.head()
    Out[119]:
       total_bill   tip     sex  smoker  day    time  size   tip_pct
    0       16.99  1.01  Female   False  Sun  Dinner     2  0.059447
    1       10.34  1.66    Male   False  Sun  Dinner     3  0.160542
    2       21.01  3.50    Male   False  Sun  Dinner     3  0.166587
    3       23.68  3.31    Male   False  Sun  Dinner     2  0.139780
    4       24.59  3.61  Female   False  Sun  Dinner

Summing all columns by group [duplicate]

跟風遠走 submitted on 2019-12-17 17:08:16
Question: This question already has answers here: Aggregate / summarize multiple variables per group (e.g. sum, mean) (6 answers). Closed 10 months ago.

I'm positive that this has an incredibly easy answer, but I can't seem to get my head around aggregating or casting with multiple conditions. I have a table that looks like this:

    > head(df, n=10L)
       STATE  EVTYPE FATALITIES INJURIES
    1     AL TORNADO          0       15
    3     AL TORNADO          0        2
    4     AL TORNADO          0        2
    5     AL TORNADO          0        2
    6     AL TORNADO          0        6
    7     AL TORNADO          0        1
    9     AL TORNADO          1       14
    11    AL

How can I use functions returning vectors (like fivenum) with ddply or aggregate?

最后都变了- submitted on 2019-12-17 16:56:14
Question: I would like to split my data frame using a couple of columns and call, let's say, fivenum on each group.

    aggregate(Petal.Width ~ Species, iris, function(x) summary(fivenum(x)))

The returned value is a data.frame with only 2 columns, the second being a matrix. How can I turn it into normal columns of a data.frame?

Update: I want something like the following, with less code, using fivenum:

    ddply(iris, .(Species), summarise, Min = min(Petal.Width), Q1 = quantile(Petal.Width, .25), Med = median

Combining duplicated rows in R and adding new column containing IDs of duplicates

两盒软妹~` submitted on 2019-12-17 16:37:24
Question: I have a data frame that looks like this:

    Chr   start    stop     ref  alt  Hom/het  ID
    chr1  5179574  5183384  ref  Del  Het      719
    chr1  5179574  5184738  ref  Del  Het      915
    chr1  5179574  5184738  ref  Del  Het      951
    chr1  5336806  5358384  ref  Del  Het      376
    chr1  5347979  5358384  ref  Del  Het      228

I would like to merge any duplicate rows, combining the last ID column so that all IDs are in one row/column, like this:

    Chr   start    stop     ref  alt  Hom/het  ID
    chr1  5179574  5183384  ref  Del  Het      719
    chr1  5179574  5184738  ref  Del  Het      915, 951

Aggregate and Weighted Mean in R

僤鯓⒐⒋嵵緔 submitted on 2019-12-17 16:28:12
Question: I'm trying to calculate asset-weighted returns by asset class. For the life of me, I can't figure out how to do it using the aggregate command. My data frame looks like this:

    dat <- data.frame(company, fundname, assetclass, return, assets)

I'm trying to do something like this (don't copy this, it's wrong):

    aggregate(dat, list(dat$assetclass), weighted.mean, w=(dat$return, dat$assets))

Answer 1: For starters, w=(dat$return, dat$assets)) is a syntax error. And plyr makes this a little easier:

    > set.seed