mean | 易学教程

pandas: groupby aggregation functions

import numpy as np
import pandas as pd

Aggregation functions

Aggregations refer to any data transformation that produces scalar values from arrays (the input is an array, the output is a scalar value). The preceding examples have used several of them, including mean, count, min, and sum. You may wonder what is going on when you invoke mean() on a GroupBy object. Many common aggregations, such as those found in Table 10-1, have optimized implementations. However, you are not limited to only this set of methods.

count, sum, mean, median, std, var, min, max, prod, first, last

You can use aggregations of your own devising and additionally call any method that
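To make the excerpt above concrete, here is a minimal sketch of a built-in aggregation next to a user-defined one. The frame `df`, its column names, and the helper `peak_to_peak` are illustrative assumptions, not taken from the excerpt.

```python
import pandas as pd

# Hypothetical toy data: two groups, "a" and "b".
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 5.0, 11.0],
})

grouped = df.groupby("key")["value"]

# Built-in aggregation: one scalar per group.
built_in = grouped.mean()

# User-defined aggregation: any function that maps a group to a scalar.
def peak_to_peak(values):
    """Range of a group: largest value minus smallest value."""
    return values.max() - values.min()

custom = grouped.agg(peak_to_peak)

print(built_in)
print(custom)
```

Passing a Python function to `agg` is generally slower than the optimized built-ins, so it is usually worth checking whether a built-in method already covers the computation.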

Element wise mean of multiple lists in R

Question: I have ten huge lists (each list has seven elements, but the elements are huge) and I need to calculate the element-wise mean of these lists. So if there are lists A1, A2, A3, ..., A10, I need to calculate:

mean1 = mean(A1[[1]], A2[[1]], A3[[1]], ..., A10[[1]])
...
mean7 = mean(A1[[7]], A2[[7]], A3[[7]], ..., A10[[7]])

I have done it with a for loop, but I wanted to know if there is a better solution provided by R. Thank you in advance.

Answer 1: Assuming your As are lists of vectors: Anames <- paste0("A",
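The R answer is cut off in this excerpt. As a rough analogue only, the same idea can be sketched in Python with NumPy, assuming the vectors at a given position have the same shape in every list. The names `lists` and `elementwise_means` and the toy values are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for A1, A2, ...: each list holds arrays,
# and position k has the same shape in every list.
lists = [
    [np.array([1.0, 2.0]), np.array([10.0])],
    [np.array([3.0, 4.0]), np.array([30.0])],
]

# zip(*lists) groups the k-th element of every list together;
# stacking them and averaging over axis 0 gives the element-wise mean.
elementwise_means = [
    np.mean(np.stack(position), axis=0) for position in zip(*lists)
]

print(elementwise_means)
```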

Calculate a series of weighted means in R for groups with different weightings

Question: I have the following dataset (a simple version of my actual data), 'data', and would like to calculate weighted means for variables x1 and x2, using weightings w1 and w2 respectively, split into two groups (the groups are determined by the variable n).

data <- data.frame(n = c(1,1,1,2,2,2), x1 = c(4,5,4,7,5,5), x2 = c(7,10,9,NaN,11,12), w1 = c(0,1,1,1,1,1), w2 = c(1,1,1,0,0,1))

I'm trying to do it using with() but get an error when I run this:

with(data, aggregate(x = list(x1=x1, x2=x2), by = list(n
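The question's own R code is truncated here, so the following is only a Python/pandas sketch of the computation it describes, using the same data. Dropping NaN values before averaging is an assumption (the one NaN in x2 has weight 0), and the helper `weighted_mean` is a hypothetical name.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "n":  [1, 1, 1, 2, 2, 2],
    "x1": [4, 5, 4, 7, 5, 5],
    "x2": [7, 10, 9, np.nan, 11, 12],
    "w1": [0, 1, 1, 1, 1, 1],
    "w2": [1, 1, 1, 0, 0, 1],
})

def weighted_mean(values, weights):
    """Weighted mean that ignores missing values (an assumption)."""
    mask = values.notna()
    return np.average(values[mask], weights=weights[mask])

# One row per group, one column per variable, each with its own weights.
rows = {
    group: {
        "x1": weighted_mean(g["x1"], g["w1"]),
        "x2": weighted_mean(g["x2"], g["w2"]),
    }
    for group, g in data.groupby("n")
}
result = pd.DataFrame.from_dict(rows, orient="index")
print(result)
```

Note that `np.average` raises an error when the weights in a group sum to zero, so real data may need an explicit guard for that case.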

Cumulative mean with conditionals

Question: New to R. Small rep of my df:

PTS_TeamHome <- c(101,87,94,110,95)
PTS_TeamAway <- c(95,89,105,111,121)
TeamHome <- c("LAL", "HOU", "SAS", "MIA", "LAL")
TeamAway <- c("IND", "LAL", "LAL", "HOU", "NOP")
df <- data.frame(cbind(TeamHome, TeamAway,PTS_TeamHome,PTS_TeamAway))
df

TeamHome TeamAway PTS_TeamHome PTS_TeamAway
LAL      IND      101          95
HOU      LAL      87           89
SAS      LAL      94           105
MIA      HOU      110          111
LAL      NOP      95           121

Imagine these are the first four games of a season with 1230 games. I want to calculate the cumulative

Conditional mean statement

Question: I have a dataset named bwght which contains the variable cigs (cigarettes smoked per day). When I calculate the mean of cigs in the dataset bwght using mean(bwght$cigs), I get 2.08. Only 212 of the 1388 women in the sample smoke (and 1176 do not smoke): summary(bwght$cigs>0) gives the result:

Mode    FALSE TRUE NA's
logical 1176  212  0

I'm asked to find the average of cigs among the women who smoke (the 212). I'm having a hard time finding the right syntax for excluding the non
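The idea behind this question is a mean over a filtered subset. Because every non-smoker contributes 0, the smoker-only mean must equal the overall mean scaled by 1388/212, which is roughly 13.6 given the rounded 2.08. Below is a minimal Python sketch of the filtering step. The array `cigs` holds made-up values and is not the bwght data.

```python
import numpy as np

# Hypothetical cigarettes-per-day values; zeros are non-smokers.
cigs = np.array([0, 0, 0, 10, 20, 0, 0, 30])

overall_mean = cigs.mean()

# Boolean mask selects only the rows where cigs > 0.
smokers_mean = cigs[cigs > 0].mean()

print(overall_mean, smokers_mean)
```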

Summarize data.table by group

Question: I am working with a huge data table in R containing monthly measurements of temperature for multiple locations, taken by different sources. The dataset looks like this:

library(data.table)
# Generate random data:
loc <- 1:10
dates <- seq(as.Date("2000-01-01"), as.Date("2004-12-31"), by="month")
mods <- c("A","B", "C", "D", "E")
temp <- runif(length(loc)*length(dates)*length(mods), min=0, max=30)
df <- data.table(expand.grid(Location=loc,Date=dates,Model=mods),Temperature=temp)

So basically,

Create an array with a pre determined mean and standard deviation

Question: I am attempting to create an array with a predetermined mean and standard deviation using NumPy. The array needs to contain random numbers. So far I can produce an array and calculate its mean and standard deviation, but I cannot get the array to be controlled by those values:

import numpy as np
x = np.random.randn(1000)
print("Average:")
mean = x.mean()
print(mean)
print("Standard deviation:")
std = x.std()
print(std)

How can I control the array values through the mean and std?

Answer 1: Use numpy.random.normal.
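The answer is cut off after naming numpy.random.normal, so what follows is only a sketch of how that suggestion might be applied. Drawing from a normal distribution yields a sample whose mean and standard deviation are close to the targets but not exactly equal to them. If exact sample statistics are wanted, one option (an addition here, not part of the quoted answer) is to standardize the sample and rescale it. The target values and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
target_mean, target_std = 5.0, 2.0

# Approximate: parameters of the distribution, not of the sample.
x = rng.normal(loc=target_mean, scale=target_std, size=1000)

# Exact: standardize a sample to mean 0 and std 1, then rescale.
z = rng.standard_normal(1000)
z = (z - z.mean()) / z.std()
exact = target_mean + target_std * z

print(x.mean(), x.std())
print(exact.mean(), exact.std())
```

The legacy call `np.random.normal(loc, scale, size)` takes the same three parameters as `rng.normal` above.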

“circular” mean in R

Question: Given a dataset of months, how do I calculate the "average" month, taking into account that months are circular?

months = c(1,1,1,2,3,5,7,9,11,12,12,12)
mean(months)
## [1] 6.333333

In this dummy example, the mean should be in January or December. I see that there are packages for circular statistics, but I'm not sure whether they suit my needs here.

Answer 1: I think

months <- c(1,1,1,2,3,5,7,9,11,12,12,12)
library("CircStats")
conv <- 2*pi/12 ## months -> radians

Now convert from months to
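The R answer is truncated after the conversion factor. The usual circular-mean recipe it appears to be heading toward (convert to angles, average the sines and cosines, convert back) can be sketched in Python as follows. This is a NumPy illustration under that assumption, not the CircStats code from the answer.

```python
import numpy as np

months = np.array([1, 1, 1, 2, 3, 5, 7, 9, 11, 12, 12, 12])

conv = 2 * np.pi / 12          # months -> radians
angles = months * conv

# Mean direction: angle of the averaged unit vectors.
mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())

# Back to months, wrapped into [0, 12); 0 and 12 both mean December.
mean_month = (mean_angle / conv) % 12

print(mean_month)
```

For this data the result falls between December (12, equivalent to 0) and January (1), unlike the arithmetic mean of 6.33.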

09 Linear regression and matrix operations

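Linear regression

Definition: regression analysis that models the relationship between one or more independent variables and a dependent variable. The model can be a linear combination of one or more independent variables.
Simple linear regression: only one variable is involved.
Multiple linear regression: two or more variables.
General formula: h(w) = w0 + w1x1 + w2x2 + ... = wTx, where w and x are matrices: wT = (w0, w1, w2), x = (1, x1, x2)T

Application scenarios for regression (continuous data): house price prediction; sales forecasting (advertising, R&D cost, scale, and other factors); loan amounts.

Linear relationship model

Definition: a function that makes predictions through a linear combination of attributes (features):
f(x) = w1x1 + w2x2 + w3x3 + ... + wdxd + b
w: weight, b: bias
Multiple features: (w1 for the area of the house, w2 for the location of the house, ...)

Loss function (error)

"Statistical Learning Methods" (《统计学习方法》): algorithm, strategy, optimization.
Linear regression, least squares, normal equation & gradient descent.
Loss function (size of the error): yi is the true value of the i-th training sample; hw(xi) is the prediction function (predicted value) for the feature combination of the i-th training sample. The goal is to find the optimal w.

Least squares via the normal equation (solves directly for the minimum; may not be solvable when the features are complex)
Solution: w = (XTX)-1 XTy, where X is the feature matrix and y is the target value matrix.
Drawback: solving is slow when the features are too complex.

Least squares via gradient descent

Use cases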
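The normal equation in the notes above, w = (XTX)-1 XTy, can be sketched with NumPy as follows. This assumes the usual least-squares loss (the sum of squared differences between hw(xi) and yi), which the excerpt describes but does not show. The data are made up so that the true weights are known, and the sketch solves the linear system rather than forming the inverse explicitly.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 features.
features = np.array([
    [1.0, 2.0],
    [2.0, 0.0],
    [3.0, 1.0],
    [4.0, 3.0],
])
true_w = np.array([1.0, 2.0, 3.0])   # w0 (bias), w1, w2

# Prepend a column of ones so that w0 acts as the bias term.
X = np.column_stack([np.ones(len(features)), features])
y = X @ true_w                        # noise-free targets

# Normal equation: solve (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)

print(w)
```

When XTX is singular or badly conditioned, `np.linalg.lstsq(X, y, rcond=None)` is a more robust way to get the least-squares solution.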

Conditional cumulative mean for each group in R

Question: I have a data set that looks like this:

id a  b
1  AA 2
1  AB 5
1  AA 1
2  AB 2
2  AB 4
3  AB 4
3  AB 3
3  AA 1

I need to calculate the cumulative mean for each record within each group, excluding the cases where a == 'AA'. So the sample output should be:

id a  b mean
1  AA 2 -
1  AB 5 5
1  AA 1 5
2  AB 2 2
2  AB 4 (4+2)/2
3  AB 4 4
3  AB 3 (4+3)/2
3  AA 1 (4+3)/2
3  AA 4 (4+3)/2

I tried to achieve it using dplyr and cummean but am getting an error.

df <- df %>% group_by(id) %>% mutate(mean = cummean(b[a != 'AA']))
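No answer is included in this excerpt. As a sketch of the computation the question describes, here is a Python/pandas version that uses the eight input rows shown (the sample output's extra ninth row is not in the input). It divides a running sum of the kept values by a running count of the kept rows within each id. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "a":  ["AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA"],
    "b":  [2, 5, 1, 2, 4, 4, 3, 1],
})

keep = df["a"] != "AA"

# Running sum of b over kept rows only (excluded rows add 0).
running_sum = df["b"].where(keep, 0).groupby(df["id"]).cumsum()

# Running count of kept rows within each id.
running_count = keep.astype(int).groupby(df["id"]).cumsum()

# No kept rows yet -> NaN instead of a division by zero.
df["mean"] = running_sum / running_count.where(running_count > 0)

print(df)
```

Rows with a == 'AA' carry forward the latest cumulative mean of their group, and a group that starts with 'AA' shows NaN until its first kept row.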