weighted-average

Taking np.average while ignoring NaN's?

梦想与她 submitted on 2019-12-22 04:54:36
Question: I have a matrix of shape (64, 17), corresponding to time and latitude. I want to take a weighted average over latitude, which I know np.average can do because, unlike np.nanmean (which I used to average over the longitudes), it accepts weights as an argument. However, np.average doesn't ignore NaN the way np.nanmean does, so the first 5 entries of each row are included in the latitude average and turn the entire time series into NaN. Is there a way I can take a weighted average without the NaN's being
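
One way to approach the question above is to zero out the weights wherever the data is NaN and normalise by the weights that remain. The sketch below is only an illustration: the helper name `nan_weighted_average`, the small `data` array and the `lat_weights` values are made up for the example, and latitude is assumed to be the last axis.

```python
import numpy as np

def nan_weighted_average(values, weights):
    """Weighted average over the last axis, skipping NaN entries.

    `weights` is assumed to be 1-D with one weight per column of `values`.
    Rows that contain only NaN come back as NaN.
    """
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    # A weight counts only where the corresponding value is present.
    w = np.broadcast_to(np.asarray(weights, dtype=float), values.shape) * valid
    total = np.sum(np.where(valid, values, 0.0) * w, axis=-1)
    norm = np.sum(w, axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / norm

# Hypothetical stand-in for the (time, latitude) matrix from the question.
data = np.array([[np.nan, 1.0, 3.0],
                 [np.nan, np.nan, 5.0]])
lat_weights = np.array([0.5, 1.0, 1.0])
result = nan_weighted_average(data, lat_weights)
```

NumPy's masked arrays offer a similar route: `np.ma.average` accepts `weights`, and `np.ma.masked_invalid` masks NaN entries, so the two can be combined instead of a hand-written helper.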

Display weighted mean by group in the data.frame

不羁岁月 submitted on 2019-12-17 19:38:34
Question: Questions about the by command and weighted.mean already exist, but none of them helped me solve my problem. I am new to R and am more used to data mining languages than to programming. I have a data frame that holds, for each individual (observation/row), the income, education level and sample weight. I want to calculate the weighted mean of income by education level, and I want the result to be attached to each individual in a new column of my original data frame, like this: obs income education

How to calculate the weighted average over a cell-array of arrays?

混江龙づ霸主 submitted on 2019-12-13 05:40:37
Question: As a generalisation of my previous question, how can a weighted average over cell elements (that are and shall remain arrays themselves) be performed? I'd start by modifying gnovice's answer like this:

    dim = ndims(c{1});          %# Get the number of dimensions for your arrays
    M = cat(dim+1,c{:});        %# Convert to a (dim+1)-dimensional matrix
    meanArray = sum(M.*weigth,dim+1)./sum(weigth,dim+1);  %# Get the weighted mean across arrays

And before that make sure weight has the correct shape. The three cases

Multiple response analysis in weighted survey data using srvyr

ぐ巨炮叔叔 submitted on 2019-12-12 23:28:13
Question: I'm trying to analyse a multiple response question from a weighted survey dataset. I like the srvyr package because it allows me to use the dplyr pipes, but I can't find reference material on how to handle multiple response questions. I have a simple dataset looking at different sources of income. Here's an example of what the data looks like:

    ID <- c(1,2,3,4,5,6,7,8,9,10)
    rent_income <- c("Yes", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "No")
    salary_income <- c( "No", "Yes", "No"

Vectorize weighted sum matlab

一个人想着一个人 submitted on 2019-12-12 02:11:43
Question: I was trying to vectorize a certain weighted sum but couldn't figure out how to do it. I have created a simple minimal working example below. I guess the solution involves either bsxfun or reshape and Kronecker products, but I still have not managed to get it working.

    rng(1);
    N = 200; T1 = 5; T2 = 7;
    A = rand(N,T1,T2);
    w1 = rand(T1,1);
    w2 = rand(T2,1);
    B = zeros(N,1);
    for i = 1:N
        for j1=1:T1
            for j2=1:T2
                B(i) = B(i) + w1(j1) * w2(j2) * A(i,j1,j2);
            end
        end
    end
    A = B;

Answer 1: You could use a
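
For comparison, here is the same triple loop written in NumPy rather than MATLAB, followed by a vectorized form. This is not the truncated answer above, only a sketch of one possible approach: `np.einsum` contracts the two weighted axes in a single call. The random data merely mirrors the shapes in the question.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T1, T2 = 200, 5, 7
A = rng.random((N, T1, T2))
w1 = rng.random(T1)
w2 = rng.random(T2)

# Reference result: the explicit triple loop from the question.
B_loop = np.zeros(N)
for i in range(N):
    for j1 in range(T1):
        for j2 in range(T2):
            B_loop[i] += w1[j1] * w2[j2] * A[i, j1, j2]

# Vectorized: B[i] = sum over j, k of A[i, j, k] * w1[j] * w2[k].
B_vec = np.einsum("ijk,j,k->i", A, w1, w2)
```

The subscript string names the axes explicitly, which makes it easy to check that each weight vector is applied along the intended dimension.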

Weighted average value in the presence of NA values

谁说胖子不能爱 submitted on 2019-12-11 23:45:49
Question: Here's a very simple example of what I'm dealing with:

    data_stack <- data.table(CompA_value = c(10,20,30,40),
                             CompB_value = c(60,70,80,80),
                             CompC_value = c(NA, NA, NA, 100),
                             CompA_weight = c(0.2, 0.3,0.4,0.4),
                             CompB_weight = c(0.8,0.7,0.6,0.4),
                             CompC_weight = c(NA, NA, NA,0.2))

       CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight
    1:          10          60          NA          0.2          0.8           NA
    2:          20          70          NA          0.3          0.7           NA
    3:          30          80          NA          0.4          0.6           NA
    4:          40          80         100          0.4          0.4          0.2

What I want to do is calculate the weighted
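
The excerpt is cut off, so the exact goal is unknown. Assuming it is a row-wise weighted average that skips any component whose value or weight is NA, a NumPy sketch (in Python rather than the question's R) could look like this. The arrays copy the example data; `row_avg` is an illustrative name.

```python
import numpy as np

# Columns are CompA, CompB, CompC; NaN stands in for R's NA.
values = np.array([[10.0, 60.0, np.nan],
                   [20.0, 70.0, np.nan],
                   [30.0, 80.0, np.nan],
                   [40.0, 80.0, 100.0]])
weights = np.array([[0.2, 0.8, np.nan],
                    [0.3, 0.7, np.nan],
                    [0.4, 0.6, np.nan],
                    [0.4, 0.4, 0.2]])

# A component contributes only when both its value and its weight are present.
valid = ~np.isnan(values) & ~np.isnan(weights)
numerator = np.where(valid, values * weights, 0.0).sum(axis=1)
denominator = np.where(valid, weights, 0.0).sum(axis=1)
row_avg = numerator / denominator
```

Dividing by the sum of the weights actually used keeps the result a proper average even when the remaining weights in a row do not add up to 1.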

python pandas multiply dataframe by weights that vary with category in vectorized fashion

瘦欲@ submitted on 2019-12-11 05:01:39
Question: My problem is very similar to the one outlined here, except that my main data frame has a category column, as do my weights:

    df
    Out[3]:
    Symbol     var_1     var_2     var_3     var_4 Category
    Index
    1903    0.000443  0.006928  0.000000  0.012375        A
    1904   -0.000690 -0.007873  0.000171  0.014824        A
    1905   -0.001354  0.001545  0.000007 -0.008195        C
    1906   -0.001578  0.008796 -0.000164  0.015955        D
    1907   -0.001578  0.008796 -0.000164  0.015955        A
    1909   -0.001354  0.001545  0.000007 -0.008195        B

    wgt_df
    Out[4]:
    Category var_1_wgt var_2_wgt

Pandas rolling weighted average

痴心易碎 submitted on 2019-12-11 00:26:31
Question: I want to apply a weighted rolling average to a large time series, set up as a pandas dataframe, where the weights are different for each day. Here's a subset of the dataframe DF:

    Date        v_std  vertical
    2010-10-01  1.909  545.231
    2010-10-02  1.890  538.610
    2010-10-03  1.887  542.759
    2010-10-04  1.942  545.221
    2010-10-05  1.847  536.832
    2010-10-06  1.884  538.858
    2010-10-07  1.864  538.017
    2010-10-08  1.833  540.737
    2010-10-09  1.847  537.906
    2010-10-10  1.881  538.210
    2010-10-11  1.868  544.238
    2010-10-12  1.856  534
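
A rolling weighted mean with per-row weights can be written as the ratio of two rolling sums: the rolling sum of weight times value, divided by the rolling sum of the weights. The sketch below uses the first five rows of the subset above. The excerpt does not say how the weights are defined, so inverse-variance weights `1 / v_std**2`, the 3-day `window` and the column name `wavg` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "v_std": [1.909, 1.890, 1.887, 1.942, 1.847],
        "vertical": [545.231, 538.610, 542.759, 545.221, 536.832],
    },
    index=pd.to_datetime(
        ["2010-10-01", "2010-10-02", "2010-10-03", "2010-10-04", "2010-10-05"]
    ),
)

window = 3                       # assumed window length, in rows
w = 1.0 / df["v_std"] ** 2       # assumed weighting scheme (inverse variance)

# Weighted mean over each window = sum(w * x) / sum(w).
weighted_sum = (df["vertical"] * w).rolling(window).sum()
weight_sum = w.rolling(window).sum()
df["wavg"] = weighted_sum / weight_sum
```

The first `window - 1` rows are NaN because a full window is not yet available; passing `min_periods` to `rolling` would relax that.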

How to calculate weighted average for 0 values

流过昼夜 submitted on 2019-12-10 12:18:42
Question: I am facing an issue implementing weighted-average logic in Excel. I am looking at 4 fields for different deliverables: Total, Complete, Pending and Weight. The weighted average for a particular deliverable is calculated as (Complete/Total) * Weight, for example ((5/10) * 0.20) = 10%. For each deliverable, I have calculated the % and then added all of the % together: Deliverable 1 - 10 = 10% + 20% + 5% + .... = 65%. My question is: if for a particular deliverable, the available field

Data structure/algorithm to efficiently save weighted moving average

我怕爱的太早我们不能终老 submitted on 2019-12-09 18:50:23
Question: I'd like to keep moving averages for a number of different categories when storing log records. Imagine a service that saves web server logs one entry at a time. Let's further imagine that we don't have access to the logged records, so we see them once but can't read them again later. For different pages, I'd like to know:

- the total number of hits (easy)
- a "recent" average (like one month or so)
- a "long term" average (over a year)

Is there any clever algorithm/data model that allows to
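
One common way to keep such averages without storing the records is an exponentially decayed counter: each page holds a single number per time scale, decayed on every update according to the time since the previous hit. The sketch below is one possible design, not an answer taken from the source. The names `DecayedRate` and `PageStats` and the time constants are hypothetical, and an exponentially weighted average only approximates a strict one-month or one-year window.

```python
import math
from dataclasses import dataclass, field

@dataclass
class DecayedRate:
    """Exponentially decayed hit rate (hits per second), stored in O(1) space."""
    tau: float           # time constant in seconds (larger = longer memory)
    value: float = 0.0   # current rate estimate as of `last`
    last: float = 0.0    # timestamp of the most recent hit

    def hit(self, now: float) -> None:
        # Decay the old estimate over the elapsed time, then add this hit.
        self.value = self.value * math.exp(-(now - self.last) / self.tau) + 1.0 / self.tau
        self.last = now

    def read(self, now: float) -> float:
        # Estimate at time `now`, decayed since the last recorded hit.
        return self.value * math.exp(-(now - self.last) / self.tau)

DAY = 86400.0

@dataclass
class PageStats:
    """Per-page state: an exact total plus two decayed rates (assumed time scales)."""
    total: int = 0
    recent: DecayedRate = field(default_factory=lambda: DecayedRate(tau=30 * DAY))
    long_term: DecayedRate = field(default_factory=lambda: DecayedRate(tau=365 * DAY))

    def hit(self, now: float) -> None:
        self.total += 1
        self.recent.hit(now)
        self.long_term.hit(now)

# Usage: one hit per second settles near a rate of 1 hit/s once t >> tau.
rate = DecayedRate(tau=100.0)
for t in range(1, 2001):
    rate.hit(float(t))
```

Each page then needs only three numbers and two timestamps, regardless of how many log entries have been seen.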