apply

Why does apply() return incorrect column types?

我是研究僧i posted on 2019-12-01 19:00:26
I've recently started using R and the apply() function is tripping me up. I'd appreciate help with this:

    is.numeric(iris$Sepal.Length)  # returns TRUE
    is.numeric(iris$Sepal.Width)   # returns TRUE
    is.numeric(iris$Petal.Length)  # returns TRUE
    is.numeric(iris$Petal.Width)   # returns TRUE

but

    apply(iris, 2, FUN = is.numeric)

returns

    Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
           FALSE        FALSE        FALSE        FALSE        FALSE

What's going on?

They are all FALSE because apply() coerces iris to a matrix before it applies the is.numeric() function. Since Species is a factor, that matrix is a character matrix, so every column fails the test. From help(apply) regarding the first argument, X - If X is
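To make the coercion visible, here is a minimal sketch using the built-in iris data. The variable names matrix_type and col_is_numeric are only illustrative. It checks the type of the matrix that as.matrix() produces, then uses sapply(), which visits each column as its own vector and so avoids the matrix step.

```r
# as.matrix() on a data frame with a non-numeric column gives a character matrix,
# which is what apply() ends up iterating over.
m <- as.matrix(iris)
matrix_type <- typeof(m)
print(matrix_type)

# sapply() treats the data frame as a list of columns, so each column keeps its type.
col_is_numeric <- sapply(iris, is.numeric)
print(col_is_numeric)
```

If only the numeric columns are needed afterwards, iris[col_is_numeric] selects them.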

Why do I get an AttributeError when using pandas apply?

十年热恋 posted on 2019-12-01 16:30:14
Question: How should I convert NaN values into a categorical value based on a condition? I am getting an error while trying to convert the NaN values.

    category       gender  sub-category  title
    health&beauty  NaN     makeup        lipbalm
    health&beauty  women   makeup        lipstick
    NaN            NaN     NaN           lipgloss

My DataFrame looks like this. And my function to convert NaN values in gender to a categorical value looks like

    def impute_gender(cols):
        category = cols[0]
        sub_category = cols[2]
        gender = cols[1]
        title = cols[3]
        if title.str.contains('Lip') and gender

how to create many linear models at once and put the coefficients into a new matrix?

╄→гoц情女王★ posted on 2019-12-01 14:17:07
I have 365 columns. In each column I have 60 values. I need to know the rate of change over time for each column (slope or linear coefficient). I created a generic column as a series of numbers from 1:60 to represent the 60 corresponding time intervals. I want to create 365 linear regression models using the generic time stamp column with each of the 365 columns of data. In other words, I have many columns and I would like to create many linear regression models at once, extract the coefficients and put those coefficients into a new matrix.

First of all, statistically this might not be the
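One way to fit all the models in a single call is to give lm() a matrix response, in which case coef() returns one column of coefficients per series. The sketch below uses made-up random data in place of the real 365 columns, and the names dat, time_idx, coefs and slopes are assumptions for illustration only.

```r
set.seed(1)
n_obs <- 60
n_series <- 365
time_idx <- 1:n_obs  # the generic 1:60 time column

# Hypothetical stand-in for the real data: 60 rows, 365 columns.
dat <- matrix(rnorm(n_obs * n_series), nrow = n_obs, ncol = n_series)
colnames(dat) <- paste0("V", seq_len(n_series))

# A matrix on the left-hand side fits one regression per column.
fit <- lm(dat ~ time_idx)

# coef() is a 2 x 365 matrix: row 1 intercepts, row 2 slopes.
coefs <- coef(fit)
slopes <- coefs["time_idx", ]
print(dim(coefs))
```

If the data live in a data frame rather than a matrix, as.matrix() on the numeric columns gives the same starting point.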

Median imputation using sapply

喜夏-厌秋 posted on 2019-12-01 11:30:35
I want to replace missing values in columns of a dataframe. I have written the following code

    MedianImpute <- function(data=data) {
      for(i in 1:ncol(data)) {
        if(class(data[,i]) %in% c("numeric","integer")) {
          if(sum(is.na(data[,i]))) {
            data[is.na(data[,i]),i] <- median(data[,i],na.rm = TRUE)
          }
        }
      }
      return(data)
    }

This returns the dataframe with the NAs replaced by the column median. I do not want to use a for loop; how can I get the same result using any of the apply functions in R?

This is actually a subtle problem, so worth a bit of discussion (IMO). You have a data frame and want to impute
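A loop-free version can be sketched with lapply(), which returns a list and therefore leaves each column's type alone, whereas sapply() would try to simplify the result into a matrix. The helper name impute_median and the small example data frame are made up for illustration. This is one possible approach, not necessarily the one the original answer goes on to describe.

```r
# Hypothetical example data with numeric, integer and character columns.
df <- data.frame(
  a = c(1, NA, 3, 10),
  b = c(2L, 4L, NA, 6L),
  g = c("x", NA, "y", "z"),
  stringsAsFactors = FALSE
)

# Replace NAs with the median only in numeric (double or integer) columns.
# Note: an integer column may become double if the median is fractional.
impute_median <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  }
  x
}

# Assigning into imputed[] keeps the data frame structure and column names.
imputed <- df
imputed[] <- lapply(df, impute_median)
print(imputed)
```

Non-numeric columns such as g pass through untouched, NAs included.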

Data.table: how to get the blazingly fast subsets it promises and apply to a second data.table

ぃ、小莉子 posted on 2019-12-01 11:16:52
I'm trying to enrich one dataset (adherence) based on subsets from another (lsr). For each individual row in adherence, I want to calculate (as a third column) the medication available for implementing the prescribed regimen. I have a function that returns the relevant result, but it runs for days on just a subset of the total data I have to run it on. The datasets are:

    library(dplyr)
    library(tidyr)
    library(lubridate)
    library(data.table)
    adherence <- cbind.data.frame(c("1", "2", "3", "1", "2", "3"), c("2013-01-01", "2013-01-01", "2013-01-01", "2013-02-01", "2013-02-01", "2013-02-01"))
    names

Calculate correlation by aggregating columns of data frame

邮差的信 posted on 2019-12-01 10:45:12
I have the following data frame:

    y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))

How do I get a data frame which gives me the correlation between columns a,b and c,d for each row? Something like:

    sapply(y, function(x) {cor(x[2:3],x[4:5])})

Thank you, S

You could use apply

    > apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
    [1] -1 -1 1 -1 1

Or ddply (although this might be overkill, and if two rows have the same group it will do the correlation of columns a&b and c&d for both those rows):

    > ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))
    group
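For reference, here is a self-contained version of the apply() call with a fixed seed. The seed and the name row_cor are assumptions added for reproducibility. It also shows why the output above contains only 1 and -1: each correlation is computed from just two points per side, and two distinct points always lie exactly on a line.

```r
set.seed(42)
y <- data.frame(group = letters[1:5],
                a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5))

# Drop the non-numeric group column first; otherwise apply() would coerce
# every row to character. MARGIN = 1 walks the rows.
row_cor <- apply(y[, -1], 1, function(x) cor(x[1:2], x[3:4]))

# Attach the results to the group labels as a data frame.
result <- data.frame(group = y$group, cor_ab_cd = unname(row_cor))
print(result)
```

With only two values on each side the result carries just the sign of the relationship, so a longer vector per row is needed for a more informative coefficient.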

Median imputation using sapply

只愿长相守 posted on 2019-12-01 09:32:10
Question: I want to replace missing values in columns of a dataframe. I have written the following code

    MedianImpute <- function(data=data) {
      for(i in 1:ncol(data)) {
        if(class(data[,i]) %in% c("numeric","integer")) {
          if(sum(is.na(data[,i]))) {
            data[is.na(data[,i]),i] <- median(data[,i],na.rm = TRUE)
          }
        }
      }
      return(data)
    }

This returns the dataframe with the NAs replaced by the column median. I do not want to use a for loop; how can I get the same result using any of the apply functions in R?

Answer 1: This is

Data.table: how to get the blazingly fast subsets it promises and apply to a second data.table

自作多情 posted on 2019-12-01 09:15:13
Question: I'm trying to enrich one dataset (adherence) based on subsets from another (lsr). For each individual row in adherence, I want to calculate (as a third column) the medication available for implementing the prescribed regimen. I have a function that returns the relevant result, but it runs for days on just a subset of the total data I have to run it on. The datasets are:

    library(dplyr)
    library(tidyr)
    library(lubridate)
    library(data.table)
    adherence <- cbind.data.frame(c("1", "2", "3", "1", "2"

rolling computations in xts by month

时间秒杀一切 posted on 2019-12-01 08:52:32
I am familiar with the zoo function rollapply, which allows you to do rolling computations on zoo or xts objects, and you can specify the rolling increment via the by parameter. I am specifically interested in applying a function every month but using all of the past daily data in the computation. For example, say my data set looks like this:

    dte, val
    1/01/2001, 10
    1/02/2001, 11
    ...
    1/31/2001, 2
    2/01/2001, 54
    2/02/2001, 34
    ...
    2/28/2001, 29

I would like to select the end of each month and apply a function that uses all the daily data. This doesn't seem like it would work with rollapply since the
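The idea of "evaluate at each month end, using everything up to that point" can be sketched in base R without any packages. The dates, the random values and the choice of mean() as the function are all assumptions for illustration. With an xts object, xts::endpoints(x, on = "months") gives similar month-end indices.

```r
# Hypothetical daily series covering three months of 2001.
dates <- seq(as.Date("2001-01-01"), as.Date("2001-03-31"), by = "day")
set.seed(7)
val <- round(runif(length(dates), 0, 60))

# Position of the last observation in each calendar month.
month_id <- format(dates, "%Y-%m")
month_ends <- which(!duplicated(month_id, fromLast = TRUE))

# Expanding window: at each month end, use all data from the start up to that row.
expanding <- sapply(month_ends, function(e) mean(val[seq_len(e)]))
names(expanding) <- as.character(dates[month_ends])
print(expanding)
```

Swapping mean() for any other function of a numeric vector keeps the same structure.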

Running sum on a column conditional on value

孤人 posted on 2019-12-01 08:23:37
I have a vector of binary variables which state whether a product is on promotion in the period. I'm trying to work out how to calculate the duration of each promotion and the duration between promotions.

    promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)

So in other words: if promo.flag is the same as the previous period then running.total + 1, else running.total is reset to 1.

I've tried playing with apply functions and cumsum but can't manage to get the conditional reset of the running total working :-(

The output I need is:

    promo.flag  = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
    rolling.sum = c(1,2,1,1,1,2,1,2,3,1,1,2,0)
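One possible way to get a counter that resets whenever the flag changes is to combine rle(), which finds the runs of equal values, with sequence(), which counts 1..n within each run. This is a sketch of the rule as stated in the question. Under that rule the final element comes out as 1, not the 0 shown in the expected output above.

```r
promo.flag <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0)

# rle() gives the length of each run of identical consecutive values.
runs <- rle(promo.flag)

# sequence() expands each run length n into 1, 2, ..., n, so the counter
# restarts at 1 every time the flag differs from the previous period.
rolling.sum <- sequence(runs$lengths)
print(rolling.sum)

# Duration of each promotion and of each gap between promotions.
promo.durations <- runs$lengths[runs$values == 1]
gap.durations <- runs$lengths[runs$values == 0]
```

The run lengths themselves answer the duration question directly: promo.durations holds the promotion lengths and gap.durations the lengths between them.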