plyr

How can I use variable names to refer to data frame columns with ddply?

痞子三分冷 submitted on 2019-12-13 11:49:31
Question: I am trying to write a function that takes as arguments the name of a data frame holding time series data and the name of a column in that data frame. The function performs various manipulations on that data, one of which is adding a running total for each year in a column. I am using plyr. When I use the name of the column directly with ddply and cumsum, I have no problems:

    require(plyr)
    df <- data.frame(date = seq(as.Date("2007/1/1"), by = "month", length.out = 60), sales = runif(60, min =
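The question is cut off before the failing attempt, so the following is only a sketch of one common approach, not the accepted answer. Instead of naming the column inside ddply(), pass the column name as a string and look it up with `[[` inside a function applied to each year's piece. The helper name `add_running_total`, the derived `year` column and the `_cumsum` suffix are hypothetical, and the function takes the data frame itself rather than its name.

```r
library(plyr)

# Hypothetical helper: adds a per-year running total of the column named by
# `col` (a string), instead of hard-coding the column name inside ddply().
add_running_total <- function(data, col) {
  data$year <- format(data$date, "%Y")
  ddply(data, "year", function(d) {
    # d[[col]] looks the column up by name, so `col` can be any string
    d[[paste0(col, "_cumsum")]] <- cumsum(d[[col]])
    d
  })
}

# Illustrative data in the same shape as the question's df; constant sales
# are used instead of runif() so the result is easy to check.
df <- data.frame(date = seq(as.Date("2007/1/1"), by = "month", length.out = 24),
                 sales = rep(1, 24))
out <- add_running_total(df, "sales")
```

Because ddply() accepts a character vector for `.variables`, the grouping column can also be passed in as a string if it needs to vary.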

%dopar% in R does not work properly

醉酒当歌 submitted on 2019-12-13 10:33:32
Question: I just started using the foreach and %dopar% methods for parallel processing in R, but the results I'm getting are confusing and not the same as with a for loop. Here is the code I used to test those methods, and the results I'm getting:

    library(plyr); library(doParallel); library(foreach)
    cs <- makeCluster(2)
    registerDoParallel(cs)
    sfor_start <- Sys.time()
    s_for=as.numeric()
    for (i in 1:1000) { s_for[i] = sqrt(i) }
    print(Sys.time() - sfor_start)
    sdopar_start <- Sys.time()
    sdopar=as.numeric()
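The %dopar% loop itself is cut off, so this assumes the common pattern of writing `sdopar[i] = sqrt(i)` inside the loop body. With %dopar% each iteration runs in a worker process, so such an assignment changes a copy on the worker and never reaches the main session. The foreach idiom is to *return* a value from each iteration and let `.combine` assemble the results. A minimal sketch:

```r
library(foreach)
library(doParallel)

cs <- makeCluster(2)
registerDoParallel(cs)

# Each iteration returns sqrt(i); .combine = c collects the returned values
# into one numeric vector, in iteration order (foreach's default .inorder = TRUE).
sdopar <- foreach(i = 1:1000, .combine = c) %dopar% {
  sqrt(i)
}

stopCluster(cs)

# Sequential reference computed with an ordinary for loop
s_for <- numeric(1000)
for (i in 1:1000) s_for[i] <- sqrt(i)
```

For work this small, the overhead of sending tasks to workers will usually make %dopar% slower than the plain loop, so timing comparisons like the one in the question are expected to favor the for loop.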

R data subset restructuring [closed]

a 夏天 submitted on 2019-12-13 09:47:58
Question: Closed. This question needs to be more focused. It is not currently accepting answers. Closed 5 years ago. I am fairly new to R/RStudio and I am still learning how to do certain operations. I have the following data set. For columns I have Operating Region, type of element (CA, OBU), sub-element and Net Revenue. Currently the data is quite big (50,000 rows) and I want to get a summary of
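The question is cut off before saying which summary is wanted, so this sketch *assumes* the goal is total Net Revenue per region and element type. The column names (`Region`, `Element`, `SubElement`, `NetRevenue`) and values are hypothetical stand-ins for the question's columns.

```r
# Illustrative data with hypothetical column names mirroring the question's
# Operating Region, element type (CA/OBU), sub-element and Net Revenue.
revenue <- data.frame(
  Region     = c("East", "East", "East", "West", "West"),
  Element    = c("CA", "CA", "OBU", "CA", "OBU"),
  SubElement = c("s1", "s2", "s1", "s1", "s2"),
  NetRevenue = c(100, 50, 20, 70, 30)
)

# Total Net Revenue per region and element type (sub-elements summed together)
by_region_element <- aggregate(NetRevenue ~ Region + Element, data = revenue, FUN = sum)
by_region_element
```

With plyr, `ddply(revenue, c("Region", "Element"), summarise, NetRevenue = sum(NetRevenue))` should produce the same totals; adding `SubElement` to the grouping gives the finer breakdown.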

dplyr Error: length(rows) == 1 is not TRUE in R

被刻印的时光 ゝ submitted on 2019-12-13 09:15:33
Question: As some background, the data I'm working with comes from ranking the top 3 of certain variables. I need to be able to count the 1s, 2s, 3s, and the NAs (the number of people who did not include that variable in their top 3). I have my data frame LikelyRenew_ReasonB, and I used dplyr to filter for a particular year and status, which works correctly with no errors:

    LikelyRenew_ReasonB <- LikelyRenew_Reason %>% filter(year ==1, status ==2)
    > LikelyRenew_ReasonB
      cost products commun reimburse policy discount status year
    1   NA       NA     NA        NA     NA
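The snippet ends before the code that raises the dplyr error, so this does not diagnose it. As an alternative, the counting task itself can be sketched in base R: for each ranking column, tabulate ranks 1 to 3 plus NA. The data values are hypothetical; only the column names `cost` and `products` come from the question.

```r
# Illustrative ranking data: hypothetical values, using two of the question's
# column names. Each cell is 1, 2, 3, or NA (not ranked in the top 3).
ranks <- data.frame(
  cost     = c(1, NA, 2, 3, NA),
  products = c(NA, 1, 1, NA, 2)
)

# For every column, count how often 1, 2 and 3 occur, plus the NAs.
# factor(levels = 1:3) keeps zero counts for ranks that never occur.
rank_counts <- sapply(ranks, function(x) table(factor(x, levels = 1:3), useNA = "always"))
rank_counts
```

The result is a matrix with one column per variable and rows for ranks 1, 2, 3 and NA. On the real data, the `status` and `year` columns would be dropped before the `sapply()` call.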

Using R's plyr package to reorder groups within a dataframe

ε祈祈猫儿з submitted on 2019-12-13 04:43:22
Question: I have a data reorganization task that I think could be handled by R's plyr package. I have a data frame with numeric data organized in groups. Within each group I need to have the data sorted largest to smallest. The data looks like this (code to generate below):

      group     value
    2     b 0.1408790
    6     b 1.1450040   #2nd b is larger than 1st
    1     c 5.7433568
    3     c 2.2109819
    4     d 0.5384659
    5     d 4.5382979

What I would like is this:

    group     value
        b 1.1450040   #1st b is largest
        b 0.1408790
        c 5.7433568
        c 2.2109819
        d 4
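Sorting within groups does not strictly need a split-apply-combine step: ordering by the group first and then by descending value keeps each group together while sorting inside it. A minimal base-R sketch using the question's values (row names dropped):

```r
# The question's example data (original row names omitted)
df <- data.frame(
  group = c("b", "b", "c", "c", "d", "d"),
  value = c(0.1408790, 1.1450040, 5.7433568, 2.2109819, 0.5384659, 4.5382979)
)

# Order by group first, then by value descending within each group
sorted <- df[order(df$group, -df$value), ]
rownames(sorted) <- NULL
sorted
```

In plyr, `arrange(df, group, desc(value))` expresses the same ordering.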

Replacing NA depending on the distribution type within each gender in R

笑着哭i submitted on 2019-12-13 03:04:46
Question: When I select the NA values here:

    data[data=="na"] <- NA
    data[!complete.cases(data),]

I must replace them, but the replacement depends on the type of distribution. If shapiro.test shows that a variable's distribution is not normal, the missing value must be replaced by the median; if it is normal, it must be replaced by the mean. The distribution must be assessed separately for each gender (1 = girl, 2 = man).

    data=structure(list(sex = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), emotion = c(20L, 15L, 49L, NA, 34L, 35L, 54L, 45L), IQ = c(101L, 98L, 105L, NA, 123L, 120L, 115L,
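One way to express "mean if normal, median otherwise, per gender" is to run the rule inside `ave()`, which applies a function to each group and puts the results back in the original row order. This is a sketch, not a tested answer: the helper `impute_by_group` and the 0.05 significance level are assumptions, and shapiro.test() needs at least 3 non-missing values per group and fails if they are all identical.

```r
# Hypothetical helper: fill NAs in `x` separately within each level of `g`,
# using the median when shapiro.test() rejects normality at level `alpha`
# and the mean otherwise. Assumes 3+ non-missing, non-constant values per group.
impute_by_group <- function(x, g, alpha = 0.05) {
  ave(x, g, FUN = function(v) {
    obs <- v[!is.na(v)]
    normal <- shapiro.test(obs)$p.value >= alpha
    v[is.na(v)] <- if (normal) mean(obs) else median(obs)
    v
  })
}

# Two of the question's columns (the IQ column is cut off in the snippet)
data <- data.frame(
  sex     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  emotion = c(20L, 15L, 49L, NA, 34L, 35L, 54L, 45L)
)
data$emotion <- impute_by_group(data$emotion, data$sex)
```

For the girls' emotion scores (20, 15, 49), the test does not reject normality, so the NA becomes their mean, 28. To treat every measurement column, something like `data[-1] <- lapply(data[-1], impute_by_group, g = data$sex)` would apply the same rule column by column.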

R function containing plyr::ddply(): parameters in ddply() cannot be passed correctly

你离开我真会死。 submitted on 2019-12-13 02:57:02
Question: My data is as follows:

    > df2
       id calmonth         product
    1 101       01           apple
    2 102       01 apple&nokia&htc
    3 103       01             htc
    4 104       01       apple&htc
    5 104       02           nokia
    para=c('apple','htc','nokia')

I want to get the number of ids whose product contains apple&htc, apple&nokia, etc. I wrote a function as follows:

    xandy=function(a,b){
      ddply(df2,.(calmonth),summarise,
            csum=length(grep(paste0('apple','.*','htc'),product)),
            coproduct=paste0('apple','&','htc') )
    }

This function gives me a perfect result, as follows:

    > xandy(para[1],para[3])
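The snippet stops before the failing version, so this assumes the problem appears once `'apple'` and `'htc'` are replaced by `a` and `b`. A likely cause is that plyr's summarise() evaluates its expressions in the data frame with an enclosing environment inside plyr's own call stack, not in the body of `xandy()`, so `a` and `b` are not found. A sketch of a workaround: pass an anonymous function to ddply(), so that `a` and `b` are resolved through ordinary lexical scoping.

```r
library(plyr)

# The question's df2 (ids and months as shown in the question)
df2 <- data.frame(
  id       = c(101, 102, 103, 104, 104),
  calmonth = c("01", "01", "01", "01", "02"),
  product  = c("apple", "apple&nokia&htc", "htc", "apple&htc", "nokia"),
  stringsAsFactors = FALSE
)

# Rewritten xandy(): the anonymous function receives each calmonth piece `d`,
# and `a` and `b` come from xandy's own arguments.
xandy <- function(a, b) {
  ddply(df2, "calmonth", function(d) {
    data.frame(
      csum      = length(grep(paste0(a, ".*", b), d$product)),
      coproduct = paste0(a, "&", b),
      stringsAsFactors = FALSE
    )
  })
}

res <- xandy("apple", "htc")
res
```

Like the original, `csum` counts matching rows rather than distinct ids; wrapping the matched ids in `unique()` before taking the length would count ids instead.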

Select (multiple) integers with n occurrences per row

旧时模样 submitted on 2019-12-13 01:26:47
Question: I have a data.frame where the data entries are entered in this format: 1,2,3,10. That is, they are comma-separated integers that range from 0 to 20 and do not need to be consecutive. Each column is currently stored as a factor. I have four variables that contain these values, and I'd like to create a new variable that includes a given integer only if it appears in three of the four variables; if no integer occurs three times, then use 0.

    M1 M2 M3 M4 M_NEW 1 1,2 0 1 1 3,4 3,4 1,2,3,4 4
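A row-wise sketch of the rule: split each cell on commas, count in how many cells each integer appears, and keep the integers that reach the threshold. The helper `common_values` and the example rows are hypothetical, written in the question's format. The sketch assumes an integer repeated within one cell counts once and that multiple qualifying integers are joined with commas.

```r
# Hypothetical helper: for one row of comma-separated integer strings, return
# the integers that occur in at least `min_count` of the cells (collapsed with
# ","), or "0" when none do. Each integer is counted at most once per cell.
common_values <- function(cells, min_count = 3) {
  per_cell <- lapply(strsplit(as.character(cells), ","),
                     function(v) unique(as.integer(v)))
  counts <- table(unlist(per_cell))
  keep <- sort(as.integer(names(counts)[counts >= min_count]))
  if (length(keep) == 0) "0" else paste(keep, collapse = ",")
}

# Illustrative rows in the question's format, stored as factors
df <- data.frame(
  M1 = factor(c("1,2", "3,4", "5")),
  M2 = factor(c("0", "3,4", "6")),
  M3 = factor(c("1", "1,2,3,4", "7")),
  M4 = factor(c("1", "4", "8"))
)

# apply() turns the factor columns into a character matrix and calls the
# helper once per row
df$M_NEW <- apply(df[, c("M1", "M2", "M3", "M4")], 1, common_values)
df
```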

Extract & combine multiple substrings using multiple patterns from some but not all strings contained in list & return to list in R

匆匆过客 submitted on 2019-12-13 01:14:58
Question: I'd like to find an elegant and easily manipulable way to: (1) extract multiple substrings from some, but not all, strings that are contained as elements of a list (each list element consists of just one long string); (2) replace the respective original long string with these multiple substrings; (3) collapse the substrings in each list element into one string; and (4) return a list of the same length containing the replacement substrings and the untouched long strings, as appropriate. This question is a follow-on (though
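The actual patterns are not shown, so the two regular expressions below are illustrative placeholders. Combining several patterns into one alternation with `|` and using `gregexpr()` with `regmatches()` returns every match in the order it appears. Elements with no match are left unchanged, and `lapply()` keeps the list the same length.

```r
# Hypothetical helper: pull every match of `pattern` out of a single string and
# collapse the matches into one string; if nothing matches, return the string
# unchanged.
extract_or_keep <- function(x, pattern, sep = " ") {
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  if (length(hits) == 0) x else paste(hits, collapse = sep)
}

# Several (illustrative) patterns combined into one alternation, so matches
# come back in the order they appear in the string.
patterns <- c("ID: [A-Z][0-9]+", "Date: [0-9]{4}")
pattern <- paste(patterns, collapse = "|")

docs <- list("ID: A12 then Date: 2019 and more text",
             "a long string with no matches")
result <- lapply(docs, extract_or_keep, pattern = pattern)
result
```

The `sep` argument controls how the extracted pieces are joined within each element.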

mapvalues in plyr gives unexpected output when the “to” argument is a factor… is it a bug?

江枫思渺然 submitted on 2019-12-13 00:34:38
Question: When I use mapvalues in the plyr package (plyr v1.8, R v2.15.1 "Roasted Marshmallows"), I get an odd result when the "to" argument is a factor. For example,

    v1 = c(1,2,2,1,2)
    mapvalues(v1, from = c(1, 2), to = factor( c('A', 'B') ) )

returns

    [1] 1 2 2 1 2

instead of

    [1] A B B A B
    Levels: A B

To me it looks like it might be a bug, but I wanted to check with other people before bothering the developer. Is this a bug?

Answer 1: This most likely isn't a bug. Factors are stored internally as integers. If
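The answer's point about integer codes can be reproduced in base R. This is not plyr's source. It assumes, as a sketch, that mapvalues() writes elements of `to` into a copy of a non-factor `x` with `[<-`. Assigning a factor into a numeric vector stores the factor's integer codes, which is why the output looks unchanged.

```r
v1 <- c(1, 2, 2, 1, 2)
from <- c(1, 2)
to <- factor(c("A", "B"))

# Base-R reproduction of the mechanism (not plyr's code): replace matched
# elements of a numeric vector with elements of a factor via `[<-`.
idx <- match(v1, from)
x <- v1
x[!is.na(idx)] <- to[idx[!is.na(idx)]]
x               # still numeric: the factor's integer codes 1 and 2 were stored
as.integer(to)  # the codes behind "A" and "B"

# Passing the labels as character and rebuilding a factor gives the expected result
fixed <- factor(as.character(to)[idx], levels = levels(to))
fixed
```

With plyr, the equivalent workaround would be `factor(mapvalues(v1, from = c(1, 2), to = c("A", "B")))`: map to character labels first, then convert to a factor.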