plyr

Subsetting DataFrame in R by duplicate values for Year by lowest value for Rating

我是研究僧i submitted on 2019-12-12 19:23:00
Question: I have a data frame which looks like this:

    > fitchRatings
         Country Month Year FitchLongTerm LongTermTransformed
    1  Abu Dhabi     7 2007            AA                  22
    2     Angola     5 2012           BB-                  12
    3     Angola     5 2011           BB-                  12
    4     Angola     5 2010            B+                  11
    5  Argentina     7 2010             B                  10
    6  Argentina    12 2008            RD                   3
    7  Argentina     8 2006            RD                   3
    8  Argentina    12 2005            RD                   3
    9  Argentina     6 2005           DDD                   2
    10 Argentina     1 2005             D                   0

As you can see, for some countries there are multiple observations for a single year. I want to subset the DF so that I keep only one observation
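One possible approach to the question excerpted above, assuming (from the title) that the row to keep for each Country/Year pair is the one with the lowest LongTermTransformed value. The sketch uses base R only; the names `ordered` and `lowestPerYear` are made up for illustration.

```r
# Sample rows copied from the excerpt above.
fitchRatings <- data.frame(
  Country = c("Abu Dhabi", "Angola", "Angola", "Angola", "Argentina",
              "Argentina", "Argentina", "Argentina", "Argentina", "Argentina"),
  Month = c(7, 5, 5, 5, 7, 12, 8, 12, 6, 1),
  Year = c(2007, 2012, 2011, 2010, 2010, 2008, 2006, 2005, 2005, 2005),
  FitchLongTerm = c("AA", "BB-", "BB-", "B+", "B", "RD", "RD", "RD", "DDD", "D"),
  LongTermTransformed = c(22, 12, 12, 11, 10, 3, 3, 3, 2, 0),
  stringsAsFactors = FALSE
)

# Sort so the lowest rating comes first within each Country/Year pair
# (assumption: "lowest" means the smallest LongTermTransformed value).
ordered <- fitchRatings[order(fitchRatings$Country, fitchRatings$Year,
                              fitchRatings$LongTermTransformed), ]

# duplicated() flags every repeat of a Country/Year pair after the first,
# so negating it keeps exactly one row per pair.
lowestPerYear <- ordered[!duplicated(ordered[c("Country", "Year")]), ]
print(lowestPerYear)
```

With plyr loaded, a roughly equivalent call would be `ddply(fitchRatings, c("Country", "Year"), function(d) d[which.min(d$LongTermTransformed), ])`.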

Summarizing data in table by group for each variable in r

我的梦境 submitted on 2019-12-12 18:13:14
Question: I have some data that I'd like to properly format with some summary values in R. I've played with aggregate and other things such as summaryBy, but none produced what I wanted. Here's the data:

    data <- data.frame(id = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48), x1 = c(0.2846,0.3741,0.4208,0.3756,0.3476,0.3664,0.2852,0.3537,0.3116,0.3124,0.364,0.3934,0.3456,0.3034,0.3139,0.2766,0.3034,0.3159,0

replace median for category by condition of three zero before and three after separated by groups in R

烂漫一生 submitted on 2019-12-12 10:18:29
Question: Say I have a dataset:

    mydat=structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "25481МСК", class = "factor"), item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13164L, 13164L, 13164L, 13164L,

Make regressions and predictions for groups in R

懵懂的女人 submitted on 2019-12-12 09:16:04
Question: I have the following data.frame d from an experiment:
- Variable y (response, continuous)
- Factor f (500 levels)
- Time t (POSIXct)

In the last 8 years, y was measured roughly once a month (exact date in t) for each level of f. Sometimes there are 2 measurements per month; sometimes a couple of months passed without any measurements. Sorry for not providing example data, but making up irregular time series goes beyond my R knowledge. ;) I'd like to do the following with this data: make a regression

Mean of elements in a list of data.frames

喜夏-厌秋 submitted on 2019-12-12 07:09:17
Question: Suppose I had a list of data.frames (of equal rows and columns):

    dat1 <- as.data.frame(matrix(rnorm(25), ncol=5))
    dat2 <- as.data.frame(matrix(rnorm(25), ncol=5))
    dat3 <- as.data.frame(matrix(rnorm(25), ncol=5))
    all.dat <- list(dat1=dat1, dat2=dat2, dat3=dat3)

How can I return a single data.frame that is the mean (or sum, etc.) for each element in the data.frames across the list (e.g., mean of first row and first column from lists 1, 2, 3 and so on)? I have tried lapply and ldply in plyr but
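A minimal sketch of one way to get the element-wise mean asked about above, using base R's `Reduce` rather than plyr. It assumes every data frame in the list is all-numeric with identical dimensions, as the question states; `set.seed` and the name `mean.dat` are additions for illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dat1 <- as.data.frame(matrix(rnorm(25), ncol = 5))
dat2 <- as.data.frame(matrix(rnorm(25), ncol = 5))
dat3 <- as.data.frame(matrix(rnorm(25), ncol = 5))
all.dat <- list(dat1 = dat1, dat2 = dat2, dat3 = dat3)

# Add the data frames cell by cell, then divide by the number of
# data frames to turn the element-wise sum into an element-wise mean.
mean.dat <- Reduce(`+`, all.dat) / length(all.dat)
print(mean.dat)
```

Dropping the division leaves the element-wise sum. For statistics that cannot be built up pairwise, such as the median, a different approach would be needed.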

Removing duplicate rows with ddply

ⅰ亾dé卋堺 submitted on 2019-12-12 04:05:25
Question: I have a dataframe df containing two factor variables (Var and Year) as well as one (in reality several) column with values.

    df <- structure(list(Var = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), Year = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 3L), .Label = c("2000", "2001", "2002"), class = "factor"), Val = structure(c(1L, 2L, 2L, 4L, 1L, 3L, 3L, 5L, 6L, 6L), .Label = c("2", "3", "4", "5", "8", "9"), class = "factor")), .Names = c

Custom Function not recognized by ddply {plyr}, it tells me that my function is not a function

烈酒焚心 submitted on 2019-12-12 03:57:07
Question: I have a matrix called b2 that contains 3565 rows and 125 columns with only dichotomous values (0 and 1). I designed a function to compare row i and row i+1 and store the number of differences in a new vector.

    loopPhudcf <- function(x){
      ## create a vector to store the results of your for loop
      output <- as.vector(rep(0, length(x[,1])))
      for (i in 1:(nrow(x))-1) {
        output[i]<-as.vector(table(x[i,]==x[i+1,]))[1]
      }
      a<-nrow(x)
      b<-nrow(x)-1
      output<-t(as.matrix(output[c(a,1:b)]))
      output[output==ncol

calculation of anomalies on time-series

雨燕双飞 submitted on 2019-12-12 03:53:26
Question: I'd like to calculate monthly temperature anomalies on a time series with several stations. By "anomaly" I mean the difference between a single value and a mean calculated over a period. My data frame looks like this (let's call it "data"):

    Station Year Month Temp
    A       1950     1 15.6
    A       1980     1 12.3
    A       1990     2 11.4
    A       1950     1 15.6
    B       1970     1 12.3
    B       1977     2 11.4
    B       1977     4 18.6
    B       1980     1 12.3
    B       1990    11  7.4

First, I made a subset with the years between 1980 and 1990:

    data2 <- subset(data, Year>=1980& Year<
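A rough sketch of how the anomaly described above could be computed in base R. It assumes the reference mean is taken per Station and Month over 1980 to 1990 inclusive (the excerpt's `subset` call is cut off, so the upper bound is a guess based on the prose). The names `ref`, `refMeans`, `withRef`, `RefMean` and `Anomaly` are made up for illustration.

```r
# Sample rows copied from the excerpt above.
data <- data.frame(
  Station = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  Year = c(1950, 1980, 1990, 1950, 1970, 1977, 1977, 1980, 1990),
  Month = c(1, 1, 2, 1, 1, 2, 4, 1, 11),
  Temp = c(15.6, 12.3, 11.4, 15.6, 12.3, 11.4, 18.6, 12.3, 7.4)
)

# Reference period (assumption: both bounds inclusive).
ref <- subset(data, Year >= 1980 & Year <= 1990)

# Mean temperature per Station and Month within the reference period.
refMeans <- aggregate(Temp ~ Station + Month, data = ref, FUN = mean)
names(refMeans)[names(refMeans) == "Temp"] <- "RefMean"

# Attach the reference mean to every observation. Station/Month pairs
# with no data in the reference period get NA.
withRef <- merge(data, refMeans, by = c("Station", "Month"), all.x = TRUE)
withRef$Anomaly <- withRef$Temp - withRef$RefMean
print(withRef)
```

Observations whose Station/Month pair never occurs in the reference period end up with an NA anomaly rather than being dropped, which makes the gaps in the reference data visible.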

Compare frequencies of samples in r

霸气de小男生 submitted on 2019-12-12 03:53:23
Question: I would like to compare the frequency of samples from two different observations. The problem is that the first doesn't contain the whole range of numbers of the second. How could I combine these without writing a for loop that sorts them based on the x values returned by count? Here's an MWE for clarification:

    library(plyr)
    a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
    b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)
    a.count
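One loop-free way to line up the two frequency tables from the question above is to tabulate both vectors against a shared set of factor levels. This is a base R sketch, not the plyr `count` route the excerpt starts on; the names `lev` and `freq` are made up for illustration.

```r
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6,
       5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# Every value that occurs in either vector, in ascending order.
lev <- sort(union(a, b))

# Tabulating against the shared levels yields a zero count for any
# value that is missing from one of the vectors.
freq <- data.frame(
  x = lev,
  a = as.vector(table(factor(a, levels = lev))),
  b = as.vector(table(factor(b, levels = lev)))
)
print(freq)
```

Staying with plyr, merging the two `count()` results on `x` with `merge(..., all = TRUE)` should give a similar table, with NA instead of 0 for missing values.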

Subsetting a data frame by ddply, then applying a function with adply on the subset R

放肆的年华 submitted on 2019-12-12 02:58:07
Question: I am having some trouble formulating a logical piece of code using plyr. My problem involves two big dataframes of different lengths, with a sample as below:

    dfSample <- structure(list(Type = structure(c(8L, 100L, 86L, 86L, 86L, 86L, 33L, 8L, 105L, 44L, 36L, 107L, 107L, 78L, 33L, 105L, 99L, 10L, 16L, 75L), .Label = c("Alumni Services", "Anti-Virus and Malware", "Application Integration", "Application Monitoring", "Application Testing", "Audio Visual Support", "Audio Visual Support - CLS",