calculating the outliers in R

慢半拍i 2020-12-05 01:21

I have a data frame like this:

x

Team 01/01/2012  01/02/2012  01/03/2012  01/01/2012 01/04/2012 SD Mean
A     100         50           40        NA           


        
5 answers
  •  眼角桃花
    2020-12-05 01:54

    I have seen that you've asked several questions about doing things by row. You should avoid that. R follows the convention that columns represent variables and rows represent observations, and many functions are optimized for this layout. If you need wide or transposed output in a file, you can rearrange your data just before writing it.
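
    As a rough sketch of that last point, assuming a long data frame with illustrative columns `Team`, `Date`, and `value`, base R's `reshape()` can spread the dates back into columns right before the file is written. The data and the temporary output file are made up for this example.

    ```r
    # Hypothetical long-format data: one row per team and date
    long <- data.frame(
      Team  = c("A", "A", "B", "B"),
      Date  = c("01/01/2012", "01/02/2012", "01/01/2012", "01/02/2012"),
      value = c(100, 50, 200, 40)
    )

    # Do all calculations on the long format, then widen only for output
    wide <- reshape(long, idvar = "Team", timevar = "Date", direction = "wide")

    # Strip the "value." prefix that reshape() adds to the new column names
    names(wide) <- sub("^value\\.", "", names(wide))

    # Write the wide layout to a file (a temporary file in this sketch)
    out_file <- tempfile(fileext = ".csv")
    write.csv(wide, out_file, row.names = FALSE)
    ```

    Keeping the long format as the working copy means the widening step is the only place that has to know about the output layout.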

    I assume that your data actually looks as shown in the question, but that you have more than one row.

    df <- read.table(text="Team 01/01/2012  01/02/2012  01/03/2012  01/01/2012 01/04/2012 SD Mean
    A     100         50           40        NA         30       60  80
    B     200         40           5         8          NA       NA  NA",check.names = FALSE,header=TRUE)
    
    #needed because one date appears twice
    df <- df[,]
    
    #reshape the data
    library(reshape2)
    df <- melt(df,id="Team")
    names(df)[2] <- "Date"
    
    #remove the SD and Mean
    df <- df[!df$Date %in% c("SD","Mean"),]
    
    #function to detect outliers
    outfun <- function(x) {
      abs(x-mean(x,na.rm=TRUE)) > 3*sd(x,na.rm=TRUE)
    }
    
    #test if function works
    outfun(c(200,rnorm(10)))
    
    #use function over all data
    df$outlier.all <- outfun(df$value)
    
    #apply function for each team 
    library(plyr)
    df <- ddply(df,.(Team),transform,outlier.team=outfun(value))
    

    Result:

               Date Team value outlier.all outlier.team
    1    01/01/2012    A   100       FALSE        FALSE
    2    01/02/2012    A    50       FALSE        FALSE
    3    01/03/2012    A    40       FALSE        FALSE
    4  01/01/2012.1    A    NA          NA           NA
    5    01/04/2012    A    30       FALSE        FALSE
    6    01/01/2012    B   200       FALSE        FALSE
    7    01/02/2012    B    40       FALSE        FALSE
    8    01/03/2012    B     5       FALSE        FALSE
    9  01/01/2012.1    B     8       FALSE        FALSE
    10   01/04/2012    B    NA          NA           NA
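
    Every flag in this result is FALSE, which is expected with so few values. With the sample standard deviation, no value among n observations can lie more than (n - 1)/sqrt(n) standard deviations from the mean, so a 3-SD cutoff cannot flag anything until n reaches 11. The sketch below checks this on the sample values, using base R's `ave()` in place of `ddply()`; `long` and `max_z` are names introduced here for illustration only.

    ```r
    # Same outlier rule as in the answer
    outfun <- function(x) {
      abs(x - mean(x, na.rm = TRUE)) > 3 * sd(x, na.rm = TRUE)
    }

    # Sample values in long format (SD and Mean columns already dropped)
    long <- data.frame(
      Team  = rep(c("A", "B"), each = 5),
      value = c(100, 50, 40, NA, 30, 200, 40, 5, 8, NA)
    )

    # Per-team flag with base R: ave() applies outfun within each Team.
    # ave() returns a numeric vector here, so convert back to logical.
    long$outlier.team <- as.logical(ave(long$value, long$Team, FUN = outfun))

    # Largest possible |x - mean| / sd for n values (hypothetical helper)
    max_z <- function(n) (n - 1) / sqrt(n)

    max_z(4)   # 1.5: four values per team can never exceed 3 SD
    max_z(10)  # about 2.85: still below 3
    max_z(11)  # about 3.02: the first n where a flag is possible
    ```

    For groups this small, a lower cutoff or a rule based on the median may be worth considering, depending on what counts as an outlier in your data.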
    
