calculating the outliers in R

前端未结

关注

 5  1627

慢半拍i 2020-12-05 01:21

I have a data frame like this:

Team 01/01/2012  01/02/2012  01/03/2012  01/01/2012 01/04/2012 SD Mean
A     100         50           40        NA


      
      
        
          5条回答        

        
                    
            
            
                         
                
              
              
                
                   眼角桃花
                                             
                
                
                (楼主)
            
              
              
                2020-12-05 01:54
              

            
            
                        
I have seen that you've asked some questions on doing things by row. You should avoid that. R follows the concept that columns represent variables and rows represent observations. Many functions are optimized according to this concept. If you need a wide or transposed output to a file you can rearrange your data just before writing to the file.

I assume that your data actually looks as shown in the question, but that you have more than one row.

df <- read.table(text="Team 01/01/2012  01/02/2012  01/03/2012  01/01/2012 01/04/2012 SD 

Mean
A     100         50           40        NA         30       60  80
B     200         40           5         8          NA       NA  NA",check.names = FALSE,header=TRUE)

#needed because one date appears twice
df <- df[,]

#reshape the data
library(reshape2)
df <- melt(df,id="Team")
names(df)[2] <- "Date"

#remove the SD and Mean
df <- df[!df$Date %in% c("SD","Mean"),]

#function to detect outliers
outfun <- function(x) {
  abs(x-mean(x,na.rm=TRUE)) > 3*sd(x,na.rm=TRUE)
}

#test if function works
outfun(c(200,rnorm(10)))

#use function over all data
df3$outlier.all <- outfun(df3$value)

#apply function for each team 
library(plyr)
df3 <- ddply(df3,.(Team),transform,outlier.team=outfun(value))


Result:

           Date Team value outlier.all outlier.team
1    01/01/2012    A   100       FALSE        FALSE
2    01/02/2012    A    50       FALSE        FALSE
3    01/03/2012    A    40       FALSE        FALSE
4  01/01/2012.1    A    NA          NA           NA
5    01/04/2012    A    30       FALSE        FALSE
6    01/01/2012    B   200       FALSE        FALSE
7    01/02/2012    B    40       FALSE        FALSE
8    01/03/2012    B     5       FALSE        FALSE
9  01/01/2012.1    B     8       FALSE        FALSE
10   01/04/2012    B    NA          NA           NA

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它5个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复