outliers

R: How to remove outliers from a smoother in ggplot2?

折月煮酒 submitted on 2019-12-21 14:11:12
Question: I have the following data set that I am trying to plot with ggplot2. It is a time series of three experiments, A1, B1 and C1, and each experiment had three replicates. I am trying to add a stat that detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown), but I expect there is already a function to do this; I just have not found it. I've looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the

Identifying the outliers in a data set in R

末鹿安然 submitted on 2019-12-19 04:44:14
Question: I have a data set and know how to get the five-number summary using the summary command. Now I need to get the instances above Q3 + 1.5 IQR or below Q1 - 1.5 IQR. Since these are just numbers, how would I return the instances from the data set that lie above or below them?

Answer 1: You can get this using boxplot. If your variable is x,

    OutVals = boxplot(x)$out
    which(x %in% OutVals)

If you are annoyed by the plot, you could use

    OutVals = boxplot(x, plot=FALSE)$out

Answer 2:
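As a minimal sketch of the rule stated in the question, the fences can also be computed directly with quantile(), without going through boxplot(). The vector x below is made up for illustration. Note that boxplot() derives its hinges from fivenum(), so on some data the two approaches may flag slightly different points.

```r
# Hypothetical example data; `x` stands in for the asker's variable.
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 40)

# Q1 and Q3 (default quantile type), then the 1.5 * IQR fences.
q <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Positions and values of the observations outside the fences.
out_idx <- which(x < lower | x > upper)
out_vals <- x[out_idx]
```

Here out_idx gives the row positions, which is usually what is needed to subset a data frame, while out_vals gives the flagged values themselves.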

Deleting the same outliers in two timeseries

大兔子大兔子 submitted on 2019-12-13 01:26:47
Question: I have a question about eliminating outliers from two time series. One time series contains spot market prices and the other contains power outputs. Both series run from 2012 to 2016, and both are CSV files with a timestamp and then a value. As an example, for the power output: 2012-01-01 00:00:00,2335.2152646951617 and for the price: 2012-01-01 00:00:00,17.2 Because the spot market prices are very volatile and have a lot of outliers, I have filtered them. For the second time series, I
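One possible way to drop the same timestamps from both series is sketched below. The excerpt does not say how the prices were filtered, so the boxplot.stats() rule used here is only an assumption, and the two small data frames (price, power) are made-up stand-ins for the CSV files.

```r
# Hypothetical miniature versions of the two series; the real data are CSV files.
stamps <- sprintf("2012-01-01 %02d:00:00", 0:7)
price <- data.frame(timestamp = stamps,
                    value = c(17.2, 18.0, 16.5, 950.0, 17.8, 16.9, 18.3, 17.5),
                    stringsAsFactors = FALSE)
power <- data.frame(timestamp = stamps,
                    value = c(2335.2, 2310.4, 2290.7, 2301.9,
                              2322.0, 2318.6, 2297.3, 2305.1),
                    stringsAsFactors = FALSE)

# Assumed filter: flag price values that fall outside the boxplot whiskers.
price_out <- price$value %in% boxplot.stats(price$value)$out
price_clean <- price[!price_out, ]

# Keep only the power rows whose timestamp survived the price filter.
power_clean <- power[power$timestamp %in% price_clean$timestamp, ]
```

Matching on the timestamp rather than on row position keeps the two series aligned even if one file has missing rows.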

Removing dataframe outliers in R with `boxplot.stats`

吃可爱长大的小学妹 submitted on 2019-12-13 00:16:13
Question: I'm relatively new to R, so please bear with me. I'm using the Ames dataset (full description of dataset here; link to dataset download here). I'm trying to create a subset data frame that will allow me to run a linear regression analysis, and I'm trying to remove the outliers using the boxplot.stats function. I created a frame that will include my samples using the following code: regressionFrame <- data.frame(subset(ames_housing_data[,c('SalePrice','GrLivArea','LotArea')] , BldgType ==
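A minimal sketch of row-wise outlier removal with boxplot.stats() follows. The column names come from the question, but the values are invented and the frame is built by hand, since the asker's subset() call is cut off in the excerpt. Dropping a row when any column is flagged is one possible policy, not necessarily the one the asker wanted.

```r
# Hypothetical stand-in for the asker's regressionFrame (values are made up).
regressionFrame <- data.frame(
  SalePrice = c(120000, 135000, 150000, 142000, 128000, 131000, 139000, 755000),
  GrLivArea = c(1100, 1250, 1400, 1320, 1180, 1210, 1290, 1350),
  LotArea   = c(8000, 8500, 9000, 8700, 8200, 8300, 8600, 8800)
)

# TRUE where a value lies beyond the boxplot whiskers of its own column.
is_outlier <- function(v) v %in% boxplot.stats(v)$out

# A row is flagged if any of its columns holds an outlier.
flagged <- Reduce(`|`, lapply(regressionFrame, is_outlier))
cleaned <- regressionFrame[!flagged, ]
```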

How to remove records from dataframe that fall outside variable-specific ranges? [R]

我与影子孤独终老i submitted on 2019-12-12 10:52:35
Question: I have a dataframe and a predictive model that I want to apply to the data. However, I want to filter out records for which the model might not apply very well. To do this, I have another dataframe that contains, for every variable, the minimum and maximum observed in the training data. I want to remove from my new data those records for which one or more values fall outside the specified range. To make my question clear, this is what my data might look like: id x y ---- ---- --------- 1 2
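The filtering described above could be sketched as follows. The example table in the excerpt is cut off, so both data frames here are invented, and the layout of the ranges frame (columns variable, min, max) is an assumption about how the training ranges are stored.

```r
# Hypothetical new data and training ranges (layout of `ranges` is assumed).
new_data <- data.frame(id = 1:4,
                       x = c(2, 5, 11, 7),
                       y = c(0.5, 1.2, 0.9, 3.4))
ranges <- data.frame(variable = c("x", "y"),
                     min = c(0, 0),
                     max = c(10, 3),
                     stringsAsFactors = FALSE)

# Start by keeping every row, then drop rows that violate any variable's range.
in_range <- rep(TRUE, nrow(new_data))
for (i in seq_len(nrow(ranges))) {
  v <- new_data[[ranges$variable[i]]]
  in_range <- in_range & v >= ranges$min[i] & v <= ranges$max[i]
}
filtered <- new_data[in_range, ]
```

Looping over the rows of ranges means new variables only need a new row in that frame, with no change to the filtering code.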

How to replace outliers with the 5th and 95th percentile values in R

风流意气都作罢 submitted on 2019-12-12 08:12:15
Question: I'd like to replace all values in my relatively large R dataset that lie above the 95th or below the 5th percentile with those percentile values respectively. My aim is to avoid simply cropping these outliers from the data entirely. Any advice would be much appreciated; I can't find any information on how to do this anywhere else.

Answer 1: This would do it.

    fun <- function(x){
      quantiles <- quantile( x, c(.05, .95 ) )
      x[ x < quantiles[1] ] <- quantiles[1]
      x[ x > quantiles[2] ] <-
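Since the answer's function is cut off in the excerpt, here is a self-contained sketch of the same idea written with pmin() and pmax() instead of indexed assignment. The function name cap_at_percentiles and the test vector are made up for illustration; this is not the original answer's code.

```r
# Hypothetical helper: clamp values to the given lower and upper percentiles.
cap_at_percentiles <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs, na.rm = TRUE, names = FALSE)
  # pmax() raises values below the lower bound, pmin() lowers values above
  # the upper bound; NA values pass through unchanged.
  pmin(pmax(x, q[1]), q[2])
}

# Made-up data with one extreme value.
x <- c(1:99, 1000)
capped <- cap_at_percentiles(x)
```

The vector keeps its original length, so the result can be assigned straight back to a data frame column.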

Changing whisker definition in facet'ed geom_boxplot

蓝咒 submitted on 2019-12-12 04:56:58
Question: I created a facet_grid with boxplots of multiple variables. To give an example, the graph can be reproduced with the following dummy data

    require(ggplot2)
    require(plyr)
    library(reshape2)
    set.seed(1234)
    x <- rnorm(100)
    y.1 <- rnorm(100)
    y.2 <- rnorm(100)
    y.3 <- rnorm(100)
    y.4 <- rnorm(100)
    df <- (as.data.frame(cbind(x,y.1,y.2,y.3,y.4)))
    dfmelt <- melt(df, measure.vars = 2:5)

and creating the resulting graph as

    dfmelt$bin <- factor(round_any(dfmelt$x,0.5))
    ggplot(dfmelt, aes(x=bin, y=value, fill=variable))+ geom

How to add text to a plotly boxplot in r

只谈情不闲聊 submitted on 2019-12-12 04:22:18
Question: I would like to mark the outlier that appears on my chart by writing a label next to it. Is this possible with plotly? The code of my graph is here:

    library(plotly)
    set.seed(1234)
    plot_ly(y = rnorm(50), type = 'box') %>%
      add_trace(y = rnorm(50, 1)) %>%
      layout(title = 'Box Plot',
             xaxis = list(title = "cond", showgrid = F),
             yaxis = list(title = "rating"))

Answer 1: It's not clear what you tried and what's not working, but one way to identify outliers is to use boxplot.stats() and then you can use that

Syntax error with tsoutliers package using Nile dataset

谁说胖子不能爱 submitted on 2019-12-11 18:11:41
Question: I'm trying to locate outliers in a time series using the tsoutliers package. I'm using the classic Nile dataset (which you can find here: https://vincentarelbundock.github.io/Rdatasets/datasets.html) and I can't get the tso() function to work. My code is: nile.outliers <- tso(Nile,types = c("AO","LS","TC")) However, I get this syntax error, or what I assume is a syntax error: Error in tso0(x = y, xreg = xreg, cval = cval, delta = delta, n.start = n.start, : trying to get slot "y

number of rows of result is not a multiple of vector length (arg 2) in R

前提是你 submitted on 2019-12-11 07:27:24
Question: I have a new question related to my earlier topic, "deleting outlier in r with account of nominal var". In the new case, the variables x and x1 have different lengths:

    x <- c(-10, 1:6, 50)
    x1 <- c(-20, 1:5, 60)
    z <- c(1,2,3,4,5,6,7,8)
    bx <- boxplot(x)
    bx$out
    bx1 <- boxplot(x1)
    bx1$out
    x <- x[!(x %in% bx$out)]
    x1 <- x1[!(x1 %in% bx1$out)]
    x_to_remove <- which(x %in% bx$out)
    x <- x[!(x %in% bx$out)]
    x1_to_remove <- which(x1 %in% bx1$out)
    x1 <- x1[!(x1 %in% bx1$out)]
    z <- z[-unique(c(x_to_remove,x1_to_remove))]
    z

data
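One detail in the code above is worth illustrating: the outlier positions have to be recorded before x and x1 are filtered, otherwise which() finds nothing and z[-integer(0)] returns an empty vector. The sketch below reorders the steps accordingly. It assumes that position i in x or x1 corresponds to position i in z, which the truncated question does not confirm, and it uses boxplot.stats() only to avoid drawing plots.

```r
x  <- c(-10, 1:6, 50)
x1 <- c(-20, 1:5, 60)
z  <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Record outlier positions first, while x and x1 are still unfiltered.
x_to_remove  <- which(x  %in% boxplot.stats(x)$out)
x1_to_remove <- which(x1 %in% boxplot.stats(x1)$out)

# Now filter each vector by its own outlier positions.
x  <- x[!(seq_along(x)  %in% x_to_remove)]
x1 <- x1[!(seq_along(x1) %in% x1_to_remove)]

# Assumption: positions in x and x1 line up with positions in z.
# A logical mask stays correct even when nothing is flagged.
to_remove <- unique(c(x_to_remove, x1_to_remove))
z <- z[!(seq_along(z) %in% to_remove)]
```

Using a logical mask instead of negative indices avoids the empty-result pitfall when no outliers are found.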