outliers

Remove unsorted/outlier elements in nearly-sorted array

只谈情不闲聊 submitted on 2019-12-06 02:02:01
Given an array like [15, 14, 12, 3, 10, 4, 2, 1], how can I determine which elements are out of order and remove them (the number 3 in this case)? I don't want to sort the list, but detect outliers and remove them. Another example: [13, 12, 4, 9, 8, 6, 7, 3, 2]. I want to be able to remove the values 4 and 7 so that I end up with [13, 12, 9, 8, 6, 3, 2]. There's also a problem that arises in this scenario: [15, 13, 12, 7, 10, 5, 4, 3]. You could remove either 7 or 10 to make this array sorted. In general, the problem I'm trying to solve is that, given a list of numerical readings (some could
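One possible reading of this question is "keep the longest strictly decreasing subsequence and drop everything else". The sketch below assumes that reading; the function name is hypothetical and the quadratic dynamic-programming approach is a swapped-in standard technique, not something taken from the original thread. When several subsequences tie for longest (as with 7 versus 10 in the third example), it simply returns one of them.

```python
def longest_decreasing_subsequence(values):
    """Return one longest strictly decreasing subsequence of `values`.

    O(n^2) dynamic programming: best_len[i] is the length of the longest
    decreasing subsequence ending at index i, prev[i] its predecessor.
    """
    n = len(values)
    if n == 0:
        return []
    best_len = [1] * n
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            # values[j] can precede values[i] only if it is strictly larger.
            if values[j] > values[i] and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j
    # Walk back from the index where the longest subsequence ends.
    end = max(range(n), key=lambda i: best_len[i])
    result = []
    while end != -1:
        result.append(values[end])
        end = prev[end]
    result.reverse()
    return result


print(longest_decreasing_subsequence([15, 14, 12, 3, 10, 4, 2, 1]))
print(longest_decreasing_subsequence([13, 12, 4, 9, 8, 6, 7, 3, 2]))
```

Elements not in the returned list are the ones treated as out of order. For real sensor readings a tolerance or a robust smoother may be more appropriate than a strict ordering test.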

Labeling outliers on boxplot in R

断了今生、忘了曾经 submitted on 2019-12-05 00:15:32
Question: I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot with the name of the row they belong to in the matrix. To use an example: vv=matrix(c(1,2,3,4,8,15,30),nrow=7,ncol=4,byrow=F) rownames(vv)=c("one","two","three","four","five","six","seven") boxplot(vv) I would like to label the outlier in each plot (in this case 30) with the name of the row it belongs to, so in this case 30 belongs to row 7. Is there an easy way to do this? I have seen similar questions to
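The labelling step comes down to finding which rows fall outside the whiskers. Below is a minimal Python sketch of that lookup only (no plotting), assuming the common 1.5 x IQR whisker rule. The function name is hypothetical, and quartiles are computed with the standard library's inclusive method, which can differ slightly from the hinges R's boxplot() uses for some sample sizes.

```python
from statistics import quantiles


def boxplot_outliers(column, labels, whisker=1.5):
    """Return (label, value) pairs lying beyond whisker * IQR from the quartiles."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    return [
        (label, value)
        for label, value in zip(labels, column)
        if value < low_fence or value > high_fence
    ]


# One column of the example matrix, with its row names.
row_names = ["one", "two", "three", "four", "five", "six", "seven"]
column = [1, 2, 3, 4, 8, 15, 30]
print(boxplot_outliers(column, row_names))
```

The returned pairs give the text to place next to each outlier point when drawing the plot.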

ggplot2 Color Scale Over Affected by Outliers

萝らか妹 submitted on 2019-12-04 17:48:12
Question: I'm having difficulty with a few outliers making the color scale useless. My data has a Length variable that mostly falls within a range, but there will usually be a few much larger values. The example data below has 95 values between 500 and 1500, and 5 values over 50,000. The resulting color legends tend to use 10k, 20k, ... 70k for the color changes when I want to see color changes between 500 and 1500. Really, anything over around 1300 should be the same solid color (probably median +/- mad), but I

R: outlier cleaning for each column in a dataframe by using quantiles 0.05 and 0.95

限于喜欢 submitted on 2019-12-04 14:07:35
I am an R novice. I want to do some outlier cleaning and overall scaling from 0 to 1 before putting the sample into a random forest. g<-c(1000,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,10) If I do a simple scaling from 0 to 1, the result would be: > round((g - min(g))/abs(max(g) - min(g)),1) [1] 1.0 0.1 0.0 0.1 0.0 0.0 0.0 0.1 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.1 0.0 So my idea is to replace the values of each column that are greater than the 0.95-quantile with the next value smaller than the 0.95-quantile, and to do the same for the 0.05-quantile. So the pre-scaled result would be: g<-c(**70**,60,50
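The idea described here (pull each extreme value back to the nearest observed value inside the 5%-95% band, then rescale) can be sketched in Python as follows. The function names are hypothetical. The sketch assumes "next value smaller than the 0.95-quantile" means the largest observed value not exceeding that quantile, and it uses the standard library's inclusive quantile method, which interpolates linearly between data points.

```python
from statistics import quantiles


def clamp_to_nearest_inlier(values, lower_pct=5, upper_pct=95):
    """Replace values outside the percentile band with the nearest observed inlier."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    inliers = [v for v in values if lo <= v <= hi]
    if not inliers:  # degenerate case: nothing lies inside the band
        return list(values)
    floor, ceiling = min(inliers), max(inliers)
    return [min(max(v, floor), ceiling) for v in values]


def rescale_01(values):
    """Linearly rescale values to the interval [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


g = [1000, 60, 50, 60, 50, 40, 50, 60, 70, 60, 40, 70, 50, 60, 50, 70, 10]
cleaned = clamp_to_nearest_inlier(g)
scaled = rescale_01(cleaned)
print(cleaned)
print([round(s, 1) for s in scaled])
```

For a data frame, the same two functions would be applied column by column.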

R: How to remove outliers from a smoother in ggplot2?

家住魔仙堡 submitted on 2019-12-04 07:44:17
I have the following data set that I am trying to plot with ggplot2. It is a time series of three experiments (A1, B1 and C1), and each experiment had three replicates. I am trying to add a stat that detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown), but I expect there is already a function to do this; I just have not found it. I've looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the ggplot2 book, but I didn't understand the help doc from Hmisc well enough to see whether it removes outliers or not. Is

Remove outliers (+/- 3 std) and replace with np.nan in Python/pandas

非 Y 不嫁゛ submitted on 2019-12-04 01:46:40
Question: I have seen several solutions that come close to solving my problem (link1, link2), but they have not helped me succeed thus far. I believe that the following solution is what I need, but I continue to get an error (and I don't have the reputation points to comment/question on it): link (I get the following error, but I don't understand where to .copy() or add an " inplace=True " when administering the following command df2=df.groupby('install_site').transform(replace) : SettingWithCopyWarning: A
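A minimal sketch of a per-group "more than 3 standard deviations becomes NaN" replacement is shown below. The data, the value column name, and the mask_outliers helper are all hypothetical; only the install_site grouping column comes from the question. The sketch writes the result to a new column on a frame it owns. If the frame were a filtered slice of another frame, taking an explicit .copy() of it first is the usual way to avoid SettingWithCopyWarning.

```python
import pandas as pd


def mask_outliers(series, n_std=3.0):
    """Set values more than n_std sample standard deviations from the mean to NaN."""
    deviation = (series - series.mean()).abs()
    # Series.where keeps entries where the condition holds and fills NaN elsewhere.
    return series.where(deviation <= n_std * series.std())


# Hypothetical data: site A has one extreme reading, site B has none.
df = pd.DataFrame({
    "install_site": ["A"] * 20 + ["B"] * 20,
    "value": [10.0] * 19 + [1000.0] + [5.0, 6.0] * 10,
})

# transform applies the helper to each group and aligns the result with df's index.
df["value_clean"] = df.groupby("install_site")["value"].transform(mask_outliers)
print(df[df["value_clean"].isna()])
```

Note that with very small groups a single point can never be more than 3 sample standard deviations from the group mean, so this rule only has an effect once groups are reasonably large.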

Labeling outliers on boxplot in R

Deadly submitted on 2019-12-03 16:12:04
I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot as the row name they belong to in the matrix. To use an example: vv=matrix(c(1,2,3,4,8,15,30),nrow=7,ncol=4,byrow=F) rownames(vv)=c("one","two","three","four","five","six","seven") boxplot(vv) I would like to label the outlier in each plot (in this case 30) as the row name it belongs to, so in this case 30 belongs to row 7. Is there an easy way to do this? I have seen similar questions to this asked but none seemed to have worked the way I want it to. 42- In the example given it's a bit boring

How to replace outliers with the 5th and 95th percentile values in R

流过昼夜 submitted on 2019-12-03 15:47:08
I'd like to replace all values in my relatively large R dataset which take values above the 95th and below the 5th percentile, with those percentile values respectively. My aim is to avoid simply cropping these outliers from the data entirely. Any advice would be much appreciated, I can't find any information on how to do this anywhere else. This would do it. fun <- function(x){ quantiles <- quantile( x, c(.05, .95 ) ) x[ x < quantiles[1] ] <- quantiles[1] x[ x > quantiles[2] ] <- quantiles[2] x } fun( yourdata ) You can do it in one line of code using squish() : d2 <- squish(d, quantile(d, c(
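For comparison, the same clamping idea as the R function in the first answer can be sketched in Python with only the standard library. The winsorize name is hypothetical, and the inclusive quantile method is an assumption chosen because it interpolates linearly between data points, as R's default quantile() does.

```python
from statistics import quantiles


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values below/above the given percentiles to those percentile values."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


data = list(range(1, 102))  # the integers 1..101
clamped = winsorize(data)
print(clamped[:8], clamped[-8:])
```

Values inside the band are returned unchanged, so the length and order of the input are preserved.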

Outlier detection of time series data in R [closed]

江枫思渺然 submitted on 2019-12-03 15:04:23
Question (closed as off-topic): What are the steps needed to detect outliers in business sales data (which means there might be trends and seasonality) in R? I have learnt about ACF, PACF, residuals, and ARIMA models (basically, time series analysis and modelling). Can I use this knowledge to help me identify outliers? Is it also possible to ask R

How to remove outliers in boxplot in R? [duplicate]

房东的猫 submitted on 2019-12-03 07:44:14
Question (marked as a possible duplicate of "Changing the outlier rule in a boxplot"): I need to visualize my results using a box plot. x<-rnorm(10000) boxplot(x,horizontal=TRUE,axes=FALSE) How can I filter outliers during visualisation? (1) So that I can have the full image on screen without the ugly outliers. http://postimage.org/image/szzbez0h1/a610666d/ (2) Is there any way to show outliers only up to a certain range? http://postimage.org/image/np28oee0b/8251d102/