outliers

How to replace outliers with NA having a particular range of values in R?

六月ゝ 毕业季﹏ submitted on 2020-01-24 21:51:06
Question: I have climate data and I'm trying to replace outliers with NA. I'm not using boxplot(x)$out because I have a range of valid values that determines what counts as an outlier.
    temp_range <- c(-15, 45)
    wind_range <- c(0, 15)
    humidity_range <- c(0, 100)
My data frame looks like this: df with outliers (I highlighted the values that should be replaced with NA according to the ranges). So the outliers in temp1 and temp2 must be replaced with NA according to temp_range, wind's outliers should be replaced with NA according
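A minimal sketch of range-based replacement, assuming a data frame with columns temp1, temp2, wind and humidity. The sample values, the humidity column name and the helper replace_out_of_range are hypothetical and not taken from the original post; the range bounds are treated as inclusive.

```r
temp_range <- c(-15, 45)
wind_range <- c(0, 15)
humidity_range <- c(0, 100)

# Hypothetical sample data; the real df is not shown in the question
df <- data.frame(
  temp1    = c(20, 50, -20, 10),
  temp2    = c(46, 30, 0, -15),
  wind     = c(3, 16, -1, 15),
  humidity = c(50, 101, 0, 100)
)

# Set every value outside [rng[1], rng[2]] to NA; existing NAs stay NA
replace_out_of_range <- function(x, rng) {
  x[!is.na(x) & (x < rng[1] | x > rng[2])] <- NA
  x
}

# Map each column to the range that applies to it
ranges <- list(
  temp1    = temp_range,
  temp2    = temp_range,
  wind     = wind_range,
  humidity = humidity_range
)

for (col in names(ranges)) {
  df[[col]] <- replace_out_of_range(df[[col]], ranges[[col]])
}
```

Keeping the column-to-range mapping in one list means adding another variable only requires one more entry.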

Remove outlier in Point Cloud

我怕爱的太早我们不能终老 submitted on 2020-01-24 21:18:40
Question: With OpenCV/Matlab, I'm computing a disparity map. I use the OpenCV SGBM function to get it, and the results are good, but there is a bit of noise in my image. With medfilt2 in Matlab, I remove a lot of bad pixels. But where the noise is more prevalent than the real data, it creates outlier zones (the thing under the plant). I would like to remove all of them. Is there a better way to do it? With the median filter, at least the image gets fewer points projected on the ground plane and fewer points generated in the top

ELKI - input distance matrix

℡╲_俬逩灬. submitted on 2020-01-16 14:46:23
Question: I'm trying to use ELKI for outlier detection. I have a custom distance matrix and I'm trying to feed it to ELKI to perform LOF (as a first example). I tried to follow http://elki.dbs.ifi.lmu.de/wiki/HowTo/PrecomputedDistances but it is not very clear to me. What I do: I don't want to load data from a database, so I use: -dbc DBIDRangeDatabaseConnection -idgen.count 100 (where 100 is the number of objects I'll be analyzing). I use the LOF algorithm and call the external distance file: -algorithm

Remove outliers fully from multiple boxplots made with ggplot2 in R and display the boxplots in expanded format

痞子三分冷 submitted on 2020-01-11 15:38:11
Question: I have some data here [in a .txt file] which I read into a data frame df:
    df <- read.table("data.txt", header=TRUE, sep="\t")
I remove the negative values in column x of df (since I need only positive values) using the following code:
    yp <- subset(df, x>0)
Now I want to plot multiple box plots in the same layer. I first melt the data frame df, and the resulting plot contains several outliers, as shown below.
    # Melting data frame df
    df_mlt <- melt(df, id=names(df)[1])
    # plotting the
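One common way to hide outlier points and let the boxes fill the panel is sketched below, assuming ggplot2 is installed. The data are hypothetical stand-ins for the melted data frame (the real data.txt is not available), and y_limits is an illustrative name. This is one possible approach, not necessarily the one given in the original answers.

```r
library(ggplot2)

# Hypothetical long-format data standing in for the melted data frame
set.seed(1)
df_mlt <- data.frame(
  variable = rep(c("a", "b", "c"), each = 50),
  value    = c(rexp(50), rexp(50, 2), rexp(50, 0.5))
)

# Whisker ends per group: elements 1 and 5 of boxplot.stats()$stats
whiskers <- sapply(
  split(df_mlt$value, df_mlt$variable),
  function(v) boxplot.stats(v)$stats[c(1, 5)]
)
y_limits <- c(min(whiskers[1, ]), max(whiskers[2, ]))

p <- ggplot(df_mlt, aes(x = variable, y = value)) +
  geom_boxplot(outlier.shape = NA) +   # do not draw the outlier points
  coord_cartesian(ylim = y_limits)     # zoom in without dropping any data
```

coord_cartesian() only zooms the view, so the box statistics are still computed from all observations; restricting the scale with ylim() instead would discard points before the statistics are computed.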

Identify outliers in a dataframe in R

做~自己de王妃 submitted on 2020-01-06 06:32:49
Question: My current data frame consists of numerical values. I am identifying outliers in my data frame column by column. Can I identify the outliers in all columns at once and remove them in one go? Right now I am changing the values to NA. My code:
    quantiles <- tapply(var1, names, quantile)
    minq <- sapply(names, function(x) quantiles[[x]]["25%"])
    maxq <- sapply(names, function(x) quantiles[[x]]["75%"])
    var1[var1<minq | var1>maxq] <- NA
Data. Data posted by the OP in a comment in dput format. df1 <- structure
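A sketch of handling every numeric column in one pass, assuming the usual 1.5 × IQR fences are wanted. Note that the code in the question uses the 25% and 75% quantiles themselves as cut-offs, which marks roughly half of the values as NA. The data frame demo_df and the helper iqr_to_na are hypothetical; the OP's df1 is truncated above and is not reproduced here.

```r
# Replace values beyond k * IQR from the quartiles with NA
iqr_to_na <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)] <- NA
  x
}

# Hypothetical data with one obvious outlier per column
demo_df <- data.frame(
  a = c(1, 2, 3, 4, 100),
  b = c(10, 11, 12, 13, -50)
)

# Apply the rule to all numeric columns at once
num_cols <- vapply(demo_df, is.numeric, logical(1))
demo_df[num_cols] <- lapply(demo_df[num_cols], iqr_to_na)

# Optionally drop every row that now contains an NA
cleaned <- demo_df[complete.cases(demo_df), ]
```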

Changing outliers for NA in all columns in a dataset in R

风流意气都作罢 submitted on 2020-01-03 03:17:08
Question: I'm a beginner with R and can't manage to replace the outliers with NA for ALL columns of a dataset. I succeeded in changing one column at a time with
    dataset$column[dataset$column %in% boxplot.stats(dataset$column)$out] <- NA
But I have 21 columns in which I need to replace the outliers with NA. How would you do that? How would you do it for a range of columns? For specific columns?
Answer 1: You can use apply over the columns. Example:
    set.seed(1)
    x = matrix(rnorm(20), ncol = 2)
    x[2, 1] = 100
    x[4, 2] = 200
    apply(x,
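Since the answer above is cut off, here is a separate sketch of the same idea for a data frame, reusing the boxplot.stats() test from the question. The data, the helper out_to_na and the column selections are hypothetical and only illustrate the three cases asked about (all columns, a column range, specific columns).

```r
set.seed(1)
# Hypothetical data: four columns (V1..V4) with two planted outliers
dataset <- as.data.frame(matrix(rnorm(40), ncol = 4))
dataset[2, 1] <- 100
dataset[4, 2] <- 200

# Same rule as in the question, wrapped in a function
out_to_na <- function(x) {
  x[x %in% boxplot.stats(x)$out] <- NA
  x
}

# All columns
all_cols <- dataset
all_cols[] <- lapply(dataset, out_to_na)

# A range of columns (here columns 2 to 3)
range_cols <- dataset
range_cols[2:3] <- lapply(dataset[2:3], out_to_na)

# Specific columns by name
named_cols <- dataset
named_cols[c("V1", "V4")] <- lapply(dataset[c("V1", "V4")], out_to_na)
```

Assigning into all_cols[] keeps the result a data frame, whereas apply() would return a matrix.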

Finding very large jumps in data

拜拜、爱过 submitted on 2020-01-03 01:45:08
Question: I need to find only the very large jumps, so that I can find clusters and later the noise as well. The sample data is as follows: 0.000000 0.000500 0.001500 0.003000 0.005500 0.008700 0.012400 0.000000 0.000500 0.001500 0.003000 0.005500 0.008700 0.012400 0.000000 0.000500 0.001500 0.003000 0.005500 0.008700 0.012400 0.000000 0.000500 0.001500 0.003000 0.005500 0.008700 0.012400 0.000000 0.000500 0.001500 0.003000 0.005500 0.008700 0.012400 0.000000 0.000500 0.001500 0.003000 0.005500 0.008700 0
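One way to flag large jumps is to compare each step with a robust threshold, sketched here on three repetitions of the sample values. The multiplier k and the median-plus-MAD threshold are assumptions made for illustration; the question does not define "very large".

```r
# Three repetitions of the sample pattern from the question
x <- rep(c(0, 0.0005, 0.0015, 0.003, 0.0055, 0.0087, 0.0124), times = 3)

# Absolute difference between consecutive values
step <- abs(diff(x))

# Assumed rule: a jump is a step far above the typical step size
k <- 3
threshold <- median(step) + k * mad(step)

# jump_at[i] is the index of the last value before the i-th large jump
jump_at <- which(step > threshold)

# Label the runs between jumps so they can be treated as clusters
segment <- cumsum(c(TRUE, step > threshold))
```

On this repeated sample the only steps flagged are the two resets from 0.0124 back to 0, so the values fall into three segments.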

deleting outlier in r with account of nominal var

我与影子孤独终老i submitted on 2020-01-02 20:09:14
Question: Say I have three columns:
    x <- c(-10, 1:6, 50)
    x1 <- c(-20, 1:6, 60)
    z <- c(1,2,3,4,5,6,7,8)
Check outliers for x:
    bx <- boxplot(x)
    bx$out
Check outliers for x1:
    bx1 <- boxplot(x1)
    bx1$out
Now we must delete the outliers:
    x <- x[!(x %in% bx$out)]
    x
    x1 <- x1[!(x1 %in% bx1$out)]
    x1
But we also have the variable z (nominal), and we must remove the observations of z that correspond to the outliers of x and x1; in our case these are observations 1 and 8 of z. How can I do it? In the output we must have:
    x  x1 z
    Na Na Na
    1  1  2
    2  2  3
    3  3
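A sketch that keeps the three variables aligned by putting them in one data frame and building a single row mask. It uses boxplot.stats() instead of boxplot() so that no plot is drawn; dat, is_out and drop_row are illustrative names.

```r
x  <- c(-10, 1:6, 50)
x1 <- c(-20, 1:6, 60)
z  <- c(1, 2, 3, 4, 5, 6, 7, 8)
dat <- data.frame(x, x1, z)

# TRUE where a value is a boxplot outlier within its own column
is_out <- function(v) v %in% boxplot.stats(v)$out

# A row is affected if x or x1 is an outlier in that row
drop_row <- is_out(dat$x) | is_out(dat$x1)

# Variant 1: blank the whole row, z included
dat_na <- dat
dat_na[drop_row, ] <- NA

# Variant 2: remove the affected rows entirely
dat_clean <- dat[!drop_row, ]
```

With these inputs rows 1 and 8 are affected, so dat_na has NA in all three columns of those rows and dat_clean keeps the six rows with z from 2 to 7.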

Outlier detection with k-means algorithm

僤鯓⒐⒋嵵緔 submitted on 2020-01-01 03:03:48
Question: I am hoping you can help me with my problem. I am trying to detect outliers using the k-means algorithm. First I run the algorithm and select as possible outliers those objects that have a large distance to their cluster center. Instead of using the absolute distance, I want to use the relative distance, i.e. the ratio of the absolute distance of the object to its cluster center and the average distance of all objects of the cluster to their cluster center. The code for outlier detection
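The relative distance described above can be sketched with stats::kmeans as follows. The simulated points, the number of clusters and the cut-off of 3 are assumptions made for illustration; the OP's own code is truncated and is not reproduced here.

```r
set.seed(42)
# Hypothetical data: two tight groups plus one far-away point (row 101)
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2),
  c(2.5, 9)
)

km <- kmeans(pts, centers = 2, nstart = 10)

# Center of the cluster each point was assigned to
own_center <- km$centers[km$cluster, , drop = FALSE]

# Absolute (Euclidean) distance of each point to its own center
abs_dist <- sqrt(rowSums((pts - own_center)^2))

# Average distance within each cluster, repeated for every member
mean_dist <- ave(abs_dist, km$cluster)

# Relative distance: absolute distance divided by the cluster average
rel_dist <- abs_dist / mean_dist

# Assumed cut-off: flag points more than 3 times the cluster average
outliers <- which(rel_dist > 3)
```

Because the average is computed per cluster, a point in a tight cluster can be flagged at a smaller absolute distance than a point in a loose one, which is the purpose of the relative measure.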