outliers

Replace outliers by quantiles in R

﹥>﹥吖頭↗ submitted on 2019-12-11 06:52:49
Question: I have been trying to replace outliers, i.e. values more than 1.5*IQR below the lower quartile or above the upper quartile, with the lower and upper quartile respectively, using the following code:

    lower.quantile <- as.numeric(summary(loans$dINC_A)[2])
    lower.quantile
    # [1] 9000
    upper.quantile <- as.numeric(summary(loans$dINC_A)[5])
    upper.quantile
    # [1] 21240
    IQR <- upper.quantile - lower.quantile
    # I replace outliers by the lower/upper bound values
    loans$INC_A[ loans$dINC_A < (lower.quantile-1.5*IQR) ] <- lower.quantile
    loans$INC_A[ loans$dINC_A > (upper
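
The replacement rule described in this question can be sketched in Python using only the standard library. This is a minimal illustration, not the asker's code: the function name and the sample incomes are made up, and `method="inclusive"` is chosen on the assumption that it should mirror the interpolation of R's default `quantile()`.

```python
from statistics import quantiles


def replace_outliers_with_quartiles(values):
    """Replace points outside the 1.5*IQR fences with the nearer quartile."""
    # "inclusive" interpolates between order statistics like R's default (type 7).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [q1 if v < low_fence else q3 if v > high_fence else v for v in values]


# Hypothetical sample; the real loans data is not shown in the question.
incomes = [9000, 12000, 15000, 18000, 21000, 250000]
cleaned = replace_outliers_with_quartiles(incomes)
print(cleaned)
```

Computing the fences and writing the replacements from the same input list avoids the risk of testing one column while assigning into another.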

python pandas How to remove outliers from a dataframe and replace with an average value of preceding records

久未见 submitted on 2019-12-10 18:13:12
Question: I have a dataframe of 16k records with multiple groups of countries and other fields. I have produced an initial output of the data that looks like the snippet below. Now I need to do some data cleansing and manipulation: remove skews or outliers and replace them with a value based on certain rules. For example, in the data below, how could I identify the skewed points (any value greater than 1) and replace them with the average of the next two records, or with the previous record if there are no later records in that group?
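
The rule in this question can be sketched for a single group as plain Python. The function name and sample values are hypothetical, and two details the excerpt leaves open are assumptions here: the average uses the raw values of up to two following records (even if they are themselves above the threshold), and the fallback uses the already cleaned previous record.

```python
def replace_skewed(values, threshold=1.0):
    """Clean one group: replace values above `threshold` as described above."""
    cleaned = list(values)
    for i, v in enumerate(values):
        if v <= threshold:
            continue
        following = values[i + 1:i + 3]  # up to two later records (assumption: raw values)
        if following:
            cleaned[i] = sum(following) / len(following)
        elif i > 0:
            cleaned[i] = cleaned[i - 1]  # no later records: use the previous record
    return cleaned


# Hypothetical values for one country group.
group = [0.2, 5.0, 0.4, 0.6, 7.0]
result = replace_skewed(group)
print(result)
```

In pandas, a function like this could be applied per group, for example through `groupby(...).transform(...)` on the value column.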

Remove remains in a letter image with Python

﹥>﹥吖頭↗ submitted on 2019-12-10 17:51:42
Question: I have a set of images that represent letters extracted from an image of a word. Some images contain remains of the adjacent letters, and I want to eliminate them but do not know how. I'm working with OpenCV and have tried two approaches, neither of which works. With findContours:

    def is_contour_bad(c):
        return len(c) < 50

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edged = cv2.Canny(gray, 50, 100)
    contours = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

OpenCV Surf and Outliers detection

这一生的挚爱 submitted on 2019-12-10 15:54:00
Question: I know there are already several questions on this subject here, but none of them helped. I want to compare 2 images to see how similar they are. I'm using the well-known find_obj.cpp demo to extract SURF descriptors, and flannFindPairs for the matching. However, this method doesn't discard outliers, and I'd like to know the number of true positive matches so I can figure out how similar the two images are. I have already seen this

R Language - Sorting data into ranges; averaging; ignore outliers

家住魔仙堡 submitted on 2019-12-10 14:42:34
Question: I am analyzing data from a wind turbine. Normally this is the sort of thing I would do in Excel, but the quantity of data requires something more heavy-duty. I have never used R before, so I am just looking for some pointers. The data consists of 2 columns, WindSpeed and Power. So far I have imported the data from a CSV file and scatter-plotted the two columns against each other. What I would like to do next is to sort the data into ranges; for example all data where WindSpeed is between x
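
The binning step described here can be sketched in Python as follows. The bin width of 1.0, the function name and the sample records are all assumptions for illustration; the excerpt is cut off before it states the actual ranges, and this sketch does not attempt the outlier handling mentioned in the title.

```python
from collections import defaultdict
from statistics import mean


def mean_power_by_speed_bin(records, width=1.0):
    """Group (wind_speed, power) pairs into fixed-width speed bins and average power."""
    bins = defaultdict(list)
    for speed, power in records:
        bin_start = (speed // width) * width  # lower edge of the bin containing `speed`
        bins[bin_start].append(power)
    return {start: mean(powers) for start, powers in sorted(bins.items())}


# Hypothetical readings: (WindSpeed, Power).
records = [(3.2, 50.0), (3.8, 70.0), (4.1, 120.0), (4.9, 140.0), (5.5, 200.0)]
averages = mean_power_by_speed_bin(records)
print(averages)
```

Swapping `mean` for `statistics.median` would be one simple way to make each bin's summary less sensitive to outliers.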

How can I use the index-structures in ELKI?

痴心易碎 submitted on 2019-12-10 10:28:15
Question: These are quotes from http://elki.dbs.ifi.lmu.de/ : "Essentially, we bind the abstract distance query to a database, and then get a nearest neighbor search for this distance. At this point, ELKI will automatically choose the most appropriate kNN query class. If there exist an appropriate index for our distance function (not every index can accelerate every distance!), it will automatically be used here." "The getKNNForDBID method may boil down to a slow linear scan, but when the database has

Boxplot : Outliers Labels Python

房东的猫 submitted on 2019-12-08 03:27:49
Question: I'm making a time series boxplot using the seaborn package, but I can't put labels on my outliers. My data is a DataFrame with 3 columns, [Month, Id, Value], which can be faked like this:

    ### Sample Data ###
    Month = numpy.repeat(numpy.arange(1,11),10)
    Id = numpy.arange(1,101)
    Value = numpy.random.randn(100)
    ### As a pandas DataFrame ###
    Ts = pandas.DataFrame({'Value' : Value,'Month':Month, 'Id': Id})
    ### Time series boxplot ###
    ax = seaborn.boxplot(x="Month",y="Value",data=Ts)

I have one boxplot

Remove unsorted/outlier elements in nearly-sorted array

会有一股神秘感。 submitted on 2019-12-07 12:13:01
Question: Given an array like [15, 14, 12, 3, 10, 4, 2, 1], how can I determine which elements are out of order and remove them (the number 3 in this case)? I don't want to sort the list, but to detect the outliers and remove them. Another example: [13, 12, 4, 9, 8, 6, 7, 3, 2]. I want to be able to remove the values 4 and 7 so that I end up with [13, 12, 9, 8, 6, 3, 2]. There's also a problem that arises in this scenario: [15, 13, 12, 7, 10, 5, 4, 3]. You could either remove 7 or 10 to make this array
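
One way to read this question is as a longest decreasing subsequence problem: keeping the longest strictly decreasing subsequence removes the fewest elements. The sketch below is an illustration under that assumption, not the accepted answer; it uses a simple O(n^2) dynamic program, and when several subsequences tie in length (as in the third example) it keeps whichever one the scan finds first.

```python
def longest_decreasing_subsequence(seq):
    """Return the longest strictly decreasing subsequence of `seq`."""
    n = len(seq)
    if n == 0:
        return []
    length = [1] * n   # length[i]: best subsequence length ending at index i
    prev = [-1] * n    # prev[i]: index of the previous element in that subsequence
    for i in range(n):
        for j in range(i):
            if seq[j] > seq[i] and length[j] + 1 > length[i]:
                length[i] = length[j] + 1
                prev[i] = j
    # Walk back from the end of the longest subsequence found.
    end = max(range(n), key=length.__getitem__)
    kept = []
    while end != -1:
        kept.append(seq[end])
        end = prev[end]
    return kept[::-1]


first = longest_decreasing_subsequence([15, 14, 12, 3, 10, 4, 2, 1])
second = longest_decreasing_subsequence([13, 12, 4, 9, 8, 6, 7, 3, 2])
print(first)
print(second)
```

For large arrays, the same result length can be obtained in O(n log n) with a patience-sorting style approach based on `bisect`.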

deleting outlier in r with account of nominal var

南笙酒味 submitted on 2019-12-06 08:09:22
Say I have three columns:

    x <- c(-10, 1:6, 50)
    x1 <- c(-20, 1:6, 60)
    z <- c(1,2,3,4,5,6,7,8)

Check outliers for x:

    bx <- boxplot(x)
    bx$out

Check outliers for x1:

    bx1 <- boxplot(x1)
    bx1$out

Now we must delete the outliers:

    x <- x[!(x %in% bx$out)]
    x
    x1 <- x1[!(x1 %in% bx1$out)]
    x1

But we also have the nominal variable z, and we must remove the observations that correspond to the outliers of x and x1; in our case these are observations 1 and 8 of z. How can this be done? The output must be:

    x   x1  z
    NA  NA  NA
    1   1   2
    2   2   3
    3   3   4
    4   4   5
    5   5   6
    6   6   7
    NA  NA  NA

Try this solution:

    x_to_remove <- which(x %in% bx$out)
    x <- x[!(x %in% bx
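
The idea of blanking whole rows, so that z stays aligned with x and x1, can be sketched in Python. This is an illustration rather than a translation of the truncated answer: `outlier_mask` is a hypothetical helper, and it uses interpolated quartiles, whereas R's `boxplot` uses hinges, which can differ slightly on other data (on this sample both flag the same points).

```python
from statistics import quantiles


def outlier_mask(values):
    """True for each value outside the 1.5*IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr for v in values]


x = [-10, 1, 2, 3, 4, 5, 6, 50]
x1 = [-20, 1, 2, 3, 4, 5, 6, 60]
z = [1, 2, 3, 4, 5, 6, 7, 8]

# A row is dropped if either x or x1 is an outlier in that row.
drop = [a or b for a, b in zip(outlier_mask(x), outlier_mask(x1))]

# Blank the whole row (None plays the role of NA) so the columns stay aligned.
rows = [(None, None, None) if d else row for d, row in zip(drop, zip(x, x1, z))]
for row in rows:
    print(row)
```

Building one combined mask first and applying it to all columns at once is what keeps the three columns the same length.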

R: outlier cleaning for each column in a dataframe by using quantiles 0.05 and 0.95

六眼飞鱼酱① submitted on 2019-12-06 07:49:58
Question: I am an R novice. I want to do some outlier cleaning and overall scaling from 0 to 1 before putting the sample into a random forest.

    g <- c(1000,60,50,60,50,40,50,60,70,60,40,70,50,60,50,70,10)

If I do a simple scaling from 0 to 1, the result is:

    > round((g - min(g))/abs(max(g) - min(g)),1)
    [1] 1.0 0.1 0.0 0.1 0.0 0.0 0.0 0.1 0.1 0.1 0.0 0.1 0.0 0.1 0.0 0.1 0.0

So my idea is to replace the values of each column that are greater than the 0.95-quantile with the next value smaller than the 0
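
The idea in this question can be sketched in Python on the same vector `g`. The excerpt is cut off, so two points are assumptions: that "the next value smaller" means the largest observed value not exceeding the 0.95-quantile, and that the lower tail is treated symmetrically at the 0.05-quantile. The function names are hypothetical.

```python
from statistics import quantiles


def clip_to_inner_values(values, lower=5, upper=95):
    """Pull values beyond the given percentiles back to the nearest observed inner value."""
    cuts = quantiles(values, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    low_q, high_q = cuts[lower - 1], cuts[upper - 1]
    inside = [v for v in values if low_q <= v <= high_q]
    low_v, high_v = min(inside), max(inside)
    return [min(max(v, low_v), high_v) for v in values]


def min_max_scale(values):
    """Scale values linearly to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


g = [1000, 60, 50, 60, 50, 40, 50, 60, 70, 60, 40, 70, 50, 60, 50, 70, 10]
clipped = clip_to_inner_values(g)
scaled = [round(v, 1) for v in min_max_scale(clipped)]
print(clipped)
print(scaled)
```

With the extreme values pulled in before scaling, the remaining observations spread over the whole 0 to 1 range instead of collapsing near 0.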