outliers | 易学教程

Labeling one class for cross validation in libsvm matlab

Question: I want to do one-class classification using LibSVM in MATLAB. I want to train on my data and use cross-validation, but I don't know how to label the outliers. For example, suppose I have this data: trainData = [1,1,1; 1,1,2; 1,1,1.5; 1,1.5,1; 20,2,3; 2,20,2; 2,20,5; 20,2,2]; labelTrainData = [-1 -1 -1 -1 0 0 0 0]; (The first four rows are examples of the one class; the other four are examples of outliers, included only for the cross-validation.) I train the model using this: model = svmtrain

How to detect outliers in an ArrayList

Question: I'm trying to think of some code that will let me search through my ArrayList and detect any values outside the common range of "good values." Example: 100 105 102 13 104 22 101. How could I write code to detect that (in this case) 13 and 22 don't fall within the "good values" of around 100? Answer 1: There are several criteria for detecting outliers. The simplest ones, like Chauvenet's criterion, use the mean and standard deviation calculated from the sample to determine a "normal" range for values. Any value outside this range is deemed an outlier. Other criteria are Grubbs'
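
As a rough illustration of the mean-and-standard-deviation idea in the answer above, here is a minimal Python sketch (the original question concerns a Java ArrayList; Python is used here only for illustration). The function name find_outliers and the cutoff k are assumptions, and this is a plain k-sigma rule, not Chauvenet's criterion itself.

```python
from statistics import mean, stdev


def find_outliers(values, k=1.0):
    """Return the values lying more than k sample standard deviations from the mean.

    k is an illustrative cutoff (an assumption), not a value from the original answer.
    """
    if len(values) < 2:
        return []  # a standard deviation needs at least two points
    m = mean(values)
    s = stdev(values)
    if s == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - m) > k * s]


data = [100, 105, 102, 13, 104, 22, 101]
print(find_outliers(data, k=1.0))
print(find_outliers(data, k=2.0))
```

On this sample the two low values inflate the standard deviation to roughly 41.5, so a cutoff of 1 flags 13 and 22 while a cutoff of 2 flags nothing. The cutoff therefore has to be chosen with the sample size and the expected number of outliers in mind.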

How to group boxplot outliers in gnuplot

Question: I have a large set of data points. I am trying to plot them as a boxplot, but some of the outliers have exactly the same value and are drawn side by side on a line. I found "How to set the horizontal distance between outliers in gnuplot boxplot", but it doesn't help much, as that is apparently not possible. Is it possible to group the outliers together, print one point, and then print a number in brackets beside it to indicate how many points there are? I think this would make the graph more readable. For information, I have three boxplots for one x value and that times six in one

Removing univariate outliers from data frame (+-3 SDs)

Question: I'm so new to R that I'm having trouble finding what I need in other people's questions. I think my question is so easy that nobody else has bothered to ask it. What would be the simplest code to create a new data frame that excludes univariate outliers (which I'm defining as points that are 3 SDs from their condition's mean), within their condition, on a certain variable? I'm embarrassed to show what I've tried, but here it is: greaterthan <- mean(dat$var2[dat$condition=="one"]) + 2.5*(sd(dat$var2[dat$condition=="one"])) lessthan <- mean(dat$var2[dat$condition=="one"]) - 2.5*(sd
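
The per-condition filtering the question describes can be sketched as follows. This is a Python illustration, not R and not an answer from the original thread. The helper name drop_group_outliers, the list-of-dicts layout, and the sample data are assumptions; only the names condition and var2 come from the question.

```python
from collections import defaultdict
from statistics import mean, stdev


def drop_group_outliers(rows, group_key, value_key, n_sd=3.0):
    """Keep rows whose value lies within n_sd sample SDs of their own group's mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    # Per-group (mean, sd); a group with fewer than two rows has no SD (None).
    stats = {
        g: (mean(vals), stdev(vals) if len(vals) > 1 else None)
        for g, vals in groups.items()
    }

    kept = []
    for row in rows:
        m, s = stats[row[group_key]]
        # Rows from single-row groups are always kept.
        if s is None or abs(row[value_key] - m) <= n_sd * s:
            kept.append(row)
    return kept


# Hypothetical data: condition "one" contains a single extreme value.
dat = [{"condition": "one", "var2": 10.0} for _ in range(19)]
dat.append({"condition": "one", "var2": 100.0})
dat += [{"condition": "two", "var2": float(v)} for v in (1, 2, 3, 4, 5)]

clean = drop_group_outliers(dat, "condition", "var2")
print(len(dat), len(clean))
```

One caveat with any such rule: with the sample standard deviation, no point can lie more than (n-1)/sqrt(n) SDs from the mean of its group, so a 3 SD cutoff can only ever remove something from a group of at least 11 rows.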

matplotlib: disregard outliers when plotting

Question: I'm plotting some data from various tests. Sometimes a test has one outlier (say 0.1), while all other values are three orders of magnitude smaller. With matplotlib, I plot against the range [0, max_data_value]. How can I zoom into my data and not display outliers, which would mess up the x-axis in my plot? Should I simply take the 95th percentile and use the range [0, 95_percentile] on the x-axis? Answer: There's no single "best" test for an outlier. Ideally, you should incorporate a priori information (e.g. "This parameter shouldn't be over x because of blah..."). Most tests for
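
The percentile idea raised in the question can be sketched in a few lines. This illustrates the asker's own suggestion, not the method recommended in the truncated answer. The helper percentile, the sample data, and the variable x_max are assumptions.

```python
def percentile(values, q):
    """q-th percentile (0-100), linearly interpolated between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Hypothetical test data: many small values plus one outlier at 0.1.
data = [i * 1e-6 for i in range(1, 100)] + [0.1]
x_max = percentile(data, 95)
print(x_max)

# With matplotlib, the limit could then be applied to an Axes object, e.g.:
#   ax.set_xlim(0, x_max)
```

A fixed percentile always hides the top 5% of points, whether or not they are true outliers, so this is a display choice and not an outlier test.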

How to use Outlier Tests in R Code

Question: As part of my data analysis workflow, I want to test for outliers and then do my further calculations with and without those outliers. I've found the outliers package, which has various tests, but I'm not sure how best to use them in my workflow. Answer 1: If you're worried about outliers, instead of throwing them out, use a robust method. For example, instead of lm, use rlm. Answer 2: I agree with Dirk: it's hard. I would recommend first looking at why you might have outliers. An outlier is just a number that someone thinks is suspicious; it's not a concrete 'bad' value, and unless you can find a reason for it to

How to repeat the Grubbs test and flag the outliers

Question: I want to apply the Grubbs test to a set of data repeatedly until it stops finding outliers. I want the outliers flagged, not removed, so that I can plot the data as a histogram with the outliers in a different colour. I have used grubbs.test from the outliers package to identify outliers manually, but cannot figure out how to cycle through them and flag them successfully. The sort of output I am aiming for is like the following:

X       Outlier
152.36  Yes
130.38  Yes
101.54  No
96.26   No
88.03   No
85.66   No
83.62   No
76.53   No
74.36   No
73.87   No
73.36   No
73.35   No
68.26   No
65.25   No
63.68   No
63.05