How to detect outliers in an ArrayList

Posted anonymously (unverified) on 2019-12-03 01:10:02

Question:

I'm trying to think of some code that will allow me to search through my ArrayList and detect any values outside the common range of "good values."

Example: 100 105 102 13 104 22 101

How would I be able to write the code to detect that (in this case) 13 and 22 don't fall within the "good values" of around 100?

Answer 1:

There are several criteria for detecting outliers. The simplest ones, like Chauvenet's criterion, use the mean and standard deviation calculated from the sample to determine a "normal" range for values. Any value outside of this range is deemed an outlier.

Other criteria include Grubbs' test and Dixon's Q test, which may give better results than Chauvenet's, for example if the sample comes from a skewed distribution.
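To make the mean and standard deviation approach concrete, here is a minimal sketch of Chauvenet's criterion in plain Java. It assumes the data is roughly normally distributed and flags a value when the expected number of samples at least that far from the mean, N * P(|Z| >= z), drops below 0.5. The class and method names are made up for this example, and because the JDK has no built-in error function, erfc is approximated with the Abramowitz and Stegun formula 7.1.26.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
class ChauvenetSketch {

    // Complementary error function for x >= 0, Abramowitz & Stegun 7.1.26
    // (absolute error below about 1.5e-7).
    static double erfc(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-x * x);
    }

    static List<Double> findOutliers(List<Double> values) {
        List<Double> outliers = new ArrayList<>();
        int n = values.size();
        if (n < 2) {
            return outliers; // not enough data to estimate a spread
        }
        double sum = 0;
        for (double v : values) sum += v;
        double mean = sum / n;

        double squares = 0;
        for (double v : values) squares += (v - mean) * (v - mean);
        double stdDev = Math.sqrt(squares / (n - 1)); // sample standard deviation
        if (stdDev == 0) {
            return outliers; // all values identical
        }

        for (double v : values) {
            double z = Math.abs(v - mean) / stdDev;
            // Two-sided tail probability of a standard normal: P(|Z| >= z)
            double tail = erfc(z / Math.sqrt(2));
            if (n * tail < 0.5) outliers.add(v);
        }
        return outliers;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(20.0, 65.0, 72.0, 75.0, 77.0, 78.0, 80.0, 81.0, 82.0, 83.0);
        System.out.println(findOutliers(data)); // prints [20.0]
    }
}
```

On the question's own sample (100 105 102 13 104 22 101) this sketch flags nothing: two outliers among seven values inflate the standard deviation enough to mask each other, so a single pass of a mean-based criterion is not always sufficient.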



Answer 2:

    package test;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class Main {
        public static void main(String[] args) {
            List<Double> data = new ArrayList<Double>();
            data.add((double) 20);
            data.add((double) 65);
            data.add((double) 72);
            data.add((double) 75);
            data.add((double) 77);
            data.add((double) 78);
            data.add((double) 80);
            data.add((double) 81);
            data.add((double) 82);
            data.add((double) 83);
            Collections.sort(data); // getOutliers expects sorted input
            System.out.println(getOutliers(data));
        }

        public static List<Double> getOutliers(List<Double> input) {
            List<Double> output = new ArrayList<Double>();
            List<Double> data1; // lower half
            List<Double> data2; // upper half
            if (input.size() % 2 == 0) {
                data1 = input.subList(0, input.size() / 2);
                data2 = input.subList(input.size() / 2, input.size());
            } else {
                // odd size: the middle element belongs to neither half
                data1 = input.subList(0, input.size() / 2);
                data2 = input.subList(input.size() / 2 + 1, input.size());
            }
            double q1 = getMedian(data1);
            double q3 = getMedian(data2);
            double iqr = q3 - q1;
            double lowerFence = q1 - 1.5 * iqr;
            double upperFence = q3 + 1.5 * iqr;
            for (int i = 0; i < input.size(); i++) {
                if (input.get(i) < lowerFence || input.get(i) > upperFence)
                    output.add(input.get(i));
            }
            return output;
        }

        // Median of a sorted, non-empty list
        private static double getMedian(List<Double> data) {
            if (data.size() % 2 == 0)
                return (data.get(data.size() / 2) + data.get(data.size() / 2 - 1)) / 2;
            else
                return data.get(data.size() / 2);
        }
    }

Output: [20.0]

Explanation:

  • Sort the list of values from low to high
  • Split the sorted list in the middle into two halves (call them "left" and "right"); if the list has an odd number of elements, the middle element goes into neither half
  • Find the middle number (median) of each half
  • Q1 is the median of the left half, and Q3 is the median of the right half
  • Apply the fence formulas:
  • IQR = Q3 - Q1
  • LowerFence = Q1 - 1.5*IQR
  • UpperFence = Q3 + 1.5*IQR
  • More info about this formula: http://www.mathwords.com/o/outlier.htm
  • Loop through all of the original elements, and if any of them is lower than the lower fence or higher than the upper fence, add it to the "output" ArrayList
  • The "output" ArrayList then contains the outliers
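The steps above can also be condensed into a helper that returns only the two fences, which makes the intermediate numbers easy to inspect. This is a rough sketch under the same quartile convention (median of each half, middle element skipped for odd sizes); the class and method names are hypothetical, and unlike the code above it sorts a copy so the caller's list does not have to be sorted first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class for illustration only.
class FenceSketch {

    // Median of an already sorted, non-empty list.
    static double median(List<Double> sorted) {
        int n = sorted.size();
        return n % 2 == 0
                ? (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2
                : sorted.get(n / 2);
    }

    // Returns {lowerFence, upperFence}; needs at least 2 values.
    static double[] fences(List<Double> input) {
        List<Double> sorted = new ArrayList<>(input); // leave the caller's list untouched
        Collections.sort(sorted);
        int n = sorted.size();
        double q1 = median(sorted.subList(0, n / 2));
        double q3 = median(sorted.subList((n + 1) / 2, n)); // skips the middle element when n is odd
        double iqr = q3 - q1;
        return new double[] {q1 - 1.5 * iqr, q3 + 1.5 * iqr};
    }

    public static void main(String[] args) {
        double[] f = fences(Arrays.asList(83.0, 20.0, 65.0, 72.0, 75.0, 77.0, 78.0, 80.0, 81.0, 82.0));
        System.out.println(f[0] + " .. " + f[1]); // prints 58.5 .. 94.5
    }
}
```

For the sample in the question (100 105 102 13 104 22 101) the same computation gives Q1 = 22 and Q3 = 104, so the fences are -101 and 227 and nothing is flagged. With two of seven values being outliers, the lower quartile itself lands on an outlier, which is worth keeping in mind for very small samples.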


Answer 3:

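  • find the mean value of your list
  • create a Map that maps each number to its distance from the mean
  • sort the values by their distance from the mean
  • treat the last n numbers (those farthest from the mean) as outliers, checking that their distances really are much larger than the rest so that no ordinary value is rejected unfairly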
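A minimal Java sketch of this idea follows. Note one deliberate deviation: instead of a Map it sorts a copy of the list with a comparator, because a Map keyed by the number would collapse duplicate values. The class and method names are invented for the example, and choosing n is left to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class for illustration only.
class DistanceSketch {

    // Returns a copy of the values ordered from closest to farthest from the mean.
    static List<Integer> sortByDistanceFromMean(List<Integer> values) {
        double sum = 0;
        for (int v : values) sum += v;
        final double mean = values.isEmpty() ? 0 : sum / values.size();

        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingDouble((Integer v) -> Math.abs(v - mean)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> sorted = sortByDistanceFromMean(Arrays.asList(100, 105, 102, 13, 104, 22, 101));
        System.out.println(sorted); // prints [100, 101, 102, 104, 105, 22, 13]
    }
}
```

With the question's data the two suspicious values end up last, and their distances from the mean (about 56 and 65) are clearly separated from the others (about 22 to 27), which is the kind of gap the last step asks you to check for.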


Answer 4:

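Use this algorithm (C#), which is based on the average and the standard deviation. A number is treated as an outlier when it lies more than 2 standard deviations from the average; the factor 2 is a tunable choice. The method returns the numbers that remain after the outliers are removed.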

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static List<int> StatisticalOutLierAnalysis(List<int> allNumbers)
    {
        // Nothing to analyse: return an empty list rather than null
        if (allNumbers.Count == 0)
            return new List<int>();

        List<int> normalNumbers = new List<int>();
        List<int> outLierNumbers = new List<int>();
        double avg = allNumbers.Average();
        // Population standard deviation
        double standardDeviation = Math.Sqrt(allNumbers.Average(v => Math.Pow(v - avg, 2)));
        foreach (int number in allNumbers)
        {
            if ((Math.Abs(number - avg)) > (2 * standardDeviation))
                outLierNumbers.Add(number);
            else
                normalNumbers.Add(number);
        }

        // outLierNumbers holds the rejected values if you need them instead
        return normalNumbers;
    }


Answer 5:

An implementation of Grubbs' test can be found in MathUtil.java. It finds a single outlier, which you can remove from your list; repeat until you've removed all outliers.

Depends on commons-math, so if you're using Gradle:

    dependencies {
        // on Gradle 7 and later, use 'implementation' instead of 'compile'
        compile 'org.apache.commons:commons-math:2.2'
    }
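The statistic at the heart of Grubbs' test needs no dependency: it is G = max |x_i - mean| / s, where s is the sample standard deviation. The sketch below computes only that statistic and the index of the single suspect value. It is not the MathUtil.java referred to above, and all names in it are hypothetical. Deciding whether G is significant requires a critical value derived from Student's t distribution, which is presumably what commons-math is needed for; that step is deliberately left out here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; this is not the MathUtil.java mentioned above.
class GrubbsSketch {

    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.size();
    }

    // Index of the value farthest from the mean (the single outlier candidate).
    static int suspectIndex(List<Double> values) {
        double mean = mean(values);
        int suspect = 0;
        for (int i = 1; i < values.size(); i++) {
            if (Math.abs(values.get(i) - mean) > Math.abs(values.get(suspect) - mean)) {
                suspect = i;
            }
        }
        return suspect;
    }

    // Grubbs' statistic G = max |x_i - mean| / s; needs at least 2 values.
    static double statistic(List<Double> values) {
        double mean = mean(values);
        double squares = 0;
        for (double v : values) squares += (v - mean) * (v - mean);
        double s = Math.sqrt(squares / (values.size() - 1)); // sample standard deviation
        return Math.abs(values.get(suspectIndex(values)) - mean) / s;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(100.0, 105.0, 102.0, 13.0, 104.0, 22.0, 101.0);
        System.out.println("suspect=" + data.get(suspectIndex(data)) + " G=" + statistic(data));
    }
}
```

In a full implementation you would compare G against the critical value for your sample size and significance level, remove the suspect if G exceeds it, and then repeat on the shortened list.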


Answer 6:

This is a very simple implementation that collects the numbers that are not in the expected range:

    List<Integer> notInRangeNumbers = new ArrayList<Integer>();
    for (Integer number : numbers) {
        // call with a predefined factor value, here example value = 5
        if (!isInRange(number, 5)) {
            notInRangeNumbers.add(number);
        }
    }

Additionally, inside the isInRange method you have to define what you mean by "good values". Below is an example implementation.

    private boolean isInRange(Integer number, int aroundFactor) {
        // TODO the implementation of the 'in range condition'
        // here the example implementation
        return number <= 100 + aroundFactor && number >= 100 - aroundFactor;
    }


Answer 7:

As Joni already pointed out, you can eliminate outliers with the help of the standard deviation and the mean. Here is my code, which you can use for your purposes.

    import java.util.ArrayList;
    import java.util.List;

    // Wrapper class added so the snippet compiles; the name is arbitrary
    public class Main {

        public static void main(String[] args) {
            List<Integer> values = new ArrayList<>();
            values.add(100);
            values.add(105);
            values.add(102);
            values.add(13);
            values.add(104);
            values.add(22);
            values.add(101);

            System.out.println("Before: " + values);
            System.out.println("After: " + eliminateOutliers(values, 1.5f));
        }

        protected static double getMean(List<Integer> values) {
            double sum = 0; // double, so the division below is not truncated
            for (int value : values) {
                sum += value;
            }
            return sum / values.size();
        }

        // Sample variance (divides by n - 1)
        public static double getVariance(List<Integer> values) {
            double mean = getMean(values);
            double temp = 0;
            for (int a : values) {
                temp += (a - mean) * (a - mean);
            }
            return temp / (values.size() - 1);
        }

        public static double getStdDev(List<Integer> values) {
            return Math.sqrt(getVariance(values));
        }

        public static List<Integer> eliminateOutliers(List<Integer> values, float scaleOfElimination) {
            double mean = getMean(values);
            double stdDev = getStdDev(values);

            final List<Integer> newList = new ArrayList<>();

            for (int value : values) {
                boolean isLessThanLowerBound = value < mean - stdDev * scaleOfElimination;
                boolean isGreaterThanUpperBound = value > mean + stdDev * scaleOfElimination;
                boolean isOutOfBounds = isLessThanLowerBound || isGreaterThanUpperBound;

                if (!isOutOfBounds) {
                    newList.add(value);
                }
            }

            int countOfOutliers = values.size() - newList.size();
            if (countOfOutliers == 0) {
                return values; // nothing removed in this pass: done
            }

            // Recompute mean and standard deviation on the filtered list
            return eliminateOutliers(newList, scaleOfElimination);
        }
    }

  • The eliminateOutliers() method does all the work
  • It is a recursive method: each call builds a filtered copy of the list and recurses on it until a pass removes nothing (the list passed in is not modified)
  • The scaleOfElimination parameter defines how aggressively outliers are removed. I normally use 1.5f to 2f; the greater the value, the fewer outliers are removed

The output of the code:

Before: [100, 105, 102, 13, 104, 22, 101]

After: [100, 105, 102, 104, 101]


