outliers

Extract rows with highest and lowest values from a data frame

ぐ巨炮叔叔 submitted on 2021-02-18 07:01:47
Question: I'm quite new to R; I use it mainly for visualising statistics with the ggplot2 library. Now I have run into a problem with data preparation. I need to write a function that removes some number (2, 5 or 10) of rows with the highest and lowest values in a specified column from a data frame and puts them into another data frame, and does this for each combination of two factors (in my case, for each day and server). Up to this point, I have done the following steps (MWE using the esoph example dataset). I
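
One way the task described above could be approached in base R is sketched here. The helper name split_extremes and its arguments are hypothetical, and the choice of agegp and alcgp as the two factors is only an assumption standing in for "day and server"; the asker's own steps are not shown in the excerpt.

```r
# Hypothetical helper: within every combination of the grouping columns,
# move the n lowest and n highest rows of `value_col` into a separate data frame.
split_extremes <- function(df, value_col, group_cols, n = 2) {
  # One factor level per combination of the grouping columns
  groups <- interaction(df[group_cols], drop = TRUE)
  # Rank rows inside each group; ties.method = "first" gives distinct ranks
  rank_low <- ave(df[[value_col]], groups,
                  FUN = function(v) rank(v, ties.method = "first"))
  rank_high <- ave(-df[[value_col]], groups,
                   FUN = function(v) rank(v, ties.method = "first"))
  is_extreme <- rank_low <= n | rank_high <= n
  list(extremes = df[is_extreme, , drop = FALSE],
       rest = df[!is_extreme, , drop = FALSE])
}

# Usage with the built-in esoph data (grouping columns chosen for illustration)
parts <- split_extremes(esoph, "ncases", c("agegp", "alcgp"), n = 1)
nrow(parts$extremes) + nrow(parts$rest) == nrow(esoph)
```

Groups with fewer than 2 * n rows end up entirely in extremes, which may or may not be the behaviour wanted.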

Find outlier using z score

不想你离开。 submitted on 2021-02-10 20:23:23
Question: I am trying to create a function in R. The function should find outliers in a matrix using the z-score. It should take two arguments as input (x, which is a matrix, and zs, which is an integer). For each row of the matrix, the function should calculate the z-score of each element, and if the z-score is bigger than zs or smaller than -zs, the function should print that element. I know that I can use z <- (x - mean(x)) / sd(x) or z <- scale(x) to calculate the z-score, but as I am a
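
A minimal sketch of such a function follows, assuming the z-score is meant to be computed separately within each row. The name find_outliers and the shape of the returned data frame are assumptions, not part of the question.

```r
# Hypothetical function: report matrix elements whose row-wise z-score
# is larger than zs or smaller than -zs.
find_outliers <- function(x, zs) {
  stopifnot(is.matrix(x), is.numeric(x), zs > 0)
  # apply() over rows returns each row's result as a column, so transpose back
  z <- t(apply(x, 1, function(r) (r - mean(r)) / sd(r)))
  idx <- which(abs(z) > zs, arr.ind = TRUE)
  hits <- data.frame(row = idx[, 1], col = idx[, 2],
                     value = x[idx], z = z[idx])
  print(hits, row.names = FALSE)
  invisible(hits)
}

# Usage on a small made-up matrix
x <- rbind(c(1, 2, 3, 4, 100),
           c(10, 11, 9, 10, 10))
hits <- find_outliers(x, zs = 1.5)
```

Note that scale(x) standardises columns, not rows, so it would need a transpose on the way in and out to give the same result.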

Find outlier using z score

穿精又带淫゛_ submitted on 2021-02-10 20:20:16
Question: I am trying to create a function in R. The function should find outliers in a matrix using the z-score. It should take two arguments as input (x, which is a matrix, and zs, which is an integer). For each row of the matrix, the function should calculate the z-score of each element, and if the z-score is bigger than zs or smaller than -zs, the function should print that element. I know that I can use z <- (x - mean(x)) / sd(x) or z <- scale(x) to calculate the z-score, but as I am a

How to get indices of outliers in a dataframe boxplot?

拈花ヽ惹草 submitted on 2021-02-05 09:59:26
Question: I have a dataframe and I want to get the indices of the outliers in each column. Here is part of my dataframe:
mediamarkt[,48]
[1] 7126 4012 3711 3237 3432 2671 2861 7065 3158 4023 4770 3861
[13] 4108 7408 9071 3596 3889 4093 4446 6059 8345 10291 5546 5129
[25] 4683 4670 5694 8619 11047 5743 5775 5216 5283 4854 7871 9944
[37] 3797 3821 3834 3999 4577 8898 11396 4508 5459 3668 3885 4021
[49] 7491 8831 3513 3606 3332 3189 3656 6859 9167 3306 3305 3379
[61] 3507 3912 6562 8245 3420 3445 3530 3404 3847
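
As a rough illustration, the indices of the points a boxplot would draw as outliers can be recovered with boxplot.stats() from base R. The helper name outlier_indices and the toy data frame are made up for this sketch; mediamarkt itself is not available here.

```r
# Hypothetical helper: for every numeric column, return the row indices of
# the values that boxplot() would draw as outliers (default whisker coef 1.5).
outlier_indices <- function(df, coef = 1.5) {
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  lapply(numeric_cols, function(col) {
    out_values <- boxplot.stats(col, coef = coef)$out
    which(col %in% out_values)
  })
}

# Usage on a small made-up data frame
toy <- data.frame(a = c(1, 2, 3, 4, 5, 100),
                  b = c(10, 11, 12, 13, 14, 15))
res <- outlier_indices(toy)
```

The result is a named list with one integer vector per numeric column, which is empty when a column has no outliers.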

Remove outliers from data frame in R?

别来无恙 submitted on 2021-01-29 10:01:32
Question: I am trying to remove outliers from my data. The outliers in my case are the values that lie far from the rest of the data when plotted on a boxplot. After removing the outliers, I will save the data in a new file and run a prediction model to see how different the results are from those on the original data. I used a tutorial and adapted it to remove outliers from my data. The tutorial uses box plots to identify the outliers. It works fine when I run it on a column that has outliers. But it raises

Delete outliers automatically of a calculated agglomerative hierarchical clustering data

不打扰是莪最后的温柔 submitted on 2021-01-28 08:09:14
Question: In cluster analysis, the outliers of a dataset can be easily identified with the single-linkage method. Now I would like to remove the outliers automatically. My idea is to remove the data points that exceed a specified distance value. Here is my code with the mtcars example data:
library(cluster)
library(dendextend)
cluster <- agnes(mtcars, stand = FALSE, method = "single")
dend <- as.dendrogram(cluster)
In the plot you can see the resulting dendrogram. The last 4 cars ("Duster 360", "Camaro Z28",
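
One possible reading of "remove the data which exceed a specified distance value" is sketched below: cut the single-linkage tree at a chosen height and drop every observation that is still alone in its cluster at that height. This swaps in base R's hclust() for agnes(), the threshold of 60 is an arbitrary assumption, and treating singleton clusters as outliers is this sketch's own rule, not the asker's.

```r
# Single-linkage clustering on Euclidean distances (hclust stands in for agnes here)
hc <- hclust(dist(mtcars), method = "single")

# Assumed cut height; in practice it would be read off the dendrogram
threshold <- 60

# Cluster membership of every car when the tree is cut at that height
membership <- cutree(hc, h = threshold)

# Size of the cluster each car belongs to; cars alone in a cluster are flagged
cluster_size <- ave(membership, membership, FUN = length)
is_outlier <- cluster_size == 1

mtcars_clean <- mtcars[!is_outlier, ]
removed_cars <- rownames(mtcars)[is_outlier]
```

Raising or lowering threshold changes how many cars are flagged, so the value should be checked against the plotted dendrogram.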

Outliers in Axes in D3 (Mixing numerical and categorical specifications)

自古美人都是妖i submitted on 2021-01-03 06:50:47
Question: I am trying to set something up in D3 where I have an axis for some collection of datapoints. If the datapoints contain outliers, however, I'd like to put those outliers in a bucket on the axis. Is there a way to specify an "outlier tickmark" on the axis that serves as a partition for placing those datapoints? Example: [1, 3, 7, 12, 2048]
  *     *           *                *               *
--1--2--3--4--5--6--7--8--9--10--11--12--13--14--15--O--
The following is the current code I have. It seems to me that scales only