Tuesday, November 12, 2013

Delete outliers from analysis or the data set

R has no dedicated function for removing outliers. You first have to identify which observations are outliers and then remove them, i.e. find the first and third quartiles (the hinges) and the interquartile range to define the inner fences numerically.
One way of getting the inner fences is to use

id1 <- boxplot.stats(gnpgrow)
id2 <- boxplot.stats(gnpgrow, coef=2)   # Uses a step of 2, instead of the default 1.5
id1$stats                               # Display the 5 values (see below)
id1$stats[1]                            # The lower adjacent value
id1$stats[5]                            # The upper adjacent value
The boxplot.stats function is an ancillary function that produces the statistics for drawing boxplots. Among other information, it returns a vector stats with five elements: the extreme of the lower whisker, the lower ‘hinge’, the median, the upper ‘hinge’ and the extreme of the upper whisker. The extremes of the whiskers are the adjacent values, i.e. the most extreme observations that lie no more than coef times the interquartile range beyond the hinges; every value beyond them is an outlier.

Outliers are then all values outside the interval from id1$stats[1] to id1$stats[5], i.e. all values smaller than id1$stats[1] or larger than id1$stats[5].
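
As a rough sketch of this approach, the snippet below runs boxplot.stats on a small made-up vector. The data values and the name gnpgrow_clean are assumptions for illustration only; they stand in for the real gnpgrow variable, which is not shown in this post. Besides stats, the returned list also has an out component holding the values beyond the whiskers.

```r
# Hypothetical stand-in for gnpgrow (made-up values, one obvious outlier)
gnpgrow <- c(2.1, 2.5, 2.8, 3.0, 3.2, 3.5, 3.9, 4.1, 12.0)

id1 <- boxplot.stats(gnpgrow)

# Lower adjacent value, lower hinge, median, upper hinge, upper adjacent value
print(id1$stats)

# Values lying beyond the whiskers
print(id1$out)

# Keep only the observations between the two adjacent values
keep <- gnpgrow >= id1$stats[1] & gnpgrow <= id1$stats[5]
gnpgrow_clean <- gnpgrow[keep]
print(gnpgrow_clean)
```

If gnpgrow is a column of a data frame, the same logical vector keep could be used to subset the rows instead of the vector.
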
The adjacent values are in fact not the inner fences: they are the most extreme observations that still lie within the fences, so no outlier lies between an adjacent value and its fence. You can of course define the inner fences precisely; you only need the lower and upper hinges (first and third quartiles).

lh <- quantile(gnpgrow, probs=0.25)   # Lower hinge (first quartile)
uh <- quantile(gnpgrow, probs=0.75)   # Upper hinge (third quartile)
step <- 1.5 * (uh-lh)                 # Define the step as 1.5×IQR

Outliers are then all values outside the interval from lh-step to uh+step. The logical expression gnpgrow < lh-step | gnpgrow > uh+step can be used to select the outliers for further processing.
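
Put together, the fence-based selection might look like the sketch below. The data are made up, and the names is_outlier, outliers and gnpgrow_clean are hypothetical helpers, not part of any R package. Note also that quantile() with its default settings can give slightly different values from the hinges that boxplot.stats uses for some sample sizes, so the two approaches need not agree exactly.

```r
# Hypothetical stand-in for gnpgrow (made-up values, one obvious outlier)
gnpgrow <- c(2.1, 2.5, 2.8, 3.0, 3.2, 3.5, 3.9, 4.1, 12.0)

lh <- quantile(gnpgrow, probs = 0.25)   # lower hinge (first quartile)
uh <- quantile(gnpgrow, probs = 0.75)   # upper hinge (third quartile)
step <- 1.5 * (uh - lh)                 # 1.5 x IQR

# TRUE for every value outside the inner fences
is_outlier <- gnpgrow < lh - step | gnpgrow > uh + step

outliers <- gnpgrow[is_outlier]         # the outliers, for further processing
gnpgrow_clean <- gnpgrow[!is_outlier]   # the data with the outliers removed

print(outliers)
print(gnpgrow_clean)
```

If the data contain missing values, quantile() needs na.rm = TRUE, and the NA entries of is_outlier have to be handled before subsetting.
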
