Delete outliers from analysis or the data set
There are no specific R functions to remove outliers. You first have to find out which observations are outliers and then remove them, i.e. find the first and third quartiles (the hinges) and the interquartile range (IQR) to define the inner fences numerically.
One way of approximating the inner fences is to use the boxplot.stats function:
id1 <- boxplot.stats(gnpgrow)             # default step (coef = 1.5)
id2 <- boxplot.stats(gnpgrow, coef = 2)   # uses a step of 2 instead of the default 1.5
id1$stats                                 # display the 5 values (see below)
id1$stats[1]                              # the lower adjacent value
id1$stats[5]                              # the upper adjacent value
The boxplot.stats function is an ancillary function that produces the statistics for drawing boxplots. Among other information, it returns a vector stats with five elements: the extreme of the lower whisker, the lower ‘hinge’, the median, the upper ‘hinge’ and the extreme of the upper whisker. The extremes of the whiskers are the adjacent values (the last values that are not outliers), i.e. every value beyond them is an outlier.
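To see what these five values look like, here is a minimal sketch on a small made-up vector x (hypothetical toy data, since gnpgrow is not defined in this post). Besides stats, the list returned by boxplot.stats also has an out component that holds the values lying beyond the whiskers.

```r
# Hypothetical toy data: nine ordinary values and one obvious outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

id <- boxplot.stats(x)   # default coef = 1.5

id$stats  # -> 1.0 3.0 5.5 8.0 9.0 (lower whisker end, lower hinge, median, upper hinge, upper whisker end)
id$out    # -> 50 (the values beyond the whiskers)
```

The upper whisker stops at 9, the largest value that is not an outlier, rather than at 50.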
Outliers are then all values below id1$stats[1] or above id1$stats[5].
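A minimal sketch of removing the outliers with the adjacent values, again on a hypothetical toy vector x and assuming it has no missing values (an NA comparison would put NA into the index):

```r
# Hypothetical toy data with one outlier and no missing values.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

id <- boxplot.stats(x)
lower <- id$stats[1]   # lower adjacent value
upper <- id$stats[5]   # upper adjacent value

# TRUE for every value outside the adjacent values, i.e. the outliers.
is_outlier <- x < lower | x > upper

x_clean <- x[!is_outlier]   # keep only the non-outlying values
x_clean
```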
The adjacent values are in fact not the inner fences: the fences lie further out, beyond the adjacent values but before any outlier. You can of course define the inner fences precisely; you only need the lower and upper hinges (first and third quartiles):
lh <- quantile(gnpgrow, probs = 0.25)   # lower hinge (first quartile)
uh <- quantile(gnpgrow, probs = 0.75)   # upper hinge (third quartile)
step <- 1.5 * (uh - lh)                 # define the step as 1.5×IQR
Outliers are then all values outside the interval from lh-step to uh+step; the logical expression gnpgrow < lh-step | gnpgrow > uh+step can be used to select the outliers for further processing.
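Putting the steps together, here is a sketch that flags and drops the outlying rows from a data set. It assumes a hypothetical data frame dat with a gnpgrow column filled with toy values; the names dat, lower_fence, upper_fence and is_out are illustrative only.

```r
# Hypothetical data set: a data frame with a gnpgrow column (toy values).
dat <- data.frame(id = 1:10,
                  gnpgrow = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50))

lh   <- quantile(dat$gnpgrow, probs = 0.25, na.rm = TRUE)  # lower hinge (first quartile)
uh   <- quantile(dat$gnpgrow, probs = 0.75, na.rm = TRUE)  # upper hinge (third quartile)
step <- 1.5 * (uh - lh)                                     # step = 1.5×IQR

# Inner fences; unname() drops the "25%"/"75%" labels quantile() attaches.
lower_fence <- unname(lh - step)
upper_fence <- unname(uh + step)

is_out <- dat$gnpgrow < lower_fence | dat$gnpgrow > upper_fence

outliers  <- subset(dat, is_out)    # rows to inspect further
dat_clean <- subset(dat, !is_out)   # subset() treats NA as FALSE, so rows with a missing gnpgrow are dropped too
```

For these toy values the fences are -3.5 and 14.5, so only the row with 50 is removed. Note that quantile() and boxplot.stats() do not compute the hinges the same way: boxplot.stats() uses fivenum(), which gives 3 and 8 here, while quantile() with its default type gives 3.25 and 7.75, so on small samples the two sets of fences can differ slightly. IQR(dat$gnpgrow) returns the same value as uh - lh.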