While exploring data that was collected for an investigation of the resources needed for software testing and improvement, you have found an outlier. The study is focused on two variables: the number of bugs found in the code and the time (in person-hours) required to fix them. The outlier corresponds to a very large project with many bugs that required a large amount of time to repair (and the numbers appear to be correct). Your supervisor recalls something about 'outliers being bad' and recommends that you remove this observation from the data set. A co-worker comments that the outlier actually seems to be typical for a large project. All agree that large projects are indeed part of the universe under study. Should the outlier be removed? Justify the answer.
I was thinking about this problem, and here is my opinion. From a statistical point of view, outliers are often removed. This is done to keep them from biasing the results, especially if you are looking at means. The mean takes every data point into account and averages them. So if the majority of your data is around a specific value, and a few numbers don't fit with the rest of the data, those few numbers will skew the mean. Thus, from a statistical ...
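To make the point about the mean concrete, here is a small Python sketch. The numbers are made up for illustration and are not taken from the study; the last value stands in for the very large project. It compares the mean and the median of the repair times with and without that one observation.

```python
from statistics import mean, median

# Hypothetical person-hours needed to fix bugs on ten projects.
# These values are invented for illustration; they are NOT the study's data.
# The last entry plays the role of the very large project (the outlier).
hours = [12, 15, 11, 14, 13, 16, 12, 15, 14, 400]

# The same data with the large project left out.
hours_without = hours[:-1]

mean_all = mean(hours)
mean_without = mean(hours_without)
median_all = median(hours)
median_without = median(hours_without)

print(f"mean with large project:      {mean_all:.1f}")
print(f"mean without large project:   {mean_without:.1f}")
print(f"median with large project:    {median_all:.1f}")
print(f"median without large project: {median_without:.1f}")
```

In this made-up example the single large value pulls the mean far away from where most of the data sits, while the median stays the same. That only shows how sensitive the mean is; by itself it does not settle whether the observation should be dropped.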
This solution discusses the removal of an outlier from a dataset.