What is outlier removal?

Real-world datasets often include outliers, i.e. data points that lie far outside the distribution of the dataset under consideration. In most cases these outliers should be included in the analysis, as they often reflect the real outcomes you are trying to analyze and predict.

In clustering, however, outliers can distort the solution by forming a cluster, or a series of clusters, of their own. In that case, you may want to exclude outliers before clustering so that your resulting clusters are not cluttered by points you know to be noise. When the outlier removal setting is turned on, the Analyzr platform removes the 5% of the dataset that k-nearest-neighbors (kNN) anomaly detection, a proximity-based method, deems furthest "outside" the norm.
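As a rough illustration only: this article does not say exactly how Analyzr computes its kNN scores, so the sketch below assumes one common variant. Each row is scored by its Euclidean distance to its k-th nearest neighbor (k=5 is an arbitrary assumption), and the 5% of rows with the highest scores are dropped. The function names knn_outlier_scores and remove_outliers are hypothetical and are not part of the Analyzr platform.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    """Hypothetical helper: score each row by its distance to its k-th nearest
    neighbor (assumed scoring rule; larger score = more isolated)."""
    X = np.asarray(X, dtype=float)
    if len(X) <= k:
        raise ValueError("need more rows than k")
    # Pairwise Euclidean distances, shape (n, n)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Ignore each point's zero distance to itself
    np.fill_diagonal(dists, np.inf)
    # After sorting each row, column k-1 is the distance to the k-th nearest neighbor
    return np.sort(dists, axis=1)[:, k - 1]

def remove_outliers(X, contamination=0.05, k=5):
    """Hypothetical helper: drop the `contamination` fraction of rows with the
    highest kNN scores (0.05 mirrors the 5% described above)."""
    X = np.asarray(X, dtype=float)
    scores = knn_outlier_scores(X, k=k)
    n_remove = int(round(contamination * len(X)))
    keep = np.ones(len(X), dtype=bool)
    if n_remove > 0:
        # Indices of the n_remove most isolated rows
        worst = np.argsort(scores)[-n_remove:]
        keep[worst] = False
    return X[keep], keep

# Usage: 95 points in one tight cluster plus 5 isolated points far away
rng = np.random.default_rng(0)
core = rng.normal(size=(95, 2))
far = np.array([[40.0, 40.0], [-40.0, 40.0], [40.0, -40.0], [-40.0, -40.0], [0.0, 60.0]])
X = np.vstack([core, far])
X_clean, keep = remove_outliers(X, contamination=0.05, k=5)
```

In this example the five distant points receive the highest scores and are the rows removed, leaving only the tight cluster for the clustering step. Because the scores are Euclidean distances, features measured on very different scales would normally be standardized first so that no single feature dominates the neighbor search.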
