What is the estimated sample size?

The estimated sample size feature in the Select Variables step is particularly useful with sparse datasets. Real-world datasets often have many variables and a lot of missing data, while many machine learning models require complete records with no missing values or nulls. Business analysts often run into this problem: they select a broad set of independent variables, but because few records have values for every selected variable, they end up with a very small sample size.
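To make the shrinking effect concrete, here is a minimal sketch in plain Python. The toy records, the variable names, and the `complete_case_count` helper are all hypothetical and are not part of the product; the sketch only illustrates how the number of usable records can drop as more variables are selected.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"age": 34,   "income": 52000, "region": "North"},
    {"age": None, "income": 61000, "region": "South"},
    {"age": 45,   "income": None,  "region": "East"},
    {"age": 29,   "income": 48000, "region": None},
    {"age": 51,   "income": 75000, "region": "West"},
]


def complete_case_count(rows, variables):
    """Count the rows that have a value for every selected variable."""
    return sum(
        all(row.get(var) is not None for var in variables)
        for row in rows
    )


# The sample size shrinks as each additional variable is selected.
sizes = {
    "age": complete_case_count(rows, ["age"]),
    "age+income": complete_case_count(rows, ["age", "income"]),
    "age+income+region": complete_case_count(rows, ["age", "income", "region"]),
}

for selection, size in sizes.items():
    print(f"{selection}: {size} complete rows")
```

Each variable here is missing in only one of five rows, yet selecting all three leaves just two complete rows.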

A common strategy for dealing with this problem is to impute missing values, i.e. fill in the blanks, using zeros, median values, or simply 'Unknown'. You can select a specific in-fill strategy for each variable using the Missing Values field in the variables list. Clicking the cycle icon next to Estimated sample size refreshes the estimate of how many records the analytics engine will process. This lets you try different in-fill settings while monitoring their impact on sample size.
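The effect of in-fill settings on sample size can be sketched as follows. This is an illustration under stated assumptions, not the analytics engine's actual logic: the strategy names (`"none"`, `"zero"`, `"median"`, `"unknown"`), the toy records, and the `impute` and `estimated_sample_size` helpers are all hypothetical.

```python
from statistics import median

# Hypothetical toy data: None marks a missing value.
rows = [
    {"age": 34,   "income": 52000, "region": "North"},
    {"age": None, "income": 61000, "region": "South"},
    {"age": 45,   "income": None,  "region": "East"},
    {"age": 29,   "income": 48000, "region": None},
    {"age": 51,   "income": 75000, "region": "West"},
]


def impute(rows, strategies):
    """Return a copy of rows with missing values filled per variable.

    strategies maps a variable name to one of the assumed settings:
    "none" (leave missing), "zero", "median", or "unknown".
    """
    filled = [dict(row) for row in rows]
    for var, strategy in strategies.items():
        if strategy == "none":
            continue
        if strategy == "zero":
            fill = 0
        elif strategy == "median":
            # Median of the observed (non-missing) values only.
            fill = median(r[var] for r in rows if r[var] is not None)
        elif strategy == "unknown":
            fill = "Unknown"
        else:
            raise ValueError(f"unsupported strategy: {strategy}")
        for row in filled:
            if row[var] is None:
                row[var] = fill
    return filled


def estimated_sample_size(rows, strategies):
    """Count rows that are complete after applying the in-fill settings."""
    filled = impute(rows, strategies)
    return sum(all(row[var] is not None for var in strategies) for row in filled)


no_fill = estimated_sample_size(
    rows, {"age": "none", "income": "none", "region": "none"})
partial = estimated_sample_size(
    rows, {"age": "none", "income": "median", "region": "unknown"})
full = estimated_sample_size(
    rows, {"age": "median", "income": "zero", "region": "unknown"})

print(no_fill, partial, full)
```

With no in-fill, only the rows that are complete for every selected variable remain; each variable that gets an in-fill strategy stops excluding rows, so the estimate grows as more variables are imputed.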
