When should I downsample my dataset?

Downsampling is most useful during model development, when you want to run your model quickly with different settings to see what is and isn't working. In most cases, we recommend downsampling to fewer than 10,000 records to try out your model and make sure your data is processed correctly.
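As a rough illustration, a uniform random downsample could look like the sketch below. The function name `downsample`, the default cap of 5,000 records, and the fixed seed are assumptions made for this example, not part of any particular product API.

```python
import random


def downsample(records, max_records=5_000, seed=0):
    """Return a uniform random sample of at most `max_records` records.

    Hypothetical helper: the default cap is an illustrative value that
    stays under the 10,000-record guideline.
    """
    records = list(records)
    if len(records) <= max_records:
        # Already small enough: keep everything, in the original order.
        return records
    # A seeded generator makes the sample reproducible across runs.
    rng = random.Random(seed)
    return rng.sample(records, max_records)


if __name__ == "__main__":
    full_dataset = range(250_000)
    sample = downsample(full_dataset)
    print(len(sample))
```

Fixing the seed means every development run sees the same sample, so a change in results can be attributed to your settings rather than to a different draw of records.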

At that stage you will generally uncover data issues, such as sparseness or data integrity problems. Running your model on a small sample is key to quickly validating your data quality and your in-fill strategy for missing data. Once you are ready to run the larger dataset, increase the sample size gradually so you get a sense of how the run time grows with each increase.
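One quick way to spot sparseness on a small sample is to measure how often each field is missing. The sketch below assumes records are plain dictionaries and treats an absent key, `None`, or an empty string as missing; the helper name `missing_fraction` and the sample fields are hypothetical.

```python
def missing_fraction(records, fields):
    """Return, for each field, the fraction of records where it is missing.

    Assumption for this sketch: a value counts as missing when the key is
    absent, or the value is None or an empty string.
    """
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


if __name__ == "__main__":
    sample = [
        {"price": 10.0, "region": "north"},
        {"price": None, "region": "south"},
        {"price": 12.5},
        {"price": 11.0, "region": ""},
    ]
    print(missing_fraction(sample, ["price", "region"]))
```

Fields with a high missing fraction are the ones where your in-fill strategy will matter most, so they are worth checking before scaling up.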

If you skip these steps, you may end up training your model for hours, only to realize that you did not configure it the way you intended. A common ramp-up strategy, once you feel your model is ready for the larger dataset, is to increase the size of your dataset by 2x to 10x from one run to the next.
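The ramp-up described above can be sketched as a simple size schedule with a timer around each run. Everything here is illustrative: `ramp_up_sizes`, `timed_runs`, the starting size of 5,000, and the growth factor of 4 are assumptions, and `train_fn` stands in for whatever function trains your model.

```python
import time


def ramp_up_sizes(start, total, factor=4):
    """Yield dataset sizes that grow by `factor` per run, ending at `total`."""
    if start <= 0 or factor <= 1:
        raise ValueError("start must be positive and factor greater than 1")
    size = start
    while size < total:
        yield size
        size *= factor
    # The last run always uses the full dataset.
    yield total


def timed_runs(records, train_fn, start=5_000, factor=4):
    """Run `train_fn` on growing prefixes of `records` and time each run.

    Assumes `records` is a list that has already been shuffled, so that a
    prefix is a fair sample of the whole dataset.
    """
    timings = []
    for size in ramp_up_sizes(start, len(records), factor):
        begin = time.perf_counter()
        train_fn(records[:size])
        timings.append((size, time.perf_counter() - begin))
    return timings


if __name__ == "__main__":
    data = list(range(1_000_000))
    for size, seconds in timed_runs(data, train_fn=sum):
        print(f"{size:>9} records: {seconds:.4f} s")
```

Comparing the timings of consecutive runs gives a rough idea of how run time scales, which helps you estimate the cost of the full run before committing to it.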
