Which algorithm should I pick for my regression model?

If you're not sure which algorithm to use for your regression model, start with linear regression and then work through the following options:

  • Linear regression (multivariate). The Linear Regression algorithm is very efficient and an easy way to fit a regression quickly on any dataset, which makes it a great place to start. However, it tends to be less accurate on datasets with complex, non-linear relationships or collinear variables. In those cases, try other algorithms such as Random Forest regression.
  • Random forest. When in doubt, use the Random Forest algorithm once you've tried Linear Regression. Random Forest is a robust algorithm that performs well on a wide variety of tabular datasets. It tends to be somewhat slower to train, but it does a good job on most datasets.
  • Gradient Boosting. In most cases, Gradient Boosting regression will not perform as well as Random Forest: it is more prone to overfitting and slower to train. However, it occasionally does better where Random Forest may be biased or limited, e.g., with categorical variables that have many levels.
  • XGBoost. The XGBoost regression algorithm performs similarly to Random Forest. In general, XGBoost is less robust against overfitting and less tolerant of messy data, but in many other cases it is more accurate and runs faster.
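One way to put this advice into practice is to score every candidate on the same cross-validation folds and compare the results. The sketch below assumes scikit-learn is installed (the xgboost package is optional and skipped if missing). The helper names `candidate_models` and `compare_models`, the Friedman #1 synthetic dataset, R² as the metric, and the hyperparameters are illustrative choices, not part of this article.

```python
# Sketch: compare candidate regressors with k-fold cross-validation.
# Assumes scikit-learn is installed; XGBoost is optional.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def candidate_models(random_state=0):
    """Return the algorithms discussed above, in the suggested order (hypothetical helper)."""
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=random_state),
        "gradient_boosting": GradientBoostingRegressor(random_state=random_state),
    }
    try:
        from xgboost import XGBRegressor  # optional third-party dependency
        models["xgboost"] = XGBRegressor(n_estimators=200, random_state=random_state)
    except ImportError:
        pass  # skip XGBoost if the package is not installed
    return models


def compare_models(X, y, cv=5):
    """Return the mean cross-validated R^2 score of each candidate model (hypothetical helper)."""
    scores = {}
    for name, model in candidate_models().items():
        fold_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
        scores[name] = float(fold_scores.mean())
    return scores


if __name__ == "__main__":
    # Synthetic data with a non-linear target (Friedman #1 benchmark).
    X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: -kv[1])
    for name, score in ranked:
        print(f"{name:>18}: mean R^2 = {score:.3f}")
```

To use it on your own data, pass your feature matrix and target to `compare_models`. Which model comes out on top depends on the dataset, so treat the ranking as a starting point rather than a final verdict.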
