When a regression model is trained with a linear method (such as linear regression, LASSO, ridge, or Bayesian ridge), the results include a coefficient estimate for each independent variable. These estimates are typically not produced for tree-based machine learning methods such as random forest or gradient boosting (including XGBoost), which focus on pure prediction. Such methods often predict better, but they do not lend themselves to the variable-by-variable analysis of effects that coefficients provide with linear methods. Coefficient estimates, when generated, include the following:
- Coefficient: the estimated coefficient for a given variable. It measures the expected change in the outcome for a one-unit change in the independent variable, everything else held constant. It is a mean estimate, and its standard deviation is captured by the standard error below.
- Standard Error: the standard error is defined here as an estimate of the standard deviation of the coefficient above.
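- t Value: the Student t value of the estimate, defined as the ratio of the coefficient to its standard error. Look for t values that are large in absolute value, typically greater than 2, as evidence that the coefficient is meaningfully different from zero. A strongly negative t value is just as informative as a strongly positive one.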
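- p Value: the probability of observing a t value at least as extreme as this one if the independent variable truly had no impact at all (the null hypothesis). It is not the probability that the null hypothesis is true. Look for very small values of p. If p is greater than 0.05, the coefficient is conventionally considered not statistically significant at the 5% level, meaning the data do not provide strong evidence that this independent variable has an effect.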
- Elasticity: the ratio of the percentage change in the outcome to the percentage change in the independent variable. An elasticity of 1.0 means a 3% change in the independent variable would yield a 3% change in the outcome variable, while an elasticity of 0.5 would yield a 1.5% change.
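
To make these definitions concrete, here is a minimal sketch that computes each quantity by hand for a one-variable ordinary least squares fit. It is an illustration only, not the product's implementation: the function name `simple_ols_summary` and the sample data are hypothetical, the p value uses a normal approximation to the Student t distribution (reasonable for larger samples, optimistic for small ones), and the elasticity is evaluated at the sample means, which is one common convention assumed here.

```python
import math


def simple_ols_summary(x, y):
    """Coefficient statistics for a one-variable OLS fit (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Coefficient: expected change in y for a one-unit change in x.
    coef = sxy / sxx
    intercept = mean_y - coef * mean_x

    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    residuals = [yi - (intercept + coef * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)

    # Standard error: estimated standard deviation of the coefficient.
    std_err = math.sqrt(sigma2 / sxx)

    # t value: ratio of the coefficient to its standard error.
    t_value = coef / std_err

    # Two-sided p value. ASSUMPTION: normal approximation to the t
    # distribution, used here to avoid a third-party dependency.
    p_value = math.erfc(abs(t_value) / math.sqrt(2))

    # Elasticity. ASSUMPTION: evaluated at the sample means.
    elasticity = coef * mean_x / mean_y

    return {
        "coefficient": coef,
        "std_error": std_err,
        "t_value": t_value,
        "p_value": p_value,
        "elasticity": elasticity,
    }


# Hypothetical data in which y is roughly 2 * x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

summary = simple_ols_summary(x, y)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

With this data the coefficient is close to 2 and the elasticity at the means is close to 1, so a 3% change in `x` corresponds to roughly a 3% change in `y`. For an exact p value, the Student t distribution with n - 2 degrees of freedom would replace the normal approximation.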