What variables should I select?

When building a model, the first step of the Train stage is to select variables. Depending on the type of model you are building, you will need to select the following variables:

  • Independent variables: all models require at least one independent variable. These are the inputs your model relies on to make a prediction. Independent variables can be of any type (numerical, categorical, boolean), but each must have a known type. When you select independent variables, Analyzr detects the variable type automatically. If a variable's type cannot be detected, set it manually using the dropdown in the variables list before selecting it as an independent variable.
  • Index variable: designating an index variable is not strictly required but is strongly recommended. It identifies specific data records throughout the modeling process, from data ingestion to training and prediction. It provides an audit trail that ensures you always know which record you are looking at once model predictions are generated and used for any downstream purpose.
  • Dependent variable: when building a propensity or a regression model, you will need to select a dependent variable that quantifies the outcome you are trying to predict. For propensity models, the dependent variable should be boolean. For regression models, the dependent variable should be numerical.
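
The rules above can be sketched as a small pre-flight check you might run on your own data before uploading it. This is only an illustration in plain Python, not Analyzr's implementation or API: the function names (detect_type, check_selection), the column names, and the type-detection logic are all hypothetical. The uniqueness check on the index variable is also an assumption, based on the index being used to identify individual records.

```python
from typing import Any, Optional


def detect_type(values: list) -> str:
    """Classify a column as 'boolean', 'numerical', 'categorical', or 'unknown'.

    Hypothetical stand-in for automatic type detection; missing values (None)
    are ignored when classifying.
    """
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    # bool is checked first because Python treats bool as a subclass of int.
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numerical"
    if all(isinstance(v, str) for v in present):
        return "categorical"
    return "unknown"  # mixed content: the type would need to be set manually


def check_selection(
    records: list,
    independent: list,
    index: Optional[str] = None,
    dependent: Optional[str] = None,
    model_type: Optional[str] = None,
) -> list:
    """Return a list of problems with a variable selection (empty if none)."""

    def column(name: str) -> list:
        return [row.get(name) for row in records]

    problems = []
    if not independent:
        problems.append("select at least one independent variable")
    for name in independent:
        if detect_type(column(name)) == "unknown":
            problems.append(f"{name}: type not detected, set it manually")

    # Assumption: an index variable should identify each record uniquely.
    if index is not None:
        ids = column(index)
        if len(set(ids)) != len(ids):
            problems.append(f"{index}: index values are not unique")

    # Propensity models expect a boolean outcome, regression a numerical one.
    if model_type in ("propensity", "regression"):
        expected = "boolean" if model_type == "propensity" else "numerical"
        if dependent is None:
            problems.append(f"{model_type} models need a dependent variable")
        elif detect_type(column(dependent)) != expected:
            problems.append(f"{dependent}: should be {expected} for a {model_type} model")
    return problems


# Made-up sample records for illustration only.
records = [
    {"customer_id": 101, "tenure_months": 12, "plan": "basic", "notes": None, "churned": True},
    {"customer_id": 102, "tenure_months": 3, "plan": "pro", "notes": 3.5, "churned": False},
    {"customer_id": 103, "tenure_months": 40, "plan": "basic", "notes": "call back", "churned": True},
]

problems = check_selection(
    records,
    independent=["tenure_months", "plan", "notes"],
    index="customer_id",
    dependent="churned",
    model_type="propensity",
)
print(problems)
```

In this sample, the notes column mixes numbers and text, so it is the only variable flagged as needing a manually assigned type. Selecting churned as the dependent variable of a regression model would be flagged as well, since it is boolean rather than numerical.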
