Once a propensity model is trained, the following evaluation metrics are generated:
- Accuracy: the accuracy score tells you how often the model was correct overall. It is defined as the number of correct predictions divided by the total number of predictions made. It is mainly a meaningful number when the dataset is balanced, i.e. the number of positive outcomes is roughly equal to the number of negative outcomes. With a dataset in which only 10% of outcomes are positive, however, a naive model that predicts the negative (zero) outcome every time would achieve an accuracy of 90%: every negative is predicted correctly and every positive is missed. Clearly, accuracy has its limitations.
- Precision: the precision score tells you how good your model is when it predicts a positive outcome. It is defined as the number of correct positive predictions (true positives) divided by the total number of positive predictions (true positives plus false positives). It helps you understand how reliable your positive predictions are.
- Recall: the recall score tells you how often your model detects a positive outcome when one actually occurs. It is defined as the number of correct positive predictions (true positives) divided by the total number of actual positive outcomes (true positives plus false negatives). It helps you understand how good the model is at detecting positive outcomes and is especially useful with imbalanced datasets, i.e. datasets in which the positive outcome occurs rarely (think 10% of the time or less).
- F1 score: the F1 score is the harmonic mean of the precision and recall scores, i.e. 2 × precision × recall / (precision + recall). Think of it as a composite of precision and recall; it is a good overall metric for understanding your model's performance and, on imbalanced datasets, will generally be a better indicator than accuracy.
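- AUC: the Area Under the Curve refers to the area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate across all classification thresholds. It is 1 for a perfect model and 0.5 for a model that ranks outcomes no better than random; values below 0.5 indicate a model that ranks outcomes worse than random.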
- Gini coefficient: the Gini coefficient is a rescaled version of the AUC that ranges from -1 to +1. It is defined as 2 × AUC - 1, so a random model has a Gini coefficient of 0 and a perfect model a Gini coefficient of 1.
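As a rough illustration of accuracy, precision, recall and F1, here is a minimal sketch in plain Python computed from 0/1 labels and 0/1 predictions. The function names are hypothetical and are not the API of whatever tool generates these metrics. It also assumes the convention of reporting a metric as 0.0 when its denominator is zero (for example precision for a model that never predicts a positive outcome). The example reproduces the 10%-positive scenario: the naive model scores 90% accuracy but zero recall and F1.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def _safe_div(num, den):
    # Assumed convention: a metric with an empty denominator is reported as 0.0
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Hypothetical helper returning accuracy, precision, recall and F1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = _safe_div(tp + tn, tp + fp + fn + tn)
    precision = _safe_div(tp, tp + fp)  # correct positives / predicted positives
    recall = _safe_div(tp, tp + fn)     # correct positives / actual positives
    f1 = _safe_div(2 * precision * recall, precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 100 customers, 10% of whom have the positive outcome
y_true = [1] * 10 + [0] * 90

# Naive model: always predicts the negative (zero) outcome
naive = [0] * 100
print(classification_metrics(y_true, naive))
# {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}

# A model that flags 20 customers, 8 of whom are actual positives
model = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78
print({k: round(v, 3) for k, v in classification_metrics(y_true, model).items()})
# {'accuracy': 0.86, 'precision': 0.4, 'recall': 0.8, 'f1': 0.533}
```

Note that the second model has a lower accuracy than the naive one, yet it finds 8 of the 10 positives; precision, recall and F1 capture that difference where accuracy does not.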
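AUC and the Gini coefficient are computed from the model's scores (for instance predicted propensities) rather than from hard 0/1 predictions. The sketch below is an assumption-laden illustration, not the tool's own implementation: the hypothetical `roc_auc` helper uses the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve. It measures the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. It runs in O(positives × negatives) time, so it is meant for illustration rather than large datasets.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive example is
    scored above a randomly chosen negative example (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie is half a correct ordering
    return wins / (len(pos) * len(neg))


def gini(auc):
    """Gini coefficient: AUC rescaled from [0, 1] to [-1, +1]."""
    return 2 * auc - 1


y_true = [1, 1, 0, 0, 0]
examples = {
    "perfect": [0.9, 0.8, 0.3, 0.2, 0.1],   # every positive ranked above every negative
    "constant": [0.5] * 5,                  # no ranking information, like a random model
    "inverted": [0.1, 0.2, 0.7, 0.8, 0.9],  # every positive ranked below every negative
}
for name, scores in examples.items():
    auc = roc_auc(y_true, scores)
    print(name, auc, gini(auc))
# perfect 1.0 1.0
# constant 0.5 0.0
# inverted 0.0 -1.0
```

The inverted case shows why the Gini coefficient can go down to -1: a model whose scores are systematically backwards has an AUC below 0.5.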