Link to the previous post: https://statinfer.com/204-4-3-more-on-sensitivity-and-specificity/
- If we consider all possible threshold values, together with the sensitivity and specificity at each one, what is the overall accuracy of the model?
- The ROC (Receiver Operating Characteristic) curve is drawn by plotting the false positive rate on the X-axis against the true positive rate on the Y-axis, with one point for each threshold.
- The ROC curve tells us how many mistakes we make in order to identify all the positives.
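To make the "one point per threshold" idea concrete, here is a minimal sketch that sweeps a few thresholds by hand and computes the (FPR, TPR) pair for each. The labels `y`, probabilities `p` and the helper `roc_points` are hypothetical toy values for illustration, not the Fiber data or the post's own code; a case is assumed to be predicted positive when its score is at least the threshold.

```python
import numpy as np

def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) pair per threshold.

    A case is predicted positive when its score is >= the threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    points = []
    for t in thresholds:
        predicted_pos = scores >= t
        tp = np.sum(predicted_pos & (y_true == 1))  # positives correctly caught
        fp = np.sum(predicted_pos & (y_true == 0))  # negatives wrongly flagged
        points.append((fp / n_neg, tp / n_pos))     # (x, y) = (FPR, TPR)
    return points

# Hypothetical toy labels and predicted probabilities (not the Fiber data)
y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
cutoffs = [1.0, 0.8, 0.4, 0.35, 0.0]
for t, (fpr, tpr) in zip(cutoffs, roc_points(y, p, cutoffs)):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold can only move the point up and to the right: with these toy values the points run from (0, 0) at threshold 1.0 to (1, 1) at threshold 0.0.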
ROC Curve – Interpretation
- How many mistakes are we making to identify all the positives?
- How many mistakes are we making to identify 70%, 80% and 90% of positives?
- 1 - Specificity (the false positive rate) gives us an idea of the mistakes we are making
- Ideally, we would make 0% mistakes while identifying 100% of the positives
- In practice, we want to make as few mistakes as possible while identifying as many positives as possible
- We want the curve to be far away from the diagonal straight line, which is what a model that guesses at random would trace
- Ideally, we want the area under the curve to be as high as possible
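Questions like "how many mistakes do we make to identify 70%, 80% and 90% of positives?" can be read straight off the ROC curve. The sketch below, under the assumption of hypothetical toy data and a hypothetical helper `fpr_needed_for_tpr`, uses scikit-learn's `roc_curve` to find the smallest false positive rate at which the true positive rate reaches a target in (0, 1].

```python
import numpy as np
from sklearn.metrics import roc_curve

def fpr_needed_for_tpr(y_true, scores, target_tpr):
    """Smallest false positive rate at which the TPR reaches target_tpr (0 < target <= 1)."""
    # drop_intermediate=False keeps every threshold, so no ROC point is skipped
    fpr, tpr, _ = roc_curve(y_true, scores, drop_intermediate=False)
    # fpr and tpr are non-decreasing, so the first index with tpr >= target
    # also has the smallest fpr among the qualifying points
    return fpr[np.argmax(tpr >= target_tpr)]

# Hypothetical toy labels and predicted probabilities (not the Fiber data)
y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
for target in (0.7, 0.8, 0.9):
    print(f"to catch {target:.0%} of positives: FPR = {fpr_needed_for_tpr(y, p, target):.2f}")
```

A good model reaches high targets at a low FPR; a curve hugging the diagonal needs roughly the same FPR as the TPR it achieves.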
ROC and AUC
- ROC comes with a related metric: AUC, the Area Under the (ROC) Curve.
- The ROC curve gives us an idea of the model's performance across all possible threshold values.
- We want to make almost 0% mistakes while identifying all the positives, which means we want an AUC value close to 1.
- AUC is close to 1 for a good model; a model that guesses at random has an AUC of about 0.5.
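A useful way to read AUC: it equals the probability that a randomly chosen positive gets a higher predicted score than a randomly chosen negative (ties counted as half). The sketch below checks this on hypothetical toy data by comparing the trapezoid area from `roc_curve` + `auc`, a hand-written pairwise count (`pairwise_auc`, a hypothetical helper), and scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (len(pos) * len(neg))

# Hypothetical toy labels and predicted probabilities (not the Fiber data)
y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

fpr, tpr, _ = roc_curve(y, p)
print("trapezoid area under ROC:", auc(fpr, tpr))
print("pairwise ranking estimate:", pairwise_auc(y, p))
print("roc_auc_score:", roc_auc_score(y, p))
```

All three agree (8 of the 9 positive/negative pairs are ranked correctly here, about 0.889). A random scorer ranks about half the pairs correctly, which is why its AUC sits near 0.5.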
ROC and AUC Calculation
Building a Logistic Regression Model
# For the plots, use matplotlib; roc_curve and auc come from sklearn.metrics
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
%matplotlib inline

# Actual labels as a 1-D column; predicted_values1 holds the model's predicted probabilities
actual = Fiber_df['active_cust']
false_positive_rate, true_positive_rate, thresholds = roc_curve(actual, predicted_values1)

plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, true_positive_rate)
plt.plot([0, 1], [0, 1], 'r--')  # diagonal: a model that guesses at random
plt.xlim([-0.1, 1.2])
plt.ylim([-0.1, 1.2])
plt.ylabel('True Positive Rate (Sensitivity)')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.show()
# Threshold values used by roc_curve are stored in the thresholds array
thresholds
array([ 2.00000000e+00, 1.00000000e+00, 9.99978894e-01, ..., 8.28263852e-03, 8.28015047e-03, 9.42770507e-04])
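Each entry in `thresholds` corresponds to one (FPR, TPR) point on the plotted curve. The first entry is an artificial value larger than every score, so the curve starts at (0, 0): older scikit-learn versions used max(score) + 1, which explains the 2.0 at the start of the array above, while recent versions use `np.inf`. By default (`drop_intermediate=True`) some thresholds that would not change the plotted curve are dropped. The sketch below, on hypothetical toy data with a hypothetical helper `rates_at`, recomputes each returned point by hand.

```python
import numpy as np
from sklearn.metrics import roc_curve

def rates_at(y_true, scores, threshold):
    """Recompute (FPR, TPR) by hand for one threshold (score >= threshold -> positive)."""
    predicted_pos = scores >= threshold
    fpr = predicted_pos[y_true == 0].mean()  # share of negatives flagged positive
    tpr = predicted_pos[y_true == 1].mean()  # share of positives flagged positive
    return fpr, tpr

# Hypothetical toy labels and predicted probabilities (not the Fiber data)
y = np.array([0, 0, 1, 1, 0, 1])
p = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

fpr, tpr, thresholds = roc_curve(y, p)
for f, t, th in zip(fpr, tpr, thresholds):
    f_manual, t_manual = rates_at(y, p, th)
    print(f"threshold={th:.2f}  roc_curve=({f:.2f}, {t:.2f})  manual=({f_manual:.2f}, {t_manual:.2f})")
```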
# Area under the curve (AUC)
roc_auc = auc(false_positive_rate, true_positive_rate)
roc_auc
0.83503740455417319

The next post is about what makes a best model. Link to the next post: https://statinfer.com/204-4-5-what-is-a-best-model/