
203.4.2 Calculating Sensitivity and Specificity in R

Building a model, creating a confusion matrix, and calculating sensitivity and specificity.

Calculating Sensitivity and Specificity

In the previous section, we studied Model Selection and Cross Validation.

Building Logistic Regression Model

# Read the data and fit a logistic regression with all predictors
Fiberbits <- read.csv("C:\\Amrita\\Datavedi\\Fiberbits\\Fiberbits.csv")
Fiberbits_model_1 <- glm(active_cust ~ ., family = binomial, data = Fiberbits)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(Fiberbits_model_1)
## 
## Call:
## glm(formula = active_cust ~ ., family = binomial, data = Fiberbits)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8.4904  -0.8752   0.4055   0.7619   2.9465  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                -1.761e+01  3.008e-01  -58.54   <2e-16 ***
## income                      1.710e-03  8.213e-05   20.82   <2e-16 ***
## months_on_network           2.880e-02  1.005e-03   28.65   <2e-16 ***
## Num_complaints             -6.865e-01  3.010e-02  -22.81   <2e-16 ***
## number_plan_changes        -1.896e-01  7.603e-03  -24.94   <2e-16 ***
## relocated                  -3.163e+00  3.957e-02  -79.93   <2e-16 ***
## monthly_bill               -2.198e-03  1.571e-04  -13.99   <2e-16 ***
## technical_issues_per_month -3.904e-01  7.152e-03  -54.58   <2e-16 ***
## Speed_test_result           2.222e-01  2.378e-03   93.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 136149  on 99999  degrees of freedom
## Residual deviance:  98359  on 99991  degrees of freedom
## AIC: 98377
## 
## Number of Fisher Scoring iterations: 8

Confusion Matrix

threshold <- 0.5
predicted_values <- ifelse(predict(Fiberbits_model_1, type = "response") > threshold, 1, 0)
actual_values <- Fiberbits_model_1$y
conf_matrix <- table(predicted_values, actual_values)
conf_matrix
##                 actual_values
## predicted_values     0     1
##                0 29492 10847
##                1 12649 47012
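As a quick sanity check, the overall accuracy can be read straight off the confusion matrix above. A minimal sketch, re-entering the printed counts so it runs stand-alone:

```r
# Confusion-matrix counts as printed above (rows = predicted, columns = actual)
conf_matrix <- matrix(c(29492, 12649, 10847, 47012), nrow = 2,
                      dimnames = list(predicted = c("0", "1"),
                                      actual    = c("0", "1")))
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)  # correct / total
accuracy
## [1] 0.76504
```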

Code: Sensitivity and Specificity

library(caret)
## Warning: package 'caret' was built under R version 3.1.3
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.1.3
sensitivity(conf_matrix)
## [1] 0.699841
specificity(conf_matrix)
## [1] 0.812527
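Note that caret's `sensitivity()` and `specificity()`, when given a table, treat predictions as rows, actuals as columns, and take the first level ("0" here) as the positive class. The numbers above can be reproduced by hand from the printed counts (a stand-alone sketch):

```r
# Re-enter the printed confusion matrix (rows = predicted, columns = actual)
cm <- matrix(c(29492, 12649, 10847, 47012), nrow = 2,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))
sens <- cm["0", "0"] / sum(cm[, "0"])  # actual 0s correctly predicted as 0
spec <- cm["1", "1"] / sum(cm[, "1"])  # actual 1s correctly predicted as 1
round(c(sensitivity = sens, specificity = spec), 6)
## sensitivity specificity 
##    0.699841    0.812527
```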

Changing Threshold

threshold <- 0.8
predicted_values <- ifelse(predict(Fiberbits_model_1, type = "response") > threshold, 1, 0)
actual_values <- Fiberbits_model_1$y
conf_matrix <- table(predicted_values, actual_values)
conf_matrix
##                 actual_values
## predicted_values     0     1
##                0 37767 30521
##                1  4374 27338

Changed Sensitivity and Specificity

sensitivity(conf_matrix)
## [1] 0.8962056
specificity(conf_matrix)
## [1] 0.4724935

Sensitivity and Specificity

  • Changing the threshold changes how customers are classified as good or bad, and hence changes the sensitivity and specificity.
  • Which of the two should we maximize? What is the ideal threshold?
  • Ideally we want to maximize both sensitivity and specificity, but that is not always possible; there is always a trade-off.
  • Sometimes we want to be 100% sure about the predicted negatives; sometimes we want to be 100% sure about the predicted positives.
  • Sometimes we simply don’t want to compromise on sensitivity; sometimes we don’t want to compromise on specificity.
  • The threshold is set based on the business problem.
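The trade-off can be seen by sweeping the threshold over a grid. The Fiberbits data is not bundled here, so this sketch uses simulated scores (with class 1 as the positive class); in practice you would substitute `predict(Fiberbits_model_1, type = "response")` and `Fiberbits_model_1$y`:

```r
set.seed(42)
actual <- rbinom(1000, 1, 0.5)                   # simulated 0/1 outcomes
score  <- plogis(2 * actual - 1 + rnorm(1000))   # scores that favour class 1
ths <- c(0.2, 0.4, 0.5, 0.6, 0.8)
res <- t(sapply(ths, function(th) {
  pred <- as.integer(score > th)
  c(threshold   = th,
    sensitivity = sum(pred == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(pred == 0 & actual == 0) / sum(actual == 0))
}))
res  # sensitivity falls and specificity rises as the threshold increases
```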

When Sensitivity is a High Priority

  • Predicting bad customers or defaulters before issuing a loan.
  • The profit on one good customer’s loan is not equal to the loss on one bad customer’s loan.
  • The loss on one bad loan might eat up the profit on 100 good customers.
  • In this case, one bad customer is not equal to one good customer.
  • If p is the probability of default, we would like to set the threshold in such a way that we don’t miss any of the bad customers.
  • We set the threshold in such a way that sensitivity is high.
  • We can compromise on specificity here: wrongly rejecting a good customer costs far less than giving a loan to a bad customer.
  • We don’t really worry about the good customers here; they are not harmful, hence we can accept lower specificity.
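A toy profit calculation (all numbers hypothetical) makes the asymmetry concrete: if one bad loan costs as much as the profit on 100 good loans, a lenient threshold can lose money even while approving far more good customers:

```r
profit_good <- 1     # profit per good loan (hypothetical units)
loss_bad    <- 100   # loss per bad loan: wipes out 100 good loans
# Hypothetical approval counts at two thresholds:
lenient <- c(good_approved = 950, bad_approved = 40)  # low sensitivity to defaulters
strict  <- c(good_approved = 800, bad_approved = 5)   # high sensitivity to defaulters
profit  <- function(op) unname(op["good_approved"] * profit_good -
                               op["bad_approved"] * loss_bad)
profit(lenient)
## [1] -3050
profit(strict)
## [1] 300
```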

When Specificity is a High Priority

  • Testing whether a medicine is good or poisonous.
  • In this case, we really have to avoid cases where the medicine is actually poisonous but the model predicts it as good.
  • We can’t take any chances here.
  • The specificity needs to be near 100%.
  • The sensitivity can be compromised here: failing to use a good medicine is far less harmful than the reverse.

Sensitivity vs Specificity – Importance

  • There are business cases where sensitivity is important and needs to be near 1.
  • There are business cases where specificity is important and needs to be near 1.
  • We need to understand the business problem and decide the relative importance of sensitivity and specificity.

ROC Curve

  • If we consider all possible threshold values and the corresponding sensitivity and specificity rates, what will the final model accuracy be?
  • The ROC (Receiver Operating Characteristic) curve is drawn by taking the false positive rate on the X-axis and the true positive rate on the Y-axis.
  • The ROC curve tells us how many mistakes we make in order to identify all the positives.
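As a preview (sketched on simulated scores, since the Fiberbits data is not reproduced here), an ROC curve is just the (false positive rate, true positive rate) pair traced out over every threshold, with class 1 taken as positive:

```r
set.seed(1)
actual <- rbinom(500, 1, 0.5)
score  <- plogis(2 * actual - 1 + rnorm(500))
ths <- sort(unique(score), decreasing = TRUE)   # one threshold per distinct score
tpr <- sapply(ths, function(t) sum(score >= t & actual == 1) / sum(actual == 1))
fpr <- sapply(ths, function(t) sum(score >= t & actual == 0) / sum(actual == 0))
plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)                           # the no-skill diagonal
```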

The next post is about ROC and AUC.

Statinfer

Statinfer derived from Statistical inference. We provide training in various Data Analytics and Data Science courses and assist candidates in securing placements.

© 2020. All Rights Reserved.