204.4.2 Calculating Sensitivity and Specificity in Python

Building a model, creating Confusion Matrix and finding Specificity and Sensitivity.

Link to the previous post: https://statinfer.com/204-4-1-model-section-and-cross-validation/

 

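This post is an extension of the previous post. Here, we look at a way to calculate the sensitivity and specificity of a model in Python. Sensitivity is the share of actual positives that the model classifies correctly; specificity is the share of actual negatives that it classifies correctly.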

Calculating Sensitivity and Specificity

Building Logistic Regression Model

In [1]:
#Importing necessary libraries
import sklearn as sk
import pandas as pd
import numpy as np
import scipy as sp
In [2]:
#Importing the dataset
Fiber_df = pd.read_csv("datasets\\Fiberbits\\Fiberbits.csv")
#Viewing the first five rows of the Fiber dataset
Fiber_df.head(5)
Out[2]:
   active_cust  income  months_on_network  Num_complaints  number_plan_changes  relocated  monthly_bill  technical_issues_per_month  Speed_test_result
0            0    1586                 85               4                    1          0           121                           4                 85
1            0    1581                 85               4                    1          0           133                           4                 85
2            0    1594                 82               4                    1          0           118                           4                 85
3            0    1594                 82               4                    1          0           123                           4                 85
4            1    1609                 80               4                    1          0           177                           4                 85
In [3]:
#Name of the columns/Variables
Fiber_df.columns
Out[3]:
Index(['active_cust', 'income', 'months_on_network', 'Num_complaints',
       'number_plan_changes', 'relocated', 'monthly_bill',
       'technical_issues_per_month', 'Speed_test_result'],
      dtype='object')
In [4]:
#Building and training a Logistic Regression model
import statsmodels.formula.api as sm
logistic1 = sm.logit(formula='active_cust~income+months_on_network+Num_complaints+number_plan_changes+relocated+monthly_bill+technical_issues_per_month+Speed_test_result', data=Fiber_df)
fitted1 = logistic1.fit()
fitted1.summary()
Optimization terminated successfully.
         Current function value: 0.493647
         Iterations 9
Out[4]:
Logit Regression Results
Dep. Variable:    active_cust         No. Observations:    100000
Model:            Logit               Df Residuals:        99991
Method:           MLE                 Df Model:            8
Date:             Fri, 18 Nov 2016    Pseudo R-squ.:       0.2748
Time:             19:16:40            Log-Likelihood:      -49365.
converged:        True                LL-Null:             -68074.
                                      LLR p-value:         0.000

                                  coef    std err          z    P>|z|    [95.0% Conf. Int.]
Intercept                     -17.6101      0.301    -58.538    0.000    -18.200    -17.020
income                          0.0017   8.21e-05     20.820    0.000      0.002      0.002
months_on_network               0.0288      0.001     28.654    0.000      0.027      0.031
Num_complaints                 -0.6865      0.030    -22.811    0.000     -0.746     -0.628
number_plan_changes            -0.1896      0.008    -24.940    0.000     -0.205     -0.175
relocated                      -3.1626      0.040    -79.927    0.000     -3.240     -3.085
monthly_bill                   -0.0022      0.000    -13.995    0.000     -0.003     -0.002
technical_issues_per_month     -0.3904      0.007    -54.581    0.000     -0.404     -0.376
Speed_test_result               0.2222      0.002     93.435    0.000      0.218      0.227
In [5]:
###Predicting the probability of active_cust = 1 for every customer
predictors = ['income', 'months_on_network', 'Num_complaints', 'number_plan_changes',
              'relocated', 'monthly_bill', 'technical_issues_per_month', 'Speed_test_result']
predicted_values1 = fitted1.predict(Fiber_df[predictors])
predicted_values1[1:10]
Out[5]:
array([ 0.83701059,  0.83271114,  0.83117449,  0.80896979,  0.8520262 ,
        0.82713018,  0.85504571,  0.85131352,  0.85537857])
In [6]:
### Converting predicted values into classes using threshold
threshold=0.5

predicted_class1=np.zeros(predicted_values1.shape)
predicted_class1[predicted_values1>threshold]=1
predicted_class1
Out[6]:
array([ 1.,  1.,  1., ...,  1.,  1.,  1.])
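The thresholding step can also be written as a single vectorized comparison. The snippet below is a minimal sketch on a handful of made-up probabilities (the `probs` values are illustrative assumptions, not taken from the Fiberbits data).

```python
import numpy as np

# Hypothetical predicted probabilities, for illustration only
probs = np.array([0.83, 0.12, 0.50, 0.91, 0.47])
threshold = 0.5

# True where the probability is strictly above the threshold, then cast to 0/1
classes = (probs > threshold).astype(int)
print(classes)  # [1 0 0 1 0]
```

Note that a probability exactly equal to the threshold is assigned to class 0, because the comparison is strict.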
In [7]:
#Confusion matrix, accuracy, sensitivity and specificity
from sklearn.metrics import confusion_matrix

#Rows are actual classes and columns are predicted classes, both in the order [0, 1]
cm1 = confusion_matrix(Fiber_df['active_cust'], predicted_class1)
print('Confusion Matrix : \n', cm1)

#Unpack the matrix, treating active_cust = 1 as the positive class
tn, fp, fn, tp = cm1.ravel()

#Accuracy: share of all customers classified correctly
accuracy1 = (tp + tn) / cm1.sum()
print('Accuracy : ', accuracy1)

#Sensitivity: share of actual positives (active customers) predicted as positive
sensitivity1 = tp / (tp + fn)
print('Sensitivity : ', sensitivity1)

#Specificity: share of actual negatives (inactive customers) predicted as negative
specificity1 = tn / (tn + fp)
print('Specificity : ', specificity1)
Confusion Matrix : 
 [[29492 12649]
 [10847 47012]]
Accuracy :  0.76504
Sensitivity :  0.812527005306
Specificity :  0.699841009943
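For reuse across models, the calculation can be wrapped in a small helper. This is a sketch rather than part of the original notebook: the function name `sensitivity_specificity` and the toy labels are assumptions, and label 1 is taken to be the positive class, which is the usual scikit-learn convention.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), treating label 1 as the positive class."""
    # With labels ordered [0, 1], ravel() yields TN, FP, FN, TP in that order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

# Toy labels, made up for illustration only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

sens, spec = sensitivity_specificity(y_true, y_pred)
print('Sensitivity : ', sens)
print('Specificity : ', spec)
```

In the toy data, 3 of the 4 actual positives and 4 of the 6 actual negatives are classified correctly, so the helper returns 0.75 and roughly 0.667.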

Changing Threshold to 0.8

In [8]:
### Converting predicted values into classes using new threshold
threshold=0.8

predicted_class1=np.zeros(predicted_values1.shape)
predicted_class1[predicted_values1>threshold]=1
predicted_class1
Out[8]:
array([ 1.,  1.,  1., ...,  1.,  1.,  1.])

Change in Confusion Matrix, Accuracy and Sensitivity-Specificity

In [9]:
#Confusion matrix, accuracy, sensitivity and specificity
from sklearn.metrics import confusion_matrix

#Rows are actual classes and columns are predicted classes, both in the order [0, 1]
cm1 = confusion_matrix(Fiber_df['active_cust'], predicted_class1)
print('Confusion Matrix : \n', cm1)

#Unpack the matrix, treating active_cust = 1 as the positive class
tn, fp, fn, tp = cm1.ravel()

#Accuracy: share of all customers classified correctly
accuracy1 = (tp + tn) / cm1.sum()
print('Accuracy : ', accuracy1)

#Sensitivity: share of actual positives (active customers) predicted as positive
sensitivity1 = tp / (tp + fn)
print('Sensitivity : ', sensitivity1)

#Specificity: share of actual negatives (inactive customers) predicted as negative
specificity1 = tn / (tn + fp)
print('Specificity : ', specificity1)
Confusion Matrix : 
 [[37767  4374]
 [30521 27338]]
Accuracy :  0.65105
Sensitivity :  0.472493475518
Specificity :  0.896205595501
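The effect of the threshold change can be checked directly from the two confusion matrices printed in this post. The sketch below hard-codes those matrices and reports the share of each actual class that is classified correctly; the dictionary keys `class_1_recall` and `class_0_recall` are names chosen here for illustration (they equal sensitivity and specificity when class 1 is treated as the positive class).

```python
import numpy as np

# Confusion matrices as printed in this post
# (rows = actual 0/1, columns = predicted 0/1)
cm_by_threshold = {
    0.5: np.array([[29492, 12649], [10847, 47012]]),
    0.8: np.array([[37767, 4374], [30521, 27338]]),
}

summary = {}
for threshold, cm in cm_by_threshold.items():
    tn, fp, fn, tp = cm.ravel()
    summary[threshold] = {
        "accuracy": (tp + tn) / cm.sum(),
        "class_1_recall": tp / (tp + fn),  # actual 1s predicted as 1
        "class_0_recall": tn / (tn + fp),  # actual 0s predicted as 0
    }
    print(threshold, summary[threshold])
```

Raising the threshold from 0.5 to 0.8 moves the class 0 rate from about 0.70 to about 0.90, while the class 1 rate drops from about 0.81 to about 0.47 and accuracy falls from 0.76504 to 0.65105.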

Sensitivity and Specificity

  • Changing the threshold changes which customers are classified as good and bad, so the sensitivity and specificity change as well.
  • Which of the two should we maximize? What is the ideal threshold?
  • Ideally we want to maximize both sensitivity and specificity, but this is not always possible. There is always a trade-off.
  • Sometimes we want to be 100% sure about the predicted negatives; sometimes we want to be 100% sure about the predicted positives.
  • Sometimes we cannot compromise on sensitivity, and sometimes we cannot compromise on specificity.
  • The threshold is set based on the business problem.
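The trade-off can be seen by sweeping the threshold over a range of values. The following is a minimal sketch on synthetic data: the labels, the scores and the helper `rates_at` are assumptions made for illustration, and label 1 is treated as the positive class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labels and scores (illustrative only): positives tend to score higher
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(0.35 + 0.3 * y_true + rng.normal(0, 0.2, size=1000), 0, 1)

def rates_at(threshold, y_true, scores):
    """Return (sensitivity, specificity) at a threshold, with label 1 as positive."""
    y_pred = (scores > threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

thresholds = [0.2, 0.35, 0.5, 0.65, 0.8]
results = [rates_at(t, y_true, scores) for t in thresholds]
for t, (sens, spec) in zip(thresholds, results):
    print(f"threshold={t:.2f}  sensitivity={sens:.3f}  specificity={spec:.3f}")
```

As the threshold rises, fewer cases are predicted positive, so sensitivity can only fall or stay the same while specificity can only rise or stay the same.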

When Sensitivity is a High Priority

  • Predicting bad customers or defaulters before issuing a loan. Here the bad customer (defaulter) is the positive class.
  • The profit on one good customer’s loan is not equal to the loss on one bad customer’s loan.
  • The loss on one bad loan might eat up the profit from 100 good customers.
  • In this case one bad customer is not equal to one good customer.
  • If p is the probability of default, we would like to set the threshold so that we do not miss the bad customers.
  • We set the threshold in such a way that sensitivity is high.
  • We can compromise on specificity here. If we wrongly reject a good customer, our loss is much smaller than the loss from giving a loan to a bad customer.
  • We don’t worry much about the good customers here; they are not harmful, so we can accept lower specificity.
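One way to turn this reasoning into a threshold is to attach a cost to each kind of error and pick the threshold with the lowest total cost. The sketch below uses synthetic data; the 100:1 cost ratio follows the illustration in the list above, while the 10% default rate, the score distribution and all names (`p_default`, `total_cost`, the cost constants) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic portfolio (illustrative only): 1 = defaulter, roughly 10% of customers
is_default = (rng.random(n) < 0.1).astype(int)
# Hypothetical predicted probability of default; defaulters tend to score higher
p_default = np.clip(0.15 + 0.4 * is_default + rng.normal(0, 0.15, n), 0, 1)

COST_MISSED_DEFAULT = 100.0  # assumed loss from lending to a bad customer
COST_REJECTED_GOOD = 1.0     # assumed lost profit from rejecting a good customer

def total_cost(threshold):
    """Total cost when customers with p_default above the threshold are rejected."""
    flagged = p_default > threshold
    missed = np.sum(~flagged & (is_default == 1))        # false negatives
    rejected_good = np.sum(flagged & (is_default == 0))  # false positives
    return COST_MISSED_DEFAULT * missed + COST_REJECTED_GOOD * rejected_good

thresholds = np.linspace(0.05, 0.95, 19)
costs = np.array([total_cost(t) for t in thresholds])
best = thresholds[np.argmin(costs)]

# Sensitivity at the chosen threshold: share of defaulters that are flagged
sensitivity_at_best = np.sum((p_default > best) & (is_default == 1)) / np.sum(is_default == 1)
print("Lowest-cost threshold : ", best)
print("Sensitivity there     : ", sensitivity_at_best)
```

Because a missed defaulter is assumed to cost far more than a rejected good customer, the lowest-cost threshold tends to favour sensitivity at the expense of specificity.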

When Specificity is a High Priority

  • Testing whether a medicine is good or poisonous
  • In this case, we have to avoid cases where the medicine is actually poisonous and the model predicts it as good.
  • We can’t take any chances here.
  • The specificity needs to be near 100%.
  • The sensitivity can be compromised here. Not using a good medicine is far less harmful than using a poisonous one.
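When specificity must stay above a target, one simple approach is to take the lowest threshold that meets the target, which keeps as much sensitivity as possible. This is a sketch on synthetic data: the 0.99 target, the data and the names (`p_good`, `specificity`, `sensitivity`) are assumptions, with "good" as the positive class and "poisonous" as the negative class.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Synthetic samples (illustrative only): 1 = good medicine, 0 = poisonous
is_good = rng.integers(0, 2, size=n)
# Hypothetical predicted probability that a sample is good
p_good = np.clip(0.3 + 0.4 * is_good + rng.normal(0, 0.15, n), 0, 1)

def specificity(threshold):
    """Share of poisonous samples that are correctly predicted as poisonous."""
    pred_good = p_good > threshold
    tn = np.sum(~pred_good & (is_good == 0))
    fp = np.sum(pred_good & (is_good == 0))
    return tn / (tn + fp)

def sensitivity(threshold):
    """Share of good samples that are correctly predicted as good."""
    pred_good = p_good > threshold
    tp = np.sum(pred_good & (is_good == 1))
    fn = np.sum(~pred_good & (is_good == 1))
    return tp / (tp + fn)

TARGET_SPECIFICITY = 0.99  # assumed requirement

# Specificity never decreases as the threshold rises, so the first
# threshold that meets the target is the lowest one that does
thresholds = np.linspace(0.0, 1.0, 101)
chosen = next(t for t in thresholds if specificity(t) >= TARGET_SPECIFICITY)

print("Chosen threshold : ", chosen)
print("Specificity      : ", specificity(chosen))
print("Sensitivity      : ", sensitivity(chosen))
```

The search always terminates here because the scores are clipped to at most 1, so at a threshold of 1.0 nothing is predicted as good and specificity is 1.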

Sensitivity vs Specificity – Importance

  • There are some cases where sensitivity is important and needs to be near 1.
  • There are business cases where specificity is important and needs to be near 1.
  • We need to understand the business problem and decide the relative importance of sensitivity and specificity.

The next post is about ROC and AUC.

Link to the next post : https://statinfer.com/204-4-4-roc-and-auc/
