
204.6.2 Practice : Simple Classifier

Let's put classifiers into practice

Link to the previous post : https://statinfer.com/204-6-1-introduction-to-svm/

In this practice session we will apply everything we discussed about simple classifiers in the last post.

Practice : Simple Classifiers

  • Dataset: Fraud Transaction/Transactions_sample.csv
  • Draw a classification graph that shows all the classes
  • Build a logistic regression classifier
  • Draw the classifier on the data plot

Solution

In [1]:
#Importing the dataset:
import pandas as pd
Transactions_sample = pd.read_csv("datasets/Fraud Transaction/Transactions_sample.csv")
Transactions_sample.head(6)
Out[1]:
      id  Total_Amount  Tr_Count_week  Fraud_id
0  16078       7294.60           4.79         0
1  41365       7659.53           2.45         0
2  11666       8259.29          10.77         0
3  11824      11630.25          15.29         1
4  36414      12286.63          22.18         1
5     90      12783.34          16.34         1
In [2]:
#Name of the columns 
Transactions_sample.columns
Out[2]:
Index(['id', 'Total_Amount', 'Tr_Count_week', 'Fraud_id'], dtype='object')
In [3]:
#The classification graph distinguishing the two classes with colors or shapes.
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==0],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==0], s=10, c='b', marker="o", label='Fraud_id=0')
ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==1],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==1], s=10, c='r', marker="+", label='Fraud_id=1')
plt.legend(loc='upper left');
plt.show()
In [4]:
#Build a logistic regression model
###Logistic Regression model1
import statsmodels.formula.api as sm
model1 = sm.logit(formula='Fraud_id ~ Total_Amount+Tr_Count_week', data=Transactions_sample)
fitted1 = model1.fit()
fitted1.summary()
Optimization terminated successfully.
         Current function value: 0.040114
         Iterations 10
Out[4]:
Logit Regression Results
Dep. Variable:   Fraud_id            No. Observations:   210
Model:           Logit               Df Residuals:       207
Method:          MLE                 Df Model:           2
Date:            Mon, 14 Nov 2016    Pseudo R-squ.:      0.9421
Time:            19:18:37            Log-Likelihood:     -8.4239
converged:       True                LL-Null:            -145.55
                                     LLR p-value:        2.795e-60

                    coef   std err        z    P>|z|   [95.0% Conf. Int.]
Intercept       -26.1481     7.817   -3.345    0.001   -41.469   -10.827
Total_Amount      0.0025     0.001    2.386    0.017     0.000     0.005
Tr_Count_week     0.1089     0.270    0.403    0.687    -0.421     0.638
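To see how the fitted coefficients turn into a prediction, here is a minimal sketch that applies the logistic function by hand. It assumes the rounded coefficients printed in the summary table, so its results will differ slightly from what the fitted model itself returns. The function name `fraud_probability` is hypothetical and used only for illustration.

```python
import math

# Coefficients copied from the summary table (rounded there to 4 decimals,
# so the results are approximate).
INTERCEPT = -26.1481
B_TOTAL_AMOUNT = 0.0025
B_TR_COUNT_WEEK = 0.1089

def fraud_probability(total_amount, tr_count_week):
    """Hypothetical helper: p = 1 / (1 + exp(-z)), where z is the linear score."""
    z = INTERCEPT + B_TOTAL_AMOUNT * total_amount + B_TR_COUNT_WEEK * tr_count_week
    return 1.0 / (1.0 + math.exp(-z))

# Rows 0 and 3 of Transactions_sample.head(6)
p_row0 = fraud_probability(7294.60, 4.79)    # labelled Fraud_id = 0
p_row3 = fraud_probability(11630.25, 15.29)  # labelled Fraud_id = 1
print(p_row0, p_row3)
```

With a threshold of 0.5, the first row falls on the non-fraud side and the fourth row on the fraud side, which agrees with their labels.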
In [5]:
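# Getting slope and intercept of the line
# The classifier line is where Intercept + b1*Total_Amount + b2*Tr_Count_week = 0,
# so it is computed from the fitted coefficients (fitted1.params).
coef=fitted1.params

# Normalized covariance matrix of the estimates, printed for reference only
# (these are not the coefficients)
print(fitted1.normalized_cov_params)

slope1=coef["Total_Amount"]/(-coef["Tr_Count_week"])
intercept1=coef["Intercept"]/(-coef["Tr_Count_week"])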
               Intercept  Total_Amount  Tr_Count_week
Intercept      61.106470     -0.008058       1.428005
Total_Amount   -0.008058      0.000001      -0.000237
Tr_Count_week   1.428005     -0.000237       0.072958
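The classifier line is the set of points where the predicted probability is exactly 0.5, that is, where the linear score b0 + b1*Total_Amount + b2*Tr_Count_week equals zero. Solving for Tr_Count_week gives a slope of -b1/b2 and an intercept of -b0/b2. The sketch below works this out with the rounded coefficients from the summary table, so the numbers are approximate. The helper name `boundary_tr_count` is hypothetical.

```python
# Rounded coefficients from the Logit summary table (an approximation).
b0 = -26.1481   # Intercept
b1 = 0.0025     # Total_Amount
b2 = 0.1089     # Tr_Count_week

# b0 + b1*x + b2*y = 0  =>  y = (-b0/b2) + (-b1/b2) * x
slope = -b1 / b2
intercept = -b0 / b2

def boundary_tr_count(total_amount):
    """Hypothetical helper: Tr_Count_week on the line for a given Total_Amount."""
    return intercept + slope * total_amount

# Sanity check: the linear score should be zero anywhere on the line.
x = 10000.0
y = boundary_tr_count(x)
score_on_line = b0 + b1 * x + b2 * y
print(slope, intercept, score_on_line)
```

Points above the line get a positive score (predicted fraud) because the Tr_Count_week coefficient is positive; points below it get a negative score.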
In [8]:
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==0],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==0], s=30, c='b', marker="o", label='Fraud_id 0')
ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==1],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==1], s=30, c='r', marker="+", label='Fraud_id 1')

plt.xlim(min(Transactions_sample.Total_Amount), max(Transactions_sample.Total_Amount))
plt.ylim(min(Transactions_sample.Tr_Count_week), max(Transactions_sample.Tr_Count_week))

plt.legend(loc='upper left');

# Draw the classifier line across the visible x-range
x_min, x_max = ax1.get_xlim()
ax1.plot([x_min, x_max], [x_min*slope1+intercept1, x_max*slope1+intercept1])
plt.show()
In [9]:
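#Accuracy of the model
#Creating the confusion matrix
#Predicted probability of Fraud_id=1 for every row
predicted_values=fitted1.predict(Transactions_sample[["Total_Amount", "Tr_Count_week"]])
#Note: the slice [1:10] shows rows 1 to 9 and skips row 0
print('Predicted Values: ', predicted_values[1:10])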

threshold=0.5

import numpy as np
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1

print('Predicted Class: ', predicted_class)

from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Transactions_sample[['Fraud_id']],predicted_class)
print('Confusion Matrix: ', ConfusionMatrix)

#Accuracy = correctly classified rows (diagonal) / all rows
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/ConfusionMatrix.sum()
print('Accuracy: ', accuracy)

error=1-accuracy
print('Error: ', error)
Predicted Values:  [ 0.00154015  0.01714584  0.9932035   0.99938783  0.99967144  0.99846609
  0.99793177  0.99981494  0.99991438]
Predicted Class:  [ 0.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  0.  0.  0.  0.
  1.  0.  0.  0.  1.  0.  1.  1.  0.  1.  0.  1.  0.  0.  0.  0.  1.  0.
  1.  1.  0.  0.  0.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  0.  0.  1.
  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.
  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.
  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.
  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.
  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.
  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.
  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.
  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.
  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.]
Confusion Matrix:  [[104   0]
 [  1 105]]
Accuracy:  0.995238095238
Error:  0.0047619047619
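Accuracy is only one summary of a confusion matrix. As a small sketch, the counts printed above can also be turned into precision and recall for the fraud class. It assumes the scikit-learn layout, where rows are actual classes and columns are predicted classes; the variable names are illustrative.

```python
# Confusion matrix printed above: rows = actual class, columns = predicted class.
confusion = [[104, 0],
             [1, 105]]

tn, fp = confusion[0]   # actual Fraud_id = 0
fn, tp = confusion[1]   # actual Fraud_id = 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total     # share of all rows classified correctly
precision = tp / (tp + fp)       # share of predicted frauds that are frauds
recall = tp / (tp + fn)          # share of actual frauds that were caught

print('Accuracy: ', accuracy)
print('Precision: ', precision)
print('Recall: ', recall)
```

The single misclassified row is a fraud predicted as non-fraud, so it lowers recall while precision stays at 1.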

The next post is about the SVM algorithm.
Link to the next post : https://statinfer.com/204-6-3-svm-the-algorithm/
