Link to the previous post: https://statinfer.com/204-6-1-introduction-to-svm/

In this practice session we will apply what we discussed about simple classifiers in the last post.

Practice : Simple Classifiers

Dataset: Fraud Transaction/Transactions_sample.csv
Draw a classification graph that shows all the classes
Build a logistic regression classifier
Draw the classifier on the data plot

Solution

In [1]:

#Importing the dataset:
import pandas as pd
Transactions_sample = pd.read_csv("datasets/Fraud Transaction/Transactions_sample.csv")
Transactions_sample.head(6)

Out[1]:

	id	Total_Amount	Tr_Count_week	Fraud_id
0	16078	7294.60	4.79	0
1	41365	7659.53	2.45	0
2	11666	8259.29	10.77	0
3	11824	11630.25	15.29	1
4	36414	12286.63	22.18	1
5	90	12783.34	16.34	1

In [2]:

#Name of the columns 
Transactions_sample.columns

Out[2]:

Index(['id', 'Total_Amount', 'Tr_Count_week', 'Fraud_id'], dtype='object')

In [3]:

#The classification graph, distinguishing the two classes by color and marker shape.
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==0],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==0], s=10, c='b', marker="o", label='Fraud_id=0')
ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==1],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==1], s=10, c='r', marker="+", label='Fraud_id=1')
plt.legend(loc='upper left');
plt.show()

In [4]:

#Build a logistic regression model
###Logistic Regression model1
import statsmodels.formula.api as sm
model1 = sm.logit(formula='Fraud_id ~ Total_Amount+Tr_Count_week', data=Transactions_sample)
fitted1 = model1.fit()
fitted1.summary()

Optimization terminated successfully.
         Current function value: 0.040114
         Iterations 10

Out[4]:

Logit Regression Results
Dep. Variable:	Fraud_id	No. Observations:	210
Model:	Logit	Df Residuals:	207
Method:	MLE	Df Model:	2
Date:	Mon, 14 Nov 2016	Pseudo R-squ.:	0.9421
Time:	19:18:37	Log-Likelihood:	-8.4239
converged:	True	LL-Null:	-145.55
		LLR p-value:	2.795e-60

	coef	std err	z	P>|z|	[95.0% Conf. Int.]
Intercept	-26.1481	7.817	-3.345	0.001	-41.469 -10.827
Total_Amount	0.0025	0.001	2.386	0.017	0.000 0.005
Tr_Count_week	0.1089	0.270	0.403	0.687	-0.421 0.638
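
To see how these coefficients turn a transaction into a fraud probability, here is a minimal sketch that applies the logistic function by hand to two rows shown in Out[1]. It assumes the rounded coefficients from the summary table, so the results only approximate what the fitted model returns; the helper name `fraud_probability` is hypothetical and not part of the original notebook.

```python
import math

# Rounded coefficients copied from the summary table (an approximation).
B0, B_AMOUNT, B_COUNT = -26.1481, 0.0025, 0.1089

def fraud_probability(total_amount, tr_count_week):
    """Apply the logistic function to the linear predictor."""
    z = B0 + B_AMOUNT * total_amount + B_COUNT * tr_count_week
    return 1.0 / (1.0 + math.exp(-z))

# Rows 1 and 3 of the sample shown in Out[1]
p_row1 = fraud_probability(7659.53, 2.45)    # labelled Fraud_id = 0
p_row3 = fraud_probability(11630.25, 15.29)  # labelled Fraud_id = 1
print(p_row1, p_row3)
```

With a threshold of 0.5, the first row falls on the non-fraud side and the second on the fraud side, which agrees with their labels.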

In [5]:

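# Getting slope and intercept of the line
# Covariance matrix of the estimates (printed for reference; these are NOT the coefficients)
cov_params=fitted1.normalized_cov_params
print(cov_params)

# Fitted coefficients. At threshold 0.5 the boundary is
# Intercept + b1*Total_Amount + b2*Tr_Count_week = 0, solved here for Tr_Count_week
coef=fitted1.params
slope1=coef['Total_Amount']/(-coef['Tr_Count_week'])
intercept1=coef['Intercept']/(-coef['Tr_Count_week'])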

               Intercept  Total_Amount  Tr_Count_week
Intercept      61.106470     -0.008058       1.428005
Total_Amount   -0.008058      0.000001      -0.000237
Tr_Count_week   1.428005     -0.000237       0.072958
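
A logistic classifier with a 0.5 threshold switches class where the linear predictor equals zero, so the boundary in the (Total_Amount, Tr_Count_week) plane is a straight line. The sketch below solves that equation for Tr_Count_week. It assumes the rounded coefficients from the summary table, and `boundary_line` is a hypothetical helper used only for illustration.

```python
def boundary_line(b0, b1, b2):
    """Return (slope, intercept) of the line b0 + b1*x + b2*y = 0, solved for y."""
    if b2 == 0:
        raise ValueError("b2 must be non-zero to write the boundary as y = slope*x + intercept")
    return -b1 / b2, -b0 / b2

# Rounded coefficients from the summary table (Intercept, Total_Amount, Tr_Count_week)
b0, b1, b2 = -26.1481, 0.0025, 0.1089
slope, intercept = boundary_line(b0, b1, b2)

# Any point on the line has a linear predictor of zero, i.e. a probability of 0.5
x = 10000.0
y = slope * x + intercept
linear_predictor = b0 + b1 * x + b2 * y
print(slope, intercept, linear_predictor)
```

Because the coefficient of Tr_Count_week is small relative to its scale, the line is steep in this plot: the class is decided mostly by Total_Amount.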

In [8]:

import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==0],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==0], s=30, c='b', marker="o", label='Fraud_id 0')
ax1.scatter(Transactions_sample.Total_Amount[Transactions_sample.Fraud_id==1],Transactions_sample.Tr_Count_week[Transactions_sample.Fraud_id==1], s=30, c='r', marker="+", label='Fraud_id 1')

plt.xlim(min(Transactions_sample.Total_Amount), max(Transactions_sample.Total_Amount))
plt.ylim(min(Transactions_sample.Tr_Count_week), max(Transactions_sample.Tr_Count_week))

plt.legend(loc='upper left');

x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept1, x_max*slope1+intercept1])
plt.show()

In [9]:

#Accuracy of the model
#Creating the confusion matrix
predicted_values=fitted1.predict(Transactions_sample[["Total_Amount"]+["Tr_Count_week"]])
print('Predicted Values: ', predicted_values[1:10])

threshold=0.5

import numpy as np
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1

print('Predicted Class: ', predicted_class)

from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Transactions_sample[['Fraud_id']],predicted_class)
print('Confusion Matrix: ', ConfusionMatrix)

accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/sum(sum(ConfusionMatrix))
print('Accuracy: ', accuracy)

error=1-accuracy
print('Error: ', error)

Predicted Values:  [ 0.00154015  0.01714584  0.9932035   0.99938783  0.99967144  0.99846609
  0.99793177  0.99981494  0.99991438]
Predicted Class:  [ 0.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.  0.  0.  0.  0.
  1.  0.  0.  0.  1.  0.  1.  1.  0.  1.  0.  1.  0.  0.  0.  0.  1.  0.
  1.  1.  0.  0.  0.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  0.  0.  1.
  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.
  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.
  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.
  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.
  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.
  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.
  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.
  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.
  0.  0.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.]
Confusion Matrix:  [[104   0]
 [  1 105]]
Accuracy:  0.995238095238
Error:  0.0047619047619
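
Accuracy alone can hide which kind of mistake the model makes. As a small sketch, the counts from the confusion matrix printed above can be unpacked by hand; this assumes scikit-learn's layout (true classes in rows, predicted classes in columns), and the variable names are illustrative.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes.
conf_mat = [[104, 0],
            [1, 105]]

(tn, fp), (fn, tp) = conf_mat
total = sum(map(sum, conf_mat))

accuracy = (tn + tp) / total
sensitivity = tp / (tp + fn)   # share of actual frauds that were caught
specificity = tn / (tn + fp)   # share of genuine transactions left alone
print(accuracy, sensitivity, specificity)
```

The single error is a fraud case predicted as non-fraud, so specificity is perfect while sensitivity is slightly below one.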

The next post is about the SVM algorithm.
Link to the next post : https://statinfer.com/204-6-3-svm-the-algorithm/

21st June 2017

204.6.2 Practice : Simple Classifier

Let's put classifiers into practice


