
204.6.4 Building SVM model in Python

Building an SVM model in Python.

Link to the previous post : https://statinfer.com/204-6-3-svm-the-algorithm/

We discussed the SVM algorithm in our last post. In this post, we will build an SVM classification model in Python.

SVM in Python

  • There are multiple SVM libraries available in Python.
    • The scikit-learn package (‘sklearn’) is the most widely used for machine learning.
  • Within scikit-learn, the svm module provides the SVC() classifier.
  • There are various options within SVC() to customize the training process; see the sketch after this list.
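For reference, here is a minimal sketch of how the classifier is typically instantiated; kernel, C, and gamma are standard SVC parameters, and the values shown are only illustrative:

# A minimal sketch of a typical SVC call (illustrative values)
from sklearn import svm

clf = svm.SVC(kernel='linear',  # kernel: 'linear', 'poly', 'rbf' or 'sigmoid'
              C=1.0,            # soft-margin penalty for misclassified points
              gamma='scale')    # kernel coefficient used by 'rbf', 'poly' and 'sigmoid'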

Practice : First SVM Learning Problem

  • Dataset: Fraud Transaction/Transactions_sample.csv
  • Draw a classification graph that shows all the classes
  • Build an SVM classifier
  • Draw the classifier on the data plots
  • Predict the (Fraud vs. not-Fraud) class for the data points (Total_Amount=11000, Tr_Count_week=15) and (Total_Amount=2000, Tr_Count_week=4)
  • Download the complete Dataset: Fraud Transaction/Transaction.csv
  • Draw a classification graph that shows all the classes
  • Build an SVM classifier
  • Draw the classifier on the data plots
In [10]:
# Importing the sample data
import pandas as pd
Transactions_sample= pd.read_csv("datasets\\Fraud Transaction\\Transactions_sample.csv")
X = Transactions_sample[['Total_Amount', 'Tr_Count_week']]  # use only these two features as predictors
y = Transactions_sample[['Fraud_id']].values.ravel()
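Optionally, a quick peek at the loaded sample (a small sketch; the columns are the same ones used above) confirms the file was read correctly:

# Quick sanity check on the sample data (optional)
print(Transactions_sample.shape)
print(Transactions_sample[['Total_Amount', 'Tr_Count_week', 'Fraud_id']].head())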
In [11]:
#Drawing a classification graph of all classes
import matplotlib.pyplot as plt

plt.scatter(X['Total_Amount'], X['Tr_Count_week'], c=y, cmap=plt.cm.Paired)
plt.show()
In [12]:
#Building an SVM classifier in Python
from sklearn import svm
import numpy as np

X = Transactions_sample[['Total_Amount', 'Tr_Count_week']]
y = Transactions_sample[['Fraud_id']].values.ravel()

clf = svm.SVC(kernel='linear')
model = clf.fit(X, y)

Predicted = np.zeros(len(Transactions_sample))

# NOTE: if i is in range(0, n), then i takes the values 0, 1, ..., n-1
for i in range(len(Transactions_sample)):
    a = Transactions_sample.Total_Amount[i]
    b = Transactions_sample.Tr_Count_week[i]
    Predicted[i] = clf.predict([[a, b]])[0]
    del a,b
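The loop above scores the sample one row at a time; scikit-learn can also predict all rows in a single call. A minimal equivalent sketch:

# Equivalent vectorized prediction over the whole sample (optional)
Predicted_vec = clf.predict(X)
print((Predicted_vec == Predicted).all())   # should print True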
In [13]:
#Plotting the SVM decision boundary
import matplotlib.pyplot as plt
plt.scatter(X['Total_Amount'], X['Tr_Count_week'], c=y, cmap=plt.cm.Paired)

# The separating line satisfies w[0]*x + w[1]*y + intercept = 0,
# i.e. y = -(w[0]/w[1])*x - intercept/w[1]
w = clf.coef_[0]
o = -w[0] / w[1]
plt.xlim(min(Transactions_sample.Total_Amount), max(Transactions_sample.Total_Amount))
x_min, x_max = plt.xlim()
xx = np.linspace(x_min, x_max)
yy = o * xx - (clf.intercept_[0]) / w[1]

plt.plot(xx, yy, 'k-')
plt.show()
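As a check on the formula used above (a small sketch added here, not part of the original exercise): the fitted linear SVM labels a point as class 1 exactly when w·x + b > 0, so the sign of that score should reproduce clf.predict on the sample.

# Sanity check: sign of w.x + b reproduces clf.predict (assumes the classes are 0/1)
scores = X.values @ clf.coef_[0] + clf.intercept_[0]
manual_labels = (scores > 0).astype(int)
print((manual_labels == clf.predict(X)).all())   # expected: True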
In [14]:
#Predict the (Fraud vs. not-Fraud) class for the data points (Total_Amount=11000, Tr_Count_week=15) and (Total_Amount=2000, Tr_Count_week=4)
#Prediction in SVM
new_data1 = [11000, 15]
new_data2 = [2000, 4]

NewPredicted1 = model.predict([new_data1])
print(NewPredicted1)

NewPredicted2 = clf.predict([new_data2])
print(NewPredicted2)
[1]
[0]
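Optionally, decision_function (a standard SVC method) shows how far each new point falls from the separating line; its sign agrees with the predicted class. A small sketch:

# Signed scores w.x + b for the two new points (positive -> class 1, i.e. Fraud)
print(model.decision_function([new_data1, new_data2]))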
In [15]:
# Importing the whole dataset
import pandas as pd
Transactions= pd.read_csv("datasets\\Fraud Transaction\\Transaction.csv")
X = Transactions[['Total_Amount', 'Tr_Count_week']]  # use only these two features as predictors
y = Transactions[['Fraud_id']].values.ravel()
In [16]:
#Drawing a classification graph of all classes
import matplotlib.pyplot as plt

plt.scatter(X['Total_Amount'], X['Tr_Count_week'], c=y, cmap=plt.cm.Paired)
plt.show()
In [17]:
#Build an SVM classifier
clf = svm.SVC(kernel='linear')
Smodel = clf.fit(X, y)
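Before plotting, it can help to check how well the classifier fits the full dataset; score() (SVC's built-in accuracy) is a quick way to do this. A minimal sketch:

# Training accuracy of the classifier on the full dataset
print(Smodel.score(X, y))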
In [18]:
#Plotting the SVM decision boundary on the full dataset
import matplotlib.pyplot as plt
plt.scatter(X['Total_Amount'], X['Tr_Count_week'], c=y, cmap=plt.cm.Paired)
w = clf.coef_[0]
o = -w[0] / w[1]
plt.xlim(min(Transactions.Total_Amount), max(Transactions.Total_Amount))
x_min, x_max = plt.xlim()
xx = np.linspace(x_min, x_max)
yy = o * xx - (clf.intercept_[0]) / w[1]
plt.plot(xx, yy, 'k-')
plt.show()
The next post is about the non-linear decision boundary.
Link to the next post : https://statinfer.com/204-6-5-the-non-linear-decision-boundary/
