
104.1.5 Python Packages

Using Python packages.

Link to the previous post: https://statinfer.com/104-1-4-conditional-operators-in-python/

Packages

  • A package is a collection of Python functions, shipped as properly structured (and sometimes compiled) code. A package may contain many sub-packages.
  • Many Python functions are only available via “packages” that must be imported.
  • For example, to find the value of log(10) we first need to import the math package, which contains the log function.
In [113]:
log(10)
exp(5)
sqrt(256)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-113-b0bfd28c096c> in <module>()
----> 1 log(10)
      2 exp(5)
      3 sqrt(256)

NameError: name 'log' is not defined
In [115]:
import math
math.log(10)
Out[115]:
2.302585092994046
In [116]:
math.exp(5)
Out[116]:
148.4131591025766
In [117]:
math.sqrt(256)
Out[117]:
16.0
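The import above is only one of several ways to bring a package into a session. As a small sketch (the variable names here are illustrative, not from the original notebook), the same three calls can be written with the full package name, with a direct function import, or with an alias:

```python
import math                 # import the whole package; call functions as math.<name>
from math import sqrt       # import a single function directly into the namespace
import math as m            # import the package under a shorter alias

log_value = math.log(10)    # natural logarithm (base e), not base 10
root_value = sqrt(256)      # no "math." prefix needed after "from ... import"
exp_value = m.exp(5)        # same exp function, reached through the alias

print(log_value, root_value, exp_value)
```

The alias form is the one used for NumPy and Pandas in the rest of this post (`import numpy as np`, `import pandas as pd`).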
  • To be a good data scientist in Python, one needs to be very comfortable with the packages below:
    • NumPy
    • SciPy
    • Pandas
    • Scikit-Learn
    • Matplotlib

Important Packages- NumPy

  • NumPy provides fast operations on vectors and matrices, including mathematical and logical operations, shape manipulation, sorting, and selecting.
  • It is the foundation on which the higher-level tools for scientific Python are built.
In [118]:
import numpy as np

income = np.array([9000, 8500, 9800, 12000, 7900, 6700, 10000])
print(income) 
print(income[0])
[ 9000  8500  9800 12000  7900  6700 10000]
9000
In [119]:
expenses=income*0.65
print(expenses)
[ 5850.  5525.  6370.  7800.  5135.  4355.  6500.]
In [120]:
savings=income-expenses
print(savings)
[ 3150.  2975.  3430.  4200.  2765.  2345.  3500.]
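The cells above cover the mathematical operations. A minimal sketch of the logical, selecting, sorting, and shape-manipulation operations on the same income array might look like this (the variable names are illustrative):

```python
import numpy as np

income = np.array([9000, 8500, 9800, 12000, 7900, 6700, 10000])

high_income = income > 9000        # logical: element-wise comparison gives a boolean array
selected = income[high_income]     # selecting: keep only entries where the mask is True
ordered = np.sort(income)          # sorting: returns a sorted copy, income is unchanged
as_column = income.reshape(-1, 1)  # shape manipulation: 7 values become a 7x1 column

print(selected)
print(ordered)
print(as_column.shape)
```

Each operation works on the whole array at once, with no explicit loop.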

Important Packages- Pandas

  • Data frames and data handling.
  • Pandas provides data structures and operations for manipulating numerical tables and time series.
In [121]:
import pandas as pd
buyer_profile = pd.read_csv('datasets\\Buyers Profiles\\Train_data.csv')

print(buyer_profile)
    Age  Gender Bought
0    29    Male    Yes
1    34    Male    Yes
2    13  Female    Yes
3    27  Female     No
4    10  Female     No
5    68    Male    Yes
6    15    Male    Yes
7    53    Male    Yes
8    51    Male     No
9    48  Female     No
10   63  Female     No
11   43    Male    Yes
12    8  Female     No
13   47  Female     No
In [126]:
buyer_profile.Age
Out[126]:
0     29
1     34
2     13
3     27
4     10
5     68
6     15
7     53
8     51
9     48
10    63
11    43
12     8
13    47
Name: Age, dtype: int64
In [122]:
buyer_profile.Gender
Out[122]:
0       Male
1       Male
2     Female
3     Female
4     Female
5       Male
6       Male
7       Male
8       Male
9     Female
10    Female
11      Male
12    Female
13    Female
Name: Gender, dtype: object
In [124]:
buyer_profile.Age[0]
Out[124]:
29
In [125]:
buyer_profile.Age[0:10]
Out[125]:
0    29
1    34
2    13
3    27
4    10
5    68
6    15
7    53
8    51
9    48
Name: Age, dtype: int64
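Column access is usually combined with filtering and summarising. The sketch below does not read the CSV file; it assumes a small data frame typed in by hand from the first five rows printed above, and the names `buyers`, `adults`, and `mean_age_by_gender` are illustrative:

```python
import pandas as pd

# First five rows of the buyer table shown above, entered by hand
buyers = pd.DataFrame({
    "Age": [29, 34, 13, 27, 10],
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Bought": ["Yes", "Yes", "Yes", "No", "No"],
})

adults = buyers[buyers["Age"] >= 18]    # row filter using a boolean condition
first_age = buyers["Age"].iloc[0]       # position-based access to a single value
mean_age_by_gender = buyers.groupby("Gender")["Age"].mean()  # summary per group

print(adults)
print(first_age)
print(mean_age_by_gender)
```

The bracket form `buyers["Age"]` is equivalent to `buyers.Age` and also works for column names that contain spaces.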

Important Packages- Scikit-Learn

  • Machine learning algorithms made easy
In [127]:
import sklearn as sk
import pandas as pd

air_passengers = pd.read_csv('datasets\\AirPassengers\\AirPassengers.csv')
air_passengers
Out[127]:
Week_num Passengers Promotion_Budget Service_Quality_Score Holiday_week Delayed_Cancelled_flight_ind Inter_metro_flight_ratio Bad_Weather_Ind Technical_issues_ind
0 1 37824 517356 4.00000 NO NO 0.70 YES YES
1 2 43936 646086 2.67466 NO YES 0.80 YES YES
2 3 42896 638330 3.29473 NO NO 0.90 NO NO
3 4 35792 506492 3.85684 NO NO 0.40 NO NO
4 5 38624 609658 3.90757 NO NO 0.87 NO YES
5 6 35744 476084 3.83710 NO YES 0.66 YES NO
6 7 40752 635978 3.60259 NO YES 0.74 YES NO
7 8 34592 495152 3.60086 NO YES 0.39 NO NO
8 9 35136 429800 3.62776 NO NO 0.61 NO YES
9 10 43328 613326 2.98305 NO NO 0.66 NO NO
10 11 34960 492758 3.60089 NO NO 0.77 NO NO
11 12 44464 600726 2.56064 NO YES 0.74 YES NO
12 13 36464 456960 3.89655 NO YES 0.39 YES NO
13 14 44464 586096 2.47713 NO YES 0.79 YES NO
14 15 51888 704802 1.77422 YES YES 0.72 YES YES
15 16 36800 536970 3.92254 NO NO 0.43 NO YES
16 17 48688 742308 1.93589 NO NO 0.90 NO YES
17 18 37456 500234 3.99060 NO NO 0.46 NO NO
18 19 44800 570682 2.43241 NO YES 0.79 YES YES
19 20 56032 826420 1.41139 YES YES 0.80 YES NO
20 21 58800 761040 1.24488 YES NO 0.69 NO NO
21 22 57440 753466 1.36091 YES NO 0.60 NO NO
22 23 32752 502712 3.37428 NO YES 0.45 YES YES
23 24 43424 653856 2.88878 NO YES 0.89 YES YES
24 25 45968 706748 2.31898 NO YES 0.62 YES NO
25 26 38816 532602 3.85307 NO NO 0.75 NO YES
26 27 35168 518070 3.70671 NO YES 0.47 YES YES
27 28 34496 539378 3.48455 NO YES 0.78 YES YES
28 29 34208 414120 3.48166 NO YES 0.38 YES NO
29 30 44320 653338 2.58325 NO NO 0.71 NO YES
... (rows 30 to 49 are not shown in the notebook display)
50 51 43728 590492 2.77882 NO YES 0.47 YES NO
51 52 47040 694568 2.06989 NO YES 0.55 YES NO
52 53 34512 493444 3.57125 NO NO 0.74 NO YES
53 54 57600 781718 1.35511 YES NO 0.67 NO YES
54 55 36064 526162 3.87218 NO YES 0.73 NO YES
55 56 49392 707070 1.91865 NO NO 0.75 NO NO
56 57 42378 545510 3.46630 NO NO 0.62 NO YES
57 58 38584 555170 3.99116 NO NO 0.77 NO NO
58 59 28700 405916 3.07021 NO NO 0.72 NO NO
59 60 55160 738794 1.48667 YES YES 0.71 YES NO
60 61 52472 666778 1.58686 YES YES 0.90 YES NO
61 62 54474 715498 1.52341 YES YES 0.55 YES NO
62 63 54222 754418 1.58647 YES NO 0.78 YES NO
63 64 73444 1012130 0.91298 YES YES 0.90 YES NO
64 65 67130 1003002 0.98050 YES NO 0.79 NO YES
65 66 39984 589526 3.77575 NO NO 0.81 NO NO
66 67 41972 550872 3.49699 NO YES 0.68 YES YES
67 68 43722 652680 2.84565 NO YES 0.69 YES NO
68 69 76972 1041796 0.87470 YES YES 0.90 YES NO
69 70 58156 881818 1.33013 YES NO 0.82 NO NO
70 71 52304 679938 1.68678 YES NO 0.63 NO YES
71 72 76524 1024450 0.87933 YES YES 0.90 YES NO
72 73 60620 844578 1.15504 YES NO 0.90 NO YES
73 74 32018 445424 3.23666 NO YES 0.64 YES YES
74 75 51814 669144 1.87321 YES NO 0.88 NO YES
75 76 66934 927696 1.07138 YES YES 0.84 NO NO
76 77 81228 1108254 0.85536 YES YES 0.90 YES NO
77 78 43288 638162 3.08191 NO NO 0.62 NO NO
78 79 43834 636636 2.75382 NO YES 0.79 YES YES
79 80 40852 575008 3.52768 NO YES 0.54 YES YES

80 rows × 9 columns

In [130]:
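# A pandas Series has no reshape method in recent pandas versions,
# so convert to a NumPy array first and reshape that into a column
x=air_passengers['Promotion_Budget'].values.reshape(-1,1)
y=air_passengers['Passengers'].values.reshape(-1,1)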
In [132]:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(x, y)
Out[132]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [133]:
print('Coefficients: \n', reg.coef_)
Coefficients: 
 [[ 0.06952969]]
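Besides the coefficient, a fitted model also exposes an intercept and can predict new values. The sketch below assumes only the first five rows of the table above, typed in by hand, so its coefficient will not match the one fitted on all 80 rows. The budget of 600000 is a hypothetical input chosen for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First five rows of Promotion_Budget and Passengers from the table above
budget = np.array([517356, 646086, 638330, 506492, 609658]).reshape(-1, 1)
passengers = np.array([37824, 43936, 42896, 35792, 38624])

model = LinearRegression()
model.fit(budget, passengers)    # features must be 2-D, the target may be 1-D

slope = model.coef_[0]           # extra passengers per unit of promotion budget
intercept = model.intercept_     # predicted passengers at a budget of zero
predicted = model.predict(np.array([[600000]]))  # hypothetical new budget

print(slope, intercept, predicted[0])
```

The prediction is simply `intercept + slope * budget`, which is all a one-variable linear regression does.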

Important Packages- Matplotlib

Plotting library similar to MATLAB plots

In [3]:
import numpy as np
import matplotlib.pyplot as plt

#to print the plot in the notebook:
%matplotlib inline

X = np.random.normal(0,1,1000)
Y = np.random.normal(0,1,1000)

plt.scatter(X,Y)
Out[3]:
<matplotlib.collections.PathCollection at 0x65ae2b0>
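A plot is easier to read with axis labels and a title. The following is a sketch of the same scatter plot that also runs outside a notebook. The "Agg" backend, the seed value 0, the marker settings, and the label text are all assumptions made for this example:

```python
import matplotlib
matplotlib.use("Agg")              # non-interactive backend, no notebook required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)     # seed 0 is arbitrary, used only for repeatability
X = rng.normal(0, 1, 1000)         # 1000 draws from a standard normal distribution
Y = rng.normal(0, 1, 1000)

fig, ax = plt.subplots()
points = ax.scatter(X, Y, s=10, alpha=0.5)  # small, semi-transparent markers
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("1000 standard normal points")
```

Calling `fig.savefig` with a file name of your choice writes the figure to disk. In a notebook with `%matplotlib inline`, the figure is displayed directly below the cell.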
