
104.3.1 Data Sampling in Python

Taking random samples in Python.
Often the dataset we are dealing with is too large to handle comfortably in Python. A common workaround is to draw a random sample from the dataset and work with that instead.
Sampling is appropriate in many situations because a random sample of adequate size gives a close approximation of the underlying population.
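If the file is too large to load at all, one option is to sample rows while reading, so the full table never sits in memory. This is a swapped-in technique, not the one used in the cells below (they load the whole file first). pandas read_csv accepts a callable for skiprows; it is called with each 0-based row index and returns True to skip that row. The sketch keeps the header and keeps each data row with probability p. The helper name sample_csv_rows, the parameter p, and the small in-memory CSV are hypothetical stand-ins for a real large file.

```python
import io
import random

import pandas as pd


def sample_csv_rows(source, p, seed=None):
    """Hypothetical helper: read roughly a fraction p of a CSV's data rows.

    Each data row is kept independently with probability p, so the number of
    rows returned varies around p * total_rows rather than being exact.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    # skiprows is called with the 0-based line index; returning True skips it.
    # Row 0 is the header, so it is never skipped.
    return pd.read_csv(source, skiprows=lambda i: i > 0 and rng.random() > p)


# Hypothetical stand-in for a large file: 10,000 data rows held in memory.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10_000))
sample = sample_csv_rows(io.StringIO(csv_text), p=0.05, seed=1)
print(sample.shape)  # (n_rows, 2), with n_rows varying around 500
```

The trade-off is that the sample size is only approximately p times the row count; when an exact size matters and the data fits in memory, the sample() method shown below is simpler.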

Sampling in Python

  • Use the sample() method of a pandas DataFrame
In [1]:
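import pandas as pd

# Read the full dataset; the file is decoded as ISO-8859-1 (Latin-1) instead of the default UTF-8
Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.shape  # (rows, columns)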
Out[1]:
(541909, 8)
In [2]:
# Draw 1000 rows without replacement; replace takes the boolean False, not the string "False"
sample_data=Online_Retail.sample(n=1000,replace=False)
sample_data.shape
Out[2]:
(1000, 8)

Using the .sample() method, we have taken a random sample of 1000 rows out of the 541909 rows in the full dataset.
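DataFrame.sample takes either n (an exact number of rows) or frac (a fraction of the rows), along with replace (a boolean, default False) and random_state (a seed that makes the draw reproducible). The sketch below uses a small hypothetical DataFrame instead of the retail data. It also shows why replace must be the boolean False and not the string "False": any non-empty string is truthy in Python, so the string does not switch replacement off.

```python
import pandas as pd

# Hypothetical toy data standing in for Online_Retail: 100 rows.
df = pd.DataFrame({"row_id": range(100)})

# n: an exact number of rows, drawn without replacement.
by_count = df.sample(n=10, replace=False, random_state=42)

# frac: a fraction of the rows; 0.1 of 100 rows is 10 rows.
by_fraction = df.sample(frac=0.1, random_state=42)

# random_state fixes the pseudo-random draw, so repeating the call returns the same rows.
repeat = df.sample(n=10, replace=False, random_state=42)

# replace=True allows the same row to be drawn more than once.
with_replacement = df.sample(n=100, replace=True, random_state=0)

print(len(by_count), len(by_fraction))      # 10 10
print(by_count.index.equals(repeat.index))  # True
print(by_count.index.is_unique)             # True: no row repeats without replacement
print(with_replacement.index.is_unique)     # almost certainly False with 100 draws from 100 rows
print(bool("False"))                        # True: the string "False" is truthy
```

Setting random_state is optional, but it lets anyone rerunning the notebook get the same sample.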

Practice : Sampling in Python

  • Import “Census Income Data/Income_data.csv”
  • Create a new dataset by taking a random sample of 5000 records
In [3]:
Income_Data=pd.read_csv("datasets\\Census Income Data\\Income_data.csv", encoding = "ISO-8859-1")
Income_Data.shape
Out[3]:
(32561, 15)
In [4]:
# Draw 5000 rows without replacement (boolean False, not the string "False")
sample_Income_Data=Income_Data.sample(n=5000,replace=False)
sample_Income_Data.shape
Out[4]:
(5000, 15)
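One rough way to check that a sample is a near representation of the population is to compare a summary statistic between the two. The sketch below does not use Income_data.csv, whose column names are not shown here; instead it builds synthetic data with a hypothetical categorical column income (an arbitrary 25% share of ">50K") and compares each category's share in the full data with its share in a 5000-row sample.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Income_Data: 32,561 rows with one hypothetical categorical column.
rng = np.random.default_rng(0)
population = pd.DataFrame(
    {"income": rng.choice(["<=50K", ">50K"], size=32_561, p=[0.75, 0.25])}
)

# Draw 5000 rows without replacement, seeded for reproducibility.
sample_income = population.sample(n=5000, replace=False, random_state=1)

# Share of each category in the full data and in the sample, aligned by category.
comparison = pd.DataFrame(
    {
        "population": population["income"].value_counts(normalize=True),
        "sample": sample_income["income"].value_counts(normalize=True),
    }
)
comparison["difference"] = comparison["sample"] - comparison["population"]
print(comparison)
```

With 5000 rows the differences are typically well under one percentage point; a large gap would suggest the sample is too small or was not drawn at random.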

The next post covers descriptive statistics: mean and median.
Link to the next post : https://statinfer.com/104-3-2-descriptive-statistics-mean-and-median/

Statinfer

The name Statinfer is derived from Statistical Inference. We provide training in various Data Analytics and Data Science courses and assist candidates in securing placements.

Contact Us

info@statinfer.com

+91- 9676098897

+91- 9494762485

 

© 2020. All Rights Reserved.