Often the dataset we are dealing with is too large to handle comfortably in Python. A workaround is to take a random sample from the dataset and work on that instead.

Sampling is appropriate in many situations, because a random sample gives a close representation of the underlying population.
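As a rough illustration of that claim, the sketch below compares the mean of a random sample with the mean of the full data. The population here is synthetic (a hypothetical `value` column of normally distributed numbers), not one of the datasets used in this post, so treat it only as a demonstration of the idea.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in population (hypothetical data, not from this post's datasets)
rng = np.random.default_rng(0)
population = pd.DataFrame({"value": rng.normal(loc=50, scale=10, size=100_000)})

# Draw a simple random sample of 1,000 rows without replacement
sample = population.sample(n=1000, random_state=0)

population_mean = population["value"].mean()
sample_mean = sample["value"].mean()

# The two means should be close, though not identical
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

With only 1% of the rows, the sample mean typically lands within a fraction of a unit of the population mean, which is why working on a sample is often good enough for exploration.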

Sampling in Python

We use the sample() function of a pandas DataFrame.

In [1]:

import pandas as pd 

Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.shape

Out[1]:

(541909, 8)

In [2]:

# replace=False (a boolean, not the string "False") samples without replacement
sample_data=Online_Retail.sample(n=1000,replace=False)
sample_data.shape

Out[2]:

(1000, 8)

Using the .sample() function on our dataset, we have taken a random sample of 1000 rows out of the 541909 rows of the full data.
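The effect of the `replace` argument can be checked on a small example. The sketch below uses a hypothetical 100-row DataFrame (the `row_id` column is made up for illustration) and also passes `random_state`, an optional seed that makes the sample reproducible. Note that `replace` expects a boolean; in Python a non-empty string such as "False" is truthy, so the boolean `False` is the safe way to ask for sampling without replacement.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the full dataset
df = pd.DataFrame({"row_id": range(100)})

# Without replacement (the default): each row appears at most once
without_repl = df.sample(n=50, replace=False, random_state=42)

# With replacement: the same row may be drawn more than once,
# so n is allowed to exceed the number of rows
with_repl = df.sample(n=200, replace=True, random_state=42)

# Reusing the same random_state reproduces the same sample
again = df.sample(n=50, replace=False, random_state=42)

print(without_repl.index.is_unique)   # True
print(with_repl.index.is_unique)      # False (200 draws from 100 rows must repeat)
print(without_repl.equals(again))     # True
```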

Practice : Sampling in Python

Import “Census Income Data/Income_data.csv”
Create a new dataset by taking a random sample of 5000 records

In [3]:

Income_Data=pd.read_csv("datasets\\Census Income Data\\Income_data.csv", encoding = "ISO-8859-1")
Income_Data.shape

Out[3]:

(32561, 15)

In [4]:

# replace=False (a boolean, not the string "False") samples without replacement
sample_Income_Data=Income_Data.sample(n=5000,replace=False)
sample_Income_Data.shape

Out[4]:

(5000, 15)
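Instead of a fixed number of rows, sample() can also take a fraction of the data through its `frac` argument. The sketch below is only an illustration: it uses a hypothetical one-column DataFrame with the same number of rows as Income_Data, and the 15% fraction is an arbitrary choice that gives a sample of roughly 5000 rows.

```python
import pandas as pd

# Hypothetical frame with the same number of rows as Income_Data
df = pd.DataFrame({"row_id": range(32561)})

# Sample about 15% of the rows instead of a fixed count;
# random_state is an optional seed for reproducibility
sample_frac = df.sample(frac=0.15, random_state=1)

print(sample_frac.shape)
```

Use either `n` or `frac`, not both; pandas raises an error if both are given.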

The next post is about descriptive statistics: mean and median.
Link to the next post : https://statinfer.com/104-3-2-descriptive-statistics-mean-and-median/

9th April 2019

104.3.1 Data Sampling in Python

Taking random samples in python.


Statinfer
