
# 104.3.1 Data Sampling in Python

##### Taking random samples in Python
Often the dataset we are dealing with is too large to handle comfortably in Python. A workaround is to take a random sample from the dataset and work on that instead.
Sampling is appropriate in many situations, because a random sample gives a close representation of the underlying population.
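
To illustrate why a random sample can stand in for the full data, here is a minimal sketch on synthetic data. The population values, the column name `value`, and the seeds are assumptions made up for this example; they are not part of the datasets used in this post.

```python
import numpy as np
import pandas as pd

# Synthetic "population": 100,000 values drawn from a normal distribution.
# These numbers are made up for illustration only.
rng = np.random.default_rng(0)
population = pd.DataFrame({"value": rng.normal(loc=50, scale=10, size=100_000)})

# Draw a random sample of 5,000 rows without replacement.
sample = population.sample(n=5000, random_state=0)

population_mean = population["value"].mean()
sample_mean = sample["value"].mean()

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means should come out close to each other, which is the sense in which the sample represents the population.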

## Sampling in Python

• We use the sample() function of a pandas DataFrame
In [1]:
import pandas as pd

Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.shape

Out[1]:
(541909, 8)
In [2]:
# replace takes a boolean; the string "False" is truthy and would sample with replacement
sample_data=Online_Retail.sample(n=1000,replace=False)
sample_data.shape

Out[2]:
(1000, 8)



Using the .sample() function on our dataset, we have taken a random sample of 1,000 rows out of the 541,909 rows in the full data.
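
The `replace` argument controls whether a row can be picked more than once. The sketch below shows the difference on a small synthetic DataFrame; the frame, its `row_id` column, and the seed are hypothetical stand-ins, not the Online Retail data.

```python
import pandas as pd

# Hypothetical stand-in for a large dataset: 10,000 rows with a unique id.
df = pd.DataFrame({"row_id": range(10_000), "value": range(10_000)})

# Without replacement (the default): every sampled row is distinct.
without_repl = df.sample(n=1000, replace=False, random_state=42)

# With replacement: the same row can be drawn more than once.
with_repl = df.sample(n=1000, replace=True, random_state=42)

print("unique ids without replacement:", without_repl["row_id"].nunique())
print("unique ids with replacement:   ", with_repl["row_id"].nunique())
```

Sampling without replacement is usually what you want when the goal is a smaller copy of the data, since it never duplicates a record.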

## Practice : Sampling in Python

• Import “Census Income Data/Income_data.csv”
• Create a new dataset by taking a random sample of 5000 records
In [3]:
Income_Data=pd.read_csv("datasets\\Census Income Data\\Income_data.csv", encoding = "ISO-8859-1")
Income_Data.shape

Out[3]:
(32561, 15)
In [4]:
# replace must be the boolean False, not the string "False"
sample_Income_Data=Income_Data.sample(n=5000,replace=False)
sample_Income_Data.shape

Out[4]:
(5000, 15)
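
Two other options of `sample()` are often useful in practice: `random_state`, which makes the sample reproducible, and `frac`, which takes a fraction of the rows instead of a fixed count. The sketch below uses a synthetic frame with the same number of rows as the income data; the frame, the column name `record_id`, and the seed are assumptions for illustration.

```python
import pandas as pd

# Synthetic frame with as many rows as the income data (32,561);
# the content is made up for illustration.
data = pd.DataFrame({"record_id": range(32_561)})

# The same random_state gives the same sample on every run.
sample_a = data.sample(n=5000, random_state=7)
sample_b = data.sample(n=5000, random_state=7)
print(sample_a.equals(sample_b))

# frac takes a share of the rows instead of a fixed count (here about 10%).
frac_sample = data.sample(frac=0.1, random_state=7)
print(frac_sample.shape)
```

Fixing `random_state` is a reasonable habit whenever results based on the sample need to be repeated later.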

The next post covers descriptive statistics: mean and median.
Link to the next post: https://statinfer.com/104-3-2-descriptive-statistics-mean-and-median/
9th April 2019
