Often the dataset we are dealing with is too large to be handled comfortably in Python. A workaround is to take a random sample from the dataset and work on that instead.
In many situations sampling is appropriate, because a sufficiently large random sample closely represents the underlying population.
Sampling in Python
- We use the sample() function of a pandas DataFrame
In [1]:
import pandas as pd
Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.shape
Out[1]:
In [2]:
# replace must be the boolean False; the string "False" is truthy and would sample with replacement
sample_data=Online_Retail.sample(n=1000,replace=False)
sample_data.shape
Out[2]:
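The sampling step above can also be made repeatable and expressed as a fraction of the rows. The following is a minimal sketch, not the original notebook's code: it uses a synthetic DataFrame (the Quantity and UnitPrice columns and their values are made up for illustration) so that it runs without the Online Retail file. It relies on the random_state and frac parameters of sample().

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a large dataset (hypothetical columns, not the Online Retail data).
rng = np.random.default_rng(0)
population = pd.DataFrame({
    "Quantity": rng.integers(1, 50, size=100_000),
    "UnitPrice": rng.uniform(0.5, 20.0, size=100_000),
})

# Draw 1,000 rows without replacement; random_state makes the draw repeatable.
sample_n = population.sample(n=1000, replace=False, random_state=42)

# Alternatively, draw a fraction of the rows (here 1% of 100,000 rows).
sample_frac = population.sample(frac=0.01, random_state=42)

print(sample_n.shape)     # (1000, 2)
print(sample_frac.shape)  # (1000, 2)

# Without replacement, no row is picked twice, so the index stays unique.
print(sample_n.index.is_unique)  # True

# Compare a summary statistic between the full data and the sample.
print(population["UnitPrice"].mean(), sample_n["UnitPrice"].mean())
```

The two means printed on the last line should be close to each other, which is the sense in which a random sample represents the underlying population. Passing the same random_state again returns the same rows, which helps when results need to be reproduced.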