Often the dataset we are dealing with is too large to be handled comfortably in Python. A workaround is to take a random sample from the dataset and work on that instead.
Sampling is appropriate in many situations, because a random sample gives a close representation of the underlying population.
Sampling in Python
- We use the sample() function of a pandas DataFrame
import pandas as pd
Online_Retail = pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding="ISO-8859-1")
Online_Retail.shape
By calling .sample() on our dataset, we have taken a random sample of 1,000 rows out of the 541,909 rows in the full data.
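A minimal sketch of the sampling step, so it can be tried without the CSV file. It assumes a stand-in DataFrame with the same number of rows as the full data; the column names and the name Online_Retail_sample are hypothetical, and random_state is an optional seed added here only to make the draw repeatable.

```python
import numpy as np
import pandas as pd

# Stand-in for Online_Retail so the sketch runs without the CSV file.
# The column names are hypothetical; only the row count matches the text.
rng = np.random.default_rng(0)
Online_Retail = pd.DataFrame({
    "InvoiceNo": np.arange(541909),
    "Quantity": rng.integers(1, 50, size=541909),
})

# Draw 1000 rows at random, without replacement (the default for sample()).
# random_state is an optional seed that makes the same rows come back each run.
Online_Retail_sample = Online_Retail.sample(n=1000, random_state=42)

print(Online_Retail.shape)         # (541909, 2)
print(Online_Retail_sample.shape)  # (1000, 2)
```

With the real data, the stand-in DataFrame would simply be replaced by the Online_Retail DataFrame read from the CSV; the sample() call stays the same.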
Practice: Sampling in Python
- Import “Census Income Data/Income_data.csv”
- Create a new dataset by taking a random sample of 5000 records
Income_Data = pd.read_csv("datasets\\Census Income Data\\Income_data.csv", encoding="ISO-8859-1")
Income_Data.shape
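One possible way to complete the second practice step is sketched below. It assumes a stand-in DataFrame in place of the imported file, so the row count (30,000), the column names, and the name Income_Data_sample are all hypothetical; only the sample size of 5000 comes from the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Income_Data; with the real file, use the
# DataFrame returned by pd.read_csv instead.
rng = np.random.default_rng(1)
Income_Data = pd.DataFrame({
    "age": rng.integers(17, 90, size=30000),
    "hours_per_week": rng.integers(1, 99, size=30000),
})

# New dataset: 5000 records drawn at random, without replacement.
Income_Data_sample = Income_Data.sample(n=5000, random_state=1)
print(Income_Data_sample.shape)  # (5000, 2)

# Quick check that the sample resembles the full data:
# the two means should be close, though not identical.
print(Income_Data["age"].mean(), Income_Data_sample["age"].mean())
```

Comparing a summary statistic such as the mean between the full data and the sample is a simple way to see how closely the sample represents the population.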