Statinfer

104.3.1 Data Sampling in Python

Taking random samples in Python.
The dataset we are dealing with is often too large to handle comfortably in Python. A workaround is to take a random sample from the dataset and work on that instead.
In many situations sampling is appropriate, because a sufficiently large random sample gives a close representation of the underlying population.

Sampling in Python

  • We use the sample() function of a pandas DataFrame
In [1]:
import pandas as pd 

Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.shape
Out[1]:
(541909, 8)
In [2]:
# replace must be the boolean False; the string "False" is truthy and would act like replace=True
sample_data=Online_Retail.sample(n=1000,replace=False)
sample_data.shape
Out[2]:
(1000, 8)

Calling .sample() on our data set gives a random sample of 1,000 rows out of the 541,909 rows in the full data.
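The options of .sample() can be tried out on a small stand-in table. The sketch below is only an illustration: the DataFrame `df` and its columns are made up (not the Online Retail data), and `random_state=42` is an arbitrary seed chosen so that the result is repeatable. It also shows why `replace` should be the boolean `False` rather than the string `"False"`: any non-empty string is truthy in Python.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a large data set (not the Online Retail file)
df = pd.DataFrame({"id": np.arange(10_000), "value": np.arange(10_000) % 7})

# Fixed number of rows, drawn without replacement;
# random_state makes the same rows come back on every run
sample_n = df.sample(n=1000, replace=False, random_state=42)

# A fraction of the rows instead of a fixed count (10% of 10,000 rows)
sample_frac = df.sample(frac=0.1, random_state=42)

# A non-empty string is truthy, so replace="False" would not mean "no replacement"
string_flag_is_truthy = bool("False")

print(sample_n.shape, sample_frac.shape)
print(sample_n.index.is_unique)   # no row is drawn twice without replacement
print(string_flag_is_truthy)
```

Without replacement, every row appears at most once in the sample, which is usually what is wanted when the sample stands in for the full data.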

Practice : Sampling in Python

  • Import “Census Income Data/Income_data.csv”
  • Create a new dataset by taking a random sample of 5000 records
In [3]:
Income_Data=pd.read_csv("datasets\\Census Income Data\\Income_data.csv", encoding = "ISO-8859-1")
Income_Data.shape
Out[3]:
(32561, 15)
In [4]:
sample_Income_Data=Income_Data.sample(n=5000,replace=False)
sample_Income_Data.shape
Out[4]:
(5000, 15)
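One quick way to see that a random sample gives a close representation of the population is to compare a summary statistic on both. The sketch below uses made-up data: the `population` table, its `age` column, and the seeds are assumptions for illustration, not the Census Income data.

```python
import numpy as np
import pandas as pd

# Hypothetical population: 30,000 synthetic ages (not the Census Income data)
rng = np.random.default_rng(0)
population = pd.DataFrame({"age": rng.integers(18, 90, size=30_000)})

# Random sample of 5,000 records, drawn without replacement
sample = population.sample(n=5000, replace=False, random_state=0)

# Compare the mean of the full data with the mean of the sample
pop_mean = population["age"].mean()
sample_mean = sample["age"].mean()

print("population mean:", round(pop_mean, 2))
print("sample mean:    ", round(sample_mean, 2))
print("difference:     ", round(abs(pop_mean - sample_mean), 2))
```

The two means should come out close to each other. The same check can be run on a numeric column of the real data after sampling.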

The next post covers descriptive statistics: mean and median.
Link to the next post : https://statinfer.com/104-3-2-descriptive-statistics-mean-and-median/
