Link to the previous post: https://statinfer.com/104-2-1-importing-data-in-python/
In this post we will cover the basic tasks we can perform on a dataset after importing it into Python.
We will complete the following tasks:
Printing the data and meta info
- Import “Superstore Sales Data\Sales_by_country_v1.csv” data.
- Perform the basic checks on the data.
- How many rows and columns are there in this dataset?
- Print only column names in the dataset.
- Print first 10 observations.
- Print the last 5 observations.
- Get the summary of the dataset.
- Print the structure of the data.
- Describe the fields unitsSold and custCountry.
- Create a new dataset by taking first 30 observations from this data.
- Print the resultant data.
- Remove (delete) the new dataset.
In [4]:
import pandas as pd  # import the pandas library
Sales_country = pd.read_csv("datasets\\Superstore Sales Data\\Sales_by_country_v1.csv")
print(Sales_country)  # print the whole dataset
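The double-backslash path in the import step only resolves on Windows. A minimal sketch of a portable alternative, assuming `pathlib` is acceptable here: the tiny `sample` frame, its values, and the temporary folder are stand-ins invented for illustration (only the column names `unitsSold` and `custCountry` and the file name come from this post), so the snippet runs without the real file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in data; only the two column names come from the post.
sample = pd.DataFrame({
    "custCountry": ["India", "USA", "India", "UK"],
    "unitsSold": [10, 25, 7, 12],
})

with tempfile.TemporaryDirectory() as tmp:
    # Build the path from parts so the separator is right on any operating system.
    csv_path = Path(tmp) / "Superstore Sales Data" / "Sales_by_country_v1.csv"
    csv_path.parent.mkdir(parents=True)
    sample.to_csv(csv_path, index=False)

    # read_csv accepts a Path object as well as a plain string.
    sales_demo = pd.read_csv(csv_path)

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = sales_demo.shape
print(n_rows, n_cols)
```

With the real data, the same pattern would start from `Path("datasets")` instead of the temporary folder.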
In [10]:
#How many rows and columns are there in this dataset?
Sales_country.shape
In [11]:
#Print only column names in the dataset
Sales_country.columns.values
In [12]:
#Print first 10 observations
Sales_country.head(10)
In [14]:
#Print the last 5 observations
Sales_country.tail(5)
In [15]:
#Get the summary of the dataset
Sales_country.describe()
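When a dataset mixes numeric and text columns, `describe()` summarizes only the numeric ones by default. A small sketch of how `include="all"` widens the summary, using a made-up four-row frame (the values are hypothetical; only the column names come from this post):

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({
    "custCountry": ["India", "USA", "India", "UK"],
    "unitsSold": [10, 25, 7, 12],
})

# Default behaviour: only the numeric column is summarized.
numeric_summary = demo.describe()

# include="all" adds count, unique, top, and freq for the text column.
full_summary = demo.describe(include="all")

print(numeric_summary)
print(full_summary)
```

Statistics that do not apply to a column (for example the mean of `custCountry`) appear as `NaN` in the combined summary.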
In [17]:
#Print the structure of the data
# Lists the unique values in each column; this is close to str() in R.
Sales_country.apply(lambda x: [x.unique()])
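pandas also provides `DataFrame.info()` and the `dtypes` attribute, which are arguably a closer match to R's `str()` because they report column types and non-null counts rather than unique values. A brief sketch on a made-up frame (values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({
    "custCountry": ["India", "USA", "India", "UK"],
    "unitsSold": [10, 25, 7, 12],
})

# dtypes is a Series mapping each column name to its data type.
column_types = demo.dtypes
print(column_types)

# info() prints the row count, per-column non-null counts, dtypes, and memory use.
demo.info()
```

Note that `info()` prints its report and returns `None`, so there is nothing useful to assign from it.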
In [19]:
#Describe the field unitsSold
Sales_country.unitsSold.describe()
In [20]:
#Describe the field custCountry
# describe() won't give much information about a string variable, so we will also create a frequency table
Sales_country.custCountry.describe()
In [21]:
Sales_country.custCountry.value_counts() #frequency table
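A frequency table can also be expressed as proportions. The sketch below uses the `normalize` parameter of `value_counts()` on a made-up column (the country values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({"custCountry": ["India", "USA", "India", "UK"]})

# Raw counts, sorted from most to least frequent.
counts = demo["custCountry"].value_counts()

# normalize=True returns each value's share of the rows instead of its count.
shares = demo["custCountry"].value_counts(normalize=True)

print(counts)
print(shares)
```

Bracket access (`demo["custCountry"]`) is used here instead of attribute access because it also works for column names that contain spaces or clash with DataFrame methods.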
In [23]:
#Create a new dataset by taking first 30 observations from this data
sales_new = Sales_country.head(30)
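If the new dataset will be modified later, it can be safer to take an explicit copy so that it is clearly independent of the original. A minimal sketch, assuming a made-up 50-row frame; `iloc[:30]` is shown as an equivalent way to select the same rows:

```python
import pandas as pd

# Hypothetical stand-in data: 50 rows numbered 1 to 50.
demo = pd.DataFrame({"unitsSold": range(1, 51)})

# head(30) selects the first 30 rows; copy() makes the result independent.
first_30 = demo.head(30).copy()

# Position-based slicing selects the same rows.
first_30_iloc = demo.iloc[:30]
same_rows = first_30.equals(first_30_iloc)

# Changing the copy leaves the original frame untouched.
first_30.loc[0, "unitsSold"] = 999
print(same_rows, demo.loc[0, "unitsSold"], first_30.loc[0, "unitsSold"])
```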
In [24]:
#Print the resultant data
print(sales_new)
In [25]:
#Remove (delete) the new dataset
del sales_new
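`del` is a statement that unbinds a name; it does not affect other objects such as the original dataset. A short sketch of what happens after deletion, using hypothetical names and values:

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
demo = pd.DataFrame({"unitsSold": [10, 25, 7]})
subset = demo.head(2)

del subset  # unbinds the name "subset"; demo is not affected

# Referring to a deleted name raises NameError.
try:
    subset
    subset_exists = True
except NameError:
    subset_exists = False

print(subset_exists)
print(len(demo))
```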
The next post is about manipulating datasets in Python.
Link to the next post : https://statinfer.com/104-2-3-manipulting-datasets-in-python/