Link to the previous post: https://statinfer.com/104-2-3-manipulting-datasets-in-python/
In the previous post we saw how to manipulate a dataset using Python. In this post we put what we learned into practice.
Subsetting the data
- Data : “./Bank Marketing/bank_market.csv”
- Create separate datasets for each of the below tasks
- Select first 1000 rows only
- Select only four columns “Cust_num” “age” “default” and “balance”
- Select 20,000 to 40,000 observations along with four variables “Cust_num” “job” “marital” and “education”
- Select 5000 to 6000 observations and drop “poutcome” and “y”
In [37]:
import pandas as pd
#Load the bank marketing data and check its dimensions (rows, columns)
bank_data=pd.read_csv("datasets\\Bank Marketing\\bank_market.csv")
bank_data.shape
Out[37]:
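The path in the cell above uses escaped backslashes, which resolve only on Windows. As a minimal sketch, assuming the same `datasets/Bank Marketing` folder layout, the path can be built with `pathlib` so that it works on any operating system. The existence check is an addition for illustration and is not part of the original exercise.

```python
from pathlib import Path

import pandas as pd

# Assumed folder layout, taken from the read_csv call above.
csv_path = Path("datasets") / "Bank Marketing" / "bank_market.csv"

# Read the file only if it is actually present on this machine.
if csv_path.exists():
    bank_data = pd.read_csv(csv_path)
    print(bank_data.shape)
else:
    print(f"File not found: {csv_path}")
```

`pd.read_csv` accepts a `Path` object directly, so no string conversion is needed.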
In [38]:
bank_data.columns.values
Out[38]:
In [48]:
#Select first 1000 rows only
bank_data1 = bank_data.head(1000)
bank_data1.head(5)
Out[48]:
In [47]:
#Select only four columns "Cust_num" "age" "default" and "balance"
bank_data2 = bank_data[["Cust_num", "age","default","balance"]]
bank_data2.head(5)
Out[47]:
In [46]:
#Select 20,000 to 40,000 observations along with four variables "Cust_num" "job" "marital" and "education"
bank_data3 = bank_data[["Cust_num", "job","marital","education"]][20000:40000]
bank_data3.head(5)
Out[46]:
In [45]:
#Select 5000 to 6000 observations and drop "poutcome" and "y"
bank_data4=bank_data.drop(['poutcome','y'], axis=1)[5000:6000]
bank_data4.head(5)
Out[45]:
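The four subsets above can be tried without the CSV file. The sketch below uses a small made-up DataFrame (`toy`, 50 rows, with hypothetical values) that borrows a few of the column names from the exercise, and scales the row ranges down to fit. It uses `.iloc` for the row slices, which makes it explicit that rows are selected by position and that the end of the slice is excluded.

```python
import pandas as pd

# Hypothetical stand-in for bank_market.csv: 50 rows of made-up values.
toy = pd.DataFrame({
    "Cust_num": range(1, 51),
    "age": [20 + i % 40 for i in range(50)],
    "job": ["admin"] * 50,
    "default": ["no"] * 50,
    "balance": [100 * i for i in range(50)],
    "poutcome": ["unknown"] * 50,
    "y": ["no"] * 50,
})

# First 10 rows only
first_rows = toy.head(10)

# Four columns, all rows
some_cols = toy[["Cust_num", "age", "default", "balance"]]

# Rows at positions 20 to 39 (40 is excluded) with two columns
rows_and_cols = toy[["Cust_num", "job"]].iloc[20:40]

# Rows at positions 5 to 9 with two columns dropped
dropped = toy.drop(columns=["poutcome", "y"]).iloc[5:10]

for name, df in [("first_rows", first_rows), ("some_cols", some_cols),
                 ("rows_and_cols", rows_and_cols), ("dropped", dropped)]:
    print(name, df.shape)
```

Checking `.shape` after each step is a quick way to confirm that a subset has the expected number of rows and columns.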
The next post covers subsetting data with variable filter conditions in Python.
Link to the next post: https://statinfer.com/104-2-5-subsetting-data-with-variable-filter-condition-in-python/