Link to the previous post: https://statinfer.com/104-2-4-practice-manipulating-dataset-in-python/
In previous posts we saw how to create subsets in Python using the pandas library, and practiced doing so.
In this post we will create subsets using filter conditions on variables, combining several conditions with & (and) and | (or). We will also practice the same on a different dataset.
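# Customers older than 40 AND with no loan
bank_subset1 = bank_data[(bank_data['age'] > 40) & (bank_data['loan'] == "no")]
bank_subset1.head(5)
# Customers older than 40 OR with no loan
bank_subset2 = bank_data[(bank_data['age'] > 40) | (bank_data['loan'] == "no")]
bank_subset2.head(5)
# (Older than 40 AND no loan) OR single. & binds tighter than |, so the extra parentheses only make the grouping explicit
bank_subset3 = bank_data[((bank_data['age'] > 40) & (bank_data['loan'] == "no")) | (bank_data['marital'] == "single")]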
bank_subset3.head(5)
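The third subset mixes & and | in one expression, so the grouping matters. The sketch below illustrates this on a small made-up table; `toy` and its values are invented for illustration and are not the bank dataset. In Python, & has higher precedence than |, so `a & b | c` is read as `(a & b) | c`.

```python
import pandas as pd

# Hypothetical toy data standing in for bank_data (values invented for illustration)
toy = pd.DataFrame({
    "age": [45, 30, 50, 25],
    "loan": ["no", "no", "yes", "yes"],
    "marital": ["married", "single", "single", "married"],
})

# Name each condition once so the combinations are easier to read
older = toy["age"] > 40
no_loan = toy["loan"] == "no"
single = toy["marital"] == "single"

# & binds tighter than |, so these two selections are the same
implicit = toy[older & no_loan | single]
explicit = toy[(older & no_loan) | single]

# Moving the parentheses changes which rows are kept
regrouped = toy[older & (no_loan | single)]

print(list(explicit.index))
print(list(regrouped.index))
```

Storing each condition in its own variable is optional, but it keeps long filters readable and makes it easier to check each part separately.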
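import pandas as pd
# Read the automobile dataset (change the path to wherever the file is saved on your machine)
auto_data = pd.read_csv("C:\\Amrita\\Datavedi\\Automobile Data Set\\AutoDataset.csv")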
auto_data.shape
auto_data.columns.values
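The column names used in the filters in this post start with a space (for example ' make'), which is easy to mistype. One optional way to handle this is to strip the whitespace from the column names once after loading. The sketch below shows the idea on a small made-up table; `toy_auto` and its values are invented for illustration. The rest of this post keeps the original names with the leading space.

```python
import pandas as pd

# Hypothetical frame whose headers carry a leading space (values invented for illustration)
toy_auto = pd.DataFrame({" make": ["toyota", "audi"], " city-mpg": [35, 24]})

# Strip surrounding whitespace from every column name
toy_auto.columns = toy_auto.columns.str.strip()

# The columns can now be referenced without the leading space
toyota_only = toy_auto[toy_auto["make"] == "toyota"]
print(list(toy_auto.columns))
```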
# Create a new dataset for exclusively Toyota cars
auto_data1=auto_data[(auto_data[' make']=="toyota")]
auto_data1.head(5)
# Create a new dataset for all cars with city-mpg greater than 30 and engine-size less than 120.
auto_data2 = auto_data[(auto_data[' city-mpg'] > 30) & (auto_data[' engine-size'] < 120)]
auto_data2.head(5)
# Create a new dataset with only sedan cars. Keep only four variables (make, body style, fuel type, price) in the final dataset.
auto_data3=auto_data[auto_data[' body-style']=='sedan']
auto_data3.head(5)
auto_data4=auto_data3[[' make',' body-style',' fuel-type',' price']]
auto_data4.head(5)
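The sedan exercise filters rows first and then selects columns in a second step. As an alternative sketch, `.loc` can do both in one step, taking the row condition first and the column list second. `toy_auto`, `keep` and the values below are invented for illustration and are not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for auto_data (values invented for illustration)
toy_auto = pd.DataFrame({
    " make": ["toyota", "audi", "bmw"],
    " body-style": ["sedan", "sedan", "hatchback"],
    " fuel-type": ["gas", "gas", "diesel"],
    " price": [9000, 17000, 21000],
    " city-mpg": [30, 24, 22],
})

# Columns to keep in the final dataset
keep = [" make", " body-style", " fuel-type", " price"]

# .loc[row_condition, column_list] filters rows and selects columns together
sedans = toy_auto.loc[toy_auto[" body-style"] == "sedan", keep]
print(sedans.shape)
```

Both approaches give the same result; the two-step version is easier to inspect while learning, and the `.loc` version avoids the intermediate dataset.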
# Create a new dataset with only Audi, BMW or Porsche cars. Drop two variables (price and normalized losses) from the resulting dataset.
auto_data5=auto_data[(auto_data[' make']=='audi') | (auto_data[' make']=='bmw') | (auto_data[' make']=='porsche') ]
auto_data5.head(5)
auto_data6=auto_data5.drop([' price',' normalized-losses'],axis=1)
auto_data6.head(5)
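When one column is compared against several values, chaining | conditions gets long. A shorter alternative is `Series.isin`, which checks membership in a list. The sketch below compares the two on a small made-up table; `toy_auto`, `wanted` and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for auto_data (values invented for illustration)
toy_auto = pd.DataFrame({
    " make": ["audi", "toyota", "bmw", "porsche", "honda"],
    " price": [17000, 9000, 21000, 34000, 8000],
    " normalized-losses": [164, 91, 192, 186, 101],
})

wanted = ["audi", "bmw", "porsche"]

# isin keeps the rows whose make appears in the list
by_isin = toy_auto[toy_auto[" make"].isin(wanted)]

# The chained-| form selects the same rows
by_or = toy_auto[
    (toy_auto[" make"] == "audi")
    | (toy_auto[" make"] == "bmw")
    | (toy_auto[" make"] == "porsche")
]

# drop(columns=...) is equivalent to drop([...], axis=1)
trimmed = by_isin.drop(columns=[" price", " normalized-losses"])
print(list(trimmed.columns))
```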