Link to the previous post: https://statinfer.com/104-2-4-practice-manipulating-dataset-in-python/
In previous posts we saw how to create subsets in Python using the pandas library, and practiced doing so.
In this post we will create subsets using filter conditions on variables, then practice the same on a different dataset.
Subset with Variable Filter Conditions
- Selection with a condition on variables
- For example, selecting campaigns where the budget is greater than $5000.
- AND condition & filters
In [50]:
bank_subset1=bank_data[(bank_data['age']>40) & (bank_data['loan']=="no")]
bank_subset1.head(5)
- OR condition & filters
In [51]:
bank_subset2=bank_data[(bank_data['age']>40) | (bank_data['loan']=="no")]
bank_subset2.head(5)
- AND and OR conditions combined, with numeric and character filters
In [53]:
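# & binds more tightly than |, so this selects (age > 40 AND loan == "no") OR marital == "single"
bank_subset3 = bank_data[((bank_data['age']>40) & (bank_data['loan']=="no")) | (bank_data['marital']=="single")]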
bank_subset3.head(5)
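In Python, `&` binds more tightly than `|`, so a filter of the form `A & B | C` is evaluated as `(A & B) | C`. The sketch below illustrates this on a small made-up DataFrame (the rows are hypothetical and are not taken from the bank dataset); moving the parentheses changes which rows are selected.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "age": [45, 30, 50, 25],
    "loan": ["no", "no", "yes", "yes"],
    "marital": ["married", "single", "single", "married"],
})

older = toy["age"] > 40
no_loan = toy["loan"] == "no"
single = toy["marital"] == "single"

# Without extra parentheses, & is applied before |
implicit = toy[older & no_loan | single]

# The same filter with the grouping written out
and_first = toy[(older & no_loan) | single]

# A different filter: here the OR is evaluated first
or_first = toy[older & (no_loan | single)]
```

Writing the grouping explicitly costs a pair of parentheses and makes the intended logic visible to the reader.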
Practice: Subset with Variable Filter Conditions
- Data: “./Automobile Data Set/AutoDataset.csv”
- Create a new dataset containing only Toyota cars.
- Create a new dataset of all cars with city-mpg greater than 30 and engine size less than 120.
- Create a new dataset containing only sedan cars. Keep only four variables (make, body style, fuel type, price) in the final dataset.
- Create a new dataset containing cars made by Audi, BMW or Porsche. Drop two variables (price and normalized losses) from the resulting dataset.
In [4]:
import pandas as pd
auto_data=pd.read_csv("C:\\Amrita\\Datavedi\\Automobile Data Set\\AutoDataset.csv")
auto_data.shape
In [68]:
# List the column names; the names used in the filters below start with a space (e.g. ' make')
auto_data.columns.values
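The filters that follow refer to columns as ' make' and ' city-mpg', with a leading space, which suggests that the headers in this CSV carry extra whitespace. One optional way to avoid typing the space each time is to strip the column names once after loading. A minimal sketch on a hypothetical two-column frame (the rest of this post keeps the original, space-prefixed names):

```python
import pandas as pd

# Hypothetical frame whose headers mimic the leading spaces seen in this post
demo = pd.DataFrame({" make": ["toyota", "audi"], " city-mpg": [31, 24]})

# Strip surrounding whitespace from every column name
demo.columns = demo.columns.str.strip()

# Columns can now be referenced without the leading space
toyota_only = demo[demo["make"] == "toyota"]
```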
In [69]:
# Create a new dataset for exclusively Toyota cars
auto_data1=auto_data[(auto_data[' make']=="toyota")]
auto_data1.head(5)
In [70]:
# Create a new dataset of all cars with city-mpg greater than 30 and engine size less than 120.
auto_data2=auto_data[(auto_data[' city-mpg']>30) & (auto_data[' engine-size']<120)]
auto_data2.head(5)
In [5]:
# Create a new dataset containing only sedan cars. Keep only four variables (make, body style, fuel type, price) in the final dataset.
auto_data3=auto_data[auto_data[' body-style']=='sedan']
auto_data3.head(5)
In [6]:
auto_data4=auto_data3[[' make',' body-style',' fuel-type',' price']]
auto_data4.head(5)
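The two steps above (filter the rows, then keep four columns) can also be written as a single `.loc` call, which takes a row condition and a list of columns. This is a sketch on made-up rows, assuming the same space-prefixed column names; the values are hypothetical.

```python
import pandas as pd

# Hypothetical rows, for illustration only
cars = pd.DataFrame({
    " make": ["audi", "toyota", "bmw"],
    " body-style": ["sedan", "hatchback", "sedan"],
    " fuel-type": ["gas", "gas", "diesel"],
    " price": [13950, 5348, 16430],
    " horsepower": [102, 62, 101],
})

keep = [" make", " body-style", " fuel-type", " price"]

# Row condition first, column list second
sedans = cars.loc[cars[" body-style"] == "sedan", keep]
```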
In [7]:
# Create a new dataset containing cars made by Audi, BMW or Porsche. Drop two variables (price and normalized losses) from the resulting dataset.
auto_data5=auto_data[(auto_data[' make']=='audi') | (auto_data[' make']=='bmw') | (auto_data[' make']=='porsche')]
auto_data5.head(5)
In [8]:
auto_data6=auto_data5.drop([' price',' normalized-losses'],axis=1)
auto_data6.head(5)
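When the same column is compared against several values, `Series.isin` is a more compact alternative to chaining `|` conditions. The sketch below uses hypothetical rows and shows that both forms select the same rows.

```python
import pandas as pd

# Hypothetical rows, for illustration only
cars = pd.DataFrame({
    " make": ["audi", "toyota", "bmw", "porsche", "honda"],
    " price": [13950, 5348, 16430, 32528, 7295],
})

# Chained OR conditions, as in the solution above
chained = cars[(cars[" make"] == "audi") | (cars[" make"] == "bmw") | (cars[" make"] == "porsche")]

# Equivalent membership test against a list of values
with_isin = cars[cars[" make"].isin(["audi", "bmw", "porsche"])]
```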