In previous section we discussed about Sub Setting
As a bank manager you would like to know the quality of the customers which the bank is catering to.This would help you in profiling out some target customers. You may want to rollout some marketing offers to the selected customers of the bank.
Data : “./Bank Marketing/bank_market.csv”
1.Create separate datasets for each of the below tasks
2. Select first 1000 rows only
3. Select only four columns “Cust_num” “age” “default” and “balance”
4. Select 20,000 to 40,000 observations along with four variables “Cust_num” “job” “marital” and “education”
5. Select 5000 to 6000 observations drop “poutcome” and “y”
Solutions
- Data : “./Bank Marketing/bank_market.csv”
>bank <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\Bank Tele >Marketing\\bank_market.csv") >head(bank)
## Cust_num age job marital education default balance housing loan ## 1 1 58 management married tertiary no 2143 yes no ## 2 2 44 technician single secondary no 29 yes no ## 3 3 33 entrepreneur married secondary no 2 yes yes ## 4 4 47 blue-collar married unknown no 1506 yes no ## 5 5 33 unknown single unknown no 1 no no ## 6 6 35 management married tertiary no 231 yes no ## contact day month duration campaign pdays previous poutcome y ## 1 unknown 5 may 261 1 -1 0 unknown no ## 2 unknown 5 may 151 1 -1 0 unknown no ## 3 unknown 5 may 76 1 -1 0 unknown no ## 4 unknown 5 may 92 1 -1 0 unknown no ## 5 unknown 5 may 198 1 -1 0 unknown no ## 6 unknown 5 may 139 1 -1 0 unknown no
Create separate datasets for each of the below tasks
- Select first 1000 rows only
>bank_modified <- bank[c(1:1000),] >dim(bank_modified)
## [1] 1000 18
- Select only four columns “Cust_num” “age” “default” and “balance”
>bank_modified1 <- bank[,c("Cust_num","age","default","balance")] >head(bank_modified1)
## Cust_num age default balance ## 1 1 58 no 2143 ## 2 2 44 no 29 ## 3 3 33 no 2 ## 4 4 47 no 1506 ## 5 5 33 no 1 ## 6 6 35 no 231
- Select 20,000 to 40,000 observations along with four variables “Cust_num” “job” “marital” and “education”
>bank_modified2 <- bank[c(20000:40000), c("Cust_num","job","marital","education")] >head(bank_modified2)
## Cust_num job marital education ## 20000 20000 management married tertiary ## 20001 20001 housemaid married primary ## 20002 20002 management married tertiary ## 20003 20003 technician married secondary ## 20004 20004 technician single tertiary ## 20005 20005 management married tertiary
- Select 5000 to 6000 observations drop “poutcome” and “y”
>bank_modified3 <- bank[c(5000:6000), !(colnames(bank) %in% c("poutcome","y"))] >head(bank_modified3)
## Cust_num age job marital education default balance housing ## 5000 5000 32 technician single unknown no 317 yes ## 5001 5001 32 management single tertiary no 728 yes ## 5002 5002 38 services married tertiary no -121 yes ## 5003 5003 29 management single tertiary no 330 yes ## 5004 5004 31 management single tertiary no 825 yes ## 5005 5005 36 management single tertiary no 247 no ## loan contact day month duration campaign pdays previous ## 5000 no unknown 21 may 394 2 -1 0 ## 5001 no unknown 21 may 125 1 -1 0 ## 5002 no unknown 21 may 288 1 -1 0 ## 5003 no unknown 21 may 315 1 -1 0 ## 5004 no unknown 21 may 506 2 -1 0 ## 5005 no unknown 21 may 354 5 -1 0
Subset with variable Filter Conditions
Subsetting can also be done by using various filters. Say we have a data set for 5000 employees, and we want to find out all the employees who have a salary more than 100,000 per month. Using filters, we can find the records of employees with a salary more than 100,000. Few of the conditional operators which can be used to filter out the datasets include AND (&) operator, OR (|) operator, NOT (!) operator, Greater than or less than (> or <), equal to (==) operator and many more.
Syntax
>new_data <- subset(old_data, condition1 | conditon2)
Examples stating the use of these conditional operators are shown below.
And condition
In AND condition, we can mention 2 conditions and here the filtering is done on the basis of satisfaction of both the criteria. The sample code is as shown:
>bank_subset1 <- subset(bank, age > 40 & loan == "no") >head(bank_subset1)
## Cust_num age job marital education default balance housing ## 1 1 58 management married tertiary no 2143 yes ## 2 2 44 technician single secondary no 29 yes ## 4 4 47 blue-collar married unknown no 1506 yes ## 8 8 42 entrepreneur divorced tertiary yes 2 yes ## 9 9 58 retired married primary no 121 yes ## 10 10 43 technician single secondary no 593 yes ## loan contact day month duration campaign pdays previous poutcome y ## 1 no unknown 5 may 261 1 -1 0 unknown no ## 2 no unknown 5 may 151 1 -1 0 unknown no ## 4 no unknown 5 may 92 1 -1 0 unknown no ## 8 no unknown 5 may 380 1 -1 0 unknown no ## 9 no unknown 5 may 50 1 -1 0 unknown no ## 10 no unknown 5 may 55 1 -1 0 unknown no
The code above would return the data for the individuals who have age more than 40 and who did not take any loan.
OR condition
In OR condition, filtering is done on the basis of satisfaction of any one of the criteria. The sample code is as shown:
>bank_subset2 <- subset(bank, age > 40 | loan == "no") >head(bank_subset2)
## Cust_num age job marital education default balance housing ## 1 1 58 management married tertiary no 2143 yes ## 2 2 44 technician single secondary no 29 yes ## 4 4 47 blue-collar married unknown no 1506 yes ## 5 5 33 unknown single unknown no 1 no ## 6 6 35 management married tertiary no 231 yes ## 8 8 42 entrepreneur divorced tertiary yes 2 yes ## loan contact day month duration campaign pdays previous poutcome y ## 1 no unknown 5 may 261 1 -1 0 unknown no ## 2 no unknown 5 may 151 1 -1 0 unknown no ## 4 no unknown 5 may 92 1 -1 0 unknown no ## 5 no unknown 5 may 198 1 -1 0 unknown no ## 6 no unknown 5 may 139 1 -1 0 unknown no ## 8 no unknown 5 may 380 1 -1 0 unknown no
The code above would return the data for the individuals who have age more than 40 or who did not take loan.
Customization can also be done applying combination of different filtering operands as per our need.AND, OR condition with Numeric and Character filters
AND, OR condition with Numeric and Character filters
>bank_subset3<-subset(bank, (age>40 & loan=="no") | marital=="single" ) >head(bank_subset3)
Cust_num age job marital education default balance housing ## 1 1 58 management married tertiary no 2143 yes ## 2 2 44 technician single secondary no 29 yes ## 4 4 47 blue-collar married unknown no 1506 yes ## 5 5 33 unknown single unknown no 1 no ## 7 7 28 management single tertiary no 447 yes ## 8 8 42 entrepreneur divorced tertiary yes 2 yes ## loan contact day month duration campaign pdays previous poutcome y ## 1 no unknown 5 may 261 1 -1 0 unknown no ## 2 no unknown 5 may 151 1 -1 0 unknown no ## 4 no unknown 5 may 92 1 -1 0 unknown no ## 5 no unknown 5 may 198 1 -1 0 unknown no ## 7 yes unknown 5 may 217 1 -1 0 unknown no ## 8 no unknown 5 may 380 1 -1 0 unknown no
AND, OR condition with Numeric and Character filters & selected fields
>market4<-subset(bank, (age>40 & loan=="no") | marital=="single" , select=c (Cust_num , age , marital, loan )) >head(market4)
Cust_num age marital loan ## 1 1 58 married no ## 2 2 44 single no ## 4 4 47 married no ## 5 5 33 single no ## 7 7 28 single yes ## 8 8 42 divorced no
AND, OR condition with Numeric and Character filters & dropping few fields
>market5<-subset(bank, (age>40 & loan=="no") | marital=="single" , select=-c (poutcome,y, balance, contact)) >head(market5)
## Cust_num age job marital education default housing loan day ## 1 1 58 management married tertiary no yes no 5 ## 2 2 44 technician single secondary no yes no 5 ## 4 4 47 blue-collar married unknown no yes no 5 ## 5 5 33 unknown single unknown no no no 5 ## 7 7 28 management single tertiary no yes yes 5 ## 8 8 42 entrepreneur divorced tertiary yes yes no 5 ## month duration campaign pdays previous ## 1 may 261 1 -1 0 ## 2 may 151 1 -1 0 ## 4 may 92 1 -1 0 ## 5 may 198 1 -1 0 ## 7 may 217 1 -1 0 ## 8 may 380 1 -1 0
In the next post we will see Sub Setting Example 2.