• No products in the cart.

103.2.3.a Sub Setting-An Example

Learn through examples

In previous section we discussed about  Sub Setting

As a bank manager you would like to know the quality of the customers which the bank is catering to.This would help you in profiling out some target customers. You may want to rollout some marketing offers to the selected customers of the bank.

Data : “./Bank Marketing/bank_market.csv”
1.Create separate datasets for each of the below tasks
2. Select first 1000 rows only
3. Select only four columns “Cust_num”  “age” “default” and  “balance”
4. Select 20,000 to 40,000 observations along with four variables  “Cust_num”  “job”       “marital” and   “education”
5. Select 5000 to 6000 observations drop  “poutcome” and  “y”

 Solutions

  1. Data : “./Bank Marketing/bank_market.csv”
>bank <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\Bank Tele >Marketing\\bank_market.csv")
>head(bank)
##   Cust_num age          job marital education default balance housing loan
## 1        1  58   management married  tertiary      no    2143     yes   no
## 2        2  44   technician  single secondary      no      29     yes   no
## 3        3  33 entrepreneur married secondary      no       2     yes  yes
## 4        4  47  blue-collar married   unknown      no    1506     yes   no
## 5        5  33      unknown  single   unknown      no       1      no   no
## 6        6  35   management married  tertiary      no     231     yes   no
##   contact day month duration campaign pdays previous poutcome  y
## 1 unknown   5   may      261        1    -1        0  unknown no
## 2 unknown   5   may      151        1    -1        0  unknown no
## 3 unknown   5   may       76        1    -1        0  unknown no
## 4 unknown   5   may       92        1    -1        0  unknown no
## 5 unknown   5   may      198        1    -1        0  unknown no
## 6 unknown   5   may      139        1    -1        0  unknown no

Create separate datasets for each of the below tasks

  1. Select first 1000 rows only
>bank_modified <- bank[c(1:1000),]
>dim(bank_modified)

 

## [1] 1000   18
  1. Select only four columns “Cust_num” “age” “default” and “balance”
>bank_modified1 <- bank[,c("Cust_num","age","default","balance")]
>head(bank_modified1)
##   Cust_num age default balance
## 1        1  58      no    2143
## 2        2  44      no      29
## 3        3  33      no       2
## 4        4  47      no    1506
## 5        5  33      no       1
## 6        6  35      no     231
  1. Select 20,000 to 40,000 observations along with four variables “Cust_num” “job” “marital” and “education”
>bank_modified2 <- bank[c(20000:40000), c("Cust_num","job","marital","education")]
>head(bank_modified2)

 

##       Cust_num        job marital education
## 20000    20000 management married  tertiary
## 20001    20001  housemaid married   primary
## 20002    20002 management married  tertiary
## 20003    20003 technician married secondary
## 20004    20004 technician  single  tertiary
## 20005    20005 management married  tertiary
  1. Select 5000 to 6000 observations drop “poutcome” and “y”
>bank_modified3 <- bank[c(5000:6000), !(colnames(bank) %in% c("poutcome","y"))]
>head(bank_modified3)
##      Cust_num age        job marital education default balance housing
## 5000     5000  32 technician  single   unknown      no     317     yes
## 5001     5001  32 management  single  tertiary      no     728     yes
## 5002     5002  38   services married  tertiary      no    -121     yes
## 5003     5003  29 management  single  tertiary      no     330     yes
## 5004     5004  31 management  single  tertiary      no     825     yes
## 5005     5005  36 management  single  tertiary      no     247      no
##      loan contact day month duration campaign pdays previous
## 5000   no unknown  21   may      394        2    -1        0
## 5001   no unknown  21   may      125        1    -1        0
## 5002   no unknown  21   may      288        1    -1        0
## 5003   no unknown  21   may      315        1    -1        0
## 5004   no unknown  21   may      506        2    -1        0
## 5005   no unknown  21   may      354        5    -1        0

Subset with variable Filter Conditions

Subsetting can also be done by using various filters. Say we have a data set for 5000 employees, and we want to find out all the employees who have a salary more than 100,000 per month. Using filters, we can find the records of employees with a salary more than 100,000. Few of the conditional operators which can be used to filter out the datasets include AND (&) operator, OR (|) operator, NOT (!) operator, Greater than or less than (> or <), equal to (==) operator and many more.

Syntax

>new_data <- subset(old_data, condition1 | conditon2)

Examples stating the use of these conditional operators are shown below.

And condition

In AND condition, we can mention 2 conditions and here the filtering is done on the basis of satisfaction of both the criteria. The sample code is as shown:

>bank_subset1 <- subset(bank, age > 40 & loan == "no")
>head(bank_subset1)

 

##    Cust_num age          job  marital education default balance housing
## 1         1  58   management  married  tertiary      no    2143     yes
## 2         2  44   technician   single secondary      no      29     yes
## 4         4  47  blue-collar  married   unknown      no    1506     yes
## 8         8  42 entrepreneur divorced  tertiary     yes       2     yes
## 9         9  58      retired  married   primary      no     121     yes
## 10       10  43   technician   single secondary      no     593     yes
##    loan contact day month duration campaign pdays previous poutcome  y
## 1    no unknown   5   may      261        1    -1        0  unknown no
## 2    no unknown   5   may      151        1    -1        0  unknown no
## 4    no unknown   5   may       92        1    -1        0  unknown no
## 8    no unknown   5   may      380        1    -1        0  unknown no
## 9    no unknown   5   may       50        1    -1        0  unknown no
## 10   no unknown   5   may       55        1    -1        0  unknown no

The code above would return the data for the individuals who have age more than 40 and who did not take any loan.

OR condition

In OR condition, filtering is done on the basis of satisfaction of any one of the criteria. The sample code is as shown:

>bank_subset2 <- subset(bank, age > 40 | loan == "no")
>head(bank_subset2)
##   Cust_num age          job  marital education default balance housing
## 1        1  58   management  married  tertiary      no    2143     yes
## 2        2  44   technician   single secondary      no      29     yes
## 4        4  47  blue-collar  married   unknown      no    1506     yes
## 5        5  33      unknown   single   unknown      no       1      no
## 6        6  35   management  married  tertiary      no     231     yes
## 8        8  42 entrepreneur divorced  tertiary     yes       2     yes
##   loan contact day month duration campaign pdays previous poutcome  y
## 1   no unknown   5   may      261        1    -1        0  unknown no
## 2   no unknown   5   may      151        1    -1        0  unknown no
## 4   no unknown   5   may       92        1    -1        0  unknown no
## 5   no unknown   5   may      198        1    -1        0  unknown no
## 6   no unknown   5   may      139        1    -1        0  unknown no
## 8   no unknown   5   may      380        1    -1        0  unknown no

The code above would return the data for the individuals who have age more than 40 or who did not take loan.

Customization can also be done applying combination of different filtering operands as per our need.AND, OR condition with Numeric and Character filters

AND, OR condition with Numeric and Character filters

>bank_subset3<-subset(bank, (age>40 &  loan=="no") | marital=="single" )
>head(bank_subset3)
Cust_num age          job  marital education default balance housing
## 1        1  58   management  married  tertiary      no    2143     yes
## 2        2  44   technician   single secondary      no      29     yes
## 4        4  47  blue-collar  married   unknown      no    1506     yes
## 5        5  33      unknown   single   unknown      no       1      no
## 7        7  28   management   single  tertiary      no     447     yes
## 8        8  42 entrepreneur divorced  tertiary     yes       2     yes
##   loan contact day month duration campaign pdays previous poutcome  y
## 1   no unknown   5   may      261        1    -1        0  unknown no
## 2   no unknown   5   may      151        1    -1        0  unknown no
## 4   no unknown   5   may       92        1    -1        0  unknown no
## 5   no unknown   5   may      198        1    -1        0  unknown no
## 7  yes unknown   5   may      217        1    -1        0  unknown no
## 8   no unknown   5   may      380        1    -1        0  unknown no

AND, OR condition with Numeric and Character filters & selected fields

>market4<-subset(bank, (age>40 &  loan=="no") | marital=="single" , select=c (Cust_num , age ,   marital,    loan ))
>head(market4)
Cust_num age  marital loan
## 1        1  58  married   no
## 2        2  44   single   no
## 4        4  47  married   no
## 5        5  33   single   no
## 7        7  28   single  yes
## 8        8  42 divorced   no

AND, OR condition with Numeric and Character filters & dropping few fields

>market5<-subset(bank, (age>40 &  loan=="no") | marital=="single" , select=-c (poutcome,y, balance, contact))
>head(market5)
##   Cust_num age          job  marital education default housing loan day
## 1        1  58   management  married  tertiary      no     yes   no   5
## 2        2  44   technician   single secondary      no     yes   no   5
## 4        4  47  blue-collar  married   unknown      no     yes   no   5
## 5        5  33      unknown   single   unknown      no      no   no   5
## 7        7  28   management   single  tertiary      no     yes  yes   5
## 8        8  42 entrepreneur divorced  tertiary     yes     yes   no   5
##   month duration campaign pdays previous
## 1   may      261        1    -1        0
## 2   may      151        1    -1        0
## 4   may       92        1    -1        0
## 5   may      198        1    -1        0
## 7   may      217        1    -1        0
## 8   may      380        1    -1        0

In the next post we will see Sub Setting Example 2.

Statinfer

Statinfer derived from Statistical inference. We provide training in various Data Analytics and Data Science courses and assist candidates in securing placements.

Contact Us

info@statinfer.com

+91- 9676098897

+91- 9494762485

 

Our Social Links

top
© 2020. All Rights Reserved.