In the previous post we saw Sub Setting Example 2.
Calculated or derived fields are another important concept in data analysis. Sometimes the fields, that is the rows and columns of the raw data, are not sufficient for the analysis, and we have to perform some operations to create new fields. A new field can be added to an existing dataset by using the $ operator.
Creating Calculated Fields in R
Consider the following syntax, where a new field sum is added (attached) to the dataset. The content of this variable is the sum of two other variables in the dataset, x1 and x2.
>dataset$sum <- dataset$x1 + dataset$x2
The average of two existing variables can be stored in a new variable in the dataset in the same way, as shown below:
>dataset$mean <- (dataset$x1 + dataset$x2)/2
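As a minimal, self-contained sketch of the two assignments above: the names dataset, x1 and x2 follow the text, but the values in the data frame are made up for illustration.

```r
# Hypothetical toy data frame; the values are made up for illustration
dataset <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))

# Each assignment with $ appends a new column, computed row by row
dataset$sum  <- dataset$x1 + dataset$x2
dataset$mean <- (dataset$x1 + dataset$x2) / 2

print(dataset)
```

The data frame starts with two columns and ends with four, since each $ assignment to a name that does not yet exist adds one column.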
However, assigning fields this way is time-consuming, as the user has to type the name of the dataset again and again. This can be avoided by attaching the dataset. Once it is attached, the user can refer to its variables by name alone.
Refer to the code below:
>attach(dataset)
>dataset$sum <- x1 + x2
>dataset
>dataset$mean <- (x1 + x2)/2
>detach(dataset)
>dataset
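An alternative that avoids both the repeated typing and attach() is to use the base R functions transform() and with(). The sketch below assumes the same hypothetical toy data frame as before; the column diff is an extra name introduced only for illustration.

```r
# Hypothetical toy data frame; the values are made up for illustration
dataset <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30))

# transform() evaluates the expressions inside the data frame,
# so x1 and x2 need no dataset$ prefix and nothing is attached
dataset <- transform(dataset, sum = x1 + x2, mean = (x1 + x2) / 2)

# with() does the same for a single expression
dataset$diff <- with(dataset, x2 - x1)

print(dataset)
```

Because nothing is placed on the search path, there is no detach() to forget, and the column names cannot clash with other objects in the workspace.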
Now let us do the following.
- Get an idea of the size of the cars in the Auto data.
- Find the volume (length * width * height) of each car.
>dim(auto_data)
>auto_data$area<-(auto_data$length)*(auto_data$width)*(auto_data$height)
>names(auto_data)
>dim(auto_data)
## [1] 205  26
##  [1] "symboling"         "normalized.losses" "make"
##  [4] "fuel.type"         "aspiration"        "num.of.doors"
##  [7] "body.style"        "drive.wheels"      "engine.location"
## [10] "wheel.base"        "length"            "width"
## [13] "height"            "curb.weight"       "engine.type"
## [16] "num.of.cylinders"  "engine.size"       "fuel.system"
## [19] "bore"              "stroke"            "compression.ratio"
## [22] "horsepower"        "peak.rpm"          "city.mpg"
## [25] "highway.mpg"       "price"             "area"
## [1] 205  27
As we can see, before the new variable area was created in auto_data, the dataset had 205 rows and 26 columns. After the new field was created, it has 205 rows and 27 columns. Hence a new variable ‘area’ has been created. Despite its name, it holds the volume of each car.
Next, we create a new variable by reducing the balance by 20%.
>bank$balance_new<-bank$balance*0.8
>names(bank)
>summary(bank$balance)
>summary(bank$balance_new)
##  [1] "Cust_num"    "age"         "job"         "marital"     "education"
##  [6] "default"     "balance"     "housing"     "loan"        "contact"
## [11] "day"         "month"       "duration"    "campaign"    "pdays"
## [16] "previous"    "poutcome"    "y"           "balance_new"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   -8019      72     448    1362    1428  102100
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -6415.0    57.6   358.4  1090.0  1142.0 81700.0
- A new variable balance_new is created, which contains each balance reduced by 20% (balance*0.8).
If-Else statements
- The If-Else statement is used when we have to place some conditions in our code.
- It can be written as:
if(condition) {
  Syntax1
} else {
  Syntax2
}
Consider the following example
>first_element <- c(5,6,7,8,9)
>if(any(first_element > 9)) {
  print("condition pass")
} else {
  print("condition fail")
}
[1] "condition fail"
The above code prints “condition fail” because none of the elements in the variable first_element is greater than 9. if() expects a single TRUE or FALSE, so the element-wise comparison is wrapped in any(). Passing the whole vector, as in if(first_element > 9), gives the warning "the condition has length > 1 and only the first element will be used" in older versions of R and is an error from R 4.2.0 onward. Also note that else should be written immediately after ‘}’, on the same line, and not on the next line.
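To check a whole vector against a condition, the comparison can be reduced to a single TRUE or FALSE before it reaches if(). The sketch below uses any() for this; check_any() is a hypothetical helper written only for this illustration, not a built-in function.

```r
first_element <- c(5, 6, 7, 8, 9)

# check_any() is a hypothetical helper, defined here for illustration
check_any <- function(x, threshold) {
  # any() collapses the element-wise comparison to one TRUE/FALSE
  if (any(x > threshold)) {
    "condition pass"
  } else {
    "condition fail"
  }
}

print(check_any(first_element, 9))  # no element exceeds 9
print(check_any(first_element, 8))  # 9 exceeds 8
```

Replacing any() with all() would instead require every element to satisfy the condition.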
If-then-Else Statement
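If-then-Else, written ifelse() in R, serves the same purpose as If-Else, but the syntax is different, and it checks the condition for every element of a vector rather than for a single value. The syntax for If-then-Else is as shown below: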
Newvar<-ifelse( Condition, True Value, False Value)
The arguments are the condition, then the value to use where the condition is true, and then the value to use where the condition is false.
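A minimal sketch of this syntax on a small vector follows. The vector score and the cut-off of 50 are made up for illustration.

```r
# Hypothetical vector of scores, made up for illustration
score <- c(35, 72, 50, 18)

# ifelse() tests every element and returns a vector of the same length
result <- ifelse(score >= 50, "pass", "fail")

print(result)
```

Each element of result is chosen independently, which is what makes ifelse() convenient for building a new column from an existing one.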
Consider the following examples:
We see if there are any missing values in horsepower.
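>sum(is.na(auto_data$horsepower))
If there are any missing values, replace them with -1 using the ‘If-then-Else’ condition. Note that in this data missing values are recorded as "?" rather than NA, so the condition below tests for "?".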
>auto_data$horsepower_new<-ifelse(auto_data$horsepower=="?",-1, auto_data$horsepower)
>auto_data$horsepower_new
## [1] 7 7 22 4 10 6 6 6 17 25 3 3 13 13 13 30 30 30 36 46 46 44 44
## [24] 4 44 44 44 4 55 20 40 49 41 49 49 49 49 54 54 54 54 3 2 50 46 46
## [47] 56 29 29 34 44 44 44 44 44 3 3 3 16 52 52 52 52 43 52 12 47 14 14
## [70] 14 14 23 23 31 31 28 44 44 44 4 11 55 20 20 20 55 55 11 11 45 38 45
## [93] 45 45 45 45 45 45 45 60 60 21 21 21 25 32 25 60 59 60 59 59 59 59 59
## [116] 60 59 18 44 4 44 44 44 55 20 19 33 33 33 35 -1 -1 6 6 6 6 25 25
## [139] 45 48 48 51 51 58 51 7 51 58 51 7 42 42 42 42 42 42 46 46 39 39 46
## [162] 46 46 46 46 8 8 11 11 11 11 11 11 57 48 57 57 57 26 26 24 24 37 53
## [185] 37 53 53 44 2 56 56 6 44 55 9 9 9 9 27 27 9 25 15 5 9
The two -1 entries (positions 131 and 132) mark the missing values. Note that the other numbers are small integers rather than horsepower figures: horsepower was evidently read in as a factor, and ifelse() returns a factor's internal level codes. Converting the column with as.character() first keeps the original values.
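When a numeric column has been read in as a factor because of "?" entries, ifelse() returns the factor's level codes instead of the recorded values. The sketch below shows this on a small hypothetical stand-in for the horsepower column (the four values are made up), along with one possible way to keep the real numbers. It is an illustration under those assumptions, not the only way to clean the column.

```r
# Hypothetical stand-in for a column read as a factor,
# with "?" marking a missing value
horsepower <- factor(c("111", "154", "?", "102"))

# Count the entries recorded as "?"
n_missing <- sum(horsepower == "?")

# Applied directly to a factor, ifelse() returns internal level codes
codes <- ifelse(horsepower == "?", -1, horsepower)

# Converting to character first keeps the recorded values;
# "?" is replaced by "-1" before the conversion to numeric
hp_chr <- as.character(horsepower)
horsepower_new <- as.numeric(ifelse(hp_chr == "?", "-1", hp_chr))

print(n_missing)
print(codes)
print(horsepower_new)
```

Replacing "?" before calling as.numeric() avoids the coercion warning that as.numeric("?") would otherwise raise.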
Replace the missing values in peak.rpm with -1 using If-then-Else:
>auto_data$peak_rpm_new<-ifelse(auto_data$peak.rpm=="?",-1,auto_data$peak.rpm)
>auto_data$peak_rpm_new
In the next post we will learn about Sorting of Data.