In the previous section we worked through Subsetting Example 1.
Here we import the automobile dataset and then perform several subsetting operations on it.
- Import: “./Automobile Data Set/AutoDataset.csv”
- Create a new dataset containing only Toyota cars.
- Create a new dataset for all cars with city.mpg greater than 30 and engine size less than 120.
- Create a new dataset containing only sedan cars. Keep only four variables (make, body style, fuel type, price) in the final dataset.
- Create a new dataset containing only the makes Audi, BMW, or Porsche. Drop two variables from the resulting dataset (price and normalized losses).
Solutions
1. Import: “./Automobile Data Set/AutoDataset.csv”
>auto_data <- read.csv("C:\\Amrita\\Datavedi\\Automobile Data Set\\AutoDataset.csv")
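The output further down shows column names such as city.mpg and body.style. One possible explanation, and it is only an assumption because the CSV header is not shown here, is that the file uses hyphenated names and read.csv() converted them. The sketch below uses a tiny hypothetical file (AutoDataset_demo.csv, written to a temporary directory) to show that behaviour and to build the path with file.path() instead of a hard-coded drive location.

```r
# Hypothetical miniature file standing in for AutoDataset.csv.
# The hyphenated header is an assumption made for illustration.
csv_path <- file.path(tempdir(), "AutoDataset_demo.csv")
writeLines(c(
  "make,body-style,city-mpg,price",
  "toyota,hatchback,35,5348",
  "audi,sedan,24,13950"
), csv_path)

# read.csv() uses check.names = TRUE by default, so a header such as
# "city-mpg" is turned into the syntactically valid name "city.mpg".
demo_data <- read.csv(csv_path)

names(demo_data)
str(demo_data)
```

With the real dataset, the same call pattern would be read.csv(file.path("Automobile Data Set", "AutoDataset.csv")), provided the working directory contains that folder.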
2. Create a new dataset containing only Toyota cars
>toyota_data <- subset(auto_data, make == "toyota")
>head(toyota_data)
## symboling normalized.losses make fuel.type aspiration num.of.doors
## 151 1 87 toyota gas std two
## 152 1 87 toyota gas std two
## 153 1 74 toyota gas std four
## 154 0 77 toyota gas std four
## 155 0 81 toyota gas std four
## 156 0 91 toyota gas std four
## body.style drive.wheels engine.location wheel.base length width height
## 151 hatchback fwd front 95.7 158.7 63.6 54.5
## 152 hatchback fwd front 95.7 158.7 63.6 54.5
## 153 hatchback fwd front 95.7 158.7 63.6 54.5
## 154 wagon fwd front 95.7 169.7 63.6 59.1
## 155 wagon 4wd front 95.7 169.7 63.6 59.1
## 156 wagon 4wd front 95.7 169.7 63.6 59.1
## curb.weight engine.type num.of.cylinders engine.size fuel.system bore
## 151 1985 ohc four 92 2bbl 3.05
## 152 2040 ohc four 92 2bbl 3.05
## 153 2015 ohc four 92 2bbl 3.05
## 154 2280 ohc four 92 2bbl 3.05
## 155 2290 ohc four 92 2bbl 3.05
## 156 3110 ohc four 92 2bbl 3.05
## stroke compression.ratio horsepower peak.rpm city.mpg highway.mpg
## 151 3.03 9 62 4800 35 39
## 152 3.03 9 62 4800 31 38
## 153 3.03 9 62 4800 31 38
## 154 3.03 9 62 4800 31 37
## 155 3.03 9 62 4800 27 32
## 156 3.03 9 62 4800 27 32
## price
## 151 5348
## 152 6338
## 153 6488
## 154 6918
## 155 7898
## 156 8778
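For comparison, the same row filter can be written with bracket indexing. The sketch below uses a small made-up data frame (demo, with illustrative values) rather than the real auto_data, so it can be run on its own.

```r
# Toy stand-in for auto_data; the values are illustrative only.
demo <- data.frame(
  make = c("toyota", "audi", "toyota", "bmw"),
  city.mpg = c(35, 24, 31, 23),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame,
# so the column can be referred to by its bare name.
by_subset <- subset(demo, make == "toyota")

# Bracket indexing needs the column spelled out with $,
# and an empty column slot to keep every column.
by_bracket <- demo[demo$make == "toyota", ]

# Compare the two results.
all.equal(by_subset, by_bracket)
```

subset() is usually the more readable choice in interactive work, while the bracket form is closer to what happens underneath.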
3. Create a new dataset for all cars with city.mpg greater than 30 and engine size less than 120.
>auto_data1 <- subset(auto_data, (city.mpg > 30) & (engine.size < 120))
>head(auto_data1)
## symboling normalized.losses make fuel.type aspiration num.of.doors
## 19 2 121 chevrolet gas std two
## 20 1 98 chevrolet gas std two
## 21 0 81 chevrolet gas std four
## 22 1 118 dodge gas std two
## 23 1 118 dodge gas std two
## 25 1 148 dodge gas std four
## body.style drive.wheels engine.location wheel.base length width height
## 19 hatchback fwd front 88.4 141.1 60.3 53.2
## 20 hatchback fwd front 94.5 155.9 63.6 52.0
## 21 sedan fwd front 94.5 158.8 63.6 52.0
## 22 hatchback fwd front 93.7 157.3 63.8 50.8
## 23 hatchback fwd front 93.7 157.3 63.8 50.8
## 25 hatchback fwd front 93.7 157.3 63.8 50.6
## curb.weight engine.type num.of.cylinders engine.size fuel.system bore
## 19 1488 l three 61 2bbl 2.91
## 20 1874 ohc four 90 2bbl 3.03
## 21 1909 ohc four 90 2bbl 3.03
## 22 1876 ohc four 90 2bbl 2.97
## 23 1876 ohc four 90 2bbl 2.97
## 25 1967 ohc four 90 2bbl 2.97
## stroke compression.ratio horsepower peak.rpm city.mpg highway.mpg price
## 19 3.03 9.50 48 5100 47 53 5151
## 20 3.11 9.60 70 5400 38 43 6295
## 21 3.11 9.60 70 5400 38 43 6575
## 22 3.23 9.41 68 5500 37 41 5572
## 23 3.23 9.40 68 5500 31 38 6377
## 25 3.23 9.40 68 5500 31 38 622
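One detail worth knowing when combining conditions with &: subset() silently drops rows where the condition evaluates to NA, whereas bracket indexing keeps them as all-NA rows. The sketch below uses a made-up data frame with one missing city.mpg value; whether the real auto_data contains such missing values is not shown in this post, so treat this as a general illustration.

```r
# Toy data with one missing mileage value (illustrative only).
demo <- data.frame(
  make = c("a", "b", "c", "d"),
  city.mpg = c(38, 31, NA, 24),
  engine.size = c(90, 130, 92, 109),
  stringsAsFactors = FALSE
)

# & works element-wise: a row is kept only if both tests are TRUE.
small_efficient <- subset(demo, (city.mpg > 30) & (engine.size < 120))

# The same condition with brackets: the NA condition in row 3
# produces an extra row filled with NA.
with_na <- demo[(demo$city.mpg > 30) & (demo$engine.size < 120), ]

nrow(small_efficient)  # 1
nrow(with_na)          # 2
```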
4. Create a new dataset containing only sedan cars. Keep only four variables (make, body style, fuel type, price) in the final dataset.
>auto_data2 <- subset(auto_data, body.style == "sedan", select = c(make, body.style, fuel.type, price))
>head(auto_data2)
## make body.style fuel.type price
## 4 audi sedan gas 13950
## 5 audi sedan gas 17450
## 6 audi sedan gas 15250
## 7 audi sedan gas 17710
## 9 audi sedan gas 23875
## 11 bmw sedan gas 16430
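The select argument takes bare column names. When the column names are stored in a character vector instead, bracket indexing is a common alternative. The sketch below uses a made-up data frame and a hypothetical vector named keep.

```r
# Toy stand-in for auto_data; the values are illustrative only.
demo <- data.frame(
  make = c("audi", "toyota", "bmw"),
  body.style = c("sedan", "hatchback", "sedan"),
  fuel.type = c("gas", "gas", "gas"),
  price = c(13950, 5348, 16430),
  horsepower = c(102, 62, 101),
  stringsAsFactors = FALSE
)

# Column names held as strings (hypothetical helper vector).
keep <- c("make", "body.style", "fuel.type", "price")

# Rows are filtered in the first slot, columns chosen in the second.
sedans <- demo[demo$body.style == "sedan", keep]

sedans
```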
5. Create a new dataset containing only the makes Audi, BMW, or Porsche. Drop two variables from the resulting dataset (price and normalized losses).
>auto_data3 <- subset(auto_data, (make == "audi") | (make == "bmw") | (make == "porsche"), select = c(-price, -normalized.losses))
>head(auto_data3)
## symboling make fuel.type aspiration num.of.doors body.style drive.wheels
## 4 2 audi gas std four sedan fwd
## 5 2 audi gas std four sedan 4wd
## 6 2 audi gas std two sedan fwd
## 7 1 audi gas std four sedan fwd
## 8 1 audi gas std four wagon fwd
## 9 1 audi gas turbo four sedan fwd
## engine.location wheel.base length width height curb.weight engine.type
## 4 front 99.8 176.6 66.2 54.3 2337 ohc
## 5 front 99.4 176.6 66.4 54.3 2824 ohc
## 6 front 99.8 177.3 66.3 53.1 2507 ohc
## 7 front 105.8 192.7 71.4 55.7 2844 ohc
## 8 front 105.8 192.7 71.4 55.7 2954 ohc
## 9 front 105.8 192.7 71.4 55.9 3086 ohc
## num.of.cylinders engine.size fuel.system bore stroke compression.ratio
## 4 four 109 mpfi 3.19 3.4 10.0
## 5 five 136 mpfi 3.19 3.4 8.0
## 6 five 136 mpfi 3.19 3.4 8.5
## 7 five 136 mpfi 3.19 3.4 8.5
## 8 five 136 mpfi 3.19 3.4 8.5
## 9 five 131 mpfi 3.13 3.4 8.3
## horsepower peak.rpm city.mpg highway.mpg
## 4 102 5500 24 30
## 5 115 5500 18 22
## 6 110 5500 19 25
## 7 110 5500 19 25
## 8 110 5500 19 25
## 9 140 5500 17 20
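When the list of makes grows, chaining == with | becomes hard to read. A shorter way to express the same filter is the %in% operator, sketched below on a made-up data frame (the vector name wanted is hypothetical and the values are illustrative).

```r
# Toy stand-in for auto_data; the values are illustrative only.
demo <- data.frame(
  make = c("audi", "toyota", "bmw", "porsche", "dodge"),
  normalized.losses = c(164, 87, 192, 186, 118),
  city.mpg = c(24, 35, 23, 19, 37),
  price = c(13950, 5348, 16430, 22018, 5572),
  stringsAsFactors = FALSE
)

# Makes to keep (hypothetical helper vector).
wanted <- c("audi", "bmw", "porsche")

# %in% is TRUE for every row whose make appears in `wanted`;
# a minus sign inside select drops the named columns.
selected <- subset(demo, make %in% wanted,
                   select = -c(price, normalized.losses))

selected
```

Adding another make then only means extending wanted, not rewriting the condition.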
With these two examples we have covered the main uses of subset() in R: filtering rows by one or more conditions and keeping or dropping columns with select.
In the next post we will look at Calculated Fields in R.