
104.2.3 Manipulating datasets in Python

Subsetting the datasets.

Link to the previous post: https://statinfer.com/104-2-2-practice-working-with-datasets-in-python/

In this post we will see how to split an imported dataset into subsets by selecting rows, columns, or both.

Subsetting the data

  • Dataset: “./World Bank Data/GDP.csv”
In [30]:
import pandas as pd
# Adjust the path to wherever GDP.csv is stored on your machine; a wrong path raises FileNotFoundError
gdp = pd.read_csv("datasets\\World Bank Data\\GDP.csv", encoding="ISO-8859-1")

gdp.shape
Out[30]:
(194, 4)
In [29]:
gdp.columns.values
Out[29]:
array(['Country_code', 'Rank', 'Country', 'GDP'], dtype=object)
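The path above uses Windows-style backslashes, which is one common reason the read fails on other systems. A minimal sketch of a more portable way to build the same path, assuming the file lives in the same relative folder as in the cell above (adjust it if your layout differs):

```python
from pathlib import Path

import pandas as pd

# Same relative location as in the post; pathlib picks the right separator for the OS.
csv_path = Path("datasets") / "World Bank Data" / "GDP.csv"

if csv_path.exists():
    gdp = pd.read_csv(csv_path, encoding="ISO-8859-1")
    print(gdp.shape)
else:
    # Printing the absolute path makes it easier to see where pandas was looking.
    print(f"File not found: {csv_path.resolve()}")
```

Checking `exists()` first is only a convenience for this sketch; letting `read_csv` raise `FileNotFoundError` works just as well.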
  • New dataset with selected rows
In [32]:
# First 10 rows
gdp1 = gdp.head(10)
# Rows at positions 2, 9, 15 and 25 (iloc counts from 0)
gdp2 = gdp.iloc[[2, 9, 15, 25]]
print(gdp2)
   Country_code  Rank             Country      GDP
2           JPN     3               Japan  4601461
9           RUS    10  Russian Federation  1860598
15          IDN    16           Indonesia   888538
25          NOR    26              Norway   499817
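Notice that the subset keeps the original row labels (2, 9, 15, 25). The sketch below rebuilds those four rows by hand, so it runs without the CSV file, and compares selection by position (`iloc`) with selection by label (`loc`). The name `sample` is just an illustrative stand-in for `gdp2`.

```python
import pandas as pd

# The four rows printed above, with their original index labels.
sample = pd.DataFrame(
    {
        "Country_code": ["JPN", "RUS", "IDN", "NOR"],
        "Rank": [3, 10, 16, 26],
        "Country": ["Japan", "Russian Federation", "Indonesia", "Norway"],
        "GDP": [4601461, 1860598, 888538, 499817],
    },
    index=[2, 9, 15, 25],
)

# iloc uses positions within this frame: 0 is the first row, 3 is the last.
by_position = sample.iloc[[0, 3]]

# loc uses the index labels carried over from the full dataset.
by_label = sample.loc[[2, 25]]

print(by_position.equals(by_label))
```

Both selections return the rows for Japan and Norway, so the comparison prints `True`.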
  • New dataset by keeping selected columns
In [33]:
gdp3 = gdp[["Country", "Rank"]]
gdp3
Out[33]:
                            Country  Rank
0                     United States     1
1                             China     2
2                             Japan     3
3                           Germany     4
4                    United Kingdom     5
5                            France     6
6                            Brazil     7
7                             Italy     8
8                             India     9
9                Russian Federation    10
10                           Canada    11
11                        Australia    12
12                      Korea, Rep.    13
13                            Spain    14
14                           Mexico    15
15                        Indonesia    16
16                      Netherlands    17
17                           Turkey    18
18                     Saudi Arabia    19
19                      Switzerland    20
20                           Sweden    21
21                          Nigeria    22
22                           Poland    23
23                        Argentina    24
24                          Belgium    25
25                           Norway    26
26                          Austria    27
27               Iran, Islamic Rep.    28
28                         Thailand    29
29             United Arab Emirates    30
..                              ...   ...
164                        Maldives   165
165                  Faeroe Islands   166
166                         Lesotho   167
167                         Liberia   168
168                          Bhutan   169
169                      Cabo Verde   170
170        Central African Republic   171
171                          Belize   172
172                        Djibouti   173
173                      Seychelles   174
174                     Timor-Leste   175
175                       St. Lucia   176
176             Antigua and Barbuda   177
177                 Solomon Islands   178
178                   Guinea-Bissau   179
179                         Grenada   180
180                     Gambia, The   181
181             St. Kitts and Nevis   182
182                         Vanuatu   183
183                           Samoa   184
184  St. Vincent and the Grenadines   185
185                         Comoros   186
186                        Dominica   187
187                           Tonga   188
188           São Tomé and Principe   189
189           Micronesia, Fed. Sts.   190
190                           Palau   191
191                Marshall Islands   192
192                        Kiribati   193
193                          Tuvalu   194

194 rows × 2 columns
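A detail worth knowing about column selection is that single brackets return a Series while double brackets return a DataFrame. Here is a small sketch on a hand-built frame that holds a few rows from the dataset; the name `sample` is only for illustration.

```python
import pandas as pd

# A few rows copied from the outputs in this post, so the CSV is not needed.
sample = pd.DataFrame(
    {
        "Country_code": ["JPN", "RUS", "IDN", "NOR"],
        "Rank": [3, 10, 16, 26],
        "Country": ["Japan", "Russian Federation", "Indonesia", "Norway"],
        "GDP": [4601461, 1860598, 888538, 499817],
    }
)

country_series = sample["Country"]       # single brackets: one column as a Series
country_frame = sample[["Country"]]      # double brackets: a one-column DataFrame
two_cols = sample[["Country", "Rank"]]   # columns come back in the order listed

print(type(country_series).__name__)
print(type(country_frame).__name__)
print(list(two_cols.columns))
```

The columns come back in the order you list them, which is why `Country` appears before `Rank` in the output above even though `Rank` comes first in the file.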

  • New dataset with selected rows and columns
In [34]:
gdp4 = gdp[["Country", "GDP"]][0:10]
gdp4
Out[34]:
              Country       GDP
0       United States  17419000
1               China  10354832
2               Japan   4601461
3             Germany   3868291
4      United Kingdom   2988893
5              France   2829192
6              Brazil   2346076
7               Italy   2141161
8               India   2048517
9  Russian Federation   1860598
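The chained form above selects the columns first and then slices the rows. The same subset can also be taken in a single step with `loc` or `iloc`. A rough sketch on a small hand-built frame (the `sample` name and the two-row slice are just for illustration):

```python
import pandas as pd

# A few rows copied from the outputs in this post, with the default 0..3 index.
sample = pd.DataFrame(
    {
        "Country_code": ["JPN", "RUS", "IDN", "NOR"],
        "Rank": [3, 10, 16, 26],
        "Country": ["Japan", "Russian Federation", "Indonesia", "Norway"],
        "GDP": [4601461, 1860598, 888538, 499817],
    }
)

# Chained selection, as in the post: columns first, then the first two rows.
chained = sample[["Country", "GDP"]][0:2]

# loc works on labels and includes the end label, so 0:1 means rows 0 and 1.
with_loc = sample.loc[0:1, ["Country", "GDP"]]

# iloc works on positions and excludes the end, like ordinary Python slicing.
with_iloc = sample.iloc[0:2, [2, 3]]

print(chained.equals(with_loc) and chained.equals(with_iloc))
```

All three give the same two rows here. The thing to watch is the slice end: `loc` includes it, while `iloc` and plain `[0:2]` do not.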

  • New dataset with selected rows, excluding some columns

In [35]:
# Drop the Country_code column (axis=1 means columns), then keep the first 12 rows
gdp5 = gdp.drop(["Country_code"], axis=1)[0:12]
gdp5
Out[35]:
    Rank             Country       GDP
0      1       United States  17419000
1      2               China  10354832
2      3               Japan   4601461
3      4             Germany   3868291
4      5      United Kingdom   2988893
5      6              France   2829192
6      7              Brazil   2346076
7      8               Italy   2141161
8      9               India   2048517
9     10  Russian Federation   1860598
10    11              Canada   1785387
11    12           Australia   1454675
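Two things about `drop` are easy to miss: it returns a new DataFrame and leaves the original untouched, and the `columns=` keyword can be used in place of `axis=1`. A short sketch on a hand-built frame (the `sample` name is illustrative only):

```python
import pandas as pd

# A few rows copied from the outputs in this post.
sample = pd.DataFrame(
    {
        "Country_code": ["JPN", "RUS", "IDN", "NOR"],
        "Rank": [3, 10, 16, 26],
        "Country": ["Japan", "Russian Federation", "Indonesia", "Norway"],
        "GDP": [4601461, 1860598, 888538, 499817],
    }
)

# Same effect as sample.drop(["Country_code"], axis=1).
trimmed = sample.drop(columns=["Country_code"])

print(list(trimmed.columns))   # the remaining columns
print(list(sample.columns))    # the original still has all four columns
```

Because the original is not modified, the result has to be assigned to a name, as the post does with `gdp5`.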

The next post is a practice session on manipulating datasets in Python.
Link to the next post: https://statinfer.com/104-2-4-practice-manipulating-dataset-in-python/
