Link to the previous post: https://statinfer.com/104-2-2-practice-working-with-datasets-in-python/
In this post, we will see how to create subsets of an imported dataset by selecting rows, columns, or both.
Subsetting the data
- Dataset: “./World Bank Data/GDP.csv”
- Columns in the dataset:
array(['Country_code', 'Rank', 'Country', 'GDP'], dtype=object)
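The code cells behind these outputs are not shown in this post. Here is a minimal sketch of the import step, assuming pandas is installed (imported as `pd`). A small in-memory CSV holding the four rows from the "selected rows" output stands in for `./World Bank Data/GDP.csv`, so the example runs without the file.

```python
import io

import pandas as pd

# Stand-in for "./World Bank Data/GDP.csv" (assumption: same four columns).
# The rows are the ones shown in the "selected rows" output of this post.
csv_text = """Country_code,Rank,Country,GDP
JPN,3,Japan,4601461
RUS,10,Russian Federation,1860598
IDN,16,Indonesia,888538
NOR,26,Norway,499817
"""

# With the real file: gdp = pd.read_csv("./World Bank Data/GDP.csv")
gdp = pd.read_csv(io.StringIO(csv_text))

# The array output above likely came from gdp.columns.values;
# .tolist() returns the same labels as a plain Python list
print(gdp.columns.tolist())
# ['Country_code', 'Rank', 'Country', 'GDP']
```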
- New dataset with selected rows
    Country_code  Rank  Country                 GDP
2   JPN              3  Japan               4601461
9   RUS             10  Russian Federation  1860598
15  IDN             16  Indonesia            888538
25  NOR             26  Norway               499817
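The index labels 2, 9, 15 and 25 suggest these rows were picked by position. The sketch below assumes pandas and positional selection with `.iloc`. The stand-in frame holds only the first 26 countries and ranks listed in this post, not the full file.

```python
import pandas as pd

# Stand-in: the first 26 rows of the Country and Rank columns as listed in
# this post (the real file also has Country_code and GDP).
countries = [
    "United States", "China", "Japan", "Germany", "United Kingdom",
    "France", "Brazil", "Italy", "India", "Russian Federation",
    "Canada", "Australia", "Korea, Rep.", "Spain", "Mexico",
    "Indonesia", "Netherlands", "Turkey", "Saudi Arabia", "Switzerland",
    "Sweden", "Nigeria", "Poland", "Argentina", "Belgium", "Norway",
]
gdp = pd.DataFrame({"Rank": range(1, len(countries) + 1), "Country": countries})

# .iloc takes a list of integer positions and returns those rows as a new
# DataFrame; the original index labels are kept
gdp_rows = gdp.iloc[[2, 9, 15, 25]]
print(gdp_rows)
```

With the default integer index, `gdp.loc[[2, 9, 15, 25]]` returns the same rows because labels and positions coincide. `.iloc` stays correct even after the index has been changed.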
- New dataset by keeping selected columns
     Country                         Rank
0    United States                      1
1    China                              2
2    Japan                              3
3    Germany                            4
4    United Kingdom                     5
5    France                             6
6    Brazil                             7
7    Italy                              8
8    India                              9
9    Russian Federation                10
10   Canada                            11
11   Australia                         12
12   Korea, Rep.                       13
13   Spain                             14
14   Mexico                            15
15   Indonesia                         16
16   Netherlands                       17
17   Turkey                            18
18   Saudi Arabia                      19
19   Switzerland                       20
20   Sweden                            21
21   Nigeria                           22
22   Poland                            23
23   Argentina                         24
24   Belgium                           25
25   Norway                            26
26   Austria                           27
27   Iran, Islamic Rep.                28
28   Thailand                          29
29   United Arab Emirates              30
…    …                                  …
164  Maldives                         165
165  Faeroe Islands                   166
166  Lesotho                          167
167  Liberia                          168
168  Bhutan                           169
169  Cabo Verde                       170
170  Central African Republic         171
171  Belize                           172
172  Djibouti                         173
173  Seychelles                       174
174  Timor-Leste                      175
175  St. Lucia                        176
176  Antigua and Barbuda              177
177  Solomon Islands                  178
178  Guinea-Bissau                    179
179  Grenada                          180
180  Gambia, The                      181
181  St. Kitts and Nevis              182
182  Vanuatu                          183
183  Samoa                            184
184  St. Vincent and the Grenadines   185
185  Comoros                          186
186  Dominica                         187
187  Tonga                            188
188  São Tomé and Principe            189
189  Micronesia, Fed. Sts.            190
190  Palau                            191
191  Marshall Islands                 192
192  Kiribati                         193
193  Tuvalu                           194

194 rows × 2 columns
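Keeping columns amounts to indexing the DataFrame with a list of column names. The sketch below assumes pandas and uses the four rows from the "selected rows" output as a stand-in for the full 194-row file. The column order in the result follows the order of the list, which is why Country appears before Rank.

```python
import pandas as pd

# Stand-in for the GDP data: the four rows from the "selected rows" output
gdp = pd.DataFrame(
    {
        "Country_code": ["JPN", "RUS", "IDN", "NOR"],
        "Rank": [3, 10, 16, 26],
        "Country": ["Japan", "Russian Federation", "Indonesia", "Norway"],
        "GDP": [4601461, 1860598, 888538, 499817],
    }
)

# A list of names inside [] returns a new DataFrame with only those columns,
# in the order given; gdp itself is left unchanged
gdp_cols = gdp[["Country", "Rank"]]
print(gdp_cols.shape)  # (4, 2); on the full file this would be (194, 2)
```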
- New dataset with selected rows and columns
   Country                  GDP
0  United States       17419000
1  China               10354832
2  Japan                4601461
3  Germany              3868291
4  United Kingdom       2988893
5  France               2829192
6  Brazil               2346076
7  Italy                2141161
8  India                2048517
9  Russian Federation   1860598
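Rows and columns can be chosen in one step with `.loc[rows, columns]`. The sketch below assumes pandas and the default integer index. The stand-in holds the first 12 rows shown in this post and leaves out Country_code, since the post does not show its values for most of these rows. A `.loc` slice such as `0:9` includes its end label, so it returns ten rows.

```python
import pandas as pd

# Stand-in: Rank, Country and GDP for the first 12 rows shown in this post
gdp = pd.DataFrame(
    {
        "Rank": list(range(1, 13)),
        "Country": [
            "United States", "China", "Japan", "Germany", "United Kingdom",
            "France", "Brazil", "Italy", "India", "Russian Federation",
            "Canada", "Australia",
        ],
        "GDP": [
            17419000, 10354832, 4601461, 3868291, 2988893, 2829192,
            2346076, 2141161, 2048517, 1860598, 1785387, 1454675,
        ],
    }
)

# Label-based: rows 0 to 9 (end label included) and two named columns
subset = gdp.loc[0:9, ["Country", "GDP"]]

# Equivalent positional version: .iloc slices exclude the end position
same = gdp[["Country", "GDP"]].iloc[0:10]
print(subset.equals(same))  # True
```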
- New dataset with selected rows, excluding the Country_code column
    Rank  Country                  GDP
0      1  United States       17419000
1      2  China               10354832
2      3  Japan                4601461
3      4  Germany              3868291
4      5  United Kingdom       2988893
5      6  France               2829192
6      7  Brazil               2346076
7      8  Italy                2141161
8      9  India                2048517
9     10  Russian Federation   1860598
10    11  Canada               1785387
11    12  Australia            1454675
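Dropping a column and then slicing rows produces this kind of output. The sketch below assumes pandas and `DataFrame.drop`, since the post's own code is not shown. The stand-in reuses the four rows from the "selected rows" output, so it keeps two rows where the post kept twelve. On the full file, `.iloc[0:12]` would be the matching call.

```python
import pandas as pd

# Stand-in for the GDP data: the four rows from the "selected rows" output
gdp = pd.DataFrame(
    {
        "Country_code": ["JPN", "RUS", "IDN", "NOR"],
        "Rank": [3, 10, 16, 26],
        "Country": ["Japan", "Russian Federation", "Indonesia", "Norway"],
        "GDP": [4601461, 1860598, 888538, 499817],
    }
)

# drop(columns=...) returns a copy without the listed columns;
# .iloc[0:2] then keeps the first two rows by position
no_code = gdp.drop(columns=["Country_code"]).iloc[0:2]

# Alternative: build the list of columns to keep and select with .loc
keep = [col for col in gdp.columns if col != "Country_code"]
alt = gdp.loc[0:1, keep]
print(no_code.equals(alt))  # True
```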
The next post is a practice session on manipulating datasets in Python.
Link to the next post: https://statinfer.com/104-2-4-practice-manipulating-dataset-in-python/