Link to the previous post: https://statinfer.com/104-2-2-practice-working-with-datasets-in-python/

In this post we will see how to create subsets of an imported dataset by selecting rows, columns, or both.

Sub-setting the data

Dataset: “./World Bank Data/GDP.csv“

In [30]:

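import pandas as pd
# The line below may raise an error (for example FileNotFoundError) if the
# path does not match where the dataset is stored on your machine.
gdp = pd.read_csv("datasets\\World Bank Data\\GDP.csv", encoding="ISO-8859-1")

gdp.shape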

Out[30]:

(194, 4)
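
The encoding argument matters because some country names in this file contain accented characters, such as "São Tomé and Principe". The sketch below illustrates this with a small in-memory stand-in for the file; the two-row CSV text and the names `raw`, `sample` and `utf8_ok` are made up for illustration and are not part of the original dataset code.

```python
import io
import pandas as pd

# Hypothetical two-row CSV, encoded as ISO-8859-1 to mimic the real file.
raw = "Country,Rank\nSão Tomé and Principe,189\nPalau,191\n".encode("ISO-8859-1")

# Reading with the matching encoding decodes the accented characters correctly.
sample = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(sample.shape)
print(sample["Country"].tolist())

# With the default UTF-8 decoding, the same bytes are not valid text.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```

If reading your own copy of the file fails with a decoding error, checking the encoding argument is a reasonable first step.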

In [29]:

gdp.columns.values

Out[29]:

array(['Country_code', 'Rank', 'Country', 'GDP'], dtype=object)

New dataset with selected rows

In [32]:

# First 10 rows
gdp1 = gdp.head(10)
# Rows at integer positions 2, 9, 15 and 25
gdp2 = gdp.iloc[[2, 9, 15, 25]]
print(gdp2)

   Country_code  Rank             Country      GDP
2           JPN     3               Japan  4601461
9           RUS    10  Russian Federation  1860598
15          IDN    16           Indonesia   888538
25          NOR    26              Norway   499817
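
As a small self-contained sketch of the same row selection ideas, the example below rebuilds the first five rows shown in this post as a stand-in table (the name `gdp_demo` is hypothetical). It also shows filtering rows by a condition, which is not used in the original cells but is a common alternative way to pick rows.

```python
import pandas as pd

# Stand-in for the GDP table, using the first five rows shown in this post.
gdp_demo = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Country": ["United States", "China", "Japan", "Germany", "United Kingdom"],
    "GDP": [17419000, 10354832, 4601461, 3868291, 2988893],
})

# head(n) keeps the first n rows.
first_two = gdp_demo.head(2)

# iloc takes integer positions; a list picks non-adjacent rows.
picked = gdp_demo.iloc[[0, 2, 4]]

# A boolean condition keeps only the rows where it is True.
large = gdp_demo[gdp_demo["GDP"] > 4000000]

print(picked)
print(large["Country"].tolist())
```

Note that the selected rows keep their original index labels, which is why the output above shows 2, 9, 15 and 25 rather than 0 to 3.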

New dataset by keeping selected columns

In [33]:

gdp3 = gdp[["Country", "Rank"]]
gdp3

Out[33]:

	Country	Rank
0	United States	1
1	China	2
2	Japan	3
3	Germany	4
4	United Kingdom	5
5	France	6
6	Brazil	7
7	Italy	8
8	India	9
9	Russian Federation	10
10	Canada	11
11	Australia	12
12	Korea, Rep.	13
13	Spain	14
14	Mexico	15
15	Indonesia	16
16	Netherlands	17
17	Turkey	18
18	Saudi Arabia	19
19	Switzerland	20
20	Sweden	21
21	Nigeria	22
22	Poland	23
23	Argentina	24
24	Belgium	25
25	Norway	26
26	Austria	27
27	Iran, Islamic Rep.	28
28	Thailand	29
29	United Arab Emirates	30
…	…	…
164	Maldives	165
165	Faeroe Islands	166
166	Lesotho	167
167	Liberia	168
168	Bhutan	169
169	Cabo Verde	170
170	Central African Republic	171
171	Belize	172
172	Djibouti	173
173	Seychelles	174
174	Timor-Leste	175
175	St. Lucia	176
176	Antigua and Barbuda	177
177	Solomon Islands	178
178	Guinea-Bissau	179
179	Grenada	180
180	Gambia, The	181
181	St. Kitts and Nevis	182
182	Vanuatu	183
183	Samoa	184
184	St. Vincent and the Grenadines	185
185	Comoros	186
186	Dominica	187
187	Tonga	188
188	São Tomé and Principe	189
189	Micronesia, Fed. Sts.	190
190	Palau	191
191	Marshall Islands	192
192	Kiribati	193
193	Tuvalu	194

194 rows × 2 columns
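
The double square brackets above pass a list of column names, which is why the result is a DataFrame. A minimal sketch of the difference between single and double brackets, using a hypothetical three-row stand-in table named `gdp_demo`:

```python
import pandas as pd

# Stand-in for the GDP table, using the first three rows shown in this post.
gdp_demo = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Country": ["United States", "China", "Japan"],
    "GDP": [17419000, 10354832, 4601461],
})

# A single column name returns a Series.
country_series = gdp_demo["Country"]

# A list of names returns a DataFrame, even when the list has one entry.
country_frame = gdp_demo[["Country"]]

# Columns come back in the order given in the list.
reordered = gdp_demo[["GDP", "Country"]]

print(type(country_series).__name__)
print(type(country_frame).__name__)
print(list(reordered.columns))
```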

New dataset with selected rows and columns

In [34]:

# Keep two columns, then slice the first 10 rows
gdp4 = gdp[["Country", "GDP"]][0:10]
gdp4

Out[34]:

	Country	GDP
0	United States	17419000
1	China	10354832
2	Japan	4601461
3	Germany	3868291
4	United Kingdom	2988893
5	France	2829192
6	Brazil	2346076
7	Italy	2141161
8	India	2048517
9	Russian Federation	1860598
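
The cell above selects columns first and then slices rows. As an alternative sketch (not the approach used in the original cell), `loc` and `iloc` can select rows and columns in one step. The stand-in table `gdp_demo` below is hypothetical and assumes the default integer index.

```python
import pandas as pd

# Stand-in for the GDP table, using the first five rows shown in this post.
gdp_demo = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Country": ["United States", "China", "Japan", "Germany", "United Kingdom"],
    "GDP": [17419000, 10354832, 4601461, 3868291, 2988893],
})

# Chained form, as in the post: columns first, then a row slice.
top3_chain = gdp_demo[["Country", "GDP"]][0:3]

# loc selects by label; its row slice includes the end label (0, 1 and 2).
top3_loc = gdp_demo.loc[0:2, ["Country", "GDP"]]

# iloc selects by position; its row slice excludes the end position.
top3_iloc = gdp_demo.iloc[0:3, [1, 2]]

print(top3_chain.equals(top3_loc))
print(top3_chain.equals(top3_iloc))
```

The inclusive end of a `loc` slice is the main thing to watch for when switching between the two forms.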

New dataset with selected rows, excluding selected columns

In [35]:

# Drop the Country_code column, then slice the first 12 rows
gdp5 = gdp.drop(["Country_code"], axis=1)[0:12]
gdp5

Out[35]:

	Rank	Country	GDP
0	1	United States	17419000
1	2	China	10354832
2	3	Japan	4601461
3	4	Germany	3868291
4	5	United Kingdom	2988893
5	6	France	2829192
6	7	Brazil	2346076
7	8	Italy	2141161
8	9	India	2048517
9	10	Russian Federation	1860598
10	11	Canada	1785387
11	12	Australia	1454675
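
A short sketch of how `drop` behaves, using a hypothetical three-row stand-in table named `gdp_demo`. It shows the equivalent `columns=` keyword and that `drop` returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Stand-in for the GDP table, using the first three rows shown in this post.
gdp_demo = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Country": ["United States", "China", "Japan"],
    "GDP": [17419000, 10354832, 4601461],
})

# axis=1 tells drop that the labels are column names.
without_rank = gdp_demo.drop(["Rank"], axis=1)

# The columns= keyword is an equivalent, more explicit spelling.
same_result = gdp_demo.drop(columns=["Rank"])

# drop returns a new DataFrame; the original still has all three columns.
print(list(without_rank.columns))
print(list(gdp_demo.columns))
```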

The next post is a practice session on manipulating datasets in Python.
Link to the next post: https://statinfer.com/104-2-4-practice-manipulating-dataset-in-python/

20th June 2017

104.2.3 Manipulating datasets in python

Subsetting the datasets.
