203.3.6 The Decision Tree Algorithm

The Decision Tree Algorithm

In the previous section, we studied information gain in decision tree splits.

The major step is to identify the best split variable and the best split criterion
Once we have a split, we go down to the segment level and drill down further

Until stopped:

1. Select a leaf node
2. Find the best splitting attribute
3. Split the node using the attribute
4. Go to each child node and repeat steps 2 and 3

Stopping criteria:

Each leaf-node contains examples of one type
Algorithm ran out of attributes
No further significant information gain
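
The loop and the three stopping criteria above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by any particular package: it assumes categorical attributes stored in dictionaries, splits a node into one child per attribute value, and uses a hypothetical `min_gain` threshold (0.01 here, chosen arbitrarily) to decide what counts as "significant" information gain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Parent entropy minus the weighted entropy of the child segments."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def build_tree(rows, labels, attributes, min_gain=0.01):
    """Recursively grow a tree; min_gain is an assumed, illustrative threshold."""
    # Stopping criterion 1: the node contains examples of one type only
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criterion 2: the algorithm ran out of attributes
    if not attributes:
        return majority
    # Step 2: find the best splitting attribute
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    # Stopping criterion 3: no further significant information gain
    if information_gain(rows, labels, best) < min_gain:
        return majority
    # Step 3: split the node; step 4: repeat on each child node
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], remaining, min_gain
        )
    return {"split_on": best, "children": children}
```

Calling `build_tree(rows, labels, ["Gender", "Marital Status"])` on a list of row dictionaries returns either a class label (a leaf) or a nested dictionary describing the split.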

The Decision Tree Algorithm – Demo

Entropy([4+,10-]) Overall = 86.3% (Impurity)

Entropy([1+,7-]) Male = 54.3%
Entropy([3+,3-]) Female = 100%
Information Gain for Gender = 86.3 - ((8/14) × 54.3 + (6/14) × 100) = 12.4%

Entropy([0+,9-]) Married = 0%
Entropy([4+,1-]) Unmarried = 72.1%
Information Gain for Marital Status = 86.3 - ((9/14) × 0 + (5/14) × 72.1) = 60.5%
The information gain for Marital Status (60.5%) is much higher than for Gender (12.4%), so Marital Status is the first variable used for segmentation
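
The arithmetic in this demo can be checked with a short script. The sketch below is only a verification aid: the function names are my own, segments are written as `(positives, negatives)` counts, and the Male segment is entered as 1 positive and 7 negatives so that the segments add up to the overall [4+,10-] (entropy is symmetric, so swapping the two counts gives the same value).

```python
from math import log2


def entropy(pos, neg):
    """Entropy in bits of a segment with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # a class with zero examples contributes nothing
            p = count / total
            result -= p * log2(p)
    return result


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    n = sum(parent)
    weighted = sum((sum(child) / n) * entropy(*child) for child in children)
    return entropy(*parent) - weighted


overall = (4, 10)
gain_gender = information_gain(overall, [(1, 7), (3, 3)])   # Male, Female
gain_marital = information_gain(overall, [(0, 9), (4, 1)])  # Married, Unmarried

print(f"Information gain for Gender: {gain_gender:.1%}")
print(f"Information gain for Marital Status: {gain_marital:.1%}")
```

The two printed values should come out close to the 12.4% and 60.5% worked out above, allowing for small rounding differences in the intermediate entropies.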

Now we take each resulting segment, for example “Unmarried”, and repeat the same process of looking for the best splitting variable within that sub-segment. The “Married” segment is already pure (entropy 0%), so it needs no further splitting.

Many Splits for a Single Variable

Sometimes a variable takes many distinct values
This leads to multiple split options for that single variable
Each split option gives its own information gain value, so a single variable has several candidate gains

What is the information gain for income?

There are multiple ways to split on income, and each gives a different information gain
For income, we consider all possible split options and calculate the information gain for each one
Out of all the options within income, the split with the highest information gain is taken as the best split for income
So, node partitioning for multi-valued attributes needs to be included in the decision tree algorithm
We need to find the best splitting attribute along with its best split rule
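
One common way to enumerate the split options for a variable like income is to treat it as numeric and try a threshold between every pair of consecutive distinct values. The sketch below assumes that approach; the income figures and labels are made up purely for illustration and are not the data behind this post.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Try each midpoint between consecutive distinct values.

    Returns (threshold, gain) for the split `value <= threshold` with the
    highest information gain, or (None, 0.0) if no split improves purity.
    """
    parent = entropy(labels)
    n = len(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        gain = parent - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical income values (in thousands) and class labels
income = [20, 25, 30, 40, 60, 80]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_threshold(income, bought)
print(f"Best rule: income <= {threshold}, information gain = {gain:.3f}")
```

For a categorical variable with many levels, the same idea applies, except that the candidate splits are groupings of levels rather than thresholds.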

The Decision Tree Algorithm – Full Version

Until stopped:

1. Select a leaf node
2. Select an attribute
– Partition the node population and calculate information gain
– Find the split with maximum information gain for this attribute
3. Repeat this for all attributes
– Find the best splitting attribute along with best split rule
4. Split the node using the attribute and its best split rule
5. Go to each child node and repeat steps 2 to 4

Stopping criteria:

Each leaf-node contains examples of one type
Algorithm ran out of attributes
No further significant information gain
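
Steps 2 and 3 of the full version, the search for the best attribute together with its best split rule, might look like the sketch below. It rests on several assumptions of my own: every split is binary, numeric attributes use threshold rules, other attributes use "equals this value" rules, and the helper names and sample rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def candidate_rules(values):
    """Binary split rules for one attribute: thresholds if numeric, equality tests otherwise."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return []
    if all(isinstance(v, (int, float)) for v in distinct):
        return [("<=", (low + high) / 2) for low, high in zip(distinct, distinct[1:])]
    return [("==", v) for v in distinct]


def rule_matches(value, rule):
    op, ref = rule
    return value <= ref if op == "<=" else value == ref


def gain_for_rule(rows, labels, attribute, rule):
    """Information gain of partitioning the node with `rule` on `attribute`."""
    left = [l for r, l in zip(rows, labels) if rule_matches(r[attribute], rule)]
    right = [l for r, l in zip(rows, labels) if not rule_matches(r[attribute], rule)]
    n = len(labels)
    return entropy(labels) - (len(left) / n * entropy(left) + len(right) / n * entropy(right))


def best_split(rows, labels, attributes):
    """Return (attribute, rule, gain) with the highest gain over all attributes and rules."""
    best = (None, None, 0.0)
    for attribute in attributes:  # step 3: repeat for all attributes
        for rule in candidate_rules([r[attribute] for r in rows]):  # step 2
            gain = gain_for_rule(rows, labels, attribute, rule)
            if gain > best[2]:
                best = (attribute, rule, gain)
    return best


# Hypothetical node population
rows = [
    {"gender": "M", "income": 20},
    {"gender": "F", "income": 25},
    {"gender": "M", "income": 60},
    {"gender": "F", "income": 80},
]
labels = ["no", "no", "yes", "yes"]
attribute, rule, gain = best_split(rows, labels, ["gender", "income"])
print(attribute, rule, round(gain, 3))
```

The recursion over child nodes is unchanged from the basic version; only the search inside each node grows from "best attribute" to "best attribute and best rule".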

The next post is about Building a Decision Tree in R.

20th June 2017