Link to the previous post : https://statinfer.com/204-3-3-how-decision-tree-splits-works/
Entropy Calculation – Example
- Entropy at root
- Total population at root: 100 records [50+, 50-]
- Entropy(S) = -(p+)log2(p+) - (p-)log2(p-)
- = -0.5*log2(0.5) - 0.5*log2(0.5)
- = -(0.5)(-1) - (0.5)(-1)
- = 1
- 100% impurity at root
Entropy(S)=−(p+)(log2(p+))−(p−)(log2(p−))
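The root-node arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch; the `entropy` helper is my own, not part of any library:

```python
import math

def entropy(pos, neg):
    # Shannon entropy of a node holding `pos` positive and `neg` negative records
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:  # by convention 0*log2(0) is taken as 0
            result -= p * math.log2(p)
    return result

# Root node: 100 records [50+, 50-]
print(entropy(50, 50))  # 1.0 -> 100% impurity
```

A perfectly balanced node gives entropy 1 (maximum impurity); a pure node (all one class) gives 0.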
Entropy Calculation
- Age splits the population into two segments
- Segment-1: Age="Young"
- Segment-2: Age="Old"
- Entropy at segment-1
- Age="Young" segment has 60 records [31+,29-]
Entropy(S)=−(p+)(log2(p+))−(p−)(log2(p−))
- = -(31/60)*log2(31/60) - (29/60)*log2(29/60)
- = 0.9991984 (99% impurity in this segment)
- Entropy at segment-2
- Age="Old" segment has 40 records [19+,21-]
Entropy(S)=−(p+)(log2(p+))−(p−)(log2(p−))
- = -(19/40)*log2(19/40) - (21/40)*log2(21/40)
- = 0.9981959 (99% impurity in this segment too)
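Both segment entropies can be checked numerically. A minimal Python sketch, assuming the same record counts as above (the `entropy` helper is my own, not a library function):

```python
import math

def entropy(pos, neg):
    # Shannon entropy of a node holding `pos` positive and `neg` negative records
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            result -= p * math.log2(p)
    return result

# Segment-1, Age="Young": 60 records [31+, 29-]
print(entropy(31, 29))  # ~0.9992, still about 99% impure
# Segment-2, Age="Old": 40 records [19+, 21-]
print(entropy(19, 21))  # ~0.9982, also about 99% impure
```

Both segments remain almost as impure as the root, which suggests Age is a weak splitting variable here.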
Practice: Entropy Calculation – Example
- Calculate entropy at the root for the given population
- Calculate the entropy for the two distinct gender segments
Code – Entropy Calculation
- Entropy at root: 1 (100% impurity)
- Male segment: -(48/60)*log(48/60,2) - (12/60)*log(12/60,2)
- = 0.7219281
- Female segment: -(2/40)*log(2/40,2) - (38/40)*log(38/40,2)
- = 0.286397
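The two expressions above can be run directly in Python using `math.log` with an explicit base, mirroring the `log(x, 2)` calls; a minimal sketch assuming the segment counts shown ([48+, 12-] male, [2+, 38-] female):

```python
import math

# Male segment: 60 records [48+, 12-]
male = -(48/60) * math.log(48/60, 2) - (12/60) * math.log(12/60, 2)
# Female segment: 40 records [2+, 38-]
female = -(2/40) * math.log(2/40, 2) - (38/40) * math.log(38/40, 2)

print(male)    # ~0.7219281
print(female)  # ~0.286397
```

Note that both gender segments come out far purer (lower entropy) than the age segments above.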
The next post covers information gain in decision tree splits.
Link to the next post : https://statinfer.com/204-3-5-information-gain-in-decision-tree-split/