Link to the previous post: https://statinfer.com/204-3-1-decision-trees-in-python-segmentation/
The Decision Tree Approach
- The aim is to divide the whole population or the data set into segments.
- The segmentation needs to be useful for business decision making.
- If one class clearly dominates a segment:
- It is easy for us to classify the unknown items.
- It is easy to apply a business strategy.
- For example:
- It takes no great skill to say that customers have a 50% chance to buy and a 50% chance not to buy.
- A good splitting criterion separates customers into segments with lopsided probabilities, such as 90%-10%. Say, Gender=“Female” customers have a 5% probability of buying and a 95% probability of not buying.
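To make the 50/50 versus lopsided contrast concrete, here is a minimal Python sketch that computes the buy and not-buy rates inside each segment. The customer records and the helper name `segment_rates` are hypothetical, made up for illustration. The point is only that a useful segment is one where a single class dominates, while the population as a whole may sit at 50/50.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (gender, bought), where bought is 1 (buy) or 0 (not buy).
customers = [
    ("Female", 0), ("Female", 0), ("Female", 0), ("Female", 0), ("Female", 1),
    ("Male", 1), ("Male", 1), ("Male", 1), ("Male", 0), ("Male", 1),
]


def segment_rates(records):
    """Return {segment: fraction of records in that segment with bought == 1}."""
    counts = defaultdict(Counter)
    for segment, bought in records:
        # Tally buy / not-buy labels separately for each segment.
        counts[segment][bought] += 1
    return {seg: c[1] / sum(c.values()) for seg, c in counts.items()}


# Buying rate of the whole population, before any segmentation.
overall = sum(bought for _, bought in customers) / len(customers)
print(f"Overall buy rate: {overall:.0%}")
for seg, rate in segment_rates(customers).items():
    print(f"{seg}: buy {rate:.0%}, not buy {1 - rate:.0%}")
```

On this toy data the overall buy rate is 50%, but within segments it is 20% for Female and 80% for Male, so each segment is far easier to act on than the population as a whole.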
Example Sales Segmentation Based on Age
Example Sales Segmentation Based on Gender
Main Questions
- OK, we are looking for pure segments.
- The dataset has many attributes.
- Which is the right attribute for pure segmentation?
- Can we start with any attribute?
- Which attribute should we start from? The best separating attribute.
- Customer age, gender, location, and demographics can all affect sales. How do we identify the best attribute and the best split?
The Splitting Criterion
- The best split is
- The split that does the best job of separating the data into groups
- Where a single class (either 0 or 1) predominates in each group
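The splitting criterion above can be sketched in plain Python. As a stand-in purity measure, this sketch assumes a simple score: the fraction of records that fall in their segment's majority class after splitting on an attribute (1.0 means every segment is pure). The data, the score, and the function names `split_purity` and `best_attribute` are hypothetical. Decision-tree libraries typically use impurity measures such as the Gini index or entropy instead.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: candidate attributes plus a 0/1 "bought" label.
records = [
    {"age": "<30", "gender": "Female", "bought": 1},
    {"age": "<30", "gender": "Male", "bought": 1},
    {"age": "<30", "gender": "Female", "bought": 1},
    {"age": "<30", "gender": "Male", "bought": 0},
    {"age": "30+", "gender": "Female", "bought": 0},
    {"age": "30+", "gender": "Male", "bought": 0},
    {"age": "30+", "gender": "Female", "bought": 0},
    {"age": "30+", "gender": "Male", "bought": 1},
]


def split_purity(rows, attribute, target="bought"):
    """Share of rows that belong to their segment's majority class after
    splitting on `attribute` (assumed purity score; 1.0 = all segments pure)."""
    segments = defaultdict(Counter)
    for row in rows:
        # Count class labels separately within each value of the attribute.
        segments[row[attribute]][row[target]] += 1
    majority_total = sum(max(counts.values()) for counts in segments.values())
    return majority_total / len(rows)


def best_attribute(rows, attributes, target="bought"):
    """Pick the attribute whose split yields the purest segments."""
    return max(attributes, key=lambda attr: split_purity(rows, attr, target))


for attr in ("age", "gender"):
    print(attr, split_purity(records, attr))
print("best:", best_attribute(records, ["age", "gender"]))
```

On this toy data, splitting on age gives a purity of 0.75 (each age group is 3-to-1 in favour of one class), while gender gives 0.5, no better than not splitting at all. Age is therefore the best separating attribute here.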
Example Sales Segmentation Based on Age
Example Sales Segmentation Based on Gender
The next post is about how decision tree splits work.
Link to the next post: https://statinfer.com/204-3-3-how-decision-tree-splits-works/