Decision tree splitting rules: entropy or Gini (Python data mining class)
Question
This is for a Python data mining class.
Decision tree splitting rules: entropy or Gini. Using the Gini impurity measure and entropy, assess the two possible splits below and identify the split that leads to the higher reduction in impurity (higher information gain). The data are 15 cases from the Titanic dataset; D stands for 'Died' and S for 'Survived'.

Split 1: Age > 42
Split 2: Ticket > 18

[The per-case S/D listing for the two splits was garbled in extraction; the recoverable class counts for Split 1 appear in the answer below.]

Explanation / Answer
Part a)
Gini index for a given node t:

GINI(t) = 1 - sum_j [p(j | t)]^2

where p(j | t) is the relative frequency of class j at node t.
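A minimal Python sketch of this formula (the function name `gini` and the counts-based interface are my own choices, not from the original answer):

```python
def gini(counts):
    """Gini index for a node, given its per-class counts.

    Implements GINI(t) = 1 - sum_j p(j|t)^2, where p(j|t) is the
    fraction of the node's cases that belong to class j.
    """
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Root node of Split 1: 6 survived, 9 died
print(round(gini([6, 9]), 2))  # 0.48
```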
Part b)
Entropy for a given node t:

Entropy(t) = - sum_j p(j | t) * log2 p(j | t)

with the same class frequencies p(j | t) as above (terms with p = 0 are taken to contribute 0).
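The same idea as a Python sketch (again, the `entropy` helper is my own naming, assuming per-class counts as input):

```python
import math

def entropy(counts):
    """Entropy for a node, given its per-class counts.

    Implements Entropy(t) = -sum_j p(j|t) * log2 p(j|t); classes with
    zero count are skipped, since their contribution is 0.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Root node: 6 survived, 9 died out of 15
print(round(entropy([6, 9]), 3))  # 0.971
```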
Split 1 (Age > 42):

Node        #Survived  #Dead  Gini
Root        6          9      1 - (6/15)^2 - (9/15)^2 = 0.48
Left node   3          7      1 - (3/10)^2 - (7/10)^2 = 0.42
Right node  3          2      1 - (3/5)^2 - (2/5)^2 = 0.48

Weighted Gini after the split = (10/15)(0.42) + (5/15)(0.48) = 0.44, so Split 1 reduces the impurity by 0.48 - 0.44 = 0.04.
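The gain computation for Split 1 can be checked in Python. This is a sketch using the table's counts; `gini_gain` is a hypothetical helper name, not part of the original answer:

```python
def gini(counts):
    """Gini index for a node: 1 - sum of squared class frequencies."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(parent, children):
    """Reduction in impurity: Gini(parent) minus the size-weighted
    average Gini of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

# Split 1 from the table: root 6S/9D, left node 3S/7D, right node 3S/2D
gain = gini_gain([6, 9], [[3, 7], [3, 2]])
print(round(gain, 2))  # 0.04
```

Running the same function on Split 2's child counts (once recovered) and comparing the two gains would identify the better split.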