Basic algorithm - 2023.2 English

Vitis Libraries

Release Date
2023-12-20
Version
2023.2 English

Starting from the root node, compute the information gain of every candidate feature split and select the best one as the node's splitting condition. Grow the tree by applying this rule recursively until one of the following stopping rules is met:

a. The node depth equals the maxDepth training parameter.
b. No split candidate yields an information gain greater than minInfoGain.
c. No split candidate produces child nodes that each have at least minInstancesPerNode training instances.
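The stopping rules a-c above can be sketched as a single predicate. This is not the Vitis Libraries implementation; the `Params` and `Split` structs and the function name are illustrative, with parameter names taken from the text:

```cpp
#include <cassert>

// Hypothetical training parameters; names follow the text, values are set by the user.
struct Params {
    int maxDepth;
    double minInfoGain;
    int minInstancesPerNode;
};

// Statistics of the best candidate split found for one node (illustrative).
struct Split {
    double infoGain;  // information gain of this split
    int leftCount;    // training instances sent to the left child
    int rightCount;   // training instances sent to the right child
};

// Returns true when the node must become a leaf, per stopping rules a-c.
bool shouldStop(int depth, const Split& best, const Params& p) {
    if (depth >= p.maxDepth) return true;                      // rule a
    if (best.infoGain <= p.minInfoGain) return true;           // rule b
    if (best.leftCount < p.minInstancesPerNode ||
        best.rightCount < p.minInstancesPerNode) return true;  // rule c
    return false;
}
```

A node that passes all three checks is split and its children are processed in turn; otherwise it is finalized as a leaf.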

Node impurity and information gain: the node impurity is a measure of the homogeneity of the labels at the node. The information gain is the difference between the parent node's impurity and the weighted sum of the two child nodes' impurities. We use Gini impurity for the classification scenario and variance impurity for the regression scenario; information gain is used to find the best feature split in our implementation.

Gini impurity: Gini = 1 - sum_k (p_k)^2, where p_k is the fraction of node instances with class label k.

Variance impurity: Var = (1/N) * sum_i (y_i - mu)^2, where y_i are the N node labels and mu is their mean.

Information gain: IG = Impurity(parent) - (N_left/N) * Impurity(left) - (N_right/N) * Impurity(right).
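The three measures above can be sketched directly from their definitions. This is an illustrative host-side sketch, not the library's HLS code; function names are my own:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Gini impurity: 1 - sum over classes of p_k^2, from per-class instance counts.
double giniImpurity(const std::vector<int>& classCounts) {
    double n = 0.0;
    for (int c : classCounts) n += c;
    if (n == 0.0) return 0.0;
    double sumSq = 0.0;
    for (int c : classCounts) {
        double p = c / n;
        sumSq += p * p;
    }
    return 1.0 - sumSq;
}

// Variance impurity: mean squared deviation of the regression labels.
double varianceImpurity(const std::vector<double>& labels) {
    if (labels.empty()) return 0.0;
    double mean = 0.0;
    for (double y : labels) mean += y;
    mean /= labels.size();
    double sumSq = 0.0;
    for (double y : labels) sumSq += (y - mean) * (y - mean);
    return sumSq / labels.size();
}

// Information gain: parent impurity minus the count-weighted child impurities.
double infoGain(double parentImp, double leftImp, int nLeft,
                double rightImp, int nRight) {
    double n = nLeft + nRight;
    return parentImp - (nLeft / n) * leftImp - (nRight / n) * rightImp;
}
```

For example, a pure split of a balanced two-class node (Gini 0.5) into two pure children has information gain 0.5, the maximum possible for that node.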

Caution

The current implementation provides one impurity measure (Gini) for classification and one impurity measure (variance) for regression. Support for entropy (for classification only) is to be extended.