1SE rule details in CCP pruning of CART
There are several ways to choose the best tree from the pruned tree list:
$$\text{best tree from} = \begin{cases} \text{cross-validation}, & \text{small datasets} \\ k\text{-SE rule}, & \text{large datasets} \\ \text{choose the one with the lowest error}, & \text{large datasets} \end{cases}$$
At first,"pruned tree"is an ambiguous word which has several meanings,Let’s
make a decision :
1. Call the bigger circle in the figure, the subtree that still contains the original root node, the "pruned tree".
2. Call the smaller circle, the branch that is cut off and does not contain the original root node, the "pruned parts".
CCP: cost-complexity pruning
According to [1]:
According to [2]:
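Both sources build on the standard cost-complexity criterion of CART (stated here in the common textbook notation rather than quoted from [1] or [2]): for a tree $T$ and a penalty $\alpha \ge 0$, CCP scores each candidate by

$$R_\alpha(T) = R(T) + \alpha \cdot |\widetilde{T}|$$

where $R(T)$ is the training error of $T$ and $|\widetilde{T}|$ is its number of terminal nodes. Increasing $\alpha$ from 0 repeatedly collapses the "weakest link" and yields the nested sequence of pruned trees used below.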
In summary:
When we obtain the pruned sequence from the CCP algorithm, the pruned tree sequence is:
$[T_1, T_2, \dots, T_K]$
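For concreteness, here is a minimal sketch of how such a nested sequence can be built with scikit-learn's `cost_complexity_pruning_path`. Scikit-learn is my choice for illustration only; the post itself does not tie CCP to any library, and the data here is a placeholder train/validation split.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Toy data standing in for a real training/validation split.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# CCP path of the fully grown tree: one effective alpha per pruned subtree.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one tree per alpha; a larger alpha gives a more heavily pruned tree,
# so this list plays the role of T_1, ..., T_K (larger k => simpler tree).
pruned_trees = [
    DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X_train, y_train)
    for a in path.ccp_alphas
]
```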
We do NOT simply select the candidate $T_k$, $k \in [1, K]$, with the minimum MSE, where each MSE is computed on an independent validation dataset using the corresponding pruned tree $T_k$ as the model.
We make a compromise:
We choose the subtrees in the above sequence whose MSE (computed on the independent validation dataset) is lower than "the minimum MSE + 1 · standard error of the MSE of the pruned tree that attains that minimum".
More than one pruned tree in the sequence may satisfy this requirement. Among those, we select the subtree with the largest $k$, i.e., the simplest one.
This choice gives up a little validation accuracy, but it makes the final tree chosen from the sequence simpler.
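A minimal sketch of that selection rule, continuing the hypothetical `pruned_trees`, `X_val`, `y_val` from the sketch above. The standard error here is computed as the usual SE of the mean of the per-sample squared validation errors, which is one common reading of the rule; the post does not spell out the exact SE formula.

```python
import numpy as np

def one_se_select(pruned_trees, X_val, y_val):
    """Index of the simplest tree whose validation MSE is within 1 SE of the best MSE."""
    mses, ses = [], []
    for tree in pruned_trees:
        sq_err = (tree.predict(X_val) - y_val) ** 2              # per-sample squared errors
        mses.append(sq_err.mean())                                # validation MSE of T_k
        ses.append(sq_err.std(ddof=1) / np.sqrt(len(sq_err)))    # SE of that MSE estimate

    best = int(np.argmin(mses))              # index of the tree with the minimum MSE
    threshold = mses[best] + ses[best]       # minimum MSE + 1 * its standard error

    # Trees are ordered from least to most pruned, so the largest index that
    # still meets the threshold is the simplest acceptable tree.
    return max(k for k, mse in enumerate(mses) if mse <= threshold)

final_tree = pruned_trees[one_se_select(pruned_trees, X_val, y_val)]
```

Since the minimum-MSE tree always satisfies the threshold itself, the selection can never come up empty.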
References:
[1] Statistical Consulting Group
[2] Cost-Complexity Pruning Process - IBM