Machine Learning Algorithms - Random Forest: Brute-Force Decision Tree Implementation from Scratch in R (3)

Published: 2025/3/15

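The first article in this series, "Machine Learning Algorithms - Random Forest: A First Look at Decision Trees (1)", covered the basic concepts of decision trees and the criteria for evaluating a split, and worked through the Gini impurity of a single variable and a single split by hand. It lays out the fundamentals; if you are not familiar with them, read it before continuing.

The second article, "Machine Learning Algorithms - Random Forest: Brute-Force Decision Tree Implementation from Scratch in R (2)", wrote functions from scratch in R to train a decision tree by brute force and determined the first decision node. This article continues from there.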

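Deciding the second and third nodes

With the first decision node found, we go on to find the remaining ones. If the points in a branch belong to more than one class, that branch is split again recursively.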

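The recursion stops when either of the following holds:

Adding another branch does not lower the Gini impurity

All data points in a branch belong to the same class (Gini impurity = 0)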
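
Both stopping conditions can be checked directly on class labels. The sketch below uses a stand-in named `gini_impurity_sketch` (a hypothetical name; the series' own `Gini_impurity` was defined in part (2) and is not reproduced here) and assumes the gain is the parent impurity minus the size-weighted impurity of the two branches. The toy label vectors are made up for illustration only.

```r
# Stand-in for the Gini impurity helper from part (2); the name is hypothetical.
gini_impurity_sketch <- function(classes) {
  p <- table(classes) / length(classes)  # class proportions
  1 - sum(p^2)
}

# A branch whose points all share one class has impurity 0,
# so there is nothing left to split.
pure_branch <- c("green", "green", "green")
gini_impurity_sketch(pure_branch)

# A split that leaves the class mix unchanged in both branches
# yields no gain, so it is not worth keeping.
parent <- c("blue", "blue", "red", "red")
left   <- c("blue", "red")
right  <- c("blue", "red")
weighted <- (length(left) * gini_impurity_sketch(left) +
             length(right) * gini_impurity_sketch(right)) / length(parent)
gain <- gini_impurity_sketch(parent) - weighted
gain
```

In the function defined in this article, these two checks correspond to the `gini_gain > 0` test and the `length(unique(...)) > 1` test.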

The function is defined as follows:

brute_descition_tree_result <- list()
brute_descition_tree_result_index <- 0

# Recursive branch decision
brute_descition_tree <- function(data, measure_variable, class_variable, type="Root"){
  # Compute the Gini impurity of the data reaching this node
  Init_gini_impurity <- Gini_impurity(data[[class_variable]])

  # Find the best variable and threshold for the node being decided.
  # (The original passed the global `variables` here instead of the
  # `measure_variable` argument.)
  brute_force_result <- Gini_impurity_for_all_possible_branches_of_all_variables(
    data, measure_variable, class=class_variable,
    Init_gini_impurity=Init_gini_impurity)

  # Print the intermediate result
  print(brute_force_result)

  # Best split variable, threshold and Gini gain (first row of the result)
  split_variable <- brute_force_result[1,1]
  split_threshold <- brute_force_result[1,2]
  gini_gain <- brute_force_result[1,5]

  # Keep this split only if it lowers the Gini impurity
  if(gini_gain > 0){
    brute_descition_tree_result_index <<- brute_descition_tree_result_index + 1
    # Note: `type` is evaluated lazily, i.e. here, after the index has been
    # incremented, so a child's label carries its own index (e.g. "2 left").
    brute_descition_tree_result[[brute_descition_tree_result_index]] <<-
      c(type=type, split_variable=split_variable, split_threshold=split_threshold)

    # Split the data into left and right branches
    left <- data[data[[split_variable]] < split_threshold,]
    right <- data[data[[split_variable]] >= split_threshold,]

    # Recurse into each branch that still contains more than one class
    if(length(unique(left[[class_variable]])) > 1){
      brute_descition_tree(data=left, measure_variable, class_variable,
                           type=paste(brute_descition_tree_result_index, "left"))
    }
    if(length(unique(right[[class_variable]])) > 1){
      brute_descition_tree(data=right, measure_variable, class_variable,
                           type=paste(brute_descition_tree_result_index, "right"))
    }
  }
  # return(brute_descition_tree_result)
}

Call the function and print the intermediate results:

brute_descition_tree(data, variables, "color")

Evaluation record for the root node

##     Variable  Threshold  Left_branch                   Right_branch                  Gini_gain
## 5   x         1.95       blue x 3; red x 2             green x 5                     0.38
## 3   x         1.45       blue x 3                      green x 5; red x 2            0.334285714285714
## 31  y         1.75       blue x 3                      green x 5; red x 2            0.334285714285714
## 4   x         1.85       blue x 3; red x 1             green x 5; red x 1            0.303333333333333
## 6   x         2.25       blue x 3; green x 1; red x 2  green x 4                     0.253333333333333
## 41  y         2.05       blue x 3; green x 1           green x 4; red x 2            0.203333333333333
## 2   x         0.8        blue x 2                      blue x 1; green x 5; red x 2  0.195
## 21  y         1.25       blue x 2                      blue x 1; green x 5; red x 2  0.195
## 51  y         2.15       blue x 3; green x 1; red x 1  green x 4; red x 1            0.18
## 7   x         2.75       blue x 3; green x 2; red x 2  green x 3                     0.162857142857143
## 71  y         2.9        blue x 3; green x 2; red x 2  green x 3                     0.162857142857143
## 61  y         2.5        blue x 3; green x 2; red x 1  green x 3; red x 1            0.103333333333333
## 8   x         3.3        blue x 3; green x 3; red x 2  green x 2                     0.095
## 81  y         3.15       blue x 3; green x 3; red x 2  green x 2                     0.095
## 1   x         0.25       blue x 1                      blue x 2; green x 5; red x 2  0.0866666666666667
## 11  y         0.75       blue x 1                      blue x 2; green x 5; red x 2  0.0866666666666667
## 9   x         3.65       blue x 3; green x 4; red x 2  green x 1                     0.0422222222222223
## 91  y         3.4        blue x 3; green x 4; red x 2  green x 1                     0.0422222222222223
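
The Gini gains in the root-node record can be reproduced from the class counts alone. The sketch below assumes the gain is the parent impurity minus the size-weighted impurity of the two branches, which matches the printed values; the helper names `gini_from_counts` and `gini_gain_from_counts` are hypothetical and not part of the original code.

```r
# Gini impurity from a vector of class counts.
gini_from_counts <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Gain = parent impurity minus the size-weighted impurity of both branches.
gini_gain_from_counts <- function(parent, left, right) {
  n <- sum(parent)
  weighted <- sum(left) / n * gini_from_counts(left) +
    sum(right) / n * gini_from_counts(right)
  gini_from_counts(parent) - weighted
}

# Class counts at the root, read off the table: blue x 3, green x 5, red x 2.
parent <- c(blue = 3, green = 5, red = 2)

# First row: x < 1.95 gives "blue x 3; red x 2" on the left, "green x 5" on the right.
gain_best <- gini_gain_from_counts(parent,
                                   left  = c(blue = 3, red = 2),
                                   right = c(green = 5))

# Second row: x < 1.45 gives "blue x 3" on the left, "green x 5; red x 2" on the right.
gain_second <- gini_gain_from_counts(parent,
                                     left  = c(blue = 3),
                                     right = c(green = 5, red = 2))

print(c(gain_best, gain_second))
```

Written out for the first row: the root impurity is 1 - (0.3^2 + 0.5^2 + 0.2^2) = 0.62, the left branch has impurity 0.48 and the right branch 0, so the gain is 0.62 - 0.5 * 0.48 = 0.38.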

Evaluation record for the second-level node

##     Variable  Threshold  Left_branch        Right_branch       Gini_gain
## 3   x         1.45       blue x 3           red x 2            0.48
## 31  y         1.8        blue x 3           red x 2            0.48
## 2   x         0.8        blue x 2           blue x 1; red x 2  0.213333333333333
## 21  y         1.25       blue x 2           blue x 1; red x 2  0.213333333333333
## 4   x         1.85       blue x 3; red x 1  red x 1            0.18
## 41  y         2.45       blue x 3; red x 1  red x 1            0.18
## 1   x         0.25       blue x 1           blue x 2; red x 2  0.08
## 11  y         0.75       blue x 1           blue x 2; red x 2  0.08

Collect the finally selected split variables and thresholds into a data frame:

as.data.frame(do.call(rbind, brute_descition_tree_result))

The finally selected split variables and thresholds:

##   type    split_variable  split_threshold
## 1 Root    x               1.95
## 2 2 left  x               1.45

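Running the function yields two decision nodes, which define the following decision tree: the root splits on x < 1.95, and its right branch (x >= 1.95) contains only green points; the left branch is split again on x < 1.45, giving blue points on the left and red points on the right.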
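
The two selected nodes can also be written out as a small classifier. This is only a hand-coded sketch of the thresholds reported above (x < 1.95 at the root, then x < 1.45 on the left branch); the function name `predict_color_sketch` and the three sample x values are hypothetical and not taken from the original data.

```r
# Hand-coded version of the two selected nodes:
#   Root:    x >= 1.95 -> green
#   Node 2:  x <  1.45 -> blue, otherwise red
predict_color_sketch <- function(x) {
  ifelse(x >= 1.95, "green",
         ifelse(x < 1.45, "blue", "red"))
}

# Made-up x values, one falling into each leaf.
predict_color_sketch(c(0.5, 1.7, 3.0))
```

Because both selected splits use x, the y coordinate never enters the prediction for this data set.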

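The returned Gini gain table shows that the second node has two equally good splits: x at 1.45 and y at 1.8 both reach a Gini gain of 0.48.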
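
Which of the two tied splits ends up in the final result depends on row order, because the function always takes the first row of the candidate table. A minimal sketch, assuming the candidates are ranked by Gini gain with a stable ordering such as `order()` (how part (2) actually sorts them is not shown in this article):

```r
# The two tied candidates, copied from the second-level record.
candidates <- data.frame(
  Variable  = c("x", "y"),
  Threshold = c(1.45, 1.8),
  Gini_gain = c(0.48, 0.48),
  stringsAsFactors = FALSE
)

# order() leaves tied rows in their original order, so "x" stays first.
ranked <- candidates[order(-candidates$Gini_gain), ]
ranked[1, "Variable"]
```

Under that assumption, the variable evaluated first wins every tie, which is consistent with x at 1.45 being the split recorded for the second node.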

With that, we have built the decision tree by brute force.
