Applying Darwinian Evolution to Feature Selection with Kydavra GeneticAlgorithmSelector
Math almost always has a good answer to questions related to feature selection. However, sometimes good old brute-force algorithms can bring a better and more practical answer into the game.
Genetic algorithms are a family of algorithms inspired by biological evolution. They repeat a cycle of crossing, mutating, and trying candidates, and in doing so develop the best combination of states according to a scoring metric. So, let's get to the code.
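Before turning to the library, here is a minimal sketch of that cross, mutate, try cycle applied to feature masks. It is an illustration only, not Kydavra's implementation: the function `evolve`, the toy scoring function, and the set `USEFUL` are all assumptions made up for this example, and a real selector would train a model to score each candidate.

```python
import random

def evolve(n_features, score, nb_children=4, nb_generation=200, seed=0):
    """Toy genetic search over feature masks (hypothetical, not Kydavra's code)."""
    rng = random.Random(seed)

    def cross(a, b):
        # Uniform crossover: each gene is copied from one of the two parents.
        return tuple(rng.choice(pair) for pair in zip(a, b))

    def mutate(mask):
        # Flip one randomly chosen gene, i.e. add or drop a single feature.
        i = rng.randrange(len(mask))
        return mask[:i] + (not mask[i],) + mask[i + 1:]

    # Start from random masks; True means "keep this feature".
    population = [tuple(rng.random() < 0.5 for _ in range(n_features))
                  for _ in range(nb_children)]
    for _ in range(nb_generation):
        offspring = [mutate(cross(a, b)) for a in population for b in population]
        # Try every candidate and keep the nb_children best (parents included).
        candidates = set(population + offspring)
        population = sorted(candidates, key=score, reverse=True)[:nb_children]
    return population[0]

# Toy metric (an assumption): features 0, 2 and 5 help, every other one costs a little.
USEFUL = {0, 2, 5}

def toy_score(mask):
    chosen = {i for i, keep in enumerate(mask) if keep}
    return len(chosen & USEFUL) - 0.1 * len(chosen - USEFUL)

best = evolve(n_features=8, score=toy_score)
print([i for i, keep in enumerate(best) if keep])
```

Because the parents compete with their own offspring, the best score never gets worse from one generation to the next in this sketch.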
Using GeneticAlgorithmSelector from the Kydavra library
To install kydavra, run the following command in the terminal:

    pip install kydavra

Now you can import the selector and apply it to your data set as follows:

    from kydavra import GeneticAlgorithmSelector
    selector = GeneticAlgorithmSelector()
    new_columns = selector.select(model, df, 'target')

As with every Kydavra selector, that's all. Now let's try it on the Heart Disease dataset.

    import pandas as pd
    df = pd.read_csv('cleaned.csv')

I highly recommend shuffling your dataset before applying the selector, because it scores feature combinations with a metric and cross_val_score isn't implemented in this selector yet.
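To see why row order can matter when there is no cross-validation, consider the small sketch below. It assumes, purely for illustration, that scoring happens on a single hold-out slice taken from the end of the data; how the selector actually splits the data is not stated here, and `holdout_classes` is a hypothetical helper.

```python
import random

# A file sorted by target: 50 healthy rows (0) followed by 50 sick rows (1).
labels = [0] * 50 + [1] * 50

def holdout_classes(rows, test_fraction=0.2):
    """Return the set of classes found in the last test_fraction of the rows."""
    cut = int(len(rows) * (1 - test_fraction))
    return set(rows[cut:])

# Without shuffling, the hold-out slice contains a single class.
ordered = holdout_classes(labels)

# After shuffling (seeded so the run is repeatable), both classes can appear.
shuffled_labels = labels[:]
random.Random(42).shuffle(shuffled_labels)
shuffled = holdout_classes(shuffled_labels)

print(ordered)
print(shuffled)
```

A metric computed on a slice with only one class says very little about the model, which is the kind of problem shuffling helps to avoid.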

    df = df.sample(frac=1).reset_index(drop=True)

Now we can apply our selector. It takes the following parameters:
nb_children (int, default = 4) the number of best children that the algorithm will choose for the next generation.
nb_generation (int, default = 200) the number of generations that will be created, technically speaking the number of iterations.
scoring_metric (sklearn scoring metric, default = accuracy_score) The metric score used to select the best feature combination.
max (boolean, default = True) if set to True, the algorithm selects the combinations with the highest scores; if set to False, the combinations with the lowest scores are chosen.
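The interplay between nb_children and max can be pictured with a small sketch. The helper `pick_best` and the scores below are hypothetical and only mirror the parameter descriptions above; they are not taken from Kydavra's source.

```python
def pick_best(scored, nb_children=4, maximize=True):
    """Return the nb_children best candidates from a list of (candidate, score) pairs."""
    # maximize=True ranks the highest scores first, maximize=False the lowest.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=maximize)
    return [candidate for candidate, _ in ranked[:nb_children]]

# Made-up scores for five candidate feature combinations.
scored = [("A", 0.71), ("B", 0.80), ("C", 0.65), ("D", 0.78), ("E", 0.74)]

print(pick_best(scored, nb_children=2))                  # ['B', 'D']
print(pick_best(scored, nb_children=2, maximize=False))  # ['C', 'A']
```

Choosing the lowest scores makes sense when the metric is an error, such as mean squared error, where smaller is better.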
For now, we will use the default settings except for scoring_metric: this is a disease-diagnosis problem, so it is better to use precision instead of accuracy.
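As a quick reminder of how the two metrics differ, here is a hand-rolled sketch on made-up predictions. The labels are invented for illustration; in practice you would call accuracy_score and precision_score from sklearn.metrics, which compute the same quantities.

```python
def accuracy(y_true, y_pred):
    # Share of all predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    # Share of predicted positives that are truly positive: TP / (TP + FP).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0

# Invented example: 3 sick patients (1) and 7 healthy ones (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

print(accuracy(y_true, y_pred))   # 0.7
print(precision(y_true, y_pred))  # 0.5
```

Here the model looks decent by accuracy, yet only half of the patients it flags as sick actually are, which is what precision exposes.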

    from kydavra import GeneticAlgorithmSelector
    from sklearn.metrics import precision_score
    from sklearn.ensemble import RandomForestClassifier
    selector = GeneticAlgorithmSelector(scoring_metric=precision_score)
    model = RandomForestClassifier()

So now let's find the best features. GAS (short for GeneticAlgorithmSelector) needs an sklearn model to train during the process of choosing features, the data frame itself, and of course the name of the target column:

    selected_cols = selector.select(model, df, 'target')

Now let's evaluate the result. Before feature selection, the precision score of the Random Forest was 0.805. GAS chose the following features:

    ['age', 'sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']

These features gave a precision score of 0.823, which is a good result, given that in the majority of cases it is very hard to improve scoring metrics.
If you want to find out more about genetic algorithms, there are some useful links at the bottom of the article. If you have tried Kydavra and have any issues or feedback, please contact me on Medium or fill in this form.
Made with ❤ by Sigmoid
Useful links:
https://towardsdatascience.com/the-most-important-part-in-artifical-intesystems-development-243f04f73fcd
Translated from: https://towardsdatascience.com/applying-darwinian-evolution-to-feature-selection-with-kydavra-geneticalgorithmselector-378662fd1f5b