Pearson's Correlation and Its Implication in Machine Learning
Today we will use a statistical concept, Pearson's correlation, to help us understand the relationships between the feature values (independent variables) and the target value (the dependent variable, i.e. the value to be predicted), which will further help us improve our model's efficiency.
Mathematically, Pearson's correlation is calculated as:
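The original article shows the formula as an image (source credited below); the standard sample form of Pearson's correlation coefficient that it depicts is:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where \bar{x} and \bar{y} are the means of X and Y, and n is the number of samples.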
Image source: https://businessjargons.com/wp-content/uploads/2016/04/Karl-Pearson-final.jpg
So now the question arises: what should be stored in variable X and what should be stored in variable Y? We generally store the feature values in X and the target value in Y. The formula above will tell us whether there exists any correlation between the selected feature value and the target value.
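For example, with the headbrain3.csv data used later in this article, X could hold the head-size column and Y the brain-weight column. This is a minimal sketch; the full script appears further below.

import pandas as pd

# assumes headbrain3.csv is in the working directory, as in the full example later on
data = pd.read_csv('headbrain3.csv')
X = data['Head Size(cm^3)']        # feature (independent variable)
Y = data['Brain Weight(grams)']    # target (dependent variable)
print(X.corr(Y))                   # Pearson's r between the two columns (pandas default method)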
Before we code, there are a few basic things we should keep in mind about correlation (a quick numeric check of these cases follows the list):
The value of correlation will always lie between -1 and 1.
Correlation = 0 means there is no linear relationship between the selected feature value and the target value.
Correlation = 1 means there is a perfect positive relationship between the selected feature value and the target value, which means the selected feature is appropriate for our model to learn from.
Correlation = -1 means there exists a perfect negative relationship between the selected feature value and the target value. In general, using a feature whose correlation is negative and of low magnitude (for example -0.1 or -0.2) is discouraged.
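As a quick sanity check of these three cases, the snippet below (a minimal sketch on synthetic data, not part of the original article) computes Pearson's r for a perfectly increasing pair, a perfectly decreasing pair, and an unrelated pair:

import numpy as np

x = np.arange(10)                   # 0, 1, ..., 9
rng = np.random.default_rng(0)

print(np.corrcoef(x, 2 * x + 3)[0, 1])    # exactly 1: perfect positive linear relationship
print(np.corrcoef(x, -5 * x + 1)[0, 1])   # exactly -1: perfect negative linear relationship
print(np.corrcoef(x, rng.normal(size=10))[0, 1])  # typically small in magnitude: no systematic linear relationship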
So, let us now write the code to implement what we have just learned:
The data set used can be downloaded from here: headbrain3.CSV
""" # -*- coding: utf-8 -*- """ Created on Sun Jul 29 22:21:12 2018 @author: Raunak Goswami """ import numpy as np import pandas as pd import matplotlib.pyplot as plt """#reading the data """ here the directory of my code and the headbrain3.csv file is same make sure both the files are stored in same folder or directory """ data=pd.read_csv('headbrain3.csv')#this will show the first five records of the whole data data.head()w=data.iloc[:,0:1].values y=data.iloc[:,1:2].values #this will create a variable x which has the feature values i.e head size x=data.iloc[:,2:3].values #this will create a variable y which has the target value i.e brain weight z=data.iloc[:,3:4].values print(round(data['Gender'].corr(data['Brain Weight(grams)']))) plt.scatter(w,z,c='red') plt.title('scattered graph for coorelation between Gender and brainweight' ) plt.xlabel('age') plt.ylabel('brain weight') plt.show()print(round(data['Age Range'].corr(data['Brain Weight(grams)']))) plt.scatter(x,z,c='red') plt.title('scattered graph for coorelation between age and brainweight' ) plt.xlabel('age range') plt.ylabel('brain weight') plt.show()print(round((data['Head Size(cm^3)'].corr(data['Brain Weight(grams)'])))) plt.scatter(x,z,c='red') plt.title('scattered graph for coorelation between head size and brainweight' ) plt.xlabel('head size') plt.ylabel('brain weight') plt.show()data.info() data['Head Size(cm^3)'].corr(data['Brain Weight(grams)']) k=data.corr() print("The table for all possible values of pearson's coefficients is as follows") print(k) .minHeight{min-height: 250px;}@media (min-width: 1025px){.minHeight{min-height: 90px;}} .minHeight{min-height: 250px;}@media (min-width: 1025px){.minHeight{min-height: 90px;}}After you run your code in Spyder tool provided by anaconda distribution just go to your variable explorer and search for the variable named as k and double-click to see the values in that variable and you’ll see something like this
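If you are not using Spyder, you can inspect the same correlation matrix directly, for example by rendering it as a simple heat map. This is a small optional sketch, not part of the original script:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('headbrain3.csv')   # same file as in the script above
k = data.corr()                        # pairwise Pearson coefficients

fig, ax = plt.subplots()
im = ax.matshow(k.values, cmap='coolwarm', vmin=-1, vmax=1)  # color-code each coefficient from -1 to 1
ax.set_xticks(range(len(k.columns)))
ax.set_xticklabels(k.columns, rotation=90)
ax.set_yticks(range(len(k.columns)))
ax.set_yticklabels(k.columns)
fig.colorbar(im)
plt.show()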
The table above shows the correlation values; here 1 means perfect positive correlation, 0 means no correlation, and -1 stands for perfect negative correlation.
Now let us understand these values using the graphs:
The reason for getting this scattered-looking graph is that there is no correlation between gender and brain weight, which is why we cannot use gender as a feature value in our prediction model. Let us try drawing the graph for brain weight using another feature value. What about head size?
As you can see in the table, there is a strong positive correlation between brain weight and head size, so we get a well-defined graph. This signifies that there exists a strong linear relationship between brain weight and head size, so we can use head size as one of the feature values in our model.
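As a follow-up not covered in the original article, here is a minimal sketch of how head size could then be used as the single feature of a simple linear model; the column names match headbrain3.csv, and np.polyfit stands in for whatever regression routine you prefer:

import numpy as np
import pandas as pd

data = pd.read_csv('headbrain3.csv')
x = data['Head Size(cm^3)'].values       # chosen feature: strongly correlated with the target
z = data['Brain Weight(grams)'].values   # target

# ordinary least-squares fit of a straight line: z is approximately slope * x + intercept
slope, intercept = np.polyfit(x, z, deg=1)
print(f"brain weight ~ {slope:.4f} * head size + {intercept:.2f}")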
That is all for this article. If you have any queries, just write them in the comments section and I will be happy to help you. Have a great day ahead, and keep learning.
Translated from: https://www.includehelp.com/ml-ai/pearsons-correlation-and-its-implication-in-machine-learning.aspx