Regression: The Relationship Between Alcohol and Tobacco Spending in the UK
Official page for the UK alcohol and tobacco dataset:
http://lib.stat.cmu.edu/DASL/Stories/AlcoholandTobacco.html
Story Name: Alcohol and Tobacco
Image: Scatterplot of Alcohol vs. Tobacco, with Northern Ireland marked with a blue X.
Story Topics: Consumer, Health
Datafile Name: Alcohol and Tobacco
Methods: Correlation, Dummy variable, Outlier, Regression, Scatterplot
Abstract: Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and alcoholic beverages. A scatterplot of spending on alcohol vs. spending on tobacco in the 11 regions of Great Britain shows an overall positive linear relationship with Northern Ireland as an outlier. Northern Ireland's influence is illustrated by the fact that the correlation between alcohol and tobacco spending jumps from .224 to .784 when Northern Ireland is eliminated from the dataset.
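The jump in correlation quoted in the abstract can be checked with a few lines of NumPy. This is a minimal sketch that assumes the eleven spending values used in the script later in this post, with Northern Ireland as the last entry; the variable names are illustrative only.

```python
import numpy as np

# Spending by region, copied from the script in this post.
# Assumption: the last entry is Northern Ireland.
alcohol = np.array([6.47, 6.13, 6.19, 4.89, 5.63, 4.52,
                    5.89, 4.79, 5.27, 6.08, 4.02])
tobacco = np.array([4.03, 3.76, 3.77, 3.34, 3.47, 2.92,
                    3.20, 2.71, 3.53, 4.51, 4.56])

# Pearson correlation with all 11 regions.
r_all = np.corrcoef(tobacco, alcohol)[0, 1]

# Pearson correlation after dropping the last region.
r_without_ni = np.corrcoef(tobacco[:-1], alcohol[:-1])[0, 1]

print("r, all 11 regions:           %.3f" % r_all)
print("r, Northern Ireland removed: %.3f" % r_without_ni)
```

Dropping a single row is enough to move the coefficient from roughly .22 to roughly .78, which is what makes this dataset a convenient outlier example.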
This dataset may be used to illustrate the effect of a single influential observation on regression results. In a simple regression of alcohol spending on tobacco spending, tobacco spending does not appear to be a significant predictor of alcohol spending. However, including a dummy variable that takes the value 1 for Northern Ireland and 0 for all other regions results in significant coefficients for both tobacco spending and the dummy variable, and a high R-squared.
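The dummy-variable regression described above can be sketched with plain least squares. The code below is an illustration, not the original analysis: it assumes the same eleven data points as the script in this post (Northern Ireland last), uses a hypothetical helper named `fit_ols`, and reports only coefficients and R-squared. Significance tests would need a statistics package such as statsmodels.

```python
import numpy as np

alcohol = np.array([6.47, 6.13, 6.19, 4.89, 5.63, 4.52,
                    5.89, 4.79, 5.27, 6.08, 4.02])
tobacco = np.array([4.03, 3.76, 3.77, 3.34, 3.47, 2.92,
                    3.20, 2.71, 3.53, 4.51, 4.56])


def fit_ols(X, y):
    """Hypothetical helper: least-squares fit, returns (coefficients, R-squared)."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


ones = np.ones_like(tobacco)

# Dummy variable: 1 for Northern Ireland (assumed to be the last row), 0 otherwise.
dummy = np.zeros_like(tobacco)
dummy[-1] = 1.0

# Simple regression: Alcohol ~ Tobacco
beta_simple, r2_simple = fit_ols(np.column_stack([ones, tobacco]), alcohol)

# Regression with the dummy: Alcohol ~ Tobacco + NorthernIreland
beta_dummy, r2_dummy = fit_ols(np.column_stack([ones, tobacco, dummy]), alcohol)

print("simple:     slope=%.4f  R^2=%.3f" % (beta_simple[1], r2_simple))
print("with dummy: slope=%.4f  dummy=%.4f  R^2=%.3f"
      % (beta_dummy[1], beta_dummy[2], r2_dummy))
```

Because the dummy absorbs the single outlying row exactly, the tobacco slope in the second model equals the slope obtained by fitting the other ten regions alone.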
Both modules give the same R-squared value.
# -*- coding: utf-8 -*-
"""
Python 3
Alcohol and Tobacco: the relationship between alcohol and tobacco spending
http://lib.stat.cmu.edu/DASL/Stories/AlcoholandTobacco.html

Data often does not have to be read from or written to a file; it can also be
read and written in memory. StringIO, as the name suggests, reads and writes
str in memory. To write a str to a StringIO, first create the StringIO object,
then write to it just as you would to a file.
"""
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm

# Spending by region; the last entry is Northern Ireland.
list_alcohol = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08, 4.02]
list_tobacco = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56]

# Scatterplot: tobacco on the x-axis, alcohol on the y-axis.
plt.plot(list_tobacco, list_alcohol, 'ro')
plt.xlabel('Tobacco')
plt.ylabel('Alcohol')
plt.title('Sales in Several UK Regions')
plt.show()

data = pd.DataFrame({'Alcohol': list_alcohol, 'Tobacco': list_tobacco})

# Fit Alcohol ~ Tobacco on every region except the last row (Northern Ireland).
result = sm.ols('Alcohol ~ Tobacco', data[:-1]).fit()
print(result.summary())
Python 2.7

# -*- coding: utf-8 -*-
# Spearman's rank correlation (Spearman's correlation coefficient for ranked data)
from __future__ import print_function

import scipy.stats as stats
import matplotlib.pyplot as plt

import normality_check  # normality_check.py, the helper script in this post

# Ten regions, Northern Ireland excluded
y = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08]
x = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51]
list_group = [x, y]
sample = len(x)

# Visualize the data
plt.plot(x, y, 'ro')


# Spearman rank correlation, a nonparametric test
def Spearmanr(x, y):
    print("use spearmanr,Nonparametric tests")
    # Warn when the two samples differ in length
    if len(x) != len(y):
        print("warning,the samples are not equal!")
    r, p = stats.spearmanr(x, y)
    print("spearman r**2:", r**2)
    print("spearman p:", p)
    if sample < 500 and p > 0.05:
        print("when sample < 500, p is not meaningful (> 0.05)")
        print("when sample > 500, p is meaningful")


# Pearson correlation, a parametric test
def Pearsonr(x, y):
    print("use Pearson,parametric tests")
    r, p = stats.pearsonr(x, y)
    print("pearson r**2:", r**2)
    print("pearson p:", p)
    if sample < 30:
        print("when sample < 30, the Pearson result is not meaningful")


# Kendall's tau, a nonparametric test
def Kendalltau(x, y):
    print("use kendalltau,Nonparametric tests")
    r, p = stats.kendalltau(x, y)
    print("kendalltau r**2:", r**2)
    print("kendalltau p:", p)


# Choose the model
def mode(x, y):
    # Normality test on both groups
    Normal_result = normality_check.NormalTest(list_group)
    print("normality result:", Normal_result)
    if len(list_group) > 2:
        Kendalltau(x, y)
    if not Normal_result:
        Spearmanr(x, y)
        Kendalltau(x, y)
    else:
        Pearsonr(x, y)


mode(x, y)

'''
x=[50,60,70,80,90,95]
y=[500,510,530,580,560,1000]
use shapiro:
data are normal distributed
use shapiro:
data are not normal distributed
normality result: False
use spearmanr,Nonparametric tests
spearman r: 0.942857142857
spearman p: 0.00480466472303
use kendalltau,Nonparametric tests
kendalltau r: 0.866666666667
kendalltau p: 0.0145950349193

# Kendall coefficient test data
x=[3,5,2,4,1]
y=[3,5,2,4,1]
z=[3,4,1,5,2]
h=[3,5,1,4,2]
k=[3,5,2,4,1]
'''
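For readers on Python 3, the three coefficients that the script chooses between can also be computed directly with `scipy.stats`. This is a minimal sketch, not a replacement for the script: it skips the normality check and simply reports all three measures for the same ten regions.

```python
from scipy import stats

# Ten regions, Northern Ireland excluded (same values as the script above).
y = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08]  # alcohol
x = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51]  # tobacco

# Each call returns the coefficient and a two-sided p-value.
pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)
kendall_tau, kendall_p = stats.kendalltau(x, y)

print("Pearson  r   = %.3f (p = %.4f)" % (pearson_r, pearson_p))
print("Spearman rho = %.3f (p = %.4f)" % (spearman_rho, spearman_p))
print("Kendall  tau = %.3f (p = %.4f)" % (kendall_tau, kendall_p))
```

With only ten observations the p-values are best read as rough guides, which is the same caution the script's small-sample messages express.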
Python 2.7

# -*- coding: utf-8 -*-
'''
Author: Toby
QQ: 231469242, all rights reserved, no commercial use
normality_check.py
Normality test script
'''
from __future__ import print_function

import scipy.stats as stats

# Older statsmodels releases spell this function `lillifors`;
# newer ones use `lilliefors`. Accept either.
try:
    from statsmodels.stats.diagnostic import lilliefors
except ImportError:
    from statsmodels.stats.diagnostic import lillifors as lilliefors


def report(test_name, p_value):
    # Print which test was used; return True when normality is not rejected
    print("use %s:" % test_name)
    if p_value < 0.05:
        print("data are not normal distributed")
        return False
    print("data are normal distributed")
    return True


# Normality test for a single sample, chosen by sample size
def check_normality(testData):
    n = len(testData)
    # 20 < n < 50: scipy's normaltest
    if 20 < n < 50:
        return report("normaltest", stats.normaltest(testData)[1])
    # n <= 20 (the rest of n < 50 was handled above): Shapiro-Wilk
    if n < 50:
        return report("shapiro", stats.shapiro(testData)[1])
    # 50 <= n <= 300: Lilliefors
    if n <= 300:
        return report("lillifors", lilliefors(testData)[1])
    # n > 300: Kolmogorov-Smirnov against the standard normal N(0, 1)
    return report("kstest", stats.kstest(testData, 'norm')[1])


# Run the normality test on every sample group
def NormalTest(list_groups):
    for group in list_groups:
        if not check_normality(group):
            return False
    return True


'''
group1=[2,3,7,2,6]
group2=[10,8,7,5,10]
group3=[10,13,14,13,15]
list_groups=[group1,group2,group3]
list_total=group1+group2+group3
# Run the normality test on every sample group
NormalTest(list_groups)
'''
Reposted from: https://www.cnblogs.com/webRobot/p/7140749.html