Machine Learning in Action, Chapter 5: Logistic Regression

Published: 2025/3/16


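(1) Collect the data: any method.

(2) Prepare the data: because distances must be computed, the values must be numeric; a structured data format works best.

(3) Analyze the data: any method.

(4) Train the algorithm: most of the time is spent on training, whose goal is to find the best regression coefficients for classification.

(5) Test the algorithm: once training is complete, classification is fast.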

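Logistic Regression

Pros: low computational cost; easy to understand and implement.
Cons: prone to underfitting; classification accuracy may be low.
Applicable data types: numeric and nominal.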

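For the regression function, we want something that accepts any input and predicts a class. With two classes, such a function outputs 0 or 1, which is the unit step function. The step function jumps instantly from 0 to 1, which is hard to work with, so here we use the Sigmoid function instead. It is smooth, and when the horizontal axis covers a large enough range it looks very much like a step function. The formula is:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$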
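The step-like behaviour can be checked numerically. The following is a minimal sketch using only the standard library; the scalar `sigmoid` here is a stand-in for the NumPy version used later in the post, and the sample points are chosen arbitrarily for illustration.

```python
import math


def sigmoid(z):
    """Scalar sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Near zero the curve is smooth and passes through 0.5.
near_zero = [sigmoid(z) for z in (-1.0, 0.0, 1.0)]

# Far from zero the output saturates toward 0 or 1, like a step.
far_out = [sigmoid(z) for z in (-20.0, 20.0)]

print(near_zero)
print(far_out)
```

Viewed over a range such as -20 to 20, the transition around zero occupies only a small part of the axis, which is why the curve resembles a step.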

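To implement a Logistic regression classifier, we multiply each feature by a regression coefficient, add up all the products, and feed the sum into the Sigmoid function, which yields a value between 0 and 1. Anything above 0.5 is assigned to class 1, and anything below 0.5 is assigned to class 0.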
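The decision rule described above can be sketched as follows. The function name `classify`, the `threshold` parameter, and the weight values are hypothetical and chosen only for illustration; the text does not say what happens at exactly 0.5, so sending that case to class 0 is an assumption.

```python
import math


def classify(features, weights, threshold=0.5):
    """Return 1 if sigmoid(w . x) exceeds the threshold, else 0."""
    # Weighted sum of the features.
    z = sum(w * x for w, x in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-z))
    # Assumption: a probability of exactly 0.5 falls into class 0.
    return 1 if prob > threshold else 0


# Hypothetical weights; the first feature is a constant 1.0 (bias term).
weights = [0.5, -1.0, 2.0]
print(classify([1.0, 0.2, 0.6], weights))  # z = 0.5 - 0.2 + 1.2 = 1.5
print(classify([1.0, 2.0, 0.1], weights))  # z = 0.5 - 2.0 + 0.2 = -1.3
```

A positive weighted sum gives a probability above 0.5 and a negative one gives a probability below 0.5, so the sign of `z` alone decides the class.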

1. Gradient ascent optimization

from numpy import *


# Load the data set
def loadData():
    dataMat = []
    labelMat = []
    # Use a context manager so the file is closed after reading
    with open("testSet.txt") as fr:
        for line in fr:
            lineArr = line.strip().split()
            # X0 is set to 1.0 to simplify the computation
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
            # The third column of the file is the class label
            labelMat.append(int(lineArr[2]))
    return dataMat, labelMat


# Sigmoid function
def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))


# Gradient ascent optimization
def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)
    # transpose() turns the label row vector into a column vector
    labelMat = mat(classLabels).transpose()
    m, n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = ones((n, 1))
    # Repeat maxCycles times
    for k in range(maxCycles):
        # Gradient over the whole data set; h and error are both vectors
        h = sigmoid(dataMatrix * weights)
        error = labelMat - h
        # Update the regression coefficient vector
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
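The post does not derive the update used in `gradAscent`. A standard derivation, assuming the objective being maximized is the usual Bernoulli log-likelihood, is sketched below. Here $X$ corresponds to `dataMatrix`, $y$ to `labelMat`, $h$ to `h`, $w$ to `weights`, and $\alpha$ to `alpha`; $x_i$ denotes the $i$-th row of $X$ written as a column vector.

```latex
% Assumed objective: Bernoulli log-likelihood, with h_i = \sigma(x_i^\top w)
\ell(w) = \sum_{i=1}^{m} \left[ y_i \log h_i + (1 - y_i) \log (1 - h_i) \right]

% Using \sigma'(z) = \sigma(z)\,(1 - \sigma(z)):
\nabla_w \ell(w) = \sum_{i=1}^{m} (y_i - h_i)\, x_i = X^\top (y - h)

% Gradient ascent step with learning rate \alpha:
w \leftarrow w + \alpha\, X^\top (y - h)
```

Under that assumption, the last line is the same expression the code evaluates, with `error` playing the role of $y - h$.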

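Visualization

# Plot the data set and the best-fit line found by Logistic regression
def plotBestFit(weights):
    import matplotlib.pyplot as plt
    # Flatten to 1-D so that both the matrix returned by gradAscent and the
    # arrays returned by the stochastic versions can be plotted
    weights = asarray(weights).flatten()
    dataMat, labelMat = loadData()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []
    ycord1 = []
    xcord2 = []
    ycord2 = []
    # Split the points by class label
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    # Decision boundary: w0 + w1 * x1 + w2 * x2 = 0, solved for x2
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()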

2. Stochastic gradient ascent

# Stochastic gradient ascent: each update uses a single sample,
# and the data set is traversed once, in order
def stocGradAscent0(dataMatrix, classLabels):
    # Convert explicitly to avoid mixing arrays and lists
    dataMatrix = array(dataMatrix)
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        # h and error are scalars here
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights
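To make the per-sample update concrete, here is one step worked out with plain Python numbers. The helper `sgd_step` and the sample values are hypothetical, chosen only for illustration; the arithmetic mirrors the body of the loop in `stocGradAscent0`.

```python
import math


def sgd_step(weights, sample, label, alpha=0.01):
    """Apply one stochastic gradient ascent update for a single sample."""
    z = sum(w * x for w, x in zip(weights, sample))
    h = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
    error = label - h               # scalar error for this sample
    return [w + alpha * error * x for w, x in zip(weights, sample)]


# Made-up sample: z = 1.0 + 0.5 - 0.5 = 1.0, so h = sigmoid(1.0), about 0.731.
weights = sgd_step([1.0, 1.0, 1.0], [1.0, 0.5, -0.5], label=0)
print(weights)
```

Because the label is 0 and the prediction is about 0.731, the error is negative: weights attached to positive features shrink, and the weight attached to the negative feature grows.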

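3. Improved stochastic gradient ascent

# Improved stochastic gradient ascent, with an extra parameter for the number of passes.
# Each pass goes over the whole data set, updating the coefficients with one
# randomly chosen sample at a time, drawn without replacement
def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    # Convert explicitly to avoid mixing arrays and lists
    dataMatrix = array(dataMatrix)
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # The step size shrinks as j and i grow but never drops below 0.01
            alpha = 4 / (1.0 + j + i) + 0.01
            randIndex = int(random.uniform(0, len(dataIndex)))
            # Map the random position to a sample that has not yet been used in
            # this pass (the original indexed the data with randIndex directly)
            sampleIndex = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            error = classLabels[sampleIndex] - h
            weights = weights + alpha * error * dataMatrix[sampleIndex]
            del dataIndex[randIndex]
    return weights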
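The step-size schedule in `stocGradAscent1` can be inspected on its own. A minimal sketch follows; the helper name `step_size` is hypothetical, and the formula is copied from the code above.

```python
def step_size(j, i):
    """Step size used at pass j, inner step i (formula from stocGradAscent1)."""
    return 4 / (1.0 + j + i) + 0.01


first = step_size(0, 0)     # very first update: 4 / 1 + 0.01
later = step_size(149, 99)  # late in training: 4 / 249 + 0.01
print(first, later)

# Within one pass the step size shrinks as i grows...
within_pass = [step_size(5, i) for i in range(3)]
# ...but i restarts at 0 on the next pass, so the value jumps back up.
next_pass_start = step_size(6, 0)
print(within_pass, next_pass_start)
```

The constant 0.01 keeps the step size from ever reaching zero, so later samples still influence the coefficients. Since `i` resets at the start of each pass, the schedule is not strictly decreasing over the whole run.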
