
(1) Introduction to Neural Networks: Linear Regression

Published: 2025/6/15


Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/0da...

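This tutorial is a translation of the neural network tutorial written by Peter Roelants, published with the author's permission.

The tutorial explains how to get started with neural networks and has five parts. The complete series consists of the following posts: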

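(1) Introduction to Neural Networks: Linear Regression
The logistic classification function
(2) Introduction to Neural Networks: Logistic Regression (Classification)
(3) Introduction to Neural Networks: Hidden Layers
The softmax classification function
(4) Introduction to Neural Networks: Vectorization
(5) Introduction to Neural Networks: Building Multilayer Networks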

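The code in this tutorial was generated from a Python 2 IPython Notebook, and a link to the complete code is given at the end of the tutorial. Matrix operations use NumPy and plots use Matplotlib. If you have not installed these packages yet, I strongly recommend installing them through Anaconda Python, a distribution that bundles every package needed to run this tutorial.

We start by importing the packages the tutorial needs:

    from __future__ import print_function
    import numpy as np
    import matplotlib.pyplot as plt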

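Linear regression

This tutorial covers three topics:

A very simple neural network
Some concepts, such as the target function and the cost function
Gradient descent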

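We start by building the simplest possible neural network. It has a single input and a single output, and it implements a linear regression model that predicts a target value t from an input x. The model is y = x * w, where x is the input, w is the weight, and y is the prediction. It can be pictured as one input node x connected to the output node y through the weight w.

A regular neural network has several layers, non-linear activation functions, and a bias unit on each node. In this tutorial we use a single layer with only one weight w, with no activation function and no bias unit. In simple linear regression, the weight w and the bias are usually written together as a parameter vector β, where the bias is the intercept on the y-axis and w is the slope of the regression line. In linear regression these parameters are usually fitted with the least squares method.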
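For the bias-free model y = x * w used here, least squares has a simple closed form: w = Σ x_i t_i / Σ x_i^2. The sketch below illustrates it in plain Python. The helper name `least_squares_slope` and the four samples are made up for illustration and are not part of the original tutorial.

```python
def least_squares_slope(xs, ts):
    """Closed-form least-squares weight for the bias-free model y = x * w."""
    numerator = sum(x * t for x, t in zip(xs, ts))
    denominator = sum(x * x for x in xs)
    return numerator / denominator

# Made-up, noise-free samples lying exactly on t = 2x (illustrative only)
xs = [0.1, 0.4, 0.5, 0.9]
ts = [2 * x for x in xs]

w_ls = least_squares_slope(xs, ts)
print('least-squares w: {:.4f}'.format(w_ls))
```

With noisy targets the same formula returns the weight that gradient descent should approach, which makes it a convenient reference value.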

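In this tutorial, the goal is to minimize a cost function so that the network output y is as close as possible to the target t. We define the cost as the sum of squared errors:

$$\xi = \sum_{i=1}^{N} (t_i - y_i)^2$$

To minimize this cost we use gradient descent, an optimization method commonly used for neural networks.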

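Defining the target function

In this example, the targets t are generated by a function f, with some Gaussian noise added. f is defined as f(x) = 2x, where x is the input, so the line has slope 2 and intercept 0. The targets are therefore t = f(x) + N(0, 0.2), where N denotes a normal distribution with mean 0. Note that the code applies 0.2 as a multiplier on standard normal samples, so it is the standard deviation of the noise (a variance of 0.04) rather than the variance.

We draw 20 values from a uniform distribution as the input samples x and then build the targets t. The code below generates x and t and plots the linear relationship between them.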

    # Define the vector of input samples as x, with 20 values sampled from a uniform distribution
    # between 0 and 1
    x = np.random.uniform(0, 1, 20)

    # Generate the target values t from x with small gaussian noise so the estimation won't be perfect.
    # Define a function f that represents the line that generates t without noise
    def f(x): return x * 2

    # Create the targets t with some gaussian noise
    # 0.2 scales standard normal samples, so it is the standard deviation of the noise
    noise_std = 0.2
    # Gaussian noise error for each sample in x
    noise = np.random.randn(x.shape[0]) * noise_std
    # Create targets t
    t = f(x) + noise

    # Plot the target t versus the input x
    plt.plot(x, t, 'o', label='t')
    # Plot the initial line
    plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
    plt.xlabel('$x$', fontsize=15)
    plt.ylabel('$t$', fontsize=15)
    plt.ylim([0, 2])
    plt.title('inputs (x) vs targets (t)')
    plt.grid()
    plt.legend(loc=2)
    plt.show()
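The noise in the code above comes from multiplying standard normal samples by 0.2. A factor applied this way sets the standard deviation, so the resulting variance is its square. The sketch below checks this empirically with the standard library. The seed and the sample count are arbitrary choices made for illustration.

```python
import random

random.seed(0)  # arbitrary fixed seed, only for repeatability
scale = 0.2     # same multiplier as in the tutorial code

# Scale standard normal samples, as the tutorial does with np.random.randn
samples = [random.gauss(0.0, 1.0) * scale for _ in range(100000)]

mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
std = variance ** 0.5

print('sample std:      {:.4f}'.format(std))
print('sample variance: {:.4f}'.format(variance))
```

To get noise whose variance really is 0.2, the multiplier would have to be the square root of 0.2 instead.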

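Defining the cost function

We optimize the parameter w of the model y = x * w so that the cost is as small as possible over the N samples in the training set.

That is, the optimization objective is:

$$\underset{w}{\text{argmin}} \sum_{i=1}^{N} (t_i - y_i)^2$$

The function sums the errors of all samples, which is known as batch training. It is also possible to update the parameters on one sample at a time, an approach that is very common in online training.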
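To make the batch sum concrete, the sketch below computes the squared error of each sample and then adds the errors up. The three samples and the weight 1.5 are made-up values chosen so the arithmetic is easy to follow. They are not data from the tutorial.

```python
# Made-up samples and weight (illustrative values, not the tutorial's data)
xs = [0.0, 0.5, 1.0]
ts = [0.0, 1.0, 2.0]
w = 1.5

# Squared error of each individual sample: (t_i - x_i * w) ** 2
per_sample_costs = [(t - x * w) ** 2 for x, t in zip(xs, ts)]

# Batch cost: the sum of all per-sample errors
batch_cost = sum(per_sample_costs)

print('per-sample costs: {}'.format(per_sample_costs))  # [0.0, 0.0625, 0.25]
print('batch cost: {}'.format(batch_cost))              # 0.3125
```

Online training would work with one entry of `per_sample_costs` at a time instead of their sum.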

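Using the functions below, we can plot the cost against the weight. The plot shows that the cost is lowest when w is about 2, which is the slope of our function f(x). This cost function is convex and has a single global minimum.

The function nn(x, w) implements the neural network model, and cost(y, t) implements the cost function.

    # Define the neural network function y = x * w
    def nn(x, w): return x * w

    # Define the cost function
    def cost(y, t): return ((t - y) ** 2).sum()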
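The relationship between the cost and the weight can also be checked numerically instead of with a plot. The sketch below is self-contained plain Python and does not reuse the NumPy arrays above. The helper `cost_for_weight`, the noise-free samples, and the grid of candidate weights from 0 to 4 are assumptions made for illustration.

```python
def cost_for_weight(xs, ts, w):
    """Sum of squared errors of the model y = x * w over all samples."""
    return sum((t - x * w) ** 2 for x, t in zip(xs, ts))

# Noise-free samples on the line t = 2x (an assumption for illustration)
xs = [i / 19.0 for i in range(20)]
ts = [2.0 * x for x in xs]

# Evaluate the cost on a grid of candidate weights between 0 and 4
ws = [i * 0.1 for i in range(41)]
costs = [cost_for_weight(xs, ts, w) for w in ws]

# Pick the grid point with the lowest cost
best_index = min(range(len(ws)), key=lambda i: costs[i])
best_w = ws[best_index]
print('lowest cost at w = {:.1f}'.format(best_w))
```

Because the cost is a convex parabola in w, the costs on the grid fall steadily towards a single lowest point and rise again on the other side. With noisy targets the lowest point moves slightly away from 2.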

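Optimizing the cost function

For a cost function as simple as the one in this tutorial, you can probably see the best weight at a glance. For more complex or higher-dimensional cost functions that is no longer possible, which is why optimization methods are needed.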

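Gradient descent

Gradient descent is one of the most commonly used optimization algorithms for training neural networks. It works by taking the derivative of the cost function with respect to each parameter and updating the parameters in the direction of the negative gradient. The weight w is updated iteratively:

$$w(k+1) = w(k) - \Delta w(k)$$

where w(k) is the value of the weight w at step k, and Δw is defined as:

$$\Delta w = \mu \frac{\partial \xi}{\partial w}$$

where μ is the learning rate, which sets the size of each step taken during a parameter update, and ∂ξ/∂w is the gradient of the cost function ξ with respect to w. For each training sample i, the gradient can be derived with the chain rule:

$$\frac{\partial \xi_i}{\partial w} = \frac{\partial y_i}{\partial w} \frac{\partial \xi_i}{\partial y_i}$$

where ξ_i is the squared error cost of the i-th sample, ξ_i = (t_i - y_i)^2, so ∂ξ_i/∂y_i is:

$$\frac{\partial \xi_i}{\partial y_i} = \frac{\partial (t_i - y_i)^2}{\partial y_i} = -2 (t_i - y_i) = 2 (y_i - t_i)$$

Because y_i = x_i * w, ∂y_i/∂w is:

$$\frac{\partial y_i}{\partial w} = \frac{\partial (x_i w)}{\partial w} = x_i$$

So for the i-th training sample, the full update Δw is:

$$\Delta w = \mu \frac{\partial \xi_i}{\partial w} = \mu \cdot 2 x_i (y_i - t_i)$$

In batch processing, the gradients of all samples are summed:

$$\Delta w = \mu \cdot 2 \sum_{i=1}^{N} x_i (y_i - t_i)$$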

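Before running gradient descent, the weight has to be initialized. Gradient descent is then applied repeatedly until the algorithm converges. The learning rate is a hyperparameter and has to be tuned separately.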
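One way to gain confidence in a derived gradient is to compare it with a finite-difference estimate of the slope of the cost. The sketch below does this in plain Python for the batch gradient, the sum over all samples of 2 * x_i * (x_i * w - t_i). The helper names, the four samples, the weight 0.1, and the step size `eps` are assumptions made for illustration.

```python
def cost_for_weight(xs, ts, w):
    """Sum of squared errors of the model y = x * w over all samples."""
    return sum((t - x * w) ** 2 for x, t in zip(xs, ts))

def batch_gradient(xs, ts, w):
    """Analytic batch gradient: sum over samples of 2 * x_i * (y_i - t_i)."""
    return sum(2 * x * (x * w - t) for x, t in zip(xs, ts))

# Made-up samples, weight and step size (illustrative values only)
xs = [0.1, 0.4, 0.5, 0.9]
ts = [0.25, 0.7, 1.1, 1.75]
w = 0.1
eps = 1e-6

# Central finite difference: slope of the cost around w
numeric = (cost_for_weight(xs, ts, w + eps) - cost_for_weight(xs, ts, w - eps)) / (2 * eps)
analytic = batch_gradient(xs, ts, w)

print('numeric gradient:  {:.6f}'.format(numeric))
print('analytic gradient: {:.6f}'.format(analytic))
```

The two values should agree to several decimal places. A large gap would point to a sign or factor error in the derivation.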

The function gradient(w, x, t) implements the gradient ∂ξ/∂w, and delta_w(w_k, x, t, learning_rate) implements Δw.

    # Define the gradient function. Remember that y = nn(x, w) = x * w
    def gradient(w, x, t):
        return 2 * x * (nn(x, w) - t)

    # Define the update function delta w
    def delta_w(w_k, x, t, learning_rate):
        return learning_rate * gradient(w_k, x, t).sum()

    # Set the initial weight parameter
    w = 0.1
    # Set the learning rate
    learning_rate = 0.1

    # Start performing the gradient descent updates, and print the weights and cost:
    nb_of_iterations = 4  # number of gradient descent updates
    w_cost = [(w, cost(nn(x, w), t))]  # List to store the weight, cost values
    for i in range(nb_of_iterations):
        dw = delta_w(w, x, t, learning_rate)  # Get the delta w update
        w = w - dw  # Update the current weight parameter
        w_cost.append((w, cost(nn(x, w), t)))  # Add weight, cost to list

    # Print the weight and cost after each update
    for i in range(0, len(w_cost)):
        print('w({}): {:.4f} \t cost: {:.4f}'.format(i, w_cost[i][0], w_cost[i][1]))

    # output (the exact values depend on the random samples)
    # w(0): 0.1000 cost: 23.3917
    # w(1): 2.3556 cost: 1.0670
    # w(2): 2.0795 cost: 0.7324
    # w(3): 2.1133 cost: 0.7274
    # w(4): 2.1091 cost: 0.7273

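The output shows that gradient descent converges quickly to a weight close to 2 (about 2.11 for this sample). Next, we visualize the gradient descent updates.

    # Evaluate the cost on a grid of weights to draw the error curve
    ws = np.linspace(0, 4, num=100)  # weight values
    cost_ws = np.vectorize(lambda w: cost(nn(x, w), t))(ws)  # cost for each weight in ws

    # Plot the first gradient descent updates
    plt.plot(ws, cost_ws, 'r-')  # Plot the error curve
    # Plot the updates
    for i in range(0, len(w_cost) - 2):
        w1, c1 = w_cost[i]
        w2, c2 = w_cost[i + 1]
        plt.plot(w1, c1, 'bo')
        plt.plot([w1, w2], [c1, c2], 'b-')
        plt.text(w1, c1 + 0.5, '$w({})$'.format(i))
    # Show figure
    plt.xlabel('$w$', fontsize=15)
    plt.ylabel('$\\xi$', fontsize=15)
    plt.title('Gradient descent updates plotted on cost function')
    plt.grid()
    plt.show()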

Figure: gradient descent updates

The figure above visualizes the gradient descent updates. The blue dots mark the value of w(k) at iteration k, and the figure shows w moving closer and closer to 2.0. Ten updates are enough for the model to converge, as the following code and its plot show.

    w = 0
    # Start performing the gradient descent updates
    nb_of_iterations = 10  # number of gradient descent updates
    for i in range(nb_of_iterations):
        dw = delta_w(w, x, t, learning_rate)  # get the delta w update
        w = w - dw  # update the current weight parameter

    # Plot the fitted line against the target line
    # Plot the target t versus the input x
    plt.plot(x, t, 'o', label='t')
    # Plot the initial line
    plt.plot([0, 1], [f(0), f(1)], 'b-', label='f(x)')
    # Plot the fitted line
    plt.plot([0, 1], [0 * w, 1 * w], 'r-', label='fitted line')
    plt.xlabel('input x')
    plt.ylabel('target t')
    plt.ylim([0, 2])
    plt.title('input vs. target')
    plt.grid()
    plt.legend(loc=2)
    plt.show()
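The learning rate was described earlier as a hyperparameter that needs tuning, and a small experiment shows why. For this quadratic cost, each batch update multiplies the distance to the best weight by (1 - 2 * μ * Σ x_i^2), so a rate that is too small converges slowly and a rate that is too large diverges. The sketch below is plain Python. The helper `run_gradient_descent`, the noise-free samples, and the three learning rates are assumptions chosen for illustration.

```python
def run_gradient_descent(xs, ts, learning_rate, nb_of_iterations, w=0.0):
    """Batch gradient descent for the model y = x * w; returns the final weight."""
    for _ in range(nb_of_iterations):
        gradient = sum(2 * x * (x * w - t) for x, t in zip(xs, ts))
        w = w - learning_rate * gradient
    return w

# Noise-free samples on the line t = 2x (an assumption for illustration)
xs = [i / 19.0 for i in range(20)]
ts = [2.0 * x for x in xs]

# Compare a small, a moderate and a too-large learning rate
results = {}
for learning_rate in (0.01, 0.1, 0.2):
    results[learning_rate] = run_gradient_descent(xs, ts, learning_rate, 10)
    print('learning rate {:.2f}: w after 10 updates = {:.4f}'.format(
        learning_rate, results[learning_rate]))
```

On this data the rate 0.1 lands very close to 2 after ten updates, 0.01 is still well short of it, and 0.2 overshoots further on every step. The threshold depends on Σ x_i^2, so it changes with the data set.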

For the complete code, click here.

