Activation Functions in Neural Networks
An activation function, as the name suggests, decides whether a neuron should be activated based on the weighted sum of its inputs plus a bias. It is therefore a significant component of deep learning, since activation functions largely determine a model's output. An activation function also has to be efficient, so that the model can scale as the number of neurons increases.
To be precise, the activation function decides how much of the input information is relevant for the next stage.
For example, suppose x1 and x2 are two inputs, with w1 and w2 their respective weights into the neuron. The output is Y = activation_function(y), where y = x1*w1 + x2*w2 + b, i.e. the weighted sum of the inputs plus the bias.
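As a minimal illustration (the numbers, the choice of sigmoid, and the variable names are made up for this example), the same computation in plain Python:

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x1, x2 = 0.5, -1.2        # illustrative inputs
w1, w2 = 0.8, 0.3         # illustrative weights
b = 0.1                   # illustrative bias

y = x1 * w1 + x2 * w2 + b   # weighted sum of inputs plus bias
Y = sigmoid(y)              # activation function applied to y
print(y, Y)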
Activation functions come in three main types. We will analyze the curve, pros, and cons of each. The input we work with is an arithmetic progression over [-10, 10] with a common difference of 0.1.
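The snippets below assume TensorFlow 2.x (eager execution), NumPy, and a small plotting helper named do_plot. Since do_plot is not shown in the original post, the following is one possible minimal implementation of that assumed setup:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

def do_plot(x, y, title):
    # Assumed helper used by every snippet below: plot y against x with a title.
    plt.plot(x, y)
    plt.title(title)
    plt.grid(True)
    plt.show()

With that in place, the input tensor is defined as follows.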
x = tf.Variable(tf.range(-10, 10, 0.1), dtype=tf.float32)

Binary Step
A binary step function is a threshold-based activation function: if the input value is above the threshold, the neuron is activated and passes the same signal to the next layer; otherwise it outputs zero.
# Binary Step Activation
def binary_step(x):
    return np.array([1 if each > 0 else 0 for each in list(x.numpy())])

do_plot(x.numpy(), binary_step(x), 'Binary Step')
The binary step is rarely used, mainly for two reasons. First, it allows only two outputs, which does not work for multi-class problems. Second, its derivative is zero everywhere (and undefined at the threshold), so it gives backpropagation nothing to work with.
Linear
As the name suggests, the output is a linear function of the input, i.e. y = c*x.
# Linear Activation
def linear_activation(x):
    c = 0.1
    return c * x.numpy()

do_plot(x.numpy(), linear_activation(x), 'Linear Activation')
The linear activation function is also not used in neural networks, for two main reasons.
First, with this activation at every layer, the final output is just a linear function of the input, because a composition of linear functions is itself linear. That defeats the purpose of stacking multiple neurons and layers.
Second, since y = c*x, the derivative dy/dx = c is a constant, independent of the input, so backpropagation gains no useful information from it when training the network.
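As a quick sketch of the first point above, stacking two purely linear layers collapses into a single linear map (the weights are arbitrary illustrative values, biases omitted for brevity):

W1 = np.array([[1.0, 2.0], [0.5, -1.0]])   # weights of layer 1
W2 = np.array([[0.3, 0.7], [2.0, 0.1]])    # weights of layer 2
v = np.array([0.4, -0.6])                  # an illustrative input vector

two_layer = W2 @ (W1 @ v)        # two linear layers applied in sequence
collapsed = (W2 @ W1) @ v        # one equivalent linear layer
print(np.allclose(two_layer, collapsed))   # True: the extra layer adds nothing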
Non-linear
Non-linear activation functions are used everywhere in neural networks: their non-linearity lets the model capture complex patterns in the data, and they support backpropagation because they have usable derivatives.
Here we discuss five popularly used activation functions.
1. Sigmoid
The sigmoid function squashes the output into the range (0, 1), and the function is sigmoid(x) = 1 / (1 + e^(-x)).
y = tf.nn.sigmoid(x)
do_plot(x.numpy(), y.numpy(), 'Sigmoid Activation')
The major advantage of this function is that its gradient is smooth and the output always lies between 0 and 1.
It also has a few cons: the output always lies between 0 and 1, which is not suitable for multi-class problems, and because the exponential is computationally expensive, training slows down as layers and neurons are added.
with tf.GradientTape() as t:
    y = tf.nn.sigmoid(x)
do_plot(x.numpy(), t.gradient(y, x).numpy(), 'Grad of Sigmoid')
Also, as seen in the gradient plot, it suffers from the vanishing gradient problem: as the input moves from -10 to -5 or from 5 to 10, the gradient barely changes and stays close to zero.
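A quick numerical illustration of that saturation, using the identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)):

s = tf.nn.sigmoid(tf.constant([-10.0, -5.0, 0.0, 5.0, 10.0]))
grad = s * (1 - s)    # derivative of the sigmoid at each point
print(grad.numpy())   # roughly [4.5e-05, 6.6e-03, 2.5e-01, 6.6e-03, 4.5e-05]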
2. Tanh
The tanh activation function is similar to sigmoid, but it squashes the output into the range (-1, 1). The function is tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)).
y = tf.nn.tanh(x)
do_plot(x.numpy(), y.numpy(), 'Tanh Activation')
In addition to the advantages of the sigmoid function, its output is zero-centered.
With many layers and neurons, training also gets slower, since tanh is likewise computationally expensive.
with tf.GradientTape() as t:
    y = tf.nn.tanh(x)
do_plot(x.numpy(), t.gradient(y, x).numpy(), 'Grad of Tanh')
As seen in the gradient plot, it also suffers from the vanishing gradient problem: once the input leaves roughly [-2.5, 2.5], the gradient stays near zero no matter how much further the input changes.
3. ReLU
ReLU, or Rectified Linear Unit, either passes the information on or blocks it completely. The function is relu(x) = max(0, x).
y = tf.nn.relu(x)
do_plot(x.numpy(), y.numpy(), 'ReLU Activation')
It is the most popular activation due to its simplicity and non-linearity. Its derivative is particularly well behaved: it either vanishes (for negative inputs) or simply lets the argument through (for positive inputs).
with tf.GradientTape() as t:
    y = tf.nn.relu(x)
do_plot(x.numpy(), t.gradient(y, x).numpy(), 'Grad of ReLU')
One disadvantage is that it discards negative inputs entirely, so a neuron that only ever receives negative inputs gets zero gradient and stops learning. This is called the dying ReLU problem.
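A small sketch of why such a neuron stops learning: the gradient of ReLU is exactly zero for negative inputs, so no error signal flows back (the values below are illustrative):

neg = tf.Variable([-3.0, -1.0, -0.5])   # illustrative negative pre-activations
with tf.GradientTape() as tape:
    out = tf.nn.relu(neg)
print(tape.gradient(out, neg).numpy())  # [0. 0. 0.] -- nothing to update with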
4. Softmax
The softmax activation function gives its outputs in terms of probabilities, and the number of outputs equals the number of inputs. The function is softmax(xi) = e^(xi) / sum_j e^(xj).
x1 = tf.Variable(tf.range(-1, 1, .5), dtype=tf.float32)
y = tf.nn.softmax(x1)
The major advantage of this activation function is that it produces multiple outputs forming a probability distribution, which is why it is popularly used in the output layer of a neural network; it makes classifying into multiple categories straightforward.
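A quick illustrative check that the softmax outputs above do form a probability distribution:

probs = tf.nn.softmax(x1)
print(probs.numpy())                   # every value lies in (0, 1)
print(tf.reduce_sum(probs).numpy())    # ~1.0, up to floating-point error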
The main limitation is that, used as a classifier on its own, it does not work if the data is not linearly separable. Another limitation is that it does not support null rejection, so if you need a "none of the above" option you have to train the model with an explicit null class.
5. Swish
The Google Brain team proposed this activation function, named Swish. The function is swish(x) = x * sigmoid(x).
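A minimal check that the built-in op matches the formula (assuming TensorFlow 2.x, where tf.nn.swish is available):

manual = x * tf.nn.sigmoid(x)          # x * sigmoid(x) computed by hand
builtin = tf.nn.swish(x)               # TensorFlow's implementation
print(np.allclose(manual.numpy(), builtin.numpy()))   # True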
According to their paper, it performs better than ReLU with a similar level of computational efficiency.
y = tf.nn.swish(x)
do_plot(x.numpy(), y.numpy(), 'Swish Activation')
One reason Swish may perform better than ReLU is that it mitigates the dying ReLU issue, as the gradient plot below shows.
with tf.GradientTape() as t:
    y = tf.nn.swish(x)
do_plot(x.numpy(), t.gradient(y, x).numpy(), 'Grad of Swish')
On an additional note, ReLU also has other popular variants of its own, such as Leaky ReLU and Parametric ReLU.
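As a brief illustration of one such variant, Leaky ReLU keeps a small slope for negative inputs instead of zeroing them out (the alpha value below is an illustrative choice):

y_leaky = tf.nn.leaky_relu(x, alpha=0.1)   # x for x > 0, alpha * x otherwise
do_plot(x.numpy(), y_leaky.numpy(), 'Leaky ReLU Activation')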
Translated from: https://medium.com/swlh/activation-functions-in-neural-network-eb0ab4bb493