Semantic Segmentation Loss Functions (1): The Cross-Entropy Loss
I have recently been working on several semantic segmentation projects. While looking for loss functions I found that the implementations shared online all differ and rarely explain how to actually use them, so I am recording my own experience with loss functions during training here. I use the PyTorch framework, so this whole series is implemented with PyTorch.
First, the cross-entropy loss. Semantic segmentation is essentially a pixel-wise classification problem, so anyone who has done image classification should already be familiar with cross-entropy.
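Concretely, for a single pixel with predicted class scores $x = (x_1, \dots, x_C)$ and ground-truth class $y$, the standard (unweighted) cross-entropy loss is the negative log of the softmax probability assigned to the correct class; the loss for an image is just the average of this quantity over all of its pixels. This is the same formula that appears, with class weights and `ignore_index` added, in the PyTorch docstring quoted below:

$$
\ell(x, y) = -\log \frac{\exp(x_{y})}{\sum_{c=1}^{C} \exp(x_{c})}
$$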
PyTorch ships with a ready-made cross-entropy loss; you only need to call it:
```python
loss_func = nn.CrossEntropyLoss()
```

The loss functions in the `torch.nn` module are all implemented as classes: declare an instance once, then call it wherever the loss needs to be computed.
PyTorch's implementation of the cross-entropy loss:
```python
class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion computes the cross entropy loss between input and target.

    It is useful when training a classification problem with `C` classes.
    If provided, the optional argument :attr:`weight` should be a 1D `Tensor`
    assigning weight to each of the classes.
    This is particularly useful when you have an unbalanced training set.

    The `input` is expected to contain raw, unnormalized scores for each class.
    `input` has to be a Tensor of size either :math:`(minibatch, C)` or
    :math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1` for the
    `K`-dimensional case. The latter is useful for higher dimension inputs, such
    as computing cross entropy loss per-pixel for 2D images.

    The `target` that this criterion expects should contain either:

    - Class indices in the range :math:`[0, C-1]` where :math:`C` is the number of classes; if
      `ignore_index` is specified, this loss also accepts this class index (this index
      may not necessarily be in the class range). The unreduced (i.e. with :attr:`reduction`
      set to ``'none'``) loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})}
          \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}

      where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight,
      :math:`C` is the number of classes, and :math:`N` spans the minibatch dimension as well as
      :math:`d_1, ..., d_k` for the `K`-dimensional case. If
      :attr:`reduction` is not ``'none'`` (default ``'mean'``), then

      .. math::
          \ell(x, y) = \begin{cases}
              \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}} l_n, &
               \text{if reduction} = \text{`mean';}\\
              \sum_{n=1}^N l_n, &
               \text{if reduction} = \text{`sum'.}
          \end{cases}

      Note that this case is equivalent to the combination of :class:`~torch.nn.LogSoftmax` and
      :class:`~torch.nn.NLLLoss`.

    - Probabilities for each class; useful when labels beyond a single class per minibatch item
      are required, such as for blended labels, label smoothing, etc. The unreduced (i.e. with
      :attr:`reduction` set to ``'none'``) loss for this case can be described as:

      .. math::
          \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
          l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}

      where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight,
      :math:`C` is the number of classes, and :math:`N` spans the minibatch dimension as well as
      :math:`d_1, ..., d_k` for the `K`-dimensional case. If
      :attr:`reduction` is not ``'none'`` (default ``'mean'``), then

      .. math::
          \ell(x, y) = \begin{cases}
              \frac{\sum_{n=1}^N l_n}{N}, &
               \text{if reduction} = \text{`mean';}\\
              \sum_{n=1}^N l_n, &
               \text{if reduction} = \text{`sum'.}
          \end{cases}

    .. note::
        The performance of this criterion is generally better when `target` contains class
        indices, as this allows for optimized computation. Consider providing `target` as
        class probabilities only when a single class label per minibatch item is too restrictive.

    Args:
        weight (Tensor, optional): a manual rescaling weight given to each class.
            If given, has to be a Tensor of size `C`
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:`size_average`
            is set to ``False``, the losses are instead summed for each minibatch. Ignored
            when :attr:`reduce` is ``False``. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When :attr:`size_average` is
            ``True``, the loss is averaged over non-ignored targets. Note that
            :attr:`ignore_index` is only applicable when the target contains class indices.
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will
            be applied, ``'mean'``: the weighted mean of the output is taken,
            ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:`reduction`. Default: ``'mean'``
        label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount
            of smoothing when computing the loss, where 0.0 means no smoothing. The targets
            become a mixture of the original ground truth and a uniform distribution as described in
            `Rethinking the Inception Architecture for Computer Vision <https://arxiv.org/abs/1512.00567>`__.
            Default: :math:`0.0`.

    Shape:
        - Input: :math:`(N, C)` where `C = number of classes`, or
          :math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
          in the case of `K`-dimensional loss.
        - Target: If containing class indices, shape :math:`(N)` where each value is
          :math:`0 \leq \text{targets}[i] \leq C-1`, or :math:`(N, d_1, d_2, ..., d_K)` with
          :math:`K \geq 1` in the case of K-dimensional loss. If containing class probabilities,
          same shape as the input.
        - Output: If :attr:`reduction` is ``'none'``, shape :math:`(N)` or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of K-dimensional loss.
          Otherwise, scalar.

    Examples::

        >>> # Example of target with class indices
        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
        >>>
        >>> # Example of target with class probabilities
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.randn(3, 5).softmax(dim=1)
        >>> output = loss(input, target)
        >>> output.backward()
    """
    __constants__ = ['ignore_index', 'reduction', 'label_smoothing']
    ignore_index: int
    label_smoothing: float

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100,
                 reduce=None, reduction: str = 'mean', label_smoothing: float = 0.0) -> None:
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index
        self.label_smoothing = label_smoothing

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction,
                               label_smoothing=self.label_smoothing)
```

A few arguments can be passed when declaring the loss function. One of the most important is `weight: Optional[Tensor] = None`.
`weight` assigns a rescaling weight to each class during the loss computation; it is a tensor whose length must equal the number of classes. Another useful argument is `label_smoothing`, which is built into the cross-entropy loss. With `label_smoothing=0.1` in a two-class problem, the hard targets are mixed with a uniform distribution, so a label of 1 becomes 0.95 and a label of 0 becomes 0.05 (in general, the true class becomes `1 - label_smoothing + label_smoothing / C` and every other class becomes `label_smoothing / C`). This makes the targets softer, helps convergence, and prevents misclassified samples from producing an excessively large loss.
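A minimal sketch of both options together (the three-class setup and the weight values are made up for illustration; `label_smoothing` requires PyTorch 1.10 or newer):

```python
import torch
import torch.nn as nn

# Illustrative per-class weights for a hypothetical 3-class problem:
# rarer classes get larger weights so mistakes on them cost more.
class_weights = torch.tensor([0.2, 1.0, 2.0])

loss_func = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)

input = torch.randn(4, 3, requires_grad=True)          # raw scores: 4 samples, 3 classes
target = torch.randint(0, 3, (4,), dtype=torch.long)   # ground-truth class indices

loss = loss_func(input, target)
loss.backward()
print(loss)
```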
To use it, i.e. to compute the loss value, two arguments are required: `input` and `target`, both tensors. `input` is your model's prediction and `target` is the ground-truth annotation. For example:
```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss_func(input, target)

print("input:", input)
print("target:", target)
print("loss:", output)
```

```
input: tensor([[ 1.6738,  0.0526,  0.6329, -0.8809,  1.4822],
        [-0.5908,  1.5717,  1.3402,  0.4227, -0.3498],
        [-0.3359, -2.3797, -1.6206, -2.3070,  0.6010]], requires_grad=True)
target: tensor([3, 4, 1])
loss: tensor(3.2306, grad_fn=<NllLossBackward0>)
```

This is essentially a five-class classification problem: `input` is the output of the model's final fully connected layer, or the final output of a fully convolutional network.
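For segmentation the only difference is the extra spatial dimensions: the network output keeps a class dimension and has shape (N, C, H, W), while the label mask holds one class index per pixel and has shape (N, H, W). A minimal sketch with made-up shapes:

```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

# Output of a fully convolutional network: batch of 2 images, 4 classes, 16x16 pixels (illustrative shapes).
pred = torch.randn(2, 4, 16, 16, requires_grad=True)
# Ground-truth mask: one integer class index in [0, 3] per pixel, dtype long.
mask = torch.randint(0, 4, (2, 16, 16), dtype=torch.long)

loss = loss_func(pred, mask)   # cross entropy per pixel, averaged over all pixels by default
loss.backward()
print(loss)
```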
Note:
If you take the argmax of the class scores first, collapsing the class dimension into a single predicted label, the computation fails. For example:
```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

# input = torch.randn(3, 5, requires_grad=True)
input = torch.randn(3, requires_grad=True)
print("input:", input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:", target)
output = loss_func(input, target)
print("loss:", output)
```

```
input: tensor([-0.3463,  1.2289,  0.2517], requires_grad=True)
target: tensor([3, 4, 3])
Traceback (most recent call last):
  File "/home/lwf/Project/MRI-Segmentation/tets.py", line 19, in <module>
    output = loss_func(input, target)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1152, in forward
    label_smoothing=self.label_smoothing)
  File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected floating point type for target with class probabilities, got Long
```

In other words, `input` must keep a raw score for every class (an N × C tensor here), not a single value per sample such as an argmax result; with the class dimension missing, `cross_entropy` tries to interpret `target` as class probabilities and raises the error above.
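As the docstring above notes, with class-index targets `CrossEntropyLoss` is equivalent to `LogSoftmax` followed by `NLLLoss`, which is a quick way to convince yourself that the input should be raw per-class scores rather than an argmax result. A small check with random values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)              # raw, unnormalized scores: one row per sample, one column per class
target = torch.randint(0, 5, (3,))      # class indices

ce = nn.CrossEntropyLoss()(logits, target)
# The same value computed explicitly: log-softmax over the class dimension, then negative log-likelihood.
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(ce.item(), nll.item())            # the two numbers match
```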
Summary
Cross-entropy is the most basic loss for semantic segmentation: treat segmentation as per-pixel classification, feed `nn.CrossEntropyLoss` the raw class scores of shape (N, C, H, W) together with an integer label mask of shape (N, H, W), and use `weight` and `label_smoothing` to handle class imbalance and noisy labels.