Contrastive Learning Paper Series, SDCLR (Part 2): Self-Damaging Contrastive Learning
Contents
- 0. Abstract
- 0.1 Sentence-by-Sentence Translation
- 0.2 Summary
- 1. Introduction
- 1.1. Background and Research Gaps
- 1.1.1 Sentence-by-Sentence Translation
- Paragraph 1 (raises the question of whether contrastive learning suffers from the long-tail problem)
- Paragraph 2 (the traditional supervised-learning remedies cannot be used here)
- 1.1.2 Summary
- 1.2. Rationale and Contributions
- 1.2.1 Sentence-by-Sentence Translation
- Paragraph 1 (models forget rare classes more easily, so this can be used to find the rare classes)
- Paragraph 2 (using the fact that low-frequency samples are gradually forgotten to locate them)
- Paragraph 3 (details of how the paper works)
- 1.2.2 Summary
- 1.3 Contributions
- 1.3.1 Sentence-by-Sentence Translation
- 2. Related works
- 3. Method
- 3.1. Preliminaries
- Paragraph 1 (Contrastive Learning: essentially a brief re-introduction of SimCLR)
- Paragraph 2 (Pruning Identified Exemplars: pruning often discards long-tail information)
- Paragraph 3 (states the paper's contribution: extending PIE identification from supervised to unsupervised)
- 3.1.2 Summary
- 3.2. Self-Damaging Contrastive Learning
- Paragraph 1
0. Abstract
0.1 Sentence-by-Sentence Translation
The recent breakthrough achieved by contrastive learning accelerates the pace for deploying unsupervised training on real-world data applications.
However, unlabeled data in reality is commonly imbalanced and shows a long-tail distribution, and it is unclear how robustly the latest contrastive learning methods could perform in the practical scenario.
This paper proposes to explicitly tackle this challenge, via a principled framework called Self-Damaging Contrastive Learning (SDCLR), to automatically balance the representation learning without knowing the classes.
Our main inspiration is drawn from the recent finding that deep models have difficult-to-memorize samples, and those may be exposed through network pruning (Hooker et al., 2020).
It is further natural to hypothesize that long-tail samples are also tougher for the model to learn well due to insufficient examples.
Hence, the key innovation in SDCLR is to create a dynamic self-competitor model to contrast with the target model, which is a pruned version of the latter.
(Note: the self-competitor model is the core idea of this paper. It is obtained by pruning the original model online, and contrasting it against the original model yields extra information.)
During training, contrasting the two models will lead to adaptive online mining of the most easily forgotten samples for the current target model, and implicitly emphasize them more in the contrastive loss.
Extensive experiments across multiple datasets and imbalance settings show that SDCLR significantly improves not only overall accuracies but also balancedness, in terms of linear evaluation on the full-shot and few-shot settings.
(Note: i.e., the method performs well.)
Our code is available at https://github.com/VITA-Group/SDCLR.
0.2 Summary
Roughly: the paper uses contrastive learning to tackle the long-tail problem.
1. Introduction
1.1. Background and Research Gaps
1.1.1 Sentence-by-Sentence Translation
Paragraph 1 (raises the question of whether contrastive learning suffers from the long-tail problem)
Contrastive learning (Chen et al., 2020a; He et al., 2020; Grill et al., 2020; Jiang et al., 2020; You et al., 2020) recently prevails for deep neural networks (DNNs) to learn powerful visual representations from unlabeled data.
(Note: contrastive learning has recently achieved strong empirical results.)
The state-of-the-art contrastive learning frameworks consistently benefit from using bigger models and training on more task-agnostic unlabeled data (Chen et al., 2020b).
The predominant promise implied by those successes is to leverage contrastive learning techniques to pre-train strong and transferable representations from internet-scale sources of unlabeled data.
(Note: the promise of contrastive learning is to pre-train strong, transferable representations on internet-scale unlabeled data.)
However, going from the controlled benchmark data to uncontrolled real-world data will run into several gaps.
For example, most natural image and language data exhibit a Zipf long-tail distribution where various feature attributes have very different occurrence frequencies (Zhu et al., 2014; Feldman, 2020).
Broadly speaking, such imbalance is not only limited to the standard single-label classification with majority versus minority class (Liu et al., 2019), but also can extend to multi-label problems along many attribute dimensions (Sarafianos et al., 2018).
That naturally questions whether contrastive learning can still generalize well in those long-tail scenarios.
Paragraph 2 (the traditional supervised-learning remedies cannot be used here)
We are not the first to ask this important question.
Earlier works (Yang & Xu, 2020; Kang et al., 2021) pointed out that when the data is imbalanced by class, contrastive learning can learn more balanced feature space than its supervised counterpart.
Despite those preliminary successes, we find that the state-of-the-art contrastive learning methods remain certain vulnerability to the long-tailed data (even indeed improving over vanilla supervised learning), after digging into more experiments and imbalance settings (see Sec 4).
(Note: contrastive learning still has certain weaknesses with respect to balancedness.)
Such vulnerability is reflected on the linear separability of pretrained features (the instance-rich classes has much more separable features than instance-scarce classes), and affects downstream tuning or transfer performance.
(Note: the vulnerability shows up in the linear separability of the pretrained features; some features are likely not well separated, which hurts downstream tasks.)
To conquer this challenge further, the main hurdle lies in the absence of class information;
therefore, existing approaches for supervised learning, such as re-sampling the data distribution (Shen et al., 2016; Mahajan et al., 2018) or re-balancing the loss for each class (Khan et al., 2017; Cui et al., 2019; Cao et al., 2019), cannot be straightforwardly made to work here.
(Note: without class labels, the two classic remedies, re-sampling the data distribution and re-weighting the loss per class, cannot be applied directly.)
1.1.2 Summary
The overall backdrop is the rapid rise of contrastive learning.
- 1. The long-tail problem already exists in conventional (supervised) deep learning.
- 2. Although earlier work pointed out that contrastive learning is less affected by the long-tail problem, the authors' experiments show that it is still noticeably affected.
- 3. Because unsupervised learning has no class labels, the traditional remedies for the long-tail problem may not work. (The implication: this is the problem the paper needs to solve.)
1.2. Rationale and Contributions
1.2.1 Sentence-by-Sentence Translation
Paragraph 1 (models forget rare classes more easily, so this can be used to find the rare classes)
Our overall goal is to find a bold push to extend the loss re-balancing and cost-sensitive learning ideas (Khan et al., 2017; Cui et al., 2019; Cao et al., 2019) into an unsupervised setting.
The initial hypothesis arises from the recent observations that DNNs tend to prioritize learning simple patterns (Zhang et al., 2016; Arpit et al., 2017; Liu et al., 2020; Yao et al., 2020; Han et al., 2020; Xia et al., 2021).
(Note: DNNs tend to learn simple patterns first.)
More precisely, the DNN optimization is content-aware, taking advantage of patterns shared by more training examples, and therefore inclined towards memorizing the majority samples.
(Note: patterns that appear more often are indeed easier for the network to memorize.)
Since long-tail samples are underrepresented in the training set, they will tend to be poorly memorized, or more “easily forgotten” by the model - a characteristic that one can potentially leverage to spot long-tail samples from unlabeled data in a model-aware yet class-agnostic way.
(Note: because low-frequency samples are forgotten by the network sooner, this property can be used to spot long-tail samples.)
Paragraph 2 (using the fact that low-frequency samples are gradually forgotten to locate them)
However, it is in general tedious, if ever feasible, to measure how well each individual training sample is memorized in a given DNN (Carlini et al., 2019).
(Note: we cannot feasibly check every sample one by one.)
One blessing comes from the recent empirical finding (Hooker et al., 2020) in the context of image classification.
The authors observed that, network pruning, which usually removes the smallest magnitude weights in a trained DNN, does not affect all learned classes or samples equally.
Rather, it tends to disproportionally hamper the DNN memorization and generalization on the long-tailed and most difficult images from the training set.
In other words, long-tail images are not “memorized well” and may be easily “forgotten” by pruning the model, making network pruning a practical tool to spot the samples not yet well learned or represented by the DNN.
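To make the pruning mechanism concrete, here is a minimal NumPy sketch of magnitude pruning. `prune_by_magnitude` is an illustrative helper (not code from the paper or from Hooker et al.) that zeroes out the fraction of weights with the smallest absolute value:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed.

    This is the classic magnitude-pruning criterion: weights that stayed
    close to zero during training (often those serving rare patterns)
    are the first to be removed.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold     # keep only larger-magnitude weights
    return weights * mask
```

Applied to a trained network layer by layer, this yields the "damaged" model whose prediction differences against the original expose the hard-to-memorize samples.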
Paragraph 3 (details of how the paper works)
Inspired by the aforementioned, we present a principled framework called Self-Damaging Contrastive Learning (SDCLR), to automatically balance the representation learning without knowing the classes.
The workflow of SDCLR is illustrated in Fig. 1.
(Note: roughly, the weights serving rarely-seen patterns end up small and get pruned away, so the pruned network mostly recognizes the common classes; contrasting it with the original model then surfaces the rare content.)
In addition to creating strong contrastive views by input data augmentation, SDCLR introduces another new level of contrasting via "model augmentation", by perturbing the target model's structure and/or current weights.
In particular, the key innovation in SDCLR is to create a dynamic self-competitor model by pruning the target model online, and contrast the pruned model’s features with the target model’s.
(Note: the self-competitor model is obtained by online pruning.)
Based on the observation (Hooker et al., 2020) that pruning impairs model ability to predict accurately on rare and atypical instances, those samples in practice will also have the largest prediction differences before then pruned and non-pruned models.
根據觀察(Hooker et al., 2020),修剪會削弱模型對罕見和非典型實例的準確預測能力,在實際應用中,這些樣本在剪枝模型和非剪枝模型之前的預測差異也最大。
(也就是剪枝對非典型樣本的預測影響很大)
That effectively boosts their weights in the contrastive loss and leads to implicit loss re-balancing.
(Note: contrasting the two branches directly up-weights these samples in the contrastive loss.)
Moreover, since the self-competitor is always obtained from the updated target model, the two models will co-evolve, which allows the target model to spot diverse memorization failures at different training stages and to progressively learn more balanced representations.
(Note: whatever the current model predicts poorly at a given stage is caught by the contrastive loss, so different failures are surfaced at different training stages.)
1.2.2 Summary
Since the paper studies the long-tail problem, first consider what characterizes the rarely-seen classes:
- 1. Because they appear rarely, the weights that matter for these low-frequency classes stay small, so magnitude-based pruning removes them first.
- 2. As a result, the pruned model loses the ability to predict long-tail (low-frequency) samples.
How the proposed scheme compares with traditional designs:
Traditional methods:
- 1. Positive pairs (through the Siamese network) come from two different data augmentations.
- 2. The twin networks come from momentum updates (MoCo) or stop-gradient (BYOL, SimSiam).
This paper:
- 1. Positive pairs still come from two different data augmentations.
- 2. The second network instead comes from pruning.
1.3 Contributions
1.3.1 Sentence-by-Sentence Translation
Below we outline our main contributions:
• Seeing that unsupervised contrastive learning is not immune to the imbalance data distribution, we design a Self-Damaging Contrastive Learning (SDCLR) framework to address this new challenge.
• SDCLR innovates to leverage the latest advances in understanding DNN memorization. By creating and updating a self-competitor online by pruning the target model during training, SDCLR provides an adaptive online mining process to always focus on the most easily forgotten (long-tailed) samples throughout training.
• Extensive experiments across multiple datasets and imbalance settings show that SDCLR can significantly improve not only overall accuracy but also the balancedness of the learned representation.
2. Related works
Skipped for now.
3. Method
3.1. Preliminaries
Paragraph 1 (Contrastive Learning: essentially a brief re-introduction of SimCLR)
Contrastive learning learns visual representation via enforcing similarity of the positive pairs (v_i, v_i^+) and enlarging the distance of negative pairs (v_i, v_i^-).
Formally, the loss is defined as
(This is the standard contrastive loss.)
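The equation itself did not survive here; as a reconstruction (not the paper's verbatim formula), the standard InfoNCE-style contrastive loss used by SimCLR-family methods is:

```latex
\mathcal{L}_{i}
= -\log
\frac{\exp\!\big(\mathrm{sim}(v_i, v_i^{+}) / \tau\big)}
     {\exp\!\big(\mathrm{sim}(v_i, v_i^{+}) / \tau\big)
      + \sum_{v_i^{-}} \exp\!\big(\mathrm{sim}(v_i, v_i^{-}) / \tau\big)}
```

where sim(·,·) is cosine similarity and τ is a temperature hyperparameter.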
SimCLR (Chen et al., 2020a) is one of the state-of-the-art contrastive learning frameworks.
For an input image, SimCLR would augment it twice with two different augmentations, and then process them with two branches that share the same architecture and weights.
Two different versions of the same image are set as a positive pair, and the negative images are sampled from the remaining images in the same batch.
Paragraph 2 (Pruning Identified Exemplars: pruning often discards long-tail information)
(Hooker et al., 2020) systematically investigates the model output changes introduced by pruning and finds that certain examples are particularly sensitive to sparsity.
These images most impacted after pruning are termed as Pruning Identified Exemplars (PIEs), representing the difficult-to-memorize samples in training.
(Note: Pruning Identified Exemplars (PIEs) are the samples exposed by pruning, i.e., the examples whose predictions change the most once the model is pruned.)
Moreover, the authors also demonstrate that PIEs often show up at the long-tail of a distribution.
(Note: pruning tends to "forget" precisely the long-tail examples.)
Paragraph 3 (states the paper's contribution: extending PIE identification from supervised to unsupervised)
We extend (Hooker et al., 2020)’s PIE hypothesis from supervised classification to the unsupervised setting for the first time.
Moreover, instead of pruning a trained model and exposing its PIEs once, we are now integrating pruning into the training process as an online step.
With PIEs dynamically generated by pruning a target model under training, we expect them to expose different long-tail examples during training, as the model continues to be trained.
Our experiments show that PIEs answer well to those new challenges.
3.1.2 Summary
In short:
- 1. The structure of SimCLR.
- 2. Pruning easily removes the rarely-occurring long-tail data.
- 3. Pruning produces a second model that has forgotten the long-tail data; the contrastive loss then pulls it back toward the original network, and this pulling implicitly strengthens the long-tail data.
3.2. Self-Damaging Contrastive Learning
Paragraph 1
Observation: Contrastive learning is NOT immune to imbalance. Long-tail distributions fail many supervised approaches built on balanced benchmarks (Kang et al., 2019).
Even though contrastive learning does not rely on class labels, it still learns the transformation invariances in a data-driven manner, and will be affected by dataset bias (Purushwalkam & Gupta, 2020).
Particularly for long-tail data, one would naturally hypothesize that the instance-rich head classes may dominate the invariance learning procedure and leave the tail classes under-learned.