The Anisotropy Problem of BERT Sentence Embeddings and Its Connection to Contrastive Learning
This post explains why sentence embeddings produced from BERT perform poorly on semantic-similarity tasks, along with the explanations given in the literature (anisotropy, representation degeneration, the narrow-cone embedding space), and then covers the connection between contrastive learning and anisotropy argued in SimCSE.
It mainly collects the relevant papers and their key arguments, kept here for reference.
Contents
Problem Introduction:
Explanations from Related Papers:
1. REPRESENTATION DEGENERATION PROBLEM IN TRAINING NATURAL LANGUAGE GENERATION MODELS
2. bert-flow : chap2 : Understanding the Sentence Embedding Space of BERT
2.1 The Connection between Semantic Similarity and BERT Pre-training :
2.2 Anisotropic Embedding Space Induces Poor Semantic Similarity:
3. simcse : chap5 : Connection to Anisotropy
4. Alignment and Uniformity
Related Papers:
Problem Introduction:
why do the BERT-induced sentence embeddings perform poorly to retrieve semantically similar sentences?
That is: why do sentence embeddings produced from BERT perform so poorly on semantic-similarity tasks?
Reimers and Gurevych (2019) demonstrate that such BERT sentence embeddings lag behind the state-of-the-art sentence embeddings in terms of semantic similarity. On the STS-B dataset, BERT sentence embeddings are even less competitive to averaged GloVe (Pennington et al., 2014) embeddings, which is a simple and non-contextualized baseline proposed several years ago.
Explanations from Related Papers:
1. REPRESENTATION DEGENERATION PROBLEM IN TRAINING NATURAL LANGUAGE GENERATION MODELS
This paper introduced the representation degeneration problem (anisotropy):
We observe that when training a model for natural language generation tasks through likelihood maximization with the weight tying trick, especially with big training datasets, most of the learnt word embeddings tend to degenerate and be distributed into a narrow cone, which largely limits the representation power of word embeddings.
......
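As a rough illustration of the "narrow cone" claim, anisotropy can be estimated as the average cosine similarity between randomly sampled embedding vectors: near 0 for an isotropic space, clearly positive for a cone. A minimal sketch, not from the paper; the pair count and the use of `bert-base-uncased`'s input embedding matrix in the usage comment are illustrative assumptions.

```python
import numpy as np

def avg_pairwise_cosine(emb: np.ndarray, n_pairs: int = 100_000, seed: int = 0) -> float:
    """Estimate anisotropy as the expected cosine similarity between two
    randomly sampled embedding vectors (~0 => isotropic, >>0 => narrow cone)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, emb.shape[0], n_pairs)
    j = rng.integers(0, emb.shape[0], n_pairs)
    a = emb[i] / np.linalg.norm(emb[i], axis=1, keepdims=True)
    b = emb[j] / np.linalg.norm(emb[j], axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))

# Example usage (hypothetical): apply it to BERT's word-piece embedding matrix.
# from transformers import AutoModel
# emb = AutoModel.from_pretrained("bert-base-uncased").get_input_embeddings().weight.detach().numpy()
# print(avg_pairwise_cosine(emb))
```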
2. bert-flow : chap2 : Understanding the Sentence Embedding Space of BERT
This chapter relates BERT-style pretraining objectives to semantic similarity, and analyzes why the resulting sentence embeddings perform poorly on semantic-similarity tasks.
2.1 The Connection between Semantic Similarity and BERT Pre-training :
- The similarity between BERT sentence embeddings can be reduced to the similarity between BERT context embeddings $\mathbf{h}_c^{\top}\mathbf{h}_{c'}$. However, as shown in Equation 1, the pretraining of BERT does not explicitly involve the computation of $\mathbf{h}_c^{\top}\mathbf{h}_{c'}$. Therefore, we can hardly derive a mathematical formulation of what $\mathbf{h}_c^{\top}\mathbf{h}_{c'}$ exactly represents. (Equation 1, the softmax over context–word dot products, is reconstructed after this list.)
- Co-Occurrence Statistics as the Proxy for Semantic Similarity: roughly speaking, it is semantically meaningful to compute the dot product between a context embedding and a word embedding.
- Higher-Order Co-Occurrence Statistics as Context-Context Semantic Similarity: During pretraining, the semantic relationship between two contexts c and c′ could be inferred and reinforced with their connections to words.
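The "Equation 1" referenced above is not reproduced in these notes. For readability, here is the standard softmax language-modeling formulation that BERT-style pretraining maximizes, with $\mathbf{h}_c$ the context embedding and $\mathbf{w}_x$ the word embedding; this is reconstructed from the surrounding discussion, so the exact notation may differ from the paper:

```latex
p(x \mid c) \;=\; \frac{\exp\!\big(\mathbf{h}_c^{\top}\mathbf{w}_x\big)}{\sum_{x' \in \mathcal{V}} \exp\!\big(\mathbf{h}_c^{\top}\mathbf{w}_{x'}\big)}
```

The objective only ever computes context–word dot products $\mathbf{h}_c^{\top}\mathbf{w}_x$, never context–context dot products $\mathbf{h}_c^{\top}\mathbf{h}_{c'}$, which is why the paper has to reason about context–context similarity indirectly via higher-order co-occurrence.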
2.2 Anisotropic Embedding Space Induces Poor Semantic Similarity:
- To investigate the underlying problem of the failure, we use word embeddings as a surrogate because words and contexts share the same embedding space. If the word embeddings exhibit some misleading properties, the context embeddings will also be problematic, and vice versa.
- Gao et al. (2019) and Wang et al. (2020) have pointed out that, for language modeling, the maximum likelihood training with Equation 1 usually produces an anisotropic word embedding space. "Anisotropic" means word embeddings occupy a narrow cone in the vector space.
- Observation 1: Word Frequency Biases the Embedding Space
- Observation 2: Low-Frequency Words Disperse Sparsely. We observe that, in the learned anisotropic embedding space, high-frequency words concentrate densely and low-frequency words disperse sparsely.
- Due to the sparsity, many "holes" could be formed around the low-frequency word embeddings in the embedding space, where the semantic meaning can be poorly defined. Note that BERT sentence embeddings are produced by averaging the context embeddings, which is a convexity-preserving operation. However, the holes violate the convexity of the embedding space. (A minimal mean-pooling sketch follows this list.)
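For reference, the "averaging the context embeddings" construction discussed above is just mean pooling of the token-level hidden states. A minimal sketch with Hugging Face `transformers`; the model name and the last-layer-only pooling are illustrative choices (BERT-flow itself averages the first and last layers):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pool the last-layer context embeddings, masking out padding."""
    batch = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (1, seq_len, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1)
    return (hidden * mask).sum(1) / mask.sum(1)     # (1, hidden)

a = sentence_embedding("A man is playing a guitar.")
b = sentence_embedding("Someone is playing an instrument.")
print(torch.nn.functional.cosine_similarity(a, b))
```

With unmodified BERT, such cosine similarities tend to be uniformly high regardless of semantic relatedness, which is exactly the symptom the anisotropy analysis above tries to explain.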
3. simcse : chap5 : Connection to Anisotropy
This chapter explains the connection between SimCSE and anisotropy, i.e. why SimCSE is effective:
we take a singular spectrum perspective—which is a common practice in analyzing word embeddings (Mu and Viswanath, 2018; Gao et al., 2019; Wang et al., 2020), and show that the contrastive objective can “flatten” the singular value distribution of sentence embeddings and make the representations more isotropic.
......
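To make the "singular spectrum perspective" concrete, the sketch below computes the singular values of a matrix of sentence embeddings; a spectrum dominated by the first few singular values indicates anisotropy, while a flatter spectrum indicates more isotropic representations. This is a generic illustration of that kind of analysis, not SimCSE's exact evaluation code, and the random matrix is a placeholder.

```python
import numpy as np

def singular_spectrum(embeddings: np.ndarray) -> np.ndarray:
    """Singular values of the (centered) sentence-embedding matrix,
    normalized to sum to 1 for easy comparison across models."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return s / s.sum()

# A flatter spectrum (mass spread over many directions) means the embeddings
# are more isotropic; a spiky one means they collapse into a few dominant
# directions (the "narrow cone"). Replace the placeholder with real embeddings.
spectrum = singular_spectrum(np.random.randn(1000, 768))
print(spectrum[:10])
```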
4. Alignment and Uniformity
This paper introduces alignment and uniformity as two properties for analyzing, evaluating, and training sentence embeddings.
......
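For reference, Wang and Isola (2020) define alignment as the expected distance between embeddings of positive pairs, and uniformity via a Gaussian-potential kernel over all pairs. A minimal PyTorch sketch of the two metrics on l2-normalized embeddings, following the formulas in the paper with the commonly used defaults alpha=2 and t=2; the random tensors are placeholders:

```python
import torch
import torch.nn.functional as F

def align_loss(x: torch.Tensor, y: torch.Tensor, alpha: float = 2) -> torch.Tensor:
    """Alignment: expected distance between embeddings of positive pairs (lower is better)."""
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x: torch.Tensor, t: float = 2) -> torch.Tensor:
    """Uniformity: log of the average pairwise Gaussian potential (lower is better)."""
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

# x and y must be l2-normalized; row i of y is the positive example for row i of x.
x = F.normalize(torch.randn(128, 768), dim=1)
y = F.normalize(torch.randn(128, 768), dim=1)
print(align_loss(x, y), uniform_loss(x))
```

SimCSE uses exactly these two quantities to argue that its contrastive objective improves uniformity while keeping alignment, which is another way of saying it counteracts anisotropy.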
Related Papers:
- Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tieyan Liu. 2019. Representation Degeneration Problem in Training Natural Language Generation Models. In International Conference on Learning Representations (ICLR).
- Improving Neural Language Generation with Spectrum Control. https://openreview.net/pdf?id=ByxY8CNtvr
- BERT-flow: On the Sentence Embeddings from Pre-trained Language Models.
- SimCSE: Simple Contrastive Learning of Sentence Embeddings.
- Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. http://proceedings.mlr.press/v119/wang20k/wang20k.pdf