A Plain-Language Explanation of Word Embedding

Published: 2024/1/1

**Word embedding is one of the most frequently used terms in NLP, and the idea behind it is actually quite simple.
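Word embedding means the following: given a document, which is just a sequence of words such as "A B A C B F G", we want to obtain a corresponding vector representation (usually a low-dimensional one) for each distinct word in the document.
For example, for the sequence "A B A C B F G" we might end up with: A maps to the vector [0.1 0.6 -0.5] and B maps to the vector [-0.2 0.9 0.7] (the numbers here are for illustration only).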
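The mapping described above can be sketched as a plain lookup table. This is only a minimal illustration: the helper names `build_vocab` and `init_embeddings` are hypothetical, and the vectors are random placeholders rather than trained values. In a real method, training would adjust them.

```python
import random

def build_vocab(tokens):
    """Assign each distinct token an integer index in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def init_embeddings(vocab, dim=3, seed=0):
    """Create one small random vector per word (placeholder values, not trained)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}

document = "A B A C B F G".split()
vocab = build_vocab(document)               # 5 distinct words: A, B, C, F, G
embeddings = init_embeddings(vocab, dim=3)  # one 3-dimensional vector per word

for word, vec in embeddings.items():
    print(word, [round(x, 2) for x in vec])
```

Note that the document has seven tokens but only five distinct words, so the table has five rows.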
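The reason for turning each word into a vector is to make computation convenient. For example, "finding synonyms of word A" can be done by "finding the vectors most similar to A's vector under cosine distance".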
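As a rough sketch of that lookup, the snippet below computes cosine similarity and returns the nearest neighbor of a word. The vectors for A and B are the illustrative ones from the text; the vector for C and the function name `most_similar` are assumptions added for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings):
    """Return (other_word, similarity) for the closest vector, excluding the word itself."""
    query = embeddings[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in embeddings.items() if other != word]
    return max(scored, key=lambda pair: pair[1])

toy_embeddings = {
    "A": [0.1, 0.6, -0.5],   # from the text (illustrative values)
    "B": [-0.2, 0.9, 0.7],   # from the text (illustrative values)
    "C": [0.2, 0.5, -0.4],   # hypothetical vector, chosen to point roughly the same way as A
}

print(most_similar("A", toy_embeddings))
```

With these toy values C is the nearest neighbor of A, because their vectors point in almost the same direction while B's third component has the opposite sign.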
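Word embedding is not a new topic; people were working on it long ago, for example in Bengio's paper "A Neural Probabilistic Language Model". Even that is not the earliest work: before it, Hinton had already proposed the concept of distributed representation in "Learning distributed representations of concepts" (although it was not applied to word embedding). When Hinton was asked at AAAI 2015 what he thought of Google's word2vec, he said he had already done this 20 years ago, haha; he was presumably referring to this paper.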
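In short, the common word embedding approach is to first construct a set of features for each word from the text, and then learn distributed representations over those features. Compared with traditional distributed representations, the only difference is the extra first step (constructing a set of features for each word from the document).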
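To make the first step concrete, here is one possible way to construct features for each word: counting which words appear next to it. The text does not say which features are used, so neighbor counts within a fixed window are an assumption, and `cooccurrence_features` is a hypothetical helper. The second step (learning a dense representation from these features) is not shown.

```python
from collections import Counter, defaultdict

def cooccurrence_features(tokens, window=1):
    """For each word, count the words seen within `window` positions of it."""
    features = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the word's own position
                features[word][tokens[j]] += 1
    return features

document = "A B A C B F G".split()
features = cooccurrence_features(document, window=1)

for word in sorted(features):
    print(word, dict(features[word]))
```

Each word ends up with a sparse count vector over its neighbors; an embedding method would then compress such features into a short dense vector.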
If word embedding is such an old topic, why did it become so popular? The reason is two papers that Tomas Mikolov published while at Google: "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".
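These two papers introduced a toolkit called word2vec, which contains several word embedding methods. These methods have two characteristics: they are fast, and the resulting embedding vectors have the analogy property. The analogy property is a structure of the form "A - B = C - D", for example "Beijing - China = Paris - France". Tomas Mikolov argues that embedding vectors with this property are of very good quality and are able to model semantics.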
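The analogy property can be checked by rearranging "A - B = C - D" into "C = A - B + D" and searching for the word whose vector is closest to the right-hand side. The sketch below does this with hand-picked two-dimensional vectors, so the relation holds exactly; these values, the extra word "Tokyo", and the function `solve_analogy` are all assumptions for illustration. With real trained embeddings the relation holds only approximately.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(a, b, d, embeddings):
    """Find the word c such that a - b = c - d, i.e. c is closest to a - b + d."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[d])]
    # Exclude the three query words so the answer is a genuinely new word.
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, d)}
    return max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))

# Hypothetical toy vectors: first axis ~ "which country", second axis ~ "is a capital".
toy_embeddings = {
    "Beijing": [1.0, 1.0],
    "China":   [1.0, 0.0],
    "Paris":   [-1.0, 1.0],
    "France":  [-1.0, 0.0],
    "Tokyo":   [0.2, 1.0],   # distractor
}

print(solve_analogy("Beijing", "China", "France", toy_embeddings))
```

Here "Beijing - China + France" equals [-1.0, 1.0], which is exactly the toy vector for Paris, so Paris is chosen over the distractor.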
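Both papers date from 2013, and as of this writing (August 2015) their citation counts had long since passed several hundred, which shows how influential they are. Of course, there are many other word embedding approaches. Common word embedding methods include: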

Distributed Representations of Words and Phrases and their Compositionality

Efficient Estimation of Word Representations in Vector Space

GloVe: Global Vectors for Word Representation

A Neural Probabilistic Language Model

Natural language processing (almost) from scratch

Learning word embeddings efficiently with noise contrastive estimation

A scalable hierarchical distributed language model

Three new graphical models for statistical language modelling

Improving word representations via global context and multiple word prototypes
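
Many aspects of the word2vec models were still unexplained as of this writing (August 2015), so quite a few papers have tried to explain some of these mysteries or to establish connections between word2vec and other models. Here is a list of such papers: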

Neural Word Embedding as Implicit Matrix Factorization

Linguistic Regularities in Sparse and Explicit Word Representations

Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings

word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method

Linking GloVe with word2vec

Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective**

Original link: https://blog.csdn.net/jdbc/article/details/49467239
