MIT NLP Lecture 3: Probabilistic Language Models (Parts 1, 2, and 3)
MIT NLP Lecture 3: Probabilistic Language Models (Part 1)
Natural Language Processing: Probabilistic Language Modeling
Author: Regina Barzilay (MIT, EECS Department, November 15, 2004)
Translator: 52nlp (www.52nlp.cn, January 16, 2009)
Last time:
Corpora processing
Zipf's law
Data sparseness
Today:
Probabilistic language modeling
I. Introduction
a) Predicting string probabilities
i. Which string is more likely? (Which string is more grammatical?)
1. Grill doctoral candidates.
2. Grill doctoral updates.
(example from Lee 1997)
ii. Methods for assigning probabilities to strings are called language models.
b) Motivation
i. Speech recognition, spelling correction, optical character recognition, and other applications
ii. Let E be the physical evidence; we need to determine whether the string W is the message encoded by E
iii. Use Bayes' rule:
P(W|E) = \frac{P_{LM}(W)\,P(E|W)}{P(E)}
where P_{LM}(W) is the language model probability
iv. P_{LM}(W) provides the information necessary for disambiguation, especially when the physical evidence alone is not sufficient (a code sketch follows below)
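Note: a minimal Python sketch of how this decomposition is used in practice. Since P(E) is the same for every candidate W, candidates can be ranked by log P_LM(W) + log P(E|W); the two scoring functions here are hypothetical stand-ins for a real language model and channel model.

def rank_candidates(candidates, lm_logprob, channel_logprob):
    """Rank candidate strings W for fixed evidence E via Bayes' rule.

    P(W|E) is proportional to P_LM(W) * P(E|W); the denominator P(E)
    is constant across candidates, so it can be dropped when ranking.
    lm_logprob(w)      -- assumed to return log P_LM(w)
    channel_logprob(w) -- assumed to return log P(E|w)
    """
    scored = [(lm_logprob(w) + channel_logprob(w), w) for w in candidates]
    return [w for score, w in sorted(scored, reverse=True)]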
c) How to compute it?
i. Naive approach:
1. Use the maximum likelihood estimate (MLE) -- the number of times the string occurs in the corpus S, normalized by the corpus size:
P_{MLE}(\text{Grill doctorate candidates}) = \frac{count(\text{Grill doctorate candidates})}{|S|}
2. For unseen events, P_{MLE} = 0
-- dreadful behavior in the presence of data sparseness (a code sketch follows below)
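Note: a minimal Python sketch of this MLE over a tiny made-up corpus; it also shows the dreadful zero probability assigned to any unseen string.

from collections import Counter

def train_mle(corpus_sentences):
    """Count occurrences of each whole string in the corpus S."""
    return Counter(corpus_sentences), len(corpus_sentences)

def p_mle(string, counts, size):
    """P_MLE(string) = count(string) / |S|; 0 for unseen strings."""
    return counts[string] / size

counts, size = train_mle([
    "grill doctorate candidates",
    "grill doctorate candidates",
    "ask professors",
])
print(p_mle("grill doctorate candidates", counts, size))  # 2/3
print(p_mle("grill doctorate updates", counts, size))     # 0.0 -- unseen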
d) Two famous sentences
i. “It is fair to assume that neither sentence
“Colorless green ideas sleep furiously”
nor
“Furiously sleep ideas green colorless”
… has ever occurred … Hence, in any statistical model … these sentences will be ruled out on identical grounds as equally “remote” from English. Yet (1), though nonsensical, is grammatical, while (2) is not.” [Chomsky 1957]
ii. Translator's note: this is from page 9 of Chomsky's Syntactic Structures. Neither of the following two sentences has ever occurred in an English discourse, and statistically both are equally "remote" from English, yet only sentence (1) is grammatical:
1) Colorless green ideas sleep furiously.
2) Furiously sleep ideas green colorless.
Whether these sentences have "never occurred in an English discourse" and are "statistically equally remote from English" depends on one's point of view: setting the particular words aside and looking at word-class patterns instead, sentence (1) presumably has a higher statistical frequency than sentence (2), and its pattern does occur in English.
Appendix: the MIT course page with downloadable lecture slides (PDF):
http://people.csail.mit.edu/regina/6881/
Note: this translation is published in accordance with the MIT OpenCourseWare Creative Commons license. When reposting, please credit the source "52nlp": www.52nlp.cn
from:http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-first-part/
MIT NLP Lecture 3: Probabilistic Language Models (Part 2)
II. Building a Language Model
a) The language modeling problem
i. Start with some vocabulary:
V = {the, a, doctorate, candidate, Professors, grill, cook, ask, …}
ii. Get a training sample of V:
Grill doctorate candidate.
Cook Professors.
Ask Professors.
…
iii. Assumption: the training sample is drawn from some underlying distribution P
iv. Goal: learn a probability distribution P' "as close" to P as possible
\sum_{x \in V} P'(x) = 1, \quad P'(x) \ge 0
P'(\text{candidates}) = 10^{-5}
P'(\text{ask candidates}) = 10^{-8}
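Note: a minimal Python sketch of one such P' -- the relative-frequency estimate over single words -- which satisfies both constraints above (the toy sample is made up):

from collections import Counter

def estimate_p_prime(training_words):
    """Relative-frequency estimate: P'(x) = count(x) / total.
    By construction sum_x P'(x) = 1 and P'(x) >= 0."""
    counts = Counter(training_words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

sample = "grill doctorate candidate cook professors ask professors".split()
p_prime = estimate_p_prime(sample)
assert abs(sum(p_prime.values()) - 1.0) < 1e-9
print(p_prime["professors"])  # 2/7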
b) Deriving a language model
i. Assign a probability to a word sequence w_1 w_2 … w_n
ii. Apply the chain rule:
1. P(w_1 w_2 … w_n) = P(w_1|S) · P(w_2|S, w_1) · P(w_3|S, w_1, w_2) · … · P(E|S, w_1, w_2, …, w_n), where S and E mark the sentence start and end
2. History-based model: we predict following things from past things
3. How much context do we need to take into account?
c) Markov assumption
i. For arbitrarily long contexts, P(w_i|w_1 … w_{i-1}) is difficult to estimate
ii. Markov assumption: w_i depends only on the n preceding words
iii. Trigrams (a second-order Markov model; a code sketch follows below):
1. P(w_i|START, w_1, w_2, …, w_{i-1}) = P(w_i|w_{i-2}, w_{i-1})
2. P(w_1 w_2 … w_n) = P(w_1|S) · P(w_2|S, w_1) · P(w_3|w_1, w_2) · … · P(E|w_{n-1}, w_n)
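Note: a minimal Python sketch of this trigram factorization; trigram_prob is a hypothetical function assumed to return P(w | u, v), and the tokens <s>/</s> stand in for START (S) and END (E).

import math

def sentence_logprob(words, trigram_prob):
    """log P(w_1 .. w_n) under the trigram (second-order Markov) model:
    each word is conditioned only on the two preceding tokens."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    logp = 0.0
    for i in range(2, len(padded)):
        # P(w_i | w_{i-2}, w_{i-1})
        logp += math.log(trigram_prob(padded[i - 2], padded[i - 1], padded[i]))
    return logp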
d) A computational model of language
i. A useful conceptual and practical device: coin-flipping models
1. A sentence is generated by a randomized algorithm
-- The generator can be in one of several "states"
-- Flip coins to choose the next state
-- Flip other coins to decide which letter or word to output
ii. Shannon: "The states will correspond to the 'residue of influence' from preceding letters"
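Note: a minimal Python sketch of such a coin-flipping generator. Here the state is simply the previous word (a bigram model), and the toy conditional distributions are made up:

import random

def generate(next_word_dist, max_len=20):
    """Generate a sentence by repeated 'coin flips':
    sample the next state from the current state's distribution."""
    state, out = "<s>", []
    for _ in range(max_len):
        words, probs = zip(*next_word_dist[state].items())
        state = random.choices(words, weights=probs)[0]
        if state == "</s>":
            break
        out.append(state)
    return " ".join(out)

toy_model = {
    "<s>": {"grill": 0.5, "ask": 0.5},
    "grill": {"candidates": 1.0},
    "ask": {"professors": 1.0},
    "candidates": {"</s>": 1.0},
    "professors": {"</s>": 1.0},
}
print(generate(toy_model))  # e.g. "ask professors"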
e) Word-based approximations
Note: the following sentences were generated at random from models trained on Shakespeare; cf. Jurafsky & Martin, Speech and Language Processing.
i. Unigram approximation (the MIT slides are in error here; this is not a first-order approximation)
1. To him swallowed confess hear both. which. OF save
2. on trail for are ay device and rote life have
3. Every enter now severally so, let
4. Hill he late speaks; or! a more to leg less first you
5. enter
ii. Trigram approximation (again, the slides are in error; this is not a third-order approximation)
1. King Henry. What! I will go seek the traitor Gloucester.
2. Exeunt some of the watch. A great banquet serv'd in;
3. Will you tell me how I am?
4. It cannot be but so.
from:http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-second-part/
MIT NLP Lecture 3: Probabilistic Language Models (Part 3)
III. Evaluating a Language Model
a) Evaluating a language model
i. We have n test strings:
S_1, S_2, …, S_n
ii. Consider the probability of these strings under our model:
\prod_{i=1}^{n} P(S_i)
or the log probability:
\log \prod_{i=1}^{n} P(S_i) = \sum_{i=1}^{n} \log P(S_i)
iii. Perplexity:
\text{Perplexity} = 2^{-x}
where x = \frac{1}{W} \sum_{i=1}^{n} \log_2 P(S_i)
and W is the total number of words in the test data.
iv. Perplexity is a measure of the effective "branching factor"
1. Suppose we have a vocabulary V of size N, and the model predicts
P(w) = 1/N for all words in V.
v. What is the perplexity then?
\text{Perplexity} = 2^{-x}
where x = \log_2 \frac{1}{N}
so \text{Perplexity} = N
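Note: a minimal Python sketch of the perplexity computation, including a sanity check that a uniform model over N words gives perplexity N (the test set here is made up):

import math

def perplexity(sentence_logprobs2, total_words):
    """Perplexity = 2^{-x}, where x = (1/W) * sum_i log2 P(S_i).
    sentence_logprobs2 -- log-base-2 probabilities of the test strings
    total_words        -- W, the total word count of the test data"""
    x = sum(sentence_logprobs2) / total_words
    return 2 ** (-x)

# Uniform model, vocabulary size N = 1000;
# 5 test strings of 10 words each, so W = 50.
N, W = 1000, 50
logprobs = [10 * math.log2(1.0 / N)] * 5
print(perplexity(logprobs, W))  # ≈ 1000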
vi. Estimate of human performance (Shannon, 1951)
1. Shannon game -- humans guess the next letter in a text
2. PP = 142 (1.3 bits/letter), uncased, open vocabulary
vii. Estimate of a trigram language model (Brown et al., 1992)
PP = 790 (1.75 bits/letter), cased, open vocabulary
To be continued: Part 4
from:http://www.52nlp.cn/mit-nlp-third-lesson-probabilistic-language-modeling-third-part/