Natural Language 19.1: Lemmatizing with NLTK (reducing word variants to their base form)

Published: 2023/12/13



Lemmatizing with NLTK

# -*- coding: utf-8 -*-
"""
Spyder Editor

author 231469242@qq.com
WeChat public account: pythonEducation
"""

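import nltk
from nltk.stem import WordNetLemmatizer

# The lemmatizer needs the WordNet corpus; if it is missing, run nltk.download("wordnet") once.
lemmatizer = WordNetLemmatizer()
# Without a second argument, the word is lemmatized as a noun.
# "pythonly" comes back unchanged: it cannot be reduced, which shows that accuracy still falls short of 100%.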
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("pythonly"))
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run", 'v'))
'''
cat
cactus
goose
rock
pythonly
good
best
run
run

'''

A very similar operation to stemming is called lemmatizing. The
major difference between the two is that, as you saw earlier, stemming
can often create non-existent words, whereas lemmas are actual words.

So the stem you end up with is not necessarily something you can look
up in a dictionary, but a lemma is.

Sometimes you will wind up with a very similar word, and sometimes
with a completely different one. Let's see some examples.

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("python"))
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run", 'v'))

Here we've got the lemmas for a handful of words. The main thing to note is that lemmatize takes a part-of-speech parameter, pos. If it is not supplied, the default is "n" (noun), so the word is looked up as a noun. That can cause trouble when the word is really a verb or an adjective: "better" is reduced to "good" here only because pos="a" is passed. Keep this in mind if you use lemmatizing!
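Since the result depends on pos, one way to avoid the noun default is to derive pos from a part-of-speech tag. The following is a minimal sketch, not part of the original tutorial: it assumes the words are already tagged with Penn Treebank tags (for example by a tagger), and the helper names treebank_to_wordnet_pos and lemmatize_tagged are made up for illustration.

```python
# Assumption: tags follow the Penn Treebank convention (e.g. "NNS", "VBG", "JJR").
# Both helper names are hypothetical; only WordNetLemmatizer.lemmatize is NLTK's API.


def treebank_to_wordnet_pos(tag):
    """Return the one-letter pos value that lemmatize() accepts for a Treebank tag."""
    if tag.startswith("JJ"):
        return "a"  # adjective
    if tag.startswith("VB"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"  # everything else falls back to noun, the default lemmatize() uses


def lemmatize_tagged(tagged_words):
    """Lemmatize (word, Treebank tag) pairs; needs NLTK and the WordNet corpus."""
    # Imported lazily so the tag mapping above can be used without NLTK installed.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(word, pos=treebank_to_wordnet_pos(tag))
        for word, tag in tagged_words
    ]
```

Calling lemmatize_tagged([("better", "JJR"), ("run", "VB")]) passes pos="a" and pos="v" respectively, matching the hand-written calls above. Falling back to noun for unknown tags is a design choice that mirrors the lemmatizer's own default rather than something the tutorial prescribes.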

In the next tutorial, we're going to dive into the NLTK corpus that comes with the module, looking at all of the awesome documents waiting for us there.
