Natural Language Processing: Learning NLTK (Part 1)

Published: 2024/1/23


Installing NLTK

pip install nltk

Start Python and download the book data:

[root@centos #] python
Python 2.7.11 (default, Jan 22 2016, 08:29:18)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()

In the downloader window, select book and click Download to start the download.

Once the download has finished, enter:

>>> from nltk.book import *

You should see the books load successfully, as follows:

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

Each text* name here refers to one loaded book; typing text1 on its own prints the book's title:

>>> text1
<Text: Moby Dick by Herman Melville 1851>

Searching Text

Run

>>> text1.concordance("former")

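This prints the occurrences of "former" together with their surrounding context: 20 matches here (concordance shows at most 25 lines by default).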
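
To make the idea of a concordance concrete, here is a minimal plain-Python sketch of a keyword-in-context search on a toy token list. The function name toy_concordance, the sample sentence, and the choice to match case-insensitively are all assumptions made for illustration; this is not NLTK's implementation.

```python
def toy_concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Mark the matched token with brackets so it stands out.
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical toy text, not taken from Moby Dick.
toy_tokens = "the former captain met the former mate on the old ship".split()
for line in toy_concordance(toy_tokens, "former"):
    print(line)
```

Each printed line centres on one match, which is roughly what text1.concordance("former") shows for the real book.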

We can also look up related words, for example:

>>> text1.similar("ship")
whale boat sea captain world way head time crew man other pequod line deck body fishery air boats side voyage

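Given ship, it returns words such as boat. These are words that appear in similar contexts, so they are often, but not always, near-synonyms.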
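
The notion of "appearing in similar contexts" can be sketched in plain Python: record the (previous word, next word) pairs around every token and rank other words by how many such pairs they share with the query. The function toy_similar and the sample sentence are hypothetical, and NLTK's own scoring may differ; treat this as a rough illustration only.

```python
from collections import Counter, defaultdict


def toy_similar(tokens, word):
    """Rank other words by how many (previous, next) contexts they share with `word`."""
    contexts = defaultdict(set)
    for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[mid].add((prev, nxt))

    # Look the query up once, without adding a key to the dict while iterating.
    target = contexts.get(word, set())
    shared = Counter()
    for other, ctxs in contexts.items():
        overlap = len(ctxs & target)
        if other != word and overlap:
            shared[other] = overlap
    return [w for w, _ in shared.most_common()]


# Hypothetical toy text: "ship" and "boat" both occur between "the" and "sailed".
toy_tokens = ("the ship sailed away and the boat sailed home "
              "while the whale swam away").split()
print(toy_similar(toy_tokens, "ship"))
```

Here boat is returned because it shares the context "the _ sailed" with ship, while whale is not, even though all three are nouns in the sentence.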

We can also see where particular words occur across the text:

>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
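
A dispersion plot draws one mark per occurrence of each word at its position in the text, and displaying it requires matplotlib. As a hedged sketch of the underlying data, the snippet below collects those positions for a toy token list; word_offsets and the sample sentence are names and data made up for this example.

```python
def word_offsets(tokens, word):
    """Return the token positions (0-based) at which `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


# Hypothetical toy text standing in for the Inaugural Address Corpus.
toy_tokens = "freedom and duties come with freedom for all citizens".split()
for w in ["citizens", "freedom", "duties"]:
    print(w, word_offsets(toy_tokens, w))
```

Plotting these offsets on a horizontal axis, one row per word, gives the same kind of picture as the dispersion plot.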

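Word Statistics

len(text1): returns the total number of tokens in the text (words and punctuation marks)

set(text1): returns the set of distinct tokens in the text

len(set(text4)): returns the number of distinct tokens in the text, i.e. its vocabulary size

text4.count("is"): returns the number of times "is" occurs in the text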
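
A Text object behaves much like a list of tokens for these calls, so the same counts can be tried on an ordinary Python list. The sentence below is a made-up example, not one of the NLTK texts.

```python
# Hypothetical toy token list; punctuation marks count as tokens too.
toy_tokens = "it is what it is , and it is fine .".split()

total_tokens = len(toy_tokens)        # every token, repeats included
vocabulary = set(toy_tokens)          # distinct tokens only
vocabulary_size = len(vocabulary)
is_count = toy_tokens.count("is")     # occurrences of one specific token

print(total_tokens, vocabulary_size, is_count)
```

The gap between the total and the vocabulary size shows how much of a text is repetition.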

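FreqDist(text1): counts how often each token occurs; the result is a dictionary-like object that maps each token to its count, and its most_common() method lists the tokens from most to least frequent

fdist1 = FreqDist(text1); fdist1.plot(50, cumulative=True): computes the frequency distribution and plots a cumulative frequency chart for the 50 most frequent tokens

The vertical axis shows the running total of occurrences as each word on the horizontal axis is added. Read this way, the 50 most frequent tokens together account for nearly half of all the tokens in the book.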
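
The cumulative curve can be reproduced by hand. The sketch below uses collections.Counter from the standard library as a stand-in for FreqDist (in recent NLTK versions FreqDist builds on Counter) and a made-up token list; it computes the running totals that the cumulative plot would draw.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical toy token list.
toy_tokens = "the cat and the dog and the bird".split()

freq = Counter(toy_tokens)
ranked = freq.most_common()                       # (token, count), most frequent first

# Running total of occurrences, i.e. the y-values of a cumulative plot.
running = list(accumulate(count for _, count in ranked))
# The same totals as a share of all tokens.
share = [total / len(toy_tokens) for total in running]

for (token, _), total, frac in zip(ranked, running, share):
    print(f"{token:5s} {total:2d} {frac:.3f}")
```

The last running total always equals len(toy_tokens), and the early rows show how quickly a few frequent words cover most of the text.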

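fdist1.hapaxes(): returns the words that occur only once

text4.collocations(): prints the frequent collocations, i.e. word pairs (bigrams) that occur together unusually often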
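
Both ideas can be approximated in a few lines of plain Python. The sketch below finds the hapaxes of a toy token list and counts adjacent word pairs; note that it ranks pairs by raw frequency only, which is a simplification assumed for this example, whereas NLTK scores collocations statistically and filters out very common words.

```python
from collections import Counter

# Hypothetical toy token list.
toy_tokens = "united states and united states or united kingdom".split()

# Hapaxes: tokens that occur exactly once.
freq = Counter(toy_tokens)
hapaxes = [word for word, count in freq.items() if count == 1]

# Bigrams: each token paired with the token that follows it.
bigrams = Counter(zip(toy_tokens, toy_tokens[1:]))
frequent_pairs = [pair for pair, count in bigrams.most_common() if count >= 2]

print(hapaxes)
print(frequent_pairs)
```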
