
8.2 English Word Frequency Statistics (project)

Published: 2024/5/15 · Programming Q&A · 36 豆豆


Level 1: Read the file

Level 2: Count the words

Level 3: Count how often each word occurs

Level 4: Count how often each non-special word occurs

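Level 1: Read the file

Task for this level: write a small program that reads a file.

Problem description

"Who Moved My Cheese?" is a fable by the American author Spencer Johnson, first published in 1998. It tells the story of four "characters", the two mice Sniff and Scurry and the two little people Hem and Haw, and their search for cheese.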

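import string


def read_file(file):
    """Take a file name, read the file into a string, keep only the English
    letters and Western symbols (dropping the Chinese text), convert all
    characters to lowercase, replace every punctuation mark and symbol with
    a space, and return the resulting string."""
    ########## Begin ##########
    with open(file, 'r', encoding='utf-8') as f:
        txt = f.read()
    # Keep only code points below 256, which removes the Chinese characters
    txt = ''.join(x for x in txt if ord(x) < 256).lower()
    # Replace every ASCII punctuation character with a space
    for i in string.punctuation:
        txt = txt.replace(i, ' ')
    return txt
    ########## End ##########


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # returns the cleaned text as a string
    n = int(input())
    print(content[:n])  # show the first n characters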
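The cleaning step can be tried without the novel file. The sketch below is one possible variant, not the exercise's reference solution: `clean_text` is a hypothetical helper name, and it swaps the repeated `replace` calls for a single `str.translate` pass.

```python
import string


def clean_text(raw):
    """Hypothetical helper: drop non-Latin-1 characters, lowercase,
    and turn ASCII punctuation into spaces."""
    # Keep only code points below 256 (this removes Chinese characters).
    latin_only = ''.join(ch for ch in raw if ord(ch) < 256)
    # Build a table mapping each punctuation character to a space.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    return latin_only.lower().translate(table)


sample = 'Hem and Haw, 哼哼 -- "Who Moved My Cheese?"'
print(clean_text(sample).split())
# ['hem', 'and', 'haw', 'who', 'moved', 'my', 'cheese']
```

Both approaches should give the same string; `translate` simply avoids rescanning the text once per punctuation character.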

Level 2: Count the words

Task for this level: write a small program that counts the number of words.

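import string


def count_of_words(txt):
    """Take a string with punctuation and symbols already removed, and return
    the total number of words and the number of distinct words."""
    ########## Begin ##########
    words = txt.split()
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    # Total words, then distinct words (one dictionary key per distinct word)
    return len(words), len(counts)
    ########## End ##########


def read_file(file):
    """Take a file name, read the file into a string, keep only the English
    letters and Western symbols (dropping the Chinese text), convert all
    characters to lowercase, replace every punctuation mark and symbol with
    a space, and return the resulting string."""
    with open(file, 'r', encoding='utf-8') as novel:
        txt = novel.read()
    english_only_txt = ''.join(x for x in txt if ord(x) < 256)
    english_only_txt = english_only_txt.lower()
    for character in string.punctuation:
        english_only_txt = english_only_txt.replace(character, ' ')
    return english_only_txt


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # returns the cleaned text as a string
    amount_results = count_of_words(content)
    # Output format required by the exercise:
    # "The text has {} words, of which {} are distinct"
    print('文章共有單詞{}個，其中不重復單詞{}個'.format(*amount_results))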
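Since this level needs only the two totals, a set is enough for the distinct count. The following is a minimal sketch of that alternative; `count_words_with_set` is a hypothetical name, not part of the exercise template.

```python
def count_words_with_set(txt):
    """Hypothetical variant: return (total words, distinct words)."""
    words = txt.split()            # split on any run of whitespace
    return len(words), len(set(words))


sample = 'the cheese moved and the mice moved'
print(count_words_with_set(sample))
# (7, 5)
```

The dictionary version is still the better starting point if the per-word counts are needed later, as they are in Level 3.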

Level 3: Count how often each word occurs

Expected output:

the 369

he 337

to 333

and 312

cheese 214

it 187

they 166

of 158

a 146

had 142

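import string


def word_frequency(txt):
    """Take a string with punctuation and symbols already removed, and count
    how often each word occurs. Returns a dictionary with the word as key
    and its number of occurrences as value."""
    ########## Begin ##########
    words = txt.split()
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
    ########## End ##########


def top_ten_words(frequency, cnt):
    """Take the word frequency dictionary and print the cnt most frequent
    words together with their counts."""
    ########## Begin ##########
    # Sort the (word, count) pairs by count, highest first
    ranked = sorted(frequency.items(), key=lambda x: x[1], reverse=True)
    for item in ranked[0:cnt]:
        print(*item)
    ########## End ##########


def read_file(file):
    """Take a file name, read the file into a string, keep only the English
    letters and Western symbols (dropping the Chinese text), convert all
    characters to lowercase, replace every punctuation mark and symbol with
    a space, and return the resulting string."""
    with open(file, 'r', encoding='utf-8') as novel:
        txt = novel.read()
    english_only_txt = ''.join(x for x in txt if ord(x) < 256)
    english_only_txt = english_only_txt.lower()
    for character in string.punctuation:
        english_only_txt = english_only_txt.replace(character, ' ')
    return english_only_txt


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # returns the cleaned text as a string
    frequency_result = word_frequency(content)  # count word frequencies
    n = int(input())
    top_ten_words(frequency_result, n)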
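The count-then-sort pattern is also available in the standard library as `collections.Counter`. The sketch below is an alternative to the hand-written dictionary, not the exercise's expected solution; `top_words` is a hypothetical name.

```python
from collections import Counter


def top_words(txt, cnt):
    """Hypothetical variant: return the cnt most frequent words as
    (word, count) pairs, most frequent first."""
    return Counter(txt.split()).most_common(cnt)


sample = 'he had cheese he had he'
for word, times in top_words(sample, 2):
    print(word, times)
# he 3
# had 2
```

Like `sorted(..., reverse=True)` on the dictionary items, `most_common` keeps words with equal counts in the order they were first seen.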

Level 4: Count how often each non-special word occurs

Test input: 8

Expected output:

cheese 214

haw 113

what 105

change 86

hem 83

new 70

said 60

maze 46

import string


def top_ten_words_no_excludes(frequency, cnt):
    """Take the word frequency dictionary, remove common articles, pronouns,
    linking verbs and conjunctions, then print the cnt most frequent
    remaining words together with their counts.
    The words to exclude are:
    excludes_words = ['a', 'an', 'the', 'i', 'he', 'she', 'his', 'my', 'we',
    'or', 'is', 'was', 'do', 'and', 'at', 'to', 'of', 'it', 'on', 'that', 'her',
    'c', 'in', 'you', 'had', 's', 'with', 'for', 't', 'but', 'as', 'not', 'they',
    'be', 'were', 'so', 'our', 'all', 'would', 'if', 'him', 'from', 'no', 'me',
    'could', 'when', 'there', 'them', 'about', 'this', 'their', 'up', 'been',
    'by', 'out', 'did', 'have']"""
    ########## Begin ##########
    excludes_words = ['a', 'an', 'the', 'i', 'he', 'she', 'his', 'my', 'we',
                      'or', 'is', 'was', 'do', 'and', 'at', 'to', 'of', 'it', 'on',
                      'that', 'her', 'c', 'in', 'you', 'had', 's', 'with', 'for',
                      't', 'but', 'as', 'not', 'they', 'be', 'were', 'so', 'our',
                      'all', 'would', 'if', 'him', 'from', 'no', 'me', 'could',
                      'when', 'there', 'them', 'about', 'this', 'their', 'up',
                      'been', 'by', 'out', 'did', 'have']
    for word in excludes_words:
        # The default of None avoids a KeyError when a word is not in the text
        frequency.pop(word, None)
    ranked = sorted(frequency.items(), key=lambda x: x[1], reverse=True)
    for item in ranked[0:cnt]:
        print(*item)
    ########## End ##########


def read_file(file):
    """Take a file name, read the file into a string, keep only the English
    letters and Western symbols (dropping the Chinese text), convert all
    characters to lowercase, replace every punctuation mark and symbol with
    a space, and return the resulting string."""
    with open(file, 'r', encoding='utf-8') as novel:
        txt = novel.read()
    english_only_txt = ''.join(x for x in txt if ord(x) < 256)
    english_only_txt = english_only_txt.lower()
    for character in string.punctuation:
        english_only_txt = english_only_txt.replace(character, ' ')
    return english_only_txt


def word_frequency(txt):
    """Take a string with punctuation and symbols already removed, and count
    how often each word occurs. Returns a dictionary with the word as key
    and its number of occurrences as value."""
    frequency = dict()
    words_list = txt.split()
    for word in words_list:
        frequency[word] = frequency.get(word, 0) + 1
    return frequency


if __name__ == '__main__':
    filename = 'Who Moved My Cheese.txt'  # file name
    content = read_file(filename)  # returns the cleaned text as a string
    frequency_result = word_frequency(content)  # count word frequencies
    n = int(input())
    top_ten_words_no_excludes(frequency_result, n)
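Removing entries with `pop` changes the dictionary that the caller passed in. If the full counts are still needed afterwards, filtering into a new dictionary is one option. This is a sketch under that assumption; `top_words_excluding` and its parameters are hypothetical names, and the short exclusion list in the example is for illustration only.

```python
def top_words_excluding(frequency, excludes, cnt):
    """Hypothetical variant: return the cnt most frequent words that are
    not in excludes, leaving the frequency dictionary untouched."""
    excluded = set(excludes)  # set membership tests are fast
    kept = {w: c for w, c in frequency.items() if w not in excluded}
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    return ranked[:cnt]


freq = {'the': 5, 'cheese': 4, 'he': 3, 'haw': 2, 'maze': 1}
print(top_words_excluding(freq, ['the', 'he', 'she'], 2))
# [('cheese', 4), ('haw', 2)]
print(len(freq))
# 5
```

Words in the exclusion list that never occur in the text, such as 'she' here, are simply ignored.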
