自然语言处理之词袋模型Bag_of_words
生活随笔
收集整理的這篇文章主要介紹了
自然语言处理之词袋模型Bag_of_words
小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.
文章目錄
- 讀取訓(xùn)練數(shù)據(jù)
- BeautifulSoup處理
- 獲取詞袋和向量
- 預(yù)測結(jié)果
- 使用隨機(jī)森林分類器進(jìn)行分類
- 輸出提交結(jié)果
- 嘗試使用xgb
- 還是隨機(jī)森林好用
教程地址:
https://www.kaggle.com/c/word2vec-nlp-tutorial/overview/part-1-for-beginners-bag-of-words
讀取訓(xùn)練數(shù)據(jù)
訓(xùn)練數(shù)據(jù)的內(nèi)容是2500條電影評論。
import pandas as pd train = pd.read_csv("./data/labeledTrainData.tsv", header=0, delimiter="\t", quoting=3) train.head(3)| "5814_8" | 1 | "With all this stuff going down at the moment ... |
| "2381_9" | 1 | "\"The Classic War of the Worlds\" by Timothy ... |
| "7759_3" | 0 | "The film starts with a manager (Nicholas Bell... |
train當(dāng)中的review項里面包含的數(shù)據(jù)是HTML類型,為了去除HTML標(biāo)簽,保存純粹的評論,使用BeautifulSoup。
BeautifulSoup處理
from bs4 import BeautifulSoup # 創(chuàng)建 beautifulsoup 對象 soup = BeautifulSoup(example) #格式化輸出內(nèi)容 print(soup.prettify()) <html><body><p>"With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br/><br/>Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br/><br/>The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br/><br/>Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br/><br/>Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."</p></body> </html> # 查找各個標(biāo)簽 print(soup.title) print(soup.head) print(soup.a) print(soup.p) None None None <p>"With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br/><br/>Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br/><br/>The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br/><br/>Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br/><br/>Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."</p> # 遍歷孩子 for child in soup.body.children:print (child) <p>"With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.<br/><br/>Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.<br/><br/>The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.<br/><br/>Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.<br/><br/>Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter."</p> # find_all是一個很神奇的函數(shù),可以傳入字符、列表、正則表達(dá)式、函數(shù)等等等。 print(soup.find_all('br')) [<br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>, <br/>] # 可以在soup.select里面直接使用css代碼 print(soup.select('.p')) [] # 獲取text print(soup.get_text()) "With all this stuff going down at the moment with MJ i've started listening to his music, watching the odd documentary here and there, watched The Wiz and watched Moonwalker again. Maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent. Moonwalker is part biography, part feature film which i remember going to see at the cinema when it was originally released. Some of it has subtle messages about MJ's feeling towards the press and also the obvious message of drugs are bad m'kay.Visually impressive but of course this is all about Michael Jackson so unless you remotely like MJ in anyway then you are going to hate this and find it boring. Some may call MJ an egotist for consenting to the making of this movie BUT MJ and most of his fans would say that he made it for the fans which if true is really nice of him.The actual feature film bit when it finally starts is only on for 20 minutes or so excluding the Smooth Criminal sequence and Joe Pesci is convincing as a psychopathic all powerful drug lord. Why he wants MJ dead so bad is beyond me. Because MJ overheard his plans? Nah, Joe Pesci's character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno, maybe he just hates MJ's music.Lots of cool things in this like MJ turning into a car and a robot and the whole Speed Demon sequence. Also, the director must have had the patience of a saint when it came to filming the kiddy Bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene.Bottom line, this movie is for people who like MJ on one level or another (which i think is most people). If not, then stay away. It does try and give off a wholesome message and ironically MJ's bestest buddy in this movie is a girl! Michael Jackson is truly one of the most talented people ever to grace this planet but is he guilty? Well, with all the attention i've gave this subject....hmmm well i don't know because people can be different behind closed doors, i know this for a fact. He is either an extremely nice but stupid guy or one of the most sickest liars. I hope he is not the latter." import nltk import re from nltk.corpus import stopwords from nltk.stem.lancaster import LancasterStemmer lancaster_stemmer = LancasterStemmer() print (stopwords.words("english")) ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]獲取詞袋和向量
# 寫一個處理函數(shù) def review_to_words( raw_review ):review_text = BeautifulSoup(raw_review).get_text()# 去除標(biāo)點和數(shù)字,僅保留強(qiáng)烈語氣詞letters_only = re.sub("[^a-zA-Z?!]", " ", review_text) # 統(tǒng)一轉(zhuǎn)換為小寫字母words = letters_only.lower().split() # 由于set的搜索速度更快,所以把list轉(zhuǎn)換成setstops = set(stopwords.words("english")) # 移除停止詞,并且將詞轉(zhuǎn)為原形形式meaningful_words = [lancaster_stemmer.stem(w) for w in words if not w in stops] # 返回標(biāo)準(zhǔn)語句return( " ".join( meaningful_words )) # 獲取字符串列表 num_reviews = train["review"].size clean_train_reviews = [] for i in range(num_reviews):clean_train_reviews.append( review_to_words( train["review"][i] ) ) uk edit show rath less extrav us vert person concern get new kitch perhap bedroom bathroom wond grat got us vert show everyth real tv instead mak improv hous occup could afford entir hous get rebuilt know show try show lousy welf system ex us beg hard enough receiv rath vulg produc plac tak plac particul sear also uncal rsther turn on famy depr are pot millionair would far bet help commun whol instead spend hundr thousand doll on hom build someth whol commun perhap plac diy pow tool borrow return along build mat everyon benefit want giv on person caus enorm res among rest loc commun stil liv run hous在進(jìn)行下一步之前,有必要介紹一下詞袋模型,要將幾個句子轉(zhuǎn)化成向量,第一步是把它們包含的所有詞不重復(fù)地裝到一個袋子里,然后這幾個句子就可以轉(zhuǎn)換成和袋子里的詞的數(shù)量一樣長的向量,這個向量的每一個位置都對應(yīng)著袋子里面某一個詞在句子中出現(xiàn)的次數(shù),如果沒有出現(xiàn)就是0.
from sklearn.feature_extraction.text import CountVectorizer vectorizer = CountVectorizer(analyzer = "word", tokenizer = None, preprocessor = None, stop_words = None, max_features = 5000) train_data_features = vectorizer.fit_transform(clean_train_reviews) train_data_features = train_data_features.toarray() print(train_data_features.shape) (25000, 5000)查看詞袋里面裝的具體內(nèi)容
vocab = vectorizer.get_feature_names() print(len(vocab)) print(vocab) 5000 ['abandon', 'abbot', 'abc', 'abduc', 'abl', 'abomin', 'aborigin', 'abort', 'abound', 'about', 'abraham', 'abrupt', 'abs', 'absolv', 'absorb', 'absurd', 'abud', 'abund', 'abus', 'abysm', 'ac', 'academy', 'acc', 'acceiv', 'access', 'accid', 'acclaim', 'accompany', 'accompl', 'accord', 'account', 'accus', 'ach', 'achiev', 'acid', 'acknowledg', 'acquaint', 'acquir', 'across', 'act', 'actress', 'ad', 'adam', 'adapt', 'addict', 'addit', 'address', 'adel', 'adequ', 'adjust', 'admin', 'admir', 'admit', 'adolesc', 'adopt', 'adr', 'adult', 'adv', 'advers', 'advert', 'aesthet', 'af', 'affair', 'affect', 'affirm', 'affleck', 'afford', 'afr', 'afraid', 'afric', 'afterma', 'afternoon', 'afterward', 'ag', 'again', 'agend', 'aggress', 'ago', 'agon', 'agony', 'agr', 'agree', 'ah', 'ahead', 'aid', 'aim', 'aimless', 'air', 'airpl', 'airport', 'ak', 'akin', 'akshay', 'al', 'ala', 'alarm', 'alb', 'albeit', 'albert', 'alcohol', 'alec', 'alert', 'alex', 'alexand', 'alexandr', 'alfr', 'alic', 'alik', 'alison', 'all', 'alleg', 'alley', 'allow', 'allud', 'almost', 'alon', 'along', 'alongsid', 'alot', 'already', 'alright', 'also', 'alt', 'altern', 'although', 'altm', 'altogeth', 'alvin', 'alway', 'aly', 'am', 'amand', 'amaz', 'amazon', 'amb', 'ambigu', 'ambit', 'amby', 'americ', 'amidst', 'amitabh', 'among', 'amongst', 'amount', 'ampl', 'amrit', 'amus', 'amy', 'an', 'analys', 'anch', 'anct', 'and', 'anderson', 'andr', 'andre', 'andrew', 'andy', 'ang', 'angel', 'angl', 'angry', 'angst', 'angy', 'anil', 'anim', 'ann', 'annount', 'annoy', 'anny', 'anoth', 'answ', 'ant', 'antagon', 'antholog', 'anthony', 'anticip', 'anton', 'antonio', 'antonion', 'antwon', 'anxy', 'anybody', 'anyhow', 'anym', 'anyon', 'anyone', 'anyth', 'anytim', 'anyway', 'anywh', 'ap', 'apart', 'apocalypt', 'apolog', 'app', 'appal', 'appear', 'appl', 'applaud', 'apply', 'apprecy', 'approach', 'appropry', 'approv', 'approxim', 'april', 'apt', 'ar', 'arab', 'arc', 'arch', 'archaeolog', 'architect', 'are', 'area', 'argentin', 'argu', 'ariel', 'aristocr', 'arkin', 'arm', 'armstrong', 'army', 'arnold', 'around', 'arquet', 'arrang', 'arrest', 'arrog', 'arrow', 'art', 'arth', 'artic', 'artsy', 'artwork', 'arty', 'as', 'ash', 'asham', 'ashley', 'asid', 'ask', 'asleep', 'aspect', 'aspir', 'ass', 'assassin', 'assault', 'assembl', 'assert', 'asset', 'assign', 'assist', 'assocy', 'assort', 'assum', 'astair', 'aston', 'astound', 'astronaut', 'asyl', 'at', 'athlet', 'atl', 'atmosph', 'atroc', 'atrocy', 'attach', 'attack', 'attempt', 'attenborough', 'attend', 'attitud', 'attorney', 'attract', 'attribut', 'aud', 'audio', 'audit', 'audrey', 'audy', 'august', 'aunt', 'aur', 'aussy', 'aust', 'austin', 'austral', 'aut', 'auth', 'auto', 'autobiograph', 'autom', 'av', 'avail', 'aveng', 'avid', 'avoid', 'aw', 'await', 'awak', 'award', 'away', 'awesom', 'awhil', 'awkward', 'ax', 'aztec', 'bab', 'baby', 'babysit', 'bacal', 'bach', 'bachch', 'bachel', 'back', 'backdrop', 'background', 'backst', 'backward', 'bacon', 'bad', 'baddy', 'baffl', 'bag', 'bait', 'bak', 'baksh', 'bal', 'bald', 'baldwin', 'ballet', 'ban', 'band', 'bang', 'bank', 'bant', 'bar', 'barb', 'barbar', 'barbr', 'bargain', 'bark', 'barn', 'barney', 'baron', 'barrel', 'barry', 'barrym', 'bas', 'basebal', 'bash', 'basket', 'basketbal', 'bastard', 'bat', 'bath', 'bathroom', 'batm', 'battl', 'battlefield', 'bau', 'bay', 'bbc', 'be', 'beach', 'bean', 'bear', 'beard', 'beast', 'beat', 'beatl', 'beatty', 'beauty', 'beav', 'becam', 'beckham', 'beckins', 'becom', 'bed', 'bedroom', 'beer', 'beetl', 'befriend', 'beg', 'begin', 'begun', 'behav', 'behavio', 'behavy', 'behind', 'behold', 'being', 'bel', 'belg', 'believ', 'belong', 'belov', 'belt', 'belush', 'ben', 'bend', 'benea', 'benefit', 'bennet', 'bent', 'beowulf', 'bergm', 'berkeley', 'berlin', 'bernard', 'besid', 'best', 'bet', 'betray', 'better', 'betty', 'bev', 'bew', 'bewild', 'beyond', 'bias', 'bibl', 'big', 'biggest', 'bik', 'bikin', 'biko', 'bil', 'bimbo', 'bin', 'bind', 'bing', 'biograph', 'biop', 'bir', 'bird', 'birthday', 'bit', 'bitch', 'bittersweet', 'bizar', 'bla', 'black', 'blackmail', 'blad', 'blah', 'blair', 'blak', 'blam', 'bland', 'blank', 'blast', 'blat', 'blaz', 'bleak', 'blee', 'blend', 'bless', 'blew', 'blind', 'blink', 'bliss', 'blob', 'block', 'blockbust', 'blond', 'blood', 'bloody', 'bloom', 'blossom', 'blow', 'blown', 'blu', 'blunt', 'blur', 'bo', 'board', 'boast', 'boat', 'bob', 'bobby', 'body', 'bog', 'bogart', 'boggl', 'boil', 'bol', 'bold', 'bollywood', 'bomb', 'bon', 'bond', 'bonny', 'boo', 'boob', 'boog', 'book', 'boom', 'boost', 'boot', 'bor', 'bord', 'boredom', 'born', 'borrow', 'boss', 'boston', 'both', 'bottl', 'bottom', 'bought', 'bound', 'bount', 'bounty', 'bourn', 'bout', 'bow', 'bowl', 'box', 'boy', 'boyfriend', 'boyl', 'brad', 'brady', 'brain', 'brainless', 'branagh', 'branch', 'brand', 'brando', 'brat', 'brav', 'braveheart', 'bravo', 'brazil', 'brea', 'bread', 'break', 'breakdown', 'breakfast', 'breast', 'breath', 'breathtak', 'bree', 'brend', 'brent', 'bret', 'bri', 'brick', 'brid', 'bridg', 'bridget', 'brief', 'bright', 'bril', 'bring', 'brit', 'britain', 'bro', 'broad', 'broadcast', 'broadway', 'brok', 'bronson', 'bront', 'brood', 'brook', 'brooklyn', 'brosn', 'broth', 'brought', 'brow', 'brown', 'bruc', 'bruno', 'brush', 'brut', 'bry', 'bsg', 'btw', 'bubbl', 'buck', 'bucket', 'bud', 'buddy', 'budget', 'buff', 'buffalo', 'bug', 'build', 'built', 'bul', 'bulk', 'bullet', 'bum', 'bumbl', 'bump', 'bunch', 'bunny', 'bur', 'burd', 'burk', 'burn', 'burst', 'burt', 'burton', 'bury', 'bus', 'busey', 'bush', 'businessm', 'bust', 'busy', 'but', 'butch', 'butl', 'button', 'buy', 'buzz', 'bye', 'cab', 'cabin', 'cabl', 'caf', 'cag', 'cagney', 'cain', 'cak', 'cal', 'calc', 'calib', 'californ', 'calm', 'cam', 'cambod', 'camcord', 'cameo', 'camer', 'camera', 'cameram', 'cameron', 'camp', 'campaign', 'campbel', 'campy', 'can', 'canad', 'cancel', 'candid', 'candl', 'candy', 'cannib', 'cannon', 'cannot', 'cant', 'canyon', 'cap', 'capac', 'capit', 'capot', 'capt', 'captain', 'car', 'card', 'cardboard', 'carel', 'cares', 'caretak', 'carey', 'carl', 'carlito', 'carlo', 'carm', 'carn', 'carol', 'carolin', 'caron', 'carp', 'carradin', 'carrey', 'carry', 'cart', 'cartoon', 'cary', 'cas', 'casablanc', 'cash', 'casino', 'casp', 'cassavet', 'cassidy', 'cast', 'castl', 'cat', 'catalog', 'catastroph', 'catch', 'catchy', 'categ', 'catherin', 'cathol', 'cattl', 'caught', 'caus', 'caut', 'cav', 'cbs', 'cd', 'ceas', 'cecil', 'ceil', 'cel', 'celebr', 'celest', 'celluloid', 'cemetery', 'cens', 'cent', 'century', 'cerebr', 'ceremony', 'certain', 'cg', 'cgi', 'chain', 'chainsaw', 'chair', 'challeng', 'chamb', 'chamberlain', 'champ', 'chan', 'chang', 'channel', 'chant', 'chao', 'chaplin', 'chapt', 'char', 'charact', 'charg', 'charism', 'charl', 'charlot', 'charlton', 'charm', 'chas', 'chat', 'chavez', 'che', 'cheadl', 'cheap', 'cheaply', 'check', 'cheek', 'chees', 'cheesy', 'chem', 'cher', 'chess', 'chest', 'chew', 'chib', 'chicago', 'chick', 'chief', 'chil', 'child', 'childr', 'chin', 'chines', 'chip', 'cho', 'chocol', 'choir', 'chok', 'chong', 'choos', 'chop', 'choppy', 'chor', 'choreograph', 'chos', 'chris', 'christ', 'christian', 'christianity', 'christians', 'christie', 'christina', 'christine', 'christmas', 'christopher', 'christy', 'chronicles', 'chuck', 'chuckl', 'church', 'churn', 'cia', 'cigaret', 'cinderell', 'cindy', 'cinem', 'cinema', 'cinematograph', 'circ', 'circumst', 'cit', 'city', 'civil', 'cla', 'clad', 'claim', 'clair', 'clan', 'clar', 'clark', 'clash', 'class', 'classm', 'classy', 'claud', 'claustrophob', 'claw', 'clay', 'cle', 'clear', 'clerk', 'clev', 'cli', 'clich', 'click', 'cliff', 'cliffhang', 'clim', 'climact', 'climax', 'climb', 'clin', 'clint', 'clip', 'cliv', 'cloak', 'clock', 'clon', 'clooney', 'clos', 'closest', 'closet', 'closeup', 'cloth', 'cloud', 'clown', 'clu', 'club', 'clueless', 'clumsy', 'clunky', 'clut', 'co', 'coach', 'coast', 'coat', 'cod', 'cody', 'coff', 'coffin', 'coh', 'coher', 'coincid', 'cok', 'col', 'cold', 'colin', 'coll', 'collab', 'collaps', 'colleagu', 'collect', 'colleg', 'collet', 'collin', 'colm', 'colo', 'colon', 'colonel', 'colony', 'columb', 'columbo', 'com', 'comb', 'combin', 'comeback', 'comedy', 'comfort', 'command', 'commend', 'commerc', 'commit', 'common', 'commun', 'comp', 'company', 'comparison', 'compass', 'compel', 'compens', 'compet', 'competit', 'compl', 'complain', 'complaint', 'complet', 'complex', 'comply', 'compos', 'composit', 'compound', 'compr', 'comprehend', 'comprom', 'compuls', 'comput', 'con', 'conceit', 'conceiv', 'concern', 'concert', 'conclud', 'concoct', 'cond', 'condemn', 'condit', 'conduc', 'conf', 'confess', 'confid', 'confin', 'confirm', 'conflict', 'confront', 'confus', 'congrat', 'connect', 'connery', 'conqu', 'conrad', 'conscy', 'consequ', 'conserv', 'consid', 'consist', 'conspir', 'const', 'constitut', 'construct', 'consum', 'cont', 'contact', 'contain', 'contemp', 'contempl', 'contempt', 'contend', 'contest', 'context', 'contin', 'continu', 'contract', 'contradict', 'contrast', 'contribut', 'control', 'controvers', 'controversy', 'conv', 'conveny', 'convers', 'convert', 'convey', 'convict', 'convint', 'convolv', 'cook', 'cooky', 'cool', 'coop', 'cop', 'cor', 'corbet', 'corey', 'corm', 'corn', 'corny', 'corp', 'corps', 'correct', 'corrid', 'corrupt', 'cost', 'costum', 'couch', 'could', 'counsel', 'count', 'counterpart', 'countless', 'country', 'countrysid', 'county', 'coup', 'coupl', 'cour', 'cours', 'court', 'courtroom', 'cousin', 'cov', 'cow', 'coward', 'cowboy', 'cox', 'crack', 'craft', 'craig', 'cram', 'crap', 'crappy', 'crash', 'crav', 'crawford', 'crawl', 'craz', 'crazy', 'cre', 'cream', 'creasy', 'cred', 'credit', 'creek', 'creep', 'creepy', 'crew', 'cri', 'crim', 'crimin', 'cring', 'crippl', 'cris', 'crisp', 'crit', 'crocodil', 'crook', 'crop', 'crosby', 'cross', 'crow', 'crowd', 'crown', 'cru', 'cruc', 'crud', 'cruel', 'crush', 'cry', 'crypt', 'cryst', 'cub', 'cue', 'culmin', 'cult', 'cum', 'cunningham', 'cup', 'cur', 'curios', 'curs', 'curt', 'curtain', 'cury', 'cusack', 'cush', 'custom', 'cut', 'cyborg', 'cyc', 'cyn', 'cyph', 'da', 'dad', 'daddy', 'dahl', 'dahm', 'dai', 'daisy', 'dal', 'dalton', 'dam', 'damn', 'damon', 'dan', 'dandy', 'dang', 'daniel', 'danny', 'dant', 'dar', 'dark', 'darl', 'darn', 'dash', 'dat', 'daught', 'dav', 'david', 'davy', 'dawn', 'dawson', 'day', 'daylight', 'dazzl', 'de', 'dea', 'dead', 'deaf', 'deal', 'dealt', 'dean', 'deann', 'dear', 'death', 'deb', 'debby', 'debr', 'debt', 'debut', 'dec', 'decad', 'decapit', 'deceas', 'deceiv', 'decid', 'deck', 'decl', 'declin', 'ded', 'dee', 'deem', 'deep', 'deeply', 'deer', 'def', 'defend', 'defens', 'defin', 'definit', 'defy', 'deg', 'degr', 'degrad', 'del', 'delay', 'delet', 'delib', 'delicy', 'delight', 'deliry', 'delivery', 'delud', 'delv', 'dem', 'demand', 'demil', 'democr', 'demon', 'demonst', 'den', 'deniro', 'denou', 'dent', 'deny', 'denzel', 'dep', 'depart', 'depend', 'depict', 'deprav', 'depress', 'depth', 'deputy', 'der', 'derang', 'derek', 'des', 'desc', 'descend', 'describ', 'desert', 'deserv', 'design', 'desir', 'desp', 'despair', 'despit', 'destin', 'destiny', 'destroy', 'destruct', 'det', 'detach', 'detail', 'detect', 'determin', 'detery', 'detract', 'detroit', 'dev', 'devast', 'develop', 'devil', 'devo', 'devoid', 'devot', 'devy', 'dialog', 'diamond', 'dian', 'diary', 'dick', 'dict', 'did', 'die', 'died', 'diff', 'difficul', 'difficult', 'dig', 'digest', 'digit', 'dign', 'dil', 'dilemm', 'dim', 'dimend', 'dimin', 'din', 'dinosa', 'dir', 'direct', 'dirt', 'dirty', 'dis', 'disagr', 'disappear', 'disappoint', 'disast', 'disbeliev', 'disc', 'discern', 'disciplin', 'disco', 'discov', 'discovery', 'discuss', 'diseas', 'disgrac', 'disgu', 'disgust', 'dish', 'disjoint', 'dislik', 'dism', 'dismiss', 'disney', 'disord', 'dispatch', 'display', 'dispos', 'disregard', 'disrespect', 'dissolv', 'dist', 'distinct', 'distort', 'distract', 'distress', 'distribut', 'district', 'disturb', 'div', 'divers', 'divert', 'divid', 'divin', 'divorc', 'dixon', 'dj', 'do', 'doc', 'doct', 'docu', 'dodg', 'dog', 'dogm', 'dol', 'doll', 'dolph', 'dom', 'domest', 'domin', 'domino', 'don', 'donald', 'donn', 'dont', 'doo', 'doom', 'door', 'dor', 'dorothy', 'dos', 'dot', 'doubl', 'doubt', 'dougla', 'down', 'downey', 'downhil', 'download', 'downright', 'doyl', 'doz', 'dr', 'drab', 'dracul', 'draft', 'drag', 'dragon', 'drain', 'drak', 'dram', 'drama', 'draw', 'drawn', 'dre', 'dread', 'dream', 'dreamy', 'dreck', 'dress', 'drew', 'drift', 'dril', 'drink', 'drip', 'driv', 'drivel', 'dron', 'drop', 'drov', 'drown', 'drug', 'drum', 'drunk', 'dry', 'du', 'dub', 'duby', 'duck', 'dud', 'dudley', 'due', 'duel', 'duh', 'duk', 'dul', 'dumb', 'dumbest', 'dump', 'dun', 'duo', 'dur', 'dust', 'dustin', 'dutch', 'duty', 'duval', 'dvd', 'dvds', 'dwarf', 'dwel', 'dying', 'dyl', 'dynam', 'dysfunct', 'eag', 'eagl', 'ear', 'earl', 'earn', 'earnest', 'eas', 'east', 'eastern', 'eastwood', 'easy', 'eat', 'ebert', 'ecc', 'echo', 'econom', 'ed', 'eddy', 'edg', 'edgy', 'edi', 'edison', 'edit', 'educ', 'edward', 'edy', 'eery', 'effect', 'efficy', 'effort', 'effortless', 'eg', 'ego', 'egypt', 'eight', 'eighty', 'einstein', 'eith', 'el', 'elab', 'eld', 'elect', 'electron', 'eleg', 'eleph', 'elev', 'elimin', 'elit', 'elizabe', 'elliot', 'elm', 'els', 'else', 'elsewh', 'elud', 'elv', 'elvir', 'em', 'embark', 'embarrass', 'embody', 'embrac', 'emerg', 'emil', 'emm', 'emot', 'emp', 'empath', 'empathy', 'emphas', 'empir', 'employ', 'empty', 'emy', 'en', 'ench', 'enco', 'encount', 'end', 'endear', 'ending', 'endless', 'enemy', 'energet', 'energy', 'enforc', 'eng', 'engin', 'engl', 'england', 'engross', 'enh', 'enigm', 'enjoy', 'enl', 'enlight', 'enorm', 'enough', 'ens', 'ensembl', 'ensu', 'ent', 'enterpr', 'entertain', 'enthral', 'enthusiasm', 'enthusiast', 'entir', 'entitl', 'entry', 'environ', 'envy', 'ep', 'episod', 'epitom', 'eq', 'equ', 'equip', 'er', 'eras', 'erik', 'erot', 'errol', 'escap', 'esp', 'espec', 'esquir', 'ess', 'est', 'esth', 'estrang', 'et', 'etc', 'etern', 'eth', 'ethn', 'eug', 'europ', 'ev', 'evalu', 'evelyn', 'ever', 'every', 'everybody', 'everyday', 'everyon', 'everyth', 'everywh', 'evid', 'evil', 'evok', 'evolv', 'ex', 'exact', 'exag', 'examin', 'exampl', 'exceiv', 'excel', 'excess', 'exchang', 'excit', 'exclud', 'excrucy', 'excus', 'execut', 'exempl', 'exerc', 'exhaust', 'exhibit', 'exit', 'exorc', 'exot', 'expand', 'expect', 'expedit', 'expend', 'expens', 'expert', 'expery', 'expl', 'explain', 'explicit', 'explod', 'exploit', 'expos', 'exposit', 'express', 'exquisit', 'ext', 'extend', 'extery', 'extinct', 'extr', 'extra', 'extraordin', 'extrem', 'ey', 'eyebrow', 'eyr', 'fab', 'fabl', 'fabr', 'fac', 'facil', 'fact', 'fad', 'fai', 'fail', 'faint', 'fair', 'fairbank', 'fairy', 'faith', 'fak', 'fal', 'falk', 'fallon', 'fals', 'fam', 'famili', 'famy', 'fan', 'fant', 'fantast', 'fantasy', 'far', 'farc', 'farm', 'farrel', 'fart', 'fasc', 'fascin', 'fash', 'fassbind', 'fast', 'fat', 'fath', 'fault', 'fav', 'favo', 'favorit', 'favourit', 'fay', 'fbi', 'fear', 'feast', 'feat', 'fed', 'fee', 'feebl', 'feel', 'feet', 'feinston', 'fel', 'felix', 'fellin', 'fellow', 'felt', 'fem', 'femin', 'feminin', 'fent', 'fer', 'ferrel', 'fest', 'fet', 'fetch', 'fev', 'fi', 'fiant', 'fict', 'fido', 'field', 'fiend', 'fierc', 'fif', 'fifteen', 'fifty', 'fig', 'fight', 'fil', 'film', 'filmmak', 'filt', 'filthy', 'fin', 'find', 'finest', 'fing', 'finney', 'fir', 'firm', 'first', 'fish', 'fishburn', 'fist', 'fit', 'fiv', 'fix', 'flag', 'flair', 'flam', 'flash', 'flashback', 'flashy', 'flat', 'flav', 'flaw', 'flawless', 'fle', 'fleet', 'flem', 'flesh', 'fli', 'flick', 'flight', 'flimsy', 'flip', 'flirt', 'flo', 'flock', 'flood', 'flop', 'flor', 'florid', 'flow', 'fluff', 'fluid', 'fly', 'flyn', 'foc', 'focus', 'fog', 'foil', 'folk', 'follow', 'fond', 'fontain', 'food', 'fool', 'foot', 'footbal', 'for', 'forbid', 'forc', 'ford', 'foreign', 'foremost', 'forest', 'forev', 'forg', 'forget', 'forgot', 'form', 'formul', 'formula', 'forrest', 'fort', 'fortun', 'forty', 'forward', 'fost', 'fought', 'foul', 'found', 'four', 'fox', 'foxx', 'frag', 'fragil', 'frail', 'fram', 'franch', 'francisco', 'franco', 'frank', 'frankenstein', 'franklin', 'franky', 'frant', 'fraud', 'fre', 'freak', 'freaky', 'fred', 'freddy', 'freedom', 'freem', 'freez', 'french', 'frenzy', 'frequ', 'fresh', 'fri', 'friday', 'friend', 'fright', 'frog', 'from', 'front', 'fronty', 'frost', 'froz', 'fruit', 'frust', 'fry', 'fu', 'fuel', 'ful', 'fulc', 'fulfil', 'fun', 'funct', 'fund', 'funda', 'funniest', 'funny', 'furnit', 'furtherm', 'fury', 'fut', 'fuzzy', 'fx', 'gabl', 'gabriel', 'gadget', 'gag', 'gain', 'gal', 'galactic', 'galaxy', 'gallery', 'gam', 'gambl', 'gamer', 'gandh', 'gang', 'gangst', 'gap', 'gar', 'garb', 'garbo', 'gard', 'garland', 'garn', 'gary', 'gas', 'gasp', 'gat', 'gath', 'gav', 'gay', 'gaz', 'gear', 'geek', 'gem', 'gen', 'gend', 'genet', 'geni', 'genius', 'genr', 'gentl', 'gentlem', 'genuin', 'geny', 'georg', 'ger', 'gerard', 'germ', 'germany', 'gershwin', 'gest', 'get', 'ghetto', 'ghost', 'giallo', 'giant', 'gibson', 'gielgud', 'gift', 'gig', 'giggl', 'gil', 'gilbert', 'gilliam', 'gimmick', 'gin', 'ging', 'giovann', 'girl', 'girlfriend', 'giv', 'glad', 'glady', 'glam', 'glant', 'glar', 'glass', 'gle', 'glen', 'glimps', 'glob', 'gloom', 'glor', 'glory', 'glov', 'glow', 'glu', 'go', 'goal', 'goat', 'god', 'godard', 'godfath', 'godzill', 'goe', 'goer', 'going', 'gold', 'goldberg', 'goldbl', 'goldsworthy', 'goldy', 'gon', 'gonn', 'good', 'goodby', 'goodm', 'goody', 'goof', 'goofy', 'gor', 'gordon', 'gorg', 'gory', 'gosh', 'got', 'goth', 'gott', 'govern', 'govind', 'grab', 'grac', 'grad', 'gradu', 'graham', 'grainy', 'gram', 'grand', 'grandfath', 'grandm', 'grandmoth', 'grandp', 'grant', 'graph', 'grasp', 'grass', 'grat', 'gratuit', 'grav', 'graveyard', 'gray', 'grayson', 'gre', 'great', 'greatest', 'gree', 'greedy', 'greek', 'green', 'greet', 'greg', 'grew', 'grey', 'grief', 'griev', 'griffi', 'grim', 'grin', 'grinch', 'grind', 'grip', 'gritty', 'gro', 'gross', 'grotesqu', 'ground', 'group', 'grow', 'grown', 'grudg', 'gruesom', 'guar', 'guarantee', 'guard', 'guess', 'guest', 'guid', 'guil', 'guilt', 'guin', 'guine', 'guit', 'gum', 'gun', 'gundam', 'gunfight', 'gung', 'gut', 'guy', 'gwyne', 'gypo', 'gypsy', 'ha', 'habit', 'hack', 'hackm', 'hackney', 'hadley', 'hag', 'hail', 'hain', 'hair', 'hal', 'half', 'halfway', 'hallmark', 'halloween', 'hallucin', 'ham', 'hamilton', 'hamlet', 'hammy', 'han', 'hand', 'handicap', 'handl', 'handsom', 'hang', 'hank', 'hannah', 'hap', 'hapless', 'happy', 'har', 'harass', 'harb', 'hard', 'hardc', 'hardy', 'hark', 'harlow', 'harm', 'harmless', 'harold', 'harp', 'harriet', 'harrison', 'harrow', 'harry', 'harsh', 'hart', 'hartley', 'harvey', 'hat', 'hatch', 'haunt', 'havoc', 'hawk', 'hawn', 'hay', 'haywor', 'hbo', 'hea', 'head', 'headach', 'heal', 'healthy', 'heap', 'hear', 'heard', 'heart', 'heartbreak', 'heartfelt', 'heartwarm', 'heat', 'heav', 'heavy', 'heck', 'hect', 'heel', 'height', 'heist', 'hel', 'held', 'helicopt', 'hello', 'helm', 'helmet', 'help', 'helpless', 'henchm', 'henry', 'hent', 'hepburn', 'her', 'herbert', 'herd', 'here', 'herm', 'hero', 'heroin', 'hesit', 'heston', 'hey', 'hi', 'hick', 'hid', 'high', 'highest', 'highlight', 'highway', 'hil', 'him', 'hind', 'hint', 'hip', 'hippy', 'hir', 'hist', 'hit', 'hitch', 'hitchcock', 'hitl', 'hk', 'hmmm', 'ho', 'hoffm', 'hog', 'hokey', 'hol', 'hold', 'holiday', 'hollow', 'hollywood', 'holm', 'holocaust', 'holy', 'hom', 'homeless', 'homicid', 'homosex', 'hon', 'honest', 'honesty', 'hong', 'hono', 'hood', 'hook', 'hoop', 'hoot', 'hop', 'hopeless', 'hopkin', 'hor', 'horn', 'horny', 'horr', 'horrend', 'horrid', 'hors', 'hospit', 'host', 'hostel', 'hostil', 'hot', 'hotel', 'hound', 'hour', 'hous', 'household', 'housew', 'how', 'howard', 'howev', 'howl', 'http', 'hudson', 'hug', 'hugh', 'huh', 'hulk', 'hum', 'humbl', 'humo', 'humy', 'hundr', 'hung', 'hungry', 'hunt', 'hurry', 'hurt', 'husband', 'hustl', 'huston', 'hybrid', 'hyd', 'hyp', 'hypnot', 'hyst', 'ian', 'ic', 'icon', 'id', 'ide', 'idea', 'ident', 'ideolog', 'idiot', 'idol', 'ie', 'if', 'ign', 'ii', 'il', 'illeg', 'illog', 'illud', 'illust', 'im', 'imagery', 'imagin', 'imdb', 'imit', 'immedy', 'immens', 'immers', 'immigr', 'immort', 'imo', 'imp', 'impact', 'impecc', 'imperson', 'impl', 'implaus', 'imply', 'import', 'impos', 'imposs', 'impress', 'imprison', 'improb', 'improv', 'impuls', 'in', 'inacc', 'inadvert', 'inappropry', 'incap', 'incarn', 'incest', 'inch', 'incid', 'inclin', 'includ', 'incoh', 'incompet', 'incomprehens', 'inconsist', 'incorp', 'incorrect', 'increas', 'incred', 'ind', 'indee', 'independ', 'indian', 'indiff', 'individ', 'induc', 'indulg', 'indust', 'industry', 'indy', 'inept', 'inevit', 'inexpery', 'inexpl', 'inf', 'infam', 'infect', 'infery', 'infinit', 'inflict', 'influ', 'info', 'inform', 'ing', 'ingeny', 'ingredy', 'ingrid', 'inh', 'inhabit', 'inherit', 'init', 'inject', 'injury', 'injust', 'inm', 'innoc', 'innov', 'innuendo', 'ins', 'insec', 'insect', 'insert', 'insid', 'insight', 'insign', 'insipid', 'insist', 'insomn', 'inspect', 'inspir', 'inst', 'instal', 'instead', 'instinct', 'institut', 'instru', 'instruct', 'insult', 'int', 'intact', 'integr', 'intellect', 'intellig', 'intend', 'intens', 'interact', 'interest', 'interf', 'intern', 'internet', 'interpret', 'interrupt', 'intertwin', 'interv', 'interview', 'intery', 'intim', 'intol', 'intrigu', 'intro', 'introduc', 'intrud', 'inv', 'invad', 'invas', 'invest', 'investig', 'invis', 'invit', 'involv', 'iq', 'ir', 'iraq', 'ireland', 'iron', 'irony', 'irrelev', 'irrit', 'is', 'isabel', 'ish', 'islam', 'island', 'isol', 'israel', 'issu', 'it', 'ita', 'item', 'iturb', 'iv', 'jack', 'jacket', 'jackson', 'jacky', 'jacob', 'jacqu', 'jad', 'jaff', 'jag', 'jail', 'jak', 'jam', 'jamy', 'jan', 'jap', 'japanes', 'jar', 'jason', 'jaw', 'jay', 'jazz', 'jeal', 'jealousy', 'jean', 'jed', 'jeff', 'jeffrey', 'jen', 'jenn', 'jenny', 'jeremy', 'jerk', 'jerry', 'jersey', 'jes', 'jess', 'jessic', 'jet', 'jew', 'jewel', 'jil', 'jim', 'jimmy', 'joan', 'job', 'jock', 'jody', 'joe', 'joel', 'joey', 'johansson', 'john', 'johnny', 'johnson', 'join', 'joint', 'jok', 'joly', 'jon', 'jonath', 'jord', 'jos', 'joseph', 'josh', 'journ', 'journey', 'jov', 'joy', 'jr', 'juan', 'jud', 'judg', 'judy', 'juic', 'jul', 'juliet', 'july', 'jump', 'jun', 'jungl', 'junk', 'juny', 'jury', 'just', 'justin', 'juvenil', 'kan', 'kansa', 'kapo', 'kar', 'karl', 'karloff', 'kat', 'kathleen', 'kathryn', 'kathy', 'katy', 'kay', 'kaz', 'keaton', 'keen', 'keep', 'kei', 'kel', 'ken', 'kenne', 'kennedy', 'kent', 'kept', 'kevin', 'key', 'khan', 'kick', 'kid', 'kiddy', 'kidm', 'kidnap', 'kil', 'kim', 'kind', 'king', 'kingdom', 'kinnear', 'kirk', 'kiss', 'kit', 'kitch', 'kitty', 'klin', 'kne', 'knew', 'knif', 'knight', 'knightley', 'knock', 'know', 'knowledg', 'known', 'kolchak', 'kong', 'kor', 'kore', 'kri', 'kubrick', 'kudo', 'kum', 'kung', 'kurosaw', 'kurt', 'kyl', 'la', 'lab', 'label', 'lac', 'lack', 'lacklust', 'lad', 'lady', 'laid', 'lak', 'lam', 'lamb', 'lampoon', 'lan', 'land', 'landmark', 'landscap', 'lang', 'langu', 'lant', 'laput', 'lar', 'larg', 'larry', 'las', 'last', 'lat', 'latest', 'latin', 'latino', 'laugh', 'laught', 'launch', 'laur', 'laurel', 'laury', 'lav', 'law', 'lawr', 'lawy', 'lay', 'lazy', 'le', 'lead', 'leagu', 'lean', 'leap', 'learn', 'least', 'leath', 'leav', 'lect', 'led', 'lee', 'left', 'leg', 'legend', 'legitim', 'leigh', 'lemmon', 'len', 'lend', 'leng', 'lengthy', 'lennon', 'leo', 'leon', 'leonard', 'les', 'lesb', 'less', 'lesson', 'lest', 'let', 'leth', 'lev', 'level', 'lew', 'lex', 'li', 'liam', 'lib', 'liberty', 'libr', 'licens', 'lie', 'lif', 'life', 'lifeless', 'lifestyl', 'lifetim', 'lift', 'light', 'lightn', 'lik', 'likew', 'lil', 'lily', 'limb', 'limit', 'lin', 'lincoln', 'lind', 'lindsay', 'linear', 'ling', 'link', 'lion', 'lionel', 'lip', 'lis', 'list', 'lit', 'littl', 'liu', 'liv', 'liz', 'lizard', 'lloyd', 'load', 'loath', 'loc', 'lock', 'log', 'loi', 'lol', 'lon', 'london', 'long', 'longest', 'longor', 'look', 'loom', 'loop', 'loos', 'lor', 'lord', 'lorett', 'los', 'loss', 'lost', 'lot', 'lou', 'loud', 'lousy', 'lov', 'love', 'low', 'lowest', 'loy', 'loyal', 'luc', 'luca', 'lucil', 'luck', 'lucky', 'lucy', 'ludicr', 'lugos', 'lui', 'luk', 'luka', 'lumet', 'lun', 'lunch', 'lundgr', 'lung', 'lur', 'lurk', 'lush', 'lust', 'luth', 'luxury', 'lying', 'lynch', 'lyr', 'mabel', 'mac', 'macabr', 'macarth', 'macdonald', 'machin', 'macho', 'macy', 'mad', 'made', 'madm', 'madonn', 'mads', 'mae', 'maf', 'mag', 'magazin', 'maggy', 'magn', 'maid', 'mail', 'main', 'mainstream', 'maintain', 'maj', 'mak', 'makeup', 'mal', 'malon', 'mam', 'man', 'mand', 'mandy', 'mang', 'manhat', 'maniac', 'manifest', 'manip', 'mankind', 'manufact', 'many', 'map', 'mar', 'marc', 'march', 'margaret', 'margin', 'marilyn', 'marin', 'mario', 'mark', 'market', 'marl', 'marlon', 'marqu', 'marry', 'marsh', 'marshal', 'mart', 'marth', 'martin', 'marty', 'marvel', 'marx', 'mary', 'mask', 'masoch', 'mason', 'mass', 'massacr', 'mast', 'masterpiec', 'masterson', 'masturb', 'mat', 'match', 'mathieu', 'matrix', 'matthau', 'matthew', 'maureen', 'max', 'maxim', 'may', 'mayb', 'mayhem', 'mccarthy', 'mccoy', 'mclaglen', 'mcqueen', 'me', 'meadow', 'meal', 'mean', 'meand', 'meaningless', 'meant', 'meantim', 'meanwhil', 'meas', 'meat', 'mech', 'med', 'medicin', 'mediev', 'mediocr', 'medit', 'meek', 'meet', 'meg', 'mel', 'meliss', 'melodram', 'melody', 'melt', 'melvyn', 'mem', 'memb', 'men', 'menac', 'ment', 'mer', 'merc', 'merciless', 'mercy', 'merit', 'mermaid', 'merry', 'meryl', 'mesm', 'mess', 'messy', 'met', 'metaph', 'method', 'mex', 'mexico', 'mey', 'mgm', 'miam', 'mic', 'mich', 'michael', 'michel', 'mick', 'mickey', 'mid', 'middl', 'midget', 'midl', 'midnight', 'midst', 'might', 'mighty', 'miik', 'mik', 'mil', 'mild', 'mildr', 'milit', 'milk', 'millionair', 'milo', 'mim', 'min', 'mind', 'mindless', 'minim', 'minisery', 'minnell', 'minut', 'mir', 'mirac', 'mirand', 'mis', 'miscast', 'misery', 'misfit', 'misfortun', 'misguid', 'mislead', 'miss', 'missil', 'mist', 'mistak', 'mistress', 'misunderstand', 'misunderstood', 'mitch', 'mitchel', 'mix', 'mixt', 'miyazak', 'mm', 'mob', 'mobl', 'mobst', 'mock', 'mod', 'model', 'modern', 'modest', 'modesty', 'moe', 'mol', 'molest', 'mom', 'moment', 'mon', 'money', 'monit', 'monk', 'monkey', 'monolog', 'monoton', 'monst', 'mont', 'montan', 'month', 'monty', 'monu', 'mood', 'moody', 'moon', 'moor', 'mor', 'morbid', 'more', 'moreov', 'morg', 'mormon', 'morn', 'moron', 'mort', 'moss', 'most', 'mot', 'moth', 'motorcyc', 'mou', 'mount', 'mountain', 'mourn', 'mous', 'mouth', 'mov', 'movie', 'movies', 'movy', 'mr', 'mrs', 'ms', 'mst', 'mtv', 'much', 'muddl', 'mug', 'mult', 'multipl', 'mum', 'mummy', 'mund', 'muppet', 'murd', 'murky', 'murph', 'murray', 'mus', 'musc', 'muse', 'muslim', 'must', 'mut', 'mutil', 'myer', 'myrtl', 'myst', 'mystery', 'myth', 'mytholog', 'nad', 'nail', 'naiv', 'nak', 'nam', 'nant', 'nar', 'narrow', 'naschy', 'nasty', 'nat', 'nata', 'natal', 'nath', 'naughty', 'naus', 'navy', 'naz', 'nbc', 'nd', 'near', 'nearby', 'neat', 'necess', 'neck', 'ned', 'nee', 'needless', 'neg', 'neglect', 'neighb', 'neighbo', 'neil', 'neith', 'nelson', 'nemes', 'neo', 'nephew', 'nerd', 'nerv', 'net', 'netflix', 'network', 'neurot', 'neut', 'nev', 'nevertheless', 'new', 'newcom', 'newm', 'newspap', 'next', 'nic', 'nichola', 'nicholson', 'nick', 'nicol', 'nicola', 'niec', 'night', 'nightclub', 'nightm', 'nin', 'ninj', 'niro', 'niv', 'no', 'nobl', 'nobody', 'nod', 'noir', 'nois', 'nol', 'nolt', 'nomin', 'non', 'nonetheless', 'nonsens', 'nop', 'nor', 'norm', 'northam', 'northern', 'nos', 'nostalg', 'not', 'notch', 'noteworthy', 'noth', 'novak', 'novel', 'now', 'nowaday', 'nowh', 'nuant', 'nuclear', 'nud', 'num', 'numb', 'nun', 'nurs', 'nut', 'ny', 'nyc', 'object', 'oblig', 'obnoxy', 'obsc', 'observ', 'obsess', 'obstac', 'obtain', 'obvy', 'oc', 'occ', 'occas', 'occult', 'occup', 'occupy', 'occur', 'octob', 'od', 'odyssey', 'off', 'offb', 'offend', 'oft', 'oh', 'oil', 'ok', 'okay', 'ol', 'old', 'oldest', 'oliv', 'olivy', 'olymp', 'om', 'omin', 'omit', 'on', 'one', 'onlin', 'onto', 'op', 'oper', 'opin', 'oppon', 'opportun', 'oppos', 'opposit', 'oppress', 'opt', 'optim', 'or', 'orang', 'orchest', 'ord', 'ordin', 'org', 'orgy', 'origin', 'orl', 'orph', 'orson', 'ory', 'osc', 'oth', 'othello', 'otherw', 'otto', 'ought', 'out', 'outcom', 'outd', 'outdo', 'outfit', 'outland', 'outlaw', 'outlin', 'outright', 'outsid', 'outstand', 'ov', 'over', 'overact', 'overal', 'overblown', 'overboard', 'overcom', 'overdon', 'overlong', 'overlook', 'overshadow', 'overt', 'overwhelm', 'ow', 'owl', 'own', 'oz', 'pac', 'pacino', 'pack', 'pad', 'pag', 'paid', 'pain', 'paint', 'pair', 'pal', 'palac', 'palestin', 'palm', 'paltrow', 'pamel', 'pan', 'pant', 'pap', 'par', 'parad', 'parallel', 'paramount', 'parano', 'paranoid', 'park', 'parody', 'parrot', 'parson', 'part', 'particip', 'particul', 'partn', 'party', 'pass', 'passeng', 'past', 'pat', 'patch', 'path', 'pathet', 'patho', 'patric', 'patrick', 'patriot', 'patron', 'pattern', 'paty', 'pau', 'paul', 'paus', 'paxton', 'pay', 'paycheck', 'payoff', 'pc', 'peac', 'peak', 'pearl', 'peck', 'peculi', 'pedest', 'pee', 'peer', 'peg', 'pen', 'penelop', 'penguin', 'penny', 'peopl', 'people', 'pep', 'per', 'perc', 'perceiv', 'perfect', 'perform', 'perhap', 'peril', 'period', 'perm', 'permit', 'perpet', 'perry', 'person', 'perspect', 'persuad', 'pervers', 'pervert', 'pet', 'petty', 'pfeiff', 'pg', 'phantasm', 'phantom', 'phas', 'phenom', 'phenomenon', 'phil', 'philip', 'phillip', 'philosoph', 'phoenix', 'phon', 'phony', 'photo', 'photograph', 'phrase', 'phys', 'piano', 'pick', 'pickford', 'pict', 'pie', 'piec', 'pier', 'pierc', 'pig', 'pil', 'pilot', 'pin', 'pink', 'pion', 'pip', 'pir', 'pistol', 'pit', 'pitch', 'pity', 'pivot', 'pix', 'plac', 'place', 'plagu', 'plain', 'plan', 'planet', 'plant', 'plast', 'plat', 'platform', 'plaus', 'play', 'playboy', 'playwright', 'pleas', 'please', 'plenty', 'plight', 'plod', 'plot', 'plu', 'plug', 'plum', 'pocket', 'poe', 'poem', 'poet', 'poetry', 'poign', 'point', 'pointless', 'poison', 'pok', 'pokemon', 'pol', 'polansk', 'policem', 'policy', 'polit', 'pond', 'pool', 'poor', 'pop', 'popcorn', 'popul', 'porn', 'porno', 'pornograph', 'port', 'portrait', 'portray', 'pos', 'posey', 'posit', 'poss', 'possess', 'post', 'pot', 'pound', 'pour', 'poverty', 'pow', 'powel', 'pra', 'pract', 'prank', 'pray', 'pre', 'preach', 'preachy', 'prec', 'precy', 'pred', 'predecess', 'predict', 'pref', 'prefer', 'pregn', 'prejud', 'prem', 'premy', 'prep', 'prepost', 'prequel', 'pres', 'preserv', 'presid', 'press', 'preston', 'presum', 'pretend', 'pretenty', 'pretty', 'prev', 'prevail', 'preview', 'prevy', 'prey', 'pri', 'pric', 'priceless', 'prid', 'priest', 'prim', 'primit', 'princess', 'princip', 'principl', 'print', 'prison', 'priv', 'privileg', 'priz', 'pro', 'prob', 'problem', 'proc', 'process', 'proclaim', 'produc', 'prof', 'profess', 'profil', 'profit', 'profound', 'program', 'progress', 'project', 'prolog', 'prom', 'promin', 'promot', 'prompt', 'pronount', 'proof', 'prop', 'propagand', 'property', 'prophecy', 'prophet', 'proport', 'propos', 'prosecut', 'prospect', 'prostitut', 'protagon', 'protect', 'protest', 'proud', 'prov', 'provid', 'provoc', 'provok', 'ps', 'pseudo', 'psych', 'psycho', 'psycholog', 'psychopa', 'psychot', 'psychy', 'pub', 'publ', 'puerto', 'pul', 'pulp', 'pumba', 'pump', 'pun', 'punch', 'punk', 'puppet', 'puppy', 'pur', 'purchas', 'purpl', 'purpos', 'pursu', 'pursuit', 'push', 'put', 'puzzl', 'python', 'quaid', 'qual', 'quant', 'quart', 'quas', 'queen', 'quentin', 'quest', 'quick', 'quiet', 'quin', 'quintess', 'quirky', 'quit', 'quot', 'rabbit', 'rac', 'rachel', 'rack', 'rad', 'radio', 'rady', 'rag', 'raid', 'rail', 'rain', 'rainy', 'rais', 'raj', 'ralph', 'ram', 'rambl', 'rambo', 'ramon', 'ramp', 'ran', 'ranch', 'randolph', 'random', 'randy', 'rang', 'rank', 'rant', 'rao', 'rap', 'rapid', 'rapt', 'rar', 'rat', 'rath', 'ratso', 'rav', 'raw', 'ray', 'raymond', 'raz', 'rd', 'rea', 'reach', 'react', 'read', 'ready', 'real', 'really', 'realm', 'rear', 'reason', 'rebel', 'rec', 'recal', 'receiv', 'recit', 'reckless', 'recogn', 'recognit', 'recommend', 'record', 'recov', 'recr', 'recruit', 'recyc', 'red', 'redeem', 'redempt', 'redneck', 'reduc', 'redund', 'ree', 'reel', 'reev', 'ref', 'refer', 'reflect', 'refresh', 'refug', 'refus', 'reg', 'regain', 'regard', 'regardless', 'regim', 'regret', 'regul', 'rehash', 'rehears', 'reid', 'reign', 'reincarn', 'reinforc', 'reis', 'reject', 'rel', 'relax', 'releas', 'relentless', 'relev', 'reliev', 'relig', 'religy', 'reluct', 'rely', 'remad', 'remain', 'remak', 'remark', 'rememb', 'remind', 'reminisc', 'remot', 'remov', 'ren', 'renaiss', 'rend', 'rendit', 'rent', 'rep', 'repetit', 'replac', 'replay', 'reply', 'report', 'repr', 'repres', 'repress', 'republ', 'repuls', 'reput', 'request', 'requir', 'rerun', 'res', 'rescu', 'research', 'resembl', 'reserv', 'resid', 'resist', 'resolv', 'reson', 'resort', 'resourc', 'respect', 'respond', 'respons', 'rest', 'resta', 'restrain', 'restraint', 'restrict', 'result', 'resum', 'resurrect', 'ret', 'retain', 'retard', 'retir', 'retriev', 'retrospect', 'return', 'reun', 'reunit', 'rev', 'revel', 'reveng', 'revers', 'review', 'revisit', 'revolt', 'revolv', 'reward', 'rewrit', 'rex', 'reynold', 'rhym', 'rhythm', 'ric', 'rich', 'richard', 'richardson', 'rick', 'ricky', 'rid', 'riddl', 'ridic', 'riff', 'rifl', 'rig', 'right', 'ring', 'riot', 'rip', 'ripoff', 'ris', 'risk', 'rit', 'ritchy', 'riv', 'rivet', 'road', 'roam', 'roar', 'rob', 'robbery', 'robbin', 'robby', 'robert', 'robertson', 'robin', 'robinson', 'robot', 'rochest', 'rock', 'rocket', 'rocky', 'rod', 'rog', 'rohm', 'rol', 'rom', 'romeo', 'romero', 'romp', 'ron', 'ronald', 'ronny', 'roof', 'rooky', 'room', 'rooney', 'root', 'rop', 'ros', 'rosario', 'rosem', 'ross', 'rot', 'roth', 'rough', 'round', 'rous', 'rout', 'routin', 'row', 'rowland', 'roy', 'rub', 'ruby', 'rud', 'rug', 'ruin', 'rukh', 'rul', 'rum', 'run', 'runaway', 'rur', 'rush', 'russ', 'russel', 'ruth', 'ruthless', 'ryan', 'sabot', 'sabrin', 'sack', 'sacr', 'sad', 'saddl', 'saf', 'sag', 'said', 'sail', 'saint', 'sak', 'sal', 'salesm', 'salm', 'saloon', 'salt', 'salv', 'sam', 'samanth', 'sammo', 'samura', 'san', 'sand', 'sandl', 'sandr', 'sang', 'sant', 'sap', 'sappy', 'sar', 'sarah', 'sarandon', 'sarcasm', 'sarcast', 'sassy', 'sat', 'satir', 'satisfact', 'satisfy', 'saturday', 'sav', 'saw', 'say', 'scal', 'scan', 'scand', 'scar', 'scarc', 'scarecrow', 'scarfac', 'scariest', 'scarlet', 'scary', 'scat', 'scen', 'scenario', 'scenery', 'schedule', 'scheme', 'schlock', 'schneider', 'school', 'schools', 'sci', 'scif', 'scooby', 'scoop', 'scop', 'scor', 'scorses', 'scot', 'scotland', 'scratch', 'scream', 'screaming', 'screams', 'screen', 'screening', 'screenplay', 'screens', 'screenwriter', 'screenwriters', 'screw', 'screwball', 'screwed', 'script', 'scripted', 'scripting', 'scripts', 'scrooge', 'se', 'sea', 'seag', 'seal', 'sean', 'search', 'season', 'seat', 'sebast', 'sec', 'second', 'secret', 'sect', 'seduc', 'see', 'seedy', 'seek', 'seem', 'seen', 'seg', 'sel', 'seldom', 'select', 'self', 'sem', 'sen', 'send', 'sens', 'senseless', 'sensit', 'sent', 'sentinel', 'senty', 'seny', 'sep', 'septemb', 'sequ', 'sequel', 'ser', 'serb', 'serg', 'serv', 'sery', 'sess', 'set', 'settl', 'setup', 'sev', 'seventy', 'sew', 'sex', 'sexy', 'seymo', 'sf', 'sg', 'sgt', 'sh', 'shad', 'shadow', 'shaggy', 'shah', 'shahid', 'shak', 'shakespear', 'shaky', 'shal', 'shallow', 'sham', 'shameless', 'shangha', 'shap', 'shar', 'shark', 'sharon', 'sharp', 'shat', 'shav', 'shaw', 'she', 'shed', 'sheen', 'sheet', 'shel', 'shelf', 'shelley', 'shelt', 'shepard', 'shepherd', 'sheriff', 'shield', 'shift', 'shin', 'ship', 'shirley', 'shirt', 'sho', 'shock', 'shoddy', 'shoot', 'shootout', 'shop', 'shor', 'short', 'shortcom', 'shot', 'shotgun', 'should', 'shout', 'shov', 'show', 'showcas', 'showdown', 'shown', 'shut', 'shy', 'sibl', 'sick', 'sid', 'sidekick', 'sidewalk', 'sidney', 'sigh', 'sight', 'sign', 'sil', 'silv', 'sim', 'simil', 'simmon', 'simon', 'simpl', 'simply', 'simpson', 'simult', 'sin', 'sinatr', 'sing', 'singl', 'sink', 'sint', 'sir', 'sirk', 'sissy', 'sist', 'sit', 'sitcom', 'situ', 'six', 'sixteen', 'sixty', 'siz', 'skat', 'skept', 'sketch', 'ski', 'skil', 'skin', 'skinny', 'skip', 'skit', 'skul', 'sky', 'slack', 'slam', 'slap', 'slapstick', 'slash', 'slat', 'slaught', 'slav', 'slay', 'sleaz', 'sleazy', 'sleep', 'sleepwalk', 'slic', 'slick', 'slid', 'slight', 'slightest', 'slim', 'slimy', 'slip', 'slo', 'sloppy', 'slow', 'slug', 'slut', 'sly', 'smack', 'smal', 'smart', 'smash', 'smel', 'smi', 'smil', 'smok', 'smoo', 'smug', 'smuggl', 'snak', 'snap', 'snatch', 'sneak', 'snip', 'snl', 'snob', 'snow', 'snowm', 'snuff', 'so', 'soap', 'sob', 'soc', 'socc', 'socy', 'soderbergh', 'soft', 'sol', 'sold', 'soldy', 'solid', 'solo', 'solv', 'somebody', 'someday', 'somehow', 'someon', 'someth', 'sometim', 'somewh', 'son', 'sondr', 'song', 'sonny', 'soon', 'soph', 'soprano', 'sor', 'sorrow', 'sorry', 'sort', 'sou', 'sought', 'soul', 'sound', 'soundtrack', 'soup', 'sour', 'sourc', 'southern', 'soviet', 'sox', 'soyl', 'spac', 'spacey', 'spad', 'spaghett', 'spain', 'span', 'spar', 'spark', 'sparkl', 'spawn', 'speak', 'spear', 'spec', 'spect', 'spectac', 'spectacul', 'specy', 'spee', 'speech', 'spel', 'spend', 'spent', 'spi', 'spic', 'spid', 'spielberg', 'spik', 'spil', 'spin', 'spir', 'spirit', 'spit', 'splatter', 'splendid', 'split', 'spock', 'spoil', 'spok', 'spont', 'spoof', 'spooky', 'spoon', 'sport', 'spot', 'spotlight', 'spout', 'spread', 'spree', 'spring', 'springer', 'spy', 'squ', 'squad', 'squeez', 'st', 'stab', 'stabl', 'stack', 'stad', 'staff', 'stag', 'stair', 'stak', 'stal', 'stalk', 'stallon', 'stamp', 'stan', 'stand', 'standard', 'standout', 'stanley', 'stant', 'stanwyck', 'star', 'stardom', 'stardust', 'starg', 'stark', 'start', 'startl', 'starv', 'stat', 'statu', 'stay', 'ste', 'steady', 'steam', 'steel', 'stell', 'step', 'steph', 'stephany', 'stereotyp', 'sterl', 'stern', 'stev', 'stewart', 'stick', 'stiff', 'stil', 'stilt', 'stim', 'stink', 'stir', 'stock', 'stol', 'stomach', 'ston', 'stood', 'stoog', 'stop', 'stor', 'storm', 'story', 'storylin', 'storytel', 'straight', 'straightforward', 'stranded', 'strange', 'strangely', 'stranger', 'strangers', 'stream', 'streep', 'street', 'streets', 'streisand', 'strength', 'strengths', 'stress', 'stretch', 'stretched', 'strict', 'strictly', 'strike', 'strikes', 'striking', 'string', 'strings', 'strip', 'stroke', 'strong', 'stronger', 'strongest', 'strongly', 'struck', 'structure', 'struggle', 'struggles', 'struggling', 'stuart', 'stuck', 'stud', 'studio', 'study', 'stuff', 'stumbl', 'stun', 'stunt', 'stupid', 'styl', 'sub', 'subject', 'sublim', 'submarin', 'submit', 'subplot', 'subsequ', 'subst', 'substitut', 'subt', 'subtext', 'subtitl', 'subtl', 'suburb', 'subvert', 'subway', 'success', 'suck', 'sud', 'sue', 'suff', 'sufficy', 'sug', 'suggest', 'suicid', 'suit', 'sul', 'sum', 'summ', 'sun', 'sund', 'sunday', 'sung', 'sunk', 'sunny', 'sunr', 'sunset', 'sunshin', 'sup', 'superb', 'superbl', 'superf', 'superhero', 'superm', 'supern', 'superst', 'supery', 'supply', 'support', 'suppos', 'suppress', 'suprem', 'sur', 'surf', 'surfac', 'surgery', 'surpass', 'surpr', 'surrend', 'surround', 'surv', 'sus', 'susp', 'suspect', 'suspend', 'suspens', 'suspicy', 'sustain', 'sutherland', 'swallow', 'sway', 'swe', 'swear', 'swed', 'sweep', 'sweet', 'swept', 'swift', 'swim', 'swing', 'switch', 'sword', 'sydney', 'symbol', 'sympath', 'sympathet', 'sympathy', 'syndrom', 'synops', 'system', 'tabl', 'taboo', 'tack', 'tackl', 'tacky', 'tact', 'tad', 'tag', 'tail', 'tak', 'tal', 'talk', 'talky', 'tam', 'tang', 'tank', 'tap', 'tar', 'tarantino', 'target', 'tarz', 'task', 'tast', 'tasteless', 'tat', 'tattoo', 'taught', 'tax', 'tayl', 'tcm', 'tea', 'teach', 'team', 'tear', 'teas', 'tech', 'techn', 'technicol', 'technolog', 'ted', 'tedy', 'tee', 'teen', 'tel', 'televid', 'temp', 'templ', 'tempt', 'ten', 'tend', 'tens', 'tent', 'ter', 'term', 'termin', 'terr', 'territ', 'terry', 'test', 'testa', 'texa', 'text', 'th', 'thank', 'that', 'the', 'thelm', 'them', 'therapy', 'there', 'theref', 'thick', 'thief', 'thiev', 'thin', 'thing', 'think', 'thinking', 'third', 'thirty', 'this', 'tho', 'thoma', 'thompson', 'thorn', 'thorough', 'though', 'thought', 'thousand', 'thread', 'threat', 'threatened', 'threatening', 'threatens', 'three', 'threw', 'thrill', 'thrilled', 'thriller', 'thrillers', 'thrilling', 'thrills', 'throat', 'throughout', 'throw', 'throwing', 'thrown', 'throws', 'thru', 'thu', 'thug', 'thumb', 'thund', 'thunderbird', 'thurm', 'tick', 'ticket', 'tid', 'tie', 'tied', 'tierney', 'tig', 'tight', 'til', 'tim', 'timberlak', 'time', 'timeless', 'timmy', 'timon', 'timothy', 'tin', 'tiny', 'tip', 'tir', 'tiresom', 'tit', 'titl', 'toby', 'tod', 'today', 'toe', 'togeth', 'toilet', 'tok', 'tokyo', 'tol', 'told', 'tom', 'tomato', 'tomb', 'tome', 'tommy', 'tomorrow', 'ton', 'tongu', 'tonight', 'tony', 'too', 'took', 'tool', 'top', 'topless', 'tor', 'torch', 'torn', 'toronto', 'tort', 'toss', 'tot', 'touch', 'tough', 'tour', 'tow', 'toward', 'town', 'toy', 'trac', 'track', 'tracy', 'trad', 'trademark', 'tradit', 'traff', 'trag', 'tragedy', 'trail', 'train', 'trait', 'tramp', 'transcend', 'transf', 'transform', 'transit', 'transl', 'transmit', 'transp', 'transpl', 'transport', 'trap', 'trash', 'trashy', 'traum', 'trav', 'travel', 'travesty', 'tre', 'treas', 'trek', 'tremend', 'trend', 'tri', 'triangl', 'trib', 'tribut', 'trick', 'trig', 'trilog', 'trio', 'trip', 'tripl', 'trit', 'triumph', 'triv', 'trom', 'troop', 'troubl', 'tru', 'truck', 'trum', 'trust', 'truth', 'try', 'tub', 'tuck', 'tun', 'tunnel', 'turd', 'turk', 'turkey', 'turmoil', 'turn', 'turtl', 'tv', 'twelv', 'twenty', 'twic', 'twilight', 'twin', 'twist', 'two', 'tyl', 'typ', 'ug', 'ugh', 'uh', 'uk', 'ultim', 'ultimat', 'ultr', 'um', 'un', 'unansw', 'unattract', 'unaw', 'unbear', 'unbeliev', 'unc', 'uncanny', 'uncomfort', 'unconscy', 'unconv', 'unconvint', 'uncov', 'uncut', 'und', 'undead', 'undeny', 'under', 'underground', 'undermin', 'undernea', 'underst', 'understand', 'understood', 'undertak', 'underw', 'underwear', 'underworld', 'undoubt', 'uneasy', 'unev', 'unexpect', 'unexplain', 'unfair', 'unfold', 'unforg', 'unforget', 'unfortun', 'unfunny', 'unhappy', 'uniform', 'unimagin', 'uninspir', 'unint', 'uninterest', 'unit', 'univers', 'unknown', 'unleash', 'unless', 'unlik', 'unnecess', 'unorigin', 'unpleas', 'unpredict', 'unr', 'unravel', 'unrel', 'uns', 'unsatisfy', 'unseen', 'unsettl', 'unst', 'unsuspect', 'unsympathet', 'unus', 'unw', 'unwatch', 'unwil', 'up', 'upcom', 'upd', 'uplift', 'upon', 'upset', 'upsid', 'urb', 'urg', 'us', 'useless', 'ustinov', 'ut', 'util', 'uw', 'vac', 'vacu', 'vad', 'vagu', 'vain', 'val', 'valentin', 'valid', 'valley', 'valu', 'vampir', 'van', 'vaness', 'vanill', 'vant', 'vary', 'vast', 'vault', 'veg', 'vega', 'vehic', 'vein', 'ven', 'venezuel', 'veng', 'venom', 'vent', 'ver', 'verb', 'verdict', 'verg', 'verhoev', 'vers', 'versatil', 'vert', 'vet', 'vhs', 'via', 'vib', 'vibr', 'vic', 'vict', 'victim', 'victor', 'vicy', 'vid', 'video', 'vietnam', 'view', 'viewpoint', 'vigil', 'vignet', 'vil', 'villain', 'vint', 'viol', 'vir', 'virgin', 'virt', 'virtu', 'vis', 'viscont', 'visit', 'vit', 'viv', 'vivid', 'voc', 'voic', 'void', 'voight', 'volum', 'volunt', 'vomit', 'von', 'vonnegut', 'vot', 'voy', 'vs', 'vulg', 'vuln', 'wacky', 'wag', 'wagn', 'wait', 'waitress', 'wak', 'wal', 'walk', 'wallac', 'walsh', 'walt', 'wan', 'wand', 'wang', 'wann', 'wannab', 'want', 'war', 'ward', 'wardrob', 'warhol', 'warm', 'warn', 'warp', 'warry', 'was', 'wash', 'washington', 'wast', 'wat', 'watch', 'watson', 'wav', 'wax', 'way', 'wayn', 'weak', 'weakest', 'weal', 'wealthy', 'weapon', 'wear', 'weary', 'weath', 'weav', 'web', 'websit', 'wed', 'wee', 'week', 'weekend', 'weight', 'weird', 'wel', 'welcom', 'well', 'wendigo', 'wendy', 'went', 'werewolf', 'werewolv', 'wes', 'west', 'western', 'wet', 'whack', 'whal', 'what', 'whatev', 'whatsoev', 'wheel', 'wheelchair', 'whenev', 'wherea', 'wheth', 'whilst', 'whin', 'whiny', 'whip', 'whistl', 'whit', 'who', 'whoev', 'whol', 'wholesom', 'whoop', 'whor', 'whos', 'why', 'wick', 'wid', 'widescreen', 'widmark', 'widow', 'wield', 'wif', 'wig', 'wil', 'wild', 'william', 'wilson', 'win', 'winchest', 'wind', 'window', 'wing', 'wint', 'wip', 'wir', 'wis', 'wisdom', 'wish', 'wit', 'witch', 'witchcraft', 'within', 'without', 'witty', 'wiv', 'wizard', 'woe', 'wolf', 'wom', 'wond', 'wonderland', 'wong', 'wont', 'woo', 'wood', 'woody', 'wor', 'word', 'work', 'world', 'worm', 'worn', 'worry', 'wors', 'worst', 'worthless', 'worthwhil', 'worthy', 'would', 'wound', 'wow', 'wrap', 'wreck', 'wrench', 'wrestl', 'wretch', 'wright', 'writ', 'wrong', 'wrot', 'wtf', 'ww', 'wwe', 'wwi', 'www', 'ya', 'yank', 'yard', 'yawn', 'ye', 'yeah', 'year', 'yearn', 'years', 'yel', 'yellow', 'yep', 'yesterday', 'yet', 'yoka', 'york', 'you', 'young', 'youngest', 'youngst', 'youth', 'youtub', 'zan', 'zen', 'zero', 'zizek', 'zomb', 'zomby', 'zon', 'zoom', 'zorro']預(yù)測結(jié)果
使用隨機(jī)森林分類器進(jìn)行分類
from sklearn.ensemble import RandomForestClassifier forest = RandomForestClassifier(n_estimators = 100) forest = forest.fit(train_data_features, train["sentiment"])輸出提交結(jié)果
test = pd.read_csv("./data/testData.tsv", header=0, delimiter="\t",quoting=3 ) num_reviews = len(test["review"]) clean_test_reviews = [] for i in range(num_reviews):clean_review = review_to_words( test["review"][i] )clean_test_reviews.append( clean_review ) test_data_features = vectorizer.transform(clean_test_reviews) test_data_features = test_data_features.toarray() result = forest.predict(test_data_features) output = pd.DataFrame( data={"id":test["id"], "sentiment":result} ) output.to_csv( "Bag_of_Words_model.csv", index=False, quoting=3 )嘗試使用xgb
from xgboost import XGBClassifier from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(train_data_features, train["sentiment"], test_size=0.2) model = XGBClassifier() eval_set = [(X_test, y_test)] model.fit(X_train, y_train, early_stopping_rounds=10, eval_metric="logloss", eval_set=eval_set, verbose=True) result = model.predict(test_data_features) output = pd.DataFrame( data={"id":test["id"], "sentiment":result} ) output.to_csv( "xgbBag_of_Words_model.csv", index=False, quoting=3 ) [0] validation_0-logloss:0.676219 Will train until validation_0-logloss hasn't improved in 10 rounds. [1] validation_0-logloss:0.662174 [2] validation_0-logloss:0.651057 [3] validation_0-logloss:0.640874 [4] validation_0-logloss:0.632644 [5] validation_0-logloss:0.625076 [6] validation_0-logloss:0.617619 [7] validation_0-logloss:0.611682 [8] validation_0-logloss:0.605249 [9] validation_0-logloss:0.599587 [10] validation_0-logloss:0.594965 [11] validation_0-logloss:0.589799 [12] validation_0-logloss:0.585117 [13] validation_0-logloss:0.580564 [14] validation_0-logloss:0.576377 [15] validation_0-logloss:0.572584 [16] validation_0-logloss:0.568511 [17] validation_0-logloss:0.565177 [18] validation_0-logloss:0.561793 [19] validation_0-logloss:0.558281 [20] validation_0-logloss:0.55503 [21] validation_0-logloss:0.552451 [22] validation_0-logloss:0.549323 [23] validation_0-logloss:0.546664 [24] validation_0-logloss:0.544006 [25] validation_0-logloss:0.54108 [26] validation_0-logloss:0.538433 [27] validation_0-logloss:0.535872 [28] validation_0-logloss:0.533465 [29] validation_0-logloss:0.5312 [30] validation_0-logloss:0.528723 [31] validation_0-logloss:0.526622 [32] validation_0-logloss:0.524268 [33] validation_0-logloss:0.522295 [34] validation_0-logloss:0.519956 [35] validation_0-logloss:0.518042 [36] validation_0-logloss:0.515848 [37] validation_0-logloss:0.514131 [38] validation_0-logloss:0.512278 [39] validation_0-logloss:0.510431 [40] validation_0-logloss:0.508723 [41] validation_0-logloss:0.506938 [42] validation_0-logloss:0.505074 [43] validation_0-logloss:0.50362 [44] validation_0-logloss:0.501969 [45] validation_0-logloss:0.500489 [46] validation_0-logloss:0.499067 [47] validation_0-logloss:0.497414 [48] validation_0-logloss:0.496192 [49] validation_0-logloss:0.494645 [50] validation_0-logloss:0.493216 [51] validation_0-logloss:0.49187 [52] validation_0-logloss:0.490369 [53] validation_0-logloss:0.489028 [54] validation_0-logloss:0.487349 [55] validation_0-logloss:0.486212 [56] validation_0-logloss:0.485081 [57] validation_0-logloss:0.483909 [58] validation_0-logloss:0.482761 [59] validation_0-logloss:0.481767 [60] validation_0-logloss:0.480625 [61] validation_0-logloss:0.479329 [62] validation_0-logloss:0.478402 [63] validation_0-logloss:0.477328 [64] validation_0-logloss:0.476377 [65] validation_0-logloss:0.475029 [66] validation_0-logloss:0.473751 [67] validation_0-logloss:0.472692 [68] validation_0-logloss:0.471596 [69] validation_0-logloss:0.470421 [70] validation_0-logloss:0.469413 [71] validation_0-logloss:0.468299 [72] validation_0-logloss:0.467431 [73] validation_0-logloss:0.466318 [74] validation_0-logloss:0.465558 [75] validation_0-logloss:0.464642 [76] validation_0-logloss:0.463728 [77] validation_0-logloss:0.462841 [78] validation_0-logloss:0.46207 [79] validation_0-logloss:0.461132 [80] validation_0-logloss:0.460134 [81] validation_0-logloss:0.45898 [82] validation_0-logloss:0.458173 [83] validation_0-logloss:0.457472 [84] validation_0-logloss:0.456591 [85] validation_0-logloss:0.456256 [86] validation_0-logloss:0.455629 [87] validation_0-logloss:0.454958 [88] validation_0-logloss:0.454081 [89] validation_0-logloss:0.453485 [90] validation_0-logloss:0.452779 [91] validation_0-logloss:0.452121 [92] validation_0-logloss:0.45126 [93] validation_0-logloss:0.450549 [94] validation_0-logloss:0.450048 [95] validation_0-logloss:0.44925 [96] validation_0-logloss:0.448478 [97] validation_0-logloss:0.447839 [98] validation_0-logloss:0.447183 [99] validation_0-logloss:0.446421還是隨機(jī)森林好用
總結(jié)
以上是生活随笔為你收集整理的自然语言处理之词袋模型Bag_of_words的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 三白话经典算法系列 Shell排序实现
- 下一篇: 解决VS2005 远程工具无法通过同步软