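Individual and Pair Project - English Word Frequency Count
Individual or pair programming project: an English word frequency counting program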
Implement a command-line program that supports word frequency counting in several modes.
Implement a console application to tally the frequency of words under a directory.
For all text files (file extension: "txt") under a directory (recursively), calculate the frequency of each word, and output the result into a text file.
There are two options for writing the program:
    a) Write the code in C++ or C#, using the .NET Framework. The running environment is 32-bit Windows 7 or Windows 10, with the Visual Studio 2012-2015 profiling tool.
    b) Write the code in Java, using the latest JDK. Run it on Windows 10 or Linux, with an appropriate Java profiling tool.
Run a performance analysis tool on your code, find the performance bottlenecks, and improve them.
Enable Code Quality Analysis for your code and get rid of all warnings.
Code Quality Analysis: http://msdn.microsoft.com/en-us/library/dd264897.aspx
Write 10 simple test cases to make sure your program handles them correctly (e.g. a good test case could be: one of the sub-directories is empty).
Submission:
Submit your source code and executable to the TA. The TA will run it in their testing environment and check for:
    - correctness (an incorrect program will get 0 points)
    - performance
    - a blog post (see the blog requirement below)
Definition:
A word: a string that starts with one English letter, followed by optional alphanumeric characters. Words are separated by delimiters. If a string contains non-alphanumeric characters, it is not a word. Words are case insensitive, i.e. “file”, “FILE” and “File” are considered the same word.
“file123” is a word, and “123file” is NOT a word.
- Alphabetic letters: A-Z, a-z.
- Alphanumeric characters: A-Z, a-z, 0-9.
- Delimiter: space, or any non-alphanumeric character.
- Output text file: the filename is <your email name>.txt
- Each line has this format:
<word>: number
Where <word> is the string, shown in all upper case. E.g. if only “File” and “file” appear in the test file, the program should show “FILE”.
Where “number” is the number of times this word appears in the file(s). The output should be sorted with the most frequent word first. If two words have the same frequency, list them in dictionary order.
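The definition above can be sketched in Java roughly as follows. This is only one reading of the rules, not a reference solution: it assumes text is split on every non-alphanumeric character and that a resulting token counts as a word only if it starts with a letter (so “123file” is skipped entirely). The class and method names (`WordFrequency`, `count`, `ranked`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class WordFrequency {
    // Counts words in the text, keyed by their upper-case form.
    // Assumption: any non-alphanumeric character is a delimiter, and a
    // token that starts with a digit is not a word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty() && isAsciiLetter(token.charAt(0))) {
                freq.merge(token.toUpperCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return freq;
    }

    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Returns "<word>: number" lines, most frequent first; ties are
    // broken by dictionary order of the upper-case word.
    static List<String> ranked(Map<String, Integer> freq) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + ": " + e.getValue());
        }
        return lines;
    }
}
```

Calling `WordFrequency.ranked(WordFrequency.count(text))` yields the lines to write to the output file. A hash map plus one final sort keeps counting linear in the input size, which matters once the performance requirement is measured.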
Requirements:
1) Simple mode. Output simple word frequency.
Myapp.exe <directory-name>
This will output a <your-name>.txt file in the current directory; the text file contains the word ranking list.
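For the simple mode, the directory traversal and the output step might look like the sketch below. It assumes the standard `java.nio.file` API is acceptable, that the ".txt" extension is matched case-insensitively (the assignment does not say), and that the report is written as UTF-8. The names `TxtFileFinder`, `findTxtFiles` and `writeReport` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TxtFileFinder {
    // Recursively collects regular files whose name ends in ".txt".
    // Assumption: the extension check ignores case.
    static List<Path> findTxtFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Writes one ranking line per row to the given output file.
    static Path writeReport(Path output, List<String> lines) throws IOException {
        return Files.write(output, lines, StandardCharsets.UTF_8);
    }
}
```

An empty sub-directory simply contributes no files here, which covers the test case suggested earlier in the assignment.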
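2) Extended mode.
When Myapp.exe -e2 <directory-name> is run, find the most frequently occurring sequences of two consecutive words (list the top 10). For example, in an English novel, “good morning” may be the most frequent.
When Myapp.exe -e3 <directory-name> is run, find the most frequently occurring sequences of three consecutive words (list the top 10). For example, “how are you”.
Here, consecutive words are words separated by a single space.
The app will output a <your-name>.txt file in the current directory; the text file contains the word ranking list.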
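One possible way to count the two-word and three-word phrases is a sliding window over the words, reset whenever two neighbours are not separated by exactly one space. The sketch below makes several assumptions the assignment leaves open: phrases are upper-cased like single words, a non-word token such as “123file” breaks a phrase, and the output lines reuse the “<phrase>: number” shape. `PhraseFrequency`, `count` and `top` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseFrequency {
    private static final Pattern RUN = Pattern.compile("[A-Za-z0-9]+");

    // Counts phrases of n consecutive words separated by a single space.
    static Map<String, Integer> count(String text, int n) {
        Map<String, Integer> freq = new HashMap<>();
        Deque<String> window = new ArrayDeque<>();
        Matcher m = RUN.matcher(text);
        int prevEnd = -1;
        while (m.find()) {
            String run = m.group();
            // The run only contains [A-Za-z0-9], so this tests for an ASCII letter.
            boolean isWord = Character.isLetter(run.charAt(0));
            boolean adjacent = prevEnd >= 0
                    && m.start() == prevEnd + 1
                    && text.charAt(prevEnd) == ' ';
            if (!isWord || !adjacent) {
                window.clear(); // the chain of consecutive words is broken
            }
            if (isWord) {
                window.addLast(run.toUpperCase(Locale.ROOT));
                if (window.size() > n) {
                    window.removeFirst();
                }
                if (window.size() == n) {
                    freq.merge(String.join(" ", window), 1, Integer::sum);
                }
            }
            prevEnd = m.end();
        }
        return freq;
    }

    // Returns at most `limit` lines, most frequent first, ties in dictionary order.
    static List<String> top(Map<String, Integer> freq, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            lines.add(entries.get(i).getKey() + ": " + entries.get(i).getValue());
        }
        return lines;
    }
}
```

With this sketch, `-e2` would correspond to `top(count(text, 2), 10)` and `-e3` to `top(count(text, 3), 10)`. Whether a phrase may span a line break or a file boundary is not specified; here it cannot, because a newline is not a single space.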
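3) Support -v, verb tally mode.
When Myapp.exe -v <directory-name> is run, find the most frequently occurring verbs, counting every inflected form of each verb.
While learning English we study many verbs and learn that a verb has a base form plus various inflected forms for different tenses and voices. For example:
Base form: do
Inflected forms: does, did, done, doing.
So, in an English article, how many times do the verb “do” and its inflected forms occur? Which are the 10 most frequent verbs? That is the purpose of this exercise.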
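Task breakdown:
1) Build (or get from the TA) a text file of verbs and their inflected forms. Each line starts with the base form, followed by its inflected forms, for example:
    do does did done doing
    get gets got gotten getting
    ...
2) Processing
Myapp.exe -v <directory-name>
    Process every file that meets the criteria, then report the top 10 most frequent verbs across all files.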
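The two steps above could be sketched as follows: load the verb file into a map from every form to its base form, then tally each word of the text under its base form. Several points are assumptions, since the assignment does not settle them: a form that appears on more than one line is credited to the first verb that lists it, base forms are reported in upper case, and the output reuses the “<word>: number” shape. `VerbTally`, `loadForms`, `count` and `top` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class VerbTally {
    // Builds a map from every verb form (including the base form) to its base form.
    // Each input line is "base form1 form2 ...", separated by whitespace.
    static Map<String, String> loadForms(List<String> lines) {
        Map<String, String> formToBase = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().toUpperCase(Locale.ROOT).split("\\s+");
            if (parts[0].isEmpty()) {
                continue; // blank line
            }
            for (String form : parts) {
                // Assumption: the first verb that lists a form wins.
                formToBase.putIfAbsent(form, parts[0]);
            }
        }
        return formToBase;
    }

    // Counts every known verb form in the text under its base form.
    static Map<String, Integer> count(String text, Map<String, String> formToBase) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            String base = formToBase.get(token.toUpperCase(Locale.ROOT));
            if (base != null) {
                freq.merge(base, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Returns at most `limit` lines, most frequent first, ties in dictionary order.
    static List<String> top(Map<String, Integer> freq, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            lines.add(entries.get(i).getKey() + ": " + entries.get(i).getValue());
        }
        return lines;
    }
}
```

A plain lookup like this cannot tell a verb from a noun with the same spelling, so every occurrence of a listed form is counted as a verb.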
Blog Requirement:
You can publish this to BOTH your own blog and your team blog (to help your team blog get some traffic).
1) Before you implement this project, record your estimate of the time you WILL spend on each component of your program.
2) After you have implemented this project, record the ACTUAL time you spent on each component of your program.
3) Describe how much time you spent on improving the performance of your program, and show a performance analysis graph (generated by the VS2012 perf analysis tool). If possible, show the most costly function in your program.
4) Share your 10 test cases and how you made sure your program produces the correct result (programs with incorrect results will get 0 points, regardless of speed).
5) Describe what you learned in this exercise.