Day 20: Stanford CoreNLP: Performing Sentiment Analysis of Twitter Using Java

Published: 2025/3/21

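Today I learned how to use the Stanford CoreNLP Java API to perform sentiment analysis. A few days ago I also wrote a post on doing sentiment analysis in Python with the TextBlob API. I have built an application that reports the sentiment of tweets matching given keywords, so let's see what it can do.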

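Application

The demo application runs on OpenShift at http://sentiments-t20.rhcloud.com/ and has two features:

The first: given a list of Twitter search terms, it shows the sentiment of the 20 most recent tweets for each term. You must tick the checkbox on the page to enable this feature. Positive tweets are shown in green and negative tweets in red.

The second: it runs sentiment analysis on a piece of text.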

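What is Stanford CoreNLP?

Stanford CoreNLP is a Java natural language analysis library. It integrates Stanford's NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, and the sentiment analysis tools, and it ships model files for analyzing English.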

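Prerequisites

Basic Java knowledge is required. Install the latest Java Development Kit (JDK); either OpenJDK 7 or Oracle JDK 7 will do.

Download the Stanford CoreNLP package from the official website.

Sign up for an OpenShift account. It is completely free, and each user is allocated 1.5 GB of memory and 3 GB of disk space.

Install the RHC client tool, which needs ruby 1.8.7 or newer. If you already have ruby gem, run sudo gem install rhc and make sure it is the latest version. To update RHC, run sudo gem update rhc. For more help installing the RHC command-line tool, see https://www.openshift.com/developers/rhc-client-tools-install

Set up your OpenShift account with the rhc setup command. It helps you create a namespace and uploads your SSH keys to the OpenShift server.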

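GitHub repository

The code for today's demo application is available on GitHub: day20-stanford-sentiment-analysis-demo

Get SentimentsApp up and running in under two minutes

Start by creating the application, named sentimentsapp.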

$ rhc create-app sentimentsapp jbosseap --from-code=https://github.com/shekhargulati/day20-stanford-sentiment-analysis-demo.git

Alternatively, request a medium gear with the -g option:

$ rhc create-app sentimentsapp jbosseap -g medium --from-code=https://github.com/shekhargulati/day20-stanford-sentiment-analysis-demo.git

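This creates a container for the application and sets up all the required SELinux policies and cgroup configuration. OpenShift also creates a private git repository and clones it to your local system. Finally, OpenShift propagates the DNS so that the application is reachable at http://sentimentsapp-{domain-name}.rhcloud.com/ (replace domain-name with your own domain name).

The application also needs four environment variables that belong to a Twitter application. Create a new Twitter application at https://dev.twitter.com/apps/new, then set the four environment variables as shown below.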

$ rhc env set TWITTER_OAUTH_ACCESS_TOKEN=<please enter value> -a sentimentsapp
$ rhc env set TWITTER_OAUTH_ACCESS_TOKEN_SECRET=<please enter value> -a sentimentsapp
$ rhc env set TWITTER_OAUTH_CONSUMER_KEY=<please enter value> -a sentimentsapp
$ rhc env set TWITTER_OAUTH_CONSUMER_SECRET=<please enter value> -a sentimentsapp

Restart the application to make sure the server can read the environment variables.

$ rhc restart-app --app sentimentsapp
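A quick way to confirm that the restarted server can actually see all four variables is to check them before building the Twitter client. The following is a minimal sketch, not code from the demo repository: the class name TwitterEnvCheck and the method missing are hypothetical, and only the four variable names come from the commands above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the demo repository.
class TwitterEnvCheck {

    // The four variable names set with `rhc env set` above.
    static final List<String> REQUIRED = Arrays.asList(
            "TWITTER_OAUTH_ACCESS_TOKEN",
            "TWITTER_OAUTH_ACCESS_TOKEN_SECRET",
            "TWITTER_OAUTH_CONSUMER_KEY",
            "TWITTER_OAUTH_CONSUMER_SECRET");

    // Returns the names of required variables that are absent or blank in env.
    static List<String> missing(Map<String, String> env) {
        List<String> absent = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // System.getenv() returns the process environment as a map.
        List<String> absent = missing(System.getenv());
        if (absent.isEmpty()) {
            System.out.println("All Twitter credentials are set.");
        } else {
            System.out.println("Missing environment variables: " + absent);
        }
    }
}
```

Taking the environment as a Map parameter, rather than calling System.getenv() inside the check, keeps the helper easy to exercise with made-up values.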

Start by adding Maven dependencies for stanford-corenlp and twitter4j to pom.xml. Use version 3.3.0 of stanford-corenlp, which provides the sentiment analysis API.

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.3.0</version>
</dependency>
<dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-core</artifactId>
    <version>[3.0,)</version>
</dependency>

The twitter4j dependency is needed for the Twitter search.

Update the Maven project to Java 7 by changing two properties in pom.xml:

<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>

Now update the Maven project (Right click > Maven > Update Project).

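Enable CDI

CDI is used for dependency injection. CDI (Contexts and Dependency Injection) is a Java EE 6 specification that enables dependency injection in Java EE 6 projects.

To enable CDI, create a new XML file named beans.xml in the src/main/webapp/WEB-INF folder.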

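Search Twitter for keywords

Create a new class, TwitterSearch, which uses the Twitter4J API to search Twitter for a keyword. The API needs the Twitter application's configuration parameters; these are read from environment variables rather than hard-coded.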

import java.util.Collections;
import java.util.List;

import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

public class TwitterSearch {

    public List<Status> search(String keyword) {
        // Read the OAuth credentials from the environment instead of hard-coding them.
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
                .setOAuthConsumerKey(System.getenv("TWITTER_OAUTH_CONSUMER_KEY"))
                .setOAuthConsumerSecret(System.getenv("TWITTER_OAUTH_CONSUMER_SECRET"))
                .setOAuthAccessToken(System.getenv("TWITTER_OAUTH_ACCESS_TOKEN"))
                .setOAuthAccessTokenSecret(System.getenv("TWITTER_OAUTH_ACCESS_TOKEN_SECRET"));
        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getInstance();

        // Exclude retweets, links, replies and images so only plain-text tweets remain.
        Query query = new Query(keyword + " -filter:retweets -filter:links -filter:replies -filter:images");
        query.setCount(20);
        query.setLocale("en");
        query.setLang("en");
        try {
            QueryResult queryResult = twitter.search(query);
            return queryResult.getTweets();
        } catch (TwitterException e) {
            // Print the failure and fall through to return an empty list.
            e.printStackTrace();
        }
        return Collections.emptyList();
    }
}

The code above filters the Twitter search results to exclude retweets, replies, tweets with links, and tweets with images, so that the results are text-only tweets.
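The filter string in the query is just the keyword followed by one -filter: operator per excluded tweet type. As an illustration only, the same string can be assembled from a list of filter names; the class TweetQueryBuilder and its build method are hypothetical names and do not appear in the demo project.

```java
// Hypothetical helper showing how the search string used by TwitterSearch is composed.
class TweetQueryBuilder {

    // Tweet types excluded by the original query, in the same order.
    static final String[] EXCLUDED = {"retweets", "links", "replies", "images"};

    // Appends one " -filter:<type>" operator per excluded type to the keyword.
    static String build(String keyword) {
        StringBuilder sb = new StringBuilder(keyword.trim());
        for (String filter : EXCLUDED) {
            sb.append(" -filter:").append(filter);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: java -filter:retweets -filter:links -filter:replies -filter:images
        System.out.println(build("java"));
    }
}
```

The resulting string is what would be passed to new Query(...) in the search method.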

SentimentAnalyzer

Create a class called SentimentAnalyzer, which runs sentiment analysis on a single tweet.

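import java.util.Properties;

// Package names as in CoreNLP 3.3.0.
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class SentimentAnalyzer {

    // TweetWithSentiment and toCss(int) are not shown in this listing.
    public TweetWithSentiment findSentiment(String line) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        int mainSentiment = 0;
        if (line != null && line.length() > 0) {
            int longest = 0;
            Annotation annotation = pipeline.process(line);
            // The sentiment of the longest sentence is taken as the tweet's sentiment.
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
                Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
                // Predicted class ranges from 0 (very negative) to 4 (very positive).
                int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
                String partText = sentence.toString();
                if (partText.length() > longest) {
                    mainSentiment = sentiment;
                    longest = partText.length();
                }
            }
        }

        // Skip neutral tweets (class 2) and anything outside the 0..4 range.
        if (mainSentiment == 2 || mainSentiment > 4 || mainSentiment < 0) {
            return null;
        }
        TweetWithSentiment tweetWithSentiment = new TweetWithSentiment(line, toCss(mainSentiment));
        return tweetWithSentiment;
    }
}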
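The selection rule in findSentiment can be looked at in isolation, without loading any CoreNLP models: the predicted class of the longest sentence wins, and neutral or out-of-range results are dropped. The sketch below is an assumption-laden illustration, not the project's code. The Scored class is a hypothetical stand-in for a CoreNLP sentence with its predicted class, and the label method simply names the five classes of the sentiment model (0 to 4).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, library-free illustration of the rule used in findSentiment.
class LongestSentenceRule {

    // Stand-in for one sentence and its predicted sentiment class (0..4).
    static final class Scored {
        final String text;
        final int sentiment;

        Scored(String text, int sentiment) {
            this.text = text;
            this.sentiment = sentiment;
        }
    }

    // Returns the class of the longest sentence; 0 if the list is empty,
    // which mirrors the initial value used in findSentiment.
    static int mainSentiment(List<Scored> sentences) {
        int main = 0;
        int longest = 0;
        for (Scored s : sentences) {
            if (s.text.length() > longest) {
                main = s.sentiment;
                longest = s.text.length();
            }
        }
        return main;
    }

    // True when findSentiment would return a result rather than null.
    static boolean isReported(int sentiment) {
        return sentiment != 2 && sentiment >= 0 && sentiment <= 4;
    }

    // Human-readable names for the five sentiment classes.
    static String label(int sentiment) {
        switch (sentiment) {
            case 0: return "very negative";
            case 1: return "negative";
            case 2: return "neutral";
            case 3: return "positive";
            case 4: return "very positive";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        List<Scored> tweet = Arrays.asList(
                new Scored("Great!", 4),
                new Scored("But the battery life is awful.", 1));
        int result = mainSentiment(tweet);
        // prints: negative (reported: true)
        System.out.println(label(result) + " (reported: " + isReported(result) + ")");
    }
}
```

One consequence of the rule is visible here: because the initial value is 0, a null or empty input ends up classified as very negative rather than being skipped.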

Copy the englishPCFG.ser.gz and sentiment.ser.gz models into the src/main/resources/edu/stanford/nlp/models/lexparser and src/main/resources/edu/stanford/nlp/models/sentiment folders, respectively.

Create SentimentsResource

Finally, create the JAX-RS resource class.

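import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

import twitter4j.Status;

// Note: a JAX-RS resource also needs a class-level @Path annotation; its value is not shown in this listing.
public class SentimentsResource {

    @Inject
    private SentimentAnalyzer sentimentAnalyzer;

    @Inject
    private TwitterSearch twitterSearch;

    @GET
    @Produces(value = MediaType.APPLICATION_JSON)
    public List<Result> sentiments(@QueryParam("searchKeywords") String searchKeywords) {
        List<Result> results = new ArrayList<>();
        if (searchKeywords == null || searchKeywords.length() == 0) {
            return results;
        }

        // Normalize the comma-separated keywords and drop duplicates.
        Set<String> keywords = new HashSet<>();
        for (String keyword : searchKeywords.split(",")) {
            keywords.add(keyword.trim().toLowerCase());
        }
        // Consider at most three search terms.
        if (keywords.size() > 3) {
            keywords = new HashSet<>(new ArrayList<>(keywords).subList(0, 3));
        }

        for (String keyword : keywords) {
            List<Status> statuses = twitterSearch.search(keyword);
            System.out.println("Found statuses ... " + statuses.size());
            List<TweetWithSentiment> sentiments = new ArrayList<>();
            for (Status status : statuses) {
                TweetWithSentiment tweetWithSentiment = sentimentAnalyzer.findSentiment(status.getText());
                // findSentiment returns null for neutral tweets, which are skipped.
                if (tweetWithSentiment != null) {
                    sentiments.add(tweetWithSentiment);
                }
            }
            Result result = new Result(keyword, sentiments);
            results.add(result);
        }
        return results;
    }
}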

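The code above does the following:

It checks that searchKeywords is neither null nor empty, then splits it on commas, keeping at most three search terms.

For each search term, it finds the matching tweets and runs sentiment analysis on them.

Finally, it returns the list of results to the user.

That's it for today. Feedback is welcome.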
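The keyword handling in the first step can be sketched on its own as follows. This is a variation under stated assumptions rather than the resource's actual code: the class name KeywordParser is hypothetical, and it swaps the original HashSet for a LinkedHashSet and skips blank entries, so that the three terms kept are the first three the user typed rather than whichever three the HashSet happens to iterate first.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variation of the keyword handling in SentimentsResource.
class KeywordParser {

    static final int MAX_KEYWORDS = 3;

    // Splits on commas, trims, lower-cases, removes duplicates and blanks,
    // and keeps at most MAX_KEYWORDS terms in the order they were given.
    static List<String> parse(String searchKeywords) {
        List<String> result = new ArrayList<>();
        if (searchKeywords == null || searchKeywords.length() == 0) {
            return result;
        }
        Set<String> unique = new LinkedHashSet<>();
        for (String keyword : searchKeywords.split(",")) {
            String normalized = keyword.trim().toLowerCase();
            if (!normalized.isEmpty()) {
                unique.add(normalized);
            }
        }
        for (String keyword : unique) {
            if (result.size() == MAX_KEYWORDS) {
                break;
            }
            result.add(keyword);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: [java, scala, go]
        System.out.println(parse("Java, java ,Scala,Go,Rust"));
    }
}
```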

Original article: Day 20: Stanford CoreNLP--Performing Sentiment Analysis of Twitter using Java
Translated and edited by SegmentFault
