Mahout Recommendation 10: Trying the GroupLens Dataset
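Dataset download: http://grouplens.org/datasets/movielens/
Earlier posts used the 100K set; this time download MovieLens 10M and use the ratings.dat file inside it.
Prerequisite: ratings.dat does not match the input file format Mahout expects, so it would normally have to be converted. The Mahout examples, however, ship a class that parses this file, GroupLensDataModel, so I used that directly.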
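For readers who would rather convert ratings.dat up front, the conversion mentioned above could be sketched roughly as follows. This is a minimal sketch, assuming each line has the form UserID::MovieID::Rating::Timestamp and that the target is the comma-separated userID,itemID,rating layout that Mahout's FileDataModel reads. The class name RatingsDatToCsv, the methods convertLine and convertFile, and the sample line are made up for illustration; they are not part of Mahout or of the original post, and GroupLensDataModel may well do its conversion differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Mahout.
class RatingsDatToCsv {

    // Converts "user::item::rating::timestamp" to "user,item,rating".
    // The timestamp is dropped (an assumption: it is not needed here).
    static String convertLine(String line) {
        String[] fields = line.split("::");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Unexpected input format: " + line);
        }
        return fields[0] + "," + fields[1] + "," + fields[2];
    }

    // Streams the input line by line so the whole file is never held in memory.
    static void convertFile(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(
                     Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    writer.println(convertLine(line));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Usage: java RatingsDatToCsv <input ratings.dat> <output csv>
            convertFile(Paths.get(args[0]), Paths.get(args[1]));
        } else {
            // Made-up sample line in the assumed format.
            System.out.println(convertLine("42::1193::4.5::978300760"));
        }
    }
}
```

The converted file could then be handed to a plain FileDataModel instead of GroupLensDataModel, at the cost of keeping a second copy of the ratings on disk.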
package mahout;

import java.io.File;

import org.apache.mahout.cf.taste.impl.eval.LoadEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.example.GroupLensDataModel;

public class GroupLensDataModelTest {

    public static void main(String[] args) throws Exception {
        // Use the custom GroupLensDataModel, since the dataset has not been converted to CSV
        DataModel dataModel = new GroupLensDataModel(new File("data/ratings.dat"));
        // Pearson correlation coefficient, used to measure user similarity
        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
        // Build the user neighborhood from the 100 nearest users
        UserNeighborhood userNeighborhood = new NearestNUserNeighborhood(100, userSimilarity, dataModel);
        // Recommender engine
        Recommender recommender = new GenericUserBasedRecommender(dataModel, userNeighborhood, userSimilarity);
        // Run the load test
        LoadEvaluator.runLoad(recommender);
    }
}

Give it a run, if you have enough memory.
Output:
My file hasn't finished downloading yet!
Update:
Output:
14/08/05 10:05:13 INFO file.FileDataModel: Creating FileDataModel for file C:\Users\ADMINI~1\AppData\Local\Temp\ratings.txt
14/08/05 10:05:17 INFO file.FileDataModel: Reading file info...
14/08/05 10:05:18 INFO file.FileDataModel: Processed 1000000 lines
14/08/05 10:05:19 INFO file.FileDataModel: Processed 2000000 lines
14/08/05 10:05:20 INFO file.FileDataModel: Processed 3000000 lines
14/08/05 10:05:21 INFO file.FileDataModel: Processed 4000000 lines
14/08/05 10:05:23 INFO file.FileDataModel: Processed 5000000 lines
14/08/05 10:05:24 INFO file.FileDataModel: Processed 6000000 lines
14/08/05 10:05:25 INFO file.FileDataModel: Processed 7000000 lines
14/08/05 10:05:26 INFO file.FileDataModel: Processed 8000000 lines
14/08/05 10:05:27 INFO file.FileDataModel: Processed 9000000 lines
14/08/05 10:05:30 INFO file.FileDataModel: Processed 10000000 lines
14/08/05 10:05:30 INFO file.FileDataModel: Read lines: 10000054
14/08/05 10:05:31 INFO model.GenericDataModel: Processed 10000 users
14/08/05 10:05:31 INFO model.GenericDataModel: Processed 20000 users
14/08/05 10:05:33 INFO model.GenericDataModel: Processed 30000 users
14/08/05 10:05:33 INFO model.GenericDataModel: Processed 40000 users
14/08/05 10:05:34 INFO model.GenericDataModel: Processed 50000 users
14/08/05 10:05:34 INFO model.GenericDataModel: Processed 60000 users
14/08/05 10:05:35 INFO model.GenericDataModel: Processed 69878 users
14/08/05 10:05:39 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 982 tasks in 4 threads
14/08/05 10:05:39 INFO eval.StatsCallable: Average time per recommendation: 163ms
14/08/05 10:05:39 INFO eval.StatsCallable: Approximate memory used: 445MB / 815MB
14/08/05 10:05:39 INFO eval.StatsCallable: Unable to recommend in 0 cases

No recommendations are printed; the log only shows loading progress plus timing and memory statistics.
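To test, add these lines at the end of the code:

// Needs two extra imports:
//   import java.util.List;
//   import org.apache.mahout.cf.taste.recommender.RecommendedItem;
// Recommend 10 items for user 1 (arguments: userID 1, howMany 10)
List<RecommendedItem> recommendedItems = recommender.recommend(1, 10);
// Print each recommendation
for (RecommendedItem item : recommendedItems) {
    System.out.println(item);
}

Check the output: still no recommendations. Strange; I'll dig into it later.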
14/08/05 10:09:48 INFO file.FileDataModel: Creating FileDataModel for file C:\Users\ADMINI~1\AppData\Local\Temp\ratings.txt
14/08/05 10:09:48 INFO file.FileDataModel: Reading file info...
14/08/05 10:09:49 INFO file.FileDataModel: Processed 1000000 lines
14/08/05 10:09:50 INFO file.FileDataModel: Processed 2000000 lines
14/08/05 10:09:52 INFO file.FileDataModel: Processed 3000000 lines
14/08/05 10:09:52 INFO file.FileDataModel: Processed 4000000 lines
14/08/05 10:09:54 INFO file.FileDataModel: Processed 5000000 lines
14/08/05 10:09:56 INFO file.FileDataModel: Processed 6000000 lines
14/08/05 10:09:56 INFO file.FileDataModel: Processed 7000000 lines
14/08/05 10:09:57 INFO file.FileDataModel: Processed 8000000 lines
14/08/05 10:09:58 INFO file.FileDataModel: Processed 9000000 lines
14/08/05 10:10:00 INFO file.FileDataModel: Processed 10000000 lines
14/08/05 10:10:00 INFO file.FileDataModel: Read lines: 10000054
14/08/05 10:10:01 INFO model.GenericDataModel: Processed 10000 users
14/08/05 10:10:01 INFO model.GenericDataModel: Processed 20000 users
14/08/05 10:10:02 INFO model.GenericDataModel: Processed 30000 users
14/08/05 10:10:02 INFO model.GenericDataModel: Processed 40000 users
14/08/05 10:10:02 INFO model.GenericDataModel: Processed 50000 users
14/08/05 10:10:03 INFO model.GenericDataModel: Processed 60000 users
14/08/05 10:10:06 INFO model.GenericDataModel: Processed 69878 users
14/08/05 10:10:08 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 985 tasks in 4 threads
14/08/05 10:10:08 INFO eval.StatsCallable: Average time per recommendation: 116ms
14/08/05 10:10:08 INFO eval.StatsCallable: Approximate memory used: 578MB / 795MB
14/08/05 10:10:08 INFO eval.StatsCallable: Unable to recommend in 0 cases
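One possible cause of the empty list, which the original post does not confirm: the Pearson correlation is undefined when one of the two users gave every co-rated item the same rating, because that user's variance is zero. A user whose similarity to everyone else is undefined would likely end up with an empty neighborhood, leaving a user-based recommender nothing to recommend from. The sketch below is plain Java and independent of Mahout; the class PearsonSketch and the rating arrays are made up for illustration, and whether user 1 in this dataset actually falls into this case has not been checked here.

```java
// Hypothetical, Mahout-independent illustration of the Pearson correlation.
class PearsonSketch {

    // Pearson correlation of two equally long rating arrays.
    // Yields NaN when either array has zero variance, since that is 0.0 / 0.0.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    public static void main(String[] args) {
        // Made-up ratings on four co-rated items.
        double[] allFives = {5, 5, 5, 5};
        double[] varied = {3, 4, 5, 2};
        double[] alsoVaried = {2, 4, 5, 1};

        // Zero variance on one side: the correlation is undefined.
        System.out.println("constant vs varied: " + pearson(allFives, varied));
        // Both sides vary: the correlation is a finite number.
        System.out.println("varied vs varied:   " + pearson(varied, alsoVaried));
    }
}
```

If this does turn out to be the cause, reasonable next checks would be to request recommendations for a different user ID, or to swap in another similarity measure such as Mahout's EuclideanDistanceSimilarity, and see whether the list is still empty.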
Reposted from: https://www.cnblogs.com/jsunday/p/3889947.html