The First Spark Program
1. Setting up a Spark development environment for Java (from http://www.cnblogs.com/eczhou/p/5216918.html)
1.1 Installing the JDK
Install Oracle's JDK; I installed JDK 1.7. After installation, create a new system environment variable JAVA_HOME with the value "C:\Program Files\Java\jdk1.7.0_79" (adjust to your own install path).
Also add C:\Program Files\Java\jdk1.7.0_79\bin and C:\Program Files\Java\jre7\bin to the system Path variable.
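To verify that the JDK is installed and JAVA_HOME is visible to programs, a quick throwaway check can be compiled and run (a minimal sketch; the class name is arbitrary):

```java
public class CheckJava {
    public static void main(String[] args) {
        // The runtime's version string, e.g. "1.7.0_79"
        System.out.println("java.version = " + System.getProperty("java.version"));
        // JAVA_HOME as seen by this process; prints "null" if the variable is not set
        System.out.println("JAVA_HOME = " + System.getenv("JAVA_HOME"));
    }
}
```

Running `java -version` at a command prompt gives the same version check without compiling anything.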
1.2 Configuring the Spark environment variables
From http://spark.apache.org/downloads.html, download the build matching your Hadoop version. I downloaded spark-1.6.0-bin-hadoop2.6.tgz: Spark 1.6, built against Hadoop 2.6.
Extract the downloaded archive; assume the extraction directory is D:\spark-1.6.0-bin-hadoop2.6. Add D:\spark-1.6.0-bin-hadoop2.6\bin to the system Path variable, and create a new SPARK_HOME variable with the value D:\spark-1.6.0-bin-hadoop2.6.
1.3 Installing the Hadoop utilities
Spark runs on top of Hadoop and calls into Hadoop libraries at runtime. Without a Hadoop runtime configured, it prints error messages at startup (on Windows, typically about missing Hadoop binaries such as winutils.exe); these do not prevent the program from running, but it is still worth configuring the Hadoop libraries.
1.3.1 Download a prebuilt Hadoop 2.6 package from https://www.barik.net/archive/2015/01/19/172716/. I downloaded hadoop-2.6.0.tar.gz.
1.3.2 Extract the archive and add its library directory D:\hadoop-2.6.0\bin to the system Path variable; also create a new HADOOP_HOME variable with the value D:\hadoop-2.6.0.
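With all three variables in place, the setup above can be sanity-checked from code. This is a minimal sketch (the class name and the `describe` helper are made up for illustration, not part of any Spark or Hadoop API):

```java
public class CheckEnv {
    // Returns a human-readable status line for one environment variable
    static String describe(String name, String value) {
        return value == null ? name + " is NOT set" : name + " = " + value;
    }

    public static void main(String[] args) {
        // Report each variable the tutorial configures; a "NOT set" line
        // means that step still needs to be done (or a restart is needed
        // so the new variable is picked up)
        for (String name : new String[] {"JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"}) {
            System.out.println(describe(name, System.getenv(name)));
        }
    }
}
```

Note that on Windows, programs launched before an environment variable was created do not see it; reopen the command prompt or IDE after editing the variables.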
1.4 Eclipse setup
Simply create a new Java project and add D:\spark-1.6.0-bin-hadoop2.6\lib\spark-assembly-1.6.0-hadoop2.6.0.jar to the project's build path.
2. A Spark WordCount program in Java
package cn.spark.study;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {

        // Create the SparkConf object and apply the necessary configuration
        SparkConf conf = new SparkConf()
                .setAppName("WordCount").setMaster("local");

        // Create the context object from the conf
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Create the initial RDD from the input file
        JavaRDD<String> lines = sc.textFile("D://spark.txt");

        // ---- Apply transformation operators to the RDD ----
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Iterable<String> call(String line) throws Exception {
                // Split each line on spaces, emitting one element per word
                return Arrays.asList(line.split(" "));
            }
        });

        JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Tuple2<String, Integer> call(String word) throws Exception {
                // Map each word to a (word, 1) pair
                return new Tuple2<String, Integer>(word, 1);
            }
        });

        JavaPairRDD<String, Integer> wordCounts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
            private static final long serialVersionUID = 1L;
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                // Sum the counts for each key
                return v1 + v2;
            }
        });

        // ---- Trigger the job with an action operator ----
        wordCounts.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            private static final long serialVersionUID = 1L;
            @Override
            public void call(Tuple2<String, Integer> wordCount) throws Exception {
                System.out.println(wordCount._1 + " appeared " + wordCount._2 + " times");
            }
        });

        sc.close();
    }
}
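For readers new to the RDD operators, the pipeline above computes the same result as the following plain-Java sketch (no Spark required): splitting each line on spaces corresponds to the flatMap step, and accumulating a per-word total corresponds to the mapToPair + reduceByKey steps. The class name and sample input are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalWordCount {
    // Computes word counts the same way the Spark pipeline does:
    // split each line on spaces (flatMap), treat every word as a
    // (word, 1) pair (mapToPair), then sum the values per key (reduceByKey)
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                Integer v = counts.get(word);
                counts.put(word, v == null ? 1 : v + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("hello world", "hello spark"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " appeared " + e.getValue() + " times");
        }
    }
}
```

The difference in Spark is that the lines are partitioned across workers and the per-key sums are merged across partitions, which is why reduceByKey takes an associative function rather than mutating a shared map.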
Reprinted from: https://www.cnblogs.com/key1309/p/5303557.html