Geospark加载PostgreSQL数据库
title: (七)GeoSpark與PostgreSQL
date: 2021-05-12 16:38:23
tags:
- GeoSpark
PostGreSQL以及Mysql均有空間引擎擴(kuò)展,且GeoSpark是針對(duì)大數(shù)據(jù)的空間分析,而SHP、GeoJson等只是小數(shù)據(jù)集的,PostGIS就能勝任其分析任務(wù)。所以對(duì)于大數(shù)據(jù)集的,我們還是要用空間關(guān)系數(shù)據(jù)庫存儲(chǔ),利用Spark SQL從數(shù)據(jù)庫中加載數(shù)據(jù),獲得DataFrame,然后利用Geospark轉(zhuǎn)為幾何彈性數(shù)據(jù)集RDD。
Geospark加載PostgreSQL數(shù)據(jù)庫
關(guān)于PostgreSQL和PostGIS的介紹,后續(xù)會(huì)有更新,今天還是以Geospark為核心,來介紹如何用GeoSpark加載我們的數(shù)據(jù)庫中的空間數(shù)據(jù)。
假設(shè)我們已經(jīng)安裝了PostGreSQL以及PostGIS擴(kuò)展,并將我們這里的公園數(shù)據(jù)(park)導(dǎo)入到PostGreSQL中。
這里我的Postgresq的連接參數(shù)為:
url=192.168.10.174:5432 table=prks user=postgres password=root首先我們需要加入postgresql的jdbc依賴
<!-- https://mvnrepository.com/artifact/org.postgresql/postgresql --> <dependency><groupId>org.postgresql</groupId><artifactId>postgresql</artifactId><version>42.2.5</version> </dependency>初始化Spark,并注冊(cè)GeosparkSQL
SparkSession spark = SparkSession.builder().config("spark.serializer","org.apache.spark.serializer.KryoSerializer").config("spark.kryo.registrator", "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator").master("local[*]").appName("Learn07").getOrCreate();GeoSparkSQLRegistrator.registerAll(spark); GeoSparkVizRegistrator.registerAll(spark);利用SparkSQL讀取我們導(dǎo)入到postgresql中的數(shù)據(jù)
首先利用SparkSQL讀取我們導(dǎo)入到postgresql中的數(shù)據(jù),在postgis中有一個(gè)geom字段,他里面存放的是wkb格式的幾何信息,對(duì)于sparksql來說,geom字段就是一個(gè)varchar字段,我們還需要使用geospark的相關(guān)函數(shù)來將其轉(zhuǎn)為Geometry。
String url = "jdbc:postgresql://192.168.10.174:5432/geospark"; String table = "parks";Properties connectionProperties = new Properties(); connectionProperties.put("user","postgres"); connectionProperties.put("password","root"); connectionProperties.put("driver","org.postgresql.Driver");Dataset<Row> df = spark.read().jdbc(url, table, connectionProperties); df.createOrReplaceTempView("parks"); df.show(); +---+------+---------+--------------------+--------------+---------------+---------------+-------+---------+----------+--------------------+ |gid|parkid|refparkid| parkname| neighborho| ewstreet| nsstreet|dogpark|washrooms|specialfea| geom| +---+------+---------+--------------------+--------------+---------------+---------------+-------+---------+----------+--------------------+ | 1| 1| -9999| null| Kitsilano| null| null| N| null| null|01060000000100000...| | 2| 2| 208| Rosemary Brown Park| Kitsilano| W 11th Avenue| Vine Street| N| N| N|01060000000100000...| | 3| 3| 141| Tea Swamp Park|Mount Pleasant| E 15th Avenue| Sophia Street| N| N| N|01060000000200000...| | 4| 4| -9999| null| Strathcona| null| null| N| null| null|01060000000200000...| | 5| 5| 202| Morton Park| West End| Morton Avenue| Denman Street| N| N| N|01060000000200000...| | 6| 6| -9999| Mcbride Park| Kitsilano| null| null| N| null| null|01060000000200000...| | 7| 7| -9999| Granville Park| Fairview| null| null| N| null| null|01060000000200000...| | 8| 8| -9999| null|Mount Pleasant| null| null| N| null| null|01060000000300000...| | 9| 9| 15| Creekside Park|Mount Pleasant|Terminal Avenue| Quebec Street| N| N| Y|01060000000200000...| | 10| 10| 134|China Creek South...|Mount Pleasant| E 10th Avenue| Clark Drive| N| N| N|01060000000200000...|利用Geospark將geom列轉(zhuǎn)為Geometry
從postgresql中讀取的geom列為wkb格式的Geometry,因此需要利用Geospark將其轉(zhuǎn)為Geometry。
// 在將DataFrame轉(zhuǎn)為RDD的時(shí)候,Geospark默認(rèn)第一列為Geometry String sql = "select ST_GeomFromWKB(geom) as geom, parkname, parkid from parks"; df = spark.sql(sql); df.show(); +--------------------+--------------------+------+ | geom| parkname|parkid| +--------------------+--------------------+------+ |POLYGON ((-123.15...| null| 1| |POLYGON ((-123.15...| Rosemary Brown Park| 2| |MULTIPOLYGON (((-...| Tea Swamp Park| 3| |MULTIPOLYGON (((-...| null| 4| |MULTIPOLYGON (((-...| Morton Park| 5| |MULTIPOLYGON (((-...| Mcbride Park| 6| |MULTIPOLYGON (((-...| Granville Park| 7| |MULTIPOLYGON (((-...| null| 8| |MULTIPOLYGON (((-...| Creekside Park| 9| |MULTIPOLYGON (((-...|China Creek South...| 10|Adapter靜態(tài)類在DataFrame和RDD之間進(jìn)行轉(zhuǎn)換操作
Geospark提供了Adapter靜態(tài)類,可以在DataFrame和RDD之間進(jìn)行轉(zhuǎn)換操作。
JavaRDD<Geometry> rdd = Adapter.toJavaRdd(df); rdd.foreach((geometry -> {System.out.println(geometry); })); POLYGON ((-123.15566057081632 49.26206733490204, -123.15564728017853 49.26241791476514, -123.15548939905344 49.262415429329856, -123.15550257747702 49.26206484963618, -123.15566057081632 49.26206733490204)) null 1 POLYGON ((-123.15760176703519 49.261936547646954, -123.15718706338478 49.2619299178749, -123.15719832396375 49.26162160945501, -123.15761313807661 49.26162814910161, -123.15760218456263 49.26192530535148, -123.15760176703519 49.261936547646954)) Rosemary Brown Park 2 MULTIPOLYGON (((-123.09870507685639 49.25665872626679, -123.09884321830909 49.25666085651373, -123.09898135977342 49.25666298659548, -123.09918885538166 49.25666622613204, -123.09917717616912 49.25700195924662, -123.09897035961777 49.256998988908855, -123.09883221757528 49.25699703868207, -123.0986933956194 49.25699508887092, -123.09855525341915 49.256993048379265, -123.09856693541536 49.25665659585471, -123.09870507685639 49.25665872626679)), ((-123.0988872811734 49.257388923290726, -123.09874913822517 49.257387062899134, -123.09854232044736 49.257384271663746, -123.09855332287414 49.257047909676935, -123.09876081971025 49.257050970113596, -123.09889896208026 49.25705301035933, -123.0988872811734 49.257388923290726))) Tea Swamp Park 3 MULTIPOLYGON (((-123.08789868681455 49.27697178369337, -123.08779279965931 49.276970245390025, -123.08768827290808 49.27696870595726, -123.0875838595297 49.276967166344015, -123.08747933279125 49.27696562672224, -123.08749381859947 49.2766309730507, -123.0875983449497 49.27663269252672, -123.08770287146653 49.27663450184158, -123.08780728446555 49.27663622121462, -123.08791317138568 49.276638029305445, -123.08789868681455 49.27697178369337)), ((-123.08777557719297 49.27735877134432, -123.08767116251035 49.277356962010295, -123.08756663446789 49.27735515266783, -123.08746210643292 49.2773533432308, -123.08747670583442 49.27701868949804, -123.08758111979606 49.27702049900829, -123.08768564713182 49.277022308338154, -123.0877915350338 49.27702420647329, -123.08777557719297 49.27735877134432))) null 4 MULTIPOLYGON (((-123.14231536153594 49.288085334402886, -123.14152321088636 49.28757269767203, -123.14181135642984 49.28738168560991, -123.1422763290905 49.287682390935885, -123.14231536153594 49.288085334402886)), ((-123.14225003608222 49.28741127411166, -123.14200556426914 49.28725275241468, -123.14221142848969 49.287116250139206, -123.1422217546953 49.287118935432616, -123.14225003608222 49.28741127411166))) Morton Park 5 MULTIPOLYGON (((-123.1815623274226 49.26844720942817, -123.17990316962349 49.26842093595351, -123.17992703362962 49.2677831787414, -123.18158548965377 49.26780936303689, -123.1815623274226 49.26844720942817)), ((-123.17948874889649 49.26841438577292, -123.17783027685374 49.26838907073858, -123.17785348555584 49.267750775367986, -123.17951250489656 49.267776628815625, -123.17948874889649 49.26841438577292))) Mcbride Park 6 MULTIPOLYGON (((-123.14133650443698 49.25886332633381, -123.14136437634508 49.25810730827778, -123.14199539077381 49.25811759634792, -123.14262005884645 49.25812770893823, -123.14262638528903 49.25796357183548, -123.14346603622347 49.25797737135294, -123.14345971282438 49.25814159843223, -123.14343107963043 49.25889761794439, -123.14196594195184 49.25887361649547, -123.14133650443698 49.25886332633381)), ((-123.14578559773607 49.25853754368109, -123.14372265753794 49.25850460441758, -123.1437353757422 49.25816230034241, -123.14579830035647 49.25819478972661, -123.14578559773607 49.25853754368109))) Granville Park 7 MULTIPOLYGON (((-123.08896518268944 49.263513027273426, -123.08895956291931 49.263854420100515, -123.08824281220937 49.263848674370315, -123.08824279938008 49.26384147968065, -123.0882483235879 49.26350728166151, -123.08896518268944 49.263513027273426)), ((-123.08945644895253 49.26424084147778, -123.08923033699166 49.26423876797776, -123.08923596790757 49.26390456985079, -123.08946207752874 49.263906193672696, -123.08945644895253 49.26424084147778)), ((-123.0889541299678 49.264236732613874, -123.08876100016299 49.264235172400404, -123.08876652075031 49.26390088444774, -123.08876650784482 49.26389368975824, -123.08895895645973 49.26389534042054, -123.0889541299678 49.264236732613874))) null 8參考鏈接:https://www.jianshu.com/p/fab6d59e84ed
與50位技術(shù)專家面對(duì)面20年技術(shù)見證,附贈(zèng)技術(shù)全景圖總結(jié)
以上是生活随笔為你收集整理的Geospark加载PostgreSQL数据库的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: SparkSQL操作Hive
- 下一篇: java gui编程:swing创建窗体