GeoMesa: A Getting-Started Handbook for the Spatial Data Storage Engine

Published: 2025/1/21 | Programming Q&A | 豆豆


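GeoMesa: a spatial data storage engine

This handbook covers an introduction to GeoMesa, its architecture, data storage, Spark integration, and related topics.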

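Part 1: Introduction to GeoMesa

GeoMesa is an open-source toolkit, built on distributed computing systems, for querying and analyzing massive volumes of spatio-temporal data.
GeoMesa is designed around the GeoTools API and integrates with GeoServer and similar software to provide OGC-standard services.

It supports several scalable, cloud-based storage back ends, including Apache Accumulo, HBase, Cassandra, and Google Bigtable, as well as Apache Kafka for stream processing.

It also provides Spark support and adds spatial UDTs, UDFs, and UDAFs, so that users can query and analyze spatial data directly with Spark SQL.
Git repository: https://github.com/locationtech/geomesa
Releases: https://github.com/locationtech/geomesa/releases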

Part 2: GeoMesa architecture

1: Module layout

Interface layer
- 6.1. GeoTools Feature Types
- 6.2. Index Overview
- 6.3. Index Basics
- 6.4. Index Versioning
- 6.5. Index Configuration
- 6.6. Runtime Configuration
- 6.7. Query Planning
- 6.8. Explaining Query Plans
- 6.9. Query Properties
- 6.10. Filter Functions
- 6.11. Analytic Querying
- 6.12. Authorizations
- 6.13. Query Auditing
- 6.14. Moving and Migrating Data
- 6.15. Reserved Words
Command line
- Command-Line Tools
Underlying storage
- HBase Data Store
- Accumulo Data Store
- Bigtable Data Store
- Cassandra Data Store
- Kafka Data Store
- Redis Data Store
- FileSystem Data Store
- Kudu Data Store
- Lambda Data Store

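2: Spatio-temporal indexing: R-trees

With the R-tree data structure, a high-dimensional spatial query only needs to traverse the pointers held in a small number of leaf nodes and check whether the data they point to satisfies the query.

Oracle Spatial, MySQL Spatial, and PostgreSQL (PostGIS) all perform spatial searches with R-tree-based indexes, that is, by creating an R-tree index on the spatial field (geometry column).
Problems with R-trees:
- The index has to be created as a separate index file.
- Data updates: to keep the tree balanced, inserting new data may require reorganizing the R-tree.
- The structure is not a good fit for NoSQL stores.
- HBase itself only offers row-key lookups and full table scans. Because the row key is the only index, multi-dimensional queries are difficult.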

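3: GeoHash

1: Introduction

GeoHash is a geocoding method that encodes two-dimensional latitude/longitude coordinates into a single string. It has the following properties:
1. A GeoHash represents both the longitude and the latitude with one string, so a database can index it on a single column.
2. A GeoHash does not denote a point but a region.
3. A prefix of a GeoHash denotes a larger region. For example, the prefix wx4g0e of wx4g0ec1 denotes a larger area that contains the cell wx4g0ec1. This property can be used for nearby-place searches.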
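
Properties 2 and 3 can be checked with a small decoder. The following is a minimal sketch, assuming the commonly used GeoHash alphabet and bit order (longitude bit first); the class and method names are made up for this illustration and are not part of GeoMesa.

```java
import java.util.Arrays;

final class GeoHashBounds {
    // Assumption: the standard GeoHash Base32 alphabet (no a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Returns {minLat, maxLat, minLon, maxLon} of the cell named by a GeoHash string. */
    static double[] decode(String hash) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        boolean lonBit = true; // bits alternate, starting with longitude
        for (char c : hash.toCharArray()) {
            int value = BASE32.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("not a GeoHash character: " + c);
            }
            for (int shift = 4; shift >= 0; shift--) {
                boolean upperHalf = ((value >> shift) & 1) == 1;
                if (lonBit) {
                    double mid = (minLon + maxLon) / 2;
                    if (upperHalf) minLon = mid; else maxLon = mid;
                } else {
                    double mid = (minLat + maxLat) / 2;
                    if (upperHalf) minLat = mid; else maxLat = mid;
                }
                lonBit = !lonBit;
            }
        }
        return new double[] {minLat, maxLat, minLon, maxLon};
    }

    /** True when cell outer fully covers cell inner. */
    static boolean contains(double[] outer, double[] inner) {
        return outer[0] <= inner[0] && inner[1] <= outer[1]
            && outer[2] <= inner[2] && inner[3] <= outer[3];
    }

    public static void main(String[] args) {
        double[] small = decode("wx4g0ec1");
        double[] large = decode("wx4g0e");
        System.out.println("wx4g0ec1 -> " + Arrays.toString(small));
        System.out.println("wx4g0e   -> " + Arrays.toString(large));
        System.out.println("prefix cell contains longer cell: " + contains(large, small));
    }
}
```

Each string decodes to a rectangle rather than a single coordinate, and the rectangle of the shorter prefix encloses the rectangle of the longer code.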

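2: How a GeoHash is computed:

1. Convert the latitude and longitude to binary:

Take the point (39.923201, 116.390705), given as (latitude, longitude). Latitude ranges over (-90, 90), whose midpoint is 0. The latitude 39.923201 falls in the upper half (0, 90), which yields a 1. The midpoint of (0, 90) is 45, and 39.923201 is less than 45, which yields a 0. Repeating this bisection produces the binary representation of the latitude.

The resulting binary representation of the latitude is: 10111000110001111001
In the same way, the longitude 116.390705 over the range (-180, 180) becomes: 11010010110001000100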
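
The bisection described above can be sketched in a few lines. This is an illustrative sketch only; the class and method names are assumptions made for this example.

```java
final class BisectBits {
    /** Repeatedly halves [min, max], emitting 1 for the upper half and 0 for the lower half. */
    static String toBits(double value, double min, double max, int bits) {
        StringBuilder sb = new StringBuilder(bits);
        for (int i = 0; i < bits; i++) {
            double mid = (min + max) / 2;
            if (value >= mid) {
                sb.append('1');
                min = mid; // keep the upper half
            } else {
                sb.append('0');
                max = mid; // keep the lower half
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("lat: " + toBits(39.923201, -90, 90, 20));
        System.out.println("lon: " + toBits(116.390705, -180, 180, 20));
    }
}
```

With 20 bits per coordinate, the two calls in main reproduce the latitude and longitude bit strings given in the text.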

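2. Merge the latitude and longitude bits:

The two bit strings are interleaved: longitude bits take the even positions and latitude bits the odd positions (counting from 0), which gives: 1110011101001000111100000011010101100001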
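
The interleaving step can be written out as follows. This is a minimal sketch that works on bit strings for readability; the names are hypothetical, and a production implementation would more likely use integer bit operations.

```java
final class InterleaveBits {
    /** Alternates bits, longitude first: lon[0], lat[0], lon[1], lat[1], ... */
    static String interleave(String lonBits, String latBits) {
        if (lonBits.length() != latBits.length()) {
            throw new IllegalArgumentException("bit strings must have the same length");
        }
        StringBuilder sb = new StringBuilder(lonBits.length() * 2);
        for (int i = 0; i < lonBits.length(); i++) {
            sb.append(lonBits.charAt(i)).append(latBits.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lon = "11010010110001000100";
        String lat = "10111000110001111001";
        System.out.println(interleave(lon, lat));
    }
}
```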

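As shown in the figure below:

3. Encode with Base32:

Base32 encoding table (one of several variants): the GeoHash variant uses the digits 0-9 and the lowercase letters b-z without i, l, and o, so the value 0 maps to "0" and the value 31 maps to "z".

Splitting the merged bit string into 5-bit groups and encoding each group gives: wx4g0ec1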
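
The Base32 step can be sketched as below, assuming the standard GeoHash alphabet; the class name is made up for this example.

```java
final class GeoHashBase32 {
    // Assumption: the standard GeoHash alphabet, indexed by the 5-bit value 0..31.
    private static final String ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Maps each group of 5 bits to one character of the GeoHash alphabet. */
    static String encode(String bits) {
        if (bits.length() % 5 != 0) {
            throw new IllegalArgumentException("bit count must be a multiple of 5");
        }
        StringBuilder sb = new StringBuilder(bits.length() / 5);
        for (int i = 0; i < bits.length(); i += 5) {
            int value = Integer.parseInt(bits.substring(i, i + 5), 2);
            sb.append(ALPHABET.charAt(value));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("1110011101001000111100000011010101100001"));
    }
}
```

For the 40-bit string from the previous step, the groups are 28, 29, 4, 15, 0, 13, 11, 1, which map to w, x, 4, g, 0, e, c, 1.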

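3: Characteristics:

The longer the string, the smaller and more precise the region it denotes; the shorter the string, the larger and coarser the region.
Strings that share a longer common prefix generally denote locations that are closer together. The converse does not always hold: two nearby points on either side of a cell boundary may share only a short prefix.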
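
To make the relation between string length and precision concrete, the cell size can be computed from the number of bits. This is a rough sketch, assuming 5 bits per character with longitude receiving the extra bit when the total is odd; the names are hypothetical.

```java
final class GeoHashPrecision {
    /** Returns {latSpanDegrees, lonSpanDegrees} of a GeoHash cell with the given string length. */
    static double[] cellSize(int length) {
        int totalBits = 5 * length;
        int lonBits = (totalBits + 1) / 2; // longitude comes first, so it gets the odd bit
        int latBits = totalBits / 2;
        return new double[] {180.0 / Math.pow(2, latBits), 360.0 / Math.pow(2, lonBits)};
    }

    public static void main(String[] args) {
        for (int length = 1; length <= 8; length++) {
            double[] size = cellSize(length);
            System.out.println(length + " chars: " + size[0] + " deg lat x " + size[1] + " deg lon");
        }
    }
}
```

Each extra character divides the cell area by 32, which is why a short prefix covers a wide area and an 8-character code covers only a fraction of a city block.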

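4: GeoMesa spatio-temporal indexing

1: Space-filling curves

2: Spatial queries

The user defines a query window

Hierarchical subdivision

Compute the query ranges (Range)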
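
The last step, turning a query window into ranges on the curve, can be illustrated with a deliberately simplified sketch. It swaps the hierarchical subdivision named above for brute-force enumeration of grid cells at one fixed resolution, which is only practical for small windows. All names are assumptions made for this example, and this is not GeoMesa's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ZRangeSketch {
    /** Z-order (Morton) value: x bits go to even positions, y bits to odd positions. */
    static long z2(int x, int y) {
        long z = 0;
        for (int i = 0; i < 16; i++) {
            z |= ((long) (x >> i) & 1L) << (2 * i);
            z |= ((long) (y >> i) & 1L) << (2 * i + 1);
        }
        return z;
    }

    /** Covers the cell window [xMin..xMax] x [yMin..yMax] with sorted, merged ranges {start, end}. */
    static List<long[]> ranges(int xMin, int xMax, int yMin, int yMax) {
        List<Long> zs = new ArrayList<>();
        for (int x = xMin; x <= xMax; x++) {
            for (int y = yMin; y <= yMax; y++) {
                zs.add(z2(x, y));
            }
        }
        Collections.sort(zs);
        List<long[]> out = new ArrayList<>();
        for (long z : zs) {
            if (!out.isEmpty() && out.get(out.size() - 1)[1] + 1 == z) {
                out.get(out.size() - 1)[1] = z; // extend the current contiguous range
            } else {
                out.add(new long[] {z, z});     // start a new range
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (long[] r : ranges(1, 2, 0, 0)) {
            System.out.println("[" + r[0] + ", " + r[1] + "]");
        }
    }
}
```

A window aligned with the curve collapses into one range (cells 2..3 by 2..3 become the single range 12 to 15), while a window that straddles a quadrant boundary splits into several ranges, each of which becomes one scan on the underlying store.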

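3: GeoMesa spatio-temporal indexes

GeoMesa uses GeoHash-style spatial indexing built on the Z-order space-filling curve and extends it to the time dimension. The index types are:
- Z2: spatial index for points
- Z3: spatio-temporal index for points
- XZ2: spatial index for lines and polygons
- XZ3: spatio-temporal index for lines and polygons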
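
The idea behind a time-aware point index such as Z3 can be sketched by interleaving three dimensions instead of two. The sketch below assumes 21 bits per dimension and a one-week time bin; both values, and all names, are assumptions chosen for illustration and are not taken from the GeoMesa source.

```java
final class Z3Sketch {
    private static final int BITS = 21; // 3 x 21 = 63 bits, which fits in a signed long
    private static final long WEEK_MILLIS = 7L * 24 * 60 * 60 * 1000;

    /** Scales value from [min, max] to an integer cell index in [0, 2^BITS - 1]. */
    static long normalize(double value, double min, double max) {
        long cells = 1L << BITS;
        long n = (long) ((value - min) / (max - min) * cells);
        return Math.min(Math.max(n, 0), cells - 1);
    }

    /** Interleaves the low BITS bits of x, y and t into one value. */
    static long interleave3(long x, long y, long t) {
        long z = 0;
        for (int i = 0; i < BITS; i++) {
            z |= ((x >> i) & 1L) << (3 * i);
            z |= ((y >> i) & 1L) << (3 * i + 1);
            z |= ((t >> i) & 1L) << (3 * i + 2);
        }
        return z;
    }

    /** Returns {timeBin, z}: the week number since the epoch and the curve value within that week. */
    static long[] index(double lon, double lat, long epochMillis) {
        long bin = Math.floorDiv(epochMillis, WEEK_MILLIS);
        long offset = Math.floorMod(epochMillis, WEEK_MILLIS);
        long z = interleave3(
            normalize(lon, -180, 180),
            normalize(lat, -90, 90),
            normalize(offset, 0, WEEK_MILLIS));
        return new long[] {bin, z};
    }

    public static void main(String[] args) {
        long[] key = index(116.390705, 39.923201, 1_600_000_000_000L);
        System.out.println("bin=" + key[0] + " z=" + key[1]);
    }
}
```

Splitting time into a coarse bin plus an offset inside the bin keeps the curve resolution constant no matter how long the overall time span of the data is.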

Implementations on GitHub:
- https://github.com/locationtech/sfcurve
- https://github.com/locationtech/geomesa/tree/master/geomesa-z3

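5: GeoMesa HBase indexes

RowKey design

Attribute index
Z-Index Shards: pre-splitting of the table; the number of shards ranges from 1 to 127 and defaults to 4.
Z-Index Time Interval: the time period used to bin data in the time-aware indexes.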
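
How a shard prefix and a time bin might fit into a row key can be sketched as follows. The byte layout (1-byte shard, 2-byte time bin, 8-byte curve value, then the feature ID) and the hash-based shard choice are assumptions made for illustration; the original post's RowKey figure is missing, so check the GeoMesa documentation for the real layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowKeySketch {
    /** Builds a hypothetical row key: [shard][timeBin][z][featureId bytes]. */
    static byte[] rowKey(int numShards, short timeBin, long z, String featureId) {
        if (numShards < 1 || numShards > 127) {
            throw new IllegalArgumentException("shards must be in 1..127");
        }
        // Assumption: spread rows over shards by hashing the feature ID.
        byte shard = (byte) Math.floorMod(featureId.hashCode(), numShards);
        byte[] id = featureId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 2 + 8 + id.length)
            .put(shard)
            .putShort(timeBin)
            .putLong(z)
            .put(id)
            .array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(4, (short) 2645, 12345L, "fid-1");
        System.out.println("shard=" + key[0] + " length=" + key.length);
    }
}
```

The leading shard byte spreads writes with similar curve values across several regions, at the cost of issuing one scan per shard when querying.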

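Queries:

- Queries can filter by attribute.
- Queries can filter by spatial extent.
- Queries can filter by time.
- GeoMesa weighs the available indexes and executes the query with the strategy it estimates to be fastest.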

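Part 3: GeoMesa data storage

1: Data structures

SimpleFeatureType: describes the structure of spatial data, including the geometry (representable as WKT), time information, and attribute information.

    import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes;

    // "*" marks geom as the default geometry; srid=4326 sets its coordinate reference system
    SimpleFeatureTypes.createType("example", "name:String,dtg:Date,*geom:Point:srid=4326");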
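
To show how the specification string above is structured, here is a toy parser. It is a hypothetical illustration and not GeoMesa's parser: it only handles the simple name:Type[:option] form and would break on types that contain commas, such as Map[A,B].

```java
import java.util.ArrayList;
import java.util.List;

final class SpecSketch {
    /** One parsed attribute of the specification string. */
    static final class Attribute {
        final String name;
        final String type;
        final boolean defaultGeometry;
        final List<String> options;

        Attribute(String name, String type, boolean defaultGeometry, List<String> options) {
            this.name = name;
            this.type = type;
            this.defaultGeometry = defaultGeometry;
            this.options = options;
        }
    }

    /** Splits "name:Type[:option...]" entries separated by commas. */
    static List<Attribute> parse(String spec) {
        List<Attribute> out = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] fields = part.trim().split(":");
            if (fields.length < 2) {
                throw new IllegalArgumentException("expected name:Type in '" + part + "'");
            }
            boolean isDefault = fields[0].startsWith("*"); // leading * = default geometry
            String name = isDefault ? fields[0].substring(1) : fields[0];
            List<String> options = new ArrayList<>();
            for (int i = 2; i < fields.length; i++) {
                options.add(fields[i]);
            }
            out.add(new Attribute(name, fields[1], isDefault, options));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Attribute a : parse("name:String,dtg:Date,*geom:Point:srid=4326")) {
            System.out.println(a.name + " -> " + a.type
                + (a.defaultGeometry ? " (default geometry)" : "") + " " + a.options);
        }
    }
}
```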

Attribute types

Attribute Type	Binding	Indexable
String	java.lang.String	Yes
Integer	java.lang.Integer	Yes
Double	java.lang.Double	Yes
Long	java.lang.Long	Yes
Float	java.lang.Float	Yes
Boolean	java.lang.Boolean	Yes
UUID	java.util.UUID	Yes
Date	java.util.Date	Yes
Timestamp	java.sql.Timestamp	Yes
Point	org.locationtech.jts.geom.Point	Yes
LineString	org.locationtech.jts.geom.LineString	Yes
Polygon	org.locationtech.jts.geom.Polygon	Yes
MultiPoint	org.locationtech.jts.geom.MultiPoint	Yes
MultiLineString	org.locationtech.jts.geom.MultiLineString	Yes
MultiPolygon	org.locationtech.jts.geom.MultiPolygon	Yes
GeometryCollection	org.locationtech.jts.geom.GeometryCollection	Yes
Geometry	org.locationtech.jts.geom.Geometry	Yes
List[A]	java.util.List<A>	Yes
Map[A,B]	java.util.Map<A, B>	No
Bytes	byte[]	No

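Part 4: GeoMesa HBase in practice

1: GeoMesa HBase operations

2: Integrating GeoServer with GeoMesa

GeoMesa implements the GeoTools interfaces, so once it is plugged into GeoServer the data can be accessed over HTTP through the standard OGC services:

- Web Feature Service (WFS)
- Web Map Service (WMS)
- Web Processing Service (WPS)
- Web Coverage Service (WCS)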
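
As an illustration of HTTP access, the sketch below assembles a WFS 2.0.0 GetFeature request with a bounding box. The host, workspace, and layer name are hypothetical. The bbox is tagged with the short form EPSG:4326 on the assumption that the server then reads it in longitude/latitude order; axis order differs between servers and CRS notations, so verify it against your GeoServer setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class WfsRequestSketch {
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds a GetFeature URL for one layer, limited to a bounding box. */
    static String getFeatureUrl(String baseUrl, String typeName,
                                double minX, double minY, double maxX, double maxY) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("service", "WFS");
        params.put("version", "2.0.0");
        params.put("request", "GetFeature");
        params.put("typeNames", typeName);
        params.put("bbox", minX + "," + minY + "," + maxX + "," + maxY + ",EPSG:4326");
        StringBuilder sb = new StringBuilder(baseUrl).append('?');
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.charAt(sb.length() - 1) != '?') {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and layer name.
        System.out.println(getFeatureUrl(
            "http://localhost:8080/geoserver/wfs", "geomesa:example", 0.0, 0.0, 90.0, 90.0));
    }
}
```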

Data update flow

Data query flow

Part 5: GeoMesa Spark

1: geomesa-spark-jts: a library that brings JTS geometry types and functions to Spark.

Dependency:

    <dependency>
      <groupId>org.locationtech.geomesa</groupId>
      <artifactId>geomesa-spark-jts_2.11</artifactId>
    </dependency>

Reading from the file system

    import org.locationtech.jts.geom._
    import org.apache.spark.sql.types._
    import org.locationtech.geomesa.spark.jts._
    // Assumes `spark` is an existing SparkSession with the JTS types registered
    import spark.implicits._

    // Schema of the input file
    val schema = StructType(Array(
      StructField("name", StringType, nullable = false),
      StructField("pointText", StringType, nullable = false),
      StructField("polygonText", StringType, nullable = false),
      StructField("latitude", DoubleType, nullable = false),
      StructField("longitude", DoubleType, nullable = false)))

    val dataFile = this.getClass.getClassLoader.getResource("jts-example.csv").getPath
    val df = spark.read
      .schema(schema)
      .option("sep", "-")
      .option("timestampFormat", "yyyy/MM/dd HH:mm:ss ZZ")
      .csv(dataFile)

    // Build geometry columns; st_makePoint takes (x, y), so longitude comes first
    val alteredDF = df
      .withColumn("polygon", st_polygonFromText($"polygonText"))
      .withColumn("point", st_makePoint($"longitude", $"latitude"))

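Building a DataFrame

    import org.locationtech.jts.geom.{Coordinate, GeometryFactory}
    import org.locationtech.geomesa.spark.jts._
    import spark.implicits._

    // Create a JTS point and wrap it in a single-column DataFrame
    val point = new GeometryFactory().createPoint(new Coordinate(3.4, 5.6))
    val df = Seq(point).toDF("point")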

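2: geomesa-spark-core: the core Spark library, built on GeoTools, which exposes spatial data as RDDs.

It works with SimpleFeatures and supports multiple back ends: Accumulo, HBase, FileSystem, Kudu, files converted with the GeoMesa Convert library, and other GeoTools data stores.

3: geomesa-spark-sql: a spatial query library built on Spark SQL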

    import org.apache.spark.sql.SparkSession

    // DataStore params to a hypothetical GeoMesa Accumulo table
    val dsParams = Map(
      "accumulo.instance.id"   -> "instance",
      "accumulo.zookeepers"    -> "zoo1,zoo2,zoo3",
      "accumulo.user"          -> "user",
      "accumulo.password"      -> "*****",
      "accumulo.catalog"       -> "geomesa_catalog",
      "geomesa.security.auths" -> "USER,ADMIN")

    // Create SparkSession
    val sparkSession = SparkSession.builder()
      .appName("testSpark")
      .config("spark.sql.crossJoin.enabled", "true")
      .master("local[*]")
      .getOrCreate()

    // Create DataFrame using the "geomesa" format
    val dataFrame = sparkSession.read
      .format("geomesa")
      .options(dsParams)
      .option("geomesa.feature", "chicago")
      .load()
    dataFrame.createOrReplaceTempView("chicago")

    // Query against the "chicago" schema
    val sqlQuery = "select * from chicago where st_contains(st_makeBBOX(0.0, 0.0, 90.0, 90.0), geom)"
    val resultDataFrame = sparkSession.sql(sqlQuery)

    resultDataFrame.show
    /*
    +-------+------+-----------+--------------------+-----------------+
    |__fid__|arrest|case_number|                 dtg|             geom|
    +-------+------+-----------+--------------------+-----------------+
    |      4|  true|          4|2016-01-04 00:00:...|POINT (76.5 38.5)|
    |      5|  true|          5|2016-01-05 00:00:...|    POINT (77 38)|
    |      6|  true|          6|2016-01-06 00:00:...|    POINT (78 39)|
    |      7|  true|          7|2016-01-07 00:00:...|    POINT (20 20)|
    |      9|  true|          9|2016-01-09 00:00:...|    POINT (50 50)|
    +-------+------+-----------+--------------------+-----------------+
    */

Part 6: Cassandra in practice

1: Cassandra config

spring.data.cassandra.cluster-name=Test Cluster
spring.data.cassandra.keyspace-name=user_space
spring.data.cassandra.contact-points=127.0.0.1
spring.data.cassandra.port=9042

2: Run the test

java -cp geomesa-tutorials-cassandra/geomesa-tutorials-cassandra-quickstart/target/geomesa-tutorials-cassandra-quickstart-2.4.0-SNAPSHOT.jar org.geomesa.example.cassandra.CassandraQuickStart --cassandra.contact.point 127.0.0.1:9042 --cassandra.keyspace geomesa --cassandra.catalog sample_table

3: Export the output

bin/geomesa-cassandra export --output-format leaflet --contact-point 127.0.0.1:9042 --key-space geomesa --catalog sample_table

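Part 7: Other applications

1: GeoMesa Kafka DataStore

- Uses Kafka as the data store.
- Is accessed through the standard GeoTools DataStore interface.
- Consumers and producers can run on different servers.
- Supports feature caching, with periodic writes to Kafka.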


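2: GeoMesa Lambda DataStore

- Data is stored in two tiers: a transient tier (Kafka) and a persistent tier (HBase).
- Data is periodically written from the transient tier to the persistent tier.
- ZooKeeper (ZK) synchronizes the cache state so that each piece of data is written only once.
- A query runs against both tiers separately, and the results are merged before being returned to the user.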
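
The merge step of such a two-tier query can be sketched with plain maps keyed by feature ID. This is a simplified illustration with hypothetical names; it assumes that when both tiers hold the same feature ID, the transient tier holds the newer version and therefore wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LambdaMergeSketch {
    /** Merges query results from both tiers, keyed by feature ID. */
    static Map<String, String> merge(Map<String, String> persistentTier,
                                     Map<String, String> transientTier) {
        Map<String, String> merged = new LinkedHashMap<>(persistentTier);
        // Assumption: the transient tier carries the most recent update for an ID.
        merged.putAll(transientTier);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> persistent = new LinkedHashMap<>();
        persistent.put("f1", "POINT (1 1)");
        persistent.put("f2", "POINT (2 2)");
        Map<String, String> recent = new LinkedHashMap<>();
        recent.put("f2", "POINT (2.5 2.5)"); // newer position of f2, not yet persisted
        recent.put("f3", "POINT (3 3)");
        System.out.println(merge(persistent, recent));
    }
}
```

Keying the merge on the feature ID is what keeps a feature that exists in both tiers from appearing twice in the result.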
