Automating GIS and Remote Sensing Workflows with Open Python Libraries
Over my career I’ve worked on many geospatial projects using the ArcGIS platform, which I absolutely love. That means I get to consult on projects with cutting-edge geospatial technologies, like Multidimensional Rasters, Deep Learning, and spatial IoT automation. With that in mind, I always try to keep track of how to perform the same operations I’m working on without Esri’s beautifully crafted systems.
Over the last few weekends of an exceptionally tedious quarantine, I worked on this little script that reproduces something I've been developing with ESRI's Living Atlas to achieve NDVI Zonal Statistics (Normalized Difference Vegetation Index, a classic in Remote Sensing) for rural land plots.
The plan here was to perform an entire geoprocessing and remote sensing routine without having to resort to any GIS Desktop software. We're starting out with a layer that contains some parcels in which we're interested and a layer with protected areas where special legislation applies. Nothing else is allowed outside Python! This is our plan du jour.
Let's get started
So. First things first. We have to import the geojson file that has the land plots stored in it. For that, we'll be using the geopandas library. If you're not familiar with it, my advice is to get acquainted with it. Geopandas basically emulates the functions we've been using in classic GIS Desktop software (GRASS, GDAL, ArcGIS, QGIS…) in Python, in a way consistent with pandas (a very popular tool among data scientists), to allow spatial operations on geometric types.
In order to import our geojson, the first thing we must note are the data types in geopandas. Basically, when we run the .read_file method, we assign a geodataframe type to the polygons variable. Inside every geodataframe there will always be a geoseries, which we can access via the .geometry attribute. Once we find the geoseries, we can make use of the .is_valid attribute, which produces a list of True/False values for each record in our series.
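Something along these lines does the trick (a minimal sketch; the file name polygons.geojson is a placeholder for the land-plots file):

```python
import geopandas as gpd

# Read the land plots into a GeoDataFrame (file name is a placeholder)
polygons = gpd.read_file("polygons.geojson")

# The geometry column is a GeoSeries; is_valid flags each record as valid or not
print(polygons.geometry.is_valid)
```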
And of course there are invalid geometries hanging around in our dataset. It comes as no surprise, since those land plots came from the CAR Registry, where every rural land owner in Brazil has to self-declare the extent of their own properties.
So, how do we fix that? Maybe you're used to running the excellent invalid-geometry checking tools present in ArcGIS or QGIS, which even generate a report on what the problem was with each record in a table. But we don't have access to those in geopandas. Instead, we'll do a little trick to correct the geometries, by applying a 0 meter buffer to all of them.
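A minimal sketch of that trick:

```python
# A zero-width buffer rebuilds each geometry, which fixes most self-intersections
polygons["geometry"] = polygons.geometry.buffer(0)

# Confirm nothing is flagged as invalid anymore
print(polygons.geometry.is_valid.all())
```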
And now we’ll finally get to take a look at our polygons, by using the .plot method, which is actually inherited from the matplotlib components in geopandas.
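For example:

```python
# Quick visual check of the (now valid) land plots
polygons.plot(figsize=(10, 10), edgecolor="black")
```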
This is a fast and useful way to get a quick notion of what our data looks like spatially, but it's not the same as a map.
Calculating areas
Since we want to know the area of each of our land plots in hectares, we need to make sure our geodataframe is in a Projected Coordinate System. If you're not familiar with coordinate reference systems (CRS), the bare minimum you need to know is that Geographic Coordinate Systems operate in degrees (like latitude and longitude) and Projected Coordinate Systems operate in distances, enabling us to calculate areas using the metric system. From the plot we just created we can see that the polygons are probably drawn according to latitude and longitude, somewhere around -15.7°, -47.7°. If we run print(polygons.crs) we'll get epsg:4326 as the response. There are many coordinate systems available, so the EPSG system is a very good way to keep track of them; 4326 means our geodataframe is in the WGS84 Coordinate System, which is indeed a Geographic Coordinate System.
So we do need to transform the coordinate system of our geodataframe. To do that, we must first decide which system we'll transform it into. At this point I'm very used to converting from one system to another, but if you're a bit lost, a good way to choose a projected coordinate system is simply to use the Universal Transverse Mercator (UTM) projection of the WGS84 system. So all you have to do is find out which UTM zone your data falls in; that's the zone where your data will have the least area distortion. A good way to do that is this little web application.
We knew our data was somewhere near the [-15.7°, -47.7°] coordinates, so now we know that is equivalent to the 23S UTM zone, and it's right by the city of Brasília!

So all that's left is visiting the EPSG Authority website and checking the EPSG code for our chosen projection. Next, we need to define a new geodataframe variable using the .to_crs method.
We can now finally create a new column in that new geodataframe, which I named area_ha, and calculate the area (in square meters, since we're using a UTM projection), which we'll then have to divide by 10,000 to get hectares.
This is the head of our new geodataframe. The values that now populate the area_ha field do seem to be in the right scale.

Checking for overlays
We can now import the second layer we were provided with, the one with all the protected areas (UCs, or Unidades de Conservação) that might represent legal restrictions to agricultural use in the land plots we're analysing. We'll import it just like we did with the previous geojson. The main difference this time is that this piece of data actually has a lot more fields, with the name of the protected area as well as the date of creation and more legal stuff.
A much richer dataset, displayed with geopandas' .head method.

We're interested in getting all that rich information into the attribute table of the land plots intersecting a given protected area. Geopandas actually makes that very easy to do, with the overlay method. It does literally exactly what we're looking for. In our case, it will create a new geodataframe with one record for each intersecting part of each land plot, carrying the information about the protected area it intersected. With that we can calculate a new field for the area of intersection in hectares, just like we did before.
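Something like this (the protected-areas file name is a placeholder):

```python
# Protected areas layer, reprojected to match the land plots
ucs = gpd.read_file("protected_areas.geojson").to_crs(polygons_utm.crs)

# One record per intersected piece, with attributes from both layers
intersections = gpd.overlay(polygons_utm, ucs, how="intersection")

# Area of each intersected piece, in hectares
intersections["area_ha"] = intersections.geometry.area / 10_000
intersections.plot(figsize=(10, 10))
```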
It looks like the same plot from before, but now our land plots have been broken into smaller parts if they intersect any protected areas. Plus they contain a lot more useful information now.

Now for the remote sensing bit
We're done dealing with the GIS bit of this article, and now things get interesting. We want to get a good sense of how the vegetation is doing in each land plot, so we'll try to get the latest (free) satellite imagery available and run some analyses with it. There are many options for free satellite imagery, like the Landsat family, but ESA's Sentinel-2 mission provides free imagery at a decent spatial scale (20x20m pixels).
To access that, there's more than one library we could use. The Sentinel Hub is a solid one; ArcGIS Living Atlas is another. But my personal choice is Google Earth Engine, for two simple reasons: with small changes we can access data from both Landsat and Sentinel with the same code, plus it's (still) free to use (though let us not forget what happened with the Google Maps API). The real downside here is that GEE's Python API is poorly documented.
What we want to do is fairly simple. We have to call the Sentinel 2 Image Collection from GEE, filter it by area of interest, date and cloud percentage, and get the latest five images in the collection. In order to get the area of interest, we have to run one more command with geopandas, the .total_bounds attribute, which will give us the coordinates of a rectangle that contains the full extent of our land plots.
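A sketch of that query, assuming the Earth Engine Python API is installed and authenticated; the date range and cloud threshold below are illustrative:

```python
import ee

ee.Initialize()

# Bounding box of the land plots, in lon/lat (EPSG:4326), as GEE expects
xmin, ymin, xmax, ymax = polygons.total_bounds
aoi = ee.Geometry.Rectangle([xmin, ymin, xmax, ymax])

# Sentinel-2 collection filtered by area, date and cloud cover, newest first
collection = (
    ee.ImageCollection("COPERNICUS/S2")
    .filterBounds(aoi)
    .filterDate("2020-01-01", "2020-06-30")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 10))
    .sort("system:time_start", False)
)

# Keep the five most recent scenes
collectionList = collection.toList(5)
```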
We have taken those five latest images and stored them in our collectionList variable. But let's have a look to make sure the satellite imagery actually corresponds to our expectations. To do that, let's use Earth Engine's .Image, set it to an RGB composition (in Sentinel 2, that means using bands 4, 3 and 2) and plot it using IPython's .display.
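A minimal sketch; the min/max stretch values are just a reasonable guess for Sentinel 2 imagery:

```python
from IPython.display import Image

# The list is sorted newest-first, so index 0 is the most recent scene
latest = ee.Image(collectionList.get(0))

# True-colour composite: bands 4 (red), 3 (green) and 2 (blue)
thumb_url = latest.getThumbUrl({
    "bands": ["B4", "B3", "B2"],
    "min": 0,
    "max": 3000,
    "dimensions": 512,
    "region": aoi,
})
Image(url=thumb_url)
```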
Looking good, Brasília.

Calculating the NDVI
This bit is actually pretty simple. GEE makes it very easy to calculate normalized differences, which is precisely what NDVI is:
The normalized difference vegetation index (NDVI) is a simple graphical indicator that assesses whether or not the target being observed contains live green vegetation. Healthy vegetation (chlorophyll) reflects more near-infrared (NIR) and green light compared to other wavelengths. But it absorbs more red and blue light.
NDVI quantifies vegetation by measuring the difference between near-infrared (which vegetation strongly reflects) and red light (which vegetation absorbs). Satellite sensors like Landsat and Sentinel-2 both have the necessary bands with NIR and red.
The NDVI formula, or the normalized difference between near-infrared and red light.

In GEE, we just need to use the .normalizedDifference method with the right bands. In our case, it's bands 8 (NIR) and 4 (red). Normally, that would be it, but I'm an old-fashioned GIS geek, so I need to have a look at my data in a map to make sure it went ok.
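Since NDVI = (NIR − Red) / (NIR + Red), the call is a one-liner per image; a sketch:

```python
# NDVI = (NIR - Red) / (NIR + Red); for Sentinel-2, NIR is B8 and red is B4
ndvi = latest.normalizedDifference(["B8", "B4"]).rename("NDVI")

# Same computation for all five scenes in the list
ndvi_images = [
    ee.Image(collectionList.get(i)).normalizedDifference(["B8", "B4"]).rename("NDVI")
    for i in range(5)
]
```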
For that, I'll create a simple Folium map — a way to render Leaflet maps with python — and load both the NDVI raster and the land plots geojson to it.
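A sketch of that map; the visualization parameters and zoom level are illustrative:

```python
import folium

# Center the map roughly on our area of interest
m = folium.Map(location=[-15.7, -47.7], zoom_start=11)

# Earth Engine serves the NDVI image as map tiles
vis_params = {"min": 0, "max": 1, "palette": ["red", "yellow", "green"]}
map_id = ndvi.getMapId(vis_params)
folium.TileLayer(
    tiles=map_id["tile_fetcher"].url_format,
    attr="Google Earth Engine",
    name="NDVI",
    overlay=True,
).add_to(m)

# Land plots on top, straight from the geodataframe
folium.GeoJson(polygons.to_json(), name="land plots").add_to(m)
folium.LayerControl().add_to(m)
m
```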
Zonal Statistics is fun
We're almost through. What we need to do now is summarize the NDVI data we just calculated for the five Sentinel 2 images within each land plot. Folks in remote sensing have been doing this for many years, with something called Zonal Statistics. This tool is present in many GIS Desktop software packages, and it's even possible in the GEE Python API (again, poorly documented). For simplicity, ease of access and reliability, my preferred tool for this is actually rasterstats.
The problem is, so far we've been dealing with GEE's own format for handling imagery, and rasterstats will be needing those old-school .TIF files. So we've got to export our imagery to a local folder.
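One way to do the export (a sketch using the geemap helper library, which is an assumption on my part; getDownloadURL or an export to Drive would also work; the scale and file names are illustrative):

```python
import geemap  # helper library for Earth Engine, assumed here

# Export each of the five NDVI images to a local GeoTIFF, clipped to our AOI
for i, image in enumerate(ndvi_images):
    geemap.ee_export_image(
        image,
        filename=f"ndvi_{i}.tif",
        scale=20,              # meters per pixel, illustrative
        region=aoi,
        file_per_band=False,
    )
```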
Finally, this is where the magic happens. Let us have a little for loop that opens each local NDVI raster file, calculates the stats for each land plot, and then appends the results for mean, minimum, maximum and standard deviation to a pandas dataframe.
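A sketch of that loop; the file names match the hypothetical export above, and the zones are assumed to share the rasters' CRS:

```python
import pandas as pd
from rasterstats import zonal_stats

# The zones and the rasters must share a CRS; here we assume the NDVI
# GeoTIFFs came out in EPSG:4326, so the zones go back to lat/lon
zones = intersections.to_crs(epsg=4326)

results = zones.copy()
for i in range(5):
    stats = zonal_stats(
        zones,
        f"ndvi_{i}.tif",
        stats=["mean", "min", "max", "std"],
    )
    # One column per statistic per image: mean_0, min_0, ..., std_4
    stats_df = pd.DataFrame(stats).add_suffix(f"_{i}")
    results = results.join(stats_df)
```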
We have created a field for each stat for each image, so we'll end up with five fields for each type of statistic requested (min, max, mean and std. dev.).

We can now visualize the NDVI statistics in each of our land plots by plotting our data, just like before, except this time we can tell matplotlib which column and which color scheme to use. Since the results seem ok, we can export the geodataframe to a geojson file and share it with our not-so-tech colleagues.
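A sketch, assuming the column names from the loop above:

```python
# Color each land plot by the mean NDVI of the most recent image
results.plot(column="mean_0", cmap="RdYlGn", legend=True, figsize=(10, 10))

# Export for the not-so-tech colleagues
results.to_file("ndvi_stats.geojson", driver="GeoJSON")
```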
This is the visualization of the mean NDVI attribute for our most recent Sentinel 2 image.

That's all folks
That's it for this piece! We have successfully implemented classic geoprocessing and remote sensing workflows without having to use ArcGIS or QGIS once! You can find this whole workflow on my GitHub page. If you have questions or suggestions, don't hesitate to drop me a line anytime!
Original article: https://towardsdatascience.com/automating-gis-and-remote-sensing-workflows-with-open-python-libraries-e71dd6b049ee