[GDAL] GDAL Raster Data Structure Study Notes (1): About Metadata
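While maintaining some code, I came across a feature written by a previous programmer that obtains the CellSize of raster data. Even though they knew about GDAL, they called the AE (ArcGIS Engine) interfaces to work it out, which I found puzzling.
The original approach used AE's Raster object to read the raster's size and the bounding rectangle built from its real projected coordinate pairs, and then computed the width and height of each cell. I was honestly speechless.
So I looked into how GDAL exposes basic information (Metadata) about a dataset.
Here is GDAL's official description of Metadata in its data model: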
GDAL metadata is auxiliary format and application specific textual data kept as a list of name/value pairs. The names are required to be well behaved tokens (no spaces, or odd characters). The values can be of any length, and contain anything except an embedded null (ASCII zero).
The metadata handling system is not well tuned to handling very large bodies of metadata. Handling of more than 100K of metadata for a dataset is likely to lead to performance degradation.
Metadata is split into named groups called domains, with the default domain having no name (NULL or ""). Some specific domains exist for special purposes. Note that currently there is no way to enumerate all the domains available for a given object, but applications can "test" for any domains they know how to interpret.
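Two points in this description deserve attention. First, GDAL does provide very rich Metadata, but not every dataset carries all of it. Second, as the second paragraph says, GDAL may run into performance problems when handling large metadata (more than 100K). That said, 100K is already a lot of content.
This metadata is obtained through the API method Dataset.GetMetadata(String), which returns an array of strings. In pure theory the contents of this array are very rich and include the following (copied from the GDAL website):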
The following metadata items have well defined semantics in the default domain:
- AREA_OR_POINT: May be either "Area" (the default) or "Point". Indicates whether a pixel value should be assumed to represent a sampling over the region of the pixel or a point sample at the center of the pixel. This is not intended to influence interpretation of georeferencing which remains area oriented.
- NODATA_VALUES: The value is a list of space separated pixel values matching the number of bands in the dataset that can be collectively used to identify pixels that are nodata in the dataset. With this style of nodata a pixel is considered nodata in all bands if and only if all bands match the corresponding value in the NODATA_VALUES tuple. This metadata is not widely honoured by GDAL drivers, algorithms or utilities at this time.
- MATRIX_REPRESENTATION: This value, used for Polarimetric SAR datasets, contains the matrix representation that this data is provided in. The following are acceptable values:
  - SCATTERING
  - SYMMETRIZED_SCATTERING
  - COVARIANCE
  - SYMMETRIZED_COVARIANCE
  - COHERENCY
  - SYMMETRIZED_COHERENCY
  - KENNAUGH
  - SYMMETRIZED_KENNAUGH
- POLARIMETRIC_INTERP: This metadata item is defined for Raster Bands for polarimetric SAR data. This indicates which entry in the specified matrix representation of the data this band represents. For a dataset provided as a scattering matrix, for example, acceptable values for this metadata item are HH, HV, VH, VV. When the dataset is a covariance matrix, for example, this metadata item will be one of Covariance_11, Covariance_22, Covariance_33, Covariance_12, Covariance_13, Covariance_23 (since the matrix itself is a hermitian matrix, that is all the data that is required to describe the matrix).
- METADATATYPE: If IMAGERY Domain present, the item consist the reader which processed the metadata. Now present such readers:
  - DG: DigitalGlobe imagery metadata
  - GE: GeoEye (or formally SpaceImaging) imagery metadata
  - OV: OrbView imagery metadata
  - DIMAP: Pleiades imagery metadata
  - MSP: Resurs DK-1 imagery metadata
  - ODL: Landsat imagery metadata
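The NODATA_VALUES rule quoted above (a pixel counts as nodata only when every band matches its entry in the tuple) can be sketched in a few lines. This is a minimal Python illustration rather than the C# used in my project; the function names `parse_nodata_values` and `is_nodata_pixel` and the sample values are my own and are not part of GDAL.

```python
def parse_nodata_values(value):
    """Split the space separated NODATA_VALUES string into one float per band."""
    return [float(token) for token in value.split()]


def is_nodata_pixel(pixel, nodata_values):
    """Return True only if every band of the pixel equals its nodata value."""
    if len(pixel) != len(nodata_values):
        raise ValueError("pixel and NODATA_VALUES must have one entry per band")
    return all(p == n for p, n in zip(pixel, nodata_values))


# Hypothetical three-band example.
nodata = parse_nodata_values("0 0 0")
print(is_nodata_pixel((0, 0, 0), nodata))   # every band matches
print(is_nodata_pixel((0, 12, 0), nodata))  # one band differs
```

Note that a pixel with only some bands equal to the nodata value is still treated as valid data under this rule.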
SUBDATASETS Domain
The SUBDATASETS domain holds a list of child datasets. Normally this is used to provide pointers to a list of images stored within a single multi image file.
For example, an NITF with two images might have the following subdataset list.
SUBDATASET_1_NAME=NITF_IM:0:multi_1b.ntf
SUBDATASET_1_DESC=Image 1 of multi_1b.ntf
SUBDATASET_2_NAME=NITF_IM:1:multi_1b.ntf
SUBDATASET_2_DESC=Image 2 of multi_1b.ntf
The value of the _NAME is the string that can be passed to GDALOpen() to access the file. The _DESC value is intended to be a more user friendly string that can be displayed to the user in a selector.
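To make the _NAME/_DESC pairing concrete, here is a rough Python sketch that groups a SUBDATASETS list by index. It assumes the list arrives as plain "key=value" strings like the NITF example above; `parse_subdatasets` is a name I made up for illustration, not a GDAL function.

```python
import re


def parse_subdatasets(items):
    """Group SUBDATASET_n_NAME / SUBDATASET_n_DESC items by their index n."""
    pattern = re.compile(r"^SUBDATASET_(\d+)_(NAME|DESC)$")
    grouped = {}
    for item in items:
        key, _, value = item.partition("=")
        match = pattern.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2)
            grouped.setdefault(index, {})[field] = value
    # One (name, description) tuple per subdataset, ordered by index.
    return [(grouped[i].get("NAME"), grouped[i].get("DESC")) for i in sorted(grouped)]


subdataset_items = [
    "SUBDATASET_1_NAME=NITF_IM:0:multi_1b.ntf",
    "SUBDATASET_1_DESC=Image 1 of multi_1b.ntf",
    "SUBDATASET_2_NAME=NITF_IM:1:multi_1b.ntf",
    "SUBDATASET_2_DESC=Image 2 of multi_1b.ntf",
]
for name, desc in parse_subdatasets(subdataset_items):
    print(name, "|", desc)
```

The first element of each tuple is the string that would be passed to GDALOpen(), and the second is the label to show in a selector.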
Drivers which support subdatasets advertize the DMD_SUBDATASETS capability. This information is reported when the --format and --formats options are passed to the command line utilities.
Currently, drivers which support subdatasets are: ADRG, ECRGTOC, GEORASTER, GTiff, HDF4, HDF5, netCDF, NITF, NTv2, OGDI, PDF, PostGISRaster, Rasterlite, RPFTOC, RS2, WCS, and WMS.
IMAGE_STRUCTURE Domain
Metadata in the default domain is intended to be related to the image, and not particularly related to the way the image is stored on disk. That is, it is suitable for copying with the dataset when it is copied to a new format. Some information of interest is closely tied to a particular file format and storage mechanism. In order to prevent this getting copied along with datasets it is placed in a special domain called IMAGE_STRUCTURE that should not normally be copied to new formats.
Currently the following items are defined by RFC 14 as having specific semantics in the IMAGE_STRUCTURE domain.
- COMPRESSION: The compression type used for this dataset or band. There is no fixed catalog of compression type names, but where a given format includes a COMPRESSION creation option, the same list of values should be used here as there.
- NBITS: The actual number of bits used for this band, or the bands of this dataset. Normally only present when the number of bits is non-standard for the datatype, such as when a 1 bit TIFF is represented through GDAL as GDT_Byte.
- INTERLEAVE: This only applies on datasets, and the value should be one of PIXEL, LINE or BAND. It can be used as a data access hint.
- PIXELTYPE: This may appear on a GDT_Byte band (or the corresponding dataset) and have the value SIGNEDBYTE to indicate the unsigned byte values between 128 and 255 should be interpreted as being values between -128 and -1 for applications that recognise the SIGNEDBYTE type.
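The SIGNEDBYTE reinterpretation described for PIXELTYPE is simple arithmetic: values 128 to 255 map to -128 to -1, and values 0 to 127 stay unchanged. A small Python sketch (the helper name `as_signed_byte` is my own):

```python
def as_signed_byte(value):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    if not 0 <= value <= 255:
        raise ValueError("expected an unsigned byte value in 0..255")
    return value - 256 if value >= 128 else value


for raw in (0, 127, 128, 255):
    print(raw, "->", as_signed_byte(raw))
```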
RPC Domain
The RPC metadata domain holds metadata describing the Rational Polynomial Coefficient geometry model for the image if present. This geometry model can be used to transform between pixel/line and georeferenced locations. The items defining the model are:
- ERR_BIAS: Error - Bias. The RMS bias error in meters per horizontal axis of all points in the image (-1.0 if unknown)
- ERR_RAND: Error - Random. RMS random error in meters per horizontal axis of each point in the image (-1.0 if unknown)
- LINE_OFF: Line Offset
- SAMP_OFF: Sample Offset
- LAT_OFF: Geodetic Latitude Offset
- LONG_OFF: Geodetic Longitude Offset
- HEIGHT_OFF: Geodetic Height Offset
- LINE_SCALE: Line Scale
- SAMP_SCALE: Sample Scale
- LAT_SCALE: Geodetic Latitude Scale
- LONG_SCALE: Geodetic Longitude Scale
- HEIGHT_SCALE: Geodetic Height Scale
- LINE_NUM_COEFF (1-20): Line Numerator Coefficients. Twenty coefficients for the polynomial in the Numerator of the rn equation. (space separated)
- LINE_DEN_COEFF (1-20): Line Denominator Coefficients. Twenty coefficients for the polynomial in the Denominator of the rn equation. (space separated)
- SAMP_NUM_COEFF (1-20): Sample Numerator Coefficients. Twenty coefficients for the polynomial in the Numerator of the cn equation. (space separated)
- SAMP_DEN_COEFF (1-20): Sample Denominator Coefficients. Twenty coefficients for the polynomial in the Denominator of the cn equation. (space separated)
These fields are directly derived from the document prospective GeoTIFF RPC document (http://geotiff.maptools.org/rpc_prop.html) which in turn is closely modeled on the NITF RPC00B definition.
The line and pixel offset expressed with LINE_OFF and SAMP_OFF are with respect to the center of the pixel.
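As a rough sketch of how the offset and scale items are used: the RPC model works on normalized coordinates, so ground coordinates are shifted by the *_OFF items and divided by the *_SCALE items before the polynomials are evaluated, and the resulting rn and cn are scaled back to line and sample. The Python below covers only that normalization step and the parsing of a coefficient string; it deliberately does not evaluate the 20-term polynomials. All function names and sample numbers are my own assumptions for illustration.

```python
def parse_coefficients(value):
    """Parse a space separated RPC coefficient string into 20 floats."""
    coefficients = [float(token) for token in value.split()]
    if len(coefficients) != 20:
        raise ValueError("an RPC coefficient item must hold exactly 20 values")
    return coefficients


def normalize_ground(rpc, lon, lat, height):
    """Normalize geodetic coordinates with the *_OFF and *_SCALE items."""
    lat_n = (lat - float(rpc["LAT_OFF"])) / float(rpc["LAT_SCALE"])
    lon_n = (lon - float(rpc["LONG_OFF"])) / float(rpc["LONG_SCALE"])
    height_n = (height - float(rpc["HEIGHT_OFF"])) / float(rpc["HEIGHT_SCALE"])
    return lat_n, lon_n, height_n


def denormalize_image(rpc, rn, cn):
    """Convert normalized rn / cn back to line and sample coordinates."""
    line = rn * float(rpc["LINE_SCALE"]) + float(rpc["LINE_OFF"])
    sample = cn * float(rpc["SAMP_SCALE"]) + float(rpc["SAMP_OFF"])
    return line, sample


# Hypothetical RPC items, written as strings the way metadata values arrive.
rpc = {
    "LAT_OFF": "40.0", "LAT_SCALE": "0.5",
    "LONG_OFF": "116.0", "LONG_SCALE": "0.5",
    "HEIGHT_OFF": "100", "HEIGHT_SCALE": "500",
    "LINE_OFF": "2000", "LINE_SCALE": "2000",
    "SAMP_OFF": "1500", "SAMP_SCALE": "1500",
}
print(normalize_ground(rpc, lon=116.0, lat=40.25, height=600))
print(denormalize_image(rpc, rn=0.5, cn=-1.0))
```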
IMAGERY Domain (remote sensing)
For satellite or aerial imagery the IMAGERY Domain may be present. It depends on exist special metadata files near the image file. The files at the same directory with image file tested by the set of metadata readers, if files can be processed by the metadata reader, it fill the IMAGERY Domain with the following items:
- SATELLITEID: A satellite or scanner name
- CLOUDCOVER: Cloud coverage. The value between 0 - 100 or 999 if not available
- ACQUISITIONDATETIME: The image acquisition date time in UTC
xml: Domains
Any domain name prefixed with "xml:" is not normal name/value metadata. It is a single XML document stored in one big string.
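Because an "xml:" domain holds one XML document instead of key=value items, it has to be handed to an XML parser rather than split on "=". A minimal Python sketch using the standard library; the sample document and the `parse_xml_domain` helper are invented for illustration and do not reflect any real driver's output.

```python
import xml.etree.ElementTree as ET


def parse_xml_domain(xml_text):
    """Parse the single XML document held by an xml: domain."""
    return ET.fromstring(xml_text)


# Hypothetical document, made up for illustration only.
sample = "<Metadata><Item name='SATELLITEID'>LANDSAT_8</Item></Metadata>"
root = parse_xml_domain(sample)
items = {item.get("name"): item.text for item in root.iter("Item")}
print(root.tag, items)
```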
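Since the official documentation gives only this much, what we can roughly conclude is that in GDAL's data model, Metadata is stored as an array of key-value pairs in the form "key=value", grouped by Domain.
In practice, the parameter that Dataset.GetMetadata(String) expects is the Domain name, and the default Domain can be requested by passing NULL or "".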
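Turning that array of "key=value" strings into a lookup table is straightforward. Here is a minimal sketch in Python (my project code is C#, so this is only an illustration); `metadata_to_dict` is a hypothetical helper name, and it splits on the first "=" only, since values may themselves contain "=".

```python
def metadata_to_dict(items):
    """Convert a list of 'key=value' strings into a dictionary."""
    result = {}
    for item in items or []:
        key, separator, value = item.partition("=")
        if separator:  # skip entries that do not contain '='
            result[key] = value
    return result


print(metadata_to_dict(["AREA_OR_POINT=Area"]))
```

Passing None or an empty list gives an empty dictionary, which matches the case where a domain has no items.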
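The good news is that this information looks very valuable.
The bad news is that, as the official documentation says, not every dataset contains every key-value pair.
In my local test, a Landsat8 scene returned no Metadata other than AREA_OR_POINT.
So the current local code uses Linq plus a ternary check. Because the project only uses Landsat8 data for now (bands other than the 15 m panchromatic one), it simply falls back to 30 m whenever a cellSize value cannot be obtained from the Metadata.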
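The fallback logic looks roughly like the Python sketch below (the real code is C# with Linq). The key name `CELL_SIZE` is purely hypothetical, chosen for illustration; it is not a metadata item defined by GDAL, and the 30 m default only holds under the Landsat8 assumption stated above.

```python
DEFAULT_CELL_SIZE_M = 30.0   # Landsat8 non-panchromatic bands, per the assumption above
CELL_SIZE_KEY = "CELL_SIZE"  # hypothetical key, not defined by GDAL


def cell_size_from_metadata(metadata, key=CELL_SIZE_KEY, default=DEFAULT_CELL_SIZE_M):
    """Return the cell size stored under key, or the default if missing or invalid."""
    raw = metadata.get(key)
    try:
        return float(raw) if raw is not None else default
    except ValueError:
        return default


print(cell_size_from_metadata({"AREA_OR_POINT": "Area"}))
```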
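If you need a similar feature, my personal suggestion is to compute the projected coordinates of the image's upper-left and lower-right corner points directly, then divide the resulting horizontal and vertical distances by the raster's width and height in pixels to get the CellSize. You still need to decide whether to drop the decimals or how many digits to keep. I avoid the AE interfaces because the COM objects they create are hard to release and AE calls generally carry significant overhead, so for low-level features like this it is better to do without them.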
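The corner-based calculation can be sketched as follows, in Python for brevity. The function name, parameter names, and the sample coordinates are my own assumptions; the numbers are made up to resemble a 30 m scene and do not come from a real file.

```python
def cell_size_from_corners(ulx, uly, lrx, lry, width_px, height_px):
    """Compute (cell width, cell height) from projected corner coordinates.

    ulx, uly: projected coordinates of the upper-left corner
    lrx, lry: projected coordinates of the lower-right corner
    width_px, height_px: raster size in pixels
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("raster width and height must be positive")
    return abs(lrx - ulx) / width_px, abs(uly - lry) / height_px


# Hypothetical scene: 7000 x 8000 pixels covering 210 km x 240 km.
cell_w, cell_h = cell_size_from_corners(
    ulx=300000.0, uly=4500000.0, lrx=510000.0, lry=4260000.0,
    width_px=7000, height_px=8000,
)
print(cell_w, cell_h)
```

The result is in the units of the projected coordinate system. It may also be worth checking GDAL's geotransform (GetGeoTransform), whose second and sixth coefficients hold the pixel width and height for north-up images, though I have not compared the two approaches here.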
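Some questions remain open, such as how to tell whether the CellSize of a raster is in meters or some other unit, and how much error a purely mathematical calculation introduces.
As for how to guarantee that a dataset actually has metadata, I will come back and update this post after further study.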
PS: All code mentioned above refers to a C# environment.
Thanks for reading.
Reposted from: https://www.cnblogs.com/DannielZhang/p/5167814.html