Introduction to xml.etree ElementTree

Published: 2024/9/19



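Overview of ET

ET (xml.etree.ElementTree) has two important classes: ElementTree, which represents the whole XML document as a tree, and Element, which represents a single node in that tree.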

Using ET

Suppose an XML file has the following content:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

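Parsing XML:

Load data from a file: parse(source, parser=None)

Get the XML root: tree.getroot()

Parse a string and get the root: ET.fromstring(country_data_as_string)

# Signature: parse(source, parser=None)
#   source: the XML file (a file name or a file object)
#   parser: an optional parser instance; defaults to XMLParser
import xml.etree.ElementTree as ET

# Read from a file
tree = ET.parse('country_data.xml')
root = tree.getroot()  # get the root element

# Parse a string and get the root directly
root = ET.fromstring(country_data_as_string)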
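The difference between the two entry points can be sketched as follows. This is a minimal illustration, assuming a shortened copy of the sample document held in a string; io.StringIO stands in for a file on disk so that nothing has to exist beforehand.

```python
import io
import xml.etree.ElementTree as ET

# Shortened version of the sample document (an assumption made so that
# the snippet is self-contained).
country_data_as_string = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein"><rank>1</rank><year>2008</year></country>
    <country name="Singapore"><rank>4</rank><year>2011</year></country>
</data>"""

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(country_data_as_string)

# parse() accepts a file name or a file object and returns an ElementTree;
# the root Element is then obtained with getroot().
tree = ET.parse(io.StringIO(country_data_as_string))
root_from_file = tree.getroot()

print(type(tree).__name__, type(root_from_file).__name__)  # ElementTree Element
print(root_from_string.tag, root_from_file.tag)            # data data
```

In short, parse() gives back the document wrapper, which is what is needed later for tree.write(), while fromstring() skips the wrapper and hands over the root element.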

Attributes of root

root has a tag and a dictionary of attributes.

>>> root.tag
'data'       # the tag name
>>> root.attrib
{}           # the tag's attributes

Iterating over the tag names and attributes of child nodes

>>> for child in root:
...     print(child.tag, child.attrib)
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

Accessing elements by index

root is the root node, root[0] is the first child element of the root, and root[0][1] is the second child element of that first child.

>>> root[0][1].text
'2008'
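Indexing works because an Element behaves like a sequence of its direct children. The sketch below assumes a two-country excerpt of the sample document and shows length, positive and negative indexing, and iteration side by side.

```python
import xml.etree.ElementTree as ET

# Two-country excerpt of the sample data (assumed for illustration).
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank><year>2008</year></country>'
    '<country name="Singapore"><rank>4</rank><year>2011</year></country>'
    '</data>'
)

# len() counts direct children only.
child_count = len(root)
print(child_count)                        # 2

# root[0] is the first child; root[0][1] is that child's second child.
first = root[0]
print(first.tag, first.attrib)            # country {'name': 'Liechtenstein'}
print(root[0][1].tag, root[0][1].text)    # year 2008

# Negative indexes count from the end, as with lists.
last_name = root[-1].get('name')
print(last_name)                          # Singapore

# Iterating an element yields its direct children in document order.
first_child_tags = [child.tag for child in root[0]]
print(first_child_tags)                   # ['rank', 'year']
```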

Getting an iterator over sub-elements

Element.iter() returns an iterator over an element and everything below it, at any depth. For example, root.iter('neighbor') iterates over every element named neighbor under the root.

>>> for neighbor in root.iter('neighbor'):
...     print(neighbor.attrib)
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
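To make the contrast with plain iteration concrete, here is a small sketch on made-up data (the single-letter names are placeholders). Iterating an element visits only its direct children, while iter() walks the whole subtree in document order, starting with the element itself.

```python
import xml.etree.ElementTree as ET

# Placeholder data, assumed for illustration.
root = ET.fromstring(
    '<data>'
    '<country name="A"><neighbor name="B"/><neighbor name="C"/></country>'
    '<country name="D"><neighbor name="E"/></country>'
    '</data>'
)

# Plain iteration: direct children only.
direct_children = [child.tag for child in root]
print(direct_children)   # ['country', 'country']

# iter() with no argument: the element itself plus all descendants.
all_tags = [element.tag for element in root.iter()]
print(all_tags)
# ['data', 'country', 'neighbor', 'neighbor', 'country', 'neighbor']

# iter('neighbor'): only the elements with that tag, at any depth.
neighbor_names = [n.get('name') for n in root.iter('neighbor')]
print(neighbor_names)    # ['B', 'C', 'E']
```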

Finding nodes (elements)

Element.findall() finds only the elements with the given tag that are direct children of the current node.

Element.text accesses the text content of a node.

Element.find() finds the first child node with the given tag.

Element.get() gets an attribute of a node.

>>> for country in root.findall('country'):
...     rank = country.find('rank').text
...     name = country.get('name')
...     print(name, rank)
...
Liechtenstein 1
Singapore 4
Panama 68
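One thing the loop above takes for granted is that every country has a rank child. find() returns None when nothing matches, so calling .text on the result fails for incomplete data. The following sketch, using a made-up second country with no children, shows one defensive way to write the same loop; the 'n/a' and 'unknown' fallbacks are arbitrary choices.

```python
import xml.etree.ElementTree as ET

# 'Atlantis' is a made-up entry with no child elements.
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Atlantis"/>'
    '</data>'
)

rows = []
for country in root.findall('country'):
    # find() returns None when there is no matching child.
    rank_element = country.find('rank')
    rank = rank_element.text if rank_element is not None else 'n/a'

    # findtext() combines find() and .text and accepts a default.
    year = country.findtext('year', default='unknown')

    # get() also accepts a default for a missing attribute.
    region = country.get('region', 'unknown')

    rows.append((country.get('name'), rank, year, region))

print(rows)
# [('Liechtenstein', '1', 'unknown', 'unknown'), ('Atlantis', 'n/a', 'unknown', 'unknown')]
```

The comparison uses `is not None` rather than a bare truth test, because an element's truth value depends on whether it has children, not on whether it was found.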

Modifying node information

ElementTree.write() writes the tree to an XML file, creating the file if needed.

Element.set() sets an attribute on a node.

Element.append() adds a new child node to a node.

>>> for rank in root.iter('rank'):
...     new_rank = int(rank.text) + 1
...     rank.text = str(new_rank)
...     rank.set('updated', 'yes')
...
>>> tree.write('output.xml')
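The example above uses set() and write() but not append(). A minimal sketch covering all three follows; the note element and its text are invented for illustration, and the tree is written to an in-memory buffer rather than to output.xml so that the result can be inspected without touching the disk.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data><country name="Singapore"><rank>4</rank></country></data>'
)
tree = ET.ElementTree(root)  # wrap the root so that write() is available

country = root.find('country')
rank = country.find('rank')

# Change the text and add an attribute.
rank.text = str(int(rank.text) + 1)
rank.set('updated', 'yes')

# Build a new element and attach it as the last child of country.
# ('note' is a hypothetical tag, not part of the sample data.)
note = ET.Element('note')
note.text = 'checked'
country.append(note)

# write() accepts a file name or a file object opened for writing.
buffer = io.BytesIO()
tree.write(buffer, encoding='utf-8', xml_declaration=True)
output = buffer.getvalue().decode('utf-8')
print(output)
```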

Removing nodes

Element.remove() removes an entire node, including its child nodes.

>>> for country in root.findall('country'):
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
>>> tree.write('output.xml')

The XML file becomes the following (one node has been removed, together with its children):

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
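Two details of remove() are easy to miss, and the sketch below illustrates both on a trimmed copy of the sample data. The loop iterates over the list returned by findall() rather than over root itself, so removing children does not disturb the iteration. remove() also works only on a direct child of the element it is called on.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample data (assumed for illustration).
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    '<country name="Panama"><rank>68</rank></country>'
    '</data>'
)

# findall() returns a list, so it is safe to remove while looping over it.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

remaining = [country.get('name') for country in root]
print(remaining)  # ['Liechtenstein', 'Singapore']

# remove() only accepts direct children: a grandchild raises ValueError.
grandchild = root.find('country/rank')
try:
    root.remove(grandchild)
    grandchild_removed = True
except ValueError:
    grandchild_removed = False
print(grandchild_removed)  # False
```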

Building an XML document

ET.Element() creates a node.

ET.SubElement() creates a child node.

>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a)
<a><b /><c><d /></c></a>
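Real documents usually need attributes and text as well as bare tags. As a rough sketch, the snippet below rebuilds one country entry from the sample data: attributes are passed to SubElement() as keyword arguments, text is assigned through .text, and ET.tostring() is used instead of ET.dump() so that the result is available as a string.

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')

# Attributes can be given as keyword arguments.
country = ET.SubElement(data, 'country', name='Singapore')

# Text content is set through the .text attribute.
rank = ET.SubElement(country, 'rank')
rank.text = '4'

ET.SubElement(country, 'neighbor', name='Malaysia', direction='N')

# encoding='unicode' makes tostring() return str instead of bytes.
xml_text = ET.tostring(data, encoding='unicode')
print(xml_text)

# Round trip: parse the generated text and read the values back.
reparsed = ET.fromstring(xml_text)
print(reparsed.find('country').get('name'))             # Singapore
print(reparsed.findtext('country/rank'))                # 4
print(reparsed.find('country/neighbor').get('direction'))  # N
```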

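XPath expressions

ET supports XPath expressions. XPath is a language for navigating XML/HTML documents or data, and an XPath expression lets us locate nodes or elements in such a document quickly. In HTML, tags are usually called nodes. (Note that ET implements only a subset of XPath, so some XPath expressions cannot be used directly in ET.)

Here is a brief introduction to XPath syntax.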

XPath syntax

Basic syntax

/                     # the root node, or the separator between steps in a path
//                    # select matching nodes at any depth
.                     # select the current node
..                    # the parent of the current node (in ET a parent cannot be obtained directly from a child node)
@                     # select an attribute
text()                # select text (cannot be used in ET)
[index]               # select the index-th tag (index starts at 1)
[@Classname]          # select tags by attribute name
contains(p, content)  # fuzzy match (p locates the node, content is the text to match; also cannot be used in ET)

XPath is written much like a file path and is easy to learn. The simple examples below should help.

Simple examples

# the div inside the div inside body inside html, starting from the root
/html/body/div/div
# all a tags
//a
# all a tags anywhere inside html/body (any depth, not only direct children)
/html/body//a
# a tags directly under the current node
./a
# a tags under the parent of the current node
../a
# the class attribute of the a tags in html/body
/html/body/a/@class
# every li tag that is the third li among its siblings
//li[3]
# locate by attribute
//a[@href=""]
# the text content of the a tags in html/body
/html/body/a/text()
# a tags whose name attribute contains the string "myname"
//a[contains(@name,"myname")]

Another blog post on XPath: https://blog.csdn.net/qq_43203949/article/details/108203340
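Since text() cannot be used in ET and a parent cannot be reached from a child object, it may help to see what can be written instead. The following is one possible set of workarounds on an excerpt of the sample data; the name parent_map is just a local variable chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Excerpt of the sample data (assumed for illustration).
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank><year>2008</year></country>'
    '<country name="Singapore"><rank>4</rank><year>2011</year></country>'
    '</data>'
)

# Instead of .../year/text(): select the elements, then read .text.
years = [year.text for year in root.findall('./country/year')]
print(years)          # ['2008', '2011']

# A [tag='text'] predicate filters on the text of a child element.
matches = root.findall("./country[year='2011']")
match_names = [country.get('name') for country in matches]
print(match_names)    # ['Singapore']

# Elements keep no reference to their parent, so build a
# child -> parent lookup by walking the tree once.
parent_map = {child: parent for parent in root.iter() for child in parent}
rank = root.find(".//country[@name='Singapore']/rank")
parent_name = parent_map[rank].get('name')
print(parent_name)    # Singapore
```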

Using XPath in ET

If you have used the parsel package, you can think of findall() as playing the role of extract() or getall().

import xml.etree.ElementTree as ET

root = ET.fromstring(countrydata)

# Top-level elements (the current node is the root node)
root.findall(".")

# All 'neighbor' grand-children of 'country' children of the top-level
# elements
root.findall("./country/neighbor")

# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")

# 'year' nodes that are children of nodes with name='Singapore'
root.findall(".//*[@name='Singapore']/year")

# All 'neighbor' nodes that are the second child of their parent
root.findall(".//neighbor[2]")
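To see what these expressions actually return, the sketch below runs them against a reduced copy of the sample data (two countries, with only the year and neighbor children kept, which is an assumption made to keep the snippet short). findall() always returns a list of Element objects, so the results are mapped to names or text for display.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the sample data (assumed for illustration).
countrydata = """<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria"/>
        <neighbor name="Switzerland"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia"/>
    </country>
</data>"""
root = ET.fromstring(countrydata)

# "." is the element findall() was called on, here the root.
top = [element.tag for element in root.findall(".")]
print(top)               # ['data']

# neighbor elements two levels below the root.
neighbors = [n.get('name') for n in root.findall("./country/neighbor")]
print(neighbors)         # ['Austria', 'Switzerland', 'Malaysia']

# Parents of a year element whose name attribute is 'Singapore'.
parents = [p.get('name') for p in root.findall(".//year/..[@name='Singapore']")]
print(parents)           # ['Singapore']

# year children of any element named 'Singapore'.
years = [y.text for y in root.findall(".//*[@name='Singapore']/year")]
print(years)             # ['2011']

# neighbor elements that are the second neighbor under their parent.
second = [n.get('name') for n in root.findall(".//neighbor[2]")]
print(second)            # ['Switzerland']
```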
