TensorFlow: notes on parsing the XML annotation files of a labeled dataset
Environment
- Windows 10
- Python 3.7.10
- TensorFlow 2.3
- matplotlib 3.3.4
- lxml 4.7.1
I recently needed to use TensorFlow to recognize 20 kinds of fruit, so I started by checking the dataset I had just obtained.
The original image:
The XML file below was generated by the 精灵标注助手 annotation tool:
<?xml version="1.0" ?>
<annotation>
  <folder>菠蘿</folder>
  <filename>pineapple.jpg</filename>
  <path>C:\Users\Desktop\pineapple.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>730</width>
    <height>413</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>菠蘿</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>125</xmin>
      <ymin>112</ymin>
      <xmax>543</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
  <object>
    <name>菠蘿</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>547</xmin>
      <ymin>97</ymin>
      <xmax>721</xmax>
      <ymax>390</ymax>
    </bndbox>
  </object>
</annotation>
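To make the structure of this file concrete, here is a minimal sketch that reads the same Pascal VOC-style layout with Python's standard `xml.etree.ElementTree` instead of lxml. The function name `parse_voc` and the `SAMPLE` string (a trimmed copy of the file above, with line breaks after each value as some exporters write them) are illustrative and not part of the original post.

```python
import xml.etree.ElementTree as ET


def parse_voc(xml_text):
    """Return (width, height, boxes) where boxes are (name, xmin, ymin, xmax, ymax)."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    # .strip() removes whitespace or line breaks around each value
    width = int(size.find('width').text.strip())
    height = int(size.find('height').text.strip())
    boxes = []
    for obj in root.iter('object'):
        box = obj.find('bndbox')
        if box is None:
            continue  # not a rectangle annotation, so there is nothing to read
        coords = [int(box.find(tag).text.strip()) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text.strip(), *coords))
    return width, height, boxes


# Hypothetical trimmed copy of pineapple.xml
SAMPLE = (
    '<?xml version="1.0" ?><annotation>'
    '<size><width>730\n</width><height>413\n</height><depth>3\n</depth></size>'
    '<object><name>菠蘿\n</name><bndbox><xmin>125\n</xmin><ymin>112\n</ymin>'
    '<xmax>543\n</xmax><ymax>400\n</ymax></bndbox></object>'
    '<object><name>菠蘿\n</name><bndbox><xmin>547\n</xmin><ymin>97\n</ymin>'
    '<xmax>721\n</xmax><ymax>390\n</ymax></bndbox></object>'
    '</annotation>'
)

print(parse_voc(SAMPLE))
# → (730, 413, [('菠蘿', 125, 112, 543, 400), ('菠蘿', 547, 97, 721, 390)])
```

Reading the boxes per `<object>` keeps each name next to its own coordinates, which is harder to guarantee when collecting all `xmin` values in one query and indexing into them.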
Install matplotlib:
pip install matplotlib
Install lxml:
pip install lxml
The following code draws the rectangles annotated in the XML file onto the image.
import tensorflow as tf
import matplotlib.pyplot as plt
from lxml import etree
from matplotlib.patches import Rectangle

# Read and decode the image; the printed shape is (height, width, channels)
img = tf.io.read_file(r'./pineapple.jpg')
img = tf.image.decode_jpeg(img)
print(img.shape)
plt.imshow(img.numpy())
plt.show()

# Parse the annotation with lxml's XML parser (etree.HTML is meant for HTML)
sel = etree.parse(r'./pineapple.xml')
width = int(sel.xpath('//size/width/text()')[0])
height = int(sel.xpath('//size/height/text()')[0])

# Start a fresh figure: draw the image once, then add one red rectangle per <bndbox>
plt.imshow(img.numpy())
ax = plt.gca()
for box in sel.xpath('//bndbox'):
    # int() ignores any whitespace or line breaks around the numbers
    xmin = int(box.xpath('xmin/text()')[0])
    ymin = int(box.xpath('ymin/text()')[0])
    xmax = int(box.xpath('xmax/text()')[0])
    ymax = int(box.xpath('ymax/text()')[0])
    # Rectangle takes the anchor corner (xmin, ymin) plus the box width and height
    rect = Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, fill=False, color='red')
    ax.add_patch(rect)
plt.show()
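The script above reads `width` and `height` from `<size>` but never uses them. One possible use, sketched here under that assumption, is converting the pixel boxes into the normalized `[y_min, x_min, y_max, x_max]` fractions in `[0, 1]` that TensorFlow's `tf.image.draw_bounding_boxes` expects. The helper name `to_tf_boxes` is hypothetical.

```python
def to_tf_boxes(boxes, width, height):
    """Convert pixel (xmin, ymin, xmax, ymax) boxes to normalized [ymin, xmin, ymax, xmax]."""
    def clip(v):
        # Keep values inside [0, 1] in case a box extends past the image edge
        return min(max(v, 0.0), 1.0)

    return [
        [clip(ymin / height), clip(xmin / width), clip(ymax / height), clip(xmax / width)]
        for xmin, ymin, xmax, ymax in boxes
    ]


# The two boxes from pineapple.xml, with width=730 and height=413 from <size>
print(to_tf_boxes([(125, 112, 543, 400), (547, 97, 721, 390)], 730, 413))
```

Note the swapped order: the XML stores x before y, while TensorFlow's box format puts y first. To draw with TensorFlow instead of matplotlib, the result would still need a batch dimension and a float image.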
Reconstructed on the image, the boxes drawn with 精灵标注助手 in the acquired dataset look like this:
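The dataset turned out to contain a mix of polygon and rectangle annotations, so the two need to be told apart. Opening the XML files one by one to check is tedious, so the following code prints every file that contains no rectangle (<bndbox>) annotation: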
import os
# cElementTree is deprecated and was removed in Python 3.9; ElementTree is the same API
import xml.etree.ElementTree as ET

txt_path = 'C:\\Users\\vvcat\\Desktop\\xxx\\xxxxx\\outputs2\\'
for txt_file in os.listdir(txt_path):
    txt_name, txt_suffix = os.path.splitext(txt_file)
    if txt_suffix.lower() != '.xml':
        continue  # skip anything that is not an annotation file
    root = ET.parse(os.path.join(txt_path, txt_file))
    # getiterator() was removed in Python 3.9; find() returns None when there is no <bndbox>
    if root.find('.//bndbox') is None:
        print(txt_name + txt_suffix)
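The check above only flags files with no `<bndbox>` at all, so a file that mixes rectangles and other shapes passes unnoticed. A minimal sketch that labels each file as `rect`, `other`, `mixed` or `empty`, assuming every annotation (rectangle or not) is an `<object>` element and only rectangles carry a `<bndbox>`; the function names `classify_annotation` and `classify_folder` are hypothetical, and the exact XML that 精灵标注助手 writes for polygons is not shown in this post.

```python
import os
import xml.etree.ElementTree as ET


def classify_annotation(xml_data):
    """Return 'rect', 'other', 'mixed' or 'empty' for one annotation file's contents."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    with_box = without_box = 0
    for obj in root.iter('object'):
        # Assumption: only rectangle annotations carry a <bndbox>;
        # any other <object> is treated as a polygon or other shape
        if obj.find('bndbox') is not None:
            with_box += 1
        else:
            without_box += 1
    if with_box and without_box:
        return 'mixed'
    if with_box:
        return 'rect'
    if without_box:
        return 'other'
    return 'empty'


def classify_folder(folder):
    """Map each .xml file name in `folder` to its classification."""
    result = {}
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith('.xml'):
            # Read bytes so the parser honours any encoding declaration in the file
            with open(os.path.join(folder, name), 'rb') as f:
                result[name] = classify_annotation(f.read())
    return result
```

Calling `classify_folder(txt_path)` would return a dictionary that can be filtered for `'mixed'` and `'other'` files separately.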
The output:
Opening A(1).xml shows the following content:
The following code draws the rectangles from every XML file onto its matching image in one batch and saves the results as new images.
import glob
import os

import matplotlib.pyplot as plt
import tensorflow as tf
from lxml import etree
from matplotlib.patches import Rectangle


def stem(path):
    # File name without folder or extension; os.path uses the current OS's separator,
    # whereas split('\\') only works on Windows paths
    return os.path.splitext(os.path.basename(path))[0]


# Map each name to its file and keep only names that have both an image and an XML,
# so that every image is paired with its own annotation
images = {stem(p): p for p in glob.glob('./inputs/*.jpg')}
xmls = {stem(p): p for p in glob.glob('./outputs/*.xml')}
names = sorted(set(images) & set(xmls))

dstfile = './output_image/'
os.makedirs(dstfile, exist_ok=True)

for name in names:
    img = tf.image.decode_jpeg(tf.io.read_file(images[name]))
    sel = etree.parse(xmls[name])

    # Draw the image once per file (outside the box loop), so files without
    # any <bndbox> are still saved with their image
    plt.imshow(img.numpy())
    ax = plt.gca()
    for box in sel.xpath('//bndbox'):
        xmin = int(box.xpath('xmin/text()')[0])
        ymin = int(box.xpath('ymin/text()')[0])
        xmax = int(box.xpath('xmax/text()')[0])
        ymax = int(box.xpath('ymax/text()')[0])
        ax.add_patch(Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                               fill=False, color='red'))

    # Save under the original image file name, then close the figure before the next file
    plt.savefig(os.path.join(dstfile, os.path.basename(images[name])))
    plt.close()
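The batch script only processes names that appear both in `./inputs` (as `.jpg`) and in `./outputs` (as `.xml`); everything else is skipped without a message. A small sketch for listing what gets skipped, which helps spot images that were never annotated or annotations whose image is missing; `unmatched_files` is a hypothetical helper, not part of the original post.

```python
import glob
import os


def unmatched_files(image_paths, xml_paths):
    """Return (image names without an XML, XML names without an image), sorted."""
    def stem(path):
        # File name without folder or extension
        return os.path.splitext(os.path.basename(path))[0]

    image_names = {stem(p) for p in image_paths}
    xml_names = {stem(p) for p in xml_paths}
    return sorted(image_names - xml_names), sorted(xml_names - image_names)


if __name__ == '__main__':
    no_xml, no_image = unmatched_files(glob.glob('./inputs/*.jpg'), glob.glob('./outputs/*.xml'))
    print('images without annotation:', no_xml)
    print('annotations without image:', no_image)
```

Both lists are empty when every image has exactly one annotation file with the same name.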