5. File Formats

Published: 2025/6/15

Python Standard Library

Translated by: Python 江湖群

2008-03-28 13:11:53

1. File Formats
1.1. Overview
1.1.1. Markup Languages
1.1.2. Configuration Files
1.1.3. Archive Formats
1.2. The xmllib Module
1.2.0.1. Example 5-1. Using the xmllib Module to Extract Information from an Element
1.2.0.2. Example 5-2. Using the xmllib Module
1.3. The xml.parsers.expat Module
1.3.0.1. Example 5-3. Using the xml.parsers.expat Module
1.3.0.2. Example 5-4. Using the xml.parsers.expat Module to Read ISO Latin-1 Text
1.4. The sgmllib Module
1.4.0.1. Example 5-5. Using the sgmllib Module to Extract the Title Element
1.4.0.2. Example 5-6. Using the sgmllib Module to Format an SGML Document
1.4.0.3. Example 5-7. Using the sgmllib Module to Check Well-Formedness
1.4.0.4. Example 5-8. Using the sgmllib Module to Filter SGML Documents
1.5. The htmllib Module
1.5.0.1. Example 5-9. Using the htmllib Module
1.6. The htmlentitydefs Module
1.6.0.1. Example 5-10. Using the htmlentitydefs Module
1.6.0.2. Example 5-11. Using the htmlentitydefs Module to Translate Entities
1.6.0.3. Example 5-12. Escaping ISO Latin-1 Entities
1.7. The formatter Module
1.7.0.1. Example 5-13. Using the formatter Module to Convert HTML to an Event Stream
1.7.0.2. Example 5-14. Using the formatter Module to Convert HTML to Plain Text
1.7.0.3. Example 5-15. Using the formatter Module with a Custom Writer
1.8. The ConfigParser Module
1.8.0.1. Example 5-16. Using the ConfigParser Module
1.8.0.2. Example 5-17. Using the ConfigParser Module to Write Configuration Data
1.9. The netrc Module
1.9.0.1. Example 5-18. Using the netrc Module
1.10. The shlex Module
1.10.0.1. Example 5-19. Using the shlex Module
1.11. The zipfile Module
1.11.1. Listing the Contents
1.11.1.1. Example 5-20. Using the zipfile Module to List Files in a ZIP Archive
1.11.2. Reading Data from a ZIP File
1.11.2.1. Example 5-21. Using the zipfile Module to Read Data from a ZIP File
1.11.3. Writing Data to a ZIP File
1.11.3.1. Example 5-22. Using the zipfile Module to Store Files in a ZIP File
1.11.3.2. Example 5-23. Using the zipfile Module to Store Strings in a ZIP File
1.12. The gzip Module
1.12.0.1. Example 5-24. Using the gzip Module to Read a Compressed File
1.12.0.2. Example 5-25. Adding seek/tell Support to the gzip Module

1. File Formats

1.1. Overview

This chapter describes modules for working with a number of different file formats.

1.1.1. Markup Languages

Python provides a number of modules for processing the Extensible Markup Language (XML) and the Hypertext Markup Language (HTML). It also supports the Standard Generalized Markup Language (SGML).

All of these formats share the same basic structure, because both HTML and XML are derived from SGML. Each document is made up of start tags, end tags, text (also called character data), and entity references:

    <document name="sample.xml">
        <header>This is a header</header>
        <body>This is the body text. The text can contain
        plain text (&quot;character data&quot;), tags, and
        entities.
        </body>
    </document>

In this example, <document>, <header>, and <body> are start tags. Each start tag has a corresponding end tag, marked with a slash ("/"). A start tag can also carry one or more attributes, such as the name attribute here.

Everything between a start tag and its corresponding end tag is called an element. Here the document element contains two other elements, header and body.

&quot; is a character entity. Character entities represent reserved characters inside text sections and are introduced by an ampersand (&). This one stands for a quotation mark. Other common character entities are &lt; for "<" and &gt; for ">".

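Although XML, HTML, and SGML use the same building blocks, there are some differences. In XML, every element must have both a start tag and an end tag, and all tags must be properly nested (such a document is called well-formed). XML is also case-sensitive, so <document> and <Document> are two different element types.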
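The well-formedness and case-sensitivity rules can be checked directly with an XML parser. The following is a minimal Python 3 sketch (the listings in this chapter target Python 2); is_well_formed is a hypothetical helper name, and it relies only on xml.parsers.expat raising ExpatError for malformed input.

```python
from xml.parsers import expat

def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks the end of the input
    except expat.ExpatError:
        return False
    return True

nested_ok = is_well_formed("<document><body>text</body></document>")
bad_nesting = is_well_formed("<document><body>text</document></body>")
case_mismatch = is_well_formed("<document>text</Document>")

print(nested_ok, bad_nesting, case_mismatch)
```

Only the first document is accepted: the second closes its tags in the wrong order, and the third fails because the end tag differs from the start tag in case.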

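HTML is much more flexible. An HTML parser will usually fill in missing tags; for example, when it sees a new paragraph beginning with a <P> tag while the previous paragraph has no end tag, it automatically adds the missing </P>. HTML is also not case-sensitive. On the other hand, XML lets you define any elements you like, whereas HTML uses a fixed set of elements defined by the HTML specification.

SGML is more flexible still. You can use your own declaration to define how the source text is turned into an element structure, and a DTD (document type definition) can be used to check the structure and fill in missing tags. Technically, both HTML and XML are SGML applications: each has its own SGML declaration, and HTML also has a standard DTD.

Python provides several markup language parsers. Although SGML is the most flexible of the formats, Python's sgmllib parser is actually quite simple. It does not handle DTDs, but you can subclass it to provide more sophisticated behavior.

Python's HTML support is built on top of the SGML parser. htmllib hands the actual rendering off to a formatter object, and the formatter module contains a few standard formatters.

Python's XML support is more complicated. Originally there was only xmllib, which is similar to sgmllib; the more advanced (and optional) expat module was added later. The most recent versions are preparing to deprecate xmllib in favor of the xml package as the XML toolkit.

1.1.2. Configuration Files

The ConfigParser module reads simple configuration files, similar to the INI files used on Windows.

The netrc module reads .netrc configuration files, and the shlex module reads configuration files that use a shell-script-like syntax.

1.1.3. Archive Formats

Python's standard library supports the GZIP and ZIP (2.0 and later) formats. The gzip and zipfile modules handle these files, and both are built on the zlib module.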

1.2. The xmllib Module

xmllib is deprecated in current versions of Python.

The xmllib module provides a simple XML parser that uses regular expressions to pull the XML data apart, as shown in Example 5-1. The parser performs only basic checks on the document, such as making sure that there is only one top-level element and that all tags are balanced.

You feed XML data to the xmllib parser one piece at a time (for example, as data arrives over a network). The parser calls different methods for start tags, data sections, end tags, and entities.

If you are only interested in a few tags, you can define special start_tag and end_tag methods, where tag is the tag name. The start functions are called with the attributes of the corresponding tag, passed as a dictionary.

1.2.0.1. Example 5-1. Using the xmllib Module to Extract Information from an Element

File: xmllib-example-1.py

    import xmllib

    class Parser(xmllib.XMLParser):
        # get quotation number

        def __init__(self, file=None):
            xmllib.XMLParser.__init__(self)
            if file:
                self.load(file)

        def load(self, file):
            while 1:
                s = file.read(512)
                if not s:
                    break
                self.feed(s)
            self.close()

        def start_quotation(self, attrs):
            print "id =>", attrs.get("id")
            raise EOFError

    try:
        c = Parser()
        c.load(open("samples/sample.xml"))
    except EOFError:
        pass

    id => 031
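xmllib is not part of Python 3, but the start_tag / end_tag naming convention is easy to imitate on top of expat. The following Python 3 sketch is an illustration under that assumption, not the xmllib implementation; TagDispatcher and QuotationIds are hypothetical names, and the sample document is made up for the demonstration.

```python
from xml.parsers import expat

class TagDispatcher:
    """Call start_<tag>(attrs) / end_<tag>() methods when they exist."""

    def __init__(self):
        self._parser = expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end

    def _start(self, tag, attrs):
        handler = getattr(self, "start_" + tag, None)
        if handler:
            handler(attrs)  # attrs is a dictionary

    def _end(self, tag):
        handler = getattr(self, "end_" + tag, None)
        if handler:
            handler()

    def parse(self, text):
        self._parser.Parse(text, True)  # True marks the end of the input

class QuotationIds(TagDispatcher):
    """Collect the id attribute of every quotation element."""

    def __init__(self):
        TagDispatcher.__init__(self)
        self.ids = []

    def start_quotation(self, attrs):
        self.ids.append(attrs.get("id"))

p = QuotationIds()
p.parse('<quotations><quotation id="031">text</quotation></quotations>')
print(p.ids)
```

Tags without a matching method are simply ignored, which mirrors the "only interested in a few tags" use case described above.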

Example 5-2 shows a simple (and incomplete) rendering engine. The parser keeps a stack of elements (__tags), which it passes to the renderer together with text fragments. The renderer looks up the current tag hierarchy in a style cache; if it is not there yet, it creates a new style descriptor by combining entries from the stylesheet.

1.2.0.2. Example 5-2. Using the xmllib Module

File: xmllib-example-2.py

    import xmllib
    import string, sys

    STYLESHEET = {
        # each element can contribute one or more style elements
        "quotation": {"style": "italic"},
        "lang": {"weight": "bold"},
        "name": {"weight": "medium"},
    }

    class Parser(xmllib.XMLParser):
        # a simple styling engine

        def __init__(self, renderer):
            xmllib.XMLParser.__init__(self)
            self.__data = []
            self.__tags = []
            self.__renderer = renderer

        def load(self, file):
            while 1:
                s = file.read(8192)
                if not s:
                    break
                self.feed(s)
            self.close()

        def handle_data(self, data):
            self.__data.append(data)

        def unknown_starttag(self, tag, attrs):
            if self.__data:
                text = string.join(self.__data, "")
                self.__renderer.text(self.__tags, text)
            self.__tags.append(tag)
            self.__data = []

        def unknown_endtag(self, tag):
            self.__tags.pop()
            if self.__data:
                text = string.join(self.__data, "")
                self.__renderer.text(self.__tags, text)
            self.__data = []

    class DumbRenderer:

        def __init__(self):
            self.cache = {}

        def text(self, tags, text):
            # render text in the style given by the tag stack
            tags = tuple(tags)
            style = self.cache.get(tags)
            if style is None:
                # figure out a combined style
                style = {}
                for tag in tags:
                    s = STYLESHEET.get(tag)
                    if s:
                        style.update(s)
                self.cache[tags] = style # update cache
            # write to standard output
            sys.stdout.write("%s =>\n" % style)
            sys.stdout.write(" " + repr(text) + "\n")

    #
    # try it out

    r = DumbRenderer()
    c = Parser(r)
    c.load(open("samples/sample.xml"))

    {'style': 'italic'} =>
     'I\'ve had a lot of developers come up to me and\012say, "I haven\'t had this much fun in a long time. It sure beats\012writing '
    {'style': 'italic', 'weight': 'bold'} =>
     'Cobol'
    {'style': 'italic'} =>
     '" -- '
    {'style': 'italic', 'weight': 'medium'} =>
     'James Gosling'
    {'style': 'italic'} =>
     ', on\012'
    {'weight': 'bold'} =>
     'Java'
    {'style': 'italic'} =>
     '.'

1.3. The xml.parsers.expat Module

(Optional) The xml.parsers.expat module is an interface to James Clark's Expat XML parser. Example 5-3 demonstrates this full-featured and fast parser.

1.3.0.1. Example 5-3. Using the xml.parsers.expat Module

File: xml-parsers-expat-example-1.py

    from xml.parsers import expat

    class Parser:

        def __init__(self):
            self._parser = expat.ParserCreate()
            self._parser.StartElementHandler = self.start
            self._parser.EndElementHandler = self.end
            self._parser.CharacterDataHandler = self.data

        def feed(self, data):
            self._parser.Parse(data, 0)

        def close(self):
            self._parser.Parse("", 1) # end of data
            del self._parser # get rid of circular references

        def start(self, tag, attrs):
            print "START", repr(tag), attrs

        def end(self, tag):
            print "END", repr(tag)

        def data(self, data):
            print "DATA", repr(data)

    p = Parser()
    p.feed("<tag>data</tag>")
    p.close()

    START u'tag' {}
    DATA u'data'
    END u'tag'

Note that the parser returns Unicode strings even if you pass it plain text. By default, the parser interprets the source text as UTF-8. To use another encoding, make sure the XML file contains an encoding declaration, as shown in Example 5-4.

1.3.0.2. Example 5-4. Using the xml.parsers.expat Module to Read ISO Latin-1 Text

File: xml-parsers-expat-example-2.py

    from xml.parsers import expat

    class Parser:

        def __init__(self):
            self._parser = expat.ParserCreate()
            self._parser.StartElementHandler = self.start
            self._parser.EndElementHandler = self.end
            self._parser.CharacterDataHandler = self.data

        def feed(self, data):
            self._parser.Parse(data, 0)

        def close(self):
            self._parser.Parse("", 1) # end of data
            del self._parser # get rid of circular references

        def start(self, tag, attrs):
            print "START", repr(tag), attrs

        def end(self, tag):
            print "END", repr(tag)

        def data(self, data):
            print "DATA", repr(data)

    p = Parser()
    p.feed("""\
    <?xml version='1.0' encoding='iso-8859-1'?>
    <author>
    <name>fredrik lundh</name>
    <city>link\366ping</city>
    </author>
    """
    )
    p.close()

    START u'author' {}
    DATA u'\012'
    START u'name' {}
    DATA u'fredrik lundh'
    END u'name'
    DATA u'\012'
    START u'city' {}
    DATA u'link\366ping'
    END u'city'
    DATA u'\012'
    END u'author'
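For readers on Python 3, where strings are already Unicode, the same encoding behavior can be seen by passing the document as bytes. This is a minimal sketch under that assumption; the events list and the one-element document are made up for the illustration, and buffer_text is switched on so that character data arrives in one piece.

```python
from xml.parsers import expat

events = []

parser = expat.ParserCreate()
parser.buffer_text = True  # deliver character data in a single chunk
parser.StartElementHandler = lambda tag, attrs: events.append(("START", tag))
parser.EndElementHandler = lambda tag: events.append(("END", tag))
parser.CharacterDataHandler = lambda data: events.append(("DATA", data))

# The document is passed as bytes; the encoding declaration tells
# expat how to decode them.
document = (
    "<?xml version='1.0' encoding='iso-8859-1'?>"
    "<city>link\u00f6ping</city>"
).encode("iso-8859-1")

parser.Parse(document, True)

for event in events:
    print(event)
```

Without the encoding declaration, the single byte used for the non-ASCII letter is not valid UTF-8, so the parser would reject the document.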

1.4. The sgmllib Module

The sgmllib module provides a basic SGML parser. It works much like the xmllib parser, but is less restrictive (and less complete), as shown in Example 5-5.

As in xmllib, this parser calls methods on itself for start tags, data sections, end tags, and entities. If you are only interested in a few tags, you can define special methods for them.

1.4.0.1. Example 5-5. Using the sgmllib Module to Extract the Title Element

File: sgmllib-example-1.py

    import sgmllib
    import string

    class FoundTitle(Exception):
        pass

    class ExtractTitle(sgmllib.SGMLParser):

        def __init__(self, verbose=0):
            sgmllib.SGMLParser.__init__(self, verbose)
            self.title = self.data = None

        def handle_data(self, data):
            if self.data is not None:
                self.data.append(data)

        def start_title(self, attrs):
            self.data = []

        def end_title(self):
            self.title = string.join(self.data, "")
            raise FoundTitle # abort parsing!

    def extract(file):
        # extract title from an HTML/SGML stream
        p = ExtractTitle()
        try:
            while 1:
                # read small chunks
                s = file.read(512)
                if not s:
                    break
                p.feed(s)
            p.close()
        except FoundTitle:
            return p.title
        return None

    #
    # try it out

    print "html", "=>", extract(open("samples/sample.htm"))
    print "sgml", "=>", extract(open("samples/sample.sgm"))

    html => A Title.
    sgml => Quotations

To handle all tags, override the unknown_starttag and unknown_endtag methods, as shown in Example 5-6.

1.4.0.2. Example 5-6. Using the sgmllib Module to Format an SGML Document

File: sgmllib-example-2.py

    import sgmllib
    import cgi, sys

    class PrettyPrinter(sgmllib.SGMLParser):
        # A simple SGML pretty printer

        def __init__(self):
            # initialize base class
            sgmllib.SGMLParser.__init__(self)
            self.flag = 0

        def newline(self):
            # force newline, if necessary
            if self.flag:
                sys.stdout.write("\n")
            self.flag = 0

        def unknown_starttag(self, tag, attrs):
            # called for each start tag

            # the attrs argument is a list of (attr, value)
            # tuples. convert it to a string.
            text = ""
            for attr, value in attrs:
                text = text + " %s='%s'" % (attr, cgi.escape(value))

            self.newline()
            sys.stdout.write("<%s%s>\n" % (tag, text))

        def handle_data(self, text):
            # called for each text section
            sys.stdout.write(text)
            self.flag = (text[-1:] != "\n")

        def handle_entityref(self, text):
            # called for each entity
            sys.stdout.write("&%s;" % text)

        def unknown_endtag(self, tag):
            # called for each end tag
            self.newline()
            sys.stdout.write("<%s>" % tag)

    #
    # try it out

    file = open("samples/sample.sgm")

    p = PrettyPrinter()
    p.feed(file.read())
    p.close()

    <chapter>
    <title>
    Quotations
    <title>
    <epigraph>
    <attribution>
    eff-bot, June 1997
    <attribution>
    <para>
    <quote>
    Nobody expects the Spanish Inquisition! Amongst our weaponry are such diverse elements as fear, surprise, ruthless efficiency, and an almost fanatical devotion to Guido, and nice red uniforms &mdash; oh, damn!
    <quote>
    <para>
    <epigraph>
    <chapter>

Example 5-7 checks whether an SGML document is "well-formed" in the XML sense: all elements are properly nested, and every start tag has a matching end tag.

To do this, we keep a list of open start tags and check that each end tag closes the most recent start tag. Finally, we make sure that no tags are left open when the end of the file is reached.

1.4.0.3. Example 5-7. Using the sgmllib Module to Check Well-Formedness

File: sgmllib-example-3.py

    import sgmllib

    class WellFormednessChecker(sgmllib.SGMLParser):
        # check that an SGML document is 'well-formed'
        # (in the XML sense).

        def __init__(self, file=None):
            sgmllib.SGMLParser.__init__(self)
            self.tags = []
            if file:
                self.load(file)

        def load(self, file):
            while 1:
                s = file.read(8192)
                if not s:
                    break
                self.feed(s)
            self.close()

        def close(self):
            sgmllib.SGMLParser.close(self)
            if self.tags:
                raise SyntaxError, "start tag %s not closed" % self.tags[-1]

        def unknown_starttag(self, start, attrs):
            self.tags.append(start)

        def unknown_endtag(self, end):
            start = self.tags.pop()
            if end != start:
                raise SyntaxError, "end tag %s doesn't match start tag %s" %\
                      (end, start)

    try:
        c = WellFormednessChecker()
        c.load(open("samples/sample.htm"))
    except SyntaxError:
        raise # report error
    else:
        print "document is well-formed"

    Traceback (innermost last):
    ...
    SyntaxError: end tag head doesn't match start tag meta
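sgmllib is not available in Python 3, so the following sketch applies the same stack technique with html.parser.HTMLParser instead. It is an illustration under that substitution rather than a port of the listing above; NestingChecker and check are hypothetical names, and ValueError stands in for the SyntaxError used in the Python 2 code.

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    """Raise ValueError unless every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.tags = []  # stack of currently open tags

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        if not self.tags:
            raise ValueError("end tag %s has no start tag" % tag)
        start = self.tags.pop()
        if tag != start:
            raise ValueError(
                "end tag %s doesn't match start tag %s" % (tag, start))

    def close(self):
        super().close()
        if self.tags:
            raise ValueError("start tag %s not closed" % self.tags[-1])

def check(text):
    """Return "ok" or a description of the first nesting problem."""
    checker = NestingChecker()
    try:
        checker.feed(text)
        checker.close()
    except ValueError as exc:
        return str(exc)
    return "ok"

print(check("<p><b>text</b></p>"))
print(check("<html><head><meta></head></html>"))
```

The second call reports that the head end tag does not match the meta start tag, the same kind of failure the Python 2 example hits: HTML elements such as meta are normally written without an end tag.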

Finally, the class in Example 5-8 can be used to filter HTML and SGML documents. Subclass it and implement the start and end methods.

1.4.0.4. Example 5-8. Using the sgmllib Module to Filter SGML Documents

File: sgmllib-example-4.py

    import sgmllib
    import cgi, string, sys

    class SGMLFilter(sgmllib.SGMLParser):
        # sgml filter. override start/end to manipulate
        # document elements

        def __init__(self, outfile=None, infile=None):
            sgmllib.SGMLParser.__init__(self)
            if not outfile:
                outfile = sys.stdout
            self.write = outfile.write
            if infile:
                self.load(infile)

        def load(self, file):
            while 1:
                s = file.read(8192)
                if not s:
                    break
                self.feed(s)
            self.close()

        def handle_entityref(self, name):
            self.write("&%s;" % name)

        def handle_data(self, data):
            self.write(cgi.escape(data))

        def unknown_starttag(self, tag, attrs):
            tag, attrs = self.start(tag, attrs)
            if tag:
                if not attrs:
                    self.write("<%s>" % tag)
                else:
                    self.write("<%s" % tag)
                    for k, v in attrs:
                        self.write(" %s=%s" % (k, repr(v)))
                    self.write(">")

        def unknown_endtag(self, tag):
            tag = self.end(tag)
            if tag:
                self.write("</%s>" % tag)

        def start(self, tag, attrs):
            return tag, attrs # override

        def end(self, tag):
            return tag # override

    class Filter(SGMLFilter):

        def fixtag(self, tag):
            if tag == "em":
                tag = "i"
            if tag == "strong":
                tag = "b"
            return string.upper(tag)

        def start(self, tag, attrs):
            return self.fixtag(tag), attrs

        def end(self, tag):
            return self.fixtag(tag)

    c = Filter()
    c.load(open("samples/sample.htm"))

1.5. The htmllib Module

The htmllib module contains a tag-driven HTML parser that sends data to a formatter object, as shown in Example 5-9. For more examples of how to parse HTML, see the formatter module.

1.5.0.1. Example 5-9. Using the htmllib Module

File: htmllib-example-1.py

    import htmllib
    import formatter
    import string

    class Parser(htmllib.HTMLParser):
        # return a dictionary mapping anchor texts to lists
        # of associated hyperlinks

        def __init__(self, verbose=0):
            self.anchors = {}
            f = formatter.NullFormatter()
            htmllib.HTMLParser.__init__(self, f, verbose)

        def anchor_bgn(self, href, name, type):
            self.save_bgn()
            self.anchor = href

        def anchor_end(self):
            text = string.strip(self.save_end())
            if self.anchor and text:
                self.anchors[text] = self.anchors.get(text, []) + [self.anchor]

    file = open("samples/sample.htm")
    html = file.read()
    file.close()

    p = Parser()
    p.feed(html)
    p.close()

    for k, v in p.anchors.items():
        print k, "=>", v

    print

    link => ['http://www.python.org']

If you only want to parse an HTML file rather than render it to an output device, the sgmllib module is usually a better choice.
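Neither htmllib nor sgmllib exists in Python 3. As a rough equivalent of the anchor-collecting parser above, here is a Python 3 sketch built on html.parser.HTMLParser; AnchorCollector is a hypothetical name and the HTML snippet is made up for the demonstration.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Map anchor texts to lists of the hyperlinks they point to."""

    def __init__(self):
        super().__init__()
        self.anchors = {}
        self._href = None
        self._text = None  # list of text pieces while inside <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._text is not None:
            text = "".join(self._text).strip()
            if self._href and text:
                self.anchors.setdefault(text, []).append(self._href)
            self._href = self._text = None

p = AnchorCollector()
p.feed('<p>A <a href="http://www.python.org">link</a>.</p>')
p.close()
print(p.anchors)
```

No formatter object is involved here: the parser only collects data, which is the "parse without rendering" case mentioned above.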

1.6. The htmlentitydefs Module

The htmlentitydefs module contains a dictionary of the ISO Latin-1 character entities used in HTML, as shown in Example 5-10.

1.6.0.1. Example 5-10. Using the htmlentitydefs Module

File: htmlentitydefs-example-1.py

    import htmlentitydefs

    entities = htmlentitydefs.entitydefs

    for entity in "amp", "quot", "copy", "yen":
        print entity, "=", entities[entity]

    amp = &
    quot = "
    copy = \302\251
    yen = \302\245

Example 5-11 shows how to combine a regular expression with this dictionary to translate the entities in a string (the inverse of cgi.escape).

1.6.0.2. Example 5-11. Using the htmlentitydefs Module to Translate Entities

File: htmlentitydefs-example-2.py

    import htmlentitydefs
    import re
    import cgi

    pattern = re.compile("&(\w+?);")

    def descape_entity(m, defs=htmlentitydefs.entitydefs):
        # callback: translate one entity to its ISO Latin value
        try:
            return defs[m.group(1)]
        except KeyError:
            return m.group(0) # use as is

    def descape(string):
        return pattern.sub(descape_entity, string)

    print descape("&lt;spam&amp;eggs&gt;")
    print descape(cgi.escape("<spam&eggs>"))

    <spam&eggs>
    <spam&eggs>
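In Python 3 the entity tables live in html.entities. The sketch below repeats the regular-expression technique with the name2codepoint table; it is an illustration only (for everyday use, html.unescape already does this job), and the sample strings are made up.

```python
import re
from html.entities import name2codepoint

pattern = re.compile(r"&(\w+?);")

def descape_entity(match):
    # callback: translate one named entity to its character
    try:
        return chr(name2codepoint[match.group(1)])
    except KeyError:
        return match.group(0)  # unknown entity: leave as is

def descape(text):
    return pattern.sub(descape_entity, text)

print(descape("&lt;spam&amp;eggs&gt;"))
print(descape("&nosuch; &copy;"))
```

Unknown names are passed through unchanged, as in the Python 2 version.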

Finally, Example 5-12 shows how to convert XML reserved characters and ISO Latin-1 characters into entities in an XML string. It is similar to cgi.escape, but it also replaces non-ASCII characters.

1.6.0.3. Example 5-12. Escaping ISO Latin-1 Entities

File: htmlentitydefs-example-3.py

    import htmlentitydefs
    import re, string

    # this pattern matches substrings of reserved and non-ASCII characters
    pattern = re.compile(r"[&<>\"\x80-\xff]+")

    # create character map
    entity_map = {}

    for i in range(256):
        entity_map[chr(i)] = "&#%d;" % i

    for entity, char in htmlentitydefs.entitydefs.items():
        if entity_map.has_key(char):
            entity_map[char] = "&%s;" % entity

    def escape_entity(m, get=entity_map.get):
        return string.join(map(get, m.group()), "")

    def escape(string):
        return pattern.sub(escape_entity, string)

    print escape("<spam&eggs>")
    print escape("\345 i \345a \344 e \366")

    &lt;spam&amp;eggs&gt;
    &aring; i &aring;a &auml; e &ouml;
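A Python 3 sketch of the same escaping idea, using the codepoint2name table from html.entities. The names escape_char and escape are local to this illustration; as an assumption of the sketch, characters without a named entity fall back to a numeric character reference, and only code points up to U+FFFF are matched.

```python
import re
from html.entities import codepoint2name

# Reserved characters plus anything outside ASCII (up to U+FFFF).
pattern = re.compile(r'[&<>"\u0080-\uffff]')

def escape_char(match):
    code = ord(match.group())
    name = codepoint2name.get(code)
    if name:
        return "&%s;" % name   # named entity, for example &aring;
    return "&#%d;" % code      # numeric character reference

def escape(text):
    return pattern.sub(escape_char, text)

print(escape("<spam&eggs>"))
print(escape("\u00e5 i \u00e5a \u00e4 e \u00f6"))
```

Because Python 3 strings are Unicode, the lookup works on code points instead of on single bytes as in the Python 2 listing.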

1.7. The formatter Module

The formatter module provides formatter classes that can be used together with htmllib.

The module provides two kinds of classes: formatters and writers. A formatter converts the HTML parser's stream of tags and data into an event stream suited to an output device, and a writer renders that event stream on the device, as shown in Example 5-13.

In most cases, you can use the AbstractFormatter class to do the formatting. It calls methods on the writer object for the different formatting events. The AbstractWriter class simply prints a message for each method call.

1.7.0.1. Example 5-13. Using the formatter Module to Convert HTML to an Event Stream

File: formatter-example-1.py

    import formatter
    import htmllib

    w = formatter.AbstractWriter()
    f = formatter.AbstractFormatter(w)

    file = open("samples/sample.htm")

    p = htmllib.HTMLParser(f)
    p.feed(file.read())
    p.close()

    file.close()

    send_paragraph(1)
    new_font(('h1', 0, 1, 0))
    send_flowing_data('A Chapter.')
    send_line_break()
    send_paragraph(1)
    new_font(None)
    send_flowing_data('Some text. Some more text. Some')
    send_flowing_data(' ')
    new_font((None, 1, None, None))
    send_flowing_data('emphasized')
    new_font(None)
    send_flowing_data(' text. A')
    send_flowing_data(' link')
    send_flowing_data('[1]')
    send_flowing_data('.')

The formatter module also provides a NullWriter class, which ignores every event passed to it, and a DumbWriter class, which converts the event stream into a plain-text document, as shown in Example 5-14.

1.7.0.2. Example 5-14. Using the formatter Module to Convert HTML to Plain Text

File: formatter-example-2.py

    import formatter
    import htmllib

    w = formatter.DumbWriter() # plain text
    f = formatter.AbstractFormatter(w)

    file = open("samples/sample.htm")

    # print html body as plain text
    p = htmllib.HTMLParser(f)
    p.feed(file.read())
    p.close()

    file.close()

    # print links
    print
    print
    i = 1
    for link in p.anchorlist:
        print i, "=>", link
        i = i + 1

    A Chapter.

    Some text. Some more text. Some emphasized text. A link[1].

    1 => http://www.python.org

Example 5-15 provides a custom writer. It inherits from DumbWriter, keeps track of the current font style, and adjusts the output according to the font.

1.7.0.3. Example 5-15. Using the formatter Module with a Custom Writer

File: formatter-example-3.py

    import formatter
    import htmllib, string

    class Writer(formatter.DumbWriter):

        def __init__(self):
            formatter.DumbWriter.__init__(self)
            self.tag = ""
            self.bold = self.italic = 0
            self.fonts = []

        def new_font(self, font):
            if font is None:
                font = self.fonts.pop()
                self.tag, self.bold, self.italic = font
            else:
                self.fonts.append((self.tag, self.bold, self.italic))
                tag, bold, italic, typewriter = font
                if tag is not None:
                    self.tag = tag
                if bold is not None:
                    self.bold = bold
                if italic is not None:
                    self.italic = italic

        def send_flowing_data(self, data):
            if not data:
                return
            atbreak = self.atbreak or data[0] in string.whitespace
            for word in string.split(data):
                if atbreak:
                    self.file.write(" ")
                if self.tag in ("h1", "h2", "h3"):
                    word = string.upper(word)
                if self.bold:
                    word = "*" + word + "*"
                if self.italic:
                    word = "_" + word + "_"
                self.file.write(word)
                atbreak = 1
            self.atbreak = data[-1] in string.whitespace

    w = Writer()
    f = formatter.AbstractFormatter(w)

    file = open("samples/sample.htm")

    # print html body as plain text
    p = htmllib.HTMLParser(f)
    p.feed(file.read())
    p.close()

    _A_ _CHAPTER._

    Some text. Some more text. Some *emphasized* text. A link[1].
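The formatter and htmllib modules are not available in current Python 3 releases. The sketch below is therefore not a formatter/writer pair; it only imitates the visible effect of the custom writer (upper-case headings, starred emphasis) with html.parser.HTMLParser. PlainText is a hypothetical class, and its handling of spacing around punctuation is deliberately naive.

```python
from html.parser import HTMLParser

class PlainText(HTMLParser):
    """Collect words, upper-casing headings and starring emphasized text."""

    HEADINGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.words = []
        self.open_tags = []  # tags currently open, innermost last

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # close the innermost matching tag and everything inside it
            index = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[index:]

    def handle_data(self, data):
        for word in data.split():
            if any(t in self.HEADINGS for t in self.open_tags):
                word = word.upper()
            if "em" in self.open_tags:
                word = "*" + word + "*"
            self.words.append(word)

    def text(self):
        return " ".join(self.words)

p = PlainText()
p.feed("<h1>A Chapter.</h1><p>Some <em>emphasized</em> text.</p>")
p.close()
print(p.text())
```

A real replacement would also need paragraph breaks and link numbering, which the formatter-based examples get from AbstractFormatter.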

1.8. The ConfigParser Module

The ConfigParser module reads configuration files.

The file format is similar to Windows INI files: a file contains one or more sections, and each section can contain a number of configuration options.

Here is a sample configuration file, which is used in Example 5-16:

    [book]
    title: The Python Standard Library
    author: Fredrik Lundh
    email: fredrik@pythonware.com
    version: 2.0-001115

    [ematter]
    pages: 250

    [hardcopy]
    pages: 350

Example 5-16 uses the ConfigParser module to read this configuration file.

1.8.0.1. Example 5-16. Using the ConfigParser Module

File: configparser-example-1.py

    import ConfigParser
    import string

    config = ConfigParser.ConfigParser()

    config.read("samples/sample.ini")

    # print summary
    print
    print string.upper(config.get("book", "title"))
    print "by", config.get("book", "author"),
    print "(" + config.get("book", "email") + ")"
    print
    print config.get("ematter", "pages"), "pages"
    print

    # dump entire config file
    for section in config.sections():
        print section
        for option in config.options(section):
            print " ", option, "=", config.get(section, option)

    THE PYTHON STANDARD LIBRARY
    by Fredrik Lundh (fredrik@pythonware.com)

    250 pages

    book
      title = The Python Standard Library
      email = fredrik@pythonware.com
      author = Fredrik Lundh
      version = 2.0-001115
      __name__ = book
    ematter
      __name__ = ematter
      pages = 250
    hardcopy
      __name__ = hardcopy
      pages = 350

Starting with Python 2.0, the ConfigParser module can also write configuration data to a file, as shown in Example 5-17.

1.8.0.2. Example 5-17. Using the ConfigParser Module to Write Configuration Data

File: configparser-example-2.py

    import ConfigParser
    import sys

    config = ConfigParser.ConfigParser()

    # set a number of parameters
    config.add_section("book")
    config.set("book", "title", "the python standard library")
    config.set("book", "author", "fredrik lundh")

    config.add_section("ematter")
    config.set("ematter", "pages", 250)

    # write to screen
    config.write(sys.stdout)

    [book]
    title = the python standard library
    author = fredrik lundh

    [ematter]
    pages = 250
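In Python 3 the module is named configparser. The following sketch reads a shortened copy of the sample file from a string and writes a modified version back out; the SAMPLE constant and the in-memory StringIO target are assumptions made to keep the example self-contained.

```python
import configparser
import io

SAMPLE = """\
[book]
title: The Python Standard Library
author: Fredrik Lundh

[ematter]
pages: 250
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

title = config.get("book", "title")
pages = config.getint("ematter", "pages")  # converted to an int

# Change a value and write the configuration back out.
# In Python 3, option values must be strings.
config.set("ematter", "pages", str(pages + 100))
out = io.StringIO()
config.write(out)

print(title)
print(out.getvalue())
```

Note that the written file uses "=" as the delimiter even though the input used ":"; both forms are accepted when reading.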

1.9. The netrc Module

The netrc module parses .netrc configuration files, as shown in Example 5-18. These files store FTP user names and passwords in the user's home directory. (Do not forget to restrict access to the file with "chmod 0600 ~/.netrc" so that only the current user can read it.)

1.9.0.1. Example 5-18. Using the netrc Module

File: netrc-example-1.py

    import netrc

    # default is $HOME/.netrc
    info = netrc.netrc("samples/sample.netrc")

    login, account, password = info.authenticators("secret.fbi")
    print "login", "=>", repr(login)
    print "account", "=>", repr(account)
    print "password", "=>", repr(password)

    login => 'mulder'
    account => None
    password => 'trustno1'
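The sample file itself is not shown in this chapter. The Python 3 sketch below assumes a two-entry file in the usual "machine ... login ... password ..." layout (the SAMPLE text is reconstructed from the tokens printed in the shlex example and should be treated as an assumption) and writes it to a temporary file so the code can run anywhere.

```python
import netrc
import os
import tempfile

SAMPLE = """\
machine secret.fbi login mulder password trustno1
machine non.secret.fbi login scully password noway
"""

# netrc.netrc takes a file name, so write the sample to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write(SAMPLE)
    path = f.name

try:
    info = netrc.netrc(path)
    login, account, password = info.authenticators("secret.fbi")
    missing = info.authenticators("no.such.host")  # None when not listed
finally:
    os.remove(path)

print("login", "=>", repr(login))
print("password", "=>", repr(password))
print("missing", "=>", repr(missing))
```

The value returned for a missing account field differs between Python versions, so the sketch does not print it.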

1.10. The shlex Module

The shlex module provides a simple lexer (also known as a tokenizer) for languages based on Unix shell syntax, as shown in Example 5-19.

1.10.0.1. Example 5-19. Using the shlex Module

File: shlex-example-1.py

    import shlex

    lexer = shlex.shlex(open("samples/sample.netrc", "r"))
    lexer.wordchars = lexer.wordchars + "._"

    while 1:
        token = lexer.get_token()
        if not token:
            break
        print repr(token)

    'machine'
    'secret.fbi'
    'login'
    'mulder'
    'password'
    'trustno1'
    'machine'
    'non.secret.fbi'
    'login'
    'scully'
    'password'
    'noway'
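A short Python 3 sketch of the two common ways to use the module: shlex.split for a whole string, and the lexer object for token-by-token reading. The input strings are made up for the illustration.

```python
import shlex

line = 'machine secret.fbi login mulder password "trust no 1"'

# shlex.split applies shell-like quoting rules to a whole string.
tokens = shlex.split(line)

# The lower-level lexer yields tokens one at a time; extending
# wordchars keeps "secret.fbi" together as a single token.
lexer = shlex.shlex("machine secret.fbi")
lexer.wordchars += "."
lexer_tokens = list(lexer)

print(tokens)
print(lexer_tokens)
```

Without the extra "." in wordchars, the lexer object would return the host name as three separate tokens.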

1.11. The zipfile Module

(New in 2.0) The zipfile module reads and writes files in the ZIP format.

1.11.1. Listing the Contents

Use the namelist and infolist methods to list the contents of an archive. The former returns a list of file names, and the latter returns a list of ZipInfo instances, as shown in Example 5-20.

1.11.1.1. Example 5-20. Using the zipfile Module to List Files in a ZIP Archive

File: zipfile-example-1.py

    import zipfile

    file = zipfile.ZipFile("samples/sample.zip", "r")

    # list filenames
    for name in file.namelist():
        print name,
    print

    # list file information
    for info in file.infolist():
        print info.filename, info.date_time, info.file_size

    sample.txt sample.jpg
    sample.txt (1999, 9, 11, 20, 11, 8) 302
    sample.jpg (1999, 9, 18, 16, 9, 44) 4762

1.11.2. Reading Data from a ZIP File

Call the read method to read data from a ZIP archive. It takes a file name as its argument and returns the data as a string, as shown in Example 5-21.

1.11.2.1. Example 5-21. Using the zipfile Module to Read Data from a ZIP File

File: zipfile-example-2.py

    import zipfile

    file = zipfile.ZipFile("samples/sample.zip", "r")

    for name in file.namelist():
        data = file.read(name)
        print name, len(data), repr(data[:10])

    sample.txt 302 'We will pe'
    sample.jpg 4762 '\377\330\377\340\000\020JFIF'
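The listing and reading calls work the same way in Python 3, except that read returns bytes. The sketch below builds a small archive in memory so that it does not depend on the samples directory; the member names and contents are made up.

```python
import io
import zipfile

buffer = io.BytesIO()  # in-memory stand-in for a file on disk

# Create a small archive to inspect.
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.txt", "hello zip")
    archive.writestr("data/numbers.txt", "1 2 3")

with zipfile.ZipFile(buffer, "r") as archive:
    names = archive.namelist()
    sizes = [info.file_size for info in archive.infolist()]
    first = archive.read("sample.txt")  # bytes in Python 3

print(names)
print(sizes)
print(first)
```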

1.11.3. Writing Data to a ZIP File

Adding a file to an archive is easy: pass the file name, and the name the file should have inside the ZIP archive, to the write method.

Example 5-22 packs all files in the samples directory into a single ZIP file.

1.11.3.1. Example 5-22. Using the zipfile Module to Store Files in a ZIP File

File: zipfile-example-3.py

    import zipfile
    import glob, os

    # open the zip file for writing, and write stuff to it

    file = zipfile.ZipFile("test.zip", "w")

    for name in glob.glob("samples/*"):
        file.write(name, os.path.basename(name), zipfile.ZIP_DEFLATED)

    file.close()

    # open the file again, to see what's in it

    file = zipfile.ZipFile("test.zip", "r")
    for info in file.infolist():
        print info.filename, info.date_time, info.file_size, info.compress_size

    sample.wav (1999, 8, 15, 21, 26, 46) 13260 10985
    sample.jpg (1999, 9, 18, 16, 9, 44) 4762 4626
    sample.au (1999, 7, 18, 20, 57, 34) 1676 1103
    ...

The optional third argument to the write method controls whether compression is used. The default is zipfile.ZIP_STORED, which stores the data in the archive without any compression. If the zlib module is installed, you can use zipfile.ZIP_DEFLATED to compress the data.

The zipfile module can also add strings to an archive. This takes a little more work, however: you need to create a ZipInfo instance and configure it correctly. Example 5-23 offers a simple solution.

1.11.3.2. Example 5-23. Using the zipfile Module to Store Strings in a ZIP File

File: zipfile-example-4.py

    import zipfile
    import glob, os, time

    file = zipfile.ZipFile("test.zip", "w")

    now = time.localtime(time.time())[:6]

    for name in ("life", "of", "brian"):
        info = zipfile.ZipInfo(name)
        info.date_time = now
        info.compress_type = zipfile.ZIP_DEFLATED
        file.writestr(info, name*1000)

    file.close()

    # open the file again, to see what's in it

    file = zipfile.ZipFile("test.zip", "r")

    for info in file.infolist():
        print info.filename, info.date_time, info.file_size, info.compress_size

    life (2000, 12, 1, 0, 12, 1) 4000 26
    of (2000, 12, 1, 0, 12, 1) 2000 18
    brian (2000, 12, 1, 0, 12, 1) 5000 31
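To see the difference between the two compression settings side by side, here is a Python 3 sketch that stores the same data twice in an in-memory archive. It assumes zlib is available (needed for ZIP_DEFLATED) and passes a plain member name to writestr, which current Python versions accept in place of a ZipInfo instance; the member names are made up.

```python
import io
import zipfile

data = b"life" * 1000  # highly repetitive, so it compresses well

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("stored.bin", data, compress_type=zipfile.ZIP_STORED)
    archive.writestr("deflated.bin", data, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("stored.bin")
    deflated = archive.getinfo("deflated.bin")

print(stored.file_size, stored.compress_size)
print(deflated.file_size, deflated.compress_size)
```

The stored member occupies exactly as many bytes as the original data, while the deflated member shrinks to a small fraction of it.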

1.12. The gzip Module

The gzip module reads and writes gzip-compressed files, as shown in Example 5-24.

1.12.0.1. Example 5-24. Using the gzip Module to Read a Compressed File

File: gzip-example-1.py

    import gzip

    file = gzip.GzipFile("samples/sample.gz")

    print file.read()

    Well it certainly looks as though we're in for a splendid afternoon's sport in this the 127th Upperclass Twit of the Year Show.
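A Python 3 sketch of a full write-then-read round trip. It uses an in-memory buffer via the fileobj argument instead of a file in the samples directory, and the text is shortened; both are assumptions made to keep the example self-contained.

```python
import gzip
import io

text = b"Well it certainly looks as though we're in for a splendid afternoon"

# Compress into an in-memory buffer.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as f:
    f.write(text)

# Rewind and decompress again.
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as f:
    restored = f.read()

print(restored == text)
```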

The standard implementation does not support the seek and tell methods. Example 5-25 works around this.

1.12.0.2. Example 5-25. Adding seek/tell Support to the gzip Module

File: gzip-example-2.py

    import gzip

    class gzipFile(gzip.GzipFile):
        # adds seek/tell support to GzipFile

        offset = 0

        def read(self, size=None):
            data = gzip.GzipFile.read(self, size)
            self.offset = self.offset + len(data)
            return data

        def seek(self, offset, whence=0):
            # figure out new position (we can only seek forwards)
            if whence == 0:
                position = offset
            elif whence == 1:
                position = self.offset + offset
            else:
                raise IOError, "Illegal argument"
            if position < self.offset:
                raise IOError, "Cannot seek backwards"

            # skip forward, in 16k blocks
            while position > self.offset:
                if not self.read(min(position - self.offset, 16384)):
                    break

        def tell(self):
            return self.offset

    #
    # try it

    file = gzipFile("samples/sample.gz")
    file.seek(80)

    print file.read()

    this the 127th Upperclass Twit of the Year Show.
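The limitation described above applies to the Python version this chapter was written for. In Python 3, GzipFile objects opened for reading provide seek and tell themselves, so no subclass is needed. A minimal sketch with made-up data:

```python
import gzip
import io

payload = b"0123456789" * 10  # 100 bytes of uncompressed data

buffer = io.BytesIO(gzip.compress(payload))
with gzip.GzipFile(fileobj=buffer, mode="rb") as f:
    f.seek(80)            # skips forward by decompressing and discarding
    position = f.tell()   # offset in the uncompressed stream
    tail = f.read()
    f.seek(5)             # seeking backwards rewinds and re-reads
    again = f.read(5)

print(position, tail, again)
```

Offsets always refer to the uncompressed data, and seeking can be slow on large files because the stream has to be decompressed up to the target position.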

PythonStandardLib/chpt5 (last edited 2009-12-25 07:15:18 by localhost)

Reposted from: https://www.cnblogs.com/soft115/archive/2011/11/29/2267607.html
