An Introduction to XML Parsing, with a Simple Xerces-C++ Usage Example
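XML is a metalanguage defined by the World Wide Web Consortium (W3C). It has become a general-purpose data interchange format, and its independence from platform, programming language, and operating system makes data integration and exchange much easier. The approaches to parsing XML are the same across programming languages; only the syntax of each implementation differs.
XML itself is just a format for encoding data as plain text. To use the data encoded in an XML file, you first have to parse it out of that text, so you need a parser that can recognize the information in an XML document, interpret the document, and extract its data. Depending on how the data needs to be extracted, there are several parsing approaches, each with its own strengths, weaknesses, and suitable use cases. Choosing an appropriate XML parsing technique can noticeably improve the overall performance of an application.
All XML processing starts with parsing. Whether you use XSLT or the Java language, the first step is to read the XML file, decode its structure, retrieve information, and so on. That is parsing: the process of turning the unstructured character sequence that represents an XML document into structured components that conform to XML syntax.
There are two basic ways to parse XML: SAX (Simple API for XML) and DOM (Document Object Model).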
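SAX is event-stream-based parsing. Its advantages closely resemble those of streaming media: processing can begin immediately instead of waiting until all the data has been handled. Because the application only inspects data as it is read, the data does not have to be held in memory, which is a major advantage for large documents. The application does not even have to parse the whole document; it can stop as soon as some condition is met. In general, SAX is also considerably faster than the alternative, DOM. A SAX parser uses an event-based model: while parsing an XML document it fires a series of events, and when it encounters a given tag it invokes a callback method to report that the specified tag has been found. SAX's memory requirements are usually low because the developer decides which tags to process, and this pays off most when only part of the data in a document is needed. On the other hand, coding against a SAX parser is harder, and it is difficult to access several different parts of the same document at the same time.
Advantages:
(1) Processing can start immediately, without waiting for all the data to be handled;
(2) Data is inspected only as it is read and does not need to be kept in memory;
(3) Parsing can stop once some condition is met, so the whole document need not be parsed;
(4) Efficiency and performance are high, and documents larger than system memory can be parsed.
Disadvantages:
(1) The application itself is responsible for the tag-handling logic (for example, maintaining parent/child relationships), so the more complex the document, the more complex the program;
(2) Navigation is one-way: the parser cannot locate a position in the document hierarchy, it is hard to access different parts of the same document at the same time, and XPath is not supported.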
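The event model described above can be sketched without any XML library. The following is a toy illustration only, not the Xerces-C++ SAX API: `ToyHandler`, `toyParse`, and `collectText` are hypothetical names made up for this sketch, and the scanner assumes well-formed input consisting solely of `<name>`, `</name>`, and plain text (no attributes, comments, entities, or CDATA). It shows the two points made above: the parser only fires callbacks as it reads, and the client has to maintain the parent/child context itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback set, loosely modelled on a SAX document handler.
struct ToyHandler {
    std::function<void(const std::string&)> startElement;
    std::function<void(const std::string&)> endElement;
    std::function<void(const std::string&)> characters;
};

// Toy scanner: walks the input once, left to right, and fires an event for
// each start tag, end tag, and text run. Nothing is stored by the scanner.
// Returns false if a '<' is never closed by a '>'.
bool toyParse(const std::string& xml, const ToyHandler& h) {
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '<') {
            const std::size_t close = xml.find('>', i);
            if (close == std::string::npos) return false;
            const bool isEnd = (i + 1 < close && xml[i + 1] == '/');
            const std::size_t nameStart = i + (isEnd ? 2 : 1);
            const std::string name = xml.substr(nameStart, close - nameStart);
            if (isEnd) {
                if (h.endElement) h.endElement(name);
            } else {
                if (h.startElement) h.startElement(name);
            }
            i = close + 1;
        } else {
            std::size_t next = xml.find('<', i);
            if (next == std::string::npos) next = xml.size();
            if (h.characters) h.characters(xml.substr(i, next - i));
            i = next;
        }
    }
    return true;
}

// Example client: records "/path/to/element=text" for every text run.
// The stack of open elements lives in the client, not in the parser.
std::vector<std::string> collectText(const std::string& xml) {
    std::vector<std::string> path;   // currently open elements
    std::vector<std::string> found;  // collected results
    ToyHandler h;
    h.startElement = [&](const std::string& n) { path.push_back(n); };
    h.endElement = [&](const std::string& n) {
        // Toy assumption: the input is well-formed, so tags close in order.
        assert(!path.empty() && path.back() == n);
        path.pop_back();
    };
    h.characters = [&](const std::string& t) {
        std::string joined;
        for (const std::string& p : path) joined += "/" + p;
        found.push_back(joined + "=" + t);
    };
    toyParse(xml, h);
    return found;
}
```

Calling `collectText` on `<company><product>Xerces-C</product><category>XML Parsing Tools</category></company>` yields two entries, `/company/product=Xerces-C` and `/company/category=XML Parsing Tools`. A real SAX parser additionally reports attributes, errors, and document boundaries, which this sketch leaves out.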
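DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy that lets developers search the tree for specific information. Analyzing this structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is described as tree-based or object-based. Advantages: (1) the application can modify both the data and the structure; (2) access is bidirectional: you can navigate up and down the tree at any time and read or manipulate any part of the data. Disadvantage: the whole XML document usually has to be loaded to build the hierarchy, which consumes a lot of resources.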
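To make the tree model concrete, here is a minimal sketch of an in-memory node hierarchy. It is not the W3C DOM interface or the Xerces-C++ DOM classes: `ToyNode`, `appendElement`, `appendText`, `countElements`, and `buildCompany` are hypothetical names used only for illustration. The sketch builds the same small company tree that the Xerces-C++ sample later in this post creates, then counts its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type; a real DOM has many more node kinds and methods.
struct ToyNode {
    std::string name;                                // element name; empty for text nodes
    std::string text;                                // content; used by text nodes only
    ToyNode* parent = nullptr;                       // allows upward navigation
    std::vector<std::unique_ptr<ToyNode>> children;  // allows downward navigation

    bool isElement() const { return !name.empty(); }

    // Adds a child element and returns a non-owning pointer to it.
    ToyNode* appendElement(const std::string& childName) {
        assert(isElement());  // in this toy, only elements have children
        children.push_back(std::unique_ptr<ToyNode>(new ToyNode()));
        ToyNode* child = children.back().get();
        child->name = childName;
        child->parent = this;
        return child;
    }

    // Adds a child text node and returns a non-owning pointer to it.
    ToyNode* appendText(const std::string& value) {
        assert(isElement());
        children.push_back(std::unique_ptr<ToyNode>(new ToyNode()));
        ToyNode* child = children.back().get();
        child->text = value;
        child->parent = this;
        return child;
    }
};

// Counts element nodes in the subtree rooted at node (text nodes are skipped).
std::size_t countElements(const ToyNode& node) {
    std::size_t n = node.isElement() ? 1 : 0;
    for (const auto& c : node.children) n += countElements(*c);
    return n;
}

// Builds <company> with three child elements, each holding one text node.
std::unique_ptr<ToyNode> buildCompany() {
    std::unique_ptr<ToyNode> root(new ToyNode());
    root->name = "company";
    root->appendElement("product")->appendText("Xerces-C");
    root->appendElement("category")->appendText("XML Parsing Tools");
    root->appendElement("developedBy")->appendText("Apache Software Foundation");
    return root;
}
```

Because the whole tree stays in memory, any node can be reached from any other by following `parent` and `children`, and the structure can be modified after it is built. That is also why memory use grows with document size, which is the trade-off noted above.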
XML parsing libraries for C/C++ include:
(1) Expat: http://www.libexpat.org/;
(2) die-xml: https://code.google.com/p/die-xml/;
(3) Xerces-C++: http://xerces.apache.org/xerces-c/index.html;
(4) TinyXml: http://www.grinninglizard.com/tinyxml/;
Building and using Xerces-C++:
1. Download the xerces-c-3.1.1.zip source archive from http://xerces.apache.org/xerces-c/download.cgi#verify and unzip it;
2. Open xerces-all.sln, located in the xerces-c-3.1.1\projects\Win32\VC10\xerces-all directory, with VS2010;
3. Choose the desired entries under Solution Configurations and Solution Platforms, then select Solution 'xerces-all', right-click it, and run Rebuild Solution. The corresponding dynamic and static libraries are generated under /Build/Win32/VC10. This post tests with Static Debug/xerces-c_static_3D.lib and Static Release/xerces-c_static_3.lib;
4. On top of the 'xerces-all' solution, create a new project named TestXerces. Select it and, for both Debug and Release, set the following project properties: (1) Configuration Properties --> Character Set: Use Unicode Character Set; (2) C/C++ --> General --> Additional Include Directories: ../../../../../src. Then, under C/C++ --> Preprocessor, add:
_CRT_SECURE_NO_DEPRECATE
_WINDOWS
XERCES_STATIC_LIBRARY
XERCES_BUILDING_LIBRARY
XERCES_USE_TRANSCODER_WINDOWS
XERCES_USE_MSGLOADER_INMEMORY
XERCES_USE_NETACCESSOR_WINSOCK
XERCES_USE_FILEMGR_WINDOWS
XERCES_USE_MUTEXMGR_WINDOWS
XERCES_PATH_DELIMITER_BACKSLASH
HAVE_STRICMP
HAVE_STRNICMP
HAVE_LIMITS_H
HAVE_SYS_TIMEB_H
HAVE_FTIME
HAVE_WCSUPR
HAVE_WCSLWR
HAVE_WCSICMP
HAVE_WCSNICMP
stdafx.h:
#pragma once
#include "targetver.h"
#include <stdio.h>
#include "xercesc/util/PlatformUtils.hpp"
#include "xercesc/util/XMLString.hpp"
#include "xercesc/dom/DOM.hpp"
#include "xercesc/util/OutOfMemoryException.hpp"
#include "xercesc/util/TransService.hpp"
#include "xercesc/parsers/SAXParser.hpp"
#include "xercesc/sax/HandlerBase.hpp"
#include "xercesc/framework/XMLFormatter.hpp"
stdafx.cpp:
#include "stdafx.h"
// TODO: reference any additional headers you need in STDAFX.H
// and not in this file
#ifdef _DEBUG
#pragma comment(lib, "../../../../../Build/Win32/VC10/Static Debug/xerces-c_static_3D.lib")
#else
#pragma comment(lib, "../../../../../Build/Win32/VC10/Static Release/xerces-c_static_3.lib")
#endif
TestXerces.cpp:
#include "stdafx.h"
#include <iostream>

using namespace std;
XERCES_CPP_NAMESPACE_USE

class XStr
{
public :
    // -----------------------------------------------------------------------
    //  Constructors and Destructor
    // -----------------------------------------------------------------------
    XStr(const char* const toTranscode)
    {
        // Call the private transcoding method
        fUnicodeForm = XMLString::transcode(toTranscode);
    }

    ~XStr()
    {
        XMLString::release(&fUnicodeForm);
    }

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const XMLCh* unicodeForm() const
    {
        return fUnicodeForm;
    }

private :
    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fUnicodeForm
    //      This is the Unicode XMLCh format of the string.
    // -----------------------------------------------------------------------
    XMLCh* fUnicodeForm;
};

#define X(str) XStr(str).unicodeForm()

/*
 * This sample illustrates how you can create a DOM tree in memory.
 * It then prints the count of elements in the tree.
 */
int CreateDOMDocument()
{
    // Initialize the XML4C2 system.
    try {
        XMLPlatformUtils::Initialize();
    } catch(const XMLException& toCatch) {
        char *pMsg = XMLString::transcode(toCatch.getMessage());
        XERCES_STD_QUALIFIER cerr << "Error during Xerces-c Initialization.\n"
            << " Exception message:"
            << pMsg;
        XMLString::release(&pMsg);
        return 1;
    }

    // Watch for special case help request
    int errorCode = 0;
    /*{
        XERCES_STD_QUALIFIER cout << "\nUsage:\n"
            " CreateDOMDocument\n\n"
            "This program creates a new DOM document from scratch in memory.\n"
            "It then prints the count of elements in the tree.\n"
            << XERCES_STD_QUALIFIER endl;
        errorCode = 1;
    }*/
    if(errorCode) {
        XMLPlatformUtils::Terminate();
        return errorCode;
    }

    {
        // Nest entire test in an inner block.
        // The tree we create below is the same that the XercesDOMParser would
        // have created, except that no whitespace text nodes would be created.

        // <company>
        //     <product>Xerces-C</product>
        //     <category idea='great'>XML Parsing Tools</category>
        //     <developedBy>Apache Software Foundation</developedBy>
        // </company>

        DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(X("Core"));

        if (impl != NULL) {
            try {
                DOMDocument* doc = impl->createDocument(
                    0,             // root element namespace URI.
                    X("company"),  // root element name
                    0);            // document type object (DTD).

                DOMElement* rootElem = doc->getDocumentElement();

                DOMElement* prodElem = doc->createElement(X("product"));
                rootElem->appendChild(prodElem);

                DOMText* prodDataVal = doc->createTextNode(X("Xerces-C"));
                prodElem->appendChild(prodDataVal);

                DOMElement* catElem = doc->createElement(X("category"));
                rootElem->appendChild(catElem);

                catElem->setAttribute(X("idea"), X("great"));

                DOMText* catDataVal = doc->createTextNode(X("XML Parsing Tools"));
                catElem->appendChild(catDataVal);

                DOMElement* devByElem = doc->createElement(X("developedBy"));
                rootElem->appendChild(devByElem);

                DOMText* devByDataVal = doc->createTextNode(X("Apache Software Foundation"));
                devByElem->appendChild(devByDataVal);

                //
                // Now count the number of elements in the above DOM tree.
                //
                const XMLSize_t elementCount = doc->getElementsByTagName(X("*"))->getLength();
                XERCES_STD_QUALIFIER cout << "The tree just created contains: " << elementCount
                    << " elements." << XERCES_STD_QUALIFIER endl;

                doc->release();
            } catch (const OutOfMemoryException&) {
                XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
                errorCode = 5;
            } catch (const DOMException& e) {
                XERCES_STD_QUALIFIER cerr << "DOMException code is: " << e.code << XERCES_STD_QUALIFIER endl;
                errorCode = 2;
            } catch (...) {
                XERCES_STD_QUALIFIER cerr << "An error occurred creating the document" << XERCES_STD_QUALIFIER endl;
                errorCode = 3;
            }
        } else {
            // (impl != NULL)
            XERCES_STD_QUALIFIER cerr << "Requested implementation is not supported" << XERCES_STD_QUALIFIER endl;
            errorCode = 4;
        }
    }

    XMLPlatformUtils::Terminate();
    return errorCode;
}

// ---------------------------------------------------------------------------
// This is a simple class that lets us do easy (though not terribly efficient)
// transcoding of XMLCh data to local code page for display.
// ---------------------------------------------------------------------------
class StrX
{
public :
    // -----------------------------------------------------------------------
    //  Constructors and Destructor
    // -----------------------------------------------------------------------
    StrX(const XMLCh* const toTranscode)
    {
        // Call the private transcoding method
        fLocalForm = XMLString::transcode(toTranscode);
    }

    ~StrX()
    {
        XMLString::release(&fLocalForm);
    }

    // -----------------------------------------------------------------------
    //  Getter methods
    // -----------------------------------------------------------------------
    const char* localForm() const
    {
        return fLocalForm;
    }

private :
    // -----------------------------------------------------------------------
    //  Private data members
    //
    //  fLocalForm
    //      This is the local code page form of the string.
    // -----------------------------------------------------------------------
    char* fLocalForm;
};

inline XERCES_STD_QUALIFIER ostream& operator<<(XERCES_STD_QUALIFIER ostream& target, const StrX& toDump)
{
    target << toDump.localForm();
    return target;
}

int SAXPrint()
{
    // ---------------------------------------------------------------------------
    //  Local data
    //
    //  doNamespaces
    //      Indicates whether namespace processing should be enabled or not.
    //      Defaults to disabled.
    //
    //  doSchema
    //      Indicates whether schema processing should be enabled or not.
    //      Defaults to disabled.
    //
    //  schemaFullChecking
    //      Indicates whether full schema constraint checking should be enabled or not.
    //      Defaults to disabled.
    //
    //  encodingName
    //      The encoding we are to output in. Defaults to LATIN1.
    //      Only used by the commented-out SAXPrintHandlers below.
    //
    //  xmlFile
    //      The path to the file to parse.
    //
    //  valScheme
    //      Indicates what validation scheme to use. It defaults to 'auto'.
    // ---------------------------------------------------------------------------
    static bool doNamespaces = false;
    static bool doSchema = false;
    static bool schemaFullChecking = false;
    static const char* encodingName = "LATIN1";
    static XMLFormatter::UnRepFlags unRepFlags = XMLFormatter::UnRep_CharRef;
    static const char* xmlFile = 0;  // const: it points at a string literal
    static SAXParser::ValSchemes valScheme = SAXParser::Val_Auto;

    // Initialize the XML4C2 system
    try {
        XMLPlatformUtils::Initialize();
    } catch (const XMLException& toCatch) {
        XERCES_STD_QUALIFIER cerr << "Error during initialization! :\n"
            << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
        return 1;
    }

    xmlFile = "../../../../../samples/data/personal-schema.xml";
    int errorCount = 0;

    //
    //  Create a SAX parser object and configure validation and namespace
    //  handling from the settings above.
    //
    SAXParser* parser = new SAXParser;
    parser->setValidationScheme(valScheme);
    parser->setDoNamespaces(doNamespaces);
    parser->setDoSchema(doSchema);
    parser->setHandleMultipleImports (true);
    parser->setValidationSchemaFullChecking(schemaFullChecking);

    //
    //  Create the handler object and install it as the document and error
    //  handler for the parser. Then parse the file and catch any exceptions
    //  that propagate out.
    //
    int errorCode = 0;
    try {
        //SAXPrintHandlers handler(encodingName, unRepFlags);
        //parser->setDocumentHandler(&handler);
        //parser->setErrorHandler(&handler);
        parser->parse(xmlFile);
        errorCount = parser->getErrorCount();
    } catch (const OutOfMemoryException&) {
        XERCES_STD_QUALIFIER cerr << "OutOfMemoryException" << XERCES_STD_QUALIFIER endl;
        errorCode = 5;
    } catch (const XMLException& toCatch) {
        XERCES_STD_QUALIFIER cerr << "\nAn error occurred\n Error: "
            << StrX(toCatch.getMessage())
            << "\n" << XERCES_STD_QUALIFIER endl;
        errorCode = 4;
    }

    //
    //  Delete the parser itself. Must be done prior to calling Terminate, below.
    //  This is done before the error check so the parser is also freed when
    //  parsing threw an exception.
    //
    delete parser;

    // And call the termination method
    XMLPlatformUtils::Terminate();

    if (errorCode)
        return errorCode;

    return (errorCount > 0) ? 4 : 0;
}

int main(int argc, char* argv[])
{
    CreateDOMDocument();
    SAXPrint();

    cout << "ok!" << endl;
    return 0;
}