Parsing HTML with a Third-Party Library
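The most important part of parsing HTML is understanding the node structure: decide whether a given piece of data is best selected by div or by class. Once the structure is clear, parsing a well-formed page is not a problem.
If the page is malformed, the approach depends on the specific case.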
 
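Convert the NSData into an NSString:
NSString * str = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
Convert the NSString into a DocumentRoot. DocumentRoot is a class provided by the third-party library; the data has to be in this form before the library can parse it any further:
DocumentRoot * document = [Element parseHTML:str];
Select all the div elements: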
NSArray * elements = [document selectElements:@"div"];
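
The initWithData:encoding: call above returns nil when the bytes are not valid in the requested encoding, which would leave the parser with nothing to work on. Below is a minimal sketch of a decoding step with a fallback. DecodeHTMLData is a hypothetical helper name, and GB18030 as the second encoding is an assumption: it is common on Chinese-language sites, but the post does not say what the target site actually serves.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): decode downloaded bytes
// as UTF-8 first, then fall back to GB18030 (an assumed second choice).
// Returns nil if neither encoding fits the data.
static NSString *DecodeHTMLData(NSData *data) {
    NSString *text = [[NSString alloc] initWithData:data
                                           encoding:NSUTF8StringEncoding];
    if (text == nil) {
        NSStringEncoding gb =
            CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);
        text = [[NSString alloc] initWithData:data encoding:gb];
    }
#if !__has_feature(objc_arc)
    // Under manual reference counting, hand back an autoreleased string.
    [text autorelease];
#endif
    return text;
}
```

With this helper, the decoding line could read `NSString * str = DecodeHTMLData(data);` followed by a nil check before parsing. Under manual reference counting the returned string is autoreleased, so the caller must not release it again.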
 
The div block being parsed looks like this:
<div class="item">
  <div class="pic">
    <a href="http://www.weiphone.com/iPhone/news/2013-08-02/Come_the_Blackberry_company_to_send_BBM_for_iOS_beta_invites_560440.shtml"><img src="http://resource.weiphone.com/resource/h027/h73/img201308021306430.jpg" alt="" height="100" width="158" /></a>
  </div>
  <div class="head">
    <h3><a href="http://www.weiphone.com/iPhone/news/2013-08-02/Come_the_Blackberry_company_to_send_BBM_for_iOS_beta_invites_560440.shtml">快來了 黑莓公司發送BBM for iOS測試邀請</a></h3>
    <div class="meta">
      <span class="timer" title="發表時間">2013/08/02 13:05</span>
      <span class="line">|</span>
      <a href="http://bbs.weiphone.com/u.php?uid=798198" class="author" title="作者"> 黃曉悶</a>
      <span class="line">|</span>
      <a href="javascript:void(0);" class="link" title="文章來源">weiphone</a>
      <div class="funs">
        <span class="view" title="瀏覽次數">2689</span>
        <span class="line">|</span>
        <a href="http://www.weiphone.com/iPhone/news/2013-08-02/Come_the_Blackberry_company_to_send_BBM_for_iOS_beta_invites_560440.shtml#comment" class="cmt" title="評論次數">5</a>
      </div>
    </div>
  </div>
  <div class="desc">
    <p>威鋒網 8 月 2 日消息,黑莓公司日前向 iOS 用戶發送了 BBM 的測試邀請,暗示著該服務的正式登陸已經進入最后階段。</p>
  </div>
</div>
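Collect the divs beneath the div with class "item" into childSecond (childEl is one of that div's child elements):
NSArray * childSecond = [childEl selectElements:@"div"];
Get the element enclosed by a tag, here the first a element:
Element * child = [secondEl selectElement:@"a"];
Get a value from inside the tag's angle brackets, that is, an attribute:
new.detailURL = [child.attributes objectForKey:@"href"];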
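Below is a complete parsing method. It parses the divs with class "item" on the page http://www.weiphone.com/iPhone/news/index_0.shtml.
The method is called once the download has finished and takes the downloaded NSData as its argument.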
Data model:
//
//  News.h
//  LookNewsProject
//
//  Created by ibokan on 13-08-01.
//  Copyright (c) 2013 laomaoshiba. All rights reserved.
//

#import <Foundation/Foundation.h>

@interface News : NSObject

// Title, publish time, detail link, image link, view count, comment count, category, author, summary, source
@property(copy,nonatomic)NSString * title, * publishTime, * detailURL, * imgURL, * viewTimes, * evaluateTimes, * category, * author, * intro, * origin;

@end

Parsing method:
-(void) analyNews:(NSData *)data
{
    // Decode the downloaded bytes as UTF-8 text (the page contains Chinese)
    NSString * str = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    //NSLog(@"%@",str);
    // Parse the HTML
    DocumentRoot * document = [Element parseHTML:str];
    // Select every div
    NSArray * elements = [document selectElements:@"div"];
    // Array that collects the parsed news items
    NSMutableArray * newArr = [[NSMutableArray alloc]init];
    // Walk the divs
    for (Element * element in elements)
    {
        if ([[element attribute:@"class"] isEqualToString:@"item"])
        {
            NSArray * childElement = [element childElements];
            // Create the news entity
            News * new = [[News alloc]init];
            for (Element * childEl in childElement)
            {
                NSArray * childSecond = [childEl selectElements:@"div"];
                for (Element * secondEl in childSecond)
                {
                    if ([[secondEl attribute:@"class"] isEqualToString:@"pic"])
                    {
                        Element * child = [secondEl selectElement:@"a"];
                        // Detail link
                        //NSLog(@"Detail link: %@",[child.attributes objectForKey:@"href"]);
                        new.detailURL = [child.attributes objectForKey:@"href"];
                        // Image link
                        //NSLog(@"Image link: %@",[[child selectElement:@"img"].attributes objectForKey:@"src"]);
                        new.imgURL = [[child selectElement:@"img"].attributes objectForKey:@"src"];
                    }
                    else if ([[secondEl attribute:@"class"] isEqualToString:@"head"])
                    {
                        // News title
                        //NSLog(@"Title: %@",[[secondEl selectElement:@"a"] contentsSource]);
                        new.title = [[secondEl selectElement:@"a"] contentsSource];
                    }
                    else if ([[secondEl attribute:@"class"] isEqualToString:@"meta"])
                    {
                        // Author (the first a element)
                        //NSLog(@"test-------->%@",[[secondEl selectElement:@"div"] contentsSource]);
                        //NSLog(@"Author: %@",[[secondEl selectElement:@"a"] contentsSource]);
                        new.author = [[secondEl selectElement:@"a"] contentsSource];
                        // Publish time
                        //NSLog(@"Publish time: %@",[[secondEl selectElement:@"span"] contentsSource]);
                        new.publishTime = [[secondEl selectElement:@"span"] contentsSource];
                        // Source (the second a element)
                        Element * originEl = [[secondEl selectElements:@"a"] objectAtIndex:1];
                        //NSLog(@"Source: %@",[originEl contentsSource]);
                        new.origin = [originEl contentsSource];
                    }
                    else if ([[secondEl attribute:@"class"] isEqualToString:@"funs"])
                    {
                        // View count
                        //NSLog(@"View count: %@",[[secondEl selectElement:@"span"] contentsSource]);
                        new.viewTimes = [[secondEl selectElement:@"span"] contentsSource];
                        // Comment count
                        //NSLog(@"Comment count: %@",[[secondEl selectElement:@"a"] contentsSource]);
                        new.evaluateTimes = [[secondEl selectElement:@"a"] contentsSource];
                    }
                    else if ([[secondEl attribute:@"class"] isEqualToString:@"desc"])
                    {
                        // Summary
                        //NSLog(@"Summary: %@",[[secondEl selectElement:@"p"] contentsSource]);
                        new.intro = [[secondEl selectElement:@"p"] contentsSource];
                    }
                }
            }
            [newArr addObject:new];
            // The array retains the item, so balance the alloc above
            [new release];
        }
    }
    [str release];
    // newArr is released here without being stored or returned;
    // keep a reference to it first if the results are needed.
    [newArr release];
}
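
In the meta branch above, objectAtIndex:1 raises an NSRangeException when the array holds fewer than two elements, which can happen on a malformed page. A minimal sketch of a bounds-checked lookup follows. SafeObjectAtIndex is a hypothetical helper, not part of the original post or of the parsing library, and it uses only Foundation.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the object at `index`, or nil when the array
// is nil or too short, instead of raising NSRangeException.
static id SafeObjectAtIndex(NSArray *array, NSUInteger index) {
    return (index < [array count]) ? [array objectAtIndex:index] : nil;
}
```

The source lookup could then be written as `Element * originEl = SafeObjectAtIndex([secondEl selectElements:@"a"], 1);`. Messaging nil is harmless in Objective-C, so when the second link is missing, new.origin is simply left as nil.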
