A First Look at JSOUP
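JSOUP is a Java library for processing HTML that I came across by chance. Its official site is http://jsoup.org/
1. Write a small trial program (the project only needs to reference jsoup-1.3.3.jar):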
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Test {

    public static void main(String[] args) {
        Test t = new Test();
        // Call parseString() or parseUrl() here to try the other examples
        t.parseFile();
    }

    // Parse an HTML string held in memory
    public void parseString() {
        String html = "<html><head><title>blog</title></head><body onload='test()'><p>Parsed HTML into a doc.</p></body></html>";
        Document doc = Jsoup.parse(html);
        System.out.println(doc);
        // getAllElements() includes the body element itself
        Elements es = doc.body().getAllElements();
        System.out.println(es.attr("onload"));
        System.out.println(es.select("p"));
    }

    // Fetch a page over HTTP and list its links
    public void parseUrl() {
        try {
            Document doc = Jsoup.connect("http://www.baidu.com/").get();
            // All anchors that carry an href attribute
            Elements hrefs = doc.select("a[href]");
            System.out.println(hrefs);
            System.out.println("------------------");
            // Only the anchors whose href starts with "http"
            System.out.println(hrefs.select("[href^=http]"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Parse a local HTML file
    public void parseFile() {
        try {
            File input = new File("input.html");
            Document doc = Jsoup.parse(input, "UTF-8");
            // Extract all the serial numbers
            Elements codes = doc.body().select("td[title^=IA] > a[href^=javascript:view]");
            System.out.println(codes);
            System.out.println("------------------");
            System.out.println(codes.html());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
2. Output of parseString:
<html>
 <head>
  <title>blog</title>
 </head>
 <body onload="test()">
  <p>Parsed HTML into a doc.</p>
 </body>
</html>
test()
<p>Parsed HTML into a doc.</p>
3. Output of parseUrl:
<a href="/gaoji/preferences.html">設置</a>
<a href="http://passport.baidu.com/?login&tpl=mn">登錄</a>
<a href="http://news.baidu.com">新 聞</a>
<a href="http://tieba.baidu.com">貼 吧</a>
<a href="http://zhidao.baidu.com">知 道</a>
<a href="http://mp3.baidu.com">MP3</a>
<a href="http://image.baidu.com">圖 片</a>
<a href="http://video.baidu.com">視 頻</a>
<a href="http://map.baidu.com">地 圖</a>
<a href="#" name="ime_hw">手寫</a>
<a href="#" name="ime_py">拼音</a>
<a href="#" name="ime_cl">關閉</a>
<a href="http://hi.baidu.com">空間</a>
<a href="http://baike.baidu.com">百科</a>
<a href="http://www.hao123.com">hao123</a>
<a href="/more/">更多>></a>
<a id="st" onclick="this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.baidu.com')" href="http://utility.baidu.com/traf/click.php?id=215&url=http://www.baidu.com">把百度設為主頁</a>
<a href="http://e.baidu.com/?refer=888">加入百度推廣</a>
<a href="http://top.baidu.com">搜索風云榜</a>
<a href="http://home.baidu.com">關于百度</a>
<a href="http://ir.baidu.com">About Baidu</a>
<a href="/duty/">使用百度前必讀</a>
<a href="http://www.miibeian.gov.cn" target="_blank">京ICP證030173號</a>
------------------
<a href="http://passport.baidu.com/?login&tpl=mn">登錄</a>
<a href="http://news.baidu.com">新 聞</a>
<a href="http://tieba.baidu.com">貼 吧</a>
<a href="http://zhidao.baidu.com">知 道</a>
<a href="http://mp3.baidu.com">MP3</a>
<a href="http://image.baidu.com">圖 片</a>
<a href="http://video.baidu.com">視 頻</a>
<a href="http://map.baidu.com">地 圖</a>
<a href="http://hi.baidu.com">空間</a>
<a href="http://baike.baidu.com">百科</a>
<a href="http://www.hao123.com">hao123</a>
<a id="st" onclick="this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.baidu.com')" href="http://utility.baidu.com/traf/click.php?id=215&url=http://www.baidu.com">把百度設為主頁</a>
<a href="http://e.baidu.com/?refer=888">加入百度推廣</a>
<a href="http://top.baidu.com">搜索風云榜</a>
<a href="http://home.baidu.com">關于百度</a>
<a href="http://ir.baidu.com">About Baidu</a>
<a href="http://www.miibeian.gov.cn" target="_blank">京ICP證030173號</a>
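The first listing contains relative links such as /gaoji/preferences.html and /more/, which the [href^=http] filter drops. If the full URLs are wanted instead, one option is to resolve each href against the page address. Below is a minimal sketch that uses only java.net.URI from the JDK. The class name LinkResolver and the method resolveAll are made up for illustration and are not part of jsoup (jsoup also documents an abs: attribute prefix, as in attr("abs:href"), for the same purpose).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkResolver {

    // Hypothetical helper: resolve every href against the URL of the page
    // it was found on. Hrefs that are already absolute come back unchanged.
    public static List<String> resolveAll(String baseUrl, List<String> hrefs) {
        URI base = URI.create(baseUrl);
        List<String> resolved = new ArrayList<String>();
        for (String href : hrefs) {
            resolved.add(base.resolve(href).toString());
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A few href values copied from the output above
        List<String> hrefs = Arrays.asList(
                "/gaoji/preferences.html",
                "http://news.baidu.com",
                "/more/");
        for (String url : resolveAll("http://www.baidu.com/", hrefs)) {
            System.out.println(url);
        }
    }
}
```

With the three sample values, the relative paths are expanded onto http://www.baidu.com/ and the news link is left as it is.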
4. Output of parseFile:
<a href="javascript:view('67530','67530','0');">IA100908-002</a>
<a href="javascript:view('67529','67529','0');">IA100908-001</a>
<a href="javascript:view('67544','67544','0');">IA100908-016</a>
<a href="javascript:view('67364','67364','0');">IA100903-008</a>
<a href="javascript:view('67363','67363','0');">IA100903-007</a>
<a href="javascript:view('66104','66104','0');">IA100710-013</a>
<a href="javascript:view('57916','57916','0');">IA100515-013</a>
<a href="javascript:view('56962','56962','0');">IA100430-022</a>
<a href="javascript:view('66958','66958','0');">IA100830-001</a>
<a href="javascript:view('66319','66319','0');">IA100713-003</a>
<a href="javascript:view('66317','66317','0');">IA100713-001</a>
<a href="javascript:view('66321','66321','0');">IA100713-005</a>
<a href="javascript:view('66967','66967','0');">IA100830-010</a>
<a href="javascript:view('66999','66999','0');">IA100831-001</a>
<a href="javascript:view('67377','67377','0');">IA100904-004</a>
<a href="javascript:view('67378','67378','0');">IA100904-005</a>
<a href="javascript:view('3271','3271','0');">IA080115-031</a>
------------------
IA100908-002
IA100908-001
IA100908-016
IA100903-008
IA100903-007
IA100710-013
IA100515-013
IA100430-022
IA100830-001
IA100713-003
IA100713-001
IA100713-005
IA100830-010
IA100831-001
IA100904-004
IA100904-005
IA080115-031
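Each matched link also carries values inside its javascript:view(...) href, for example '67530'. What these arguments mean is not explained here, so treating the first one as a record id is only an assumption. If that value is needed next to the serial number, a rough sketch with java.util.regex could look like this (the class ViewHref and the method firstArg are hypothetical names, not jsoup API):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ViewHref {

    // Matches an href of the form javascript:view('<value>', ...) and
    // captures the first quoted argument.
    private static final Pattern FIRST_ARG =
            Pattern.compile("^javascript:view\\('([^']*)'");

    // Hypothetical helper: returns the first argument of the view(...) call,
    // or null when the href does not have that shape.
    public static String firstArg(String href) {
        Matcher m = FIRST_ARG.matcher(href);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // href value copied from the parseFile output
        String href = "javascript:view('67530','67530','0');";
        System.out.println(firstArg(href)); // prints 67530
    }
}
```

The href string itself would come from the attr("href") call on each element selected in parseFile.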
One more note: the basic structure of input.html is as shown in the figure: