Dangdang book category HTML: a simple demo that scrapes Dangdang book page data with HttpClient and jsoup
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    /**
     * Fetches one page of a Dangdang book category, prints selected fields
     * to the console, and saves them to a file.
     */
    public static void main(String[] args) throws IOException {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        // Create the file that will hold the extracted data.
        // The backslash must be escaped: "\b" alone would be a backspace character.
        BufferedWriter writer = new BufferedWriter(new FileWriter("D:\\book.csv"));
        try {
            // Request URL: the address of one page of a Dangdang book category.
            HttpGet httpget = new HttpGet("http://category.dangdang.com/cp01.36.04.08.00.00.html");
            System.out.println("Executing request " + httpget.getRequestLine());
            // Create a custom response handler
            ResponseHandler<String> responseHandler = response -> {
                int status = response.getStatusLine().getStatusCode();
                if (status >= 200 && status < 300) {
                    HttpEntity entity = response.getEntity();
                    return entity != null ? EntityUtils.toString(entity) : null;
                } else {
                    throw new ClientProtocolException("Unexpected response status: " + status);
                }
            };
            // Get the response body, i.e. the page source.
            String responseBody = httpclient.execute(httpget, responseHandler);
            System.out.println("----------------------------------------");
            // Parse the source with Jsoup into a Document that represents the page.
            Document document = Jsoup.parse(responseBody);
            // The class and attribute names below come from inspecting the page source by hand.
            Element pos = document.getElementsByClass("bigimg").get(0);
            Elements list = pos.children();
            for (Element e : list) {
                Element name = e.getElementsByClass("pic").get(0);
                Element detail = e.getElementsByClass("detail").get(0);
                Element author = e.getElementsByAttributeValue("name", "itemlist-author").get(0);
                Element press = e.getElementsByAttributeValue("name", "P_cbs").get(0);
                Element market = e.getElementsByClass("search_pre_price").get(0);
                Element sale = e.getElementsByClass("search_now_price").get(0);
                System.out.println("Title: " + name.attr("title"));
                System.out.println("Description: " + detail.text());
                System.out.println("Author: " + author.text());
                System.out.println("Publisher: " + press.text());
                System.out.println("List price: " + market.text());
                System.out.println("Sale price: " + sale.text());
                System.out.println("--------------------");
                // Write the fields to be saved to the file.
                writer.write(name.attr("title") + "," + detail.text() + "," + author.text() + "," + press.text());
                writer.newLine();
            }
        } finally {
            writer.close();
            httpclient.close();
        }
    }
}
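The demo writes each record by joining the fields with plain commas, so a description that itself contains a comma or a quote would shift the columns in book.csv. One possible way around this is to quote such fields before writing them. The sketch below is not part of the original demo: the class name `CsvLine` and its methods `escape` and `join` are hypothetical, and it assumes the common CSV convention of wrapping a field in double quotes and doubling any embedded quotes.

```java
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of the original demo. */
final class CsvLine {
    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Double any embedded quotes, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-extracted field values into one CSV record. */
    static String join(String... fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : fields) {
            joiner.add(escape(field));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        System.out.println(join("Book A", "Short, simple intro", "Author X", "Press Y"));
    }
}
```

With such a helper, the `writer.write(...)` call in the loop could become `writer.write(CsvLine.join(name.attr("title"), detail.text(), author.text(), press.text()))`, assuming the helper is on the classpath. Running `main` prints `Book A,"Short, simple intro",Author X,Press Y`.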