Initializing Three-Level Province/City/District Data in Java
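Using the Jsoup crawler to fetch nationwide region data (province, city, county, town, village)
I recently started a new project that needs the three-level province/city/district data initialized in the database, so I found a crawler tool online and scraped the data from the National Bureau of Statistics division-code website. I won't explain the underlying principles here; as long as it does the job, that's OK.
- First, add the Jsoup dependency. I assume the database and Spring dependencies need no introduction!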
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.14.2</version>
</dependency>
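- Then define the Area entity, which maps to the area table:
package com.yckj.common.entity;

import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@Data
@AllArgsConstructor
@NoArgsConstructor
@TableName(value = "area")
public class Area implements Serializable {
    // 6-digit division code, e.g. "110000"
    @TableId(value = "id")
    private String id;
    // Code of the parent area; provinces use "root"
    @TableField(value = "parentId")
    private String parentId;
    @TableField(value = "areaName")
    private String areaName;
    // 1 = province, 2 = city, 3 = county
    @TableField(value = "level")
    private Integer level;
}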
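The parentId field is what links the three levels: provinces are stored with parentId "root", and every other row points at the 6-digit code of its parent. As a rough illustration of how the flat rows can be turned into a lookup for cascading selects, here is a minimal sketch. AreaRow and AreaTree are hypothetical names introduced only for this example (AreaRow is a plain stand-in for the Area entity so the snippet runs without Lombok or MyBatis-Plus); they are not part of the original project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Area entity, reduced to plain fields.
final class AreaRow {
    final String id;
    final String parentId;
    final String areaName;
    final int level;

    AreaRow(String id, String parentId, String areaName, int level) {
        this.id = id;
        this.parentId = parentId;
        this.areaName = areaName;
        this.level = level;
    }
}

// Hypothetical helper that indexes flat rows by their parent.
final class AreaTree {
    private AreaTree() {}

    // Groups rows by parentId, keeping the order in which rows were given.
    // The entry for "root" then holds the provinces, and the entry for a
    // province code holds the rows directly under that province.
    static Map<String, List<AreaRow>> childrenByParent(List<AreaRow> rows) {
        Map<String, List<AreaRow>> children = new LinkedHashMap<>();
        for (AreaRow row : rows) {
            children.computeIfAbsent(row.parentId, k -> new ArrayList<>()).add(row);
        }
        return children;
    }
}
```

With the rows loaded from the area table, childrenByParent(rows).get("root") would give the province list, and looking up a selected province's id would give the next level down.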
- Next, write the code that crawls the data from http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2020/ and stores it in the database. The site covers every level nationwide (province, city, county, town, and township), but my project only needs province, city, and county data. Adjust the code to fit your own requirements.
package com.yckj.appauth.service.impl;

import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.yckj.appauth.mapper.InitAreaMapper;
import com.yckj.appauth.service.InitAreaService;
import com.yckj.common.entity.Area;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

@Service
public class InitAreaServiceImpl extends ServiceImpl<InitAreaMapper, Area> implements InitAreaService {

    @Autowired
    private InitAreaMapper initAreaMapper;

    // Level (1 = province, 2 = city, 3 = county) -> CSS class of that level's table rows
    private static final Map<Integer, String> cssMap = new HashMap<>();

    static {
        cssMap.put(1, "provincetr");
        cssMap.put(2, "citytr");
        cssMap.put(3, "countytr");
    }

    public void initArea() {
        int level = 1;
        Document connect = connect("http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2020/");
        if (connect == null) {
            // connect() returns null when the request fails; stop instead of hitting a NullPointerException
            System.out.println("Failed to load the index page");
            return;
        }
        Elements rowProvince = connect.select("tr." + cssMap.get(level));
        System.out.println("areaCode****parentCode****areaName****level");
        for (Element provinceElement : rowProvince) {
            Elements select = provinceElement.select("a");
            for (Element province : select) {
                // The link text is the province name; text() ignores the trailing <br>
                String areaName = province.text();
                String areaCode = "";
                // These names must stay in Simplified Chinese to match the page content
                switch (areaName) {
                    case "北京市": areaCode = "110000"; break;
                    case "天津市": areaCode = "120000"; break;
                    case "河北省": areaCode = "130000"; break;
                    case "山西省": areaCode = "140000"; break;
                    case "内蒙古自治区": areaCode = "150000"; break;
                    case "辽宁省": areaCode = "210000"; break;
                    case "吉林省": areaCode = "220000"; break;
                    case "黑龙江省": areaCode = "230000"; break;
                    case "上海市": areaCode = "310000"; break;
                    case "江苏省": areaCode = "320000"; break;
                    case "浙江省": areaCode = "330000"; break;
                    case "安徽省": areaCode = "340000"; break;
                    case "福建省": areaCode = "350000"; break;
                    case "江西省": areaCode = "360000"; break;
                    case "山东省": areaCode = "370000"; break;
                    case "河南省": areaCode = "410000"; break;
                    case "湖北省": areaCode = "420000"; break;
                    case "湖南省": areaCode = "430000"; break;
                    case "广东省": areaCode = "440000"; break;
                    case "广西壮族自治区": areaCode = "450000"; break;
                    case "海南省": areaCode = "460000"; break;
                    case "重庆市": areaCode = "500000"; break;
                    case "四川省": areaCode = "510000"; break;
                    case "贵州省": areaCode = "520000"; break;
                    case "云南省": areaCode = "530000"; break;
                    case "西藏自治区": areaCode = "540000"; break;
                    case "陕西省": areaCode = "610000"; break;
                    case "甘肃省": areaCode = "620000"; break;
                    case "青海省": areaCode = "630000"; break;
                    case "宁夏回族自治区": areaCode = "640000"; break;
                    case "新疆维吾尔自治区": areaCode = "650000"; break;
                }
                Area area = new Area();
                area.setId(areaCode);
                area.setParentId("root");
                area.setAreaName(areaName);
                area.setLevel(1);
                initAreaMapper.insert(area);
                System.out.println(areaCode + "****root****" + areaName + "****" + 1);
                parseNextLevel(areaCode, province, level + 1);
            }
        }
        System.out.println("Done");
    }

    // Fetches the page behind parentElement's link and saves every row of the given level
    private void parseNextLevel(String parentId, Element parentElement, int level) {
        try {
            // Pause between requests so the site is not hit too quickly
            Thread.sleep(500);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and give up on this branch
            Thread.currentThread().interrupt();
            return;
        }
        Document doc = connect(parentElement.attr("abs:href"));
        if (doc != null) {
            Elements newsHeadlines = doc.select("tr." + cssMap.get(level));
            for (Element element : newsHeadlines) {
                printInfo(parentId, element, level);
                String code = element.select("td").first().text();
                Elements select = element.select("a");
                // Follow the link only while a deeper level is configured in cssMap;
                // otherwise every county page would be downloaded for nothing
                if (!select.isEmpty() && cssMap.containsKey(level + 1)) {
                    parseNextLevel(code.substring(0, 6), select.last(), level + 1);
                }
            }
        }
    }

    // Saves one table row as an Area and logs it
    private void printInfo(String parentId, Element element, int level) {
        // The first cell holds the 12-digit code; only the first 6 digits are kept
        String code = element.select("td").first().text();
        Area area = new Area();
        area.setId(code.substring(0, 6));
        area.setParentId(parentId);
        area.setAreaName(element.select("td").last().text());
        area.setLevel(level);
        initAreaMapper.insert(area);
        System.out.println(area.getId() + "****" + area.getParentId() + "****" + area.getAreaName() + "****" + level);
    }

    // Downloads and parses a page; returns null if the request fails
    private static Document connect(String url) {
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("The input url('" + url + "') is invalid!");
        }
        try {
            return Jsoup.connect(url).timeout(100 * 1000).get();
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
}
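The switch over province names works, but it breaks silently (the code stays empty) if a name on the page ever differs from the literal. One possible alternative is to derive the code from the link itself. The sketch below assumes that each province link on the index page is named after the two-digit province prefix, for example 11.html for 北京市; this matches the codes hard-coded above but is an assumption, not something the site documents here. AreaCodes and both method names are hypothetical and not part of the original service.

```java
// Hypothetical helper for deriving 6-digit codes without a name lookup.
final class AreaCodes {
    private AreaCodes() {}

    // Turns a province link such as "11.html", or an absolute URL ending in
    // "/11.html", into the 6-digit province code "110000".
    static String provinceCodeFromHref(String href) {
        int slash = href.lastIndexOf('/');       // -1 for a relative link
        int dot = href.lastIndexOf(".html");
        // The file name between the last slash and ".html" must be exactly two characters
        if (dot < 0 || dot - slash - 1 != 2) {
            throw new IllegalArgumentException("Unexpected province link: " + href);
        }
        String prefix = href.substring(slash + 1, dot);
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Unexpected province link: " + href);
            }
        }
        return prefix + "0000";
    }

    // Keeps the first 6 digits of the 12-digit code shown in the first table cell,
    // mirroring the code.substring(0, 6) calls in the service.
    static String toSixDigits(String fullCode) {
        if (fullCode == null || fullCode.length() < 6) {
            throw new IllegalArgumentException("Code too short: " + fullCode);
        }
        return fullCode.substring(0, 6);
    }
}
```

In initArea(), the switch could then be replaced by something like AreaCodes.provinceCodeFromHref(province.attr("href")), which fails loudly on an unexpected link instead of inserting a row with an empty id.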
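Thanks: this code is adapted from an earlier author's work. The original blog post is titled "Jsoup获取全国地区数据(省市县镇村)" (Fetching nationwide region data with Jsoup: province, city, county, town, village).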