Scraping Historical Weather with PHP
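PHP, jokingly called the best language in the universe, also makes writing crawlers very convenient. This post scrapes historical weather statistics for Chinese cities from the Tianqi weather site (lishi.tianqi.com).
Program structure
main.php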
<?php
include_once("./parser.php");
include_once("./storer.php");
# The parser and the storer are described below
$parser = new parser();
$storer = new storer();
# Get the list of URLs
$urlList = $parser->getCityList("http://lishi.tianqi.com/");
# Parse the content of each URL in turn and store it in the database
foreach ($urlList as $url) {
    $data = $parser->getData($url);
    $storer->store($data);
}

Parser
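The parser provides two interfaces: one parses the home page to get the list of URLs, and the other parses each city's page to get that city's historical weather data.
The parsing library used here is phpQuery, which uses jQuery-style selectors and is simple and efficient.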
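Before the full listing, here is a minimal offline sketch of the two steps the parser performs once phpQuery has extracted the raw strings: filtering candidate links, and pulling day counts out of a summary sentence. The helper names `filterCityUrls` and `firstMatch`, the sample links, and the sample summary sentence are all assumptions made for illustration; the real text comes from the live pages and may be worded differently.

```php
<?php
// Hypothetical helper: keep only well-formed URLs that contain no "-",
// and skip the "#" and empty placeholders used by group labels.
function filterCityUrls(array $hrefs): array
{
    $urls = [];
    foreach ($hrefs as $href) {
        if ($href === "" || $href === "#") {
            continue;
        }
        if (strpos($href, "-") !== false) {
            continue;
        }
        if (filter_var($href, FILTER_VALIDATE_URL) === false) {
            continue;
        }
        $urls[] = $href;
    }
    return $urls;
}

// Hypothetical helper: return the first captured group,
// or "0" when the pattern does not match at all.
function firstMatch(string $pattern, string $text): string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[1] : "0";
}

// Made-up sample data, not taken from the real site.
$hrefs = [
    "#",
    "",
    "http://lishi.tianqi.com/beijing/index.html",
    "http://lishi.tianqi.com/a-b/index.html",
    "not a url",
];
$summary = "2011年以来,北京共出现:晴120天,多云80天,雨30天";

print_r(filterCityUrls($hrefs));
echo firstMatch("/,(.+)共出现/u", $summary), "\n"; // city name
echo firstMatch("/晴(\d+)天/u", $summary), "\n";    // sunny days
echo firstMatch("/雪(\d+)天/u", $summary), "\n";    // snowy days, "0" if absent
```

Returning "0" for a missing weather type keeps every record the same shape, which is convenient when the counts are later written to numeric database columns.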
<?php
# Parse with the jQuery-style phpQuery library
include_once("./phpQuery-onefile.php");

class parser {
    // Get the list of city URLs
    function getCityList($url) {
        // Download and load the page directly from the URL
        phpQuery::newDocumentFile($url);
        // First selection
        $links = pq(".bcity *");
        $urlList = [];
        foreach ($links as $link) {
            # Second selection
            $tmp = pq($link)->find('a')->attr('href');
            # Filter out group labels
            if ($tmp != "#" and $tmp != "") {
                # Validate the URL
                if (strpos($tmp, "-") === false and filter_var($tmp, FILTER_VALIDATE_URL))
                    $urlList[] = $tmp; # Add to the URL list
            }
        }
        return $urlList;
    }

    // Get the historical weather of one city
    function getData($url) {
        // Download and load the page directly from the URL
        phpQuery::newDocumentFile($url);
        // First selection
        $text = pq("div .tqtongji p")->text();
        # Match the city (the patterns stay in Chinese because they match the page text)
        $city = $this->match("/,(.+)共出现/", $text);
        # Match the weather counts
        $rainy = $this->match("/雨(\d+)天/", $text);
        $cloudy = $this->match("/多云(\d+)天/", $text);
        $sunny = $this->match("/晴(\d+)天/", $text);
        $overcast = $this->match("/阴(\d+)天/", $text); # named overcast to distinguish it from cloudy
        $snowy = $this->match("/雪(\d+)天/", $text);
        # Match the pinyin name from the URL
        $pinYin = $this->match("/http:\/\/lishi\.tianqi\.com\/(.*?)\/index\.html/", $url);
        $result = [];
        $result["url"] = $url;
        $result["city"] = $city;
        $result["pinYin"] = $pinYin;
        $result["rainy"] = $rainy;
        $result["cloudy"] = $cloudy;
        $result["sunny"] = $sunny;
        $result["overcast"] = $overcast;
        $result["snowy"] = $snowy;
        return $result;
    }

    # Regex helper
    function match($rule, $text) {
        preg_match_all($rule, $text, $result);
        # Not every region has every kind of weather
        if (count($result[1]) == 0)
            return "0";
        return $result[1][0];
    }
}

Storer
The MySQLi interface is all that is needed. The code is as follows:
<?php
class storer {
    public $mysqli;

    function __construct() {
        $this->mysqli = new mysqli('localhost', '***', '******', 'phpWeather');
        $this->mysqli->query("SET NAMES UTF8");
    }

    function store($data) {
        $url = $data["url"];
        $city = $data["city"];
        $pinYin = $data["pinYin"];
        $rainy = $data["rainy"];
        $cloudy = $data["cloudy"];
        $sunny = $data["sunny"];
        $overcast = $data["overcast"];
        $snowy = $data["snowy"];
        # String values must be wrapped in '' when inserted
        $insertData = "VALUES('$city','$pinYin',$rainy,$cloudy,$sunny,$overcast,$snowy,'$url');";
        # Writing the SQL in two parts keeps it clearer
        $sql = "INSERT INTO record(city,pinYin,rainy,cloudy,sunny,overcast,snowy,url)" . $insertData;
        $isok = $this->mysqli->query($sql);
        if ($isok) {
            echo "$city: data inserted successfully\n";
        } else {
            echo $sql . "\n";
            echo "$city: data insertion failed\n";
        }
    }

    function __destruct() {
        $this->mysqli->close();
    }
}
?>

Crawl results
In total, the crawler collected historical weather from 2011 to the present for 3119 cities. Data analysis and visualization are left for the next blog post.
Reposted from: https://www.cnblogs.com/fanghao/p/7496469.html