Scraping Anjuke house prices with Python and showing them on a map
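1. House price map
1.1 The project has three main steps
- First, use Python to scrape house prices for each district from the Anjuke website, group the listings by the residential community they belong to, and compute each community's average price.
- Next, use the geocoding service of the Baidu Maps API to get each community's latitude and longitude. Note that not every community can be located accurately; for a certain number of communities no precise coordinates can be obtained.
- Finally, use BDP online analysis to draw a map of average community prices.
[Figure: map of average community prices drawn with BDP]
1.2 Project layout
- get_data: scrapes Anjuke house prices with Python and cleans them up
- get_lnglat: calls the geocoding service of the Baidu Maps API to get each community's latitude and longitude and cleans them up
2. Scraping and cleaning Anjuke house prices with Python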
import datetime
import random
import re
import time

import pandas as pd
import requests
from lxml import etree

# Pool of User-Agent strings; one is picked at random when headers are built.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.3; rv:11.0) like Gecko",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)",
]


def build_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {
        "Cache-Control": "max-age=0",
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9",
    }


# Get the URL of every district's second-hand housing listings for a city on Anjuke.
def get_different_area_wang_zhi(url):
    response = requests.get(url, headers=build_headers()).text
    tree = etree.HTML(response)  # named `tree` so the `re` module is not shadowed
    # xpath() returns a list.
    wang_zhi = tree.xpath('//div[@id="content_Rd1"]/div[@class="clearfix"]/div[@class="details float_l"][1]/div[@class="areas"]/a/@href')
    return wang_zhi  # The first URL is the one for all communities and should be dropped.


# Get the total number of listing pages (n) for one district.
def get_one_area_number(url):
    response = requests.get(url, headers=build_headers()).text
    tree = etree.HTML(response)
    number = tree.xpath('//div[@class="pagination"]/ul[@class="page"]/li[@class="page-item last"]/a/text()')
    if not number:
        # xpath() returns a list, so an empty list means no page count was found.
        print('The page has no community data')
        return 0
    return int(number[0])


# Scrape the second-hand housing listings of one district from Anjuke.
def anjuke_new(url, n, city_name):
    headers = build_headers()
    rows = []
    for i in range(1, n + 1):
        url_load = url if i == 1 else url + "p" + str(i) + '/'
        mian_ji = []
        year = []
        try:
            print("Scraping page {} of {}".format(i, url))
            res = requests.get(url_load, headers=headers).text
            f = etree.HTML(res)
            # Name of the community each listing belongs to.
            name = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[@class="property-content-info-comm-name"]/text()')
            # Address of the community (disabled in the original script).
            # address = []
            # address_tmp = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[@class="property-content-info-comm-address"]/span/text()')
            # for k in range(0, len(address_tmp) - 2, 3):
            #     address.append(address_tmp[k] + address_tmp[k + 1] + address_tmp[k + 2])
            # Average price of each listing.
            avg_price = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//div[@class="property-price"]/p[@class="property-price-average"]/text()')
            # Floor area of each listing.
            mian_ji_tmp = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[@class="property-content-info-text"][1]/text()')
            for item in mian_ji_tmp:
                mian_ji.append(item.strip())
            # Construction year of the community.
            year_tmp = f.xpath('//section[@class="list"]/div[@tongji_tag="fcpc_ersflist_gzcount"]//p[@class="property-content-info-text"][4]/text()')
            for item in year_tmp:
                year.append(item.strip())
        except Exception:
            print("Error scraping page {} of {}".format(i, url))
            with open(r"./anjuke_house_error.txt", "a") as err_file:
                err_file.write("{} page {} failed\n".format(url, i))
            continue
        print("Finished page {} of {}".format(i, url))
        for j in range(len(avg_price)):
            d = {}
            d["小區名稱"] = name[j]
            # d["小區地址"] = address[j]
            d["房屋均價"] = avg_price[j]
            d["房屋面積"] = mian_ji[j]
            try:
                d["建造年代"] = year[j]
            except IndexError:
                d["建造年代"] = ''
            rows.append(d)
    return pd.DataFrame(rows)


def processing_data(data, city_name):
    today = datetime.datetime.now()
    date_list = '-'.join([str(today.year), str(today.month), str(today.day)])
    price_col = '小區均價' + date_list
    csv_path = './{}_anjuke_house.csv'.format(city_name)
    if data.empty:
        print('No listings to process for {}'.format(city_name))
        return

    # Average the listing prices of each community; the price is the leading integer of the text.
    prices = data['房屋均價'].map(lambda text: int(re.findall(r'^\d+', text)[0]))
    process_data = prices.groupby(data['小區名稱'], sort=False).mean().rename(price_col).reset_index()

    try:
        data_prv = pd.read_csv(csv_path, encoding='utf_8_sig')
    except FileNotFoundError:
        data_prv = None

    if data_prv is None:
        result = process_data
    else:
        # The original called data_prv.append(...) and discarded the result, so new
        # communities were never saved (DataFrame.append was also removed in pandas 2.0).
        # An outer merge updates known communities and adds new ones in one step.
        data_prv = data_prv.drop(columns=[price_col], errors='ignore')
        result = data_prv.merge(process_data, on='小區名稱', how='outer')
    result.to_csv(csv_path, index=False, encoding='utf_8_sig')


# Entry point for scraping Anjuke second-hand housing.
def anjuke_second_main(url):
    wang_zhi = get_different_area_wang_zhi(url)
    for item in wang_zhi:
        n = get_one_area_number(item)
        reg = r'/([A-Za-z]+)/$'
        # The trailing path segment of the district URL is used to name the output file.
        city_name = re.findall(reg, item)[0]
        data = anjuke_new(item, n, city_name)
        processing_data(data, city_name)
        time.sleep(random.randint(10, 15))


# To scrape another city, change the url; only the city abbreviation needs to change,
# e.g. Xi'an is xa and Xiamen is xm. Check the Anjuke URLs for the exact form.
if __name__ == "__main__":
    url = "https://wuhan.anjuke.com/?pi=PZ-baidu-pc-all-biaoti"
    anjuke_second_main(url)

- get_different_area_wang_zhi()
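Gets the URL of each district's second-hand housing listings for a given city on Anjuke.
- get_one_area_number()
Gets the number of pages in each district's listings. I found that Anjuke shows at most 50 pages, which does not necessarily cover every listing in the district, so the data should be treated as an incomplete sample.
- anjuke_new()
Gets all the listing information that each district's URL can display.
- processing_data()
Processes the scraped data, mainly by averaging the prices of listings in the same community to get that community's average price.
- anjuke_second_main()
The main function.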
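The averaging step described for processing_data() can also be written as a pandas groupby. What follows is only a minimal sketch, not the article's code: community_average is a hypothetical helper, the sample rows are made up, and the "元/㎡" suffix is an assumption about how the scraped price text looks. The only thing taken from the article is that the price is the leading integer of the text.

```python
import pandas as pd


def community_average(listings, name_col="小區名稱", price_col="房屋均價"):
    """Average the listing prices of every community (hypothetical helper).

    The price is taken as the leading run of digits in the price text;
    rows whose text has no leading digits become NaN and are ignored.
    """
    prices = listings[price_col].str.extract(r"^\s*(\d+)", expand=False).astype(float)
    return prices.groupby(listings[name_col]).mean()


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "小區名稱": ["A", "A", "B", "B"],
    "房屋均價": ["10000元/㎡", "12000元/㎡", "8000元/㎡", "暫無"],
})
averages = community_average(sample)
print(averages)
```

Grouping once avoids re-filtering the whole table for every community, which matters when a district returns the full 50 pages of listings.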
3. Geocoding with the Baidu Maps API and cleaning up the results
import json
import os

import pandas as pd
import requests

# Result levels that are too coarse to be a residential community (district/county, city).
# Both character forms of "district/county" are listed so the check works either way.
NON_COMMUNITY_LEVELS = {'区县', '區縣', '城市'}

# Keyword in the csv file name -> district of Wuhan.
AREA_BY_KEYWORD = {
    'caidianz': '蔡甸區',
    'dongxihu': '東西湖區',
    'hannanz': '漢南區',
    'hanyang': '漢陽區',
    'hongshana': '洪山區',
    'huangpiz': '黃陂區',
    'jiangan': '江岸區',
    'jiangxiat': '江夏區',
    'qiaokou': '硚口區',
    'qingshan': '青山區',
    'wuchanga': '武昌區',
    'xinzhouz': '新洲區',
    'zhuankouk': '沌口',
}


def getfilepath(path):
    list_name = []
    for file in os.listdir(path):
        file_path = os.path.join(path, file)
        if not os.path.isdir(file_path) and file_path.endswith('.csv'):
            list_name.append(file_path)
    return list_name


def getlnglat(address, city, max_retries=3):
    url = 'http://api.map.baidu.com/geocoding/v3/'
    # The original also appended '&callback=showLocation//GET...' to the query string.
    # It is dropped here because the response is parsed as plain JSON.
    params = {
        'city': city,
        'address': address,
        'output': 'json',
        'ak': 'sQMfKlPK4yCefIC28eb3Hn3QTO9MzEUV',  # replace with your own ak
    }
    temp = None
    # The original retried forever on a non-zero status; the retries are bounded here.
    for _ in range(max_retries):
        try:
            res = requests.get(url, params=params).text
            temp = json.loads(res)  # convert the response text to a dict
        except (requests.RequestException, ValueError):
            print('Request failed')
            temp = None
            continue
        if temp.get('status') == 0:
            break
    if temp is None or temp.get('status') != 0:
        return None
    result = temp['result']
    if result['level'] not in NON_COMMUNITY_LEVELS:
        # lat is latitude, lng is longitude
        return {'小區名稱': address, 'lat': result['location']['lat'], 'lng': result['location']['lng']}
    print('Address: ' + address + ' is not a residential community: ' + result['level'] + ' lat ' + str(result['location']['lat']) + ' lng ' + str(result['location']['lng']))
    return None


def add_lnglat(path):
    city = '武漢市'
    area = ''
    for keyword, area_name in AREA_BY_KEYWORD.items():
        if keyword in path:
            area = area_name
    data = pd.read_csv(path, encoding='utf_8_sig')
    for village_name in data['小區名稱'].values:
        coordinate = getlnglat(village_name, city + area)
        if coordinate is not None:
            mask = data['小區名稱'] == coordinate['小區名稱']
            data.loc[mask, 'lat'] = coordinate['lat']
            data.loc[mask, 'lng'] = coordinate['lng']
    data.to_csv(path, index=False, encoding='utf_8_sig')


if __name__ == '__main__':
    list_name = getfilepath('./')
    for item in list_name:
        add_lnglat(item)

- getfilepath()
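Gets all csv files in the current directory.
- getlnglat()
Gets the latitude and longitude for a community's address and checks whether the result is plausible.
- add_lnglat()
Adds the latitude and longitude to the original csv file so that the map can be drawn later.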
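The plausibility check in getlnglat() can be separated from the network call, which makes it easy to try out without an API key. The sketch below is an assumption-laden illustration rather than the article's code: build_geocoding_url and extract_coordinate are hypothetical names, "YOUR_AK" is a placeholder key, and the payloads are hand-written to contain only the fields the article's script reads (status, result.level, result.location).

```python
from urllib.parse import urlencode

GEOCODING_ENDPOINT = "http://api.map.baidu.com/geocoding/v3/"
# Levels treated as too coarse to be a community (district/county, city).
TOO_COARSE_LEVELS = {"区县", "區縣", "城市"}


def build_geocoding_url(address, city, ak):
    """Build the request URL; urlencode percent-encodes the Chinese text."""
    query = urlencode({"address": address, "city": city, "output": "json", "ak": ak})
    return GEOCODING_ENDPOINT + "?" + query


def extract_coordinate(address, payload):
    """Return a dict with the community name, lat and lng, or None if unusable."""
    if payload.get("status") != 0:
        return None  # the request was not answered successfully
    result = payload["result"]
    if result.get("level") in TOO_COARSE_LEVELS:
        return None  # only a district or city centre was matched
    location = result["location"]
    return {"小區名稱": address, "lat": location["lat"], "lng": location["lng"]}


# Hand-written payloads, for illustration only.
precise = {"status": 0, "result": {"location": {"lng": 114.3, "lat": 30.6}, "level": "地产小区"}}
coarse = {"status": 0, "result": {"location": {"lng": 114.3, "lat": 30.6}, "level": "城市"}}
failed = {"status": 1}

url = build_geocoding_url("某小區", "武漢市", "YOUR_AK")
print(url)
print(extract_coordinate("某小區", precise))
print(extract_coordinate("某小區", coarse))
print(extract_coordinate("某小區", failed))
```

Keeping the check in its own function means the rule for rejecting coarse matches can be adjusted in one place if other result levels also turn out to be unreliable.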
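4. Processing the data with BDP
URL: https://me.bdp.cn/index.html#/
After adding the data, set the latitude and longitude fields and use color to represent the average house price; that is all it takes.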
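Before uploading, it may help to trim the csv to the columns the map needs and to drop communities that could not be geocoded. This is a sketch under assumptions, not part of the article: prepare_for_map is a hypothetical helper, the sample rows are invented, and the date in the price column name is only an example of the "小區均價" plus date pattern that the scraping script produces.

```python
import pandas as pd


def prepare_for_map(table, price_col, name_col="小區名稱"):
    """Keep name, lng, lat and one price column; drop rows missing any of them."""
    columns = [name_col, "lng", "lat", price_col]
    # reindex() creates any missing column as NaN, so a file without
    # coordinates yields an empty table instead of raising KeyError.
    return table.reindex(columns=columns).dropna().reset_index(drop=True)


# Invented sample rows, for illustration only.
sample = pd.DataFrame({
    "小區名稱": ["A", "B", "C"],
    "小區均價2021-1-1": [11000.0, 8000.0, 9500.0],
    "lat": [30.60, None, 30.52],
    "lng": [114.30, None, 114.35],
})
ready = prepare_for_map(sample, "小區均價2021-1-1")
print(ready)
```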