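Analyzing Weibo fans with Python: a crawler-based data analysis of Fan Duobao (粉丝夺宝)
Not long ago Sina Weibo launched a product similar to NetEase's One-Yuan Duobao (一元夺宝). Many people suspect that the big winners are shills hired by Sina. Whether that is true I can't say, so I decided it was worth finding every user who has won.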
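First, look at this URL: http://1.weibo.com/profilehis?winner=1&uid=2860976304. It returns the winning record of the user with uid=2860976304.
Besides that, I found another endpoint that also returns a user's winning record. Even better, it returns JSON, which makes processing much simpler: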
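The page URL carries its two inputs as query parameters, so checking a different user only means changing uid. A minimal sketch that builds the URL for any uid with the standard library (the function name winner_page_url is hypothetical):

```python
from urllib.parse import urlencode

PROFILE_HIS = "http://1.weibo.com/profilehis"


def winner_page_url(uid):
    """Build the winning-record page URL for one uid (hypothetical helper)."""
    # winner=1 and uid are the two query parameters seen in the URL above,
    # kept in the same order
    return PROFILE_HIS + "?" + urlencode({"winner": 1, "uid": uid})


print(winner_page_url(2860976304))
# http://1.weibo.com/profilehis?winner=1&uid=2860976304
```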
POST http://1.weibo.com/aj/page/Profileother HTTP/1.1
Host: 1.weibo.com
Proxy-Connection: keep-alive
Content-Length: 26
Cache-Control: max-age=0
Origin: http://burp
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://burp/show/3
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.8
Cookie: SINAGLOBAL=5042382967658.341.1454320477597; un=bjmiaoyin2006@yahoo.com.cn; wvr=6; UOR=,,login.sina.com.cn; SCF=AkD5go8FLF4mKVF2-hrc9BU_XIeLwRymEqgOVOtZEk07uzYB0zYwQpPnpt99rxQJBciji219PdUo5s7_BoUBbmA.; SUB=_2A251E2q4DeTxGeVL71cR8i7JwzuIHXVWadtwrDV8PUNbmtAKLVfzkW-AMtvP5SgHzA-5Bi3jRGlhOL_Kdw..; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WW8O9jiA56-HHn0H2.s1mvB5JpX5KMhUgL.FoefSh-7eo5f1hM2dJLoIXnLxKqL1-BL12-LxK-L12qLB-zLxK-L1h-LB.BLxK-LBo5LBo2LxK-L1-zL1-zLxKqL1-BL12-LxK-L12qLB-zLxK-L1h-LB.Bt; SUHB=0gHiV0NSuPLx_o; ALF=1509445224; SSOLoginState=1477909224; _s_tentry=-; Apache=8995918969370.725.1477909230337; ULV=1477909230350:605:126:1:8995918969370.725.1477909230337:1477672544934
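The capture above omits the request body. Judging by find_won.py below, which posts uid, type=won and page=1 to this endpoint, the body is an ordinary form-encoded string. The sketch below (encode_form is a hypothetical helper, and the uid is just the example from the URL above) shows that encoding and why its length depends on the uid:

```python
from urllib.parse import urlencode


def encode_form(fields):
    """Encode a dict as an application/x-www-form-urlencoded request body."""
    return urlencode(fields).encode("ascii")


body = encode_form({"uid": "2860976304", "type": "won", "page": "1"})
print(body)       # b'uid=2860976304&type=won&page=1'
print(len(body))  # 30
```

A 10-digit uid gives exactly the 30 bytes that find_won.py hard-codes in its Content-Length header; a uid of another length would not, which is why it is safer to let requests fill that header in from the actual body.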
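However, to find every user's winning record we first need the uid of every user who took part in Fan Duobao; otherwise, with this many Weibo users, checking each one would produce a huge amount of junk data. So look at the following endpoint: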
POST http://1.weibo.com/aj/goods/goodsactors HTTP/1.1
Host: 1.weibo.com
Proxy-Connection: keep-alive
Content-Length: 26
Cache-Control: max-age=0
Origin: http://burp
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://burp/show/1
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.8
Cookie: SINAGLOBAL=5042382967658.341.1454320477597; un=bjmiaoyin2006@yahoo.com.cn; wvr=6; UOR=,,login.sina.com.cn; SCF=AkD5go8FLF4mKVF2-hrc9BU_XIeLwRymEqgOVOtZEk07uzYB0zYwQpPnpt99rxQJBciji219PdUo5s7_BoUBbmA.; SUB=_2A251E2q4DeTxGeVL71cR8i7JwzuIHXVWadtwrDV8PUNbmtAKLVfzkW-AMtvP5SgHzA-5Bi3jRGlhOL_Kdw..; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WW8O9jiA56-HHn0H2.s1mvB5JpX5KMhUgL.FoefSh-7eo5f1hM2dJLoIXnLxKqL1-BL12-LxK-L12qLB-zLxK-L1h-LB.BLxK-LBo5LBo2LxK-L1-zL1-zLxKqL1-BL12-LxK-L12qLB-zLxK-L1h-LB.Bt; SUHB=0gHiV0NSuPLx_o; ALF=1509445224; SSOLoginState=1477909224; _s_tentry=-; Apache=8995918969370.725.1477909230337; ULV=1477909230350:605:126:1:8995918969370.725.1477909230337:1477672544934
這個(gè)接口返回的是參加這個(gè)pid=42077的所有用戶信息 ,包括uid,ip地址,地理位置,參加的時(shí)間等。
With this, a crawler can iterate over every pid to collect the uid of every Fan Duobao participant, and the Profileother endpoint above then tells us which of them have won.
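The two-stage plan amounts to: take the union of the participants of every pid, then ask the winning-record endpoint about each distinct uid once. A sketch with hypothetical stand-in callables (participants_of and has_won) in place of the two HTTP calls:

```python
def find_winners(pids, participants_of, has_won):
    """Two-stage pipeline: gather unique uids over all pids, then keep winners.

    participants_of(pid) -> iterable of uids and has_won(uid) -> bool are
    hypothetical stand-ins for the goodsactors and Profileother calls.
    """
    uids = set()
    for pid in pids:
        uids.update(participants_of(pid))
    # each uid is checked once, however many draws it joined
    return sorted(uid for uid in uids if has_won(uid))


# toy data: uid "b" joined two draws but is checked only once
draws = {1: ["a", "b"], 2: ["b", "c"]}
winners = {"c"}
print(find_winners(draws, draws.get, winners.__contains__))
# ['c']
```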
寫(xiě)兩個(gè)python的腳本配合一下。
第一個(gè)find_uid.py
#! /usr/bin/env python
# coding=utf-8
# author=ntwu
import time

import requests
import threadpool as tp

headers_fake = {
    "Host": "1.weibo.com",
    "Accept": "application/json",
    "X-Requested-With": "XMLHttpRequest",
    "Accept-Language": "zh-cn",
    "Accept-Encoding": "gzip, deflate",
    "Content-Type": "application/x-www-form-urlencoded",
    "Origin": "http://1.weibo.com",
    "Connection": "close",
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A456 Safari/602.1",
    "Referer": "http://1.weibo.com/goodsunravel?pid=14875",
    "Cookie": "ULV=1476417103515:6:4:4:5995306923612.945.1476417103443:1476368007922; _s_tentry=-; Apache=5995306923612.945.1476417103443; UOR=widget.weibo.com,; ALF=1476447308; SUB=_2A2563U8cDeTxGeVL71cR8i7JwzuIHXVWPlFUrDV8PUJbkNANLXHdkW15ppLDnd47HE2FVJkLel1wrmK7mA..; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WW8O9jiA56-HHn0H2.s1mvB5JpX5oz75NHD95Q0SKBfehz7SKnNWs4Dqc_zi--ciKL2iKy8i--fiKysi-8Fi--fiKnfi-i2i--fi-z7i-zpi--fiKLFiKLFi--ciKL2iKy8i--fiKysi-8Fi--fiKnfi-i2; SCF=Ar1U69gcKJHekMMzG5YnjaDnjG9TWgynF18HHlmDTXKIOHWqdHZYTqfhPhFaH7D1JUVf_uiSD153weX0aorAyhM.; SUHB=0M2F_WriiNbkqY; SINAGLOBAL=6988002809230.238.1461669767154",
}
url = "http://1.weibo.com/aj/goods/goodsactors"
# pids to scan; pop() takes them from the end, so 54846 is scanned first
pids = list(range(14847, 54847))
time_start = time.time()


def start(test):
    # `test` receives the placeholder entry from `args` below and is unused
    with open('duobao_account_uid_all.txt', 'a') as out:
        while True:
            try:
                pid = pids.pop()
            except IndexError:
                break  # every pid has been scanned
            audiData = {
                "pid": pid,
                "page": 1,
                "key": 0,
            }
            # page through this pid until the endpoint returns an empty list
            while True:
                r = requests.post(url, headers=headers_fake, data=audiData)
                all_data = r.json()
                audiData['page'] += 1
                if not all_data['data']:
                    break
                for d in all_data['data']:
                    try:
                        # one line per participant record
                        out.write(str(d['uid']) + "\n")
                    except (KeyError, TypeError) as e:
                        print(e, all_data)


args = [
    ['http://xxx.com', 'test'],
]
pool = tp.ThreadPool(200)
reqs = tp.makeRequests(start, args)
for req in reqs:
    pool.putRequest(req)
pool.wait()
print("done in %.0f s" % (time.time() - time_start))
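In find_uid.py, args holds a single entry, so tp.makeRequests creates one work request and only one of the 200 pool threads ever runs start(); the 40,000 pids are scanned one after another. If real parallelism is wanted, one option (swapping in the standard library's concurrent.futures in place of the threadpool package) is sketched below; scan_pid is a hypothetical per-pid function returning that pid's uids:

```python
from concurrent.futures import ThreadPoolExecutor


def scan_all(pids, scan_pid, workers=8):
    """Run scan_pid(pid) -> list of uids for many pids concurrently.

    scan_pid is a hypothetical per-pid function (for example, the paging loop
    inside start() above); results are merged in the calling thread, so no
    shared list or file is touched from worker threads.
    """
    uids = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in pool.map(scan_pid, pids):
            uids.update(batch)
    return uids


# stand-in scanner: pid n has participants "u<n>" and "u0"
print(sorted(scan_all(range(3), lambda n: ["u%d" % n, "u0"])))
# ['u0', 'u1', 'u2']
```

Collecting results from pool.map in the main thread avoids sharing the pid list and the output file between threads; workers=8 is an arbitrary value, and a small number also keeps the request rate to the server modest.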
About twenty minutes later, the crawl had finished.
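find_uid.py writes one line per participant record, so a user who joined several draws appears several times in duobao_account_uid_all.txt. A small sketch (dedupe_lines is a hypothetical helper) that drops blank lines and repeats while keeping first-seen order, so each uid would be queried only once in the next step:

```python
def dedupe_lines(lines):
    """Return stripped, non-empty lines with repeats removed, first occurrence kept."""
    seen = set()
    unique = []
    for line in lines:
        uid = line.strip()
        if uid and uid not in seen:
            seen.add(uid)
            unique.append(uid)
    return unique


print(dedupe_lines(["101\n", "102\n", "101\n", "\n", "103"]))
# ['101', '102', '103']
```

For example: `with open('duobao_account_uid_all.txt') as f: uids = dedupe_lines(f)`.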
At this point we can look up the winning record of each of these users with the second script, find_won.py:
#! /usr/bin/env python
# coding=utf-8
# author=ljs
import json
import time

import requests
import threadpool as tp
from requests.adapters import HTTPAdapter

headers_fake = {
    'Host': '1.weibo.com',
    'Accept': 'application/json',
    'X-Requested-With': 'XMLHttpRequest',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-cn',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Origin': 'http://1.weibo.com',
    # no hard-coded Content-Length: requests computes it from the actual body
    'Connection': 'close',
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Mobile/14A456 Weibo (iPhone8,2__weibo__6.10.2__iphone__os10.0.2)',
    'Referer': 'http://1.weibo.com/profilehis?uid=1764571925',
    'Cookie': '_s_tentry=-; Apache=7433541909010.6455.1477650674104; SINAGLOBAL=7433541909010.6455.1477650674104; ULV=1477650674175:1:1:1:7433541909010.6455.1477650674104:; SUB=_2A2569oNjDeThGeVL71cR8i7JwzuIHXVWazcrrDV8PUJbitANLWjdkWuBBHv6s4H45nFLilyDjupLYZMaCg..; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WW8O9jiA56-HHn0H2.s1mvB5NHD95Q0SKBfehz7SKnNWs4DqcjMi--NiK.Xi-2Ri--ciKnRi-zNe0-XSK5Eeh-RS7tt; SCF=AgXi0Twa0slZFI74Y0Pve7kDAPZKPPBjXl2tcaDxP29Frab512QavT429OPislnVrg..; SUHB=0QS53ljQ_62EeR',
}
url = "http://1.weibo.com/aj/page/Profileother"
time_start = time.time()

# Retry failed connections up to 5 times. Assigning requests.adapters.DEFAULT_RETRIES
# at runtime does not change HTTPAdapter's default (it is bound when the class is
# defined), so mount an adapter with max_retries explicitly.
session = requests.Session()
session.mount('http://', HTTPAdapter(max_retries=5))


def start(test):
    flag = 0
    with open('duobao_account_uid_34499.txt') as f_user, \
            open('luck.txt', 'a') as luck, \
            open('won2.json', 'a', encoding='utf-8') as success:
        for user in f_user:
            uid = user.strip()
            if not uid:
                continue  # skip blank lines
            flag += 1
            postdata = {
                'uid': uid,
                'type': 'won',
                'page': '1',
            }
            r = session.post(url, data=postdata, headers=headers_fake, timeout=5)
            d = r.json()['data']
            if d['list']:
                # ensure_ascii=False keeps Chinese prize names readable in the file
                success.write(json.dumps(d['list'], ensure_ascii=False) + "\n")
                success.write(uid + "\n")
                luck.write(uid + "\n")
            print("%s::%s" % (uid, flag))  # progress: uid and running count


args = [
    ['http://xxx.com', 'test'],
]
pool = tp.ThreadPool(200)
reqs = tp.makeRequests(start, args)
for req in reqs:
    pool.putRequest(req)
pool.wait()
print("done in %.0f s" % (time.time() - time_start))
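find_won.py appends each hit to won2.json as a JSON list on one line followed by the uid on the next. Assuming that layout, a sketch that reads the file back into a per-uid count (load_wins is hypothetical; the sample entries are made up, since the post does not show the fields of a real list item):

```python
import json


def load_wins(lines):
    """Pair each JSON list line in won2.json with the uid line after it.

    Assumes the layout find_won.py writes: one json.dumps(list) line followed
    by the uid; blank lines are skipped. Returns {uid: number of entries}.
    """
    wins = {}
    pending = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if pending is None:
            pending = json.loads(line)  # the won-item list for the next uid
        else:
            wins[line] = wins.get(line, 0) + len(pending)
            pending = None
    return wins


sample = ['[{"pid": 1}, {"pid": 2}]\n', "2860976304\n", "\n", '[{"pid": 3}]\n', "1764571925\n"]
print(load_wins(sample))
# {'2860976304': 2, '1764571925': 1}
```

Since find_won.py requests only page 1 of each user's history, these counts cover that first page only.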
This produces the winning records of these users,
along with the uids of everyone who has won.
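To put a number on the question from the introduction, one can compare the two uid lists: every participant versus those in luck.txt. A sketch (win_summary is a hypothetical helper) that returns the number of distinct participants, the number of distinct winners among them, and the share who won:

```python
def win_summary(participant_uids, winner_uids):
    """Count distinct participants and winners and the share who won."""
    participants = {u.strip() for u in participant_uids if u.strip()}
    # only count winners that also appear in the participant list
    winners = {u.strip() for u in winner_uids if u.strip()} & participants
    rate = len(winners) / len(participants) if participants else 0.0
    return len(participants), len(winners), rate


print(win_summary(["1\n", "2\n", "3\n", "4\n", "2\n"], ["2\n", "4\n"]))
# (4, 2, 0.5)
```

With the real files, pass in the lines of the uid list and of luck.txt.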
So I guess it really is just a game of luck?
over!!