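Scraping E-commerce Data from the Jinri Toutiao (今日头条) App with Python 3
Recently several readers asked us to help scrape data from the Jinri Toutiao app. Some need the app's advertising data and others need its e-commerce feed data. The advertising data was already covered in an earlier blog post, so here I will use the e-commerce data as the example.
1. To get data out of an app, you first need to capture the API endpoint it calls. I recommend using Charles for this. I won't describe how to capture the request here (you can search Baidu for that); I'll just give the endpoint directly.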
http://is.snssdk.com/api/news/feed/v88/?list_count=229&support_rn=4&category=%E7%94%B5%E5%AD%90%E5%95%86%E5%8A%A1&refer=1&refresh_reason=1&session_refresh_idx=6&count=20&min_behot_time=1545011410&last_refresh_sub_entrance_interval=1545011511&loc_mode=7&tt_from=pull&plugin_enable=3&iid=52371751146&device_id=51411696552&ac=wifi&channel=xiaomi&aid=13&app_name=news_article&version_code=699&version_name=6.9.9&device_platform=android&ab_version=629152%2C607361%2C609338%2C326532%2C644516%2C641414%2C645870%2C646379%2C645275%2C644943%2C622716%2C644221%2C621629%2C622134%2C622993%2C641037%2C649190%2C640997%2C641074%2C643790%2C631607%2C631595%2C643841%2C650077%2C554836%2C549647%2C644131%2C472443%2C649122%2C572465%2C649270%2C644058%2C615291%2C606549%2C442255%2C651222%2C645527%2C650134%2C630218%2C621153%2C546702%2C648932%2C281291%2C632887%2C641825%2C622042%2C325616%2C649524%2C642450%2C634871%2C646070%2C625065%2C498375%2C638335%2C467514%2C640046%2C644240%2C631638%2C650567%2C648895%2C648270%2C595556%2C647947%2C640690%2C611287%2C647156%2C640178%2C486952%2C642202%2C571130%2C641921%2C638882%2C594582%2C239095%2C612191%2C641905%2C170988%2C643893%2C642341%2C594603%2C374119%2C641853%2C585064%2C520833%2C634646%2C649420%2C633720%2C550042%2C435215%2C603541%2C586999%2C633860%2C627125%2C649428%2C649497%2C614096%2C620526%2C522766%2C647910%2C416055%2C621360%2C643129%2C642529%2C639579%2C643098%2C545739%2C630235%2C558139%2C586260%2C555254%2C640008%2C635502%2C471406%2C603441%2C596392%2C550820%2C598626%2C644845%2C634911%2C646250%2C603386%2C603400%2C603403%2C603405%2C642681%2C649811%2C646564%2C648850%2C589102%2C633487%2C457480%2C649401%2C639235&ab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_group=100167&ab_feature=94563%2C102749&abflag=3&ssmix=a&device_type=MI+5X&device_brand=xiaomi&language=zh&os_api=25&os_version=7.1.2&uuid=868392038519494&openudid=a28f8cc2cde1730f&manifest_version_code=699&resolution=1080*1920&dpi=480&update_version_code=69912&_rticket=1545011511705&fp=jlTqP2Ztc2q_FlHeFrU1FYmeFSGI&tma_j
ssdk_version=1.5.3.9&rom_version=miui_v9_v9.6.2.0.ndbcnfd&plugin=26958&ts=1545011511&as=a2854071d7034c81474355&mas=00fee2f9cc34755ca140c408a81e07206945ec26ea06686e60&cp=54c91d7202137q1
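A URL this long is easier to work with once its query string is split into a dictionary. The following is a minimal sketch using only the standard library. The shortened captured_url is a stand-in that keeps just a few parameters of the full URL above, and split_feed_url and build_feed_url are hypothetical helper names, not part of the original project.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def split_feed_url(url):
    """Return (base_url, params) for a captured feed URL."""
    parts = urlsplit(url)
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # parse_qsl percent-decodes values, so category comes back as readable text
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base_url, params


def build_feed_url(base_url, params):
    """Re-encode the parameters and append them to the base URL."""
    return base_url + "?" + urlencode(params)


# Stand-in for the captured URL: only three of its parameters are kept here.
captured_url = (
    "http://is.snssdk.com/api/news/feed/v88/"
    "?category=%E7%94%B5%E5%AD%90%E5%95%86%E5%8A%A1&count=20&device_platform=android"
)

base_url, params = split_feed_url(captured_url)
print(base_url)
print(params["category"], params["count"])

params["count"] = "10"
rebuilt_url = build_feed_url(base_url, params)
print(rebuilt_url)
```

With the parameters in a dictionary, requests.get(base_url, params=params, headers=...) can send them without the hand-assembled string. Whether the server still accepts the request after parameters such as count are changed is not something the captured URL tells us, so treat any modification as an experiment.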
2. Once we have the endpoint, we can use Python to fetch the data.
Use the following line, passing in the captured url:
    response = requests.get(url, headers=self.getHeader(), verify=False)
For the headers we can use:
    header = {
        "Host": "is.snssdk.com",
        "Accept-Language": "zh-Hans;q=1",
        # request timestamp in milliseconds
        "tt-request-time": str(int(time.time() * 1000)),
        "Connection": "keep-alive",
        "Accept-Encoding": "gzip,deflate",
        "Cookie": "CNZZDATA1272189606=1385639719-1525687011-%7C1525692411;alert_coverage=76;install_id=31781370987;ttreq=1$b79c6e66ea460b1579579c027e8073593305644e;odin_tt = 4c07858cc8b75143c593d0a99a04aa8fcf10136c3dca9badd9c31a2aa9cc415022834c64d7f52952d9290e3028876735;UM_distinctid = 1633a13d9fd41b-0910970a30f79a8-12485712-3d10d-1633a13d9fe84a;_ga=GA1.2.555016291.1525687770;_gid=GA1.2.96631484.1525687770;qh[360] = 1;__tea_sdk__ssid=957b8ce1-d5b3-4010-bd9c-bfec73bdf526;__tea_sdk__user_unique_id=6552731409432937992;tt_webid=6552731409432937992",
        "X-SS-Cookie": "CNZZDATA1272189606=1385639719-1525687011-%7C1525692411;alert_coverage = 76;install_id=31781370987;ttreq=1$b79c6e66ea460b1579579c027e8073593305644e;odin_tt=4c07858cc8b75143c593d0a99a04aa8fcf10136c3dca9badd9c31a2aa9cc415022834c64d7f52952d9290e3028876735;UM_distinctid=1633a13d9fd41b-0910970a30f79a8-12485712-3d10d-1633a13d9fe84a;_ga=GA1.2.555016291.1525687770;_gid=GA1.2.96631484.1525687770;qh[360]=1;__tea_sdk__ssid=957b8ce1-d5b3-4010-bd9c-bfec73bdf526;__tea_sdk__user_unique_id=6552731409432937992;tt_webid=6552731409432937992",
        "User-Agent": "News/6.6.5(iPhone;iOS10.2;Scale/2.00)",
        "Accept": "*/*",
    }
With this we can fetch the data for the e-commerce category. Running the project shows what Toutiao returns (I formatted the response with a JSON tool). The data field holds the e-commerce items we want, and within each item the content field carries that item's detailed information.
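One detail of this response shape is easy to miss: content is itself a JSON-encoded string, so it has to be decoded a second time. The sketch below illustrates that with a hypothetical, heavily trimmed sample body. The field values are invented for illustration, real responses carry many more fields, and iter_contents is an assumed helper name.

```python
import json

# Hypothetical sample: each entry under "data" stores its details
# as a JSON string in "content", not as a nested object.
sample_body = json.dumps({
    "data": [
        {"content": json.dumps({"title": "Item A", "source": "Shop A"})},
        {"content": json.dumps({"title": "Item B", "source": "Shop B"})},
        {},  # an entry without content, to show it is skipped
    ],
})


def iter_contents(body):
    """Yield the decoded content dict of every entry under "data"."""
    for entry in json.loads(body).get("data", []):
        raw = entry.get("content")
        if not raw:
            continue  # nothing to decode for this entry
        yield json.loads(raw)  # second decode: string -> dict


titles = [content["title"] for content in iter_contents(sample_body)]
print(titles)
```

In the real project the body would be response.text from the request above rather than sample_body.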
Next we need to pull the detailed data out of content. The code is as follows:
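    # jsonStr is the response body text (for example response.text)
    json_list = json.loads(jsonStr)["data"]
    for json_str in json_list:
        # content is a JSON string, so it is decoded a second time
        content = json.loads(json_str["content"])
        self.savaDataInfo(content)
Each item carries a lot of information, so we extract only the fields we need and save them to the database. The code that saves an item is as follows: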
    def savaDataInfo(self, content):
        DataInfo.title = content["title"]
        DataInfo.type = 1
        DataInfo.channel = "jinritoutiao"
        # raw_ad_data may be missing, so fall back to an empty dict
        if "download_url" in content.get("raw_ad_data", {}):
            # attribute name matches the app_download value read by the insert code
            DataInfo.app_download = content["raw_ad_data"]["download_url"]
        self.saveBitmapUrlOrPath(content)
        DataInfo.device_type = "ios"
        DataInfo.app_name = content["source"]
        MySqlManager().insert_inspection_list(3)
Inserting into the database:
    def insert_inspection_list(self, table_id):
        print(str(DataInfo.pic_list))
        print(str(DataInfo.pic_path))
        # Placeholders replace string formatting so that quotes in a title cannot break the SQL
        # (this assumes a MySQL driver that uses %s placeholders, such as pymysql).
        sql = "INSERT INTO " + self.getTableName(table_id) + \
              "(title,app_download,time,channel,type,content,gif,video,source_type," \
              "pic_list,pic_path,device_type,material_size,app_name,created_at,updated_at)" \
              " VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"
        params = (DataInfo.title, DataInfo.app_download, DataInfo.time, DataInfo.channel,
                  DataInfo.type, DataInfo.content, json.dumps(DataInfo.gif),
                  json.dumps(DataInfo.video), DataInfo.source_type,
                  json.dumps(DataInfo.pic_list), json.dumps(DataInfo.pic_path),
                  DataInfo.device_type, DataInfo.material_size, DataInfo.app_name,
                  self.getCurrentTime(), self.getCurrentTime())
        cursor = self.conn.cursor()
        cursor.execute(sql, params)
        self.conn.commit()
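To try the extract-then-insert flow without a MySQL server, here is a small self-contained sketch that swaps in the standard library's sqlite3 module and an in-memory database. It is not the project's MySqlManager: the items table, its reduced column set, and the helpers extract_row and insert_row are all assumptions made for illustration. It also passes values as query parameters, which keeps a title containing a quote from breaking the statement.

```python
import sqlite3
from datetime import datetime


def extract_row(content):
    """Pick the fields to store from one decoded content dict."""
    raw_ad = content.get("raw_ad_data") or {}
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return (
        content.get("title", ""),
        raw_ad.get("download_url", ""),  # empty when the item has no ad data
        "jinritoutiao",
        content.get("source", ""),
        now,
    )


def insert_row(conn, row):
    # sqlite3 uses "?" placeholders; MySQL drivers such as pymysql use "%s".
    conn.execute(
        "INSERT INTO items (title, app_download, channel, app_name, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        row,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (title TEXT, app_download TEXT, channel TEXT, "
    "app_name TEXT, created_at TEXT)"
)

# The apostrophe in this title would break a query built by string formatting.
insert_row(conn, extract_row({"title": "It's on sale", "source": "Shop A"}))
rows = conn.execute("SELECT title, app_download, app_name FROM items").fetchall()
print(rows)
```

Returning a plain tuple per item, instead of writing to shared DataInfo class attributes, also means a value left over from a previous item cannot leak into the next row.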
At this point we have basically covered how to fetch the e-commerce data from the Jinri Toutiao app.