联合国农产品数据分析
生活随笔
收集整理的這篇文章主要介紹了
联合国农产品数据分析
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
一、項目描述
中國自古以來就是一個農業大國,廣闊的土地上栽培的無數的農產品,這個項目是用來從聯合國的數據里,爬取到各個國家農產品的交易記錄,并且對這個數據進行分析,方便他人可以通過圖表對我國農產品的進出口有個很清晰的了解。
二、項目的步驟
1、瀏覽分析聯合國的貿易網站,查看網站接口api的文檔
2、編寫數據爬取的代碼,使用requests庫發起請求,將csv數據保存在本地
3、將csv文件使用pandas進行合并入庫
4、對入庫的數據進行分析,并繪制圖表,然后將結果通過郵件發送給老師
三、涉及到的技術
python3.6,mariadb,centos
四、需求分析
1、爬取聯合國各個農產品的交易數據
2、需要年份 2020-2021年 這個年份最好不要寫死,以后可以更改,最好用配置文件去管理
3、保存成csv 放到data目錄 這個data目錄路徑你要自己設計好
4、入庫到數據庫
5、繪制圖表,并將結果發送給老師
五、代碼實現
1、新建一個config配置文件,可以用來隨時更改年份、農產品的種類、出口國、進口國、以及后面要請求的URL,和要連接的數據庫等
# 保存項目的配置#如果需要獲取別的年份,直接修改如下配置 #start_year 規定開始年份 #end_year 規定結束年份 start_year = 2020 end_year = 2021#要獲取的產品 #其他農產品的id可以去網址查詢,此項目以大米來舉例說明 Products = "rice" Products_id = 1006URL = f"https://comtrade.un.org/api/get/plus?max=10000&type=C&freq=A&px=HS&p=all&rg=all&fmt=csv" #sqlalchmey 連接數據庫url#什么數據庫+什么方式連接://用戶名:密碼@主機ip地址/連接的數據庫?字符編碼 DBURL = "mysql+pymysql://sc:123456@192.168.243.128/sc?charset=utf8"2、新建一個country文件,里面存放了所有國家和國家對應的id
country = [{"id": "all","text": "All"},{"id": "4","text": "Afghanistan"},{"id": "8","text": "Albania"},{"id": "12","text": "Algeria"},{"id": "20","text": "Andorra"},{"id": "24","text": "Angola"},{"id": "660","text": "Anguilla"},{"id": "28","text": "Antigua and Barbuda"},{"id": "32","text": "Argentina"},{"id": "51","text": "Armenia"},{"id": "533","text": "Aruba"},{"id": "36","text": "Australia"},{"id": "40","text": "Austria"},{"id": "31","text": "Azerbaijan"},{"id": "44","text": "Bahamas"},{"id": "48","text": "Bahrain"},{"id": "50","text": "Bangladesh"},{"id": "52","text": "Barbados"},{"id": "112","text": "Belarus"},{"id": "56","text": "Belgium"},{"id": "58","text": "Belgium-Luxembourg"},{"id": "84","text": "Belize"},{"id": "204","text": "Benin"},{"id": "60","text": "Bermuda"},{"id": "64","text": "Bhutan"},{"id": "68","text": "Bolivia (Plurinational State of)"},{"id": "535","text": "Bonaire"},{"id": "70","text": "Bosnia Herzegovina"},{"id": "72","text": "Botswana"},{"id": "92","text": "Br. Virgin Isds"},{"id": "76","text": "Brazil"},{"id": "96","text": "Brunei Darussalam"},{"id": "100","text": "Bulgaria"},{"id": "854","text": "Burkina Faso"},{"id": "108","text": "Burundi"},{"id": "132","text": "Cabo Verde"},{"id": "116","text": "Cambodia"},{"id": "120","text": "Cameroon"},{"id": "124","text": "Canada"},{"id": "136","text": "Cayman Isds"},{"id": "140","text": "Central African Rep."},{"id": "148","text": "Chad"},{"id": "152","text": "Chile"},{"id": "156","text": "China"},{"id": "344","text": "China, Hong Kong SAR"},{"id": "446","text": "China, Macao SAR"},{"id": "170","text": "Colombia"},{"id": "174","text": "Comoros"},{"id": "178","text": "Congo"},{"id": "184","text": "Cook Isds"},{"id": "188","text": "Costa Rica"},{"id": "384","text": "C?te d'Ivoire"},{"id": "191","text": "Croatia"},{"id": "192","text": "Cuba"},{"id": "531","text": "Cura?ao"},{"id": "196","text": "Cyprus"},{"id": "203","text": "Czechia"},{"id": "200","text": "Czechoslovakia"},{"id": "408","text": "Dem. People's Rep. of Korea"},{"id": "180","text": "Dem. Rep. of the Congo"},{"id": "208","text": "Denmark"},{"id": "262","text": "Djibouti"},{"id": "212","text": "Dominica"},{"id": "214","text": "Dominican Rep."},{"id": "588","text": "East and West Pakistan"},{"id": "218","text": "Ecuador"},{"id": "818","text": "Egypt"},{"id": "222","text": "El Salvador"},{"id": "226","text": "Equatorial Guinea"},{"id": "232","text": "Eritrea"},{"id": "233","text": "Estonia"},{"id": "231","text": "Ethiopia"},{"id": "97","text": "EU-28"},{"id": "234","text": "Faeroe Isds"},{"id": "238","text": "Falkland Isds (Malvinas)"},{"id": "242","text": "Fiji"},{"id": "246","text": "Finland"},{"id": "886","text": "Fmr Arab Rep. of Yemen"},{"id": "278","text": "Fmr Dem. Rep. of Germany"},{"id": "866","text": "Fmr Dem. Rep. of Vietnam"},{"id": "720","text": "Fmr Dem. Yemen"},{"id": "230","text": "Fmr Ethiopia"},{"id": "280","text": "Fmr Fed. Rep. of Germany"},{"id": "582","text": "Fmr Pacific Isds"},{"id": "590","text": "Fmr Panama, excl.Canal Zone"},{"id": "592","text": "Fmr Panama-Canal-Zone"},{"id": "868","text": "Fmr Rep. of Vietnam"},{"id": "717","text": "Fmr Rhodesia Nyas"},{"id": "736","text": "Fmr Sudan"},{"id": "835","text": "Fmr Tanganyika"},{"id": "810","text": "Fmr USSR"},{"id": "890","text": "Fmr Yugoslavia"},{"id": "836","text": "Fmr Zanzibar and Pemba Isd"},{"id": "251","text": "France"},{"id": "254","text": "French Guiana"},{"id": "258","text": "French Polynesia"},{"id": "583","text": "FS Micronesia"},{"id": "266","text": "Gabon"},{"id": "270","text": "Gambia"},{"id": "268","text": "Georgia"},{"id": "276","text": "Germany"},{"id": "288","text": "Ghana"},{"id": "292","text": "Gibraltar"},{"id": "300","text": "Greece"},{"id": "304","text": "Greenland"},{"id": "308","text": "Grenada"},{"id": "312","text": "Guadeloupe"},{"id": "320","text": "Guatemala"},{"id": "324","text": "Guinea"},{"id": "624","text": "Guinea-Bissau"},{"id": "328","text": "Guyana"},{"id": "332","text": "Haiti"},{"id": "336","text": "Holy See (Vatican City State)"},{"id": "340","text": "Honduras"},{"id": "348","text": "Hungary"},{"id": "352","text": "Iceland"},{"id": "699","text": "India"},{"id": "356","text": "India, excl. Sikkim"},{"id": "360","text": "Indonesia"},{"id": "364","text": "Iran"},{"id": "368","text": "Iraq"},{"id": "372","text": "Ireland"},{"id": "376","text": "Israel"},{"id": "381","text": "Italy"},{"id": "388","text": "Jamaica"},{"id": "392","text": "Japan"},{"id": "400","text": "Jordan"},{"id": "398","text": "Kazakhstan"},{"id": "404","text": "Kenya"},{"id": "296","text": "Kiribati"},{"id": "414","text": "Kuwait"},{"id": "417","text": "Kyrgyzstan"},{"id": "418","text": "Lao People's Dem. Rep."},{"id": "428","text": "Latvia"},{"id": "422","text": "Lebanon"},{"id": "426","text": "Lesotho"},{"id": "430","text": "Liberia"},{"id": "434","text": "Libya"},{"id": "440","text": "Lithuania"},{"id": "442","text": "Luxembourg"},{"id": "450","text": "Madagascar"},{"id": "454","text": "Malawi"},{"id": "458","text": "Malaysia"},{"id": "462","text": "Maldives"},{"id": "466","text": "Mali"},{"id": "470","text": "Malta"},{"id": "584","text": "Marshall Isds"},{"id": "474","text": "Martinique"},{"id": "478","text": "Mauritania"},{"id": "480","text": "Mauritius"},{"id": "175","text": "Mayotte"},{"id": "484","text": "Mexico"},{"id": "496","text": "Mongolia"},{"id": "499","text": "Montenegro"},{"id": "500","text": "Montserrat"},{"id": "504","text": "Morocco"},{"id": "508","text": "Mozambique"},{"id": "104","text": "Myanmar"},{"id": "580","text": "N. Mariana Isds"},{"id": "516","text": "Namibia"},{"id": "524","text": "Nepal"},{"id": "530","text": "Neth. Antilles"},{"id": "532","text": "Neth. Antilles and Aruba"},{"id": "528","text": "Netherlands"},{"id": "540","text": "New Caledonia"},{"id": "554","text": "New Zealand"},{"id": "558","text": "Nicaragua"},{"id": "562","text": "Niger"},{"id": "566","text": "Nigeria"},{"id": "579","text": "Norway"},{"id": "512","text": "Oman"},{"id": "490","text": "Other Asia, nes"},{"id": "586","text": "Pakistan"},{"id": "585","text": "Palau"},{"id": "591","text": "Panama"},{"id": "598","text": "Papua New Guinea"},{"id": "600","text": "Paraguay"},{"id": "459","text": "Peninsula Malaysia"},{"id": "604","text": "Peru"},{"id": "608","text": "Philippines"},{"id": "616","text": "Poland"},{"id": "620","text": "Portugal"},{"id": "634","text": "Qatar"},{"id": "410","text": "Rep. of Korea"},{"id": "498","text": "Rep. of Moldova"},{"id": "638","text": "Réunion"},{"id": "642","text": "Romania"},{"id": "643","text": "Russian Federation"},{"id": "646","text": "Rwanda"},{"id": "647","text": "Ryukyu Isd"},{"id": "461","text": "Sabah"},{"id": "652","text": "Saint Barthelemy"},{"id": "654","text": "Saint Helena"},{"id": "659","text": "Saint Kitts and Nevis"},{"id": "658","text": "Saint Kitts, Nevis and Anguilla"},{"id": "662","text": "Saint Lucia"},{"id": "534","text": "Saint Maarten"},{"id": "666","text": "Saint Pierre and Miquelon"},{"id": "670","text": "Saint Vincent and the Grenadines"},{"id": "882","text": "Samoa"},{"id": "674","text": "San Marino"},{"id": "678","text": "Sao Tome and Principe"},{"id": "457","text": "Sarawak"},{"id": "682","text": "Saudi Arabia"},{"id": "686","text": "Senegal"},{"id": "688","text": "Serbia"},{"id": "891","text": "Serbia and Montenegro"},{"id": "690","text": "Seychelles"},{"id": "694","text": "Sierra Leone"},{"id": "702","text": "Singapore"},{"id": "703","text": "Slovakia"},{"id": "705","text": "Slovenia"},{"id": "711","text": "So. African Customs Union"},{"id": "90","text": "Solomon Isds"},{"id": "706","text": "Somalia"},{"id": "710","text": "South Africa"},{"id": "728","text": "South Sudan"},{"id": "724","text": "Spain"},{"id": "144","text": "Sri Lanka"},{"id": "275","text": "State of Palestine"},{"id": "729","text": "Sudan"},{"id": "740","text": "Suriname"},{"id": "748","text": "Eswatini"},{"id": "752","text": "Sweden"},{"id": "757","text": "Switzerland"},{"id": "760","text": "Syria"},{"id": "762","text": "Tajikistan"},{"id": "807","text": "North Macedonia"},{"id": "764","text": "Thailand"},{"id": "626","text": "Timor-Leste"},{"id": "768","text": "Togo"},{"id": "772","text": "Tokelau"},{"id": "795","text": "Turkmenistan"},{"id": "796","text": "Turks and Caicos Isds"},{"id": "798","text": "Tuvalu"},{"id": "800","text": "Uganda"},{"id": "804","text": "Ukraine"},{"id": "858","text": "Uruguay"},{"id": "850","text": "US Virgin Isds"},{"id": "842","text": "USA"},{"id": "841","text": "USA (before 1981)"},{"id": "548","text": "Vanuatu"},{"id": "862","text": "Venezuela"},{"id": "704","text": "Viet Nam"},{"id": "975","text": "ASEAN"}]3、新建一個request_data文件,可以用來存放獲取數據的代碼,同時還要在同目錄下新建一個data目錄用來存放請求到的數據;
#導入第三方請求庫 import requests #導入剛剛自己定義的國家模塊,因為后面要用到國家模塊里面的id import country #導入自己定義的配置文件庫 import config #導入文件處理os庫 import os #導入time時間處理的庫 import time#定義一個函數,檢查文件夾是否存在,不存在就創建 def dir_check(dir_path):if not os.path.exists(dir_path):os.makedirs(dir_path)# 通過接口請求獲取相應的信息,然后保存在當前data目錄下,要在當前目錄下新建一個data目錄 #定義一個函數,用來去請求數據,并將數據保存在文件夾里 def download(full_url,file_path):# 請求接口 -- 異常處理,重試3次,睡個2秒再去重試for i in range(3):try:result = requests.get(full_url)if result.status_code == 200:with open(file_path,"w+",encoding="utf-8")as fp:fp.write(result.text)break;except:print("請求失敗,重新發起請求")time.sleep(2)if __name__=="__main__": #判斷數據目錄存不存在,不存在就調用dir_check函數去創建,創建的目錄以產品的名稱來命名dir_path = "data/"+config.Productsdir_check(dir_path) #在配置文件里規定好起始年份和結束年份,循環得到所有年份for year in range(config.start_year,config.end_year+1):#創建年份文件夾year_path = dir_path + "/" + str(year)dir_check(year_path)#循環所有國家,每次循環得到一個字典,用state接收for state in country.country[1:]:print(f"正在下載{year}年,{state['text']}國家的數據")#將字典里id的值賦給state_idstate_id = state["id"]#使用字符串拼接得到最終請求的url,\是續航符full_url=config.URL+"&ps="+str(year)+"&r="+state_id + \"&cc=" + str(config.Products_id)#使用字符串拼接,得到年份下面以國家命名的.csv結尾的文件file_path = year_path + "/" + state["text"] + ".csv"# 如果請求過就不需要重新請求了if os.path.exists(file_path):print("此文件已經存在,不需要重新請求")else:download(full_url,file_path)# 一個小時只有100次請求,控制請求頻率,請求一下 sleep一下time.sleep(40)運行request文件里的代碼,就會得到數據,等待所有數據爬取完成,就是這個界面(我這里2020年的數據沒有爬取完)
4、新建一個insert_mysql的文件,將爬取到的數據存到自己的數據庫里面去;
#導入 pymysql模塊 import pymysql #導入pandas數據庫叫做pd import pandas as pd #從sqlalchemy模塊導入create_engine函數 from sqlalchemy import create_engine #導入文件處理os模塊 import os #從config模塊里導入DBURL,start_year, end_year, Products from config import DBURL, start_year, end_year, Products#用pymysql模塊的方法連接到數據庫 conn = pymysql.connect(host='192.168.243.128',#主機的ip地址user='sc',#用戶名password='123456',#用戶密碼database='sc',#連接的數據庫的名稱charset='utf8'#字符編碼格式) #創建連接數據庫引擎 engine = create_engine(DBURL)#按年份存入數據庫 dir_path = "date/"+ Products for year in range(start_year,end_year+1):#按年存入數據庫#將每一年的csv都讀取到一個大的dataframe里面,然后使用to_sql入庫df = pd.DataFrame()#得到年份目錄的路徑year_path = os.path.join(dir_path,str(year))#循環得到年份目錄下每一個以.csv結尾的國家的文件for filename in os.listdir(year_path):#使用os的拼接方法,得到文件的路徑file_path = os.path.join(year_path,filename)#使用pandas將每個文件都讀取出來tmp_df = pd.read_csv(file_path)#將每個小的dataframe合并成一個大的dataframedf = df.append(tmp_df,ignore_index=False)#用pandas處理表格數據,篩選出年份不為空的數據df = df[df["Period"].notnull()]#將最終篩選出來的df表格插入到數據庫里df.to_sql("union_table",engine,if_exists="append",index=False)查看一下插入前的數據庫
查看執行了代碼后,插入后的數據庫
5、新建一個analysis文件,用來分析數據,并繪圖
#分析2021年中國與其他國家大米交易量前三的國家 #使用pandas matplotlib作圖 import pymysql import pandas as pd#連接數據庫 db = pymysql.connect(host = "192.168.243.128", #mysql主機ipuser = "sc", #用戶名passwd = "123456", #密碼database = "sc" #數據庫 )#獲取china 2021年對所有國家的交易數據 df = pd.read_sql("select * from union_table where Reporter='China' and Year='2021';", con=db) #與中國進行大米交易前三的國家數據 Partner得到的國家就成為這個dataframe 的index了 result = df.groupby('Partner').sum().sort_values(by="Netweight (kg)", ascending=False).iloc[1:4, :] #sql語句 直接查詢 #select Partner, sum(Qty) as nw from union_table where Reporter='China' and Year='2021' group by Partner order by nw desc limit 5;#把前三的國家 交易數據放入result_df result_df = df[df['Partner'].isin(result.index) ] #把Partner作為result_df索引 result_df.set_index(['Partner'], inplace=True) #篩選交易類型和交易量 result_df = result_df[['Trade Flow','Netweight (kg)']] #s1是所有出口國家 #s2是所有進口國家 s1 = result_df[result_df["Trade Flow"] == "X"]['Netweight (kg)'] s2 = result_df[result_df["Trade Flow"] == "M"]['Netweight (kg)'] #將s1,s2,倆個serise合并成一個dataframe,并且他們的列名稱就是"export","import" rdf = pd.concat([s1,s2], axis=1) rdf.columns = ["export","import"] # #畫圖 import matplotlib.pyplot as plt #對于dataframe,matplotlib作圖工具,會自動的將列名稱作為柱狀圖的x軸下標名稱 #會自動的將每一行的數據生成一個柱狀圖 rdf.plot.bar() #讓柱狀圖的下標可以自由的旋轉,達到合適的效果 plt.xticks(rotation=360) # fontproperties='simhei',讓python畫圖支持中文格式 plt.title(f"2021-大米進出口總量前三排名",fontproperties='simhei') #將圖片保存在當前目錄1.png里面 plt.savefig('1.png')plt.show()雙擊顯示柱狀圖
6、新建一個sendmail的文件,用來將最后的圖片發送老師的郵箱
#郵件協議 #smtp協議 發郵件的協議 默認端口25號 #pop3 收郵件的協議 默認端口110 #imap 收郵件的協議 默認端口143#pop3和imap都是收郵件的協議,區別在于 #pop3在客戶端操作不會反饋到服務器 #imap在客戶端的操作會反饋到服務器163.com郵箱的授權碼 #safdsdfsad(這里保存好自己的授權碼)#python里兩個模塊發郵件 #1、smtplib 發郵件 #2、email 構建郵件內容 import smtplib from email.mime.text import MIMEText from email.mime.multipart import MIMEMultipart from email.mime.application import MIMEApplication#設置第三方發郵件 SMTP服務 mail_host = "smtp.163.com" #設置服務器 mail_user = "123456@163.com" #用戶名 mail_pass = "saasfd" #授權碼,填自己正確的授權碼receivers = "123456@163.com" #接收郵箱,可以自己給自己發郵件 #創建一個實例 msg = MIMEMultipart() #設置標題的一些主題,郵件來自和郵件發送 msg['Subject'] = 'lhj' msg['From'] = mail_user msg['To'] = receivers#給郵件加入內容 message = MIMEText('lhj') msg.attach(message) #發送附件1.png圖片 message = MIMEApplication(open('1.png','rb').read()) message.add_header('Content-Disposition','attachment',filename='1.png') msg.attach(message)try:#創建一個實例s = smtplib.SMTP()#創建連接s.connect(mail_host)#登錄,mail_user是用戶名,mail_pass是授權碼s.login(mail_user,mail_pass)#發送郵件,第一個參數是誰發送的,第二個參數是發送給誰的,后面的是將內容變成strings.sendmail(mail_user,receivers,msg.as_string()) except Exception as e:print(e) s.close()六、遇到的問題
1、一個小時只有100次請求
解決:每次請求采用time庫的sleep方法,睡個40秒
2、程序錯誤結束時,開始下一次請求時,請求過就不需要重新請求
解決:采用一個循環,結合os庫的方法,判斷文件路徑是否存在,如果存在就不去url請求數據了
3、請求接口時,接口發生異常,
解決:采用異常處理,捕獲異常,用循環重試3次,睡個2秒再去重試
總結
以上是生活随笔為你收集整理的联合国农产品数据分析的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 2021苍穹战队视觉组寒假学习计划--环
- 下一篇: [Windows实用软件推荐:1]本地搜