Crime Rate Prediction Using Facebook Prophet
Time series prediction is one of the must-know techniques for any data scientist. Questions like predicting the weather, product sales, customer visits to a shopping center, or the amount of inventory to maintain are all time series forecasting problems, which makes forecasting a valuable addition to a data scientist's skill set.
In this article, I will introduce how to use Facebook Prophet to predict the crime rate in Chicago. The article is split into five parts:
1. Prophet Introduction
2. EDA
3. Data processing
4. Model prediction
5. Takeaways
Let’s begin the journey.
1. Prophet Introduction
In 2017, Facebook's Core Data Science team open-sourced Prophet. As stated on its GitHub page, Prophet is:
- a procedure for forecasting time series data;
- based on an additive model;
- able to fit non-linear trends with yearly, weekly, and daily seasonality, plus holiday effects.
Prophet uses a decomposable model with three main components: trend, seasonality, and holidays. They are combined as follows:
y(t) = g(t) + s(t) + h(t) + ε_t
Where:
- g(t) is the trend function, which models non-periodic changes;
- s(t) represents periodic changes (e.g., weekly and yearly seasonality);
- h(t) represents the effects of holidays, which occur on potentially irregular schedules;
- ε_t is the error term, which represents any idiosyncratic changes that are not accommodated by the model.
So using time as a regressor, Prophet tries to fit linear and non-linear functions of time as components. In effect, Prophet frames the forecasting problem as a curve-fitting exercise, instead of looking at the time-based dependency of each observation, which brings flexibility, fast-fitting, and interpretable parameters.
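The additive decomposition can be illustrated with a small, library-free sketch. Everything in it is an assumption made for illustration: the function names, the coefficients, and the choice of a 12-step period are made up and are not Prophet internals. It only shows how a trend, a periodic term, and a holiday bump add up to one signal (with the error term left at zero).

```python
import math

def trend(t):
    """g(t): a slowly declining linear trend (illustrative coefficients)."""
    return 100.0 - 0.5 * t

def seasonality(t, period=12.0):
    """s(t): one sinusoid that repeats every `period` time steps."""
    return 10.0 * math.sin(2 * math.pi * t / period)

def holiday(t, holiday_steps=frozenset({6})):
    """h(t): a fixed bump on the time steps flagged as holidays."""
    return 15.0 if t in holiday_steps else 0.0

def additive_signal(t):
    """y(t) = g(t) + s(t) + h(t), with the error term set to zero."""
    return trend(t) + seasonality(t) + holiday(t)

# Two full seasonal cycles of the noise-free signal.
series = [additive_signal(t) for t in range(24)]
```

Fitting a model of this shape means estimating the coefficients of each component from data, which is why the problem can be treated as curve fitting over time.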
Prophet works best with time series that have strong seasonal effects and several seasons of historical data.
2. EDA
The data used here is the Chicago Crime dataset from Kaggle. It contains a summary of the reported crimes that occurred in the City of Chicago from 2001 to 2017.
Quickly looking at the data below, you will notice the dataset has 23 columns and 7,941,282 records, including ID, Case Number, Block, Primary Type, Description, etc.
A brief view of the raw Chicago Crime dataset
First, let's drop the unused columns. Specifically,
df.drop(['Unnamed: 0', 'ID', 'Case Number', 'IUCR', 'X Coordinate', 'Y Coordinate', 'Updated On', 'Year', 'FBI Code', 'Beat', 'Ward', 'Community Area', 'Location', 'District', 'Latitude', 'Longitude'], axis=1, inplace=True)
Fig.1 Data view after column dropping
As shown in Fig.1, the 'Date' column holds dates as text. Let's convert it to a datetime format that Pandas can interpret, and set it as the index. Specifically,
df.Date = pd.to_datetime(df.Date, format='%m/%d/%Y %I:%M:%S %p')
df.index = pd.DatetimeIndex(df.Date)
df.drop('Date', inplace=True, axis=1)
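To see what the format string in the conversion means, here is a minimal sketch that uses only the Python standard library. The sample timestamp is made up for illustration; it is not a record from the dataset.

```python
from datetime import datetime

# Same format string as in the pd.to_datetime call:
# month/day/year, 12-hour clock, then an AM/PM marker.
FMT = "%m/%d/%Y %I:%M:%S %p"

sample = "01/05/2016 11:30:00 PM"  # made-up example value
parsed = datetime.strptime(sample, FMT)

print(parsed.isoformat())  # 2016-01-05T23:30:00
```

Passing an explicit format also spares Pandas from guessing the layout of every value, which generally helps on a dataset with millions of rows.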
Now the data is ready for visualization. First, let's look at the yearly crime distribution. Specifically,
plt.plot(df.resample('Y').size())
plt.xlabel('Year')
plt.ylabel('Num of crimes')
Note that df.resample('Y').size() above produces the yearly crime count.
As indicated in Fig.2, the crime rate drops from 2002 to 2005. From 2006 it rises again, peaking in 2009, and then declines until 2018. This curve may reflect the impact of the economy on crime: the crime rate goes down year by year before and after the financial crisis, but the weak economy that followed the crisis may have driven an increase in crimes.
Fig.2 Yearly distribution of the crime rate
Second, let's look at the quarterly crime rate distribution. As shown in Fig.3, the crime rate shows a descending trend with periodic ups and downs.
Fig.3 Monthly distribution of the crime rate
In a similar way, as shown in Fig.4, the monthly crime rate shows the same pattern as the quarterly analysis.
Fig.4 Quarterly distribution of the crime rate
3. Data processing
The input to Prophet is always a dataframe with two columns: ‘ds’ and ‘y’. The ‘ds’ (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The ‘y’ column must be numeric and represents the measurement we wish to forecast.
Specifically,
df_m = df.resample('M').size().reset_index()
df_m.columns = ['Date', 'Monthly Crime Count']
df_m_final = df_m.rename(columns={'Date': 'ds', 'Monthly Crime Count': 'y'})
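The same preparation can be tried on a toy frame to check the resulting shape. This is a sketch under assumptions: the four records are made up, and it groups by monthly period instead of calling resample('M'), because recent Pandas releases (2.2 onward) deprecate the 'M' resampling alias in favour of 'ME'. The names raw, monthly, and df_toy are hypothetical.

```python
import pandas as pd

# Toy stand-in for the crime records: one row per reported incident (made-up values).
raw = pd.DataFrame({
    "Date": ["01/05/2016 11:30:00 PM", "01/20/2016 08:15:00 AM",
             "02/03/2016 01:00:00 PM", "04/11/2016 09:45:00 PM"],
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"],
})
raw.index = pd.to_datetime(raw["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Count incidents per calendar month.
monthly = raw.groupby(raw.index.to_period("M")).size()

# Reindex over the full month range so that months without incidents count as 0.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(full_range, fill_value=0)

# Prophet expects exactly these column names: 'ds' (dates) and 'y' (values).
df_toy = pd.DataFrame({
    "ds": monthly.index.to_timestamp(how="end").normalize(),  # month-end dates
    "y": monthly.to_numpy(),
})
print(df_toy)
```

March has no records in the toy data, so it appears with a count of 0 rather than being dropped, which keeps the monthly series evenly spaced.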
4. Model prediction
From the EDA, we found monthly and quarterly seasonality but no yearly seasonality. By default, Prophet fits weekly and yearly seasonality if the time series is more than two cycles long. Users can add other seasonalities, such as hourly, monthly, and quarterly, using the 'add_seasonality' method.
To make a prediction, instantiate a new Prophet object, and call the fit method to train on the data. Specifically,
要進(jìn)行預(yù)測,請實例化一個新的Prophet對象,然后調(diào)用fit方法對數(shù)據(jù)進(jìn)行訓(xùn)練。 特別,
m = Prophet(interval_width=0.95, yearly_seasonality=False)m.add_seasonality(name=’monthly’, period=30.5, fourier_order=10)
m.add_seasonality(name=’quarterly’, period=91.5, fourier_order=10)
m.fit(df_m_final)
Note that 'interval_width=0.95' sets the width of the uncertainty interval around the forecast to 95%. Prophet uses a partial Fourier sum to approximate a periodic signal. The Fourier order, that is, the number of terms in the sum, determines how quickly the seasonality can change.
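The partial Fourier sum can be sketched without any library. For a period P and order N, the seasonal term is a weighted sum of sin(2πnt/P) and cos(2πnt/P) for n = 1..N, with the weights learned during fitting. The helper below is a hypothetical illustration of those features, not Prophet's own code; the value of t is arbitrary.

```python
import math

def fourier_features(t, period, order):
    """Return the 2 * order sine/cosine features for time t (in days)."""
    feats = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats

# The periods and order used in the add_seasonality calls.
monthly_feats = fourier_features(t=10.0, period=30.5, order=10)
quarterly_feats = fourier_features(t=10.0, period=91.5, order=10)

print(len(monthly_feats))  # 20
```

A higher order adds faster-oscillating terms, so the fitted seasonal curve can bend more sharply, at the cost of a greater risk of overfitting.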
Predictions are made on a dataframe with a column ‘ds’ containing the dates for which a prediction is to be made. For instance, to predict the following 24 months, try below:
future = m.make_future_dataframe(periods=24, freq='M')
pred = m.predict(future)
As shown in Fig.5, the predicted value ‘yhat’ is assigned to each date with a lower and upper limit.
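For convenience, the fit and predict steps can be wrapped in one function that returns only the forecast columns of interest. This is a sketch rather than the author's code: the function name and the months parameter are hypothetical, while the Prophet calls are the ones used in this article. The columns 'yhat_lower' and 'yhat_upper' hold the lower and upper limits of the interval.

```python
def forecast_monthly_counts(df_m_final, months=24):
    """Fit Prophet on a 'ds'/'y' frame and return the last `months` forecast rows."""
    # Imported here so this module still loads where Prophet is not installed.
    # The package is named `prophet` from release 1.0; older installs use `fbprophet`.
    from prophet import Prophet

    m = Prophet(interval_width=0.95, yearly_seasonality=False)
    m.add_seasonality(name="monthly", period=30.5, fourier_order=10)
    m.add_seasonality(name="quarterly", period=91.5, fourier_order=10)
    m.fit(df_m_final)

    # freq="M" follows the article; recent pandas releases prefer "ME" for month end.
    future = m.make_future_dataframe(periods=months, freq="M")
    pred = m.predict(future)

    # Point forecast plus the lower and upper limits of the 95% interval.
    return pred[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(months)
```

Calling forecast_monthly_counts(df_m_final) would then give one row per forecast month, assuming df_m_final is the frame built in the data processing step.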
Fig.5 Prediction results
As shown in Fig.6, the black dots are the historical data, and the deep blue line is the model's predictions. The light blue band is the 95% uncertainty interval around the predictions. The blue line matches the pattern in Fig.3 well, indicating a good fit to the historical data. Great!
Fig.6 Prediction plot
Finally, Fig.7 shows the non-periodic trend and the monthly and quarterly seasonality components of the crime rate pattern.
Fig.7 Prediction pattern component plot
5. Takeaways
We introduced how to make the best use of Facebook Prophet. Specifically,
- use EDA to explore the historical data patterns, which helps in choosing the most suitable model;
- use data processing to prepare the data for modeling;
- use Prophet to fit the historical data and forecast the future crime rate.
Great! Huge congratulations on making it to the end. If you need the source code, feel free to visit my GitHub page.
1. Facebook Prophet official documentation
2. Prophet paper: Sean J. Taylor, Benjamin Letham (2018) Forecasting at scale. The American Statistician 72(1):37–45 (https://peerj.com/preprints/3190.pdf).
Original article: https://towardsdatascience.com/crime-rate-prediction-using-facebook-prophet-5348e21273d