2022 MCM Problem C: A Retrospective on Our M-Award Approach (with Code and Paper)

Published: 2023/12/31

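Preface

The MCM ended four days ago. I have been busy preparing for the teaching certification exam, and today I finally found time to write this retrospective on our approach to Problem C.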

Problem overview

The problem is titled 'Trading Strategies'. Two files are provided: CSV files of Bitcoin and gold prices over time. The requirements break down into four steps:

Build a model that uses only the price data available up to the current day, and report what the initial $1000 is worth on September 10, 2021. (Develop a model that gives the best daily trading strategy based only on price data up to that day. How much is the initial $1000 investment worth on 9/10/2021 using your model and strategy?)

Show that your model provides the best strategy. (Present evidence that your model provides the best strategy.)

Run a sensitivity analysis on the two transaction-cost parameters. (Determine how sensitive the strategy is to transaction costs. How do transaction costs affect the strategy and results?)

The usual MCM routine: write a memorandum that presents your strategy, model, and results to the trader. (Communicate your strategy, model, and results to the trader in a memorandum of at most two pages.)

For the full problem statement, see the MCM website: 2022 MCM Problem C (immchallenge.org) http://www.immchallenge.org/mcm/2022_MCM_Problem_C.pdf

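Analyzing the problem

For question 1, the approach has two steps. The first is to forecast prices for the next few days and derive the expected returns from them. The second is to solve for the optimal trades with a dynamic programming model. The details are as follows: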
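The second step can be sketched as a small dynamic program. This is only a minimal illustration under simplifying assumptions, not necessarily the formulation used in the paper: the whole portfolio sits in exactly one of cash, gold, or Bitcoin at the end of each day, the commission rates are the ones given in the problem statement (1% for gold, 2% for Bitcoin), and the rule that gold only trades on market days is ignored. The names `FEES`, `ASSETS`, `switch_factor`, and `best_plan` are made up for this example.

```python
from typing import Dict, List, Tuple

# Commission rates from the problem statement; names are illustrative.
FEES = {"gold": 0.01, "btc": 0.02}
ASSETS = ["cash", "gold", "btc"]


def switch_factor(src: str, dst: str) -> float:
    """Fraction of portfolio value kept when moving everything from src to dst."""
    if src == dst:
        return 1.0
    keep = 1.0
    if src != "cash":
        keep *= 1.0 - FEES[src]  # commission on selling the current asset
    if dst != "cash":
        keep *= 1.0 - FEES[dst]  # commission on buying the new asset
    return keep


def best_plan(prices: Dict[str, List[float]],
              start_cash: float = 1000.0) -> Tuple[float, List[str]]:
    """Maximise final mark-to-market wealth over a given (forecast) price path.

    State: the single asset that holds all the money at the end of each day.
    Returns the best final value and the asset held at the end of every day.
    """
    n_days = len(prices["gold"])
    # value[a] = best wealth achievable while holding asset a at the end of day 0
    value = {a: start_cash * switch_factor("cash", a) for a in ASSETS}
    back: List[Dict[str, str]] = []
    for t in range(1, n_days):
        # Let each holding grow (or shrink) from day t-1 to day t.
        grown = {
            "cash": value["cash"],
            "gold": value["gold"] * prices["gold"][t] / prices["gold"][t - 1],
            "btc": value["btc"] * prices["btc"][t] / prices["btc"][t - 1],
        }
        value, choice = {}, {}
        for dst in ASSETS:
            # Best previous holding to arrive in dst after paying switching fees.
            src = max(ASSETS, key=lambda s: grown[s] * switch_factor(s, dst))
            value[dst] = grown[src] * switch_factor(src, dst)
            choice[dst] = src
        back.append(choice)
    # Pick the best final state and walk the choices backwards.
    last = max(ASSETS, key=lambda a: value[a])
    final = value[last]
    path = [last]
    for choice in reversed(back):
        last = choice[last]
        path.append(last)
    path.reverse()
    return final, path
```

In practice the price lists passed to `best_plan` would be the forecast prices for the next few days, and the plan would be recomputed each day as new data arrives.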

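Start with the dataset. It has only two columns: date and price. That rules out many of the forecasting algorithms described in papers on CNKI and other databases. Why? Those papers work with datasets that carry additional features, whereas here time is the only feature. So algorithms that need multi-dimensional features, such as SVM, Bayesian networks, and vector autoregression, do not apply. Searching further, I found recommendations for LSTM, neural network models, and xgboost, but I am not very familiar with any of them, so in the end I used the ARIMA time series model for forecasting. My reasons for choosing ARIMA:

ARIMA is a univariate time series model, which fits both the data provided and the requirements of the problem.
ARIMA is a very common time series model. It is mature, I know it well, and it gives me plenty to write about.
I program in Python, and the statsmodels library provides a ready-made ARIMA API, so no extra code is needed, and plotting through the API is very convenient.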
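To make the model concrete, here is a hand-rolled sketch of what an ARIMA(2, 1, 0) forecast does: difference the prices once, fit an AR(2) to the differences, and add the predicted difference back onto the last price. It assumes no constant term and uses plain conditional least squares, so it will not reproduce statsmodels' maximum-likelihood estimates exactly. The function names are invented for this illustration.

```python
from typing import List, Tuple


def fit_ar2_on_differences(prices: List[float]) -> Tuple[float, float]:
    """Least-squares fit of d_t = phi1*d_{t-1} + phi2*d_{t-2} + e_t,
    where d_t = prices[t] - prices[t-1] is the first difference."""
    d = [b - a for a, b in zip(prices, prices[1:])]
    # Accumulate the 2x2 normal equations (X'X) phi = X'y.
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(d)):
        x1, x2, y = d[t - 1], d[t - 2], d[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("differences are too collinear to fit an AR(2)")
    # Solve the 2x2 system with Cramer's rule.
    phi1 = (b1 * s22 - b2 * s12) / det
    phi2 = (b2 * s11 - b1 * s12) / det
    return phi1, phi2


def forecast_next(prices: List[float], phi1: float, phi2: float) -> float:
    """One-step-ahead price forecast: last price plus the predicted difference."""
    d_last = prices[-1] - prices[-2]
    d_prev = prices[-2] - prices[-3]
    return prices[-1] + phi1 * d_last + phi2 * d_prev
```

Refitting on the prices up to each day and calling `forecast_next` gives a walk-forward forecast that only uses data available on that day, which is what the problem requires.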

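For question 2, the approaches circulating online all say to change some parameter and check whether the return improves; if it does not, the strategy is optimal. None of them explain how to actually carry this out. My idea was somewhat different. Reading those approaches reminded me of gradient descent: take two parameters of the model (specifically, the AR order and the MA order of the ARIMA model) as the two search directions and run the search over them. In theory this is entirely feasible, so that is what I did.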
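Because the AR and MA orders are integers, a literal gradient does not exist, so the idea is closer to a greedy local search on the (p, q) grid. The sketch below shows that variant, which may differ from what the paper does. `score` is a hypothetical callable that would fit the model with orders (p, q), run the trading strategy, and return the final portfolio value; `local_search` and `max_order` are also names chosen for this example.

```python
from typing import Callable, Tuple


def local_search(score: Callable[[int, int], float],
                 start: Tuple[int, int],
                 max_order: int = 5) -> Tuple[Tuple[int, int], float]:
    """Greedy ascent on the (p, q) grid.

    From the current orders, try changing p or q by one step, move to the
    best neighbour, and stop when no neighbour scores higher.
    """
    current, best = start, score(*start)
    while True:
        p, q = current
        neighbours = [
            (p + dp, q + dq)
            for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= p + dp <= max_order and 0 <= q + dq <= max_order
        ]
        if not neighbours:
            return current, best
        candidate = max(neighbours, key=lambda pq: score(*pq))
        candidate_score = score(*candidate)
        if candidate_score <= best:
            # No neighbouring order improves the result: a local optimum.
            return current, best
        current, best = candidate, candidate_score
```

If the search is started from the chosen orders and returns them unchanged, no single-step change of p or q improves the result, which is evidence of a local (not global) optimum. Since each `score` call would involve a model fit and a backtest, wrapping it in `functools.lru_cache` avoids repeating work.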

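I won't say much about questions 3 and 4, but a few remarks are in order. For the sensitivity analysis, besides the transaction-cost analysis that question 3 asks for, I also ran a sensitivity analysis on the ARIMA model itself. For question 4, use an overview-details-summary structure: one opening paragraph, then three paragraphs covering strategy, model, and results in turn, then a closing summary. Note that an English memorandum has a standard format. Follow it properly, or you may lose points.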
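A transaction-cost sensitivity analysis boils down to rerunning the same backtest with different commission rates and comparing the final values. The sketch below uses a deliberately simple single-asset rule as a stand-in for the real strategy: buy when the forecast return exceeds twice the fee, sell when the forecast loss exceeds the fee. That rule, and the names `backtest` and `fee_sweep`, are assumptions made for illustration only.

```python
from typing import List, Tuple


def backtest(prices: List[float], predicted_next: List[float],
             fee: float, start_cash: float = 1000.0) -> float:
    """Final mark-to-market value of a toy all-in/all-out rule.

    predicted_next[t] is the forecast, made on day t, of the price on day t+1.
    """
    cash, units = start_cash, 0.0
    for t in range(len(prices) - 1):
        expected_return = predicted_next[t] / prices[t] - 1.0
        if units == 0.0 and expected_return > 2 * fee:
            # Buy with all cash, paying the commission on the purchase.
            units = cash * (1.0 - fee) / prices[t]
            cash = 0.0
        elif units > 0.0 and expected_return < -fee:
            # Sell everything, paying the commission on the sale.
            cash = units * prices[t] * (1.0 - fee)
            units = 0.0
    return cash + units * prices[-1]


def fee_sweep(prices: List[float], predicted_next: List[float],
              fees: List[float]) -> List[Tuple[float, float]]:
    """Run the same backtest once per commission rate."""
    return [(fee, backtest(prices, predicted_next, fee)) for fee in fees]
```

Plotting the final value against the fee then shows how quickly the strategy stops trading as costs rise.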

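Also, don't forget to write a data cleaning section at the start. I used time-series-based Newton interpolation to fill in missing values and box plots to identify outliers.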
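Both cleaning steps can be sketched with the standard library alone. The first function evaluates the Newton divided-difference polynomial through a handful of known points, and the second applies the usual box plot rule (values beyond 1.5 times the interquartile range from the quartiles). The function names are illustrative, and using only a few neighbouring points per gap is an assumption made here to keep the polynomial from oscillating.

```python
import statistics
from typing import List, Sequence


def newton_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the Newton divided-difference polynomial through (xs, ys) at x."""
    coef = list(ys)
    n = len(xs)
    # Build the divided-difference coefficients in place.
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Evaluate the Newton form with Horner's scheme.
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def iqr_outliers(values: Sequence[float]) -> List[int]:
    """Indices of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

For a missing price on day `t`, `xs` would be the day indices of a few surrounding observations and `ys` their prices.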

Here is my code.

Code

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from datetime import datetime
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.stats.diagnostic import acorr_ljungbox


def adf_test(timeseries):
    # Augmented Dickey-Fuller test (null hypothesis: the series has a unit root)
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used',
                                             'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)


def kpss_test(timeseries):
    # KPSS test (null hypothesis: the series is stationary around a constant)
    print('Results of KPSS Test:')
    kpsstest = kpss(timeseries, regression='c')
    kpss_output = pd.Series(kpsstest[0:3], index=['Test Statistic', 'p-value', 'Lags Used'])
    for key, value in kpsstest[3].items():
        kpss_output['Critical Value (%s)' % key] = value
    print(kpss_output)


# Load both price series and parse the dates
dataGOLD = pd.read_csv("C:\\ProblemC\\LBMA-GOLD.csv")
dataMKPRU = pd.read_csv("C:\\ProblemC\\BCHAIN-MKPRU.csv")
dataGOLD['Date'] = pd.to_datetime(dataGOLD.Date)
dataMKPRU['Date'] = pd.to_datetime(dataMKPRU.Date)

# Keep only the trading window 2016-09-11 to 2021-09-10, sorted by date
start = datetime.strptime('2016-9-11', "%Y-%m-%d")
end = datetime.strptime('2021-9-10', "%Y-%m-%d")
dataGOLD = dataGOLD[(dataGOLD['Date'] >= start) & (dataGOLD['Date'] <= end)]
dataMKPRU = dataMKPRU[(dataMKPRU['Date'] >= start) & (dataMKPRU['Date'] <= end)]
dataGOLD = dataGOLD.sort_values('Date').reset_index(drop=True)
dataMKPRU = dataMKPRU.sort_values('Date').reset_index(drop=True)

# Stationarity tests on the raw gold prices
adf_test(dataGOLD['Value'])
kpss_test(dataGOLD['Value'])

# First-order differences
diffGOLD = dataGOLD['Value'].diff(1).dropna()
diffMKPRU = dataMKPRU['Value'].diff(1).dropna()

# ACF and PACF of the differenced Bitcoin series
fig = plt.figure(figsize=(12, 8))
ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(diffMKPRU, ax=ax1)
ax1.xaxis.set_ticks_position('top')
ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(diffMKPRU, ax=ax2)
ax2.xaxis.set_ticks_position('bottom')
fig.tight_layout()
plt.show()

# ACF and PACF of the differenced gold series
fig = plt.figure(figsize=(12, 8))
ax11 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(diffGOLD, ax=ax11)
ax11.xaxis.set_ticks_position('top')
ax22 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(diffGOLD, ax=ax22)
ax22.xaxis.set_ticks_position('bottom')
fig.tight_layout()
plt.show()

# Ljung-Box test for autocorrelation (white-noise check)
lb_result = acorr_ljungbox(dataGOLD['Value'])
print(lb_result)

# Order selection by AIC and BIC for the Bitcoin series.
# Recent statsmodels releases expect trend='n' for "no constant";
# older releases spelled this option 'nc'.
results = sm.tsa.arma_order_select_ic(dataMKPRU['Value'], ic=['aic', 'bic'], trend='n',
                                      max_ar=5, max_ma=5)
print('AIC', results.aic_min_order)
print('BIC', results.bic_min_order)

# Train/test split: the first 95% of the gold series is used for fitting
n_sample = dataGOLD['Value'].shape[0]
n_train = int(n_sample * 0.95) + 1
n_forecast = n_sample - n_train
ts_train = dataGOLD.iloc[:n_train]['Value']
ts_test = dataGOLD.iloc[n_train:]['Value']

# Fit ARIMA(2, 1, 0) to the gold prices and show the residual diagnostics
arima = sm.tsa.SARIMAX(ts_train, order=(2, 1, 0))
model_results = arima.fit()
model_results.plot_diagnostics(figsize=(16, 12))
plt.show()
print(model_results.summary())

# First-order difference plots
plt.title("Bitcoin price image after first-order difference")
plt.plot(dataMKPRU['Date'], dataMKPRU['Value'].diff(1))
plt.show()

plt.title("Gold price image after first-order difference")
plt.plot(dataGOLD['Date'], dataGOLD['Value'].diff(1))
plt.show()

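Plots from the code

We used matplotlib for plotting and produced the following figures:

Two box plots for outlier detection:

Two trend plots of price over time:

Two first-order difference plots:

One trend plot after applying the new strategy:

One ARIMA forecast plot:

Two ACF and PACF plots:

Diagnostic plots: first-order difference plot, normal distribution histogram, Q-Q plot, and so on.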

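Code analysis and details of question 1

As you can see, statsmodels is very powerful. Its APIs are ready to use and very convenient.

We used the ACF and PACF plots to determine the approximate orders, then used BIC and AIC to fix them. The code above only shows the gold ARIMA model, whose order is (2, 1, 0). For Bitcoin you only need to change the variables, so I won't show it again; the Bitcoin model order was finally set to (4, 1, 4).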
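For reference, the two criteria used for order selection have the standard definitions below, where $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood. Lower values are better, and BIC penalizes each extra parameter more heavily than AIC as soon as $\ln n > 2$.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```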

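I'll stop here. This post is for reference and review only.

Final result

Updated on May 8.

Our team ultimately won the M award. Congratulations to everyone!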

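Closing

This is the link to our paper, for reference only. Academic life isn't easy, so please support us. The paper costs only 1.9 yuan, which isn't much.

2022 MCM Problem C M-award paper: https://download.csdn.net/download/qq_41938259/85318021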

END
