Building an Automated Stock Quantitative Trading System with Deep Reinforcement Learning
Welcome to leave a small Star in support! Open source isn't easy; thanks for your support.
- For more hands-on examples (AI insect recognition, forest fire monitoring with PaddleX, eye disease recognition, smart photo album classification, etc.) and deep learning materials, see: awesome-DeepLearning
- For more learning materials, see the PaddlePaddle deep learning platform
1. Project Introduction
The financial sector produces huge volumes of data every day. This data is noisy and incomplete, which makes it hard to analyze directly. Traditional stochastic control theory and other analytical methods depend heavily on model assumptions when they turn such data into decisions. Reinforcement learning, by contrast, can exploit this daily stream of financial data without assumptions about the model or the data: given a suitably constructed financial environment, it can learn complex financial decision-making strategies, with applications in automated stock trading assistance, portfolio management, financial product recommendation, and more.
Current stock trading strategies fall into two categories. The first is price prediction: a machine learning model forecasts future prices, and trades are executed by a predefined strategy that combines the predicted price with broker commissions, taxes, and so on. The second is automated trading learning: given daily stock data, the trading policy itself is learned directly so as to maximize profit.
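To make the contrast concrete, here is a minimal sketch of the first paradigm under stated assumptions: `rule_based_decision` and its fee threshold are illustrative stand-ins, not part of this project. The rest of this article implements the second paradigm, where the policy itself is learned.

```python
# Paradigm 1 in miniature: a price forecast plus a predefined trading rule.
# `predicted_close` stands in for the output of any ML price model; the
# fee threshold is illustrative, not a parameter of this project.
def rule_based_decision(today_close, predicted_close, fee_rate=0.001):
    # Only trade when the predicted move clears the round-trip costs
    if predicted_close > today_close * (1 + fee_rate):
        return 'buy'
    if predicted_close < today_close * (1 - fee_rate):
        return 'sell'
    return 'hold'

print(rule_based_decision(10.0, 10.3))  # 'buy'
```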
1.1 Project Content
Stock trading is a classic sequential decision-making problem: at each trading time point, one analyzes the historical chart and makes a decision (buy, sell, or hold) so as to maximize long-term return. The problem can therefore be modeled as a reinforcement learning problem. In this setting, the trader is the agent and the stock market is the environment: the trader makes a decision on the stock, i.e. interacts with the environment, and receives the stock's current state in return.
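In code, this interaction is the standard Gym loop. The sketch below uses a toy built-in environment and a random policy just to show the loop's shape; it assumes the older Gym API (reset() returns only the observation, step() returns a 4-tuple) that this project uses, and the StockTradingEnv defined later plugs into exactly the same loop.

```python
import gym

# Toy stand-in: any gym.Env exposes the same loop that StockTradingEnv uses.
env = gym.make('Pendulum-v0')   # placeholder; later we use env = StockTradingEnv(df)
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()   # a trained agent calls agent.predict(obs) instead
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(f'episode return: {total_reward:.2f}')
```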
Figure 1: Stock quantitative trading based on reinforcement learning

In this project, the state consists of 20 attribute variables: stock attributes taken from the third-party stock data package baostock, plus account variables computed on top of them:
| Attribute | Description |
| --- | --- |
| open | opening price of the day |
| high | highest price |
| low | lowest price |
| close | closing price |
| volume | trading volume |
| amount | trading amount |
| adjustflag | adjustment status (1: backward-adjusted, 2: forward-adjusted, 3: unadjusted) |
| tradestatus | trading status (1: normal trading, 0: suspended) |
| pctChg | price change (percent) |
| peTTM | rolling (TTM) price-to-earnings ratio |
| pbMRQ | price-to-book ratio |
| psTTM | rolling (TTM) price-to-sales ratio |
| pcfNcfTTM | rolling (TTM) price-to-cash-flow ratio |
| balance | cash currently held |
| max_net_worth | maximum net worth |
| net_worth | current net worth |
| shares_held | number of lots held |
| cost_basis | average purchase price of held shares |
| total_shares_sold | total lots sold |
| total_sales_value | total value of shares sold |
NOTE: all of the attribute values above are normalized, so in this project the state is a one-dimensional vector of length 20, where every entry lies in $[0,1]$.
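Concretely, the normalization is a simple min-max scaling: each attribute is divided by a fixed upper bound. A small sketch (the bound constants match those defined with the environment code below):

```python
MAX_SHARE_PRICE = 5000  # assumed upper bound for prices (defined with the environment below)
MAX_VOLUME = 1e9        # assumed upper bound for volume

def normalize_price_volume(open_price, volume):
    # Scale each attribute into [0, 1] by dividing by its assumed maximum
    return open_price / MAX_SHARE_PRICE, volume / MAX_VOLUME

print(normalize_price_volume(9.64, 2.3e7))  # (0.001928, 0.023)
```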
Based on the current state and its current policy, the agent executes an action. In this project, the first action value selects one of three executable action types:
| First action value | Action type |
| --- | --- |
| $(\frac{2}{3}, 1)$ | sell |
| $(\frac{1}{3}, \frac{2}{3})$ | hold |
| $(0, \frac{1}{3})$ | buy |
To quantify how much stock to buy or sell, the project adds a second value, amount, which gives the fraction of stock to buy or sell. The action space in this setting is therefore a one-dimensional vector of length 2: the first value selects the action type and the second gives the buy/sell fraction, and both have range $[0,1]$.
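For illustration, the mapping from a raw 2-dimensional action in $[0,1]^2$ to a trade decision can be written as a small helper. This is a sketch mirroring the thresholds above; the environment's _take_action() below is the authoritative implementation.

```python
import numpy as np

def decode_action(action):
    """Map a 2-dim action in [0, 1]^2 to (trade type, fraction)."""
    action_type, amount = action[0], action[1]
    if action_type < 1 / 3:
        return 'buy', amount    # buy `amount` of what the balance allows
    if action_type > 2 / 3:
        return 'sell', amount   # sell `amount` of the shares held
    return 'hold', 0.0

print(decode_action(np.array([0.8, 0.5])))  # ('sell', 0.5)
```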
An episode (one round of the experiment) terminates as soon as any of the following three conditions is triggered:
$$\mathrm{max\_net\_worth} \ge \mathrm{initial\_account\_balance} \times \mathrm{max\_predict\_rate}$$

$$\mathrm{net\_worth} \le 0$$

or the trading data is exhausted, i.e. current_step runs past the last row of the dataset.
The reward signal in this project is based on the return relative to the initial balance. Specifically:
$$\mathrm{net\_worth} = \mathrm{balance} + \mathrm{num\_shares\_held} \times \mathrm{current\_price}$$

$$\mathrm{profit\_percent} = \frac{\mathrm{net\_worth} - \mathrm{initial\_account\_balance}}{\mathrm{initial\_account\_balance}}$$

$$\mathrm{reward} = \begin{cases} \dfrac{\mathrm{profit\_percent}}{\mathrm{max\_predict\_rate}}, & \text{if } \mathrm{profit\_percent} > 0 \\ -0.1, & \text{otherwise} \end{cases}$$
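As a sanity check, the reward rule can be written as a standalone function mirroring these formulas (a minimal sketch; the full environment below folds the same logic into step()):

```python
INITIAL_ACCOUNT_BALANCE = 100000  # initial cash, as configured in the environment below
max_predict_rate = 3              # target multiple of the initial balance

def compute_reward(balance, num_shares_held, current_price):
    """Reward from the profit relative to the initial balance."""
    net_worth = balance + num_shares_held * current_price
    profit_percent = (net_worth - INITIAL_ACCOUNT_BALANCE) / INITIAL_ACCOUNT_BALANCE
    if profit_percent > 0:
        return profit_percent / max_predict_rate
    return -0.1

# E.g. 20,000 cash plus 100 shares at a price of 1,200:
print(compute_reward(20000, 100, 1200))  # (140000 - 100000) / 100000 / 3 = 0.133...
```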
2. Installation
Before starting the project, install the latest version of parl (2.0.4 at the time of writing):
```
!pip install parl==2.0.4 -i https://mirror.baidu.com/pypi/simple
!pip install -r requirements.txt
```

If pip installation fails, clone the source code and install from it instead. Switch to a terminal and run:
```
git clone https://github.com/PaddlePaddle/PARL.git
cd PARL
python setup.py install
```

Before running the project, we first import the relevant packages:
```python
import argparse
import os
import gym
import random
from gym import spaces
import numpy as np
import pandas as pd
from parl.utils import logger, tensorboard, ReplayMemory
import paddle
from parl.algorithms import SAC
```

3. Environment Construction
Subclass gym.Env and override the relevant interface methods, such as __init__(), reset(), and step(). The implementation details are as follows:
```python
# Default constants used to normalize the attribute values
MAX_ACCOUNT_BALANCE = 2147480     # maximum account balance
MAX_NUM_SHARES = 2147480          # maximum number of lots
MAX_SHARE_PRICE = 5000            # maximum price per lot
MAX_VOLUME = 1e9                  # maximum trading volume
MAX_AMOUNT = 1e10                 # maximum trading amount
MAX_OPEN_POSITIONS = 5            # maximum open positions
MAX_STEPS = 1000                  # maximum number of interaction steps
MAX_DAY_CHANGE = 1                # maximum day change
max_loss = -50000                 # maximum loss
max_predict_rate = 3              # target multiple of the initial balance
INITIAL_ACCOUNT_BALANCE = 100000  # initial cash


class StockTradingEnv(gym.Env):
    """A stock trading environment for OpenAI gym"""

    metadata = {'render.modes': ['human']}

    def __init__(self, df):
        super(StockTradingEnv, self).__init__()
        self.df = df
        # self.reward_range = (0, MAX_ACCOUNT_BALANCE)
        # Possible actions: buy x%, sell x%, hold
        self.action_space = spaces.Box(
            low=np.array([-1, -1]), high=np.array([1, 1]), dtype=np.float32)
        # Dimensions of the environment state
        self.observation_space = spaces.Box(
            low=0, high=1, shape=(20,), dtype=np.float32)
        self.current_step = 0

    def seed(self, seed):
        random.seed(seed)
        np.random.seed(seed)

    # Build the state vector
    def _next_observation(self):
        # Some stocks are missing a few fields; handle them here
        d10 = self.df.loc[self.current_step, 'peTTM'] / 100
        d11 = self.df.loc[self.current_step, 'pbMRQ'] / 100
        d12 = self.df.loc[self.current_step, 'psTTM'] / 100
        if np.isnan(d10):  # some entries are 0.0; leaving them as NaN would raise an error
            d10 = d11 = d12 = 0.00000000e+00
        obs = np.array([
            self.df.loc[self.current_step, 'open'] / MAX_SHARE_PRICE,
            self.df.loc[self.current_step, 'high'] / MAX_SHARE_PRICE,
            self.df.loc[self.current_step, 'low'] / MAX_SHARE_PRICE,
            self.df.loc[self.current_step, 'close'] / MAX_SHARE_PRICE,
            self.df.loc[self.current_step, 'volume'] / MAX_VOLUME,
            self.df.loc[self.current_step, 'amount'] / MAX_AMOUNT,
            self.df.loc[self.current_step, 'adjustflag'],
            self.df.loc[self.current_step, 'tradestatus'] / 1,
            self.df.loc[self.current_step, 'pctChg'] / 100,
            d10,
            d11,
            d12,
            self.df.loc[self.current_step, 'pcfNcfTTM'] / 100,
            self.balance / MAX_ACCOUNT_BALANCE,
            self.max_net_worth / MAX_ACCOUNT_BALANCE,
            self.net_worth / MAX_ACCOUNT_BALANCE,
            self.shares_held / MAX_NUM_SHARES,
            self.cost_basis / MAX_SHARE_PRICE,
            self.total_shares_sold / MAX_NUM_SHARES,
            self.total_sales_value / (MAX_NUM_SHARES * MAX_SHARE_PRICE),
        ])
        return obs

    # Execute the current action and update the account data (net worth, etc.)
    def _take_action(self, action):
        # Sample the execution price uniformly, bounded by the day's low and high
        current_price = random.uniform(
            self.df.loc[self.current_step, "low"],
            self.df.loc[self.current_step, "high"])
        action_type = action[0]
        amount = action[1]
        if action_type < 1 / 3 and self.balance >= current_price:  # buy amount%
            total_possible = int(self.balance / current_price)
            shares_bought = int(total_possible * amount)
            if shares_bought != 0:
                prev_cost = self.cost_basis * self.shares_held
                additional_cost = shares_bought * current_price
                self.balance -= additional_cost
                self.cost_basis = (prev_cost + additional_cost) / (
                    self.shares_held + shares_bought)
                self.shares_held += shares_bought
        elif action_type > 2 / 3 and self.shares_held != 0:  # sell amount%
            shares_sold = int(self.shares_held * amount)
            self.balance += shares_sold * current_price
            self.shares_held -= shares_sold
            self.total_shares_sold += shares_sold
            self.total_sales_value += shares_sold * current_price
        else:
            pass
        # Net worth after executing the action
        self.net_worth = self.balance + self.shares_held * current_price
        if self.net_worth > self.max_net_worth:
            self.max_net_worth = self.net_worth
        if self.shares_held == 0:
            self.cost_basis = 0

    # Interact with the environment
    def step(self, action):
        # Execute the action in the environment
        self._take_action(action)
        done = False
        status = None
        reward = 0
        # Check the termination conditions
        self.current_step += 1
        # delay_modifier = (self.current_step / MAX_STEPS)
        # reward += delay_modifier
        if self.net_worth >= INITIAL_ACCOUNT_BALANCE * max_predict_rate:
            reward += max_predict_rate
            status = f'[ENV] success at step {self.current_step}! Get {max_predict_rate} times worth.'
            # self.current_step = 0
            done = True
        if self.current_step > len(self.df.loc[:, 'open'].values) - 1:
            status = f'[ENV] Loop training. Max worth was {self.max_net_worth}, final worth is {self.net_worth}.'
            # reward += (self.net_worth / INITIAL_ACCOUNT_BALANCE - max_predict_rate) / max_predict_rate
            reward += self.net_worth / INITIAL_ACCOUNT_BALANCE
            self.current_step = 0  # loop training
            done = True
        if self.net_worth <= 0:
            status = f'[ENV] Failure at step {self.current_step}. Loss all worth. Max worth was {self.max_net_worth}'
            reward += -1
            # self.current_step = 0
            done = True
        else:
            # Compute the profit relative to the initial balance and derive the reward
            profit = self.net_worth - INITIAL_ACCOUNT_BALANCE
            # profit = self.net_worth - self.balance
            profit_percent = profit / INITIAL_ACCOUNT_BALANCE
            if profit_percent > 0:
                reward += profit_percent / max_predict_rate
            elif profit_percent == 0:
                reward += -0.1
            else:
                reward += -0.1
        obs = self._next_observation()
        return obs, reward, done, {
            'profit': self.net_worth,
            'current_step': self.current_step,
            'status': status
        }

    # Reset the environment
    def reset(self, new_df=None):
        # Reset the environment variables to their initial values
        self.balance = INITIAL_ACCOUNT_BALANCE
        self.net_worth = INITIAL_ACCOUNT_BALANCE
        self.max_net_worth = INITIAL_ACCOUNT_BALANCE
        self.shares_held = 0
        self.cost_basis = 0
        self.total_shares_sold = 0
        self.total_sales_value = 0
        # Pass in a new dataset if one is given
        if new_df is not None:
            self.df = new_df
        # if self.current_step > len(self.df.loc[:, 'open'].values) - 1:
        self.current_step = 0
        return self._next_observation()

    def get_obs(self, current_step):
        d10 = self.df.loc[current_step, 'peTTM'] / 100
        d11 = self.df.loc[current_step, 'pbMRQ'] / 100
        d12 = self.df.loc[current_step, 'psTTM'] / 100
        if np.isnan(d10):  # some entries are 0.0; leaving them as NaN would raise an error
            d10 = d11 = d12 = 0.00000000e+00
        obs = np.array([
            self.df.loc[current_step, 'open'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'high'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'low'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'close'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'volume'] / MAX_VOLUME,
            self.df.loc[current_step, 'amount'] / MAX_AMOUNT,
            self.df.loc[current_step, 'adjustflag'],
            self.df.loc[current_step, 'tradestatus'] / 1,
            self.df.loc[current_step, 'pctChg'] / 100,
            d10,
            d11,
            d12,
            self.df.loc[current_step, 'pcfNcfTTM'] / 100,
            self.balance / MAX_ACCOUNT_BALANCE,
            self.max_net_worth / MAX_ACCOUNT_BALANCE,
            self.net_worth / MAX_ACCOUNT_BALANCE,
            self.shares_held / MAX_NUM_SHARES,
            self.cost_basis / MAX_SHARE_PRICE,
            self.total_shares_sold / MAX_NUM_SHARES,
            self.total_sales_value / (MAX_NUM_SHARES * MAX_SHARE_PRICE),
        ])
        return obs

    # Render the environment to the screen
    def render(self, mode='human'):
        # Print the environment info
        profit = self.net_worth - INITIAL_ACCOUNT_BALANCE
        print('-' * 30)
        print(f'Step: {self.current_step}')
        print(f'Balance: {self.balance}')
        print(f'Shares held: {self.shares_held} (Total sold: {self.total_shares_sold})')
        print(f'Avg cost for held shares: {self.cost_basis} (Total sales value: {self.total_sales_value})')
        print(f'Net worth: {self.net_worth} (Max net worth: {self.max_net_worth})')
        print(f'Profit: {profit}')
        return profit
```

```python
# Load the data
df = pd.read_csv('./stock/train.csv')
# Build the environment from the dataset
env = StockTradingEnv(df)
# Get the environment's parameter info (state and action dimensions)
state_dim = env.observation_space.shape[0]
action_dim = env.action_space.shape[0]
max_action = float(env.action_space.high[1])
max_step = len(df.loc[:, 'open'].values)
print(f'state: {state_dim}, action: {action_dim}, action max value: {max_action}, max step:{max_step}')
```

```
state: 20, action: 2, action max value: 1.0, max step:5125
```

```python
# Load the evaluation data
eval_df = pd.read_csv('./stock/test_v1.csv')
# Build the evaluation environment from the dataset
eval_env = StockTradingEnv(eval_df)
```

4. Model Construction
The model construction part implements the agent, StockAgent, and the model, StockModel: StockAgent defines how the model learns and updates its parameters, while StockModel defines the model architecture.
```python
import parl
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class StockAgent(parl.Agent):
    def __init__(self, algorithm):
        super(StockAgent, self).__init__(algorithm)
        self.alg.sync_target(decay=0)

    def predict(self, obs):
        obs = paddle.to_tensor(obs.reshape(1, -1), dtype='float32')
        action = self.alg.predict(obs)
        action_numpy = action.cpu().numpy()[0]
        return action_numpy

    def sample(self, obs):
        obs = paddle.to_tensor(obs.reshape(1, -1), dtype='float32')
        action, _ = self.alg.sample(obs)
        action_numpy = action.cpu().numpy()[0]
        return action_numpy

    def learn(self, obs, action, reward, next_obs, terminal):
        terminal = np.expand_dims(terminal, -1)
        reward = np.expand_dims(reward, -1)
        obs = paddle.to_tensor(obs, dtype='float32')
        action = paddle.to_tensor(action, dtype='float32')
        reward = paddle.to_tensor(reward, dtype='float32')
        next_obs = paddle.to_tensor(next_obs, dtype='float32')
        terminal = paddle.to_tensor(terminal, dtype='float32')
        critic_loss, actor_loss = self.alg.learn(obs, action, reward, next_obs,
                                                 terminal)
        return critic_loss, actor_loss


# Clamp bounds for the std output of the actor network
LOG_SIG_MAX = 1.0
LOG_SIG_MIN = -1e9


class StockModel(parl.Model):
    def __init__(self, obs_dim, action_dim):
        super(StockModel, self).__init__()
        self.actor_model = Actor(obs_dim, action_dim)
        self.critic_model = Critic(obs_dim, action_dim)

    def policy(self, obs):
        return self.actor_model(obs)

    def value(self, obs, action):
        return self.critic_model(obs, action)

    def get_actor_params(self):
        return self.actor_model.parameters()

    def get_critic_params(self):
        return self.critic_model.parameters()


class Actor(parl.Model):
    def __init__(self, obs_dim, action_dim):
        super(Actor, self).__init__()
        self.l1 = nn.Linear(obs_dim, 256)
        self.l2 = nn.Linear(256, 256)
        self.mean_linear = nn.Linear(256, action_dim)
        self.std_linear = nn.Linear(256, action_dim)

    def forward(self, obs):
        x = F.relu(self.l1(obs))
        x = F.relu(self.l2(x))
        act_mean = self.mean_linear(x)
        act_std = self.std_linear(x)
        act_log_std = paddle.clip(act_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
        return act_mean, act_log_std


class Critic(parl.Model):
    def __init__(self, obs_dim, action_dim):
        super(Critic, self).__init__()
        # Q1 network
        self.l1 = nn.Linear(obs_dim + action_dim, 256)
        self.l2 = nn.Linear(256, 256)
        self.l3 = nn.Linear(256, 1)
        # Q2 network
        self.l4 = nn.Linear(obs_dim + action_dim, 256)
        self.l5 = nn.Linear(256, 256)
        self.l6 = nn.Linear(256, 1)

    def forward(self, obs, action):
        x = paddle.concat([obs, action], 1)
        # Q1
        q1 = F.relu(self.l1(x))
        q1 = F.relu(self.l2(q1))
        q1 = self.l3(q1)
        # Q2
        q2 = F.relu(self.l4(x))
        q2 = F.relu(self.l5(q2))
        q2 = self.l6(q2)
        return q1, q2
```

Set the reinforcement learning hyperparameters.
```python
SEED = 0                  # random seed
WARMUP_STEPS = 640        # warm-up steps before learning starts
EVAL_EPISODES = 5         # number of evaluation episodes
MEMORY_SIZE = int(1e5)    # replay memory size
BATCH_SIZE = 64           # batch size
GAMMA = 0.995             # discount factor
TAU = 0.005               # fraction of current-network parameters mixed into the target network
ACTOR_LR = 1e-4           # actor learning rate
CRITIC_LR = 1e-4          # critic learning rate
alpha = 0.2               # entropy regularization coefficient (SAC parameter)
MAX_REWARD = -1e9         # best reward seen so far
file_name = f'sac_Stock'  # name under which the model is saved
```

Define the SAC algorithm and the agent; DDPG and TD3 are defined analogously.
```python
# Initialize model, algorithm, agent, replay_memory
model = StockModel(state_dim, action_dim)
algorithm = SAC(model,
                gamma=GAMMA,
                tau=TAU,
                alpha=alpha,
                actor_lr=ACTOR_LR,
                critic_lr=CRITIC_LR)
agent = StockAgent(algorithm)
rpm = ReplayMemory(max_size=MEMORY_SIZE, obs_dim=state_dim, act_dim=action_dim)
```

5. Model Training
The training procedure is as follows: we train in the training environment, evaluate in the test environment, and keep the parameters that achieve the highest average return in the test environment.
```python
# Runs the policy for 5 episodes by default and returns the average reward
# A fixed seed is used for the eval environment
eval_seed = [0, 53, 47, 99, 107, 1, 17, 57, 97, 179, 777]


@paddle.no_grad()
def run_evaluate_episodes(agent, env, eval_episodes):
    avg_reward = 0.
    for epi in range(eval_episodes):
        obs = env.reset()
        env.seed(eval_seed[epi])
        done = False
        while not done:
            action = agent.predict(obs)
            obs, reward, done, _ = env.step(action)
            avg_reward += reward
    avg_reward /= eval_episodes
    print(f'Evaluator: the average reward is {avg_reward:.3f} over {eval_episodes} episodes.')
    return avg_reward


# Run one episode for training
def run_train_episode(agent, env, rpm, episode_num):
    action_dim = env.action_space.shape[0]
    obs = env.reset()
    env.seed(SEED)
    done = False
    episode_reward = 0
    episode_steps = 0
    while not done:
        episode_steps += 1
        # Select an action randomly (warm-up) or according to the policy
        if rpm.size() < WARMUP_STEPS:
            action = np.random.uniform(-1, 1, size=action_dim)
        else:
            action = agent.sample(obs)
        # action = agent.sample(obs)
        action = (action + 1.0) / 2.0
        next_obs, reward, done, info = env.step(action)
        terminal = float(done)
        # Store data in replay memory
        rpm.append(obs, action, reward, next_obs, terminal)
        obs = next_obs
        episode_reward += reward
        # Train agent after collecting sufficient data
        if rpm.size() >= WARMUP_STEPS:
            batch_obs, batch_action, batch_reward, batch_next_obs, batch_terminal = rpm.sample_batch(BATCH_SIZE)
            agent.learn(batch_obs, batch_action, batch_reward, batch_next_obs,
                        batch_terminal)
    # print(f'Learner: Episode {episode_steps+1} done. The reward is {episode_reward:.3f}.')
    # Print info
    current_step = info['current_step']
    print(f'Learner: Episode {episode_num} done. The reward is {episode_reward:.3f}.')
    print(info['status'])
    return episode_reward, episode_steps
```

Training runs for a total of train_total_steps environment steps. After each training episode, we evaluate the model in the test-set environment to obtain the average reward, and save the model whenever the average reward reaches a new best.
```python
def do_train(agent, env, rpm):
    save_freq = 1
    total_steps = 0
    train_total_steps = 3e6
    episode_num = 0
    best_award = -1e9
    while total_steps < train_total_steps:
        episode_num += 1
        # Train one episode
        episode_reward, episode_steps = run_train_episode(agent, env, rpm, episode_num)
        total_steps += episode_steps
        if episode_num % save_freq == 0:
            avg_reward = run_evaluate_episodes(agent, eval_env, EVAL_EPISODES)
            if best_award < avg_reward:
                best_award = avg_reward
                print(f'Saving best model!')
                agent.save(f"./models/{file_name}.ckpt")


do_train(agent, env, rpm)
```

Training takes a long time, so please be patient. The initial capital is set to 100,000; you can read the returns off the logs. Overall the returns are positive, i.e. the final worth exceeds 100,000.
6. Trading Test
The trading test loads the best model and sets the maximum number of executed steps, max_action_step, after which the average return can be inspected.
```python
def run_test_episodes(agent, env, eval_episodes, max_action_step=200):
    avg_reward = 0.
    avg_worth = 0.
    for _ in range(eval_episodes):
        obs = env.reset()
        env.seed(0)
        done = False
        t = 0
        while not done:
            action = agent.predict(obs)
            obs, reward, done, info = env.step(action)
            avg_reward += reward
            t += 1
            if t == max_action_step:
                # eval_env.render()
                print('over')
                break
        avg_worth += info['profit']
    avg_reward /= eval_episodes
    avg_worth /= eval_episodes
    print(f'Evaluator: The average reward is {avg_reward:.3f} over {eval_episodes} episodes.')
    print(f'Evaluator: The average worth is {avg_worth:.3f} over {eval_episodes} episodes.')
    return avg_reward


# Load the data
df = pd.read_csv('./stock/test_v1.csv')
# Build the environment from the dataset
env = StockTradingEnv(df)
agent.restore('models/sac_Stock_base.ckpt')
# Maximum number of trading days to execute; each step is one day
max_action_step = 400
avg_reward = run_test_episodes(agent, env, EVAL_EPISODES, max_action_step)
```

```
Evaluator: The average reward is 75.724 over 5 episodes.
Evaluator: The average worth is 210542.472 over 5 episodes.
```

7. Online Deployment
For online deployment, first export the reinforcement learning model, then wrap it as a serving service, and finally integrate it into a quantitative trading system; at that point you can try it out and watch the returns.
7.1 Converting to a Static Graph
Use the parl library's save_inference_model interface to convert the actor network part of the model into a static graph.
```python
save_inference_path = './output/inference_model'
input_shapes = [[None, env.observation_space.shape[0]]]
input_dtypes = ['float32']
agent.save_inference_model(save_inference_path, input_shapes, input_dtypes,
                           model.actor_model)
```

7.2 Static Graph Inference
After the conversion, load the static graph model for a quick test: feed it the state data of a given day, and it predicts the action to execute.
```python
from paddle import inference


class Predictor(object):
    def __init__(self,
                 model_dir,
                 device="gpu",
                 batch_size=32,
                 use_tensorrt=False,
                 precision="fp32",
                 cpu_threads=10,
                 enable_mkldnn=False):
        self.batch_size = batch_size
        model_file = model_dir + "/inference_model.pdmodel"
        params_file = model_dir + "/inference_model.pdiparams"
        if not os.path.exists(model_file):
            raise ValueError("not find model file path {}".format(model_file))
        if not os.path.exists(params_file):
            raise ValueError("not find params file path {}".format(params_file))
        config = paddle.inference.Config(model_file, params_file)

        if device == "gpu":
            # Set GPU configs accordingly,
            # such as initializing the GPU memory and enabling TensorRT
            config.enable_use_gpu(100, 0)
            precision_map = {
                "fp16": inference.PrecisionType.Half,
                "fp32": inference.PrecisionType.Float32,
                "int8": inference.PrecisionType.Int8
            }
            precision_mode = precision_map[precision]
            if use_tensorrt:
                config.enable_tensorrt_engine(
                    max_batch_size=batch_size,
                    min_subgraph_size=30,
                    precision_mode=precision_mode)
        elif device == "cpu":
            # Set CPU configs accordingly,
            # such as enable_mkldnn, set_cpu_math_library_num_threads
            config.disable_gpu()
            if enable_mkldnn:
                # Cache 10 different shapes for mkldnn to avoid memory leak
                config.set_mkldnn_cache_capacity(10)
                config.enable_mkldnn()
            config.set_cpu_math_library_num_threads(cpu_threads)
        elif device == "xpu":
            # Set XPU configs accordingly
            config.enable_xpu(100)

        config.switch_use_feed_fetch_ops(False)
        self.predictor = paddle.inference.create_predictor(config)
        self.input_handles = [
            self.predictor.get_input_handle(name)
            for name in self.predictor.get_input_names()
        ]
        # self.output_handle = self.predictor.get_output_handle(
        #     self.predictor.get_output_names()[0])
        self.output_handle = [
            self.predictor.get_output_handle(name)
            for name in self.predictor.get_output_names()
        ]
        # Reset the account variables to their initial values
        self.balance = INITIAL_ACCOUNT_BALANCE
        self.net_worth = INITIAL_ACCOUNT_BALANCE
        self.max_net_worth = INITIAL_ACCOUNT_BALANCE
        self.shares_held = 0
        self.cost_basis = 0
        self.total_shares_sold = 0
        self.total_sales_value = 0

    def predict(self, df):
        """Predicts the action for the first row of `df`.

        Args:
            df: a DataFrame of stock data in the same format as the training data.

        Returns:
            [action mean, action std] as numpy arrays.
        """
        obs = self.get_obs(df, 0)
        print(obs)
        self.input_handles[0].copy_from_cpu(obs.reshape(1, -1).astype('float32'))
        self.predictor.run()
        action = self.output_handle[0].copy_to_cpu()
        std = self.output_handle[1].copy_to_cpu()
        return [action, std]

    def get_obs(self, df, current_step):
        self.df = df
        d10 = self.df.loc[current_step, 'peTTM'] / 100
        d11 = self.df.loc[current_step, 'pbMRQ'] / 100
        d12 = self.df.loc[current_step, 'psTTM'] / 100
        if np.isnan(d10):  # some entries are 0.0; leaving them as NaN would raise an error
            d10 = d11 = d12 = 0.00000000e+00
        obs = np.array([
            self.df.loc[current_step, 'open'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'high'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'low'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'close'] / MAX_SHARE_PRICE,
            self.df.loc[current_step, 'volume'] / MAX_VOLUME,
            self.df.loc[current_step, 'amount'] / MAX_AMOUNT,
            self.df.loc[current_step, 'adjustflag'],
            self.df.loc[current_step, 'tradestatus'] / 1,
            self.df.loc[current_step, 'pctChg'] / 100,
            d10,
            d11,
            d12,
            self.df.loc[current_step, 'pcfNcfTTM'] / 100,
            self.balance / MAX_ACCOUNT_BALANCE,
            self.max_net_worth / MAX_ACCOUNT_BALANCE,
            self.net_worth / MAX_ACCOUNT_BALANCE,
            self.shares_held / MAX_NUM_SHARES,
            self.cost_basis / MAX_SHARE_PRICE,
            self.total_shares_sold / MAX_NUM_SHARES,
            self.total_sales_value / (MAX_NUM_SHARES * MAX_SHARE_PRICE),
        ])
        return obs


model_dir = 'output'
device = 'gpu'
predictor = Predictor(model_dir, device)
df = pd.read_csv('./stock/test_v1.csv')
act_out, act_std = predictor.predict(df)
# print(result)
action = (act_out[0] + 1.0) / 2.0
print(act_out)
print(action)
```

```
[1.92800000e-03 1.94600000e-03 1.91000000e-03 1.93800000e-03
 6.29069390e-02 6.06364959e-02 3.00000000e+00 1.00000000e+00
 1.03300000e-03 5.14297900e-02 5.57414000e-03 1.47343800e-02
 3.46801300e-02 4.65662078e-02 4.65662078e-02 4.65662078e-02
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
[[-0.16079684 -0.09829579]]
[0.4196016 0.4508521]
```

7.3 Paddle Serving Deployment
```python
import paddle_serving_client.io as serving_io

dirname = "output"                              # path to the model
model_filename = "inference_model.pdmodel"      # model file
params_filename = "inference_model.pdiparams"   # parameter file
server_path = "serving_server"                  # where the server config is saved
client_path = "serving_client"                  # where the client config is saved
feed_alias_names = None                         # aliases for the inputs
fetch_alias_names = 'mean_output,std_output'    # aliases for the outputs
show_proto = None                               # set True to print the proto
serving_io.inference_model_to_serving(
    dirname=dirname,
    serving_server=server_path,
    serving_client=client_path,
    model_filename=model_filename,
    params_filename=params_filename,
    show_proto=show_proto,
    feed_alias_names=feed_alias_names,
    fetch_alias_names=fetch_alias_names)
```

```
(dict_keys(['obs']), dict_keys(['linear_12.tmp_1', 'clip_0.tmp_0']))
```

Once this is done, you can start the server to deploy the service and access it from the client side. For details, see the example code: https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples/Pipeline/simple_web_service
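As an illustration, once the server is running (for example via `python3 -m paddle_serving_server.serve --model serving_server --port 9393`, in the style of the linked example), a client can query it. The sketch below is a hedged example rather than a tested deployment: it assumes the server runs locally on port 9393 and relies on the feed name `obs` and the fetch aliases `mean_output`/`std_output` produced by the conversion above.

```python
# Hypothetical client-side query; assumes the server was started locally, e.g.:
#   python3 -m paddle_serving_server.serve --model serving_server --port 9393
from paddle_serving_client import Client
import numpy as np

client = Client()
client.load_client_config('serving_client/serving_client_conf.prototxt')
client.connect(['127.0.0.1:9393'])

obs = np.random.rand(1, 20).astype('float32')  # one 20-dim normalized state vector
fetch_map = client.predict(feed={'obs': obs},
                           fetch=['mean_output', 'std_output'],
                           batch=True)
action = (fetch_map['mean_output'] + 1.0) / 2.0  # same rescaling as during training
print(action)
```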
7.4 Building the Quantitative Trading System
For building a quantitative trading system, see: https://github.com/vnpy/vnpy
VeighNa is an open-source, Python-based quantitative trading system development framework. With continuous contributions from the open-source community, it has grown step by step into a full-featured quantitative trading platform and has accumulated many users from financial institutions and related fields, including private funds, securities firms, and futures companies. Its main features are:
1. Rich interfaces: supports a large number of high-performance trading Gateway interfaces, covering futures, options, stocks, futures options, gold T+D, interbank fixed income, overseas markets, and more
2. Ready out of the box: ships with many mature quantitative trading strategy App modules; users can manage them through the GUI or run them in CLI script mode
3. Freely extensible: thanks to the event-driven engine at its core and Python's glue-language nature, users can quickly integrate new trading interfaces or develop strategy applications on top
4. Open-source platform: released under the open and flexible MIT license; all source code is available on Gitee and can be used freely in open-source or commercial projects, forever free of charge
[Note] This project walks through an application of the SAC algorithm from start to finish. In the same way, it is easy to implement several reinforcement learning algorithms and then combine their decisions to make the strategy more robust, as the sketch below shows.
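For example, once several agents with the same predict() interface have been trained (say SAC, TD3, and DDPG), their actions can be combined by simple averaging. This is a minimal sketch; the agent names in the usage comment are hypothetical.

```python
import numpy as np

def ensemble_predict(agents, obs):
    """Average the action vectors proposed by several trained agents."""
    actions = np.stack([agent.predict(obs) for agent in agents])
    return actions.mean(axis=0)  # a majority vote or worth-weighted mix also works

# Usage once the agents are trained (hypothetical names):
# action = ensemble_predict([sac_agent, td3_agent, ddpg_agent], obs)
```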
8. References
[1] [Collaborative Education Program] [Practice] Stock quantitative trading based on the DDPG algorithm. https://aistudio.baidu.com/aistudio/projectdetail/2221634
This article is a repost; original link: https://aistudio.baidu.com/aistudio/projectdetail/4275734?channelType=0&channel=0