Option Pricing Using Reinforcement Learning
This post demonstrates how to use reinforcement learning to price an American option. An option is a derivative contract that gives its owner the right, but not the obligation, to buy or sell an underlying asset. Unlike its European-style counterpart, an American-style option may be exercised at any time before expiry.
The American option is known to be an optimal control MDP (Markov Decision Process) problem in which the underlying process is Geometric Brownian motion ([1]). The Markovian state is a price-time tuple, and the control is a binary action that decides each day whether or not to exercise the option.
The optimal stopping policy looks like the figure below, where the x-axis is time and the y-axis is the stock price. The curve in red is commonly called the optimal exercise boundary. On each day, if the stock price falls in the exercise region, located above the boundary for a call or below the boundary for a put, it is optimal to exercise the option and get paid the difference between the stock price and the strike price.
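In dynamic-programming terms (my own notation, not taken from [1]), the value of the American put with strike K satisfies the Bellman recursion

V(t, S) = \max\!\left( (K - S)^{+},\; e^{-r\,\Delta t}\, \mathbb{E}\!\left[ V(t+\Delta t,\, S_{t+\Delta t}) \,\middle|\, S_t = S \right] \right),

with terminal condition V(T, S) = (K - S)^{+}: exercising pays the intrinsic value, while holding earns the discounted risk-neutral expectation of tomorrow's value. Q-learning estimates exactly these two action values at each price-time state.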
One can imagine it as a discretized Q-table, as illustrated by the dotted grids. Every day the agent, or the trader, looks up the table and takes an action according to today's price. The Q-table is monotonic in that all the grids on the exercise side of the boundary yield a go decision and all the grids on the other side yield a no-go decision. Therefore Q-learning is well suited to finding the optimal strategy defined by this boundary.
The remainder contains three sections. In the first section, a baseline price is computed using classical models. In the second section, an OpenAI gym environment is constructed, similar to building an Atari game. In the third section, an agent is trained with DQN (Deep Q-Network) to play American options, similar to training computers to play Atari games. The full Python notebook is located here on Github.
Section One — Baseline
There are many ways to price an American option, from binomial trees to the Longstaff-Schwartz Monte Carlo method. Here I use the QuantLib package to price a one-year American put option.
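The pricing snippet below relies on bsm_process, european_option, and american_option objects that are constructed elsewhere in the notebook. A minimal sketch of that setup is shown here; the evaluation date, spot, strike, rate, and volatility are placeholders, so they will not exactly reproduce the prices printed below:

import QuantLib as ql

# placeholder evaluation date and market data, not necessarily the notebook's values
today = ql.Date(15, 6, 2020)
ql.Settings.instance().evaluationDate = today
maturity = today + ql.Period(1, ql.Years)
spot, strike, r, sigma = 100.0, 100.0, 0.03, 0.20

# one payoff, two exercise styles
payoff = ql.PlainVanillaPayoff(ql.Option.Put, strike)
european_option = ql.VanillaOption(payoff, ql.EuropeanExercise(maturity))
american_option = ql.VanillaOption(payoff, ql.AmericanExercise(today, maturity))

# flat curves and constant volatility feeding the Black-Scholes-Merton process
day_count = ql.Actual365Fixed()
calendar = ql.TARGET()
spot_handle = ql.QuoteHandle(ql.SimpleQuote(spot))
rate_ts = ql.YieldTermStructureHandle(ql.FlatForward(today, r, day_count))
div_ts = ql.YieldTermStructureHandle(ql.FlatForward(today, 0.0, day_count))
vol_ts = ql.BlackVolTermStructureHandle(ql.BlackConstantVol(today, calendar, sigma, day_count))

bsm_process = ql.BlackScholesMertonProcess(spot_handle, div_ts, rate_ts, vol_ts)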
pricing_dict = {}

bsm73 = ql.AnalyticEuropeanEngine(bsm_process)
european_option.setPricingEngine(bsm73)
pricing_dict['BlackScholesEuropean'] = european_option.NPV()

analytical_engine = ql.BaroneAdesiWhaleyEngine(bsm_process)
american_option.setPricingEngine(analytical_engine)
pricing_dict['BawApproximation'] = american_option.NPV()

binomial_engine = ql.BinomialVanillaEngine(bsm_process, "crr", 100)
american_option.setPricingEngine(binomial_engine)
pricing_dict['BinomialTree'] = american_option.NPV()

print(pricing_dict)

{'BlackScholesEuropean': 6.92786901829998, 'BawApproximation': 7.091254636695334, 'BinomialTree': 7.090924645858217}

The last line is the output, which says this American option is worth $7.091, while its European counterpart is worth $6.928. This implies an early exercise premium of $0.163.
Section Two — OpenAI Gym Environment
It is standard to derive from the OpenAI gym environment class. This makes our work extendable to further studies such as exotic options and stochastic volatilities. The underlying theory is the famous Black-Scholes framework, and the underlying asset follows Geometric Brownian motion in the risk-neutral world. This is realized in the step function below:
def step(self, action):
    if action == 1:  # exercise
        reward = max(self.K - self.S1, 0.0) * np.exp(-self.r * self.T * (self.day_step / self.N))
        done = True
    else:  # hold
        if self.day_step == self.N:  # at maturity
            reward = max(self.K - self.S1, 0.0) * np.exp(-self.r * self.T)
            done = True
        else:  # move to tomorrow
            reward = 0
            # lnS1 - lnS0 = (r - 0.5*sigma^2)*t + sigma * Wt
            self.S1 = self.S1 * np.exp((self.r - 0.5 * self.sigma**2) * (self.T / self.N)
                                       + self.sigma * np.sqrt(self.T / self.N) * np.random.normal())
            self.day_step += 1
            done = False
    tao = 1.0 - self.day_step / self.N  # time to maturity, in units of years
    return np.array([self.S1, tao]), reward, done, {}
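The step function above references several attributes (self.K, self.r, self.sigma, self.T, self.N, self.S1, self.day_step) that are set up elsewhere in the class. A minimal sketch of a matching class skeleton with a constructor and reset method; the default parameter values are assumptions, not necessarily those used in the notebook:

import numpy as np
import gym
from gym import spaces

class AmeriOptionEnv(gym.Env):
    # sketch of the class skeleton implied by the step function above
    def __init__(self, S0=100.0, K=100.0, r=0.03, sigma=0.2, T=1.0, N=365):
        self.S0, self.K, self.r, self.sigma, self.T, self.N = S0, K, r, sigma, T, N
        self.action_space = spaces.Discrete(2)  # 0 = hold, 1 = exercise
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0]), high=np.array([np.inf, 1.0]), dtype=np.float64)
        self.reset()

    def reset(self):
        self.S1 = self.S0   # current simulated stock price
        self.day_step = 0   # daily steps taken so far
        return np.array([self.S1, 1.0])  # state = (price, time to maturity)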
AmeriOptionEnv takes action 0 as hold (do not exercise) and action 1 as exercise. If we stick to the no-exercise policy until expiry, this essentially becomes a stock price simulation and serves as input to price the European option as a control variate.
import matplotlib.pyplot as plt

env = AmeriOptionEnv()
s = env.reset()
sim_prices = []
sim_prices.append(s[0])

for i in range(365):
    action = 0  # hold until expiry
    s_next, reward, done, info = env.step(action)
    sim_prices.append(s_next[0])

plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.plot(sim_prices)

[Figure: Stock Price Gym Simulation]
Section Three — Pricing with DQN
Once the gym environment is constructed, we are ready to price the American option using reinforcement learning, specifically DQN (Deep Q-Network) in this post. Here I use the TensorFlow TF-Agents library. Alternatives are other OpenAI Gym compatible libraries such as PyTorch and OpenAI Baselines.
The code follows the TF-Agents API documentation closely. The only changes I made are importing the customized AmeriOption environment and adjusting hyper-parameters that are more pertinent to the one-year option than to the Cartpole game.
As labelled in the Jupyter notebook, the RL model is constructed step by step: wrap the gym environment for TF-Agents, build a Q-network, define the DQN agent, collect experience into a replay buffer, and train.
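The agent definition below references train_env, q_net, optimizer, and train_step_counter. A sketch of how these are typically set up, following the standard TF-Agents DQN tutorial; the layer width and learning rate here are assumptions rather than the notebook's exact values:

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

# wrap the custom gym environment into TF-Agents training and evaluation environments
train_env = tf_py_environment.TFPyEnvironment(suite_gym.wrap_env(AmeriOptionEnv()))
eval_env = tf_py_environment.TFPyEnvironment(suite_gym.wrap_env(AmeriOptionEnv()))

# Q-network mapping the (price, time-to-maturity) state to the two action values
q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(100,))

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3)
train_step_counter = tf.Variable(0)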
According to the API, a TF-agent is defined as
agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)

agent.initialize()
so that it is aware of the environment states, the action space, the deep neural network for policy evaluation, and an optimizer on the temporal-difference loss function to do TD-optimization.
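For completeness, a sketch of the experience collection and training loop in the same tutorial style; the replay-buffer size, batch size, and iteration counts are assumptions, not the values used to produce the results below:

from tf_agents.drivers import dynamic_step_driver
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# replay buffer storing transitions produced by the collect policy
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=100_000)

# driver that steps the environment with the exploration policy and records transitions
collect_driver = dynamic_step_driver.DynamicStepDriver(
    train_env,
    agent.collect_policy,
    observers=[replay_buffer.add_batch],
    num_steps=1)

for _ in range(1_000):  # seed the buffer before training starts
    collect_driver.run()

dataset = replay_buffer.as_dataset(
    num_parallel_calls=3, sample_batch_size=64, num_steps=2).prefetch(3)
iterator = iter(dataset)

for _ in range(20_000):
    collect_driver.run()                        # collect one more environment step
    experience, _ = next(iterator)              # sample a mini-batch of transitions
    train_loss = agent.train(experience).loss   # one gradient step on the TD loss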
[Figure: Policy Training Performance]

The training performance is shown above. It is rather noisy because the evaluation step uses only 10 simulation paths and is subject to Monte Carlo randomness. For example, we know the option price is around $7, yet the average price can go as high as $12. Therefore, after learning the optimal stopping policy, it is essential to run a full-blown Monte Carlo simulation to find the actual price, as below.
import pandas as pd

npv = compute_avg_return(eval_env, agent.policy, num_episodes=2_000)
pricing_dict['ReinforcementAgent'] = npv

pricing_df = pd.DataFrame.from_dict(pricing_dict, orient='index')
pricing_df.columns = ['Price']
print(pricing_df)
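The compute_avg_return helper is not shown in this excerpt. A sketch consistent with how it is used here, in the spirit of the TF-Agents tutorial (rewards are already discounted to time zero inside the environment, so the episode return is the discounted payoff):

def compute_avg_return(environment, policy, num_episodes=10):
    # average the total episode reward, i.e. the discounted option payoff,
    # over a number of independently simulated price paths
    total_return = 0.0
    for _ in range(num_episodes):
        time_step = environment.reset()
        episode_return = 0.0
        while not time_step.is_last():
            action_step = policy.action(time_step)
            time_step = environment.step(action_step.action)
            episode_return += time_step.reward
        total_return += episode_return
    avg_return = total_return / num_episodes
    return avg_return.numpy()[0]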
The reinforcement learning agent values the option at $7.057, implying an early exercise premium of $0.129. This result is in line with the classical baseline models.

Conclusion
In this post, we prepare a gym environment and then train a DQN TF-Agent to price an American option. The result is encouraging, with a reasonably good price that is in line with classical baseline models. Some improvements include:
For practitioners,
Use a mirror AmeriOption gym environment to provide antithetic variates (a toy sketch of the idea follows the practitioner items below).
In the compute_avg_return function, continue the simulation path to expiry to price the European option as a control variate.
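As a toy illustration of the antithetic-variates idea in the first point, shown directly on simulated GBM payoffs rather than through a mirror gym environment; all parameter values are placeholders:

import numpy as np

# Toy sketch (not from the notebook): drive two GBM paths with z and -z and
# average their discounted hold-to-expiry payoffs, which reduces Monte Carlo variance.
S0, K, r, sigma, T, N = 100.0, 100.0, 0.03, 0.2, 1.0, 365  # placeholder parameters
dt = T / N

def discounted_put_payoff(z):
    # simulate one GBM path from a vector of standard normal draws
    log_returns = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    S_terminal = S0 * np.exp(np.sum(log_returns))
    return np.exp(-r * T) * max(K - S_terminal, 0.0)

np.random.seed(0)
pair_means = []
for _ in range(5_000):
    z = np.random.normal(size=N)
    pair_means.append(0.5 * (discounted_put_payoff(z) + discounted_put_payoff(-z)))

# price estimate and its standard error
print(np.mean(pair_means), np.std(pair_means) / np.sqrt(len(pair_means)))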
For researchers,
Add another MDP process to capture stochastic volatility.
Instead of using the default network structure, design a specialized multi-layer network to enable transfer learning into other maturities as well as options on rates, futures, FX, and exotic products.
Source: https://medium.com/swlh/option-pricing-using-reinforcement-learning-ad2ddca7735b