Table-Based Q-Learning In Under 1KB

Introduction
Q-learning is an algorithm in which an agent interacts with its environment and collects rewards for taking desirable actions.
The simplest implementation of Q-learning is referred to as tabular or table-based Q-learning. There are tons of articles, tutorials, etc. already available on the web which describe Q-learning, so I won’t go into excruciating detail here. Instead, I want to show how efficiently table-based Q-learning can be done using tinymind. In this article, I will describe how tinymind implements Q-learning using C++ templates and fixed-point (Q-format) numbers, as well as go thru the example in the repo.
The Maze Problem
A common table-based Q-learning problem is to train a virtual mouse to find its way out of a maze to get the cheese (reward). Tinymind contains an example program which demonstrates how the Q-learning template library works.
In the example program, we define the maze:
/*Q-Learning unit test. Learn the best path out of a simple maze.
5 == Outside the maze
________________________________________________
| | |
| | |
| 0 | 1 / 5
| | |
|____________/ ________|__/ __________________|_______________________
| | | |
| | / |
| 4 | 3 | 2 |
| / | |
|__/ __________________|_______________________|_______________________|
5
The paths out of the maze:
0->4->5
0->4->3->1->5
1->5
1->3->4->5
2->3->1->5
2->3->4->5
3->1->5
3->4->5
4->5
4->3->1->5
*/
We define all of our types in a common header so that we can separate the maze learner code from the training and file management code. I have done this so that we can measure the amount of code and data required for the Q-learner alone. The common header defines the maze as well as the type required to hold states and actions:
// 6 rooms and 6 actions
#define NUMBER_OF_STATES 6
#define NUMBER_OF_ACTIONS 6

typedef uint8_t state_t;
typedef uint8_t action_t;
We train the mouse by dropping it into a randomly-selected room (or outside the maze where the cheese is). The mouse starts off by taking a random action from a list of available actions at each step. The mouse receives a reward only when he finds the cheese (i.e. makes it to position 5 outside the maze). If the mouse is dropped into position 5, he has to learn to stay there and not wander back into the maze.
Building The Example
Starting from cppnnml/examples/maze, I will create a directory to hold the executable file and build the example.
mkdir -p ~/maze
g++ -O3 -o ~/maze/maze maze.cpp mazelearner.cpp -I../../cpp
This builds the maze learner example program and places the executable file at ~/maze. We can now cd into the directory where the executable file was generated and run the example program.
cd ~/maze
./maze
When the program finishes running, you’ll see the last of the output messages, something like this:
take action 5
*** starting in state 3 ***
take action 4
take action 5
*** starting in state 2 ***
take action 3
take action 2
take action 3
take action 4
take action 5
*** starting in state 3 ***
take action 4
take action 5
*** starting in state 5 ***
take action 5
Your messages may be slightly different since we’re starting our mouse in a random room on every iteration. During example program execution, we save all mouse activity to files (maze_training.txt and maze_test.txt). Within the training file, the mouse takes random actions for the first 400 episodes, and then the randomness is decreased from 100% to 0% over another 100 episodes. To see the first few training iterations you can do this:
head maze_training.txt

You should see something like this:
1,3,4,0,4,5,4,5,
2,3,1,3,4,3,1,5,
5,5,
4,5,
1,5,
3,2,3,4,3,4,5,
0,4,0,4,0,4,0,4,5,
1,3,1,5,
5,4,0,4,3,1,3,1,5,
Again, your messages will look slightly different. The first number is the start state and every comma-separated value after that is the random movement of the mouse from room to room. Example: In the first line above we started in room 1, then moved to 3, then 4, then 0, then back to 4, then to 5. Since 5 is our goal state, we stopped. The reason this looks so erratic is that for the first 400 iterations of training we make a random decision from our possible actions. Once we get to state 5, we get our reward and stop.
During the test runs, we’ve decreased our randomness down to 0% and so we rely upon the Q-table to decide which action to take from the state our mouse is in.
Visualizing Training And Testing
I have included a Python script to plot the training and test data. If we plot the training data for start state == 2 (i.e. the mouse is dropped into room 2 at the beginning):
Each line on the graph represents an episode where we’ve randomly placed the mouse into room 2 at the start of the episode. You can see that in the worst-case run, we took 32 random moves to find the goal state (state 5). This is because at each step, we’re simply generating a random number to choose from the available actions (i.e. which room to move to next). If we use the script to plot the testing data for start state == 2:
You can see that once trained, the Q-learner has learned, thru random experimentation, to follow an optimal path to the goal state: 2->3->4->5.
What happens when we drop the virtual mouse outside of the maze where the cheese is? If we plot the training data:
The mouse is making random decisions during training and so wanders back into the maze in most episodes. After training:
Our virtual mouse has learned to stay put and get the reward.
Determining The Size Of The Q-Learner
We can determine how much code and data are taken up by the Q-learner by compiling just the machine learner code and using the size program:
g++ -c mazelearner.cpp -O3 -I../../cpp && mv mazelearner.o ~/maze/.
cd ~/maze
size mazelearner.o
The output you should see is:
   text    data     bss     dec     hex filename
    540       8     348     896     380 mazelearner.o
The total code + data footprint of the Q-learner is 896 bytes. This should allow a table-based Q-learning implementation to fit in any embedded system available today.
Conclusion
Table-based Q-learning can be done very efficiently using the capabilities provided within tinymind. We don’t need floating point or fancy interpreted programming languages. One can instantiate a Q-learner using C++ templates and fixed point numbers. Clone the repo and try the example for yourself!
Source: https://medium.com/swlh/table-based-q-learning-in-under-1kb-3cc0b5b54b43