NVIDIA AI Course: Getting Started with AI on Jetson Nano, Class Notes (4)
Notice
The original text comes from the NVIDIA AI course; this article only added a Chinese translation of it.
Contents
Image Regression
Classification Vs. Regression
Continuous Outputs
Changing The Final Layer
Evaluation
Face XY Project
Interactive Tool Startup Steps
More Regression Projects
Image Regression
Classification Vs. Regression
Unlike Image Classification applications, which map image inputs to discrete outputs (classes), the Image Regression task maps the image input pixels to continuous outputs.
Continuous Outputs
In the course regression project, those continuous outputs happen to define the X and Y coordinates of various features on a face, such as the nose. Mapping an image stream to a location for tracking can be used in other applications, such as following a line in mobile robotics. Tracking isn't the only thing a Regression model can do, though. The output values could be something quite different, such as steering values or camera movement parameters.
Changing The Final Layer
The final layer of the pre-trained ResNet-18 network is a fully connected (fc) layer that has 512 inputs mapped to 1000 output classes, or (512, 1000). Using transfer learning in the Image Classification projects, that last layer was changed to only a few classes, depending on the application. For example, if 3 classes are to be trained, we change the fc layer to (512, 3). The modified network then ends in a fully connected layer with 512 inputs mapped to 3 classes.
In the case of a Regression project predicting coordinates, we want two values for each category: the X and Y values. That means twice as many outputs are required in the fc layer. For example, if there are 3 facial features (nose, left_eye, right_eye), each with both an X and a Y output, then 6 outputs are required, or (512, 6) for the fc layer.
In classification, recall that the softmax function was used to build a probability distribution from the output values. For regression, we keep the raw output values and skip softmax, because the model is trained to produce actual X and Y values, not probabilities.
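To make the softmax point concrete, here is a minimal sketch in plain Python (not the course's code). The `raw` list holds made-up fc outputs for the three face categories, and `softmax` is written out by hand purely for illustration. Softmax rescales the six numbers into positive values that sum to 1, so they can no longer be read as X and Y coordinates.

```python
import math

def softmax(values):
    """Turn raw scores into a probability distribution (used for classification)."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

categories = ["nose", "left_eye", "right_eye"]

# Hypothetical raw outputs of a (512, 6) fc layer: one (x, y) pair per category.
raw = [0.10, -0.25, -0.30, 0.40, 0.35, 0.42]

# What softmax would do to them: all positive, summing to 1.
probs = softmax(raw)

print("raw outputs :", raw)
print("after softmax:", [round(p, 3) for p in probs])
```

The raw list keeps its negative values and its scale, which is what a coordinate needs; the softmax result only preserves the relative ordering of the scores.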
Evaluation
Classification and Regression also differ in the way they are evaluated. The discrete values of classification can be evaluated based on accuracy, i.e. the percentage of "right" answers. In the case of regression, we are interested in getting as close as possible to a correct answer. Therefore, the root mean squared error can be used.
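The two metrics can be compared with a small sketch in plain Python. The labels and points below are invented for illustration, and computing the error over every X and Y value is one reasonable choice, not necessarily how the course tool reports it.

```python
import math

def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that exactly match the true class."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

def rmse(predicted_points, true_points):
    """Root mean squared error over every X and Y value."""
    squared_errors = []
    for (px, py), (tx, ty) in zip(predicted_points, true_points):
        squared_errors.append((px - tx) ** 2)
        squared_errors.append((py - ty) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Classification: each answer is either right or wrong.
acc = accuracy(["nose", "nose", "left_eye", "right_eye"],
               ["nose", "left_eye", "left_eye", "right_eye"])

# Regression: how far off are the predicted coordinates, in pixels?
err = rmse([(10, 20), (30, 40)], [(13, 24), (30, 40)])

print(acc)  # 0.75
print(err)  # 2.5
```

A prediction that misses by a few pixels counts as simply "wrong" under accuracy, while RMSE reports how close it was.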
Face XY Project
The goal of this project is to build an Image Regression model that can predict the X and Y coordinates of a facial feature in a live image.
Interactive Tool Startup Steps
You will implement the project by collecting your own data using a clickable image display tool, training a model to find the XY coordinates of the feature, and then testing and updating your model as needed using images from the live camera. Since you are collecting two values for each category, the model may require more training and data to get a satisfactory result.
Be patient! Building your model is an iterative process.
Step 1: Open The Notebook
To get started, navigate to the regression folder in your JupyterLab interface and double-click the regression_interactive.ipynb notebook to open it.
Step 2: Execute All Of The Code Blocks
The notebook is designed to be reusable for any XY regression task you wish to build. Step through the code blocks and execute them one at a time.
The camera block sets the size of the images and starts the camera. If your camera is already active in this notebook or in another notebook, first shut down the kernel in the active notebook before running this code cell. Make sure that the correct camera type is selected for execution (USB or CSI). This cell may take several seconds to execute.
The next block is where you define your TASK and CATEGORIES parameters, as well as how many datasets you want to track. For the Face XY Project, this has already been defined for you as the face task with categories of nose, left_eye, and right_eye. Each category for the XY regression tool requires both an X and a Y value. Go ahead and execute the cell. Subdirectories for each category are created to store the example images you collect. The file names of the images will contain the XY coordinates that you tag the images with during the data collection step. This cell should only take a few seconds to execute.
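As a rough illustration of how coordinates can live in a file name, here is a sketch using only the standard library. The `x_y_<uuid>.jpg` pattern, the helper names, and the temporary directory are assumptions made for this example; the notebook's actual naming scheme may differ.

```python
import os
import tempfile
import uuid

TASK = "face"
CATEGORIES = ["nose", "left_eye", "right_eye"]

def make_category_dirs(root, task, categories):
    """Create one subdirectory per category and return their paths."""
    paths = {}
    for category in categories:
        path = os.path.join(root, task, category)
        os.makedirs(path, exist_ok=True)
        paths[category] = path
    return paths

def encode_filename(x, y):
    """Hypothetical naming scheme: '<x>_<y>_<unique id>.jpg'."""
    return "%d_%d_%s.jpg" % (x, y, uuid.uuid1())

def decode_xy(path):
    """Recover the clicked (x, y) from a file name built by encode_filename."""
    x_text, y_text, _ = os.path.basename(path).split("_", 2)
    return int(x_text), int(y_text)

with tempfile.TemporaryDirectory() as root:
    dirs = make_category_dirs(root, TASK, CATEGORIES)
    name = encode_filename(112, 87)          # a click at pixel (112, 87)
    xy = decode_xy(os.path.join(dirs["nose"], name))

print(xy)  # (112, 87)
```

Storing the label in the file name keeps each image and its annotation together, so no separate label file has to stay in sync with the images.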
You'll collect images for your categories with a special clickable image widget set up in this cell. As you click the "nose" or "eye" in the live feed image, the data image filename is automatically annotated and saved using the X and Y coordinates from the click.
The model is set to the same pre-trained ResNet18 model for this project:
import torchvision
model = torchvision.models.resnet18(pretrained=True)  # ResNet-18 with pre-trained weights
For more information on available PyTorch pre-trained models, see the PyTorch documentation. In addition to choosing the model, the last layer of the model is modified to output only the number of values that we are training for. In the case of the Face XY Project, that is twice the number of categories, since each requires both X and Y coordinates (i.e. nose X, nose Y, left_eye X, left_eye Y, right_eye X, and right_eye Y).
import torch
# Two outputs (X and Y) per category; the ResNet-18 fc layer has 512 inputs.
output_dim = 2 * len(dataset.categories)
model.fc = torch.nn.Linear(512, output_dim)
This code cell may take several seconds to execute.
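Once the layer has `2 * len(categories)` outputs, each category's coordinates have to be read back out of the flat output vector. The sketch below assumes the outputs are laid out as consecutive (X, Y) pairs in category order; that layout fits the sizing above but is an assumption, and `xy_for_category` and the sample values are hypothetical.

```python
CATEGORIES = ["nose", "left_eye", "right_eye"]
output_dim = 2 * len(CATEGORIES)  # 6 outputs for 3 categories

def xy_for_category(outputs, categories, category):
    """Pick the (x, y) pair that belongs to one category."""
    i = categories.index(category)
    return outputs[2 * i], outputs[2 * i + 1]

# Made-up model outputs: [nose X, nose Y, left_eye X, left_eye Y, right_eye X, right_eye Y]
outputs = [0.10, -0.25, -0.30, 0.40, 0.35, 0.42]

left_eye_xy = xy_for_category(outputs, CATEGORIES, "left_eye")
print(left_eye_xy)  # (-0.3, 0.4)
```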
This code block sets up threading to run the model in the background so that you can view the live camera feed and visualize the model performance in real time. This cell should only take a few seconds to execute. For this project, a blue circle will overlay the model's prediction of the location of the selected feature.
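The background-thread idea can be sketched with the standard `threading` module. Everything here is illustrative: `LivePredictor` is a hypothetical helper, the camera and model are replaced by stand-in lambdas, and the assumption that predictions are normalized to [-1, 1] before being mapped onto a 224x224 image is not stated in the text.

```python
import threading

class LivePredictor:
    """Hypothetical helper: runs predict(get_frame()) in a background thread."""

    def __init__(self, get_frame, predict, interval=0.01):
        self._get_frame = get_frame
        self._predict = predict
        self._interval = interval
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._first = threading.Event()
        self._latest = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            result = self._predict(self._get_frame())
            with self._lock:
                self._latest = result
            self._first.set()
            self._stop.wait(self._interval)  # sleep, but wake early on stop()

    def start(self):
        self._thread.start()

    def wait_for_first(self, timeout=None):
        """Block until at least one prediction is available."""
        return self._first.wait(timeout)

    def latest(self):
        with self._lock:
            return self._latest

    def stop(self):
        self._stop.set()
        self._thread.join()

def to_pixels(x_norm, y_norm, width, height):
    """Map coordinates in [-1, 1] (an assumed convention) to pixel positions."""
    return int(width * (x_norm / 2.0 + 0.5)), int(height * (y_norm / 2.0 + 0.5))

# Stand-ins for the camera and the trained model.
predictor = LivePredictor(get_frame=lambda: None, predict=lambda frame: (0.25, -0.5))
predictor.start()
got_result = predictor.wait_for_first(timeout=2.0)
predictor.stop()

# Where the overlay circle would be drawn on a 224x224 image.
circle_center = to_pixels(*predictor.latest(), width=224, height=224)
print(circle_center)  # (140, 56)
```

Running inference off the main thread is what lets the notebook widgets stay responsive while the prediction overlay keeps updating.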
The training code cell sets the hyper-parameters for the model training (number of epochs, batch size, learning rate, momentum) and loads the images for training or evaluation. The regression version is very similar to the simple classification training, though the loss is calculated differently. The mean squared error over the X and Y value errors is calculated and used as the loss for backpropagation in training to improve the model. This code cell may take several seconds to execute.
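The loss described above can be written out in a few lines of plain Python. This is a sketch, not the notebook's training code: it assumes each training image is labeled for a single category, so only that category's (X, Y) pair contributes to the loss, and all values are made up.

```python
def xy_mse_loss(batch_outputs, batch_category_idx, batch_targets):
    """Mean squared error over the X and Y errors, averaged over the batch."""
    total = 0.0
    for outputs, cat_idx, (tx, ty) in zip(batch_outputs, batch_category_idx, batch_targets):
        # Outputs are assumed to be (x, y) pairs in category order.
        px, py = outputs[2 * cat_idx], outputs[2 * cat_idx + 1]
        total += ((px - tx) ** 2 + (py - ty) ** 2) / 2.0
    return total / len(batch_outputs)

# Two samples: the first labeled for category 0 (nose), the second for category 2 (right_eye).
batch_outputs = [
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
batch_category_idx = [0, 2]
batch_targets = [(0.0, 0.0), (0.0, 0.0)]

loss = xy_mse_loss(batch_outputs, batch_category_idx, batch_targets)
print(loss)  # 0.3125
```

In a real training loop, the same quantity would be computed on tensors so that it can be backpropagated; the arithmetic is the same.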
This is the last code cell. All that's left to do is pack all the widgets into one comprehensive tool and display it. This cell may take several seconds to run and should display the full tool for you to work with. There are three image windows. Initially, only the left camera feed is populated. The middle window will display the most recent annotated snapshot image once you start collecting data. The right-most window will display the live prediction view once the model has been trained.
Step 3: Collect Data, Train, Test
Position the camera in front of your face and collect initial data. With the mouse cursor, point to the target feature that matches the category you've selected (such as the nose). Click to collect data. The annotated snapshot you just collected will appear in the middle display box. As you collect each image, vary your head position and pose:
- Add 20 images of your left eye with the left_eye category selected.
Step 4: Improve Your Model
Use the live inference as a guide to improve your model! The live feed shows the model's prediction. As you move your head, does the target circle correctly follow your nose (or left_eye, right_eye)? If not, then click the correct location and add data. After you've added some data for a new scenario, train the model some more. For example:
- Move the camera so that the face is closer. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
- Move the camera to provide a different background. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
- Are there any other scenarios in which you think the model might not perform well? Try them out!
- Can you get a friend to try your model? Does it work the same? You know the drill: more data and training!
Step 5: Save Your Model
When you are satisfied with your model, save it by entering a name in the "model path" box and clicking "save model".
More Regression Projects
To build another project, follow the pattern you used for the Face XY Project. Save your previous work, modify the TASK and CATEGORIES values, shut down and restart the notebook, and run all the cells. Then collect, train, and test!