Realtime Multi-Person 2D Pose Estimation Using TensorFlow 2.x
Introduction
As described by Zhe Cao in his 2017 paper, realtime multi-person 2D pose estimation is crucial in enabling machines to understand people in images and videos.
However, what is Pose Estimation?
As the name suggests, it is a technique used to estimate how a person is physically positioned, such as standing, sitting, or lying down. One way to obtain this estimate is to find the 18 “joints of the body” or as named in the Artificial Intelligence field: “Key Points.” The images below show our goal, which is to find these points in an image:
PhysicsWorld — Einstein in Oxford (1933)
The key points go from point #0 (top neck), going down through the body joints and returning to the head, ending with point #17 (right ear).
The first significant work using an Artificial Intelligence-based approach was DeepPose, a 2014 paper by Toshev and Szegedy from Google. The paper proposed a human pose estimation method based on Deep Neural Networks (DNNs), where the pose estimation was formulated as a DNN-based regression problem towards body joints.
The model consisted of an AlexNet backend (7 layers) with an extra final layer that outputs 2k joint coordinates. The significant problem with this approach is that, first, a single person must be detected (classic object detection), followed by the model application. So, each human body found in an image must be treated separately, which considerably increases the time to process the image. This type of approach is known as "top-down" because it first finds the bodies and, from them, the joints associated with each.
Challenges with Pose Estimation
There are several problems related to Pose Estimation, such as an unknown number of people appearing in an image at any position or scale, interactions between people (contact and occlusion) that make associating parts difficult, and runtime that tends to grow with the number of people.
To solve those problems, a more exciting approach (the one used in this project) is OpenPose, which was introduced in 2016 by Zhe Cao and his colleagues from the Robotics Institute at Carnegie Mellon University.
OpenPose
The method proposed by OpenPose uses a nonparametric representation, referred to as Part Affinity Fields (PAFs), to "connect" each body joint found in an image, associating them with individual people. In other words, OpenPose does the opposite of DeepPose: it first finds all the joints in an image and then goes "up", looking for the most probable body that contains each joint, without using any person detector (a "bottom-up" approach). OpenPose finds the key points in an image regardless of the number of people in it. The image below, retrieved from the OpenPose presentation at the ILSVRC and COCO workshop 2016, gives us an idea of the process.
OpenPose presentation at ILSVRC and COCO workshop 2016
The image below shows the architecture of the two-branch multi-stage CNN model used for training. First, a feed-forward network simultaneously predicts a set of 2D confidence maps (S) of body part locations (the keypoint annotations from dataset/COCO/annotations/) and a set of 2D vector fields of part affinities (L), which encode the degree of association between parts. After each stage, the two branches' predictions, along with the image features, are concatenated for the next stage. Finally, the confidence maps and the affinity fields are parsed by greedy inference to output the 2D keypoints for all people in the image.
2017 OpenPose Paper
During the execution of the project, we will return to some of those concepts for clarification. However, it is highly recommended to follow the OpenPose ILSVRC and COCO workshop 2016 presentation and the video recording at CVPR 2017 for a better understanding.
TensorFlow 2 OpenPose Installation (tf-pose-estimation)
The original OpenPose was developed using a VGG pre-trained network as its backbone and the Caffe framework. However, for this installation, we will follow Ildoo Kim's TensorFlow approach, as detailed in his tf-pose-estimation GitHub.
What is tf-pose-estimation?
tf-pose-estimation is the "OpenPose" human pose estimation algorithm implemented using TensorFlow. It also provides several variants that make some changes to the network structure for realtime processing on the CPU or on low-power embedded devices.
The tf-pose-estimation GitHub shows several experiments with different models, such as:
cmu: the VGG pretrained network described in the original paper, with the weights in Caffe format converted to be used in TensorFlow.
dsconv: same architecture as the cmu version except for the depthwise separable convolution of mobilenet.
mobilenet: based on the mobilenet V1 paper, 12 convolutional layers are used as feature-extraction layers.
mobilenet v2: similar to mobilenet, but using an improved version of it.
The studies in this article were done with mobilenet V1 ("mobilenet_thin"), which has an intermediate performance regarding computation budget and latency:
https://github.com/ildoonet/tf-pose-estimation/blob/master/etcs/experiments.md
Part 1 — Installing tf-pose-estimation
Here we follow the excellent Gunjan Seth article, Pose Estimation with TensorFlow 2.0.
- Go to the terminal and create a working directory (for example, "Pose_Estimation"), moving into it:
mkdir Pose_Estimation
cd Pose_Estimation
- Create a Virtual Environment (for example, Tf2_Py37):
conda create --name Tf2_Py37 python=3.7 -y   # create the environment first (Python version implied by the name)
conda activate Tf2_Py37
- Install TF2:
pip install tensorflow
- Install basic packages to be used during development:
conda install -c conda-forge matplotlib
conda install -c conda-forge opencv
- Clone the tf-pose-estimation repository (the clone command is sketched below):
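A minimal clone command, assuming the repository linked above (if you follow the Gunjan Seth article, substitute the fork URL it points to):
git clone https://github.com/ildoonet/tf-pose-estimation.git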
- Go to the tf-pose-estimation folder and install the requirements:
cd tf-pose-estimation
pip install -r requirements.txt
In the next step, install SWIG, an interface compiler that connects programs written in C and C++ with scripting languages such as Python. It works by taking the declarations found in C/C++ header files and using them to generate the wrapper code that scripting languages need to access the underlying C/C++ code.
conda install swig
- Using SWIG, build the C++ library for post-processing (from inside the tf_pose/pafprocess sub-directory):
cd tf_pose/pafprocess
swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace
Now, install tf-slim library, a lightweight library used for defining, training, and evaluating complex models in TensorFlow.
pip install git+https://github.com/adrianc-a/tf-slim.git@remove_contrib
That is it! Now, it is essential to run a quick test. For that, return to the main tf-pose-estimation directory.
If you followed the sequence, you should be inside tf_pose/pafprocess. Otherwise, use the appropriate command to change directories.
cd ../..
Inside the tf-pose-estimation directory there is a Python script, run.py. Let us run it with the following parameters (the full command appears after the list):
- model=mobilenet_thin
- resize=432x368 (size of the image at pre-processing)
- image=./images/ski.jpg (sample image inside the images directory)
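Putting those parameters together (following the same flag style as the webcam command shown later in this article), the test command should look like this:
python run.py --model=mobilenet_thin --resize=432x368 --image=./images/ski.jpg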
Note that for a few seconds nothing will happen, but after a minute or so, the terminal should present something similar to the image below:
However, more importantly, an image will appear in an independent OpenCV window:
Great! The images are proof that everything was properly installed and working fine! We will go into more detail in the next section. However, for a quick explanation of what the four images mean: the top-left ("Result") is the pose detection skeleton drawn with the original image (in this case, ski.jpg) as background. The top-right image is a "heat map", where the "parts detected" (S's) are shown, and both bottom images show the part associations (L's). The "Result" is the S's and L's connected to individual persons.
The next test is a live video:
If the computer has only one camera installed, use: camera=0
python run_webcam.py --model=mobilenet_thin --resize=432x368 --camera=1
If everything goes well, a window will appear with a real live video, like this screenshot:
Image source: screenshot from the author's webcam
Part 2 — Going Deeper with Pose Estimation in Images
In this section, we will go more in-depth with our TensorFlow Pose Estimation implementation. It is advised to follow along with the article, trying to reproduce the Jupyter Notebook 10_Pose_Estimation_Images, which can be downloaded from the project GitHub.
As a reference, this project was 100% developed on a MacPro (2.9 GHz Quad-Core i7, 16 GB 2133 MHz RAM).
Import Libraries
import sys
import time
import logging
import numpy as np
import matplotlib.pyplot as plt
import cv2
from tf_pose import common
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh
Model Definition and TfPose Estimator Creation
It is possible to use the models located in the model/graph sub-directory, such as mobilenet_v2_large or cmu (the VGG pretrained model).
For cmu, the *.pb files were not downloaded during installation because they are large. To use this model, run the bash script download.sh located in the cmu sub-directory, as sketched below.
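A minimal sketch of that step, assuming the sub-directory layout referenced above (adjust the path if your clone's layout differs):
cd model/graph/cmu
bash download.sh
cd ../../..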
This project uses mobilenet_thin (MobilenetV1), considering that all images used should be reshaped to 432x368.
Parameters:
model = 'mobilenet_thin'
resize = '432x368'
w, h = model_wh(resize)
Create estimator:
e = TfPoseEstimator(get_graph_path(model), target_size=(w, h))
Let us load a simple human image for easy analysis. OpenCV is used to read images. The images are stored as RGB, but internally OpenCV works with BGR. Using OpenCV to show an image is no problem, because the image is converted from BGR back to RGB before being presented in a specific window (as seen with ski.jpg in the previous section).
Since the image will be plotted in a Jupyter cell, Matplotlib will be used instead of OpenCV. Because of that, the image should be converted before display, as shown below:
image_path = './images/human.png'
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
plt.grid();
Observe that this image has a shape of 567x567. When reading an image, OpenCV automatically converts it to an array, where each value goes from 0 to 255 (0 = black and 255 = white per channel).
Once the image is an array, it is simple to verify its size, using shape:
image.shape
The result will be (567, 567, 3), where the shape is (height, width, color channels).
Although the image can be read using OpenCV, we will use the function read_imgfile(image_path) from the tf_pose.common library to prevent any trouble with color channels.
image = common.read_imgfile(image_path, None, None)
Once we have the image as an array, we can apply the inference method of the estimator (e), with the image array as input (the image will be resized using the parameters w and h defined at the beginning).
humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=4.0)
After running the above command, let us inspect the array e.heatMat. This array has a shape of (184, 216, 19), where 184 is h/2, 216 is w/2, and 19 is related to the probability of that specific pixel belonging to one of the 18 joints (0 to 17) + one (18: none). For example, inspecting the top-left pixel, a "none" should be expected:
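A minimal check of that, assuming the estimator e and the array shape described above:
print(e.heatMat.shape)      # expected: (184, 216, 19)
print(e.heatMat[0][0])      # the 19 channel values for the top-left pixel
print(e.heatMat[0][0][18])  # the 'none' channel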
It is possible to verify the last value of this array (index 18, the "none" channel), which is the highest value of all; this can be understood as meaning that, with a 99.6% chance, this pixel does not belong to any of the 18 joints.
Let us try to find the base of the neck (the midpoint between the shoulders). It is located on the original picture around mid-width (0.5 * w = 108) and around 20% of the height, measured from the top (0.2 * h = 37). So, let us inspect this specific pixel:
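A short sketch of that inspection (the expected values are the ones quoted in the next paragraph):
pixel = e.heatMat[37][108]
print(pixel)                           # the 19 probabilities for this pixel
print(np.argmax(pixel), pixel.max())   # expected: index 1 (base neck), with a value around 0.7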
It is easy to see that position 1 has a maximum value of 0.7059… (or by calculating e.heatMat[37][108].max()), which means that this specific pixel has a 70% probability of being a "base neck." The figure below shows all 18 COCO keypoints (or "body joints"), where "1" corresponds to the "base neck".
COCO keypoint format for human pose skeletons.
It is possible to plot, for every pixel, a color representing its maximum value. Doing that, a heat map showing the key points will magically appear:
max_prob = np.amax(e.heatMat[:, :, :-1], axis=2)
plt.imshow(max_prob)
plt.grid();
Let us now plot the key points over the reshaped original image:
plt.figure(figsize=(15,8))
bgimg = cv2.cvtColor(image.astype(np.uint8), cv2.COLOR_BGR2RGB)
bgimg = cv2.resize(bgimg, (e.heatMat.shape[1], e.heatMat.shape[0]), interpolation=cv2.INTER_AREA)
plt.imshow(bgimg, alpha=0.5)
plt.imshow(max_prob, alpha=0.5)
plt.colorbar()
plt.grid();
So, it is possible to see the keypoints (S's) over the image; regarding the values shown on the colorbar, more yellow means higher probability.
To get the L's, the most probable connections (or "bones") between the key points (or "joints"), we can use the resulting array e.pafMat. This array has a shape of (184, 216, 38), where 38 (2 x 19) is related to the probability of that pixel being part of a horizontal (x) or vertical (y) connection with one of the 18 specific joints + none.
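As an illustrative sketch only (pairing channel 0 with channel 1 as the x/y components of a single limb is an assumption here; the exact channel ordering is defined by the model):
paf_x = e.pafMat[:, :, 0]               # assumed x-component of one limb's affinity field
paf_y = e.pafMat[:, :, 1]               # assumed y-component of the same limb
paf_mag = np.sqrt(paf_x**2 + paf_y**2)  # vector magnitude per pixel
plt.imshow(paf_mag)
plt.grid();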
The functions to plot the above figures are in the Notebook.
Draw the skeleton using the draw_humans method
With the list humans, resulting from the e.inference() method, it is possible to draw the skeleton using the draw_humans method:
image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
The result will be the image below:
If desired, it is possible to plot only the skeleton, as shown here (let us rerun all code for a recap):
image = common.read_imgfile(image_path, None, None)
humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=4.0)
black_background = np.zeros(image.shape)
skeleton = TfPoseEstimator.draw_humans(black_background, humans, imgcopy=False)
plt.figure(figsize=(15,8))
plt.imshow(skeleton);
plt.grid();
plt.axis('off');
Getting the Key Points (Joints) Coordinates
Pose estimation can be used in a series of applications such as robotics, gaming, or medicine. For that, it could be interesting to get the physical keypoint coordinates from the image, to be used by other applications.
Looking at the humans list that results from e.inference(), it can be verified that it is a list with a single element, a string. In this string, every key point appears with its relative coordinates and associated probability. For example, for the human image used so far, we have:
For example:
BodyPart:0-(0.49, 0.09) score=0.79
BodyPart:1-(0.49, 0.20) score=0.75
...
BodyPart:17-(0.53, 0.09) score=0.73
We can extract an array (of size 18) from this list, with the real coordinates related to the original image shape:
keypoints = str(str(str(humans[0]).split('BodyPart:')[1:]).split('-')).split(' score=')
# keypoints_list holds the relative (x, y) pairs parsed from the strings above (see the Notebook)
keypts_array = np.array(keypoints_list)
keypts_array = keypts_array*(image.shape[1], image.shape[0])
keypts_array = keypts_array.astype(int)
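As a hedged, self-contained alternative to the Notebook's parsing steps, parse_keypoints below is a hypothetical helper that only relies on the "BodyPart:i-(x, y) score=s" format shown above:
import re

def parse_keypoints(human, img_w, img_h):
    # Extract "BodyPart:i-(x, y) score=s" entries from the human's string representation
    pattern = r'BodyPart:(\d+)-\(([\d.]+), ([\d.]+)\) score=([\d.]+)'
    points = {}
    for idx, x, y, _score in re.findall(pattern, str(human)):
        # Scale the relative coordinates to the original image size
        points[int(idx)] = (int(float(x) * img_w), int(float(y) * img_h))
    return points

keypts = parse_keypoints(humans[0], image.shape[1], image.shape[0])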
Let us plot this array (where the array's index is the key point) over the original image. Here is the result:
plt.figure(figsize=(10,10))
plt.axis([0, image.shape[1], 0, image.shape[0]])
plt.scatter(*zip(*keypts_array), s=200, color='orange', alpha=0.6)
img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(img)
ax=plt.gca()
ax.set_ylim(ax.get_ylim()[::-1])
ax.xaxis.tick_top()
plt.grid();

for i, txt in enumerate(keypts_array):
    ax.annotate(i, (keypts_array[i][0]-5, keypts_array[i][1]+5))
Creating Functions to Reproduce the Studies on Generic Images Quickly
The Notebook shows all the code developed so far, "encapsulated" as functions. For example, let us look at another image:
image_path = '../images/einstein_oxford.jpg'
img, hum = get_human_pose(image_path)
keypoints = show_keypoints(img, hum, color='orange')
PhysicsWorld — Einstein in Oxford (1933)
img, hum = get_human_pose(image_path, showBG=False)
keypoints = show_keypoints(img, hum, color='white', showBG=False)
Studying Images with Multiple Persons
So far, we have explored only images that contain a single person. Since the algorithm was developed to capture all joints (S's) and PAFs (L's) from the image at the same time, restricting it to one person was only for simplicity. The code to get the result is the same; only the result ("humans") changes: the list will have a size compatible with the number of people in the image.
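A quick way to confirm this (assuming hum below is the same humans list returned by e.inference() inside the Notebook helper):
img, hum = get_human_pose(image_path)
print(len(hum))   # number of people detected in the image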
For example, let us use a “busy image” with five people on it:
image_path = './images/ski.jpg'
img, hum = get_human_pose(image_path)
plot_img(img, axis=False)
OpenPose — IEEE-2019
The algorithm found all the S's and L's and associated them with the five people. The result is excellent!
From reading the image path to plotting the result, the whole process took less than 0.5 s, independent of the number of people found in the image.
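A simple way to check that timing, using the same Notebook helper functions (get_human_pose and plot_img):
start = time.time()
img, hum = get_human_pose(image_path)
plot_img(img, axis=False)
print('Elapsed: %.2f s' % (time.time() - start))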
Let us complicate it and look at an image where people are more "mixed," like a couple dancing:
image_path = '../images/figure-836178_1920.jpg'
img, hum = get_human_pose(image_path)
plot_img(img, axis=False)
Pixabay
The result also seems very good. Let us plot only the keypoints, with a different color for each person:
plt.figure(figsize=(10,10))
plt.axis([0, img.shape[1], 0, img.shape[0]])
# keypoints_1 and keypoints_2 are the coordinate arrays for each detected person (see the Notebook)
plt.scatter(*zip(*keypoints_1), s=200, color='green', alpha=0.6)
plt.scatter(*zip(*keypoints_2), s=200, color='yellow', alpha=0.6)
ax=plt.gca()
ax.set_ylim(ax.get_ylim()[::-1])
ax.xaxis.tick_top()
plt.title('Keypoints of all humans detected\n')
plt.grid();
Part 3 — Pose Estimation in Videos and Live Camera
The process of getting the pose estimation in videos is the same as with images, because a video can be treated as a succession of images (frames). It is advised to follow the section, trying to reproduce the Jupyter Notebook 20_Pose_Estimation_Video, which can be downloaded from the project GitHub.
OpenCV does a fantastic job of handling videos.
So, let us get a .mp4 video and inform OpenCV that we will capture its frames:
video_path = '../videos/dance.mp4'
cap = cv2.VideoCapture(video_path)
Now let us create a loop that will capture each frame. With each frame, we will apply e.inference(), and from the result we will draw the skeleton, the same way we did with images. Code was included at the end to stop the video when a key ('q', for example) is pressed.
Below is the necessary code:
fps_time = 0
showBG = True   # set to False to draw the skeleton over a black background
while True:
    ret_val, image = cap.read()
    humans = e.inference(image,
                         resize_to_default=(w > 0 and h > 0),
                         upsample_size=4.0)
    if not showBG:
        image = np.zeros(image.shape)
    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    cv2.putText(image, "FPS: %f" % (1.0 / (time.time() - fps_time)), (10, 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imshow('tf-pose-estimation result', image)
    fps_time = time.time()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Screenshot from the video example on the tf-pose-estimation GitHub
The result is fantastic, but a little slow. A movie that originally had around 30 FPS (frames per second) will run here in "slow motion," at around 3 FPS.
Here is another experiment, where the movie was run twice, recording the estimated pose skeleton with and without the background video. The videos were manually synchronized; even if the result is not perfect, it is fascinating. I cut the last scene of the 1928 Chaplin movie "The Circus," where the way the Tramp walks is classic.
Testing with a Live Camera
It is advised to follow the section, trying to reproduce the Jupyter Notebook 30_Pose_Estimation_Camera, which can be downloaded from the project GitHub.
The code needed to run a live camera is almost the same as that used with video, except that the OpenCV cv2.VideoCapture() method will receive as an input parameter an integer that refers to which real camera is used. For example, an internal camera uses "0" and an external one "1". Also, the camera should be set to capture frames at "432x368", as used by the model.
Parameter initialization:
camera = 1
resize = '432x368'         # resize images before they are processed
resize_out_ratio = 4.0 # resize heatmaps before they are post-processed
model = 'mobilenet_thin'
show_process = False
tensorrt = False           # for tensorrt process
cam = cv2.VideoCapture(camera)
cam.set(3, w)   # 3 = cv2.CAP_PROP_FRAME_WIDTH
cam.set(4, h)   # 4 = cv2.CAP_PROP_FRAME_HEIGHT
The loop part of the code should be very similar to the one used with video:
fps_time = 0   # initialize the FPS timer (needed for the overlay below)
while True:
    ret_val, image = cam.read()
    humans = e.inference(image,
                         resize_to_default=(w > 0 and h > 0),
                         upsample_size=resize_out_ratio)
    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    cv2.putText(image, "FPS: %f" % (1.0 / (time.time() - fps_time)), (10, 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    cv2.imshow('tf-pose-estimation result', image)
    fps_time = time.time()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cam.release()
cv2.destroyAllWindows()
Image source: screenshot from the author's webcam
Again, the standard video capture at 30 FPS is reduced to around 10% of that when the algorithm is used. Here is a full video where the delay can be better observed. However, the result is excellent!
Conclusion
As always, I hope this article can inspire others to find their way in the fantastic world of AI!
All the code used in this article is available for download on the project GitHub: TF2_Pose_Estimation
Regards from the South of the World!
See you in my next article!
Thank you
Marcelo
Translated from: https://towardsdatascience.com/realtime-multiple-person-2d-pose-estimation-using-tensorflow2-x-93e4c156d45f