
3DMM-Fitting_Pytorch code reading notes

Published: 2023/12/20


convert_bfm_data.py

(transfer original BFM09 to our face model)
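SciPy is a well-known open-source scientific computing library for Python, built on top of NumPy. It adds numerical integration, optimization, statistics, and a number of special functions, along with routines commonly needed in mathematics, science, and engineering: linear algebra, numerical solution of ordinary differential equations, signal processing, image processing, sparse matrices, and so on.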

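BFM model introduction: see the official site

01_MorphableModel.mat (the main data)
The BFM model consists of 53490 3D vertices, so its shape and texture vectors each have length 160470 (53490*3), because they are laid out as follows:

shape: x_1, y_1, z_1, x_2, y_2, z_2, ..., x_{53490}, y_{53490}, z_{53490}
texture: r_1, g_1, b_1, r_2, g_2, b_2, ..., r_{53490}, g_{53490}, b_{53490}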
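
As a quick illustration of this interleaved layout, the sketch below reshapes a flat coordinate vector into one row per vertex. It uses a made-up vector with 4 vertices rather than the real 160470-element data, and the variable names are only illustrative.

```python
import numpy as np

n_vertex = 4  # toy value; the real BFM has 53490 vertices
# Flat layout: x_1, y_1, z_1, x_2, y_2, z_2, ...
flat_shape = np.arange(3 * n_vertex, dtype=np.float32)

# One row per vertex, columns are (x, y, z)
vertices = flat_shape.reshape(-1, 3)
print(vertices.shape)  # (4, 3)
print(vertices[1])     # coordinates of the second vertex

# Flattening again restores the original interleaved order
restored = vertices.reshape(-1)
```

The same reshape with -1 is what the conversion script relies on when it turns shapeMU into an (n_vertex, 3) array.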

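segbin: short for "segment binary"; a one-hot style labeling of which part of the face each vertex belongs to.
Faces of different identities can be expressed as linear combinations of the 199 principal components.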

% Generate a random head
alpha = randn(msz.n_shape_dim, 1);
beta = randn(msz.n_tex_dim, 1);
shape = coef2object( alpha, model.shapeMU, model.shapePC, model.shapeEV );
tex = coef2object( beta, model.texMU, model.texPC, model.texEV );

Steps for using the model: load the model -> generate a random head -> render it -> save it in the desired format

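To make the model more flexible, the four parts of the face are handled independently. Each part is defined by a mask (a set of vertex indices); this corresponds to 09_mask.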

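Basic principle

A target shape or texture can be obtained from the following formula:
obj = average + pc * (coefficient .* pcVariance)
Here the coefficient is the variable, a 199-dimensional vector (one entry per principal component); everything else is a constant stored in the model data.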
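
A minimal NumPy sketch of that formula, using small random arrays in place of the real model data. The array names stand in for shapeMU, shapePC and shapeEV; the sizes are toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_coords, n_pc = 12, 5  # toy sizes; BFM uses 160470 and 199

average = rng.normal(size=(n_coords, 1))          # stands in for shapeMU
pc = rng.normal(size=(n_coords, n_pc))            # stands in for shapePC
pc_variance = np.abs(rng.normal(size=(n_pc, 1)))  # stands in for shapeEV

coefficient = rng.normal(size=(n_pc, 1))          # the only free variable

# obj = average + pc * (coefficient .* pcVariance)
obj = average + pc @ (coefficient * pc_variance)
print(obj.shape)

# With every coefficient set to zero the result is the mean face
mean_face = average + pc @ (np.zeros((n_pc, 1)) * pc_variance)
```

Note that MATLAB's `.*` is elementwise multiplication (`*` in NumPy) and MATLAB's `*` is matrix multiplication (`@` in NumPy).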
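For how to use the BFM data, see this blog post: "Using BFM: getting the coordinates of the 68 landmarks of the mean face model".

The face model is cropped and aligned to the facial landmarks, and contains only 35709 vertices.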

from scipy.io import loadmat, savemat
import numpy as np
from array import array

# load expression basis
def LoadExpBasis():
    n_vertex = 53215
    # The standard BFM model has 53490 vertices.
    # The expression basis (from Exp_Pca.bin) only covers 53215 vertices.
    # The model without the neck has 35709 vertices.
    Expbin = open('BFM/Exp_Pca.bin', 'rb')
    # Exp_Pca.bin holds the expression basis, extracted from the FaceWarehouse dataset.
    # The note here gives it as 29-dimensional (the code below reshapes expEV to 79 columns);
    # the shape parameters are 199-dimensional.
    exp_dim = array('i')
    # See https://docs.python.org/3/library/array.html; 'i' means signed int.
    exp_dim.fromfile(Expbin, 1)
    # array.fromfile(f, n): read n items (as machine values) from the file object f
    # and append them to the end of the array.
    expMU = array('f')  # float
    expPC = array('f')
    expMU.fromfile(Expbin, 3*n_vertex)
    expPC.fromfile(Expbin, 3*exp_dim[0]*n_vertex)

    expPC = np.array(expPC)
    expPC = np.reshape(expPC, [exp_dim[0], -1])
    expPC = np.transpose(expPC)

    expEV = np.loadtxt('BFM/std_exp.txt')

    return expPC, expEV

# transfer original BFM09 to our face model
def transferBFM09():
    original_BFM = loadmat('BFM/01_MorphableModel.mat')
    shapePC = original_BFM['shapePC']  # shape basis
    shapeEV = original_BFM['shapeEV']  # corresponding eigen value
    shapeMU = original_BFM['shapeMU']  # mean face
    texPC = original_BFM['texPC']  # texture basis
    texEV = original_BFM['texEV']  # eigen value
    texMU = original_BFM['texMU']  # mean texture
    # Everything above stores the BFM shape and texture attributes.

    expPC, expEV = LoadExpBasis()
    # This stores the expression attributes.

    # transfer BFM09 to our face model
    idBase = shapePC*np.reshape(shapeEV, [-1, 199])
    idBase = idBase/1e5  # unify the scale to decimeter
    idBase = idBase[:, :80]  # use only first 80 basis

    exBase = expPC*np.reshape(expEV, [-1, 79])
    exBase = exBase/1e5  # unify the scale to decimeter
    exBase = exBase[:, :64]  # use only first 64 basis

    texBase = texPC*np.reshape(texEV, [-1, 199])
    texBase = texBase[:, :80]  # use only first 80 basis

    # our face model is cropped align face landmarks which contains only 35709 vertex.
    # original BFM09 contains 53490 vertex, and expression basis provided by JuYong contains 53215 vertex.
    # thus we select corresponding vertex to get our face model.
    index_exp = loadmat('BFM/BFM_front_idx.mat')
    index_exp = index_exp['idx'].astype(np.int32) - 1  # starts from 0 (to 53215)

    index_shape = loadmat('BFM/BFM_exp_idx.mat')
    index_shape = index_shape['trimIndex'].astype(np.int32) - 1  # starts from 0 (to 53490)
    index_shape = index_shape[index_exp]

    idBase = np.reshape(idBase, [-1, 3, 80])
    idBase = idBase[index_shape, :, :]
    idBase = np.reshape(idBase, [-1, 80])

    texBase = np.reshape(texBase, [-1, 3, 80])
    texBase = texBase[index_shape, :, :]
    texBase = np.reshape(texBase, [-1, 80])

    exBase = np.reshape(exBase, [-1, 3, 64])
    exBase = exBase[index_exp, :, :]
    exBase = np.reshape(exBase, [-1, 64])

    meanshape = np.reshape(shapeMU, [-1, 3])/1e5
    meanshape = meanshape[index_shape, :]
    meanshape = np.reshape(meanshape, [1, -1])

    meantex = np.reshape(texMU, [-1, 3])
    meantex = meantex[index_shape, :]
    meantex = np.reshape(meantex, [1, -1])

    # other info contains triangles, region used for computing photometric loss,
    # region used for skin texture regularization, and 68 landmarks index etc.
    other_info = loadmat('BFM/facemodel_info.mat')
    frontmask2_idx = other_info['frontmask2_idx']
    skinmask = other_info['skinmask']
    keypoints = other_info['keypoints']
    point_buf = other_info['point_buf']
    tri = other_info['tri']
    tri_mask2 = other_info['tri_mask2']

    # save our face model
    savemat('BFM/BFM_model_front.mat', {'meanshape': meanshape, 'meantex': meantex, 'idBase': idBase,
                                        'exBase': exBase, 'texBase': texBase, 'tri': tri,
                                        'point_buf': point_buf, 'tri_mask2': tri_mask2,
                                        'keypoints': keypoints, 'frontmask2_idx': frontmask2_idx,
                                        'skinmask': skinmask})

if __name__ == '__main__':
    transferBFM09()
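
The vertex-selection step in this script reshapes each basis to (n_vertex, 3, n_basis) so that indexing by vertex keeps the x, y, z rows of a vertex together. Here is a minimal sketch with toy sizes; the index array `keep` is made up for illustration, whereas the script loads its indices from BFM_front_idx.mat and BFM_exp_idx.mat.

```python
import numpy as np

n_vertex, n_basis = 6, 2
# Basis with one row per coordinate: (3 * n_vertex, n_basis)
base = np.arange(3 * n_vertex * n_basis, dtype=np.float32).reshape(3 * n_vertex, n_basis)

keep = np.array([0, 2, 5])  # hypothetical indices of the vertices to keep

base_v = base.reshape(-1, 3, n_basis)  # (n_vertex, 3, n_basis)
base_v = base_v[keep, :, :]            # kept vertices only
cropped = base_v.reshape(-1, n_basis)  # back to (3 * n_kept, n_basis)
print(cropped.shape)
```

Vertex k occupies rows 3k to 3k+2 of the flat basis, so the rows of vertex 2 end up as rows 3 to 5 of the cropped basis.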

fit.py

The face_alignment library

This write-up explains it well.
From the source code here, you can see what get_landmarks_from_image returns.

def get_landmarks_from_image(self, image_or_path, detected_faces=None, return_bboxes=False,
                             return_landmark_score=False):
    """Predict the landmarks for each face present in the image.

    This function predicts a set of 68 2D or 3D images, one for each image present.
    If detect_faces is None the method will also run a face detector.

    Arguments:
        image_or_path {string or numpy.array or torch.tensor} -- The input image or path to it.

    Keyword Arguments:
        detected_faces {list of numpy.array} -- list of bounding boxes, one for each face found
        in the image (default: {None})
        return_bboxes {boolean} -- If True, return the face bounding boxes in addition to the keypoints.
        return_landmark_score {boolean} -- If True, return the keypoint scores along with the keypoints.

    Return:
        result:
            1. if both return_bboxes and return_landmark_score are False, result will be:
                landmark
            2. Otherwise, result will be one of the following, depending on the actual value of return_* arguments.
                (landmark, landmark_score, detected_face)
                (landmark, None, detected_face)
                (landmark, landmark_score, None )
    """

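The code in fit.py

# get the height and width of the image
h, w = orig_img.shape[:2]
# per the library source above, the return value here is the landmarks
tmp_lms = fa.get_landmarks_from_image(orig_img)
# In practice this gave two arrays; the second is never used, so I have not looked into it for now.
# The first array holds the landmark coordinates as a 68*3 array. The first two columns are
# the x and y coordinates; I am still not sure what the third column is.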

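Running this code on its own raises an error:

lms = fa.get_landmarks_from_image(orig_img)[0]
print(lms[:,0])

TypeError: list indices must be integers or slices, not tuple
A list can hold elements of different types, so its elements may or may not have the same size. A list therefore does not support reading a whole column at once, even when it is a regular 2D list of numbers.
Use a list comprehension to read a column: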

print( [ x[0] for x in lms ] )
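
The sketch below reproduces the error on a plain nested list and shows two ways around it. The landmark values are made up; converting to a NumPy array is an alternative not mentioned in the original notes.

```python
import numpy as np

# A plain nested list, like a list of (x, y, z) landmark rows
lms_list = [[1.0, 2.0, 0.1], [3.0, 4.0, 0.2], [5.0, 6.0, 0.3]]

try:
    lms_list[:, 0]  # a tuple index on a list raises TypeError
except TypeError as err:
    message = str(err)
    print(message)

# Option 1: list comprehension, as in the text
first_col_list = [row[0] for row in lms_list]

# Option 2: convert to an ndarray, which supports column slicing
lms_arr = np.asarray(lms_list)
first_col_arr = lms_arr[:, 0]
print(first_col_list, first_col_arr)
```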

Bounding box regression (bbox)

The bounding box is resized here:

def pad_bbox(bbox, img_wh, padding_ratio=0.2):
    x1, y1, x2, y2 = bbox
    width = x2 - x1
    height = y2 - y1
    size_bb = int(max(width, height) * (1+padding_ratio))
    center_x, center_y = (x1 + x2) // 2, (y1 + y2) // 2
    x1 = max(int(center_x - size_bb // 2), 0)
    y1 = max(int(center_y - size_bb // 2), 0)
    size_bb = min(img_wh[0] - x1, size_bb)
    size_bb = min(img_wh[1] - y1, size_bb)

    return [x1, y1, x1+size_bb, y1+size_bb]
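
A worked example may help here. The block repeats pad_bbox as shown above and runs it on two made-up boxes: one well inside the image, and one in a corner where the padded square has to be clipped. The input values are hypothetical.

```python
def pad_bbox(bbox, img_wh, padding_ratio=0.2):
    x1, y1, x2, y2 = bbox
    width = x2 - x1
    height = y2 - y1
    # Side of the padded square: the longer side, enlarged by padding_ratio
    size_bb = int(max(width, height) * (1 + padding_ratio))
    center_x, center_y = (x1 + x2) // 2, (y1 + y2) // 2
    # Top-left corner, clamped to the image
    x1 = max(int(center_x - size_bb // 2), 0)
    y1 = max(int(center_y - size_bb // 2), 0)
    # Shrink the square if it would run past the right or bottom edge
    size_bb = min(img_wh[0] - x1, size_bb)
    size_bb = min(img_wh[1] - y1, size_bb)
    return [x1, y1, x1 + size_bb, y1 + size_bb]

inner = pad_bbox((100, 120, 200, 260), img_wh=(640, 480), padding_ratio=0.25)
corner = pad_bbox((0, 0, 100, 100), img_wh=(110, 110), padding_ratio=0.25)
print(inner)   # [63, 103, 238, 278]
print(corner)  # [0, 0, 110, 110]
```

In the first case the 100x140 box becomes a 175x175 square centred on (150, 190); in the second the 125-pixel square is cut down to the 110-pixel image.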

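For cv2.resize, see this reference.

cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]) -> dst

Parameters:
src: the image to resize
dsize: size of the output image
dst: the output image
fx: scale factor along the width (w)
fy: scale factor along the height (h)
interpolation: the interpolation method, e.g. cv2.INTER_NEAREST as used below (five options in total)

Three points to note:

dsize is given as (w, h), whereas the shape of an image read by OpenCV is (h, w).
When dsize is non-zero, dst has size dsize; otherwise the size is determined from the size of src and the scale factors fx and fy. It follows that dsize and (fx, fy) cannot both be zero.
dsize has no default value, so it must always be passed; when using fx and fy to control the size, pass dsize=(0,0).

import cv2

img = cv2.imread("1.jpg")
print(img.shape)  # output: (1559, 924, 3)

img_resize1 = cv2.resize(img, dsize=(500, 1000), interpolation=cv2.INTER_NEAREST)
print(img_resize1.shape)  # output: (1000, 500, 3)

img_resize2 = cv2.resize(img_resize1, dsize=(0, 0), fx=0.5, fy=0.5, interpolation=cv2.INTER_NEAREST)
print(img_resize2.shape)  # output: (500, 250, 3)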

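About this part of the code, where [None, ...] appears:

lms = lms[:, :2][None, ...]  # data of shape (68, 2) becomes (1, 68, 2)
lms = torch.tensor(lms, dtype=torch.float32).cuda()
img_tensor = torch.tensor(cropped_img[None, ...], dtype=torch.float32).cuda()

In Python, xx[:, None] indexes each axis separately.
a[:, None] is equivalent to calling a.__getitem__((slice(None, None, None), None))

The ellipsis expands to as many colons as the array's ndim requires. For ndim=3, the following two expressions are equivalent:

arr[..., 0]
arr[:, :, 0]

a[None, ...] is equivalent to calling a.__getitem__((None, Ellipsis)). In the code lms is a 2D array, so after this step it becomes a 3D array.
An introduction to Python slicing

[:, None]
None (the same as np.newaxis) does not slice an existing axis; it inserts a new axis of length 1 at that position.
So [:, None] keeps the rows of a 2D array as the first axis and wraps each row in a new axis, giving a 3D array: shape (2, 3) becomes (2, 1, 3).

xx = np.array([[1,2,3],[4,5,6]])
xx[:,None]
Output:
array([[[1, 2, 3]],
       [[4, 5, 6]]])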
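
To check these shape rules directly, here is a short NumPy sketch. The (68, 3) array of zeros is only a stand-in for the landmark array in fit.py.

```python
import numpy as np

lms = np.zeros((68, 3))         # stand-in for the 68 landmarks

xy = lms[:, :2]                 # keep the first two columns
batched = xy[None, ...]         # new leading axis of length 1
print(xy.shape, batched.shape)  # (68, 2) (1, 68, 2)

xx = np.array([[1, 2, 3], [4, 5, 6]])
print(xx[:, None].shape)        # (2, 1, 3)
print(xx[None, ...].shape)      # (1, 2, 3)

# ... expands to as many full slices as needed
arr = np.arange(24).reshape(2, 3, 4)
same = np.array_equal(arr[..., 0], arr[:, :, 0])
print(same)                     # True
```

PyTorch tensors accept the same None and ... indexing, which is why the original code can apply it before or after converting to a tensor.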

The load facemodel stage

Many datasets ship their annotations in .mat format. The loadmat and savemat functions in scipy.io let Python read and write .mat data.

scipy.io.loadmat(file_name, mdict=None, appendmat=True, **kwargs)
scipy.io.savemat(file_name, mdict, appendmat=True, format='5', long_field_names=False, do_compression=False, oned_as='row')

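Basic transformations in computer graphics

Using torch.optim.Adam and what its parameters mean

torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

Parameters:

params (iterable): an iterable of parameters to optimize, or dicts defining parameter groups
lr (float, optional): learning rate (default: 1e-3)
betas (Tuple[float, float], optional): coefficients used for computing running averages of the gradient and its square (default: 0.9, 0.999)
eps (float, optional): term added to the denominator to improve numerical stability (default: 1e-8)
weight_decay (float, optional): weight decay (L2 penalty) (default: 0)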
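
To make the roles of lr, betas and eps concrete, here is a minimal NumPy sketch of the Adam update rule applied to a one-parameter quadratic. It is a simplified illustration written for these notes, not PyTorch's implementation: it omits weight_decay, and the function name adam_step is made up.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update at step t (1-based); returns the new (param, m, v)."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad        # running average of the gradient
    v = b2 * v + (1 - b2) * grad ** 2   # running average of the squared gradient
    m_hat = m / (1 - b1 ** t)           # bias correction for the zero initialisation
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# The very first step moves the parameter by roughly lr, whatever the gradient scale
first, _, _ = adam_step(np.array(0.0), -6.0, 0.0, 0.0, t=1, lr=0.05)

# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x, m, v = np.array(0.0), 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)
print(float(first), float(x))
```

The eps term only matters when the squared-gradient average is close to zero, where it keeps the division from blowing up.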

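Optimizer methods

Basic methods:
zero_grad(): clears the gradients of the parameters the optimizer manages
step(): performs one update step
add_param_group(): adds a parameter group
state_dict(): returns a dictionary with the optimizer's current state
load_state_dict(): loads a state dictionary
A PyTorch characteristic: tensor gradients are not zeroed automatically; they accumulate across backward() calls until cleared.

sum(1) sums along dimension 1, i.e. over each row of a 2D array; it is equivalent to sum(axis=1).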

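The tqdm module: configuring progress bars

tqdm is a fast, extensible progress bar for Python. It adds a progress indicator to long-running loops; you only need to wrap any iterator as tqdm(iterator).
In short, it displays a progress bar, looks good, is intuitive to use (wrap the loop's iterable in tqdm), and has almost no impact on the program's performance.

import time
from tqdm import tqdm, trange

for i in tqdm(range(100)):
    time.sleep(0.01)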

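About with torch.no_grad():

In PyTorch, not every operation needs to build a computation graph (the record of the computation that makes gradient backpropagation possible). Tensor operations build the graph by default; where it is not needed, use with torch.no_grad(): to stop graph construction for everything inside the block.

model.train(), model.eval() and with torch.no_grad(): see this reference

with torch.no_grad() turns off automatic differentiation
model.train() is used during training
model.eval() is used during validation and testing

The difference between the last two is their effect on Dropout and Batch Normalization layers. In train mode, a dropout layer zeroes each activation with the configured probability p (so the keep probability is 1 - p), and a batchnorm layer keeps computing the mean and variance of the data and updating its running statistics. In eval mode, the dropout layer lets all activations through, and the batchnorm layer stops computing and updating the mean and variance and uses the values accumulated during training.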
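
The train/eval difference for dropout can be sketched in a few lines of NumPy. This is the standard "inverted dropout" formulation (kept activations are rescaled by 1 / (1 - p)), written here as an illustration rather than taken from PyTorch's source; the function name and the `training` flag are assumptions for the example.

```python
import numpy as np

def dropout(x, p, training, rng):
    """Inverted dropout: zero each element with probability p when training."""
    if not training or p == 0.0:
        return x                     # eval mode: every activation passes through
    keep = rng.random(x.shape) >= p  # each element is kept with probability 1 - p
    return x * keep / (1.0 - p)      # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones(100_000)

train_out = dropout(x, p=0.3, training=True, rng=rng)
eval_out = dropout(x, p=0.3, training=False, rng=rng)

zero_fraction = (train_out == 0).mean()
print(zero_fraction)     # close to 0.3
print(train_out.mean())  # close to 1.0
print(eval_out.mean())   # 1.0
```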

models.py

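The most important module of all

The pytorch3d renderer module

Differentiable rendering bridges the gap between 2D and 3D by allowing 2D image pixels to be related back to the 3D properties of a scene.

For example, an image can be rendered from a 3D shape predicted by a neural network, and a 2D loss can then be computed against a reference image. Inverting the rendering step means the 2D loss on the pixels can be related back to the 3D properties of the shape (such as the positions of mesh vertices), so 3D shapes can be learned without any explicit 3D supervision.

Padding: in deep learning this usually means padding sequences to the same length.

data loading and transformation / loss function / differentiable rendering

Supports heterogeneous batching of 3D data
The second layer provides strong support for heterogeneous batching
Graph convolution

Overall structure of the render module: see the official site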

Fragments
The rasterizer returns 4 output tensors in a named tuple.

pix_to_face: LongTensor of shape (N, image_size, image_size, faces_per_pixel) specifying the indices of the faces (in the packed faces) which overlap each pixel in the image.
zbuf: FloatTensor of shape (N, image_size, image_size, faces_per_pixel) giving the z-coordinates of the nearest faces at each pixel in world coordinates, sorted in ascending z-order.
bary_coords: FloatTensor of shape (N, image_size, image_size, faces_per_pixel, 3) giving the barycentric coordinates in NDC units of the nearest faces at each pixel, sorted in ascending z-order.
pix_dists: FloatTensor of shape (N, image_size, image_size, faces_per_pixel) giving the signed Euclidean distance (in NDC units) in the x/y plane of each point closest to the pixel.

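Rendering requires transformations between several coordinate frames: world space, view/camera space, NDC space, and screen space. At each step it is important to know where the camera is located, how the +X, +Y, +Z axes are aligned, and the possible range of values. The PyTorch3D documentation has a figure outlining the conventions it uses.
NDC stands for Normalized Device Coordinates.

The coordinate-system conventions of PyTorch3D and OpenGL differ.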

The get_render function

Source: renderer/camera_utils.py (the eye, at and up vectors represent the camera's position)
from cameras import look_at_view_transform
eye, at, up = camera_to_eye_at_up(cam.get_world_to_view_transform())
R, T = look_at_view_transform(eye=eye, at=at, up=up)
Cameras created from the same R and T all look at the scene from the same viewpoint.

R, T = look_at_view_transform(10, 0, 0)

Source: renderer/cameras.py
znear: near clipping plane of the view frustum
zfar: far clipping plane of the view frustum
R: Rotation matrix of shape (N, 3, 3)
T: Translation matrix of shape (N, 3)
fov: field of view angle of the camera.
# This used to be OpenGLPerspectiveCameras; it is now FoVPerspectiveCameras.
# FoVPerspectiveCameras is a class that stores a batch of parameters and generates a batch of projection matrices from the specified field of view.
# The line below creates an instance of that class.

cameras = FoVPerspectiveCameras(device=device, R=R, T=T, znear=0.01, zfar=50, fov=2*np.arctan(self.img_size//2/self.focal)*180./np.pi)
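
The fov argument above is derived from the image size and the focal length: half the image size divided by the focal length is the tangent of half the field of view. A small sketch of that arithmetic follows; the values of img_size and focal are assumptions picked for illustration, not values taken from the notes.

```python
import math

img_size = 224   # assumed image size in pixels
focal = 1015.0   # assumed focal length in pixels

# tan(fov / 2) = (img_size / 2) / focal
half_angle = math.atan(img_size // 2 / focal)  # radians
fov_deg = 2 * half_angle * 180.0 / math.pi     # degrees, as passed to fov above
print(round(fov_deg, 2))
```

A longer focal length for the same image size gives a narrower field of view.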

Source: renderer/lighting.py
PointLights is also a class; here too an instance of it is created.
ambient_color: RGB color of the ambient component
diffuse_color: RGB color of the diffuse component
specular_color: RGB color of the specular component
location: xyz position of the light.

lights = PointLights(device=device, location=[[0.0, 0.0, 1e5]], ambient_color=[[1, 1, 1]],specular_color=[[0., 0., 0.]], diffuse_color=[[0., 0., 0.]])

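Source: render/mesh/rasterize_meshes.py

blur_radius: a float distance in the range [0, 2] used to expand the face bounding boxes for rasterization. Setting a blur radius gives blurred edges around the shape instead of a hard boundary. Set it to 0 for no blur.
faces_per_pixel (optional): the number of faces to save per pixel, returning the nearest faces_per_pixel points along the z-axis.
image_size: size in pixels of the output image to be rasterized. For non-square images it can optionally be a tuple (H, W).

raster_settings = RasterizationSettings(
    image_size=self.img_size,  # size of the output image
    blur_radius=0.0,  # the image is only rendered for visualization, so faces_per_pixel=1 and blur_radius=0.0
    faces_per_pixel=1,  # set the value of k
)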

class BlendParams(NamedTuple):
    sigma: float = 1e-4
    gamma: float = 1e-4
    background_color: Union[torch.Tensor, Sequence[float]] = (1.0, 1.0, 1.0)

blend_params = blending.BlendParams(background_color=[0, 0, 0])

The last step combines the parameters above to initialize a renderer by composing a rasterizer and a shader.

renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=raster_settings
    ),
    shader=SoftPhongShader(
        device=device,
        cameras=cameras,
        lights=lights,
        blend_params=blend_params
    )
)

The Split_coeff function

Splits the incoming coefficient vector into its parts and returns them separately:
return id_coeff, ex_coeff, tex_coeff, angles, gamma, translation

def Split_coeff(self, coeff):
    id_coeff = coeff[:, :80]  # identity(shape) coeff of dim 80
    ex_coeff = coeff[:, 80:144]  # expression coeff of dim 64
    tex_coeff = coeff[:, 144:224]  # texture(albedo) coeff of dim 80
    angles = coeff[:, 224:227]  # Euler angles(x,y,z) for rotation of dim 3
    gamma = coeff[:, 227:254]  # lighting coeff for 3 channel SH function of dim 27
    translation = coeff[:, 254:]  # translation coeff of dim 3

    return id_coeff, ex_coeff, tex_coeff, angles, gamma, translation
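
The slice boundaries imply a coefficient vector of length 80 + 64 + 80 + 3 + 27 + 3 = 257 per sample. The sketch below checks this with a NumPy array in place of a tensor; the standalone function split_coeff mirrors the method above and is written only for this check.

```python
import numpy as np

def split_coeff(coeff):
    id_coeff = coeff[:, :80]        # identity (shape), dim 80
    ex_coeff = coeff[:, 80:144]     # expression, dim 64
    tex_coeff = coeff[:, 144:224]   # texture (albedo), dim 80
    angles = coeff[:, 224:227]      # Euler angles, dim 3
    gamma = coeff[:, 227:254]       # lighting, dim 27
    translation = coeff[:, 254:]    # translation, dim 3
    return id_coeff, ex_coeff, tex_coeff, angles, gamma, translation

# A batch of 2 coefficient vectors of length 257
coeff = np.arange(2 * 257, dtype=np.float32).reshape(2, 257)
parts = split_coeff(coeff)
dims = [p.shape[1] for p in parts]
print(dims)  # [80, 64, 80, 3, 27, 3]
```

Because the last slice is open-ended, a longer input would silently enlarge the translation part, so it is worth checking the input width before splitting.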

The Shape_formation function

Uses only the identity and expression coefficients.
einsum: the Einstein summation convention

def Shape_formation(self, id_coeff, ex_coeff):
    n_b = id_coeff.size(0)

    # matrix multiplication
    face_shape = torch.einsum('ij,aj->ai', self.idBase, id_coeff) + \
        torch.einsum('ij,aj->ai', self.exBase, ex_coeff) + self.meanshape

    # change the dimensions
    face_shape = face_shape.view(n_b, -1, 3)
    face_shape = face_shape - self.meanshape.view(1, -1, 3).mean(dim=1, keepdim=True)

    return face_shape
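
The subscript string 'ij,aj->ai' sums over the basis index j for every batch item a, which is the same as multiplying the coefficients by the transposed basis. A minimal NumPy sketch with toy sizes (np.einsum uses the same subscript notation as torch.einsum; the array names are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
n_coords, n_id, n_batch = 9, 4, 2  # toy sizes: 3 vertices, 4 basis vectors

id_base = rng.normal(size=(n_coords, n_id))  # like idBase: (3 * n_vertex, 80)
id_coeff = rng.normal(size=(n_batch, n_id))  # like id_coeff: (batch, 80)

# 'ij,aj->ai': out[a, i] = sum over j of id_base[i, j] * id_coeff[a, j]
via_einsum = np.einsum('ij,aj->ai', id_base, id_coeff)
via_matmul = id_coeff @ id_base.T
print(via_einsum.shape)

# Same final step as in Shape_formation: one (x, y, z) row per vertex
face_shape = via_einsum.reshape(n_batch, -1, 3)
print(face_shape.shape)
```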

The Texture_formation function

Uses the texture coefficients.

def Texture_formation(self, tex_coeff):
    n_b = tex_coeff.size(0)
    face_texture = torch.einsum('ij,aj->ai', self.texBase, tex_coeff) + self.meantex

    face_texture = face_texture.view(n_b, -1, 3)
    return face_texture

The Compute_norm function

def Compute_norm(self, face_shape):
    face_id = self.tri.long() - 1  # tri holds the triangles; .long() casts the tensor to 64-bit integers
    point_id = self.point_buf.long() - 1
    shape = face_shape
    v1 = shape[:, face_id[:, 0], :]
    v2 = shape[:, face_id[:, 1], :]
    v3 = shape[:, face_id[:, 2], :]
    e1 = v1 - v2
    e2 = v2 - v3
    face_norm = e1.cross(e2)  # cross product of the two edge vectors

    empty = torch.zeros((face_norm.size(0), 1, 3), dtype=face_norm.dtype, device=face_norm.device)
    face_norm = torch.cat((face_norm, empty), 1)

    v_norm = face_norm[:, point_id, :].sum(2)
    # torch.norm() computes the p-norm along the given dimension.
    # p is the order of the norm and defaults to 2, i.e. the 2-norm.
    # e.g. inputs3 = inputs.norm(p=2, dim=1, keepdim=False)
    v_norm = v_norm / v_norm.norm(dim=2).unsqueeze(2)

    return v_norm
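
The idea behind Compute_norm is: take the cross product of two edges of each triangle to get a face normal, sum the normals of the faces around each vertex, and normalise. The NumPy sketch below does this for a flat unit square made of two triangles. It uses an explicit loop in place of the point_buf lookup, and 0-based indices, so it is a simplified stand-in rather than the repository's code.

```python
import numpy as np

# A unit square in the z = 0 plane, split into two triangles
verts = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]])
tri = np.array([[0, 1, 2], [0, 2, 3]])  # 0-based here; the model data is 1-based

v1, v2, v3 = verts[tri[:, 0]], verts[tri[:, 1]], verts[tri[:, 2]]
e1 = v1 - v2
e2 = v2 - v3
face_norm = np.cross(e1, e2)  # one normal per triangle

# Vertex normal: sum the normals of the faces touching each vertex, then normalise
v_norm = np.zeros_like(verts)
for f, face in enumerate(tri):
    for vid in face:
        v_norm[vid] += face_norm[f]
v_norm /= np.linalg.norm(v_norm, axis=1, keepdims=True)
print(v_norm)
```

For this flat square every vertex normal points along +z. In the original method, the extra all-zero row appended to face_norm appears to serve as a padding entry for vertices with fewer adjacent faces.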

Compute_rotation_matrix

# static method
# input: angles; output: rotation
rotXYZ = torch.eye(3).view(1, 3, 3).repeat(n_b * 3, 1, 1).view(3, n_b, 3, 3)

Rigid_transform_block

Input: face_shape, rotation, translation
Output: face_shape_t
Rotates the face and then translates it.

def Rigid_transform_block(face_shape, rotation, translation):
    face_shape_r = torch.matmul(face_shape, rotation)
    face_shape_t = face_shape_r + translation.view(-1, 1, 3)

    return face_shape_t
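
A small NumPy sketch of the same rotate-then-translate step. Each vertex is a row, so it is multiplied on the left of the rotation matrix; the sketch therefore passes the transpose of an ordinary column-vector rotation matrix. Whether the repository's rotation is already stored in that transposed form is not shown in these notes, so treat that detail, and the helper rot_z, as assumptions of the example.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis for column vectors: v' = R @ v."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

face_shape = np.array([[[1., 0., 0.], [0., 1., 0.]]])  # (batch=1, n_vertex=2, 3)
R = rot_z(np.pi / 2)
rotation = R.T[None, ...]               # row-vector form, shape (1, 3, 3)
translation = np.array([[0., 0., 5.]])  # (batch, 3)

face_shape_r = np.matmul(face_shape, rotation)               # rotate every vertex
face_shape_t = face_shape_r + translation.reshape(-1, 1, 3)  # then translate
print(np.round(face_shape_t, 6))
```

A 90 degree rotation about z sends (1, 0, 0) to (0, 1, 0) and (0, 1, 0) to (-1, 0, 0); the translation then shifts both by 5 along z.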

Illumination_layer

Input: face_texture, norm, gamma
Output: face_color

get_lms

def get_lms(self, face_shape, kp_inds):
    lms = face_shape[:, kp_inds, :]
    return lms

Projection_block

Input: face_shape
Output: face_projection

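PyTorch functions

bmm
- Batched matrix multiplication of two tensors: torch.bmm(a, b), where a has size (b, n, m) and b has size (b, m, p). Both tensors must be 3-dimensional, and the result has size (b, n, p). For example, (b, h, w) times (b, w, h) gives (b, h, h).
permute
- torch.Tensor.permute (Python method, in torch.Tensor)
- permute(dims) reorders the dimensions of the tensor
clone
- For speed, assigning a torch tensor or matrix to another name refers to the same memory, unlike in MATLAB. To keep the old tensor in newly allocated storage rather than as a reference, use clone() to make a copy.
stack
- stack(tensors, dim=0, out=None)
- dim=0: joins the tensors along a new first dimension
- dim=1: for 2D inputs, the i-th rows of the tensors are joined row by row into a new 2D tensor, and these new tensors are then joined as with dim=0
- dim=2: for 2D inputs, the i-th rows of the tensors are transposed and joined column by column into a new 2D tensor, and these new tensors are then joined as with dim=0
tile
- tile lives in the Python module numpy.lib.shape_base and repeats an array. For example, tile(A, n) repeats array A n times to build a new array.
clamp
- torch.clamp(input, min, max, out=None) → Tensor
- clamp() limits each element of the input tensor to the range [min, max] and returns the result as a new tensor.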
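
The shape behaviour described in this list can be tried out with NumPy analogues, used here so the example runs without PyTorch: np.matmul on 3D arrays for bmm, np.transpose with an axis tuple for permute, np.stack for stack, and np.clip for clamp. These are stand-ins chosen for this sketch; the argument names differ from the torch versions (axis instead of dim).

```python
import numpy as np

a = np.ones((4, 2, 3))
b = np.ones((4, 3, 5))
bmm_shape = np.matmul(a, b).shape                 # batched matmul
permuted_shape = np.transpose(a, (2, 0, 1)).shape # reorder the axes
print(bmm_shape, permuted_shape)                  # (4, 2, 5) (3, 4, 2)

t1 = np.array([[1, 2], [3, 4]])
t2 = np.array([[5, 6], [7, 8]])
stacked = {dim: np.stack([t1, t2], axis=dim).tolist() for dim in (0, 1, 2)}
for dim, value in stacked.items():
    print(dim, value)

tiled = np.tile(np.array([1, 2]), 3).tolist()
clipped = np.clip(np.array([-2.0, 0.5, 3.0]), 0.0, 1.0).tolist()
print(tiled)    # [1, 2, 1, 2, 1, 2]
print(clipped)  # [0.0, 0.5, 1.0]
```

Stacking along axis 0 keeps each input whole, axis 1 pairs up the i-th rows, and axis 2 pairs up individual elements, which matches the three cases listed for stack.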

TexturesVertex

Source location: renderer/mesh/textures.py
Here too an instance of the class is created.

face_color = TexturesVertex(face_color)

Meshes

Source location: structures/meshes.py

def __init__(self, verts=None, faces=None, textures=None, *, verts_normals=None) -> None:

mesh = Meshes(face_shape_t, tri.repeat(batch_num, 1, 1), face_color)
