Building a Digit Hand-Gesture Recognition App, Step by Step

Published: 2023/12/15

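This post is based on a digit hand-gesture recognition app I built. It walks through training a convolutional neural network (CNN) model step by step and then integrating that model into an Android Studio project to build the app. The full source of the project is open on GitHub (repository: Chinese-number-gestures-recognition); stars are welcome, haha. First, what the app does: it recognizes 11 hand gestures, for the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.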

Development environment: TensorFlow-gpu 1.8.0, NVIDIA GTX 1070, Keras 2.1.6, Android Studio 3.1.2, OpenCV 3.4.

I. Collecting the dataset

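As for the dataset, a ready-made one is best if you can find it, but more often you have to collect it yourself, which is what I did. Real thanks go to my good friends 蔣雯, 宋俞璋, 彭仲俊, 張蒙, 袁程, 邢守一 and 鄭超, and of course to my girlfriend. Together they helped me take 215 photos of hand gestures. One point is very important: when naming the images during collection, the file name must encode the image's label, because supervised training needs a label for every image and the file name is the only place it is recorded. For example, my naming rule is as follows (the original illustration is not reproduced here; the conversion code in section II reads the label as the integer before the first underscore in the file name):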
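The reason the naming rule matters can be shown in a few lines. This is a minimal sketch, assuming names of the form `<label>_<anything>.jpg`; the exact pattern and the helper `label_from_filename` are hypothetical, and the only thing taken from this post is that the integer before the first underscore is the label.

```python
def label_from_filename(filename, num_classes=11):
    """Return the gesture label encoded before the first underscore.

    Hypothetical helper: the file-name pattern is an assumption.
    """
    label = int(filename.split('_')[0])
    if not 0 <= label < num_classes:
        raise ValueError("label out of range in file name: {}".format(filename))
    return label


# File names below are made up for illustration.
print(label_from_filename("7_0003.jpg"))
print(label_from_filename("10_0001_0_482.jpg"))
```

Because the augmentation step reuses the original file name as the prefix of every generated image, a label placed first in the name survives augmentation unchanged.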

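Training a model on so few photos is a fantasy, so I had to bring out data augmentation. Using rotation, translation, stretching and similar operations, each photo is turned into 100 images, which brings the total to 21,500. The data augmentation code: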

from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import os

datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.15,
    height_shift_range=0.15,
    zoom_range=0.15,
    shear_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

dirs = os.listdir("picture")
print(len(dirs))
for filename in dirs:
    img = load_img(os.path.join("picture", filename))
    x = img_to_array(img)
    # print(x.shape)
    x = x.reshape((1,) + x.shape)  # datagen.flow expects a rank-4 array
    # print(x.shape)
    datagen.fit(x)
    prefix = filename.split('.')[0]  # keep the label-bearing name as the prefix
    print(prefix)
    counter = 0
    # The output directory 'generater_pic' must already exist
    for batch in datagen.flow(x, batch_size=4, save_to_dir='generater_pic',
                              save_prefix=prefix, save_format='jpg'):
        counter += 1
        if counter >= 100:
            break  # the generator loops forever, so stop after 100 images

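II. Processing the dataset

1. Resizing the images

Next, these 21,500 photos have to be processed. The first step is to resize every photo to 64*64, for the following reasons: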

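Photos taken with different phones come in different sizes and need to be made uniform.
High-resolution photos from a phone are too large for the limited GPU memory, so they need to be shrunk.
Photos taken by the app through the phone camera differ from one phone model to another and need to be made uniform.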

An image cannot simply be shrunk by discarding pixels, because that distorts it badly, so a resizing algorithm is needed. TensorFlow already provides four: bilinear interpolation, nearest neighbor interpolation, bicubic interpolation and area interpolation. I used area interpolation. The code:

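import os
import scipy.misc  # imsave is only available in older SciPy releases
import tensorflow as tf  # TensorFlow 1.x API

# Resize the images of one class folder to 64*64
def resize_img():
    # Build the graph once and feed the raw JPEG bytes of each file,
    # so the graph does not grow with every image
    raw = tf.placeholder(tf.string)
    img_data = tf.image.decode_jpeg(raw, channels=3)
    image_float = tf.image.convert_image_dtype(img_data, tf.float32)
    # ResizeMethod.AREA is the same value as method=3
    resized = tf.image.resize_images(image_float, [64, 64],
                                     method=tf.image.ResizeMethod.AREA)
    with tf.Session() as sess:
        for filename in os.listdir("split_pic/6"):
            with tf.gfile.FastGFile("split_pic/6/{}".format(filename), 'rb') as f:
                im = f.read()
            resized_im = sess.run(resized, feed_dict={raw: im})
            scipy.misc.imsave("resized_img6/{}".format(filename), resized_im)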
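To make area interpolation less abstract: when an image is shrunk by a whole-number factor, each output pixel's footprint covers exactly one block of input pixels, so area interpolation reduces to averaging that block. The sketch below shows only this special case on a tiny grayscale image in pure Python. It is not TensorFlow's implementation, `area_downscale` is a hypothetical helper, and the fractional-overlap weighting needed for arbitrary sizes is left out.

```python
def area_downscale(img, factor):
    """Downscale a 2-D grayscale image (a list of rows) by an integer factor,
    averaging each factor x factor block of input pixels."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be a multiple of factor")
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


image = [
    [0, 2, 10, 10],
    [4, 6, 10, 10],
    [1, 1, 0, 8],
    [1, 1, 8, 0],
]
print(area_downscale(image, 2))  # [[3.0, 10.0], [1.0, 4.0]]
```

Compared with picking a single pixel per block, averaging uses every input pixel, which is why it copes better with large shrink ratios such as a phone photo going down to 64*64.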

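2. Converting the images to an .h5 file

I won't go over the advantages of .h5 files here. Each image is first converted to an RGB matrix, that is, a 64*64*3 matrix (3 channels because the images are in color). No normalization is applied at this stage: I think normalization belongs in the code that uses the data, and baking it into the stored dataset is rigid and inflexible. When the matrices are written to the .h5 file, each label must line up with its image (matrix). Straight to the code: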

import os
import h5py
import numpy as np
from PIL import Image

# Convert the images to an .h5 file
def image_to_h5():
    dirs = os.listdir("resized_img")
    Y = []  # labels
    X = []  # image data
    print(len(dirs))
    for filename in dirs:
        label = int(filename.split('_')[0])  # the label is encoded in the file name
        Y.append(label)
        im = Image.open("resized_img/{}".format(filename)).convert('RGB')
        mat = np.asarray(im)  # image to matrix
        X.append(mat)

    file = h5py.File("dataset/data.h5", "w")
    file.create_dataset('X', data=np.array(X))
    file.create_dataset('Y', data=np.array(Y))
    file.close()

    # test
    # data = h5py.File("dataset/data.h5", "r")
    # X_data = data['X']
    # print(X_data.shape)
    # Y_data = data['Y']
    # print(Y_data[123])
    # image = Image.fromarray(X_data[123])  # matrix back to image, then display
    # image.show()

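3. Training the model

Next comes training. The dataset is first split into a training set and a test set, then normalized, and the labels are converted to one-hot vectors. The code: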

import h5py
import numpy as np
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

# Load the dataset
def load_dataset():
    # Split into training and test sets
    data = h5py.File("dataset/data.h5", "r")
    X_data = np.array(data['X'])  # data['X'] is an h5py dataset; convert it to an array
    Y_data = np.array(data['Y'])
    # print(type(X_data))
    X_train, X_test, y_train, y_test = train_test_split(
        X_data, Y_data, train_size=0.9, test_size=0.1, random_state=22)
    # print(y_train[456])
    # image = Image.fromarray(X_train[456])
    # image.show()
    print(X_train.shape)
    X_train = X_train / 255.  # normalize to [0, 1]
    X_test = X_test / 255.
    # one-hot
    y_train = np_utils.to_categorical(y_train, num_classes=11)
    print(y_train.shape)
    y_test = np_utils.to_categorical(y_test, num_classes=11)
    print(y_test.shape)
    return X_train, X_test, y_train, y_test
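For readers who have not met the two transformations in `load_dataset`, here is what they do to a handful of values. This is a pure-Python sketch for illustration; `to_one_hot` and `normalize` are hypothetical helpers that mimic the effect of `to_categorical` and the division by 255 on small lists, not replacements for them.

```python
def to_one_hot(labels, num_classes):
    """Turn integer labels into one-hot rows, e.g. 2 -> [0, 0, 1, 0, ...]."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError("label {} is outside 0..{}".format(label, num_classes - 1))
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def normalize(pixels):
    """Scale 8-bit pixel values from 0..255 to 0..1."""
    return [p / 255.0 for p in pixels]


print(to_one_hot([0, 10], num_classes=11))
print(normalize([0, 51, 255]))
```

With 11 gestures, the label 10 becomes a vector of ten zeros followed by a one, which matches the 11-unit softmax output of the network.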

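Now the CNN itself. I used the simplest LeNet-5-style network: two convolutional layers, two pooling layers, one fully connected layer and a softmax output layer. The small tricks used are dropout, ReLU, L2 regularization, mini-batches and Adam. See the code for the details: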

import math
import time

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
from tensorflow.python.framework import graph_util


def weight_variable(shape):
    tf.set_random_seed(1)
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))


def bias_variable(shape):
    return tf.Variable(tf.constant(0.0, shape=shape))


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(z):
    return tf.nn.max_pool(z, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


def random_mini_batches(X, Y, mini_batch_size=16, seed=0):
    """
    Creates a list of random minibatches from (X, Y)

    Arguments:
    X -- input data, of shape (number of examples, 64, 64, 3)
    Y -- one-hot labels, of shape (number of examples, 11)
    mini_batch_size -- size of the mini-batches, integer
    seed -- seed for the shuffle, so that runs are reproducible

    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    m = X.shape[0]  # number of training examples
    mini_batches = []
    np.random.seed(seed)

    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[permutation]
    shuffled_Y = Y[permutation, :].reshape((m, Y.shape[1]))

    # Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
    num_complete_minibatches = math.floor(m / mini_batch_size)  # number of full mini-batches
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[k * mini_batch_size: k * mini_batch_size + mini_batch_size]
        mini_batch_Y = shuffled_Y[k * mini_batch_size: k * mini_batch_size + mini_batch_size]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    # Handling the end case (last mini-batch < mini_batch_size)
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[num_complete_minibatches * mini_batch_size: m]
        mini_batch_Y = shuffled_Y[num_complete_minibatches * mini_batch_size: m]
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)

    return mini_batches


def cnn_model(X_train, y_train, X_test, y_test, keep_prob, lamda, num_epochs=450, minibatch_size=16):
    X = tf.placeholder(tf.float32, [None, 64, 64, 3], name="input_x")
    y = tf.placeholder(tf.float32, [None, 11], name="input_y")
    kp = tf.placeholder_with_default(1.0, shape=(), name="keep_prob")
    lam = tf.placeholder(tf.float32, name="lamda")

    # conv1
    W_conv1 = weight_variable([5, 5, 3, 32])
    b_conv1 = bias_variable([32])
    z1 = tf.nn.relu(conv2d(X, W_conv1) + b_conv1)
    maxpool1 = max_pool_2x2(z1)  # after max_pool1 the shape is [?,32,32,32]

    # conv2
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    z2 = tf.nn.relu(conv2d(maxpool1, W_conv2) + b_conv2)
    maxpool2 = max_pool_2x2(z2)  # max_pool2, shape [?,16,16,64]

    # conv3: one of the better models had no such layer, only two
    # convolutional layers, 100 hidden units, trained for 20 epochs
    # W_conv3 = weight_variable([5, 5, 64, 128])
    # b_conv3 = bias_variable([128])
    # z3 = tf.nn.relu(conv2d(maxpool2, W_conv3) + b_conv3)
    # maxpool3 = max_pool_2x2(z3)  # max_pool3, shape [?,8,8,128]

    # full connection1
    W_fc1 = weight_variable([16 * 16 * 64, 200])
    b_fc1 = bias_variable([200])
    maxpool2_flat = tf.reshape(maxpool2, [-1, 16 * 16 * 64])
    z_fc1 = tf.nn.relu(tf.matmul(maxpool2_flat, W_fc1) + b_fc1)
    z_fc1_drop = tf.nn.dropout(z_fc1, keep_prob=kp)

    # softmax layer
    W_fc2 = weight_variable([200, 11])
    b_fc2 = bias_variable([11])
    z_fc2 = tf.add(tf.matmul(z_fc1_drop, W_fc2), b_fc2, name="outlayer")
    prob = tf.nn.softmax(z_fc2, name="probability")

    # cost function
    regularizer = tf.contrib.layers.l2_regularizer(lam)
    regularization = regularizer(W_fc1) + regularizer(W_fc2)
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=z_fc2)) + regularization
    train = tf.train.AdamOptimizer().minimize(cost)

    # The output node is named "predict" so it can be referenced when saving the .pb file
    pred = tf.argmax(prob, 1, output_type="int32", name="predict")
    correct_prediction = tf.equal(pred, tf.argmax(y, 1, output_type='int32'))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    tf.set_random_seed(1)  # to keep consistent results
    seed = 0
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(num_epochs):
            seed = seed + 1
            epoch_cost = 0.
            num_minibatches = int(X_train.shape[0] / minibatch_size)
            minibatches = random_mini_batches(X_train, y_train, minibatch_size, seed)
            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch
                _, minibatch_cost = sess.run(
                    [train, cost],
                    feed_dict={X: minibatch_X, y: minibatch_Y, kp: keep_prob, lam: lamda})
                epoch_cost += minibatch_cost / num_minibatches
            if epoch % 10 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
                print(time.strftime('%Y-%m-%d %H:%M:%S'))

        # accuracy is the tensor defined above; tensor.eval() and Session.run differ very little.
        # keep_prob defaults to 1.0, so dropout is disabled during evaluation.
        train_acc = accuracy.eval(feed_dict={X: X_train[:1000], y: y_train[:1000], lam: lamda})
        print("train accuracy", train_acc)
        test_acc = accuracy.eval(feed_dict={X: X_test[:1000], y: y_test[:1000], lam: lamda})
        print("test accuracy", test_acc)

        # save model
        saver = tf.train.Saver({'W_conv1': W_conv1, 'b_conv1': b_conv1,
                                'W_conv2': W_conv2, 'b_conv2': b_conv2,
                                'W_fc1': W_fc1, 'b_fc1': b_fc1,
                                'W_fc2': W_fc2, 'b_fc2': b_fc2})
        saver.save(sess, "model_500_200_c3/cnn_model.ckpt")

        # Save the trained model as a .pb file so it can be used in Android Studio
        output_graph_def = graph_util.convert_variables_to_constants(
            sess, sess.graph_def, output_node_names=['predict'])
        # In 'wb', w means write and b means binary mode
        with tf.gfile.FastGFile('model_500_200_c3/digital_gesture.pb', mode='wb') as f:
            f.write(output_graph_def.SerializeToString())
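The shape comments in the model code can be checked with a little arithmetic. The sketch below follows a 64*64*3 input through the two SAME-padded convolutions (stride 1, size unchanged) and the two 2*2 max-pools (stride 2, size halved), then counts the trainable parameters. The helper names are hypothetical; the layer sizes are the ones in `cnn_model`.

```python
import math


def same_out(size, stride):
    """Output size of a SAME-padded convolution or pooling op."""
    return math.ceil(size / stride)


def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


size = 64
size = same_out(same_out(size, 1), 2)  # conv1 keeps 64, pool1 gives 32
size = same_out(same_out(size, 1), 2)  # conv2 keeps 32, pool2 gives 16
flat = size * size * 64                # length of maxpool2_flat

params = {
    "conv1": conv_params(5, 3, 32),
    "conv2": conv_params(5, 32, 64),
    "fc1": dense_params(flat, 200),
    "fc2": dense_params(200, 11),
}
total = sum(params.values())
print(size, flat, params, total)
```

Almost all of the roughly 3.3 million parameters sit in the first fully connected layer, which is also one of the two layers the L2 regularizer is applied to.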

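There is one extremely important thing to watch out for here. For the details, see "2. Notes on model training" in my previous post, "Migrating a trained TensorFlow model to an Android app (TensorFlowLite)". Training the whole model takes only a few hours; hyperparameter tuning, of course, is more of an art, and I won't go into it.
A small aside: one epoch takes 2 minutes on an i7-7700K, 36 seconds on a GTX 750 Ti and 6 seconds on a GTX 1070. Thanks again to 宋俞璋 for his beast of a machine. For how to set up the TensorFlow GPU environment, see my post "ubuntu16.04 + GTX750ti + python3.6.5: configuring cuda9.0 + cudnn7.05 + TensorFlow-gpu1.8.0".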
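To put the per-epoch timings in perspective, here is a rough estimate of a full run on each device. It assumes the default of 450 epochs from `cnn_model`; the number of epochs actually used for the final model is not stated in this post, so treat the totals as illustrative only.

```python
def training_time_hours(seconds_per_epoch, num_epochs=450):
    """Total training time in hours for a fixed time per epoch."""
    return seconds_per_epoch * num_epochs / 3600.0


# Seconds per epoch as quoted in the post
timings = [("i7-7700K", 120), ("GTX 750 Ti", 36), ("GTX 1070", 6)]
for device, seconds in timings:
    print("{}: {:.2f} h".format(device, training_time_hours(seconds)))
```

Under that assumption the gap is between most of a day on the CPU and under an hour on the GTX 1070, which is what makes repeated tuning runs practical.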

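Performance of the trained model: [screenshot not preserved]

On the app, however, the conditions are more complex, and the accuracy is nowhere near this high.

A casual test on the PC: [screenshot not preserved]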

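4. Calling the trained model in Android Studio

For how to move the model into Android Studio, see my previous post, "Migrating a trained TensorFlow model to an Android app (TensorFlowLite)". Here I will explain why OpenCV is needed. It all comes down to image resizing. Remember the area interpolation mentioned above? Unlike bilinear interpolation and the like, I could not find a Java implementation of it online. With no other option, I went carefully through the TensorFlow API documentation and found this passage: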

Each output pixel is computed by first transforming the pixel’s footprint into the input tensor and then averaging the pixels that intersect the footprint. An input pixel’s contribution to the average is weighted by the fraction of its area that intersects the footprint. This is the same as OpenCV’s INTER_AREA.

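That is why OpenCV is used. Configuring OpenCV in Android Studio is also full of pitfalls; for the details, see my post "Configuring OpenCV in Android Studio". One thing I will mention here: the TensorFlow Android inference library (TensorFlowInferenceInterface, the API my earlier post files under the TensorFlowLite name) exposes only a few simple interfaces. I covered them in the previous post too, but they are worth repeating. The interfaces (from the official site: "Here's what a typical Inference Library sequence looks like on Android."):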

// Load the model from disk.
TensorFlowInferenceInterface inferenceInterface =
    new TensorFlowInferenceInterface(assetManager, modelFilename);

// Copy the input data into TensorFlow.
inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);

// Run the inference call.
inferenceInterface.run(outputNames, logStats);

// Copy the output Tensor back into the output array.
inferenceInterface.fetch(outputName, outputs);

The comments already explain what each call does, so I won't say more.
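One detail the `feed` call implies: `floatValues` is a flat array of `1 * inputSize * inputSize * 3` floats, and since training divided the pixels by 255, the app's input presumably has to be scaled the same way. The sketch below shows that conversion in Python, to stay consistent with the rest of this post. It assumes the pixels arrive as packed ARGB integers, which is the layout Android's `Bitmap.getPixels` produces. The app's real code is in the GitHub repository and may differ; `argb_to_floats` is a hypothetical helper.

```python
def argb_to_floats(pixels):
    """Convert packed 0xAARRGGBB pixels to a flat [r, g, b, r, g, b, ...]
    list scaled to 0..1, dropping the alpha channel."""
    values = []
    for p in pixels:
        values.append(((p >> 16) & 0xFF) / 255.0)  # red
        values.append(((p >> 8) & 0xFF) / 255.0)   # green
        values.append((p & 0xFF) / 255.0)          # blue
    return values


# One opaque red pixel and one opaque white pixel
print(argb_to_floats([0xFFFF0000, 0xFFFFFFFF]))

# A 64x64 image yields 64 * 64 * 3 values, matching feed(..., 1, 64, 64, 3)
print(len(argb_to_floats([0xFF000000] * (64 * 64))))
```

If the channel order or the scaling on the phone does not match what the network saw during training, accuracy drops even when the model file itself is fine.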

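I'm not sure whether OpenCV's area interpolation is implemented differently from TensorFlow's or whether something else is going on, but the results on the app always feel worse than the model's performance on the PC. It may just be my impression.
There isn't much more to say about the Android app code. It is all on GitHub at Chinese-number-gestures-recognition; stars are welcome, haha.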

Here are a few screenshots from testing [not preserved]; more results are shown on GitHub: Chinese-number-gestures-recognition
