TensorFlow: Generating and Reading TFRecord Files

Published: 2023/12/20

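WeChat official account: 幼兒園的學霸
Personal study notes on OpenCV, machine learning, and related topics. For questions or suggestions, please leave a message on the official account.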

I. Why use TFRecord

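The official TensorFlow documentation describes three ways to get data into a program:

Feeding: Python code supplies the data at every step of the run.
Reader: at the start of a computation graph (tf.Graph), an input pipeline reads files into a queue.
Preloaded data: the data is held in a tf.Variable or a NumPy array declared up front. This is limited by memory size and only suits small datasets.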

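When we first learn TensorFlow, almost every example uses the first or third method, because the datasets in those examples are small. With a large dataset stored as many scattered files, these methods become inefficient: the files take up extra disk space, reading them one by one is slow and tedious, and a dataset that does not fit in memory cannot be loaded in one go. This is where the second method pays off, because it reads binary files, and a computer generally reads a packed binary file faster than it parses many small formatted files.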

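TFRecord is a file format built into TensorFlow. It is a binary format with the following advantages:

It gives different kinds of input files a single, uniform framework.
It makes better use of memory and is easier to copy and move. A TFRecord file holds records serialized with protocol buffers in one contiguous binary file that is read sequentially, which is simple and fast and works especially well for large training sets. When the training set is large, it can also be split across several TFRecord files to improve processing efficiency.
It stores the binary data and its labels (the training class labels) in the same file.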
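The idea of splitting a large training set across several TFRecord files can be sketched in plain Python. This is only an illustration of the bookkeeping: the helper names, the round-robin assignment, and the `prefix-00000-of-00005.tfrecord` naming pattern are assumptions made here, not something the article or TensorFlow prescribes, and no TensorFlow calls are involved.

```python
def shard_filename(prefix, index, num_shards):
    """Build a shard name such as 'train-00002-of-00005.tfrecord' (naming is an assumption)."""
    return "{}-{:05d}-of-{:05d}.tfrecord".format(prefix, index, num_shards)


def assign_to_shards(samples, num_shards, prefix="train"):
    """Distribute samples round-robin over num_shards output file names."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = {shard_filename(prefix, i, num_shards): [] for i in range(num_shards)}
    names = list(shards)
    for position, sample in enumerate(samples):
        shards[names[position % num_shards]].append(sample)
    return shards


# Seven hypothetical image names spread over three shards.
plan = assign_to_shards(["img{}.jpg".format(i) for i in range(7)], num_shards=3)
for name, members in plan.items():
    print(name, members)
```

Each list in `plan` would then be written by its own writer, so that the resulting files can be read independently or in parallel.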

II. Generating a TFRecord file

Storing other data as a TFRecord file takes two steps:

Create a TFRecord writer.
Build an Example for each sample.

1. The TFRecord writer

writer = tf.python_io.TFRecordWriter(record_path)
# for each sample:
#     writer.write(tf_example.SerializeToString())
# ...
writer.close()

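Here writer is our TFRecord writer, and the input parameter record_path is the path where the TFRecord file will be stored.
Once the writer has been created, call its write() method to append one string record (one sample) to the file. Call it repeatedly, once per sample, and finally call close() to end the write and close the file.
The argument to writer.write() is a serialized Example, produced by Example.SerializeToString(), which encodes the feature map held in the Example as a compact binary string. The Example itself is built with the Example module.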

2. The Example module

First, let us look at how the Example protocol buffer is defined.

message Example {
  Features features = 1;
};

message Features {
  map<string, Feature> feature = 1;
};

message Feature {
  oneof kind {
    BytesList bytes_list = 1;
    FloatList float_list = 2;
    Int64List int64_list = 3;
  }
};

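As the definition shows, tf.train.Example stores data as a dictionary: the keys are strings, and each value holds one of three types: bytes, float, or int64. In detail:
(1) tf.train.Example(features=None)

The message that is written to the TFRecord file
features: a tf.train.Features instance
return: an Example protocol message

(2) tf.train.Features(feature=None)

Holds the key-value pairs that describe one sample
feature: a dictionary whose keys are the names to store and whose values are tf.train.Feature instances
return: a Features message

(3) tf.train.Feature(**options)
options can be one of the following three kinds of data:

bytes_list = tf.train.BytesList(value=[Bytes])
int64_list = tf.train.Int64List(value=[Value])
float_list = tf.train.FloatList(value=[Value])

So how do we build a tf_example? Here is a simple example: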

import collections.abc

import tensorflow as tf

def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def int64_list_feature(value):
    # collections.abc.Iterable replaces collections.Iterable, which was removed in Python 3.10
    if not isinstance(value, collections.abc.Iterable):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
    # value is expected to be a NumPy array, such as an image returned by cv2.imread
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.tobytes()]))

# image, xmins, ymins, xmaxs, ymaxs and classes are prepared by the caller
tf_example = tf.train.Example(
    # key-value form
    features=tf.train.Features(feature={
        'image/image': bytes_feature(image),
        'image/shape': int64_list_feature(list(image.shape)),
        "bbox/xmins": int64_list_feature(xmins),
        "bbox/ymins": int64_list_feature(ymins),
        "bbox/xmaxs": int64_list_feature(xmaxs),
        "bbox/ymaxs": int64_list_feature(ymaxs),
        'image/classes': int64_list_feature(classes),
    }))
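To make the choice between the three Feature value types more concrete, here is a small plain-Python sketch that decides which list a given Python value would belong to. It only mirrors the bytes / float / int64 rule described above and does not call TensorFlow; the function name `feature_kind` and the key `bbox/score` are hypothetical and introduced purely for illustration.

```python
def feature_kind(value):
    """Return the name of the Feature list a Python value (or list of values) would use."""
    values = value if isinstance(value, (list, tuple)) else [value]
    if not values:
        raise ValueError("cannot infer a kind from an empty list")
    if all(isinstance(v, (bytes, bytearray)) for v in values):
        return "bytes_list"
    # bool is a subclass of int, so booleans land in int64_list here.
    if all(isinstance(v, int) for v in values):
        return "int64_list"
    if all(isinstance(v, (int, float)) for v in values):
        return "float_list"
    # str is rejected on purpose: text must be encoded to bytes first.
    raise TypeError("unsupported or mixed value types: {!r}".format(values))


sample = {
    "image/image": b"\x00\x01\x02",   # raw image bytes
    "image/shape": [413, 413, 3],     # list of ints
    "bbox/score": 0.87,               # hypothetical float field, not in the article
}
kinds = {key: feature_kind(val) for key, val in sample.items()}
print(kinds)
```

A mixed list such as `[1, 2.5]` is classified as `float_list`, which is why integer-only fields such as class ids are best kept as pure ints before they are wrapped.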

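3. Complete example: generating a TFRecord file

Code and image paths: https://github.com/leonardohaig/yolov3_tensorflow/blob/master/generate_tfrecord.py

1) Prepare a folder of images. I used the raccoon dataset here.
2) Prepare the label file, in the following format:

xxx/xxx.jpg 18.19,6.32,424.13,421.83,20 323.86,2.65,640.0,421.94,20
xxx/xxx.jpg 48,240,195,371,11 8,12,352,498,14
# image_path x_min, y_min, x_max, y_max, class_id x_min, y_min ,..., class_id

Each line gives the image path, followed by one group per bounding box: the coordinates of its top-left and bottom-right corners and its class id.
There are two such files, one for the training set and one for the validation set. They are already prepared in this project under the data/classes folder, as data/classes/train_yoloTF.txt and test_yoloTF.txt.
3) Generate the TFRecord file with generate_tfrecord.py: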

import os
import collections.abc
import sys

import cv2
import tensorflow as tf


def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def int64_list_feature(value):
    # collections.abc.Iterable replaces collections.Iterable, which was removed in Python 3.10
    if not isinstance(value, collections.abc.Iterable):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value.tobytes()]))


def create_tf_example(annotation):
    '''
    Create one record in tf_example format.
    :param annotation: str, one line of the label file: image path, box position, class, ...
    :return: tf.train.Example
    '''
    line = annotation.split()
    image_path = line[0]
    assert os.path.exists(image_path), '{} not exist !'.format(image_path)

    xmins = []
    ymins = []
    xmaxs = []
    ymaxs = []
    classes = []
    for content in line[1:]:
        # Convert to a list of ints. Going through float() accepts decimal
        # coordinates such as 18.19, which int() alone would reject.
        content = [int(float(v)) for v in content.split(',')]
        xmins.append(content[0])
        ymins.append(content[1])
        xmaxs.append(content[2])
        ymaxs.append(content[3])
        classes.append(content[4])

    image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
    # Note: the box coordinates are stored as read and are not rescaled to the resized image.
    image = cv2.resize(image, (413, 413), interpolation=cv2.INTER_LINEAR)

    tf_example = tf.train.Example(
        # key-value form
        features=tf.train.Features(feature={
            'image/image': bytes_feature(image),
            'image/shape': int64_list_feature(list(image.shape)),
            "bbox/xmins": int64_list_feature(xmins),
            "bbox/ymins": int64_list_feature(ymins),
            "bbox/xmaxs": int64_list_feature(xmaxs),
            "bbox/ymaxs": int64_list_feature(ymaxs),
            'image/classes': int64_list_feature(classes),
        }))
    # print(tf_example)
    return tf_example


def generate_tfrecord(labelFile, recordPath):
    '''
    :param labelFile: path of the label file
    :param recordPath: path where the TFRecord file will be stored
    :return: True on success
    '''
    # Absolute path of the directory that will contain the record file
    file_dir = os.path.dirname(os.path.abspath(recordPath))
    assert os.path.exists(file_dir), '{} not exist !'.format(file_dir)

    with open(labelFile, 'r') as file:
        # writer = tf.python_io.TFRecordWriter(recordPath)
        writer = tf.io.TFRecordWriter(recordPath)
        for line in file.readlines():
            if not line.strip():
                continue  # skip blank lines
            # split() inside create_tf_example already drops the trailing '\n'
            tf_example = create_tf_example(line)
            writer.write(tf_example.SerializeToString())
        writer.close()
    return True


if __name__ == '__main__':
    # Generate the TFRecord file
    generate_tfrecord('/home/liheng/PycharmProjects/yolov3_tensorflow/data/classes/test_yoloTF.txt',
                      './test.tfrecord')
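The label-line format described earlier can also be parsed and checked on its own, without OpenCV or TensorFlow. The following is a minimal sketch under the assumption that every box is exactly five comma-separated values; the function name `parse_annotation` is hypothetical. Coordinates are parsed as floats because the sample line contains decimals such as 18.19.

```python
def parse_annotation(line):
    """Split 'path x1,y1,x2,y2,cls x1,y1,x2,y2,cls ...' into (path, boxes)."""
    fields = line.split()
    if not fields:
        raise ValueError("empty annotation line")
    image_path, boxes = fields[0], []
    for field in fields[1:]:
        parts = field.split(",")
        if len(parts) != 5:
            raise ValueError("expected 5 comma-separated values, got {!r}".format(field))
        x_min, y_min, x_max, y_max = (float(p) for p in parts[:4])
        boxes.append((x_min, y_min, x_max, y_max, int(parts[4])))
    return image_path, boxes


path, boxes = parse_annotation(
    "xxx/xxx.jpg 18.19,6.32,424.13,421.83,20 323.86,2.65,640.0,421.94,20")
print(path)
for box in boxes:
    print(box)
```

Validating each line this way before writing records makes malformed label lines fail early, with a message that names the offending field.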

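Note: an image saved this way is stored in the TFRecord as a raw byte string, and decoding it gives back only a flat one-dimensional tensor. The shape therefore has to be saved alongside it so that the image can be restored.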
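Why the shape has to travel with the bytes can be shown with a tiny plain-Python round trip. This is only a sketch: nested lists stand in for the image array, and the helper names `flatten_image` and `restore_image` are hypothetical.

```python
def flatten_image(pixels):
    """Flatten a height x width x channels nested list of 0-255 ints into bytes."""
    height, width, channels = len(pixels), len(pixels[0]), len(pixels[0][0])
    flat = bytes(v for row in pixels for pixel in row for v in pixel)
    return flat, (height, width, channels)


def restore_image(raw, shape):
    """Rebuild the nested list from raw bytes using the saved shape."""
    height, width, channels = shape
    if len(raw) != height * width * channels:
        raise ValueError("byte count does not match shape")
    values = iter(raw)
    return [[[next(values) for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]


image = [[[0, 1, 2], [3, 4, 5]],
         [[6, 7, 8], [9, 10, 11]]]          # a 2 x 2 x 3 "image"
raw, shape = flatten_image(image)
restored = restore_image(raw, shape)
print(len(raw), shape, restored == image)

# The same 12 bytes fit other shapes too, so the bytes alone are ambiguous.
other = restore_image(raw, (1, 4, 3))
print(other == image)
```

The byte string carries no layout information, which is the reason the article stores `image/shape` as a separate feature next to `image/image`.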

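III. Reading a TFRecord file

1. Basic workflow

Reading a file follows much the same workflow as creating one, with an extra parsing step in the middle.
1) Read the TFRecord file (test.tfrecord in this article) into a file-name queue, as follows: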

filename_queue = tf.train.string_input_producer([tfrecords_filename])

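tf.train.string_input_producer creates an input file-name queue. Here the input list holds a single file, [path], but when the training data is large it should be split across several TFRecord files to improve processing efficiency.
For example, the Cifar10 example splits the training set into 5 .bin files and builds the list of training input files with the list comprehension below. Since TensorFlow encourages splitting training data across multiple files, it also provides tf.train.match_filenames_once, which returns the list of files in a directory that match a glob pattern (a shell-style wildcard, not a regular expression).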

filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i) for i in range(1, 6)]
filenames = tf.train.match_filenames_once('data_batch_*')
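The glob-style matching that tf.train.match_filenames_once performs can be previewed with Python's standard fnmatch module. This sketch only imitates the wildcard semantics on a list of names; it does not touch the file system or TensorFlow, and the helper name `match_filenames` and the file listing are made up for the illustration.

```python
import fnmatch


def match_filenames(names, pattern):
    """Return the names that match a shell-style glob pattern, sorted."""
    return sorted(fnmatch.filter(names, pattern))


# Hypothetical directory listing.
listing = ["data_batch_2.bin", "data_batch_1.bin", "test_batch.bin", "readme.txt"]
matched = match_filenames(listing, "data_batch_*")
print(matched)
```

Checking a pattern this way before building the input queue is a cheap guard against a typo that silently matches no files.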

2) Read from the file-name queue with a TFRecordReader:

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)  # returns (key, value): a record key and the serialized Example

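3) Parse the Example with the parser tf.parse_single_example.
tf.parse_example can be used instead. The difference is that tf.parse_single_example parses a single serialized Example, while tf.parse_example parses a batch of them.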

2. Code example

Code path: https://github.com/leonardohaig/yolov3_tensorflow/blob/master/generate_tfrecord.py

import collections.abc

import cv2
import tensorflow as tf


def read_tfrecord(batchsize, recordFileList):
    '''
    Read image data from TFRecord files (parse the Examples).
    :param batchsize: number of records per batch
    :param recordFileList: list of TFRecord file paths
    :return: image tensor, box coordinates and box classes
    '''
    assert isinstance(recordFileList, collections.abc.Iterable), 'param recordFileList need type list!'

    # 1. Build the file-name queue; the argument is the list of file names
    filename_queue = tf.train.string_input_producer(recordFileList, num_epochs=None, shuffle=True)

    # 2. Build the reader
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)  # returns a record key and the serialized Example

    # 3. Batch. Batching is placed before parsing here
    batch = tf.train.shuffle_batch([serialized_example],
                                   batch_size=batchsize, capacity=batchsize * 5,
                                   min_after_dequeue=batchsize * 2, num_threads=1)

    # 4. Parse the protocol messages into a dictionary.
    # tf.io.parse_example is used, so the returned tensors have a batch dimension
    _feature = {'image/image': tf.io.FixedLenFeature([], tf.string),
                'image/shape': tf.io.FixedLenFeature([3], dtype=tf.int64),
                'bbox/xmins': tf.io.VarLenFeature(dtype=tf.int64),
                'bbox/ymins': tf.io.VarLenFeature(dtype=tf.int64),
                'bbox/xmaxs': tf.io.VarLenFeature(dtype=tf.int64),
                'bbox/ymaxs': tf.io.VarLenFeature(dtype=tf.int64),
                'image/classes': tf.io.VarLenFeature(dtype=tf.int64)}
    features = tf.io.parse_example(batch, features=_feature)

    # Image shape information
    image_shape = features['image/shape']

    # The image is stored as a string tensor and has to be decoded into numbers.
    # Data written with tobytes() (formerly tostring()) must be converted back with decode_raw(),
    # which turns string/bytes data into int or float types.
    image_raw = features['image/image']  # Get the image as raw bytes.
    image_tensor = tf.decode_raw(image_raw, tf.uint8)  # Decode the raw bytes so it becomes a tensor with type.
    # Reshape with the dynamic shape; this assumes every image in the batch has the same shape
    image_tensor = tf.reshape(image_tensor,
                              shape=[batchsize, image_shape[0][0], image_shape[0][1], image_shape[0][2]])
    # The type is now uint8 but we need it to be float.
    image_tensor = tf.image.convert_image_dtype(image_tensor, dtype=tf.float32)

    bbox_xmins = features['bbox/xmins']
    bbox_ymins = features['bbox/ymins']
    bbox_xmaxs = features['bbox/xmaxs']
    bbox_ymaxs = features['bbox/ymaxs']
    bbox_classes = features['image/classes']
    bbox_classes = tf.cast(bbox_classes, dtype=tf.int32)

    # VarLenFeature fields are sparse tensors; convert them to dense ones
    bbox_xmins = tf.sparse.to_dense(bbox_xmins)
    bbox_ymins = tf.sparse.to_dense(bbox_ymins)
    bbox_xmaxs = tf.sparse.to_dense(bbox_xmaxs)
    bbox_ymaxs = tf.sparse.to_dense(bbox_ymaxs)
    bbox_classes = tf.sparse.to_dense(bbox_classes)

    return image_tensor, bbox_xmins, bbox_ymins, bbox_xmaxs, bbox_ymaxs, bbox_classes


if __name__ == '__main__':
    # # Generate the TFRecord file
    # generate_tfrecord('/home/liheng/PycharmProjects/yolov3_tensorflow/data/classes/test_yoloTF.txt',
    #                   './test.tfrecord')

    # Parse the original data back out of the stored TFRecord file
    image_tensor, bbox_xmins, bbox_ymins, bbox_xmaxs, bbox_ymaxs, bbox_classes = read_tfrecord(4, ['./test.tfrecord'])

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        # Thread coordinator
        coord = tf.train.Coordinator()
        # Start the queue-runner threads
        thread = tf.train.start_queue_runners(sess, coord)

        for i in range(5):
            _image_tensor, _bbox_xmins, _bbox_ymins, _bbox_xmaxs, \
                _bbox_ymaxs, _bbox_classes = sess.run(
                    [image_tensor, bbox_xmins, bbox_ymins, bbox_xmaxs, bbox_ymaxs, bbox_classes])
            print(i, _image_tensor.shape)
            # print(_bbox_xmins)
            cv2.imshow('image0', _image_tensor[0])
            cv2.imshow('image1', _image_tensor[1])
            cv2.waitKey(0)
        cv2.destroyAllWindows()

        # Stop and join the threads
        coord.request_stop()
        coord.join(thread)
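One detail of the reader is easy to miss: the box fields are declared as VarLenFeature, so each image in a batch may contribute a different number of boxes, and tf.sparse.to_dense fills the missing positions with its default value (0 unless another value is passed). The effect on a batch can be imitated in plain Python; the helper name `pad_to_dense` and the sample numbers are illustrative only.

```python
def pad_to_dense(rows, default_value=0):
    """Pad variable-length rows to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [default_value] * (width - len(row)) for row in rows]


# x_min values for a batch of three images with 2, 1 and 3 boxes.
batch_xmins = [[18, 323], [48], [8, 12, 30]]
dense = pad_to_dense(batch_xmins)
for row in dense:
    print(row)
```

Because padded entries look like real zeros, downstream code usually needs some way to tell them apart, for example by also storing the number of boxes per image or by treating boxes with zero area as padding.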

