Tips for Using Common Python Modules

Published: 2024/4/15 · python · 豆豆


Contents

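Tips for Using Common Python Modules
1. Python configuration notes
(1) Python file header comments
(2) Function docstrings
(3) Converting .ipynb files to .py files
(4) Measuring run time in Python
(5) Speeding up downloads with mirrors
(6) Pylint code analysis: installation and PyCharm configuration
(7) Adding environment paths and module search paths in Python
(8) Common conda commands
2. Commonly used modules
2.1 The numpy module
(1) Concatenating and splitting matrices; splitting data into even and odd rows
(2) Sorting by column
(3) Extracting rows and columns that match a condition
(4) Finding rows that match a vector
(5) Shuffling
2.2 The pickle module
2.3 random.shuffle with a fixed seed
2.4 The zip() and zip(*) functions
2.5 Fast iteration with map and for
2.6 The glob module
2.7 The os module
2.8 Detecting missing, empty, or too-small image files
2.9 Saving multidimensional arrays
2.10 Reading txt data
2.11 The pandas module
(1) Concatenating file data
(2) DataFrame
Adding, deleting, modifying, and querying Pandas DataFrame data
2.12 The csv module
2.13 The logging module
3. Data preprocessing
3.1 Splitting data (images) into blocks
3.2 Reading and displaying images
(1) Image-reading modules: matplotlib.image, PIL.Image, cv2
(2) Converting a numpy array to a PIL image
(3) Converting between PIL.Image and OpenCV image formats in Python
(4) matplotlib display blocking
(5) Drawing rectangles with matplotlib
3.3 One-hot encoding
3.4 Generating batch data in a loop
3.5 Counting elements and distinct values
3.6 Sorting a Python dict by key and by value
3.7 Custom sorting with sorted
3.8 Loading yml configuration files
3.9 Moving, copying, and renaming files
3.10 Generating batch_size data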
1. Python configuration notes

(1) Python file header comments

In PyCharm, go to File -> Settings -> Editor -> File and Code Templates -> Python Script:

# -*-coding: utf-8 -*-
"""
    @Project: ${PROJECT_NAME}
    @File   : ${NAME}.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : ${YEAR}-${MONTH}-${DAY} ${HOUR}:${MINUTE}:${SECOND}
"""

(2) Function docstrings

def my_fun(para1, para2):
    '''
    Brief description of what the function does
    :param para1: description of the input parameter, type
    :param para2: description of the input parameter, type
    :return: description of the return value, type
    '''

(3) Converting .ipynb files to .py files

jupyter nbconvert --to script demo.ipynb

(4) Measuring run time in Python

import datetime

def RUN_TIME(deta_time):
    '''
    Return the elapsed time in milliseconds.
    deta_time.seconds is the seconds part (1 s = 1000 ms);
    deta_time.microseconds is the microseconds part (1 us = 1/1000 ms)
    :param deta_time: a datetime.timedelta
    :return: elapsed time in ms
    '''
    time_ = deta_time.seconds * 1000 + deta_time.microseconds / 1000.0
    return time_

T0 = datetime.datetime.now()
# do something
T1 = datetime.datetime.now()
print("run time:{}".format(RUN_TIME(T1 - T0)))
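The timing helper above reads only the `seconds` and `microseconds` fields of the `timedelta`, so any whole days in the interval are ignored. A minimal sketch of two alternatives follows, assuming standard-library tools only; the helper names `elapsed_ms` and `time_call` are hypothetical and not part of the original article.

```python
import time
from datetime import timedelta


def elapsed_ms(delta: timedelta) -> float:
    """Convert a timedelta to milliseconds, including any whole days."""
    return delta.total_seconds() * 1000.0


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


result, ms = time_call(sum, range(1000))
print("run time: {:.3f} ms".format(ms))
```

`time.perf_counter()` is generally preferable to `datetime.now()` for short intervals because it is not affected by system clock adjustments.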

(5) Speeding up downloads with mirrors

TUNA also provides a mirror of the Anaconda repositories. Run the following commands:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

conda config --set show_channel_urls yes

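After setting this mirror, downloads speed up noticeably. Note that conda config writes these settings to the user's .condarc file, so they persist beyond the current command window.

On Windows, create a pip directory under your user directory, create a pip.ini file in it, and copy the Aliyun source into it: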

[global]

trusted-host=mirrors.aliyun.com

index-url = http://mirrors.aliyun.com/pypi/simple/

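On Linux, edit ~/.pip/pip.conf (create the directory and the file if they do not exist; the directory name starts with "." because it is a hidden directory).

The content is as follows:

[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host=mirrors.aliyun.com

On Windows, create a pip directory directly under the user directory, e.g. C:\Users\xx\pip, and create a new file pip.ini there with the same content as above.

Temporary method: add "-i https://mirrors.aliyun.com/pypi/simple/" to the pip command, for example: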

pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/

(6) Pylint code analysis: installation and PyCharm configuration

See: "Pylint code analysis: installation and PyCharm configuration", oohy, cnblogs (博客園)

(7) Adding environment paths and module search paths in Python

Adding an environment path:

# Add the graphviz directory to the PATH environment variable
import os
os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'

Module search path:

import sys
import os

# Print the list of paths Python searches for modules
print(sys.path)
# Print the directory containing the current file
print("os.path.dirname(__file__):", os.path.dirname(__file__))
print("os.getcwd(): ", os.getcwd())  # get the current working directory (cwd)

'''
Add the relevant path:
sys.path.append('directory containing your module')
sys.path.insert(0, 'directory containing your module')
'''
# First add the directory that contains image_processing
sys.path.append("F:/project/python-learning-notes/utils")
# sys.path.append(os.getcwd())
# Then import the module
import image_processing

# os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'
image_path = "F:/project/python-learning-notes/dataset/test_image/1.jpg"
image = image_processing.read_image(image_path)
image_processing.cv_show_image("image", image)

(8) Common conda commands

List all environments: conda info --envs or conda env list
Export an environment.yml file: conda env export > environment.yml
Create an environment from an environment.yml file: conda env create -f environment.yml
List all packages in the active environment: conda list
Remove an environment: conda remove --name your_env_name --all

2. Commonly used modules

2.1 The numpy module

(1) Concatenating and splitting matrices; splitting data into even and odd rows

import numpy as np

# Create a 5x2 matrix
data1 = np.arange(0, 10)
data1 = data1.reshape([5, 2])
# A second 5x2 matrix (added here so that the example runs)
data2 = np.arange(10, 20).reshape([5, 2])
# Concatenate along rows
y = np.concatenate([data1, data2], 0)

# Concatenate along columns
def cat_labels_indexMat(labels, indexMat):
    indexMat_labels = np.concatenate([labels, indexMat], axis=1)
    return indexMat_labels

# Split a matrix
def split_labels_indexMat(indexMat_labels, label_index=0):
    labels = indexMat_labels[:, 0:label_index+1]   # the first column(s) are labels
    indexMat = indexMat_labels[:, label_index+1:]  # the rest is indexMat
    return labels, indexMat

def split_data(data):
    '''
    Split data into even-indexed and odd-indexed rows
    :param data:
    :return:
    '''
    data1 = data[0::2]
    data2 = data[1::2]
    return data1, data2

if __name__ == '__main__':
    data = np.arange(0, 20)
    data = data.reshape([10, 2])
    data1, data2 = split_data(data)
    print("embeddings:{}".format(data))
    print("embeddings1:{}".format(data1))
    print("embeddings2:{}".format(data2))

(2) Sorting by column

pair_issame = pair_issame[np.lexsort(pair_issame.T)]  # sort rows by the last column
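To make the behavior of the `np.lexsort` one-liner concrete, here is a small sketch on made-up data (the array contents are hypothetical). It also shows a stable `argsort` on the last column as an alternative when ties should keep their original order rather than be broken by earlier columns.

```python
import numpy as np

# Hypothetical sample data: each row is (id_a, id_b, is_same)
pair_issame = np.array([[3, 7, 1],
                        [1, 9, 0],
                        [2, 5, 1],
                        [4, 6, 0]])

# np.lexsort treats the LAST key as the primary one. The rows of
# pair_issame.T are the columns of pair_issame, so the last column is the
# primary key and the earlier columns only break ties.
order = np.lexsort(pair_issame.T)
sorted_pairs = pair_issame[order]
print(sorted_pairs)

# Alternative: sort by the last column only, keeping tied rows in input order
stable_order = pair_issame[:, -1].argsort(kind="stable")
stable_pairs = pair_issame[stable_order]
print(stable_pairs)
```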

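(3) Extracting rows and columns that match a condition

Suppose we have an array of data named pair_issame.

To extract the rows whose third (last) column is "1", you can do this:

pair_issame_1 = pair_issame[pair_issame[:, -1] == "1", :]  # filter the array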
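The filter works by building a boolean mask with one entry per row and using it as a row index. A minimal sketch, assuming a small string array whose contents are invented for illustration:

```python
import numpy as np

# Hypothetical string array: two file names and a "same person" flag
pair_issame = np.array([["a1.jpg", "a2.jpg", "1"],
                        ["a1.jpg", "b1.jpg", "0"],
                        ["c1.jpg", "c2.jpg", "1"]])

mask = pair_issame[:, -1] == "1"        # boolean vector, one entry per row
pair_issame_1 = pair_issame[mask, :]    # rows whose flag is "1"
pair_issame_0 = pair_issame[~mask, :]   # the remaining rows

print(mask)
print(pair_issame_1)
```

Because the array holds strings, the comparison must be against "1" and not the integer 1.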

(4) Finding rows that match a vector

import numpy as np

def matching_data_vecror(data, vector):
    '''
    Match vector against the rows of data and return a boolean index of the rows equal to vector, e.g.:
    data = [[1., 0., 0.], [0., 0., 0.], [2., 0., 0.],
            [0., 0., 0.], [0., 3., 0.], [0., 0., 4.]]
    # Find the rows of data equal to [0, 0, 0]
    data = np.asarray(data)
    vector = [0, 0, 0]
    index = matching_data_vecror(data, vector)
    print(index)
    >>[False  True False  True False False]
    # Remove the rows of data equal to [0, 0, 0]
    pair_issame_1 = data[~index, :]  # filter the array
    :param data:
    :param vector:
    :return:
    '''
    # index = (data[:, 0] == 0) & (data[:, 1] == 0) & (data[:, 2] == 0)
    row_nums = len(data)
    clo_nums = len(vector)
    index = np.asarray([True] * row_nums)
    for i in range(clo_nums):
        index = index & (data[:, i] == vector[i])
    return index

def set_mat_vecror(data, index, vector):
    '''
    Set the rows of data at the positions given by index to vector
    # Example: set every point whose score exceeds the threshold to vector = [10, 10]
    point = [[0., 0.], [1., 1.], [2., 2.],
             [3., 3.], [4., 4.], [5., 5.]]
    point = np.asarray(point)  # the data points
    score = np.array([0.7, 0.2, 0.3, 0.4, 0.5, 0.6])  # the score of each data point
    score_th = 0.5
    index = np.where(score > score_th)  # indices of all scores above the threshold
    vector = [10, 10]  # rows above the threshold are set to vector
    out = set_mat_vecror(point, index, vector)
    :param data:
    :param index:
    :param vector:
    :return:
    '''
    data[index, :] = vector
    return data
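The column-by-column loop can also be written as a single broadcast comparison. The sketch below is one possible vectorized equivalent, not the author's code; the function name `match_rows` is hypothetical.

```python
import numpy as np


def match_rows(data, vector):
    """Return a boolean mask marking the rows of data that equal vector."""
    data = np.asarray(data)
    # data == vector broadcasts the vector across every row;
    # np.all(..., axis=1) requires every column of a row to match.
    return np.all(data == np.asarray(vector), axis=1)


data = np.asarray([[1., 0., 0.], [0., 0., 0.], [2., 0., 0.],
                   [0., 0., 0.], [0., 3., 0.], [0., 0., 4.]])
index = match_rows(data, [0, 0, 0])
kept = data[~index, :]  # drop the all-zero rows
print(index)
print(kept)
```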

(5) Shuffling

See: "python numpy array random permutation (shuffling training data)", Song_Lynn's blog, CSDN

per = np.random.permutation(pair_issame_1.shape[0])  # shuffled row indices
pair_issame_1 = pair_issame_1[per, :]  # the data in shuffled order
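A common reason to shuffle through an index permutation is that the same permutation can be applied to several arrays, which keeps features and labels aligned. A minimal sketch with invented data; the use of `np.random.default_rng` and the seed value are assumptions for reproducibility, not something the original prescribes.

```python
import numpy as np

rng = np.random.default_rng(seed=100)  # seed value is arbitrary

data = np.arange(12).reshape(6, 2)     # hypothetical features: row i is [2i, 2i+1]
labels = np.arange(6)                  # hypothetical labels: label i belongs to row i

per = rng.permutation(data.shape[0])   # shuffled row indices
data_shuffled = data[per, :]           # rows in shuffled order
labels_shuffled = labels[per]          # labels reordered the same way

print(per)
print(data_shuffled)
print(labels_shuffled)
```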

2.2 The pickle module

What types of data can pickle store?

All native types supported by Python: booleans, integers, floats, complex numbers, strings, bytes, and None.

Lists, tuples, dictionaries, and sets composed of any native types.

Functions, classes, and instances of classes. Functions and classes are pickled by reference, so they must be defined at the top level of an importable module.

import pickle
import numpy as np

def save_data(data, file):
    with open(file, 'wb') as f:
        pickle.dump(data, f)

def load_data(file):
    with open(file, 'rb') as f:
        data = pickle.load(f)
    return data

if __name__ == "__main__":
    data1 = ['aa', 'bb', 'cc']  # list
    data1 = np.asarray(data1)   # ndarray
    data_path = "data.pk"
    save_data(data1, data_path)
    data2 = load_data(data_path)
    print(data1)
    print(data2)

2.3 random.shuffle with a fixed seed

import random

files_list = ...
labels_list = ...
shuffle = True
if shuffle:
    # seeds = random.randint(0, len(files_list))  # generate a random seed
    seeds = 100  # fixed seed: with the same seed, the random numbers generated afterwards are the same
    random.seed(seeds)
    random.shuffle(files_list)
    random.seed(seeds)
    random.shuffle(labels_list)
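Re-seeding before each shuffle applies the same permutation to both lists, provided they have the same length. The sketch below checks this on invented data and also shows an alternative that shuffles the pairs together, which avoids relying on the two calls staying in step.

```python
import random

# Hypothetical data: file i carries label i
files_list = ["0.jpg", "1.jpg", "2.jpg", "3.jpg", "4.jpg"]
labels_list = [0, 1, 2, 3, 4]

seed = 100
random.seed(seed)
random.shuffle(files_list)
random.seed(seed)
random.shuffle(labels_list)   # same seed and same length: same permutation
pairs = list(zip(files_list, labels_list))
print(pairs)

# Alternative: shuffle (file, label) pairs as a unit with a private generator
paired = [("{}.jpg".format(i), i) for i in range(5)]
random.Random(seed).shuffle(paired)
files2, labels2 = (list(t) for t in zip(*paired))
print(files2, labels2)
```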

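**2.4 The zip() and zip(*) functions**

The zip() function takes iterables as arguments and packs their corresponding elements into tuples; in Python 2 it returns a list of these tuples. If the iterables differ in length, the result is as long as the shortest one. With the * operator, a zipped sequence can be unpacked again.

zip behaves differently in Python 2 and Python 3: to reduce memory use, zip() in Python 3.x returns an iterator object. To display the result as a list, convert it manually with list().

a = [1,2,3]
b = [4,5,6]
c = [4,5,6,7,8]
zipped = zip(a,b)  # pack into a list of tuples
# Result (Python 2): [(1, 4), (2, 5), (3, 6)]
zip(a,c)  # the number of elements matches the shortest list
# Result (Python 2): [(1, 4), (2, 5), (3, 6)]
zip(*zipped)  # the inverse of zip: *zipped can be read as unpacking, giving back the original rows
# Result (Python 2): [(1, 2, 3), (4, 5, 6)]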
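For Python 3, the same examples need an explicit list() call, and the zip object can be consumed only once. A short sketch of both points:

```python
a = [1, 2, 3]
b = [4, 5, 6]
c = [4, 5, 6, 7, 8]

zipped = list(zip(a, b))        # materialize the iterator into a list
shortest = list(zip(a, c))      # truncated to the shorter input
unzipped = list(zip(*zipped))   # transpose back into the original rows

lazy = zip(a, b)
first_pass = list(lazy)
second_pass = list(lazy)        # empty: the iterator is already exhausted

print(zipped, shortest, unzipped, second_pass)
```

Storing the result of list(zip(...)) once, as with `zipped` here, avoids the exhausted-iterator surprise when the pairs are needed more than once.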

2.5 Fast iteration with map and for

import os

# Suppose files_list is:
files_list = ['../training_data/test\\0.txt', '../training_data/test\\1.txt', '../training_data/test\\2.txt', '../training_data/test\\3.txt', '../training_data/test\\4.txt', '../training_data/test\\5.txt', '../training_data/test\\6.txt']

# The three methods below all obtain the file names in files_list
files_nemes1 = list(map(lambda s: os.path.basename(s), files_list))
files_nemes2 = list(os.path.basename(i) for i in files_list)
files_nemes3 = [os.path.basename(i) for i in files_list]

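2.6 The glob module

glob is one of the simplest modules and has very little to it. It finds file path names that match a given pattern, much like file search on Windows. Only three wildcards are used: "*", "?", and "[]". "*" matches zero or more characters; "?" matches a single character; "[]" matches characters in a given range, e.g. [0-9] matches digits.

import glob
# Get all jpg images one level below the specified directory
print(glob.glob(r"E:\Picture\*\*.jpg"))
# Get all .py files in the parent directory
print(glob.glob(r'../*.py'))  # relative path

To iterate over the jpg images in a specified directory, you can do this:

# -*- coding:utf-8 -*-
import glob
# Iterate over the jpg images in the specified directory
image_path = "/home/ubuntu/TFProject/view-finding-network/test_images/*.jpg"
for per_path in glob.glob(image_path):
    print(per_path)

To iterate over files of several formats, you can do this:

# Iterate over 'jpg', 'png', and 'jpeg' images
image_format = ['jpg', 'png', 'jpeg']  # image formats
image_dir = './test_image'  # image directory
image_list = []
for fmt in image_format:
    path = image_dir + '/*.' + fmt
    image_list.extend(glob.glob(path))
print(image_list)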
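The multi-format loop can be wrapped in a small function and tried on a temporary directory. This is a sketch under the assumption that only the top level of the directory is searched; the function name `list_images` and the file names are hypothetical.

```python
import glob
import os
import tempfile


def list_images(image_dir, formats=("jpg", "png", "jpeg")):
    """Return the sorted paths in image_dir whose extension is in formats."""
    image_list = []
    for fmt in formats:
        image_list.extend(glob.glob(os.path.join(image_dir, "*." + fmt)))
    return sorted(image_list)


# Demonstration on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.png", "c.txt", "d.jpeg"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in list_images(tmp)]

print(found)
```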

2.7 The os module

import os
import shutil

# path, out_dir, and segment_out_dir below are placeholders for your own paths
os.getcwd()  # current working directory
os.path.abspath('.')  # current working directory
os.path.abspath('..')  # parent of the current working directory
os.path.abspath(os.curdir)  # current working directory
os.path.join(os.getcwd(), 'filename')  # join the current directory and a name into a new path
os.path.exists(path)  # whether the path exists
os.path.isfile(path)  # True if path is an existing file, otherwise False
os.path.basename('path/to/test.jpg')  # file name of the path: test.jpg
os.path.getsize(path)  # file size in bytes; raises an error if the file does not exist
path = os.path.dirname('path/to/test.jpg')  # directory part: path/to
os.sep  # path separator of the current OS: '/' on Linux/UNIX, '\\' on Windows
dirname = 'path/to/test.jpg'.split(os.sep)[-1]  # last path component: "test.jpg"
dirname = 'path/to/test.jpg'.split(os.sep)[-2]  # name of the containing folder: "to"

# Delete all files under a directory
def delete_dir_file(dir_path):
    ls = os.listdir(dir_path)
    for i in ls:
        c_path = os.path.join(dir_path, i)
        if os.path.isdir(c_path):
            delete_dir_file(c_path)
        else:
            os.remove(c_path)

# Create the directory if it does not exist (single level only)
if not os.path.exists(out_dir):
    os.mkdir(out_dir)
# Create nested directories
if not os.path.exists(segment_out_dir):
    os.makedirs(segment_out_dir)
# Delete all files under the directory
delete_dir_file(out_dir)
# or:
shutil.rmtree(out_dir)  # delete output folder

The code below implements: [1] getFilePathList, which gets the paths of all files under file_dir, including files in subdirectories; [2] get_files_list, which gets the list of all files under file_dir with the extension postfix, including subdirectories; [3] gen_files_labels, which gets the paths of all files under files_dir together with their labels, where each label is the name of the file's parent folder.

# coding: utf-8
import os
import os.path
import pandas as pd

def getFilePathList(file_dir):
    '''
    Get the paths of all files under file_dir, including files in subdirectories
    :param file_dir:
    :return:
    '''
    filePath_list = []
    for walk in os.walk(file_dir):
        part_filePath_list = [os.path.join(walk[0], file) for file in walk[2]]
        filePath_list.extend(part_filePath_list)
    return filePath_list

def get_files_list(file_dir, postfix='ALL'):
    '''
    Get the list of all files under file_dir with the extension postfix, including subdirectories
    :param file_dir:
    :param postfix:
    :return:
    '''
    postfix = postfix.split('.')[-1]
    file_list = []
    filePath_list = getFilePathList(file_dir)
    if postfix == 'ALL':
        file_list = filePath_list
    else:
        for file in filePath_list:
            basename = os.path.basename(file)  # file name of the path
            postfix_name = basename.split('.')[-1]
            if postfix_name == postfix:
                file_list.append(file)
    file_list.sort()
    return file_list

def gen_files_labels(files_dir):
    '''
    Get the paths of all files under files_dir together with their labels,
    where each label is the name of the file's parent folder.
    Files of the same class are placed in one folder under files_dir, and the folder name is the label
    :param files_dir:
    :return: filePath_list, the paths of all files; label_list, the corresponding labels
    '''
    filePath_list = getFilePathList(files_dir)
    print("files nums:{}".format(len(filePath_list)))
    # Collect the label of every sample
    label_list = []
    for filePath in filePath_list:
        label = filePath.split(os.sep)[-2]
        label_list.append(label)
    labels_set = list(set(label_list))
    print("labels:{}".format(labels_set))
    # Count the samples per label
    print(pd.Series(label_list).value_counts())
    return filePath_list, label_list

if __name__ == '__main__':
    file_dir = 'JPEGImages'
    file_list = get_files_list(file_dir)
    for file in file_list:
        print(file)

Walking through all files under dir (including files in subfolders):

# coding: utf-8
import os
import os.path

def get_files_list(dir):
    '''
    Walk through all files under dir (including files in subfolders)
    :param dir: the directory to walk
    :return: a list containing all files -> list
    '''
    # parent: the parent directory, dirnames: all subfolders in that directory, filenames: the file names in that directory
    files_list = []
    for parent, dirnames, filenames in os.walk(dir):
        for filename in filenames:
            # print("parent is: " + parent)
            # print("filename is: " + filename)
            # print(os.path.join(parent, filename))  # print every file under the root directory (including subfolders)
            files_list.append([os.path.join(parent, filename)])
    return files_list

if __name__ == '__main__':
    dir = 'images'
    files_list = get_files_list(dir)
    print(files_list)

Below is a packaged get_input_list() function. If path is a folder, it walks through all png, jpg, jpeg, and similar image files. If path is the path of a txt file, it reads the file list saved in the txt file (which should not contain extra blank lines). Otherwise path is treated as a single image file, e.g. path/to/1.png.

# -*-coding: utf-8 -*-
"""
    @Project: hdrnet
    @File   : my_test.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-08-28 14:30:51
"""
import os
import logging
import re

logging.basicConfig(format="[%(process)d] %(levelname)s %(filename)s:%(lineno)s | %(message)s")
log = logging.getLogger("train")
log.setLevel(logging.INFO)

def get_input_list(path):
    '''
    Return the paths of all images
    :param path: the path of a single image, a folder, or a txt file
    :return:
    '''
    regex = re.compile(r".*\.(png|jpeg|jpg|tif|tiff)")
    # path is a folder: walk through all png, jpg, jpeg, and similar image files
    # path/to
    if os.path.isdir(path):
        inputs = os.listdir(path)
        inputs = [os.path.join(path, f) for f in inputs if regex.match(f)]
        log.info("Directory input {}, with {} images".format(path, len(inputs)))
    # path is a txt file: read the file list saved in it (no extra blank lines)
    # path/to/filelist.txt
    elif os.path.splitext(path)[-1] == ".txt":
        dirname = os.path.dirname(path)
        with open(path, 'r') as fid:
            inputs = [l.strip() for l in fid.readlines()]
        inputs = [os.path.join(dirname, im) for im in inputs]
        log.info("Filelist input {}, with {} images".format(path, len(inputs)))
    # path is a single image file: path/to/1.png
    elif regex.match(path):
        inputs = [path]
        log.info("Single input {}".format(path))
    return inputs

if __name__ == '__main__':
    path = 'dataset/filelist.txt'
    result = get_input_list(path)
    print(result)

2.8 Detecting missing, empty, or too-small image files

import os
from PIL import Image

def isValidImage(images_list, sizeTh=1000, isRemove=False):
    '''
    Remove missing files and files that are too small from the list
    :param images_list:
    :param sizeTh: file size threshold in bytes (B), default 1000 B
    :param isRemove: whether to delete the damaged original files from disk
    :return:
    '''
    i = 0
    while i < len(images_list):
        path = images_list[i]
        # Check whether the file exists
        if not (os.path.exists(path)):
            print(" non-existent file:{}".format(path))
            images_list.pop(i)
            continue
        # Check whether the file is empty (smaller than the threshold)
        if os.path.getsize(path) < sizeTh:
            print(" empty file:{}".format(path))
            if isRemove:
                os.remove(path)
                print(" info:----------------remove image:{}".format(path))
            images_list.pop(i)
            continue
        # Check whether the image file is damaged
        try:
            Image.open(path).verify()
        except Exception:
            print(" damaged image:{}".format(path))
            if isRemove:
                os.remove(path)
                print(" info:----------------remove image:{}".format(path))
            images_list.pop(i)
            continue
        i += 1
    return images_list

2.9 Saving multidimensional arrays

Because np.savetxt() cannot directly save arrays with three or more dimensions, the array must be flattened into a vector before saving.

import numpy as np

arr1 = np.zeros((3, 4, 5), dtype='int16')  # create a 3*4*5 array of zeros
print("shape:", np.shape(arr1))
arr1[0, :, :] = 0
arr1[1, :, :] = 1
arr1[2, :, :] = 2
print("arr1=", arr1)
# savetxt cannot save arrays with three or more dimensions, so flatten to a vector first
vector = arr1.reshape((-1, 1))
np.savetxt("data.txt", vector)

data = np.loadtxt("data.txt")
print("data=", data)
arr2 = data.reshape(arr1.shape)
print("arr2=", arr2)
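With the text round trip above, the original shape has to be remembered separately, and np.loadtxt returns float64 by default. When a human-readable file is not required, NumPy's binary .npy format is a possible alternative (a swapped-in technique, not the author's method) that stores shape and dtype with the data. A minimal sketch using a temporary directory:

```python
import os
import tempfile

import numpy as np

arr1 = np.zeros((3, 4, 5), dtype="int16")
arr1[1, :, :] = 1
arr1[2, :, :] = 2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.npy")
    np.save(path, arr1)    # shape and dtype are stored in the file header
    arr2 = np.load(path)   # no reshape needed after loading

print(arr2.shape, arr2.dtype)
```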

2.10 Reading txt data

This is a packaged txt read/write module; both the input and the output data are lists:

# -*-coding: utf-8 -*-
"""
    @Project: TxtStorage
    @File   : TxtStorage.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-07-12 17:32:47
"""

class TxtStorage:
    # def __init__(self):
    def write_txt(self, content, filename, mode='w'):
        """
        Save data to a txt file
        :param content: the data to save, type->list
        :param filename: file name
        :param mode: write mode: 'w' or 'a'
        :return: void
        """
        with open(filename, mode) as f:
            for line in content:
                str_line = ""
                for col, data in enumerate(line):
                    if not col == len(line) - 1:
                        # use a space as the separator
                        str_line = str_line + str(data) + " "
                    else:
                        # the last item of each line ends with a newline "\n"
                        str_line = str_line + str(data) + "\n"
                f.write(str_line)

    def read_txt(self, fileName):
        """
        Read data from a txt file
        :param fileName: file name
        :return: the list of data in the txt file
        :rtype: list
        Python has three functions for removing leading and trailing characters or whitespace:
        strip: removes leading and trailing characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
        lstrip: removes leading characters or whitespace (same set)
        rstrip: removes trailing characters or whitespace (same set)
        Note: these functions only remove characters at the two ends, never in the middle.
        """
        txtData = []
        with open(fileName, 'r') as f:
            lines = f.readlines()
            for line in lines:
                lineData = line.rstrip().split(" ")
                data = []
                for l in lineData:
                    if self.is_int(l):  # str.isdigit() only detects strings made of digits, so int() is tried instead
                        data.append(int(l))
                    elif self.is_float(l):  # check whether it is a decimal number
                        data.append(float(l))
                    else:
                        data.append(l)
                txtData.append(data)
        return txtData

    def is_int(self, str):
        # check whether the string is an integer
        try:
            x = int(str)
            return isinstance(x, int)
        except ValueError:
            return False

    def is_float(self, str):
        # check whether the string is an integer or a decimal number
        try:
            x = float(str)
            return isinstance(x, float)
        except ValueError:
            return False

if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    txt_str = TxtStorage()
    txt_str.write_txt(w_data, txt_filename, mode='w')
    r_data = txt_str.read_txt(txt_filename)
    print('r_data=', r_data)

A set of common operations for reading TXT text data:

# -*-coding: utf-8 -*-
"""
    @Project: TxtStorage
    @File   : TxtStorage.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-07-12 17:32:47
"""

def write_txt(content, filename, mode='w'):
    """
    Save data to a txt file
    :param content: the data to save, type->list
    :param filename: file name
    :param mode: write mode: 'w' or 'a'
    :return: void
    """
    with open(filename, mode) as f:
        for line in content:
            str_line = ""
            for col, data in enumerate(line):
                if not col == len(line) - 1:
                    # use a space as the separator
                    str_line = str_line + str(data) + " "
                else:
                    # the last item of each line ends with a newline "\n"
                    str_line = str_line + str(data) + "\n"
            f.write(str_line)

def read_txt(fileName):
    """
    Read data from a txt file
    :param fileName: file name
    :return: the list of data in the txt file
    :rtype: list
    Python has three functions for removing leading and trailing characters or whitespace:
    strip: removes leading and trailing characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
    lstrip: removes leading characters or whitespace (same set)
    rstrip: removes trailing characters or whitespace (same set)
    Note: these functions only remove characters at the two ends, never in the middle.
    """
    txtData = []
    with open(fileName, 'r') as f:
        lines = f.readlines()
        for line in lines:
            lineData = line.rstrip().split(" ")
            data = []
            for l in lineData:
                if is_int(l):  # str.isdigit() only detects strings made of digits, so int() is tried instead
                    data.append(int(l))
                elif is_float(l):  # check whether it is a decimal number
                    data.append(float(l))
                else:
                    data.append(l)
            txtData.append(data)
    return txtData

def is_int(str):
    # check whether the string is an integer
    try:
        x = int(str)
        return isinstance(x, int)
    except ValueError:
        return False

def is_float(str):
    # check whether the string is an integer or a decimal number
    try:
        x = float(str)
        return isinstance(x, float)
    except ValueError:
        return False

def merge_list(data1, data2):
    '''
    Merge two lists row by row
    :param data1:
    :param data2:
    :return: the merged list
    '''
    if not len(data1) == len(data2):
        return
    all_data = []
    for d1, d2 in zip(data1, data2):
        all_data.append(d1 + d2)
    return all_data

def split_list(data, split_index=1):
    '''
    Split each row of data into two parts
    :param data: list
    :param split_index: the position at which to split
    :return:
    '''
    data1 = []
    data2 = []
    for d in data:
        d1 = d[0:split_index]
        d2 = d[split_index:]
        data1.append(d1)
        data2.append(d2)
    return data1, data2

if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    write_txt(w_data, txt_filename, mode='w')
    r_data = read_txt(txt_filename)
    print('r_data=', r_data)
    data1, data2 = split_list(w_data)
    mer_data = merge_list(data1, data2)
    print('mer_data=', mer_data)

To read a txt file like the following, you can use the method below:

test_image/dog/1.jpg 0 11
test_image/dog/2.jpg 0 12
test_image/dog/3.jpg 0 13
test_image/dog/4.jpg 0 14
test_image/cat/1.jpg 1 15
test_image/cat/2.jpg 1 16
test_image/cat/3.jpg 1 17
test_image/cat/4.jpg 1 18

def load_image_labels(test_files):
    '''
    Load a txt file in which each line describes one image, with fields separated by spaces:
    image path, label 1, label 2, e.g. test_image/1.jpg 0 2
    :param test_files:
    :return:
    '''
    images_list = []
    labels_list = []
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip removes trailing characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
            content = line.rstrip().split(' ')
            name = content[0]
            labels = []
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list, labels_list

2.11 The pandas module

(1) Concatenating file data

Suppose we have the data files 'data1.txt', 'data2.txt', and 'data3.txt':

#'data1.txt'
1.jpg 11
2.jpg 12
3.jpg 13
#'data2.txt'
1.jpg 110
2.jpg 120
3.jpg 130
#'data3.txt'
1.jpg 1100
2.jpg 1200
3.jpg 1300

They need to be concatenated into:

1.jpg 11 110 1100
2.jpg 12 120 1200
3.jpg 13 130 1300

Implementation:

# coding: utf-8
import pandas as pd

def concat_data(page, save_path):
    pd_data = []
    for i in range(len(page)):
        # sep=r'\s+' splits on any whitespace (it replaces the deprecated delim_whitespace=True)
        content = pd.read_csv(page[i], dtype=str, sep=r'\s+', header=None)
        if i == 0:
            pd_data = pd.concat([content], axis=1)
        else:
            # append the second column of each further file
            pd_data = pd.concat([pd_data, content.iloc[:, 1]], axis=1)
    pd_data.to_csv(save_path, index=False, sep=' ', header=False)

if __name__ == '__main__':
    txt_path = ['data1.txt', 'data2.txt', 'data3.txt']
    out_path = 'all_data.txt'
    concat_data(txt_path, out_path)
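The column concatenation can be tried without touching the disk by reading from in-memory buffers. The sketch below is an illustration under the same assumption as the original, namely that all files list the same images in the same order; the function name `concat_columns` is hypothetical.

```python
import io

import pandas as pd


def concat_columns(sources):
    """Keep every column of the first source and the second column of the rest."""
    frames = [pd.read_csv(src, dtype=str, sep=r"\s+", header=None) for src in sources]
    parts = [frames[0]] + [frame.iloc[:, 1] for frame in frames[1:]]
    return pd.concat(parts, axis=1)


sources = [
    io.StringIO("1.jpg 11\n2.jpg 12\n3.jpg 13\n"),
    io.StringIO("1.jpg 110\n2.jpg 120\n3.jpg 130\n"),
    io.StringIO("1.jpg 1100\n2.jpg 1200\n3.jpg 1300\n"),
]
merged = concat_columns(sources)
text = merged.to_csv(index=False, sep=" ", header=False)  # returns a string when no path is given
print(text)
```

If the row order can differ between files, joining on the file name column would be safer than positional concatenation.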

(2) DataFrame

import pandas as pd
import numpy as np

def print_info(class_name, labels):
    # index = range(len(class_name)) + 1
    index = np.arange(0, len(class_name)) + 1
    columns = ['class_name', 'labels']
    content = np.array([class_name, labels]).T
    df = pd.DataFrame(content, index=index, columns=columns)  # one row per class, two columns
    print(df)  # print the table

class_name = ['C1', 'C2', 'C3']
labels = [100, 200, 300]
print_info(class_name, labels)

Adding, deleting, modifying, and querying Pandas DataFrame data

See: "Adding, deleting, modifying, and querying Pandas DataFrame data", 夏雨淋河's blog, CSDN

import pandas as pd
import numpy as np

df = pd.DataFrame(data=[['tom1','f',22],['tom2','f',22],['tom3','m',21]], index=[1,2,3], columns=['name','sex','age'])  # test data

	name	sex	age
1	tom1	f	22
2	tom2	f	22
3	tom3	m	21

citys = ['shenzhen1','shenzhen2','shenzhen3']
df.insert(2, 'city', citys)  # insert a column named city, with the values in citys, at column position 2
jobs = ['student','teacher','teacher']
df['job'] = jobs  # append a column named job, with the values in jobs, as the last column of df
df.loc[:, 'salary'] = ['1k','2k','2k']  # append a column named salary, with the values on the right-hand side, as the last column of df
df

	name	sex	city	age	job	salary
1	tom1	f	shenzhen1	22	student	1k
2	tom2	f	shenzhen2	22	teacher	2k
3	tom3	m	shenzhen3	21	teacher	2k

# If df has no row with index 4, this line adds a row with index 4 and the values on the right-hand side.
# If df already has a row with index 4, this line overwrites that row with the values on the right-hand side.
df.loc[4] = ['tom4','m','shenzhen4',24,"engineer",'3k']
df

	name	sex	city	age	job	salary
1	tom1	f	shenzhen1	22	student	1k
2	tom2	f	shenzhen2	22	teacher	2k
3	tom3	m	shenzhen3	21	teacher	2k
4	tom4	m	shenzhen4	24	engineer	3k

# Sort by the value of age
df = df.sort_values(by=["age"], ascending=False)
df

	name	sex	city	age	job	salary
4	tom4	m	shenzhen4	24	engineer	3k
1	tom1	f	shenzhen1	22	student	1k
2	tom2	f	shenzhen2	22	teacher	2k
3	tom3	m	shenzhen3	21	teacher	2k
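The examples above cover adding and sorting. For completeness, here is a hedged sketch of the remaining operations named in the heading (query, modify, delete), using standard pandas calls on the same three-row test data; it is an illustration, not taken from the referenced post.

```python
import pandas as pd

df = pd.DataFrame(data=[['tom1', 'f', 22], ['tom2', 'f', 22], ['tom3', 'm', 21]],
                  index=[1, 2, 3], columns=['name', 'sex', 'age'])

# Query: rows where age > 21, keeping only two columns
older = df.loc[df['age'] > 21, ['name', 'age']]

# Modify: change a single cell addressed by index label and column name
df.loc[2, 'age'] = 23

# Delete: drop returns a new DataFrame and leaves df itself unchanged
trimmed = df.drop(index=3).drop(columns='sex')

print(older)
print(trimmed)
```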

2.12 The csv module

Reading data from a csv file with the csv module:

# -*- coding:utf-8 -*-
import csv

csv_path = 'test.csv'
with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)

with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'], item['class'], item.get('height'), item.get('width'))

Output:

{'filename': 'test01.jpg', 'height': '638', 'class': 'dog', 'width': '486'}
{'filename': 'test02.jpg', 'height': '954', 'class': 'person', 'width': '726'}
test01.jpg dog 638 486
test02.jpg person 954 726

Writing and reading:

import csv

csv_path = 'test.csv'
# Write the csv file
data = ["1.jpg", 200, 300, 'dog']
with open(csv_path, 'w+', newline='') as csv_file:
    # headers = [k for k in dictionaries[0]]
    headers = ['filename', 'width', 'height', 'class']
    print(headers)
    writer = csv.DictWriter(csv_file, fieldnames=headers)
    writer.writeheader()
    dictionary = {'filename': data[0],
                  'width': data[1],
                  'height': data[2],
                  'class': data[3],
                  }
    writer.writerow(dictionary)
    print(dictionary)

# Read the csv file
with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)

with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'], item['class'], item.get('height'), item.get('width'))
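One detail worth noting in the round trip is that csv.DictReader returns every field as a string, so numbers written as integers come back as text. A small sketch using an in-memory buffer in place of test.csv (the buffer is an assumption made so the example leaves no file behind):

```python
import csv
import io

headers = ['filename', 'width', 'height', 'class']
rows = [{'filename': '1.jpg', 'width': 200, 'height': 300, 'class': 'dog'}]

buffer = io.StringIO(newline='')   # stands in for open(csv_path, 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=headers)
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)                     # rewind before reading back
items = list(csv.DictReader(buffer))
width = int(items[0]['width'])     # values are strings and need explicit conversion

print(items[0], width)
```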

2.13 The logging module

import logging
import sys

# Levels: debug, info, warning, error, and critical
# logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
logger.debug("----1----")
logger.info("----2----")
logger.warning("----3----")
logger.error("----4----")

3. Data preprocessing

3.1 Splitting data (images) into blocks

import numpy as np

def split_cell(mat, cell=(3, 3), stepsize=(1, 1)):
    '''
    :param mat: input single-channel image data (this function may contain errors and needs verification)
    :param cell: block size
    :param stepsize: step size, stepsize < cell
    :return:
    '''
    rows, cols = np.shape(mat)
    Rx = cell[0] // 2
    Ry = cell[1] // 2
    stepX = stepsize[0]
    stepY = stepsize[1]
    dest = np.zeros(shape=(int((rows + stepX - 1) / stepX), int((cols + stepY - 1) / stepY)), dtype=np.float32)
    for i in range(0, rows, stepX):
        for j in range(0, cols, stepY):
            # x runs along rows and y along columns here (the original flagged these coordinates as wrong)
            x1 = i - Rx
            x2 = i + Rx
            y1 = j - Ry
            y2 = j + Ry
            x1 = np.clip(x1, 0, rows - 1)
            x2 = np.clip(x2, 0, rows - 1)
            y1 = np.clip(y1, 0, cols - 1)
            y2 = np.clip(y2, 0, cols - 1)
            # compute the mean of the block; rows are indexed by x, columns by y
            block = mat[x1:(x2 + 1), y1:(y2 + 1)]
            m = np.mean(block)
            indexX = int((i + stepX - 1) / stepX)  # round up
            indexY = int((j + stepY - 1) / stepY)
            dest[indexX, indexY] = m / 255
    # dest = dest.reshape()
    return dest

def split_block(mat, grid=(7, 7)):
    rows, cols = grid
    block_image = []
    height, width = np.shape(mat)
    step_width = int(width / cols)
    step_height = int(height / rows)
    for i in range(0, rows):
        for j in range(0, cols):
            x1 = j * step_width
            x2 = (j + 1) * step_width
            y1 = i * step_height
            y2 = (i + 1) * step_height
            block = mat[y1:y2, x1:x2]  # note the order: mat[row, col]
            # fea = block_feature(block, feature_type="LBP")
            block_image.append(block)
    return block_image

if __name__ == "__main__":
    data = np.arange(0, 100)
    image = data.reshape((20, 5))
    dest = split_cell(image, cell=(3, 3), stepsize=(1, 1))

3.2 Reading and displaying images

There are many ways to read and display images in Python. Most image-processing modules read images in RGB channel order; only opencv-python reads them as BGR. To display an image read by OpenCV with another module, the channel order must be converted, which is simple:

import cv2
import matplotlib.pyplot as plt

temp_img = cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray
cv2.imshow("opencv-python", temp_img)
cv2.waitKey(0)
# b, g, r = cv2.split(temp_img)  # convert BGR to RGB
# img = cv2.merge([r, g, b])
# Recommended: use cv2.COLOR_BGR2RGB to convert BGR to RGB
img = cv2.cvtColor(temp_img, cv2.COLOR_BGR2RGB)

plt.imshow(img)  # show the image
plt.axis('off')  # hide the axes
plt.show()

(1) Image-reading modules: matplotlib.image, PIL.Image, cv2

# coding: utf-8
'''
In Caffe, color images must be in BGR channel order; the input data is float32 in the range [0,255],
and every layer has shape=(batch_size, channel_dim, height, width).
[1] In Caffe train/test prototxt files, the data layer usually sets scale: 0.00392156885937, i.e. 1/255.0, which normalizes the data to [0,1]
[2] If the input is an RGB image, float32, [0,1], it must be converted:
    --transformer.set_raw_scale('data',255)  # scale to 0~255
    --transformer.set_channel_swap('data',(2,1,0))  # convert RGB to BGR
[3] If the input is an RGB image, uint8, [0,255], it must be multiplied by 1.0 to convert it to float32 before being fed in
    --transformer.set_raw_scale('data',1.0)  # no scaling needed
    --transformer.set_channel_swap('data',(2,1,0))  # convert RGB to BGR
    --channels: img = img.transpose(2, 0, 1)  # channels from [h,w,c] -> [c,h,w]
[4] All Python image-reading modules return shape=[height, width, channels]. The odd one out is opencv-python,
    which reads images as BGR (the order Caffe requires), while the other modules use RGB
'''
import numpy as np
import matplotlib.pyplot as plt

image_path = 'test_image/C0.jpg'  # C0.jpg has height h=400, width w=200

# 1. caffe
import caffe
img1 = caffe.io.load_image(image_path)  # default: RGB, float32, [0-1], ndarray, shape=[400,200,3]

# 2. skimage
import skimage.io
img2 = skimage.io.imread(image_path)  # default: RGB, uint8, [0,255], ndarray, shape=[400,200,3]
# img2 = img2/255.0

# 3. matplotlib
import matplotlib.image
img3 = matplotlib.image.imread(image_path)  # default: RGB, uint8, [0,255], ndarray, shape=[400,200,3]

# 4. PIL
from PIL import Image
temp_img4 = Image.open(image_path)  # default: RGB, uint8, [0,255]
# temp_img4.show()  # opens the image in the system's default image viewer
img4 = np.array(temp_img4)  # convert to ndarray, shape=[400,200,3]

# 5. opencv
import cv2
temp_img5 = cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray, shape=[400,200,3]
# cv2.imshow("opencv-python", temp_img5)
# cv2.waitKey(0)
# b, g, r = cv2.split(temp_img5)  # convert BGR to RGB
# img5 = cv2.merge([r, g, b])
# Recommended: use cv2.COLOR_BGR2RGB to convert BGR to RGB
img5 = cv2.cvtColor(temp_img5, cv2.COLOR_BGR2RGB)
img6 = img5.transpose(2, 0, 1)  # channels from [h,w,c] -> [c,h,w]

# All of the ndarray image data above can be displayed directly as follows
plt.imshow(img5)  # show the image
plt.axis('off')  # hide the axes
plt.show()

A packaged image read/save module:

import matplotlib.pyplot as plt
import cv2
import numpy as np

def show_image(title, image):
    '''
    Show an image
    :param title: image title
    :param image: image data
    :return:
    '''
    # plt.figure("show_image")
    # print(image.dtype)
    plt.imshow(image)
    plt.axis('on')  # use 'off' to hide the axes
    plt.title(title)  # image title
    plt.show()

def show_image_rect(win_name, image, rect):
    plt.figure()
    plt.title(win_name)
    plt.imshow(image)
    rect = plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
    plt.gca().add_patch(rect)
    plt.show()

def read_image(filename, resize_height, resize_width, normalization=False):
    '''
    Read image data; by default returns uint8, [0,255]
    :param filename:
    :param resize_height:
    :param resize_width:
    :param normalization: whether to normalize to [0.,1.0]
    :return: the image data
    '''
    bgr_image = cv2.imread(filename)
    if len(bgr_image.shape) == 2:  # convert grayscale images to three channels
        print("Warning:gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    # show_image(filename, rgb_image)
    # rgb_image = Image.open(filename)
    if resize_height > 0 and resize_width > 0:
        rgb_image = cv2.resize(rgb_image, (resize_width, resize_height))
    rgb_image = np.asanyarray(rgb_image)
    if normalization:
        # must not be written as: rgb_image = rgb_image/255
        rgb_image = rgb_image / 255.0
    # show_image("src resize image", image)
    return rgb_image

def save_image(image_path, image):
    plt.imsave(image_path, image)

(2) Converting a numpy array to a PIL image

Here matplotlib.image is used to read the image into an array. Note that the array read from this PNG file is float32 in the range 0-1, whereas PIL.Image data is uint8 in the range 0-255, so a conversion is needed:

import numpy as np
import matplotlib.image as mpimg
from PIL import Image

lena = mpimg.imread('lena.png')  # the data read here is float32 in the range 0-1
im = Image.fromarray(np.uint8(lena * 255))
im.show()

(3) Converting between PIL.Image and OpenCV image formats in Python

Converting PIL.Image to OpenCV format:

import cv2
from PIL import Image
import numpy

image = Image.open("plane.jpg")
image.show()
img = cv2.cvtColor(numpy.asarray(image), cv2.COLOR_RGB2BGR)
cv2.imshow("OpenCV", img)
cv2.waitKey()

Converting OpenCV to PIL.Image format:

import cv2
from PIL import Image
import numpy

img = cv2.imread("plane.jpg")
cv2.imshow("OpenCV", img)
image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
image.show()
cv2.waitKey()

Checking whether image data is in OpenCV (ndarray) format:

isinstance(img, np.ndarray)

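(4) matplotlib display blocking

See: "The two display modes of matplotlib.pyplot (interactive and blocking) and their use in Python plotting", wonengguwozai's blog, CSDN

The example below shows how to open several windows at once, as in MATLAB, to compare images or plots. It also solves the problem of figures flashing by and disappearing when interactive mode is turned on in a script:

import matplotlib.pyplot as plt

plt.ion()  # turn on interactive mode
# Open two windows at once to show the images
plt.figure()
plt.imshow(image1)
plt.figure()
plt.imshow(image2)
plt.ioff()  # turn off interactive mode before showing, so the windows do not flash by
plt.show()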

(5) Drawing rectangles with matplotlib

import matplotlib.pyplot as plt

def show_image(win_name, image, rect):
    plt.figure()
    plt.title(win_name)
    plt.imshow(image)
    rect = plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
    plt.gca().add_patch(rect)
    plt.show()

3.3 One-hot encoding

import os
import numpy as np
from sklearn import preprocessing

def gen_data_labels(label_list, ont_hot=True):
    '''
    label_list: the input labels -> list
    '''
    # Convert labels to integer codes
    # labels_set = list(set(label_list))
    # labels = []
    # for label in label_list:
    #     for k in range(len(labels_set)):
    #         if label == labels_set[k]:
    #             labels += [k]
    #             break
    # labels = np.asarray(labels)
    # Alternatively, convert labels to integer codes as follows
    labelEncoder = preprocessing.LabelEncoder()
    labels = labelEncoder.fit_transform(label_list)
    labels_set = labelEncoder.classes_
    for i in range(len(labels_set)):
        print("labels:{}->{}".format(labels_set[i], i))
    # Whether to apply one-hot encoding
    if ont_hot:
        labels_nums = len(labels_set)
        labels = labels.reshape(len(labels), 1)
        # onehot_encoder = preprocessing.OneHotEncoder(categories=[range(labels_nums)])
        onehot_encoder = preprocessing.OneHotEncoder(categories='auto')
        # The encoder returns a sparse matrix by default; toarray() makes it dense.
        # (The original passed sparse=False, which newer scikit-learn versions rename to sparse_output.)
        labels = onehot_encoder.fit_transform(labels).toarray()
    return labels
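The same two steps, integer coding followed by one-hot expansion, can be sketched with NumPy alone, which makes the mechanics visible. This is an illustration and not a replacement for the scikit-learn encoders; the function name `one_hot` and the label values are hypothetical.

```python
import numpy as np


def one_hot(label_list):
    """Return (classes, integer codes, one-hot matrix) for a list of labels."""
    # np.unique sorts the distinct labels; return_inverse gives each label's
    # position in that sorted array, which serves as its integer code.
    classes, indices = np.unique(label_list, return_inverse=True)
    # Row k of the identity matrix is the one-hot vector for class k.
    onehot = np.eye(len(classes), dtype=np.float32)[indices]
    return classes, indices, onehot


classes, indices, onehot = one_hot(['dog', 'cat', 'dog', 'bird'])
print(classes)
print(indices)
print(onehot)
```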

3.4 Cyclically generating batch data

TXT file:

1.jpg 1 11
2.jpg 2 12
3.jpg 3 13
4.jpg 4 14
5.jpg 5 15
6.jpg 6 16
7.jpg 7 17
8.jpg 8 18

# -*-coding: utf-8 -*-
"""
    @Project: LSTM
    @File   : create_batch_data.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-10-27 18:20:15
"""
import math
import random
import os
import glob
import numpy as np


def get_list_batch(inputs, batch_size=None, shuffle=False):
    '''
    Cyclically generate batches of data
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle inputs
    :return: one batch of data
    '''
    if shuffle:
        random.shuffle(inputs)
    while True:
        batch_inouts = inputs[0:batch_size]
        inputs = inputs[batch_size:] + inputs[:batch_size]  # rotate so the next batch follows on
        yield batch_inouts


def get_data_batch(inputs, batch_size=None, shuffle=False):
    '''
    Cyclically generate batches of data
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle inputs
    :return: one batch of data
    '''
    # rows,cols=inputs.shape
    rows = len(inputs)
    indices = list(range(rows))
    if shuffle:
        random.shuffle(indices)
    while True:
        batch_indices = indices[0:batch_size]
        indices = indices[batch_size:] + indices[:batch_size]  # rotate so the next batch follows on
        batch_data = find_list(batch_indices, inputs)
        # batch_data=find_array(batch_indices,inputs)
        yield batch_data


def find_list(indices, data):
    out = []
    for i in indices:
        out = out + [data[i]]
    return out


def find_array(indices, data):
    rows, cols = data.shape
    out = np.zeros((len(indices), cols))
    for i, index in enumerate(indices):
        out[i] = data[index]
    return out


def load_file_list(text_dir):
    text_dir = os.path.join(text_dir, '*.txt')
    text_list = glob.glob(text_dir)
    return text_list


def get_next_batch(batch):
    return next(batch)


def load_image_labels(test_files):
    '''
    Load a txt file in which each line describes one image, with fields separated by spaces:
    image path, label 1, label 2, e.g. test_image/1.jpg 0 2
    :param test_files:
    :return:
    '''
    images_list = []
    labels_list = []
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip removes trailing whitespace (\n, \r, \t and ' ': newline, carriage return, tab, space)
            content = line.rstrip().split(' ')
            name = content[0]
            labels = []
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list, labels_list


if __name__ == '__main__':
    filename = './training_data/train.txt'
    images_list, labels_list = load_image_labels(filename)
    # inputs = np.reshape(np.arange(8*3), (8,3))
    num_iters = 10  # iterate 10 times, 3 items per batch
    batch = get_data_batch(images_list, batch_size=3, shuffle=False)
    for i in range(num_iters):
        print('**************************')
        # train_batch=batch.__next__()
        batch_images = get_next_batch(batch)
        print(batch_images)
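The rotate-and-slice loop above can also be written with the standard library's itertools. The following is a minimal sketch, not the original author's method; cycle_batches and its seed parameter are hypothetical names introduced for illustration. Unlike get_list_batch, it copies the input instead of shuffling it in place, and it always returns exactly batch_size items, repeating items when batch_size exceeds the list length:

```python
import itertools
import random


def cycle_batches(items, batch_size, shuffle=False, seed=None):
    """Yield lists of batch_size items forever, wrapping around the end."""
    items = list(items)  # copy so the caller's list is not reordered
    if not items:
        raise ValueError("items must not be empty")
    if shuffle:
        random.Random(seed).shuffle(items)
    stream = itertools.cycle(items)  # endless iterator over the items
    while True:
        yield list(itertools.islice(stream, batch_size))


if __name__ == "__main__":
    batches = cycle_batches(["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"], batch_size=3)
    print(next(batches))  # ['1.jpg', '2.jpg', '3.jpg']
    print(next(batches))  # ['4.jpg', '5.jpg', '1.jpg']
```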

3.5 Counting elements and distinct values

import numpy as np
import pandas as pd
from collections import Counter

label_list = ['horoscope', 'horoscope', 'finance', 'finance', 'finance', 'education', 'education', 'education']
set1 = set(label_list)        # {'finance', 'education', 'horoscope'}; a set holds no duplicates and has no fixed order
set2 = np.unique(label_list)  # ['education' 'finance' 'horoscope'], sorted

# To get the count of each element:
arr = [1, 2, 3, 3, 2, 1, 0, 2]
result = {}
for i in set(arr):
    result[i] = arr.count(i)
print(result)
print(Counter(arr))  # the same counts in a single call

# A simpler way:
print(pd.Series(label_list).value_counts())

3.6 Sorting a Python dict by key and by value

A Python dict maps keys to values and is not kept in sorted order (since Python 3.7 it preserves insertion order, nothing more). To sort a dictionary by value, use the following approach:
1. Sort by value, from largest to smallest:

dic = {'a': 31, 'bc': 5, 'c': 3, 'asd': 4, 'aa': 74, 'd': 0}
sorted_items = sorted(dic.items(), key=lambda d: d[1], reverse=True)
print(sorted_items)

Output:
[('aa', 74), ('a', 31), ('bc', 5), ('asd', 4), ('c', 3), ('d', 0)]

Breaking the code down:
dic.items() gives the (key, value) pairs of the dictionary.
sorted() then uses its key argument to sort by value, that is, by d[1], the second element of each pair. reverse=True reverses the order: the default is ascending, so reversing gives descending.
2. Sort the dictionary by key:

dic = {'a': 31, 'bc': 5, 'c': 3, 'asd': 4, 'aa': 74, 'd': 0}
sorted_items = sorted(dic.items(), key=lambda d: d[0])  # d[0] is the dictionary key
print(sorted_items)
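A short sketch of two common follow-ups to the sorting above: rebuilding a dict from the sorted pairs, and sorting on two criteria at once. It uses only the standard library; the variable names are illustrative:

```python
from operator import itemgetter

dic = {'a': 31, 'bc': 5, 'c': 3, 'asd': 4, 'aa': 74, 'd': 0}

# Sort by value, largest first; itemgetter(1) picks the value from each (key, value) pair.
by_value = sorted(dic.items(), key=itemgetter(1), reverse=True)

# dict() preserves insertion order on Python 3.7+, so this rebuilds a dict in sorted order.
ordered = dict(by_value)

# Sort by value descending, breaking ties by key ascending.
by_value_then_key = sorted(dic.items(), key=lambda kv: (-kv[1], kv[0]))

print(by_value)
print(list(ordered))
```

Negating the value inside the tuple key keeps a single ascending sort while still ordering values from largest to smallest.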

3.7 Custom sorting with sorted

    The sort_cluster_container function below orders samples by how often their label occurs, placing samples whose label is more frequent first.

# -*-coding: utf-8 -*-
"""
    @Project: IntelligentManufacture
    @File   : statistic_analysis.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2019-02-15 13:47:58
"""
import pandas as pd
import numpy as np
import functools


def print_cluster_info(title, labels_id, labels, columns=['labels_id', 'labels']):
    index = np.arange(0, len(labels_id)) + 1
    content = np.array([labels_id, labels]).T
    df = pd.DataFrame(content, index=index, columns=columns)  # one row per sample, two columns
    print('*************************************************')
    print("{}{}".format(title, df))


def print_cluster_container(title, cluster_container, columns=['labels_id', 'labels']):
    '''
    :param cluster_container: type: list[tuple()]
    :param columns:
    :return:
    '''
    labels_id, labels = zip(*cluster_container)
    labels_id = list(labels_id)
    labels = list(labels)
    print_cluster_info(title, labels_id, labels, columns=columns)


def sort_cluster_container(cluster_container):
    '''
    Custom sort: order samples by how often their label occurs, most frequent labels first
    :param cluster_container: list of (labels_id, labels) tuples
    :return: sorted list
    '''
    # labels_id=list(cluster_container.keys())
    # labels=list(cluster_container.values())
    labels_id, labels = zip(*cluster_container)
    labels_id = list(labels_id)
    labels = list(labels)
    # Count the samples of each label: value_counts_dict
    value_counts_dict = {}
    labels_set = set(labels)
    for i in labels_set:
        value_counts_dict[i] = labels.count(i)

    def cmp(a, b):
        # descending order
        a_key, a_value = a
        b_key, b_value = b
        a_count = value_counts_dict[a_value]
        b_count = value_counts_dict[b_value]
        if a_count > b_count:  # the more frequent label goes first
            return -1
        elif (a_count == b_count) and (a_value > b_value):  # on equal counts, the larger value goes first
            return -1
        elif (a_count == b_count) and (a_value == b_value):  # same label: keep the original order
            return 0
        else:
            return 1

    out = sorted(cluster_container, key=functools.cmp_to_key(cmp))
    return out


if __name__ == '__main__':
    labels_id = ["image0", 'image1', "image2", "image3", "image4", "image5", "image6"]
    labels = [0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0]
    # labels=['L0','L1','L2','L1','L1','L2',"L3"]
    cluster_container = list(zip(labels_id, labels))
    print("cluster_container:{}".format(cluster_container))
    print_cluster_container("Before sorting:\n", cluster_container, columns=['labels_id', 'labels'])
    out = sort_cluster_container(cluster_container)
    print_cluster_container("After sorting:\n", out, columns=['labels_id', 'labels'])

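Result: the samples labeled 1.0 (three occurrences) come first, then those labeled 2.0 (two occurrences), then 3.0 and 0.0 (one occurrence each, larger value first).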
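The same ordering can usually be expressed with a tuple key instead of a comparison function, which avoids functools.cmp_to_key. This is a sketch of that alternative, not the original author's code; sort_by_label_frequency is a hypothetical name, and the negation trick assumes numeric labels:

```python
from collections import Counter


def sort_by_label_frequency(cluster_container):
    """Sort (id, label) pairs: most frequent label first, ties by larger label."""
    counts = Counter(label for _, label in cluster_container)
    # Negate both fields so an ascending sort yields descending order.
    # This assumes numeric labels; string labels would need a different tie-break.
    return sorted(cluster_container, key=lambda item: (-counts[item[1]], -item[1]))


if __name__ == "__main__":
    ids = ["image0", "image1", "image2", "image3", "image4", "image5", "image6"]
    labels = [0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0]
    for pair in sort_by_label_frequency(list(zip(ids, labels))):
        print(pair)
```

Because sorted is stable, samples that share a label keep their original relative order.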

3.8 Loading a yml configuration file

    Suppose the config.yml configuration file looks like this:

## Basic config
batch_size: 2
learning_rate: 0.001
epoch: 1000

## reset image size
height: 128
width: 128

It can be loaded in Python as follows:

import yaml


class Dict2Obj:
    '''
    Convert a dict to a class object
    '''
    def __init__(self, bokeyuan):
        self.__dict__.update(bokeyuan)


def load_config_file(file):
    with open(file, 'r') as f:
        data_dict = yaml.load(f, Loader=yaml.FullLoader)
        data_dict = Dict2Obj(data_dict)
    return data_dict


if __name__ == "__main__":
    config_file = '../config/config.yml'
    para = load_config_file(config_file)
    print("batch_size:{}".format(para.batch_size))
    print("learning_rate:{}".format(para.learning_rate))
    print("epoch:{}".format(para.epoch))

Output:

batch_size:2
learning_rate:0.001
epoch:1000
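Dict2Obj converts only the top level of the dictionary, so a nested mapping in the YAML file would remain a plain dict. A minimal sketch of a recursive variant using the standard library's types.SimpleNamespace follows. dict_to_namespace is a hypothetical helper, and the "model" section in the sample data is invented for illustration; the dict literal stands in for what yaml.load would return:

```python
from types import SimpleNamespace


def dict_to_namespace(value):
    """Recursively convert dicts (and dicts inside lists) to SimpleNamespace."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: dict_to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [dict_to_namespace(v) for v in value]
    return value


if __name__ == "__main__":
    # Stand-in for the parsed YAML; "model" is an invented nested section.
    data = {"batch_size": 2, "learning_rate": 0.001, "model": {"height": 128, "width": 128}}
    para = dict_to_namespace(data)
    print(para.batch_size)    # 2
    print(para.model.height)  # 128
```

This only works when every key is a string that is a valid attribute name.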

3.9 Moving, copying and renaming files

#!/usr/bin/python
# -*- coding: utf-8 -*-
# test_copyfile.py
import os, shutil


def rename(image_list):
    for name in image_list:
        cut_len = len('_cropped.jpg')
        newName = name[:-cut_len] + '.jpg'
        print(name)
        print(newName)
        os.rename(name, newName)


def mymovefile(srcfile, dstfile):
    if not os.path.isfile(srcfile):
        print("%s not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(dstfile)  # split the path from the file name
        if not os.path.exists(fpath):
            os.makedirs(fpath)  # create the directory
        shutil.move(srcfile, dstfile)  # move the file
        print("move %s -> %s" % (srcfile, dstfile))


def mycopyfile(srcfile, dstfile):
    if not os.path.isfile(srcfile):
        print("%s not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(dstfile)  # split the path from the file name
        if not os.path.exists(fpath):
            os.makedirs(fpath)  # create the directory
        shutil.copyfile(srcfile, dstfile)  # copy the file
        print("copy %s -> %s" % (srcfile, dstfile))


srcfile = '/Users/xxx/git/project1/test.sh'
dstfile = '/Users/xxx/tmp/tmp/1/test.sh'
mymovefile(srcfile, dstfile)
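The same move and copy helpers can be sketched with pathlib, which handles the split between directory and file name. move_file and copy_file are hypothetical names; unlike the functions above, this version raises FileNotFoundError instead of printing a message, which is a design choice of the sketch rather than the original behavior:

```python
import shutil
from pathlib import Path


def move_file(src, dst):
    """Move src to dst, creating dst's parent directories; return the new Path."""
    src, dst = Path(src), Path(dst)
    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist")
    dst.parent.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    shutil.move(str(src), str(dst))
    return dst


def copy_file(src, dst):
    """Copy src to dst, creating dst's parent directories; return the new Path."""
    src, dst = Path(src), Path(dst)
    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

mkdir with exist_ok=True also covers the case where the destination has no directory part, which makes os.makedirs('') fail in the version above.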

3.10 Generating data in batches of batch_size

def get_batch(image_list, batch_size):
    nums = len(image_list)
    # batch_num = math.ceil(sample_num / batch_size)
    batch_num = (nums + batch_size - 1) // batch_size
    for i in range(batch_num):
        start = i * batch_size
        end = min((i + 1) * batch_size, nums)
        batch_image = image_list[start:end]
        print("batch_image:{}".format(batch_image))


if __name__ == "__main__":
    nums = 20
    batch_size = 25
    image_list = []
    for i in range(nums):
        image_list.append(str(i + 1) + ".jpg")
    get_batch(image_list, batch_size)
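get_batch prints each batch, which makes it hard to reuse in a training loop. A small generator variant is sketched below; iter_batches is a hypothetical name. Stepping range by batch_size gives the same ceiling-division number of batches as the formula above, and slicing past the end of a list is safe in Python, so no min() is needed:

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items; the last may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    image_list = [f"{i + 1}.jpg" for i in range(5)]
    for batch in iter_batches(image_list, batch_size=2):
        print(batch)
    # ['1.jpg', '2.jpg']
    # ['3.jpg', '4.jpg']
    # ['5.jpg']
```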
