Tips for Using Common Python Modules
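Contents
Tips for Using Common Python Modules
1. Python Configuration Notes
(1) Python file header comments
(2) Function docstrings
(3) Converting .ipynb files to .py files
(4) Measuring run time in Python
(5) Speeding up installs with mirrors
(6) Installing the Pylint code analysis tool and configuring it in PyCharm
(7) Adding environment paths and module search paths in Python
(8) Common conda commands
2. Commonly Used Modules
2.1 The numpy module
(1) Concatenating and splitting matrices; splitting data into even and odd items
(2) Sorting by column
(3) Extracting the rows and columns that match a condition
(4) Finding vectors that match a condition
(5) Shuffling
2.2 The pickle module
2.3 random.shuffle with a fixed seed
2.4 The zip() and zip(*) functions
2.5 Fast iteration with map and for
2.6 The glob module
2.7 The os module
2.8 Detecting empty, missing, and undersized image files
2.9 Saving multi-dimensional arrays
2.10 Reading txt data
2.11 The pandas module
(1) Concatenating file data
(2) DataFrame
Adding, deleting, modifying, and querying Pandas DataFrame data
2.12 The csv module
2.13 The logging module
3. Data Preprocessing
3.1 Splitting data (images) into blocks
3.2 Reading and displaying images
(1) The matplotlib.image, PIL.Image, and cv2 image-reading modules
(2) Converting a numpy array to a PIL image
(3) Converting between PIL.Image and OpenCV image formats in Python
(4) The matplotlib display-blocking problem
(5) Drawing rectangles with matplotlib
3.3 One-hot encoding
3.4 Generating batches in a loop
3.5 Counting elements and distinct values
3.6 Sorting a Python dict by key and by value
3.7 Custom sorting with sorted
3.8 Loading a yml configuration file
3.9 Moving, copying, and renaming files
3.10 Generating batch_size data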
1. Python Configuration Notes
(1) Python file header comments
In PyCharm, open File -> Settings -> Editor -> File and Code Templates -> Python Script and enter:
# -*-coding: utf-8 -*-
"""
    @Project: ${PROJECT_NAME}
    @File   : ${NAME}.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : ${YEAR}-${MONTH}-${DAY} ${HOUR}:${MINUTE}:${SECOND}
"""
(2) Function docstrings
def my_fun(para1, para2):
    '''
    Brief description of what the function does
    :param para1: description of the input parameter, type
    :param para2: description of the input parameter, type
    :return: description of the return value, type
    '''
(3) Converting .ipynb files to .py files
jupyter nbconvert --to script demo.ipynb
(4) Measuring run time in Python
import datetime

def RUN_TIME(deta_time):
    '''
    Return the elapsed time in milliseconds.
    :param deta_time: a datetime.timedelta (the difference of two datetimes)
    :return: elapsed time in ms
    '''
    # total_seconds() covers days, seconds and microseconds in one value,
    # whereas .seconds alone ignores the days field
    time_ = deta_time.total_seconds() * 1000.0
    return time_

T0 = datetime.datetime.now()
# do something
T1 = datetime.datetime.now()
print("run time:{}".format(RUN_TIME(T1 - T0)))
(5) Speeding up installs with mirrors
TUNA also provides a mirror of the Anaconda repository. Run the following commands:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --set show_channel_urls yes
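With the mirror configured, downloads speed up immediately. Note that conda config writes these settings to the user's .condarc file, so they persist beyond the current command window.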
On Windows, create a pip folder in your user directory, then create a pip.ini file inside it and copy in the Aliyun source:
[global]
trusted-host=mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/
On Linux, edit ~/.pip/pip.conf (create the folder and the file if they do not exist; the leading "." marks the folder as hidden).
The content is as follows:
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
On Windows, create a pip folder directly in the user directory, e.g. C:\Users\xx\pip, and create a pip.ini file in it with the same content.
For a one-off install, pass "-i https://mirrors.aliyun.com/pypi/simple/" to pip, for example:
pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/
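(6) Installing the Pylint code analysis tool and configuring it in PyCharm
Reference: "Installing the Pylint code analysis tool and configuring it in PyCharm", oohy, cnblogs (博客園)
(7) Adding environment paths and module search paths in Python
Adding an environment path:
# Add the graphviz directory to the PATH environment variable
import os
os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'
Module search path: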
import sys
import os

# Print the list of paths Python searches for modules
print(sys.path)
# Print the directory that contains the current file
print("os.path.dirname(__file__):", os.path.dirname(__file__))
print("os.getcwd(): ", os.getcwd())  # cwd: the current working directory
'''
Add the relevant path:
sys.path.append("directory that contains your module")
sys.path.insert(0, "directory that contains your module")
'''
# First add the directory that contains image_processing
sys.path.append("F:/project/python-learning-notes/utils")
# sys.path.append(os.getcwd())
# Then import the module
import image_processing

# os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'
image_path = "F:/project/python-learning-notes/dataset/test_image/1.jpg"
image = image_processing.read_image(image_path)
image_processing.cv_show_image("image", image)
(8) Common conda commands
- List all environments: conda info --envs or conda env list
- Export an environment.yml file: conda env export > environment.yml
- Create an environment from environment.yml: conda env create -f environment.yml
- List all packages in the active environment: conda list
- Remove an environment: conda remove --name your_env_name --all
2. Commonly Used Modules
2.1 The numpy module
(1) Concatenating and splitting matrices; splitting data into even and odd items
import numpy as np

# Generate a 5*2 matrix
data1 = np.arange(0, 10)
data1 = data1.reshape([5, 2])
# data2 is not defined in the original; a second 5*2 matrix is assumed here
data2 = np.arange(10, 20).reshape([5, 2])
# Concatenate along axis 0 (stack rows)
y = np.concatenate([data1, data2], 0)

# Concatenate labels and indexMat column-wise
def cat_labels_indexMat(labels, indexMat):
    indexMat_labels = np.concatenate([labels, indexMat], axis=1)
    return indexMat_labels

# Split the matrix back into labels and indexMat
def split_labels_indexMat(indexMat_labels, label_index=0):
    labels = indexMat_labels[:, 0:label_index+1]   # the leading column(s) are labels
    indexMat = indexMat_labels[:, label_index+1:]  # the rest is indexMat
    return labels, indexMat

def split_data(data):
    '''
    Split data into its even-indexed and odd-indexed items
    :param data:
    :return:
    '''
    data1 = data[0::2]
    data2 = data[1::2]
    return data1, data2

if __name__ == '__main__':
    data = np.arange(0, 20)
    data = data.reshape([10, 2])
    data1, data2 = split_data(data)
    print("embeddings:{}".format(data))
    print("embeddings1:{}".format(data1))
    print("embeddings2:{}".format(data2))
(2) Sorting by column
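pair_issame = pair_issame[np.lexsort(pair_issame.T)]  # sort rows by the last column
(3) Extracting the rows and columns that match a condition
Suppose we have an array pair_issame:
To extract the rows whose third (last) column equals "1":
pair_issame_1 = pair_issame[pair_issame[:, -1] == "1", :]  # filter the array
(4) Finding vectors that match a condition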
import numpy as np

def matching_data_vecror(data, vector):
    '''
    Match vector against the rows of data and return a boolean index of the rows equal to vector, e.g.:
        data = [[1., 0., 0.], [0., 0., 0.], [2., 0., 0.],
                [0., 0., 0.], [0., 3., 0.], [0., 0., 4.]]
        # Find the rows of data equal to [0, 0, 0]
        data = np.asarray(data)
        vector = [0, 0, 0]
        index = matching_data_vecror(data, vector)
        print(index)
        >> [False  True False  True False False]
        # Remove the rows of data equal to [0, 0, 0]
        pair_issame_1 = data[~index, :]  # filter the array
    :param data: 2-D array
    :param vector: the row vector to look for
    :return: boolean index, one entry per row
    '''
    # index = (data[:, 0] == 0) & (data[:, 1] == 0) & (data[:, 2] == 0)
    row_nums = len(data)
    clo_nums = len(vector)
    index = np.asarray([True] * row_nums)
    for i in range(clo_nums):
        index = index & (data[:, i] == vector[i])
    return index

def set_mat_vecror(data, index, vector):
    '''
    Set the rows of data at the given index to vector
        # Set every point whose score exceeds the threshold to vector = [10, 10]
        point = [[0., 0.], [1., 1.], [2., 2.],
                 [3., 3.], [4., 4.], [5., 5.]]
        point = np.asarray(point)  # the data points
        score = np.array([0.7, 0.2, 0.3, 0.4, 0.5, 0.6])  # the score of each point
        score_th = 0.5
        index = np.where(score > score_th)  # indices of all scores above the threshold
        vector = [10, 10]  # value assigned to the points above the threshold
        out = set_mat_vecror(point, index, vector)
    :param data:
    :param index:
    :param vector:
    :return:
    '''
    data[index, :] = vector
    return data
(5) Shuffling
Reference: "python numpy array random permutation (shuffling training data)", Song_Lynn's blog, CSDN
per = np.random.permutation(pair_issame_1.shape[0])  # shuffled row indices
pair_issame_1 = pair_issame_1[per, :]  # rows in shuffled order
2.2 The pickle module
What kinds of data can pickle store?
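The question above is left unanswered in the text. As a rough guide, pickle handles None, booleans, numbers, strings, bytes, and tuples, lists, sets and dicts built from picklable objects, as well as functions and classes defined at module top level. The sketch below round-trips a nested record and shows that a lambda is rejected; the record contents are hypothetical sample data.

```python
import pickle

# Hypothetical sample record mixing several built-in types
record = {
    "name": "1.jpg",
    "size": (200, 300),
    "labels": ["dog", 1.0, -2],
    "flags": {"train", "valid"},
    "meta": None,
}

blob = pickle.dumps(record)    # serialize to a bytes object
restored = pickle.loads(blob)  # rebuild an equal object
print(restored == record)      # True

# Functions are pickled by name, so an anonymous lambda cannot be stored
try:
    pickle.dumps(lambda x: x)
    lambda_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_ok = False
print(lambda_ok)               # False
```

To store the data on disk instead, pickle.dump(obj, f) and pickle.load(f) take a file opened in binary mode ('wb' / 'rb').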
2.3 random.shuffle with a fixed seed
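The idea of this section is that reseeding the generator with the same value before each shuffle makes two lists of equal length receive the same permutation, so files and labels stay paired. A small runnable sketch with hypothetical sample data (each label equals the number in its file name, which makes the pairing easy to check):

```python
import random

# Hypothetical sample data: label i belongs to file "i.jpg"
files_list = ["0.jpg", "1.jpg", "2.jpg", "3.jpg", "4.jpg"]
labels_list = [0, 1, 2, 3, 4]

seed = 100
random.seed(seed)
random.shuffle(files_list)
random.seed(seed)  # reseed so the second shuffle replays the same permutation
random.shuffle(labels_list)

pairs = list(zip(files_list, labels_list))
print(pairs)
```

This only works when both lists have the same length, because the permutation drawn by random.shuffle depends on the list length.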
import random

files_list = ...
labels_list = ...
shuffle = True
if shuffle:
    # seeds = random.randint(0, len(files_list))  # draw a random seed
    seeds = 100  # fixed seed: with the same seed, the random sequence that follows is the same
    random.seed(seeds)
    random.shuffle(files_list)
    random.seed(seeds)
    random.shuffle(labels_list)
2.4 The zip() and zip(*) functions
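zip() takes iterables as arguments and packs their corresponding elements into tuples. If the iterables differ in length, the result is as long as the shortest one. With the * operator, the tuples can be unpacked back into the original sequences.
zip behaves differently in Python 2 and Python 3: Python 2 returns a list of tuples, while Python 3.x returns an iterator object to save memory. To display the result as a list, convert it manually with list().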
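One practical consequence in Python 3 is that the zip object is a single-pass iterator: once it has been consumed, iterating it again yields nothing. A minimal sketch (variable names are illustrative):

```python
a = [1, 2, 3]
b = [4, 5, 6]

pairs = zip(a, b)          # lazy, single-pass iterator in Python 3
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing is left to yield

print(first_pass)   # [(1, 4), (2, 5), (3, 6)]
print(second_pass)  # []

# Materialize once with list(), then unzip with the * operator
unzipped = list(zip(*first_pass))
print(unzipped)     # [(1, 2, 3), (4, 5, 6)]
```

Converting to a list once and reusing that list avoids the empty second pass.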
a = [1,2,3]
b = [4,5,6]
c = [4,5,6,7,8]
zipped = list(zip(a, b))  # pack into a list of tuples
# result: [(1, 4), (2, 5), (3, 6)]
list(zip(a, c))  # the length matches the shortest list
# result: [(1, 4), (2, 5), (3, 6)]
list(zip(*zipped))  # the reverse of zip: *zipped can be read as unzipping, giving the data back row by row
# result: [(1, 2, 3), (4, 5, 6)]
2.5 Fast iteration with map and for
import os

# Suppose files_list is:
files_list = ['../training_data/test\\0.txt', '../training_data/test\\1.txt', '../training_data/test\\2.txt', '../training_data/test\\3.txt', '../training_data/test\\4.txt', '../training_data/test\\5.txt', '../training_data/test\\6.txt']
# All three of the following extract the file names from files_list
# (the "\\" separator in these paths is only recognized on Windows)
files_nemes1 = list(map(lambda s: os.path.basename(s), files_list))
files_nemes2 = list(os.path.basename(i) for i in files_list)
files_nemes3 = [os.path.basename(i) for i in files_list]
2.6 The glob module
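The glob module is one of the simplest modules and has very little to it. It finds file paths that match a given pattern, much like file search on Windows. Only three wildcards are used: "*", "?", and "[]". "*" matches zero or more characters; "?" matches exactly one character; "[]" matches one character from the given range, e.g. [0-9] matches a digit.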
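The three wildcards can be compared side by side on a throwaway directory. This is a small sketch; the file names are hypothetical and are created in a temporary folder so nothing on disk is touched.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names used only for this demonstration
    for name in ["a1.jpg", "a2.jpg", "ab.jpg", "a10.jpg", "b1.png"]:
        open(os.path.join(tmp_dir, name), "w").close()

    def match(pattern):
        # glob.escape keeps any special characters in the temp path literal
        full_pattern = os.path.join(glob.escape(tmp_dir), pattern)
        return sorted(os.path.basename(p) for p in glob.glob(full_pattern))

    star = match("*.jpg")        # "*": any run of characters
    question = match("a?.jpg")   # "?": exactly one character
    digit = match("a[0-9].jpg")  # "[0-9]": exactly one digit

print(star)      # ['a1.jpg', 'a10.jpg', 'a2.jpg', 'ab.jpg']
print(question)  # ['a1.jpg', 'a2.jpg', 'ab.jpg']
print(digit)     # ['a1.jpg', 'a2.jpg']
```

Note that "a?.jpg" does not match a10.jpg, because "?" stands for a single character only.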
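import glob
# Get all jpg images in the subfolders of the given directory
print(glob.glob(r"E:\Picture\*\*.jpg"))
# Get all .py files in the parent directory
print(glob.glob(r'../*.py'))  # relative path
To iterate over the jpg images in a given directory:
# -*- coding:utf-8 -*-
import glob
# Iterate over the jpg images in the given directory
image_path = "/home/ubuntu/TFProject/view-finding-network/test_images/*.jpg"
for per_path in glob.glob(image_path):
    print(per_path)
To iterate over files of several formats:
import glob
# Collect images with the extensions 'jpg', 'png' and 'jpeg'
image_format = ['jpg', 'png', 'jpeg']  # image formats
image_dir = './test_image'  # image directory
image_list = []
for fmt in image_format:  # renamed from "format" to avoid shadowing the built-in
    path = image_dir + '/*.' + fmt
    image_list.extend(glob.glob(path))
print(image_list)
2.7 The os module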
import os
import shutil

os.getcwd()  # current working directory
os.path.abspath('.')  # current working directory
os.path.abspath('..')  # parent of the current working directory
os.path.abspath(os.curdir)  # current working directory
os.path.join(os.getcwd(), 'filename')  # join the current directory and a name into a new path
os.path.exists(path)  # whether the path exists
os.path.isfile(path)  # True if path is an existing file, otherwise False
os.path.basename('path/to/test.jpg')  # file name part of the path: test.jpg
os.path.getsize(path)  # file size in bytes; raises an error if the file does not exist
path = os.path.dirname('path/to/test.jpg')  # directory part of the path: path/to
os.sep  # path separator of the current OS: '/' on Linux/UNIX, '\\' on Windows
dirname = 'path/to/test.jpg'.split(os.sep)[-1]  # last path component: "test.jpg"
dirname = 'path/to/test.jpg'.split(os.sep)[-2]  # name of the containing folder: "to"

# Delete all files under the directory (recursing into subdirectories)
def delete_dir_file(dir_path):
    ls = os.listdir(dir_path)
    for i in ls:
        c_path = os.path.join(dir_path, i)
        if os.path.isdir(c_path):
            delete_dir_file(c_path)
        else:
            os.remove(c_path)

# Create the directory if it does not exist (one level only)
if not os.path.exists(out_dir):
    os.mkdir(out_dir)
# Create nested directories
if not os.path.exists(segment_out_dir):
    os.makedirs(segment_out_dir)
# Delete all files under the directory
delete_dir_file(out_dir)
# or:
shutil.rmtree(out_dir)  # delete output folder
The code below implements: [1] getFilePathList: get the paths of all files under file_dir, including files in subdirectories; [2] get_files_list: get the list of all files under file_dir with the extension postfix, including subdirectories; [3] gen_files_labels: get the paths of all files under files_dir together with their labels, where each label is the name of the file's parent folder
# coding: utf-8
import os
import os.path
import pandas as pd

def getFilePathList(file_dir):
    '''
    Get the paths of all files under file_dir, including files in subdirectories
    :param file_dir:
    :return:
    '''
    filePath_list = []
    for walk in os.walk(file_dir):
        part_filePath_list = [os.path.join(walk[0], file) for file in walk[2]]
        filePath_list.extend(part_filePath_list)
    return filePath_list

def get_files_list(file_dir, postfix='ALL'):
    '''
    Get the list of all files under file_dir with the extension postfix, including subdirectories
    :param file_dir:
    :param postfix:
    :return:
    '''
    postfix = postfix.split('.')[-1]
    file_list = []
    filePath_list = getFilePathList(file_dir)
    if postfix == 'ALL':
        file_list = filePath_list
    else:
        for file in filePath_list:
            basename = os.path.basename(file)  # file name part of the path
            postfix_name = basename.split('.')[-1]
            if postfix_name == postfix:
                file_list.append(file)
    file_list.sort()
    return file_list

def gen_files_labels(files_dir):
    '''
    Get the paths of all files under files_dir together with their labels.
    Files of the same class are kept in one folder under files_dir, and the folder name is the label.
    :param files_dir:
    :return: filePath_list, the paths of all files; label_list, the corresponding labels
    '''
    filePath_list = getFilePathList(files_dir)
    print("files nums:{}".format(len(filePath_list)))
    # Collect the label of every sample
    label_list = []
    for filePath in filePath_list:
        label = filePath.split(os.sep)[-2]
        label_list.append(label)
    labels_set = list(set(label_list))
    print("labels:{}".format(labels_set))
    # Count the samples per label (the top-level pd.value_counts is deprecated in recent pandas)
    print(pd.Series(label_list).value_counts())
    return filePath_list, label_list

if __name__ == '__main__':
    file_dir = 'JPEGImages'
    file_list = get_files_list(file_dir)
    for file in file_list:
        print(file)
Traversing all files under the directory dir (including files in subfolders):
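# coding: utf-8
import os
import os.path

def get_files_list(dir):
    '''
    Traverse all files under the directory dir (including files in subfolders)
    :param dir: the folder to traverse
    :return: list containing the paths of all files -> list
    '''
    # parent: the directory being visited, dirnames: the folders in it, filenames: the files in it
    files_list = []
    for parent, dirnames, filenames in os.walk(dir):
        for filename in filenames:
            # print("parent is: " + parent)
            # print("filename is: " + filename)
            # print(os.path.join(parent, filename))  # path of every file under the root, subfolders included
            # Append the path itself (the original wrapped each path in its own one-element list)
            files_list.append(os.path.join(parent, filename))
    return files_list

if __name__ == '__main__':
    dir = 'images'
    files_list = get_files_list(dir)
    print(files_list)
Below is a ready-made get_input_list() function. If path is a folder, it collects all png, jpg, jpeg, etc. image files in it; if path is a txt file, it reads the file list stored in that txt file (which must not contain extra blank lines); if path is a single image file, e.g. path/to/1.png, it returns just that file.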
# -*-coding: utf-8 -*-
"""
    @Project: hdrnet
    @File   : my_test.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-08-28 14:30:51
"""
import os
import logging
import re

logging.basicConfig(format="[%(process)d] %(levelname)s %(filename)s:%(lineno)s | %(message)s")
log = logging.getLogger("train")
log.setLevel(logging.INFO)

def get_input_list(path):
    '''
    Return the paths of all images
    :param path: path to a single image, a folder, or a txt file
    :return: list of image paths
    '''
    # The dot before the extension is escaped so that it matches a literal "."
    regex = re.compile(r".*\.(png|jpeg|jpg|tif|tiff)$")
    inputs = []  # returned unchanged if path matches none of the cases below
    # path is a folder: collect all png, jpg, jpeg, etc. image files in it
    # path/to
    if os.path.isdir(path):
        inputs = os.listdir(path)
        inputs = [os.path.join(path, f) for f in inputs if regex.match(f)]
        log.info("Directory input {}, with {} images".format(path, len(inputs)))
    # path is a txt file: read the file list stored in it (no extra blank lines)
    # path/to/filelist.txt
    elif os.path.splitext(path)[-1] == ".txt":
        dirname = os.path.dirname(path)
        with open(path, 'r') as fid:
            inputs = [l.strip() for l in fid.readlines()]
        inputs = [os.path.join(dirname, im) for im in inputs]
        log.info("Filelist input {}, with {} images".format(path, len(inputs)))
    # path is a single image file: path/to/1.png
    elif regex.match(path):
        inputs = [path]
        log.info("Single input {}".format(path))
    return inputs

if __name__ == '__main__':
    path = 'dataset/filelist.txt'
    result = get_input_list(path)
    print(result)
2.8 Detecting empty, missing, and undersized image files
import os
from PIL import Image

def isValidImage(images_list, sizeTh=1000, isRemove=False):
    '''
    Remove missing files, undersized files and damaged images from the list
    :param images_list:
    :param sizeTh: file size threshold in bytes, default 1000 B
    :param isRemove: whether to delete the bad files from disk
    :return:
    '''
    i = 0
    while i < len(images_list):
        path = images_list[i]
        # Check that the file exists
        if not (os.path.exists(path)):
            print(" non-existent file:{}".format(path))
            images_list.pop(i)
            continue
        # Check that the file is not empty (smaller than the threshold)
        if os.path.getsize(path) < sizeTh:
            print(" empty file:{}".format(path))
            if isRemove:
                os.remove(path)
                print(" info:----------------remove image:{}".format(path))
            images_list.pop(i)
            continue
        # Check that the image file is not damaged
        try:
            Image.open(path).verify()
        except Exception:
            print(" damaged image:{}".format(path))
            if isRemove:
                os.remove(path)
                print(" info:----------------remove image:{}".format(path))
            images_list.pop(i)
            continue
        i += 1
    return images_list
2.9 Saving multi-dimensional arrays
np.savetxt() cannot save arrays with three or more dimensions directly, so the array has to be flattened into a vector before saving.
import numpy as np

arr1 = np.zeros((3, 4, 5), dtype='int16')  # create a 3*4*5 array of zeros
print("shape:", np.shape(arr1))
arr1[0, :, :] = 0
arr1[1, :, :] = 1
arr1[2, :, :] = 2
print("arr1=", arr1)
# savetxt cannot save arrays with three or more dimensions, so convert to a vector first
vector = arr1.reshape((-1, 1))
np.savetxt("data.txt", vector)
data = np.loadtxt("data.txt")
print("data=", data)
arr2 = data.reshape(arr1.shape)
print("arr2=", arr2)
2.10 Reading txt data
This is a ready-made txt read/write module; both its input and its output are lists:
# -*-coding: utf-8 -*-
"""
    @Project: TxtStorage
    @File   : TxtStorage.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-07-12 17:32:47
"""

class TxtStorage:
    # def __init__(self):
    def write_txt(self, content, filename, mode='w'):
        """
        Save data to a txt file
        :param content: the data to save, type->list
        :param filename: file name
        :param mode: write mode: 'w' or 'a'
        :return: void
        """
        with open(filename, mode) as f:
            for line in content:
                str_line = ""
                for col, data in enumerate(line):
                    if not col == len(line) - 1:
                        # a space is the separator
                        str_line = str_line + str(data) + " "
                    else:
                        # the last item of each line is followed by a newline "\n"
                        str_line = str_line + str(data) + "\n"
                f.write(str_line)

    def read_txt(self, fileName):
        """
        Read data from a txt file
        :param fileName: file name
        :return: the data in the txt file as a list
        :rtype: list
        Python has three functions that remove leading/trailing characters and whitespace:
        strip:  removes them from both ends (including \\n, \\r, \\t and ' ')
        lstrip: removes them from the start
        rstrip: removes them from the end
        Note: these functions only remove characters at the ends, never in the middle.
        """
        txtData = []
        with open(fileName, 'r') as f:
            lines = f.readlines()
            for line in lines:
                lineData = line.rstrip().split(" ")
                data = []
                for l in lineData:
                    if self.is_int(l):  # str.isdigit() only accepts unsigned digits, so int() is tried instead
                        data.append(int(l))
                    elif self.is_float(l):  # check whether it is a decimal number
                        data.append(float(l))
                    else:
                        data.append(l)
                txtData.append(data)
        return txtData

    def is_int(self, text):
        # Check whether the string is an integer
        try:
            x = int(text)
            return isinstance(x, int)
        except ValueError:
            return False

    def is_float(self, text):
        # Check whether the string is an integer or a decimal number
        try:
            x = float(text)
            return isinstance(x, float)
        except ValueError:
            return False

if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    txt_str = TxtStorage()
    txt_str.write_txt(w_data, txt_filename, mode='w')
    r_data = txt_str.read_txt(txt_filename)
    print('r_data=', r_data)
Common operations for reading TXT data, as plain functions:
# -*-coding: utf-8 -*-
"""
    @Project: TxtStorage
    @File   : TxtStorage.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-07-12 17:32:47
"""

def write_txt(content, filename, mode='w'):
    """
    Save data to a txt file
    :param content: the data to save, type->list
    :param filename: file name
    :param mode: write mode: 'w' or 'a'
    :return: void
    """
    with open(filename, mode) as f:
        for line in content:
            str_line = ""
            for col, data in enumerate(line):
                if not col == len(line) - 1:
                    # a space is the separator
                    str_line = str_line + str(data) + " "
                else:
                    # the last item of each line is followed by a newline "\n"
                    str_line = str_line + str(data) + "\n"
            f.write(str_line)

def read_txt(fileName):
    """
    Read data from a txt file
    :param fileName: file name
    :return: the data in the txt file as a list
    :rtype: list
    strip, lstrip and rstrip remove characters and whitespace (\\n, \\r, \\t and ' ')
    from both ends, the start, and the end of a string respectively; they never
    remove anything in the middle.
    """
    txtData = []
    with open(fileName, 'r') as f:
        lines = f.readlines()
        for line in lines:
            lineData = line.rstrip().split(" ")
            data = []
            for l in lineData:
                if is_int(l):  # str.isdigit() only accepts unsigned digits, so int() is tried instead
                    data.append(int(l))
                elif is_float(l):  # check whether it is a decimal number
                    data.append(float(l))
                else:
                    data.append(l)
            txtData.append(data)
    return txtData

def is_int(text):
    # Check whether the string is an integer
    try:
        x = int(text)
        return isinstance(x, int)
    except ValueError:
        return False

def is_float(text):
    # Check whether the string is an integer or a decimal number
    try:
        x = float(text)
        return isinstance(x, float)
    except ValueError:
        return False

def merge_list(data1, data2):
    '''
    Merge two lists row by row
    :param data1:
    :param data2:
    :return: the merged list (None if the lengths differ)
    '''
    if not len(data1) == len(data2):
        return
    all_data = []
    for d1, d2 in zip(data1, data2):
        all_data.append(d1 + d2)
    return all_data

def split_list(data, split_index=1):
    '''
    Split every row of data into two parts
    :param data: list
    :param split_index: the position to split at
    :return:
    '''
    data1 = []
    data2 = []
    for d in data:
        d1 = d[0:split_index]
        d2 = d[split_index:]
        data1.append(d1)
        data2.append(d2)
    return data1, data2

if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    write_txt(w_data, txt_filename, mode='w')
    r_data = read_txt(txt_filename)
    print('r_data=', r_data)
    data1, data2 = split_list(w_data)
    mer_data = merge_list(data1, data2)
    print('mer_data=', mer_data)
To read a txt file like the following, use the method below:
test_image/dog/1.jpg 0 11
test_image/dog/2.jpg 0 12
test_image/dog/3.jpg 0 13
test_image/dog/4.jpg 0 14
test_image/cat/1.jpg 1 15
test_image/cat/2.jpg 1 16
test_image/cat/3.jpg 1 17
test_image/cat/4.jpg 1 18
def load_image_labels(test_files):
    '''
    Load a txt file in which every line describes one image, with fields separated by spaces:
    image path, label 1, label 2, e.g.: test_image/1.jpg 0 2
    :param test_files:
    :return:
    '''
    images_list = []
    labels_list = []
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip removes trailing characters and whitespace (\n, \r, \t and ' ')
            content = line.rstrip().split(' ')
            name = content[0]
            labels = []
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list, labels_list
2.11 The pandas module
(1) Concatenating file data
Suppose we have the files 'data1.txt', 'data2.txt' and 'data3.txt':
# 'data1.txt'
1.jpg 11
2.jpg 12
3.jpg 13
# 'data2.txt'
1.jpg 110
2.jpg 120
3.jpg 130
# 'data3.txt'
1.jpg 1100
2.jpg 1200
3.jpg 1300
They need to be concatenated into:
1.jpg 11 110 1100
2.jpg 12 120 1200
3.jpg 13 130 1300
Implementation:
# coding: utf-8
import pandas as pd

def concat_data(page, save_path):
    pd_data = []
    for i in range(len(page)):
        # sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
        content = pd.read_csv(page[i], dtype=str, sep=r'\s+', header=None)
        if i == 0:
            pd_data = pd.concat([content], axis=1)
        else:
            # append the second column of each further file
            pd_data = pd.concat([pd_data, content.iloc[:, 1]], axis=1)
    pd_data.to_csv(save_path, index=False, sep=' ', header=None)

if __name__ == '__main__':
    txt_path = ['data1.txt', 'data2.txt', 'data3.txt']
    out_path = 'all_data.txt'
    concat_data(txt_path, out_path)
(2) DataFrame
import pandas as pd
import numpy as np

def print_info(class_name, labels):
    # index = range(len(class_name)) + 1
    index = np.arange(0, len(class_name)) + 1
    columns = ['class_name', 'labels']
    content = np.array([class_name, labels]).T
    df = pd.DataFrame(content, index=index, columns=columns)  # one row per class, two columns
    print(df)  # print the table

class_name = ['C1', 'C2', 'C3']
labels = [100, 200, 300]
print_info(class_name, labels)
Adding, deleting, modifying, and querying Pandas DataFrame data
Reference: "Adding, deleting, modifying, and querying Pandas DataFrame data", 夏雨淋河's blog, CSDN
| tom1 | f | 22 |
| tom2 | f | 22 |
| tom3 | m | 21 |
| tom1 | f | shenzhen1 | 22 | student | 1k |
| tom2 | f | shenzhen2 | 22 | teacher | 2k |
| tom3 | m | shenzhen3 | 21 | teacher | 2k |
| tom1 | f | shenzhen1 | 22 | student | 1k |
| tom2 | f | shenzhen2 | 22 | teacher | 2k |
| tom3 | m | shenzhen3 | 21 | teacher | 2k |
| tom4 | m | shenzhen4 | 24 | engineer | 3k |
| tom4 | m | shenzhen4 | 24 | engineer | 3k |
| tom1 | f | shenzhen1 | 22 | student | 1k |
| tom2 | f | shenzhen2 | 22 | teacher | 2k |
| tom3 | m | shenzhen3 | 21 | teacher | 2k |
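The tables above appear without the code that produced them or a header row. The sketch below shows one way the four operations (add, modify, query, delete) could look on similar data; the column names name, sex, age, city and job are assumptions, and this is not necessarily the sequence used in the referenced post.

```python
import pandas as pd

# Hypothetical column names; the tables in the text have no header row
df = pd.DataFrame({"name": ["tom1", "tom2", "tom3"],
                   "sex": ["f", "f", "m"],
                   "age": [22, 22, 21]})

# Add: new columns, then a new row at the next integer label
df["city"] = ["shenzhen1", "shenzhen2", "shenzhen3"]
df["job"] = ["student", "teacher", "teacher"]
df.loc[len(df)] = ["tom4", "m", 24, "shenzhen4", "engineer"]

# Modify: update the cells selected by a boolean mask
df.loc[df["name"] == "tom1", "age"] = 23

# Query: filter rows by a condition
teachers = df[df["job"] == "teacher"]
print(teachers)

# Delete: drop a row by index label and a column by name
df = df.drop(index=3).drop(columns=["city"])
print(df)
```

drop returns a new DataFrame rather than changing the original, which is why the result is assigned back to df.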
2.12 The csv module
Reading data from a csv file with the csv module:
# -*- coding:utf-8 -*-
import csv

csv_path = 'test.csv'
with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)
with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'], item['class'], item.get('height'), item.get('width'))
Output:
{'filename': 'test01.jpg', 'height': '638', 'class': 'dog', 'width': '486'}
{'filename': 'test02.jpg', 'height': '954', 'class': 'person', 'width': '726'}
test01.jpg dog 638 486
test02.jpg person 954 726
Writing and reading:
import csv

csv_path = 'test.csv'
# Write the csv
data = ["1.jpg", 200, 300, 'dog']
with open(csv_path, 'w+', newline='') as csv_file:
    # headers = [k for k in dictionaries[0]]
    headers = ['filename', 'width', 'height', 'class']
    print(headers)
    writer = csv.DictWriter(csv_file, fieldnames=headers)
    writer.writeheader()
    dictionary = {'filename': data[0],
                  'width': data[1],
                  'height': data[2],
                  'class': data[3],
                  }
    writer.writerow(dictionary)
    print(dictionary)
# Read the csv
with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)
with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'], item['class'], item.get('height'), item.get('width'))
2.13 The logging module
import logging
import sys

# Levels: debug, info, warning, error and critical
# logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
logger.debug("----1----")
logger.info("----2----")
logger.warning("----3----")
logger.error("----4----")
3. Data Preprocessing
3.1 Splitting data (images) into blocks
import numpy as np

def split_cell(mat, cell=(3,3), stepsize=(1,1)):
    '''
    :param mat: single-channel image data (the author notes this function may contain errors and needs verification)
    :param cell: block size
    :param stepsize: step, stepsize<cell
    :return:
    '''
    rows, cols = np.shape(mat)
    Rx = cell[0] // 2
    Ry = cell[1] // 2
    stepX = stepsize[0]
    stepY = stepsize[1]
    dest = np.zeros(shape=(int((rows+stepX-1)/stepX), int((cols+stepY-1)/stepY)), dtype=np.float32)
    for i in range(0, rows, stepX):
        for j in range(0, cols, stepY):
            x1 = i - Rx
            x2 = i + Rx
            y1 = j - Ry
            y2 = j + Ry
            x1 = np.clip(x1, 0, rows-1)
            x2 = np.clip(x2, 0, rows-1)
            y1 = np.clip(y1, 0, cols-1)
            y2 = np.clip(y2, 0, cols-1)
            # Mean of the block; x runs over rows and y over columns, so rows are indexed first
            # (the original indexed mat[y, x] and flagged the coordinates as wrong)
            block = mat[x1:(x2+1), y1:(y2+1)]
            m = np.mean(block)
            indexX = int((i+stepX-1)/stepX)  # round up
            indexY = int((j+stepY-1)/stepY)
            dest[indexX, indexY] = m / 255
    # dest=dest.reshape()
    return dest

def split_block(mat, grid=(7,7)):
    rows, cols = grid
    block_image = []
    height, width = np.shape(mat)
    step_width = int(width / cols)
    step_height = int(height / rows)
    for i in range(0, rows):
        for j in range(0, cols):
            x1 = j * step_width
            x2 = (j + 1) * step_width
            y1 = i * step_height
            y2 = (i + 1) * step_height
            block = mat[y1:y2, x1:x2]  # note the order: mat[row, col]
            # fea=block_feature(block, feature_type="LBP")
            block_image.append(block)
    return block_image

if __name__ == "__main__":
    data = np.arange(0, 100)
    image = data.reshape((20, 5))
    # cell and stepsize are parameters of split_cell, not split_block
    dest = split_cell(image, cell=(3,3), stepsize=(1,1))
3.2 Reading and displaying images
Python offers many ways to read and display images. Most image-processing modules read images in RGB channel order; only opencv-python reads them as BGR. To display an image read by OpenCV with another module, convert the channel order first, which is straightforward:
import cv2
import matplotlib.pyplot as plt

image_path = 'test_image/C0.jpg'
temp_img = cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray
cv2.imshow("opencv-python", temp_img)
cv2.waitKey(0)
# b, g, r = cv2.split(temp_img)  # convert BGR to RGB by splitting channels
# img = cv2.merge([r, g, b])
# cv2.COLOR_BGR2RGB is the recommended way to convert BGR to RGB
img = cv2.cvtColor(temp_img, cv2.COLOR_BGR2RGB)
plt.imshow(img)  # show the image
plt.axis('off')  # hide the axes
plt.show()
(1) The matplotlib.image, PIL.Image, and cv2 image-reading modules
# coding: utf-8
'''
In Caffe, color images must be in BGR channel order and the input data is float32 in the range [0,255];
every layer has shape=(batch_size, channel_dim, height, width).
[1] In caffe's train/test prototxt files, the data layer usually sets scale:0.00392156885937,
    i.e. 1/255.0, which normalizes the data to [0,1]
[2] When the input is an RGB image, float32, [0,1], convert it with:
    --transformer.set_raw_scale('data',255)  # scale to 0~255
    --transformer.set_channel_swap('data',(2,1,0))  # RGB to BGR
[3] When the input is an RGB image, uint8, [0,255], multiply it by 1.0 to convert it to float32 first
    --transformer.set_raw_scale('data',1.0)  # no scaling needed
    --transformer.set_channel_swap('data',(2,1,0))  # RGB to BGR
    --channels: img = img.transpose(2, 0, 1)  # [h,w,c] -> [c,h,w]
[4] All Python image-reading modules return shape=[height, width, channels]. The odd one out is
    opencv-python, which reads images as BGR (the order caffe requires), while the others use RGB
'''
import numpy as np
import matplotlib.pyplot as plt

image_path = 'test_image/C0.jpg'  # C0.jpg has height h=400 and width w=200
# 1.caffe
import caffe
img1 = caffe.io.load_image(image_path)  # default: RGB, float32, [0-1], ndarray, shape=[400,200,3]
# 2.skimage
import skimage.io
img2 = skimage.io.imread(image_path)  # default: RGB, uint8, [0,255], ndarray, shape=[400,200,3]
# img2=img2/255.0
# 3.matplotlib
import matplotlib.image
img3 = matplotlib.image.imread(image_path)  # default: RGB, uint8, [0,255], ndarray, shape=[400,200,3]
# 4.PIL
from PIL import Image
temp_img4 = Image.open(image_path)  # default: RGB, uint8, [0,255]
# temp_img4.show()  # opens the system's default image viewer
img4 = np.array(temp_img4)  # convert to ndarray, shape=[400,200,3]
# 5.opencv
import cv2
temp_img5 = cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray, shape=[400,200,3]
# cv2.imshow("opencv-python",temp_img5)
# cv2.waitKey(0)
# b, g, r = cv2.split(temp_img5)  # convert BGR to RGB by splitting channels
# img5 = cv2.merge([r, g, b])
# cv2.COLOR_BGR2RGB is the recommended way to convert BGR to RGB
img5 = cv2.cvtColor(temp_img5, cv2.COLOR_BGR2RGB)
img6 = img5.transpose(2, 0, 1)  # channels from [h,w,c] to [c,h,w]
# All of the ndarray images above can be displayed directly like this
plt.imshow(img5)  # show the image
plt.axis('off')  # hide the axes
plt.show()
A ready-made module for reading and saving images:
import numpy as np
import matplotlib.pyplot as plt
import cv2

def show_image(title, image):
    '''
    Display an image
    :param title: image title
    :param image: image data
    :return:
    '''
    # plt.figure("show_image")
    # print(image.dtype)
    plt.imshow(image)
    plt.axis('on')  # use 'off' to hide the axes
    plt.title(title)  # image title
    plt.show()

def show_image_rect(win_name, image, rect):
    plt.figure()
    plt.title(win_name)
    plt.imshow(image)
    rect = plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
    plt.gca().add_patch(rect)
    plt.show()

def read_image(filename, resize_height, resize_width, normalization=False):
    '''
    Read image data; by default the result is uint8, [0,255]
    :param filename:
    :param resize_height:
    :param resize_width:
    :param normalization: whether to normalize to [0.,1.0]
    :return: the image data
    '''
    bgr_image = cv2.imread(filename)
    if len(bgr_image.shape) == 2:  # convert a grayscale image to three channels
        print("Warning:gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    # show_image(filename,rgb_image)
    # rgb_image=Image.open(filename)
    if resize_height > 0 and resize_width > 0:
        rgb_image = cv2.resize(rgb_image, (resize_width, resize_height))
    rgb_image = np.asanyarray(rgb_image)
    if normalization:
        # must not be written as: rgb_image=rgb_image/255
        rgb_image = rgb_image / 255.0
    # show_image("src resize image",image)
    return rgb_image

def save_image(image_path, image):
    plt.imsave(image_path, image)
(2) Converting a numpy array to a PIL image
Here matplotlib.image is used to read the image into an array. Note that for a PNG file the array it returns is float32 in the range 0-1, whereas PIL.Image data is uint8 in the range 0-255, so a conversion is needed:
import numpy as np
import matplotlib.image as mpimg
from PIL import Image

lena = mpimg.imread('lena.png')  # the data read here is float32 in the range 0-1
im = Image.fromarray(np.uint8(lena * 255))
im.show()
(3) Converting between PIL.Image and OpenCV image formats in Python
Converting PIL.Image to OpenCV format:
import cv2
from PIL import Image
import numpy

image = Image.open("plane.jpg")
image.show()
img = cv2.cvtColor(numpy.asarray(image), cv2.COLOR_RGB2BGR)
cv2.imshow("OpenCV", img)
cv2.waitKey()
Converting OpenCV to PIL.Image format:
import cv2
from PIL import Image
import numpy

img = cv2.imread("plane.jpg")
cv2.imshow("OpenCV", img)
image = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
image.show()
cv2.waitKey()
Checking whether image data is in OpenCV format (a numpy array):
isinstance(img, np.ndarray)
(4) The matplotlib display-blocking problem
Reference: "The two display modes of matplotlib.pyplot (interactive and blocking) and their use in Python plotting", wonengguwozai's blog, CSDN
The example below shows how to open several windows at once, as in MATLAB, to compare images or line plots. It is also the fix for figures that flash and disappear when interactive mode is enabled in a script:
import matplotlib.pyplot as plt

plt.ion()  # turn on interactive mode
# Open two windows at once to show the images
plt.figure()
plt.imshow(image1)
plt.figure()
plt.imshow(image2)
plt.ioff()  # turn interactive mode off before showing, so the figures do not flash and vanish
plt.show()
(5) Drawing rectangles with matplotlib
import matplotlib.pyplot as plt

def show_image(win_name, image, rect):
    plt.figure()
    plt.title(win_name)
    plt.imshow(image)
    rect = plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
    plt.gca().add_patch(rect)
    plt.show()
3.3 One-hot encoding
import os
import numpy as np
from sklearn import preprocessing

def gen_data_labels(label_list, ont_hot=True):
    '''
    label_list: input labels -> list
    '''
    # Convert labels to integer codes
    # labels_set=list(set(label_list))
    # labels=[]
    # for label in label_list:
    #     for k in range(len(labels_set)):
    #         if label==labels_set[k]:
    #             labels+=[k]
    #             break
    # labels = np.asarray(labels)
    # Alternatively, convert labels to integer codes with LabelEncoder
    labelEncoder = preprocessing.LabelEncoder()
    labels = labelEncoder.fit_transform(label_list)
    labels_set = labelEncoder.classes_
    for i in range(len(labels_set)):
        print("labels:{}->{}".format(labels_set[i], i))
    # Whether to apply one-hot encoding
    if ont_hot:
        labels_nums = len(labels_set)
        labels = labels.reshape(len(labels), 1)
        # sparse_output replaces the older sparse argument (scikit-learn >= 1.2; use sparse=False on older versions)
        # onehot_encoder = preprocessing.OneHotEncoder(sparse_output=False, categories=[range(labels_nums)])
        onehot_encoder = preprocessing.OneHotEncoder(sparse_output=False, categories='auto')
        labels = onehot_encoder.fit_transform(labels)
    return labels
3.4 Generating batches in a loop
TXT file:
1.jpg 1 11
2.jpg 2 12
3.jpg 3 13
4.jpg 4 14
5.jpg 5 15
6.jpg 6 16
7.jpg 7 17
8.jpg 8 18
# -*-coding: utf-8 -*-
"""
    @Project: LSTM
    @File   : create_batch_data.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2018-10-27 18:20:15
"""
import math
import random
import os
import glob
import numpy as np

def get_list_batch(inputs, batch_size=None, shuffle=False):
    '''
    Generate batches in an endless loop
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle inputs
    :return: one batch of data per iteration
    '''
    if shuffle:
        random.shuffle(inputs)
    while True:
        batch_inouts = inputs[0:batch_size]
        inputs = inputs[batch_size:] + inputs[:batch_size]  # rotate so the next batch comes first
        yield batch_inouts

def get_data_batch(inputs, batch_size=None, shuffle=False):
    '''
    Generate batches in an endless loop
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle inputs
    :return: one batch of data per iteration
    '''
    # rows,cols=inputs.shape
    rows = len(inputs)
    indices = list(range(rows))
    if shuffle:
        random.shuffle(indices)
    while True:
        batch_indices = indices[0:batch_size]
        indices = indices[batch_size:] + indices[:batch_size]  # rotate so the next batch comes first
        batch_data = find_list(batch_indices, inputs)
        # batch_data=find_array(batch_indices,inputs)
        yield batch_data

def find_list(indices, data):
    out = []
    for i in indices:
        out = out + [data[i]]
    return out

def find_array(indices, data):
    rows, cols = data.shape
    out = np.zeros((len(indices), cols))
    for i, index in enumerate(indices):
        out[i] = data[index]
    return out

def load_file_list(text_dir):
    text_dir = os.path.join(text_dir, '*.txt')
    text_list = glob.glob(text_dir)
    return text_list

def get_next_batch(batch):
    return next(batch)

def load_image_labels(test_files):
    '''
    Load a txt file in which every line describes one image, with fields separated by spaces:
    image path, label 1, label 2, e.g.: test_image/1.jpg 0 2
    :param test_files:
    :return:
    '''
    images_list = []
    labels_list = []
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip removes trailing characters and whitespace (\n, \r, \t and ' ')
            content = line.rstrip().split(' ')
            name = content[0]
            labels = []
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list, labels_list

if __name__ == '__main__':
    filename = './training_data/train.txt'
    images_list, labels_list = load_image_labels(filename)
    # inputs = np.reshape(np.arange(8*3), (8,3))
    iter = 10  # iterate 10 times, each time printing one batch of 3
    batch = get_data_batch(images_list, batch_size=3, shuffle=False)
    for i in range(iter):
        print('**************************')
        # train_batch=batch.__next__()
        batch_images = get_next_batch(batch)
        print(batch_images)
3.5 Counting elements and distinct values
import numpy as np

label_list = ['horoscope', 'horoscope', 'finance', 'finance', 'finance', 'education', 'education', 'education']
set1 = set(label_list)        # {'finance', 'education', 'horoscope'}: a set holds no duplicates (order is arbitrary)
set2 = np.unique(label_list)  # ['education' 'finance' 'horoscope']: sorted unique values

# To count how often each element occurs:
from collections import Counter
arr = [1, 2, 3, 3, 2, 1, 0, 2]
result = {}
for i in set(arr):
    result[i] = arr.count(i)
print(result)
print(dict(Counter(arr)))  # same counts, computed in a single pass

# A simpler way:
import pandas as pd
print(pd.Series(label_list).value_counts())
3.6 Sorting a Python dict by key and by value
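A Python dict is looked up by key rather than by position, so it has no sorted order of its own (since Python 3.7 it only preserves insertion order). To process the entries in order of their values, sort the items as follows:
1. Sorting by value, from largest to smallest
Output: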
[('aa', 74), ('a', 31), ('bc', 5), ('asd', 4), ('c', 3), ('d', 0)]
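Breaking the code down:
dic.items() returns the (key, value) pairs of the dict.
sorted then takes a key argument that selects what to sort on. Here that is the value, i.e. the second element of each pair, d[1]. reverse=True reverses the result: the default order is ascending, so reversing it gives descending.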
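The snippet for step 1 is not shown here. A minimal sketch consistent with the description and the output listed above, assuming a dict named dic whose entries are chosen to match that output, could look like this:

```python
# Hypothetical input: entries chosen to reproduce the output listed above
dic = {'a': 31, 'bc': 5, 'c': 3, 'asd': 4, 'aa': 74, 'd': 0}

# items() yields (key, value) pairs; d[1] selects the value of each pair
sorted_by_value = sorted(dic.items(), key=lambda d: d[1], reverse=True)
print(sorted_by_value)
```

The result is a list of tuples, not a dict. If a dict in that order is needed, dict(sorted_by_value) keeps the sorted order on Python 3.7 and later.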
2. Sorting a dict by key:
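No code accompanies this step, so the following is only a sketch under the same assumption as before (a hypothetical dict named dic). Sorting by key uses the first element of each pair, d[0], instead of d[1]:

```python
# Hypothetical input, same entries as in the sort-by-value example
dic = {'a': 31, 'bc': 5, 'c': 3, 'asd': 4, 'aa': 74, 'd': 0}

# d[0] selects the key of each (key, value) pair; strings sort lexicographically
sorted_by_key = sorted(dic.items(), key=lambda d: d[0])
print(sorted_by_key)
```

Because tuples compare element by element, sorted(dic.items()) without a key argument gives the same order here; the explicit key just makes the intent visible.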
3.7 Custom sorting with sorted
The sort_cluster_container function below sorts samples by how often their label occurs, placing samples whose label is more frequent first.
# -*-coding: utf-8 -*-
"""
    @Project: IntelligentManufacture
    @File   : statistic_analysis.py
    @Author : panjq
    @E-mail : pan_jinquan@163.com
    @Date   : 2019-02-15 13:47:58
"""
import pandas as pd
import numpy as np
import functools


def print_cluster_info(title, labels_id, labels, columns=['labels_id', 'labels']):
    index = np.arange(0, len(labels_id)) + 1
    content = np.array([labels_id, labels]).T
    df = pd.DataFrame(content, index=index, columns=columns)  # one row per sample, two columns
    print('*************************************************')
    print("{}{}".format(title, df))


def print_cluster_container(title, cluster_container, columns=['labels_id', 'labels']):
    '''
    :param cluster_container: type: list[tuple()]
    :param columns: column names for the printed table
    :return:
    '''
    labels_id, labels = zip(*cluster_container)
    labels_id = list(labels_id)
    labels = list(labels)
    print_cluster_info(title, labels_id, labels, columns=columns)


def sort_cluster_container(cluster_container):
    '''
    Custom sort: order samples by how often their label occurs,
    placing samples whose label is more frequent first.
    :param cluster_container: list of (labels_id, labels) tuples
    :return: sorted list of (labels_id, labels) tuples
    '''
    # labels_id = list(cluster_container.keys())
    # labels = list(cluster_container.values())
    labels_id, labels = zip(*cluster_container)
    labels_id = list(labels_id)
    labels = list(labels)
    # Count the samples of each label into value_counts_dict
    value_counts_dict = {}
    labels_set = set(labels)
    for i in labels_set:
        value_counts_dict[i] = labels.count(i)

    def cmp(a, b):
        # Descending order
        a_key, a_value = a
        b_key, b_value = b
        a_count = value_counts_dict[a_value]
        b_count = value_counts_dict[b_value]
        if a_count > b_count:  # more frequent labels come first
            return -1
        elif (a_count == b_count) and (a_value > b_value):  # on equal counts, the larger value comes first
            return -1
        elif (a_count == b_count) and (a_value == b_value):  # same label: keep the original order
            return 0
        else:
            return 1

    out = sorted(cluster_container, key=functools.cmp_to_key(cmp))
    return out


if __name__ == '__main__':
    labels_id = ["image0", 'image1', "image2", "image3", "image4", "image5", "image6"]
    labels = [0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0]
    # labels = ['L0', 'L1', 'L2', 'L1', 'L1', 'L2', "L3"]
    cluster_container = list(zip(labels_id, labels))
    print("cluster_container:{}".format(cluster_container))
    print_cluster_container("Before sorting:\n", cluster_container, columns=['labels_id', 'labels'])
    out = sort_cluster_container(cluster_container)
    print_cluster_container("After sorting:\n", out, columns=['labels_id', 'labels'])
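The same ordering can probably be expressed without a comparison function. The sketch below is an alternative, not the author's method: it sorts on a (count, label) tuple with reverse=True, and the function name sort_by_label_frequency is hypothetical. It relies on sorted being stable even when reverse=True is given.

```python
from collections import Counter


def sort_by_label_frequency(cluster_container):
    """Sort (labels_id, label) pairs by label frequency, most frequent first."""
    # Count how many samples carry each label
    counts = Counter(label for _, label in cluster_container)
    # Larger counts first; on equal counts, the larger label first
    return sorted(cluster_container,
                  key=lambda item: (counts[item[1]], item[1]),
                  reverse=True)


labels_id = ["image0", "image1", "image2", "image3", "image4", "image5", "image6"]
labels = [0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0]
out = sort_by_label_frequency(list(zip(labels_id, labels)))
for sample in out:
    print(sample)
```

Label 1.0 occurs three times and label 2.0 twice, so their samples come first; labels 3.0 and 0.0 each occur once, and 3.0 is placed ahead of 0.0 because it is the larger value.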
3.8 Loading a yml configuration file
Suppose config.yml contains the following:
## Basic config
batch_size: 2
learning_rate: 0.001
epoch: 1000
## reset image size
height: 128
width: 128
It can be loaded in Python as follows:
import yaml


class Dict2Obj:
    '''Expose the keys of a dict as object attributes'''
    def __init__(self, bokeyuan):
        self.__dict__.update(bokeyuan)


def load_config_file(file):
    with open(file, 'r') as f:
        data_dict = yaml.load(f, Loader=yaml.FullLoader)
        data_dict = Dict2Obj(data_dict)
    return data_dict


if __name__ == "__main__":
    config_file = '../config/config.yml'
    para = load_config_file(config_file)
    print("batch_size:{}".format(para.batch_size))
    print("learning_rate:{}".format(para.learning_rate))
    print("epoch:{}".format(para.epoch))
Output:
batch_size:2
learning_rate:0.001
epoch:1000
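The Dict2Obj helper behaves much like the standard library's types.SimpleNamespace. The sketch below illustrates that with a hand-written dict standing in for the parsed YAML (an assumption made so the example runs without PyYAML or a config file on disk):

```python
from types import SimpleNamespace

# Hand-written stand-in for the dict that parsing config.yml would produce
config_dict = {
    "batch_size": 2,
    "learning_rate": 0.001,
    "epoch": 1000,
    "height": 128,
    "width": 128,
}

# Each key becomes an attribute, as with Dict2Obj
para = SimpleNamespace(**config_dict)
print("batch_size:{}".format(para.batch_size))
print("learning_rate:{}".format(para.learning_rate))
print("epoch:{}".format(para.epoch))
```

Unlike Dict2Obj, SimpleNamespace also provides a readable repr, which can help when logging the whole configuration.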
3.9 Moving, copying, and renaming files
#!/usr/bin/python
# -*- coding: utf-8 -*-
# test_copyfile.py
import os
import shutil


def rename(image_list):
    # Strip the '_cropped' suffix from each file name
    for name in image_list:
        cut_len = len('_cropped.jpg')
        newName = name[:-cut_len] + '.jpg'
        print(name)
        print(newName)
        os.rename(name, newName)


def mymovefile(srcfile, dstfile):
    if not os.path.isfile(srcfile):
        print("%s not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(dstfile)  # split directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)  # create the destination directory
        shutil.move(srcfile, dstfile)  # move the file
        print("move %s -> %s" % (srcfile, dstfile))


def mycopyfile(srcfile, dstfile):
    if not os.path.isfile(srcfile):
        print("%s not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(dstfile)  # split directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)  # create the destination directory
        shutil.copyfile(srcfile, dstfile)  # copy the file
        print("copy %s -> %s" % (srcfile, dstfile))


srcfile = '/Users/xxx/git/project1/test.sh'
dstfile = '/Users/xxx/tmp/tmp/1/test.sh'
mymovefile(srcfile, dstfile)
3.10 Producing batches of batch_size samples
def get_batch(image_list, batch_size):
    nums = len(image_list)
    # Ceiling division, equivalent to math.ceil(nums / batch_size)
    batch_num = (nums + batch_size - 1) // batch_size
    for i in range(batch_num):
        start = i * batch_size
        end = min((i + 1) * batch_size, nums)  # the last batch may be shorter
        batch_image = image_list[start:end]
        print("batch_image:{}".format(batch_image))


if __name__ == "__main__":
    nums = 20
    batch_size = 25
    image_list = []
    for i in range(nums):
        image_list.append(str(i + 1) + ".jpg")
    get_batch(image_list, batch_size)
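As a variation on get_batch, the batches can be yielded instead of printed, so the caller decides what to do with each one. This is a sketch only; the name iter_batches and the batch size of 8 are assumptions chosen for illustration.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        # Slicing past the end is safe, so the last batch is simply shorter
        yield items[start:start + batch_size]


image_list = [str(i + 1) + ".jpg" for i in range(20)]
batches = list(iter_batches(image_list, batch_size=8))
print([len(batch) for batch in batches])
```

With 20 images and a batch size of 8 this gives three batches, matching the ceiling division (20 + 8 - 1) // 8 used in get_batch.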