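Tallying nucleic acid test results is hard? 130 lines of code for OCR-based recognition of test screenshots, saved to Excel (reproducing the code that checks nucleic acid reports)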
Source:
"Tallying nucleic acid test results is hard? A Fudan PhD student's approach goes viral" https://m.gmw.cn/baijia/2022-04/08/35644611.html
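1 Background
The school collects nucleic acid test screenshots from students, and tallying them by hand is tedious, so this program uses OCR to read the screenshots and cross-check them against the submitted data.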
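1.1 Naming convention
Each screenshot is named student ID + name + test date (month.day) + result, for example:
20200250202003曹文舉4.12陰性.jpg
(陰性 means negative.)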
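The naming convention can be checked mechanically before any OCR is run. The following is a minimal sketch, not the parser used in the program: the names FILENAME_RE, YEAR and parse_filename are made up for illustration, and the year 2022 is an assumption, since the file name carries only month and day.

```python
import re

# Assumed year: the file name only carries month and day.
YEAR = 2022

# <student ID><name><month>.<day><two-character result>.jpg
FILENAME_RE = re.compile(
    r"(?P<student_id>\d+)"
    r"(?P<name>[\u4e00-\u9fa5]+)"
    r"(?P<month>\d{1,2})\.(?P<day>\d{1,2})"
    r"(?P<result>[\u4e00-\u9fa5]{2})"
    r"\.jpg"
)


def parse_filename(file_name):
    """Return the fields encoded in a screenshot file name, or None if it does not fit."""
    match = FILENAME_RE.fullmatch(file_name)
    if match is None:
        return None
    return {
        "student_id": match["student_id"],
        "name": match["name"],
        # Zero-pad month and day so the date compares cleanly as a string.
        "date": f"{YEAR}-{int(match['month']):02d}-{int(match['day']):02d}",
        "result": match["result"],
    }


print(parse_filename("20200250202003曹文舉4.12陰性.jpg"))
# {'student_id': '20200250202003', 'name': '曹文舉', 'date': '2022-04-12', 'result': '陰性'}
```

Returning None for a file that does not follow the convention lets a caller skip or report it instead of failing with an index error halfway through a batch.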
1.2 EasyOCR
EasyOCR: https://github.com/JaidedAI/EasyOCR
pip install easyocr
2 Program
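The program in this section relies on two steps: joining the text fragments that EasyOCR returns, and cutting out the value that follows a marker such as 检测时间 (test time) or 检测机构 (testing institution). The sketch below isolates those steps under stated assumptions: join_fragments, extract_field and ocr_text are hypothetical helper names, and SAMPLE is made-up data standing in for real OCR output, not a result from the article.

```python
def join_fragments(ocr_result):
    """Concatenate the text part of (bbox, text, confidence) tuples."""
    return "".join(text for _bbox, text, _conf in ocr_result)


def extract_field(text, marker, all_markers):
    """Return the text after `marker`, up to the next marker or the end."""
    start = text.find(marker)
    if start == -1:
        return None  # marker was not recognized in this text
    start += len(marker)
    # The value ends where the nearest following marker begins.
    ends = [text.find(m, start) for m in all_markers]
    ends = [pos for pos in ends if pos != -1]
    return text[start:min(ends)] if ends else text[start:]


def ocr_text(path):
    """Run EasyOCR on one image file and return the joined text."""
    import easyocr  # imported lazily so the helpers above work without it
    reader = easyocr.Reader(['ch_sim', 'en'])
    return join_fragments(reader.readtext(path))


# Made-up fragments standing in for reader.readtext() output.
BOX = [[0, 0], [1, 0], [1, 1], [0, 1]]
SAMPLE = [
    (BOX, "核酸检测结果阴性", 0.98),
    (BOX, "检测时间2022-04-12", 0.95),
    (BOX, "检测机构某某医院", 0.91),
]
MARKERS = ("检测时间", "检测机构")

text = join_fragments(SAMPLE)
print(extract_field(text, "检测时间", MARKERS))  # 2022-04-12
print(extract_field(text, "检测机构", MARKERS))  # 某某医院
```

Looking for the nearest following marker makes the extraction independent of the order in which the fields appear, whereas slicing from the end of the text assumes a fixed layout.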
import os
import re
import timeit

import cv2 as cv
import easyocr
import numpy as np
import xlsxwriter

# Work around the duplicate OpenMP runtime error that some installations raise.
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

# String literals matched against file names and OCR text are in Simplified
# Chinese, the script produced by the 'ch_sim' model.


def load_file_name(file_name):
    """Parse '<student ID><name><month>.<day><result>.jpg' into its fields."""
    # e.g. file_name = "201902811221邱江4.12阴性.jpg"
    numbers = re.findall(r"\d+", file_name)
    no = numbers[0]
    # The year is hard-coded to 2022; month and day are zero-padded.
    date = '2022-' + numbers[1].zfill(2) + '-' + numbers[2].zfill(2)
    chars = re.findall('[\u4e00-\u9fa5]', file_name)
    check = ''.join(chars[-2:])  # last two characters: the result (阴性 negative / 阳性 positive)
    name = ''.join(chars[:-2])   # everything before them: the name
    return name, date, no, check


def load_image(path, name, date, check):
    """OCR one screenshot and cross-check it against the fields from the file name."""
    # np.fromfile + imdecode copes with non-ASCII paths; flag 0 loads the image as grayscale.
    img = cv.imdecode(np.fromfile(path, dtype=np.uint8), 0)
    # Note: a new Reader is built for every image, which is slow.
    reader = easyocr.Reader(['ch_sim', 'en'])
    result = reader.readtext(img)
    # Concatenate the recognized text fragments of this image.
    result_str = ''
    for i in result:
        result_str = result_str + i[1]
    # Drop Latin letters, which are irrelevant here.
    temp = re.sub('[a-zA-Z]', '', result_str)
    # Keep only the text before "若上述" ("if the above ...").
    a = temp.find("若上述")
    result = temp[0:a] if a != -1 else temp
    result = result.replace('\n', '')  # remove line breaks
    result = result.replace(' ', '')   # remove spaces
    result = result.replace(':', '')   # remove half-width colons
    result = result.replace(':', '')  # remove full-width colons
    result = result.replace("核酸检测结果为", "")  # "the nucleic acid test result is"
    result = result.replace("核酸检测结果", "")    # "nucleic acid test result"
    # Name check
    if result.find(name) == -1:
        name_ocr = "recognition failed"
        check_ocr = "recognition failed (font)"
    else:
        name_ocr = "name correct"
    # Negative / positive check
    if result.find(check) == -1:
        check_ocr = "recognition failed (font)/"
    else:
        check_ocr = "result matches"
    result = result.replace('阴性', '')  # negative
    result = result.replace('阳性', '')  # positive
    # Testing institution: the text after "检测机构"
    if result.find("检测机构") == -1:
        organization_ocr = "recognition failed (font)"
        check_ocr = "recognition failed (font)/"
    else:
        index = result.find("检测机构")
        organization_ocr = result[index + 4:]
        result = result[0:index]
    # Test time: the text after "检测时间"
    if result.find("检测时间") == -1:
        date_ocr = "recognition failed (font)"
        check_ocr = "recognition failed (font)/"
    else:
        index = result.find("检测时间")
        date_ocr = result[index + 4:]
        if date_ocr == date:
            date_ocr = "test time correct"
    return name_ocr, check_ocr, organization_ocr, date_ocr


# Read every file in every subfolder of directory_name; the argument is the folder path.
def read_directory(directory_name):
    for filename in os.listdir(directory_name):
        for filename_1 in os.listdir(directory_name + "/" + filename):
            name, date, no, check = load_file_name(filename_1)
            path = directory_name + "/" + filename + "/" + filename_1
            name_ocr, check_ocr, organization_ocr, date_ocr = load_image(path, name, date, check)
            print("Submitted data:", name, date, check, no)
            print("OCR data:", name_ocr, date_ocr, check_ocr, organization_ocr)
            result_list = [name, date, check, no, name_ocr, date_ocr, check_ocr, organization_ocr]
            # result_list_total is the module-level list created in __main__.
            result_list_total.append(result_list)


if __name__ == '__main__':
    start = timeit.default_timer()
    # Header row of the spreadsheet; read_directory appends one row per screenshot.
    result_list_total = [["Name", "Date", "Test result", "Student ID",
                          "Name check", "Time check", "Result check", "Testing institution"]]
    read_directory("./imge")  # path of the folder to read (the quotes are required)
    # xlsxwriter can only create a new Excel file; it cannot read or update an existing one.
    workbook = xlsxwriter.Workbook('./data.xlsx')
    worksheet = workbook.add_worksheet("elite")  # sheet name shown on the tab
    worksheet.set_column(0, len(result_list_total[0]) - 1, 18)  # width 18 for every column
    print(result_list_total)
    # Write the rows, starting from the first row and first column.
    row = 0
    col = 0
    for name, date, check, no, name_ocr, date_ocr, check_ocr, organization_ocr in result_list_total:
        worksheet.write(row, col, name)
        worksheet.write(row, col + 1, date)
        worksheet.write(row, col + 2, check)
        worksheet.write(row, col + 3, no)
        worksheet.write(row, col + 4, name_ocr)
        worksheet.write(row, col + 5, date_ocr)
        worksheet.write(row, col + 6, check_ocr)
        worksheet.write(row, col + 7, organization_ocr)
        row += 1
    # The Excel file is only written out when the workbook is closed.
    workbook.close()
    end = timeit.default_timer()
    print('Total time: %s seconds' % (end - start))
    total = len(result_list_total) - 1  # exclude the header row
    print("Done, processed %s student records" % total)
3 Results