Competing with Paddle: Ancient Document Image Recognition and Analysis Algorithm Competition
I. Introduction to the Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Competition: Ancient Document Image Recognition and Analysis Track
1. Background and Significance
- Several millennia of Chinese civilization have left behind a vast body of ancient books and documents, and these written records carry rich historical information and cultural heritage. To meet the national strategic needs of protecting ancient-book cultural heritage and of digitizing and promoting ancient books, to pass on outstanding traditional Chinese culture, and to mine the rich knowledge contained in ancient documents, thorough digitization of ancient books is imperative.
- Because ancient document images have complex layouts, carving and handwriting styles differ greatly across dynasties, and the page images suffer from missing strokes, stains, ink contamination, blur, seal-stamp noise, and a profusion of rare and variant characters, recognizing and understanding ancient document images remains a highly challenging and far-from-solved technical problem.
- To address the difficulty of digitizing China's vast collection of ancient books, this competition solicits advanced AI algorithms for high-accuracy text detection, text-line recognition, and end-to-end recognition of ancient documents, with the goal of advancing ancient-book OCR and providing AI methods that support the digital preservation, curation, and use of ancient texts.
2. Task Description
Task: ancient document image analysis and recognition
Input: page-level ancient document images
Output: structured text-line coordinates and recognition results, produced with techniques such as physical and logical layout analysis, text detection, text recognition, and reading-order understanding. The detection result and recognized content of each text line must be output in reading order. The model should output detection and recognition results only for the body text, ignoring unstructured content such as the banxin (page heart) and volume numbers.
Character table notes:
The competition provides a character table (download link: https://pan.baidu.com/s/16wUeSZ4JKD6f1Pj9ZhlKww, extraction code: i53n) containing the character classes that appear in the preliminary training set, the validation set (preliminary leaderboard A), and the test set (preliminary leaderboard B). (Note: because the competition includes a zero-shot recognition setting, the character classes in the training set do not fully cover the table. The released table completely covers all character classes in the preliminary training set and the leaderboard-A test set; the leaderboard-B table may be slightly adjusted and will be released later, so watch the official competition site for notices.)
Preliminary leaderboard-B character table release:
Download link: https://pan.baidu.com/s/1gaNlKHk6lh5FxC2QP4UuDg
Extraction code: umzz (released September 8, 2022)
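Because of the zero-shot setting, it can be worth checking how many table characters never occur in the training labels. The following is a minimal sketch under two assumptions: the downloaded table has been saved as `mb.txt` with one character per line (the PaddleOCR dictionary format used by the configs later in this article), and the dataset from Section II has already been unpacked.

```python
import json

# Character table: one character per line (PaddleOCR dictionary format) -- assumed file name mb.txt
with open("mb.txt", encoding="utf-8") as f:
    table_chars = {line.rstrip("\n") for line in f if line.rstrip("\n")}

# Every character that appears in the training transcriptions
with open("dataset/train/label.json", encoding="utf-8") as f:
    labels = json.load(f)
train_chars = {ch for lines in labels.values() for line in lines for ch in line["transcription"]}

print("table size:", len(table_chars))
print("covered by training labels:", len(table_chars & train_chars))
print("zero-shot characters (in table, never seen in training):", len(table_chars - train_chars))
```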
3. Dataset Description
- **Preliminary dataset:** the training, validation, and test sets each contain 1,000 ancient document images (3,000 images in total), drawn from sources such as the Siku Quanshu (Complete Library in Four Sections), rare historical editions, and the Qianlong Tripitaka. The task considers only the body text of each document and ignores content outside the text frame such as the banxin and volume numbers.
- **Final dataset:** because the final round uses an **arena (challenge) format**, in addition to the organizers' preliminary and final data, finalist teams may apply to act as challengers and provide their own datasets for the other finalists to train and test on. A provided training set must contain at least 1,000 images, a provided test set at most 1,000 images, and the annotation format must match the organizers' format.
Dataset annotation format:
For each image, the text lines and their contents are annotated in reading order and stored in a single json file. The annotation format is shown below:
```
{
  "image_name_1": [
    {"points": [x1, y1, x2, y2, ..., xn, yn], "transcription": "text"},
    {"points": [x1, y1, x2, y2, ..., xn, yn], "transcription": "text"},
    ...
  ],
  "image_name_2": [
    {"points": [x1, y1, x2, y2, ..., xn, yn], "transcription": "text"},
    ...
  ],
  ...
}
```

- x1, y1, x2, y2, …, xn, yn are the vertices of the text-line polygon.
- For quadrilateral text lines, n = 4; the dataset also contains a small amount of irregular text, for which n = 16 (8 points along each of the two long sides).
- transcription is the content of each text line; characters too blurred to recognize are annotated as #.
- The detection and recognition labels of the text lines are given in the correct reading order. The end-to-end recognition content is annotated in reading order; only the body text is considered, and content outside the text frame such as the banxin and volume numbers is ignored.
- The arrangement of the reading order is shown in Figure 2; a rough ordering sketch is given below.
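Figure 2 is not reproduced in this article, so the exact ordering rule is not shown here. As an assumption for illustration only: ancient Chinese pages are usually read top-to-bottom within a column, with columns running right to left, so sorting text lines by their horizontal center from right to left approximates the reading order. A minimal sketch under that assumption, using the flat point layout described above:

```python
# Assumed ordering (not the official Figure 2 rule): rightmost column first, then top to bottom.
def sort_reading_order(text_lines):
    """text_lines: list of dicts with 'points' = [x1, y1, x2, y2, ...] as described above."""
    def center(line):
        xs = line["points"][0::2]
        ys = line["points"][1::2]
        return sum(xs) / len(xs), sum(ys) / len(ys)
    return sorted(text_lines, key=lambda line: (-center(line)[0], center(line)[1]))
```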
4. Submission
[Preliminary round, leaderboard A]:
- **Submission format:** a zip archive of CSV files, each named after the corresponding test image.
- Submission content: one CSV file per image, containing the detected text-box coordinates and the corresponding recognition results, with all text lines arranged in the predicted reading order.
The internal format of each CSV file is as follows:

```
x1, y1, x2, y2, x3, y3, …, xn, yn, transcription_1
x1, y1, x2, y2, x3, y3, …, xn, yn, transcription_2
…
x1, y1, x2, y2, x3, y3, …, xn, yn, transcription_n
```

(xn, yn are the box coordinates, listed in clockwise order; transcription_n is the recognized content of that text line. A hedged sketch of producing these CSV files follows this list.)
- Submission sample: https://pan.baidu.com/s/1h9smrGBwfJ78IP3WUlkEYQ (extraction code: suzi)
- Submission attempts: once per day
- Submissions open: September 15
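For reference, below is a minimal sketch of writing the per-image CSV files and packing them into a zip archive for upload. The `results` dictionary (image name mapped to an ordered list of `(points, transcription)` pairs) and all file names are hypothetical placeholders, not part of the competition kit.

```python
import csv
import os
import zipfile

def write_submission(results, out_dir="submission", zip_name="submission.zip"):
    """results: {"image_0.jpg": [([x1, y1, ..., xn, yn], "text"), ...], ...}
    Each per-image list is assumed to already be in reading order."""
    os.makedirs(out_dir, exist_ok=True)
    for image_name, lines in results.items():
        csv_path = os.path.join(out_dir, os.path.splitext(image_name)[0] + ".csv")
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for points, transcription in lines:
                # One row per text line: clockwise coordinates, then the recognized text
                writer.writerow(list(points) + [transcription])
    # Pack all CSV files into a single archive for upload
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(out_dir)):
            zf.write(os.path.join(out_dir, name), arcname=name)
```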
II. Dataset Processing
1. Unpack the Dataset
```
!unzip -qoa data/data167941/dataset.zip
```

2. Inspect the Data
```
!head -n30 dataset/train/label.json
```

Output (truncated):

```
{"image_0.jpg": [{"points": [1286,59,1326,59,1331,851,1290,851],"transcription": "\u53ef\ud878\udcce\u4e45\u4e4e\u820e\u5229\u5f17\u563f\u7136\u4e0d\u8345\u25cf\u4e94\u8eab\u5b50\u81ea\u601d\u89e7\u8131\u7121\u4e45\u8fd1\u6545\u9ed9\u5929\u66f0\u5982\u4f55\ud859\udcbf\ud85b\udf94\u5927\u667a"},{"points": [1249,57,1286,59,1298,851,1251,851],"transcription": "\u800c\u563f\u25cb\u516d\u5929\u554f\ud86e\udc26\u4ee5\u8087\u66f0\u4e94\u767e\u82d0\u5b50\u4ec1\u8005\u4f55\u667a\u6075\u82d0\u4e00\u563f\u7136\u4f55\u8036\u8345\u66f0\u89e7\u8131\u8005\u65e0\ud86e\udc26\u8a00\u8aaa"},{"points": [ ...
```

3. Convert the Data Format
For the PaddleOCR detection task, the dataset format is as follows:
" 圖像文件名 json.dumps編碼的圖像標注信息" ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]故需要對數(shù)據(jù)格式進行轉(zhuǎn)換。
```python
import json

# Read the source label.json
f = open('dataset/train/label.json', 'r')
x = f.read()
y = json.loads(x)
f.close()

# Check the number of images (1000)
print(len(y))
# Check the data format
print(y["image_0.jpg"])
# Check how many text lines this image contains
print(len(y["image_0.jpg"]))
```

Output (abridged):

```
1000
[{'points': [1286, 59, 1326, 59, 1331, 851, 1290, 851], 'transcription': '可𮃎久乎舎利弗嘿然不荅●五身子自思觧脫無久近故黙天曰如何𦒿𦾔大智'}, {'points': [1249, 57, 1286, 59, 1298, 851, 1251, 851], 'transcription': '而嘿○六天問𫠦以肇曰五百苐子仁者何智恵苐一嘿然何耶荅曰觧脫者無𫠦言說'}, {'points': [1213, 60, 1252, 60, 1252, 784, 1213, 784], 'transcription': '故吾扵是不知𫠦云○七身子已離三𭻃惑得心觧脫永絕言𢿘故言不知𫠦云'}, ...]
36
```

```
# Format conversion: flat point lists -> PaddleOCR's [[x, y], ...] layout
image_info_lists = {}
ff = open("dataset/train/label.txt", 'w')
for i in range(1000):
    old_info = y[f"image_{i}.jpg"]
    new_info = []
    for item in old_info:
        image_info = {}
        image_info["transcription"] = item['transcription']
        points = item["points"]
        if len(points) == 8:
            # quadrilateral text line: 4 points
            image_info["points"] = [[points[j], points[j + 1]] for j in range(0, 8, 2)]
        elif len(points) == 32:
            # irregular text line: 16 points
            image_info["points"] = [[points[j], points[j + 1]] for j in range(0, 32, 2)]
        elif len(points) == 34:
            # a few labels carry 17 points
            image_info["points"] = [[points[j], points[j + 1]] for j in range(0, 34, 2)]
        else:
            continue
        new_info.append(image_info)
    image_info_lists[f"image_{i}.jpg"] = new_info
    ff.write(f"image_{i}.jpg" + "\t" + json.dumps(new_info) + "\n")
ff.close()

# Check the converted data
print(image_info_lists["image_0.jpg"][0])
!head -n1 dataset/train/label.txt
```

4. Split the Dataset
The first 800 images form the training set.
The last 200 images form the evaluation set (a minimal split sketch is given below).
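The article does not show the splitting code; the following is a minimal sketch, assuming the `label.txt` produced above and writing the `train.txt` / `eval.txt` files that the later recognition-label step expects.

```python
# Split the converted label file: first 800 images for training, last 200 for evaluation
with open("dataset/train/label.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()

with open("dataset/train/train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[:800])

with open("dataset/train/eval.txt", "w", encoding="utf-8") as f:
    f.writelines(lines[800:])

print(len(lines[:800]), "training images,", len(lines[800:]), "evaluation images")
```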
III. PaddleOCR Environment Setup
1. Download PaddleOCR
```
# !git clone https://gitee.com/paddlepaddle/PaddleOCR.git --depth=1
```

2. Install PaddleOCR
```
%cd ~/PaddleOCR/
!python -m pip install -q -U pip --user
!pip install -q -r requirements.txt
```

```
/home/aistudio/PaddleOCR
```

```
# !mkdir pretrain_models/
# %cd pretrain_models
# !wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
# !tar -xvf ch_PP-OCRv3_det_distill_train.tar
```

IV. Detection Model Training
```
!pip list | grep opencv
```

```
opencv-contrib-python    4.6.0.66
opencv-python            4.2.0.32
```

1. Downgrade OpenCV
The installed opencv version is incompatible and needs to be downgraded; otherwise training fails with the error below.
```
Traceback (most recent call last):
  File "tools/train.py", line 30, in <module>
    from ppocr.data import build_dataloader
  File "/home/aistudio/PaddleOCR/ppocr/data/__init__.py", line 35, in <module>
    from ppocr.data.imaug import transform, create_operators
  File "/home/aistudio/PaddleOCR/ppocr/data/imaug/__init__.py", line 19, in <module>
    from .iaa_augment import IaaAugment
  File "/home/aistudio/PaddleOCR/ppocr/data/imaug/iaa_augment.py", line 24, in <module>
    import imgaug
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/imgaug/__init__.py", line 7, in <module>
    from imgaug.imgaug import *  # pylint: disable=redefined-builtin
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/imgaug/imgaug.py", line 18, in <module>
    import cv2
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/cv2/__init__.py", line 175, in bootstrap
    if __load_extra_py_code_for_module("cv2", submodule, DEBUG):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/cv2/__init__.py", line 28, in __load_extra_py_code_for_module
    py_module = importlib.import_module(module_name)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/cv2/mat_wrapper/__init__.py", line 33, in <module>
    cv._registerMatType(Mat)
AttributeError: module 'cv2' has no attribute '_registerMatType'
```

```
!pip uninstall opencv-python -y
!pip uninstall opencv-contrib-python -y
!pip install opencv-python==4.2.0.32
```

```
Found existing installation: opencv-python 4.2.0.32
Uninstalling opencv-python-4.2.0.32:
  Successfully uninstalled opencv-python-4.2.0.32
Found existing installation: opencv-contrib-python 4.6.0.66
Uninstalling opencv-contrib-python-4.6.0.66:
  Successfully uninstalled opencv-contrib-python-4.6.0.66
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting opencv-python==4.2.0.32
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/34/a3/403dbaef909fee9f9f6a8eaff51d44085a14e5bb1a1ff7257117d744986a/opencv_python-4.2.0.32-cp37-cp37m-manylinux1_x86_64.whl (28.2 MB)
Requirement already satisfied: numpy>=1.14.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from opencv-python==4.2.0.32) (1.19.5)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.2.0.32
```

2. Training Configuration
ch_PP-OCRv3_det_cml.yml
```yaml
Global:
  character_dict_path: ../mb.txt  # custom character dictionary
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/ch_PP-OCR_v3_det/
  save_epoch_step: 100
  eval_batch_step:
  - 0
  - 400
  cal_metric_during_train: false
  pretrained_model: null
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./checkpoints/det_db/predicts_db.txt
  distributed: true

Architecture:
  name: DistillationModel
  algorithm: Distillation
  model_type: det
  Models:
    Student:
      pretrained:
      model_type: det
      algorithm: DB
      Transform: null
      Backbone:
        name: MobileNetV3
        scale: 0.5
        model_name: large
        disable_se: true
      Neck:
        name: RSEFPN
        out_channels: 96
        shortcut: True
      Head:
        name: DBHead
        k: 50
    Student2:
      pretrained:
      model_type: det
      algorithm: DB
      Transform: null
      Backbone:
        name: MobileNetV3
        scale: 0.5
        model_name: large
        disable_se: true
      Neck:
        name: RSEFPN
        out_channels: 96
        shortcut: True
      Head:
        name: DBHead
        k: 50
    Teacher:
      pretrained:
      freeze_params: true
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list: [7, 2, 2]
        k: 50

Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDilaDBLoss:
      weight: 1.0
      model_name_pairs:
      - ["Student", "Teacher"]
      - ["Student2", "Teacher"]
      key: maps
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3
  - DistillationDMLLoss:
      model_name_pairs:
      - ["Student", "Student2"]
      maps_name: "thrink_maps"
      weight: 1.0
      model_name_pairs: ["Student", "Student2"]
      key: maps
  - DistillationDBLoss:
      weight: 1.0
      model_name_list: ["Student", "Student2"]
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 5.0e-05

PostProcess:
  name: DistillationDBPostProcess
  model_name: ["Student"]
  key: head_out
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DistillationMetric
  base_metric_name: DetMetric
  main_indicator: hmean
  key: "Student"

# Dataset
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/dataset/train/image
    label_file_list:
    - /home/aistudio/dataset/train/label.txt
    ratio_list: [1.0]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - CopyPaste:
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 960
        - 960
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 12
    num_workers: 4

# Dataset
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/dataset/train/image
    label_file_list:
    - /home/aistudio/dataset/train/label.txt
    transforms:
    - DecodeImage:  # load image
        img_mode: BGR
        channel_first: False
    - DetLabelEncode:  # Class handling label
    - DetResizeForTest:
    - NormalizeImage:
        scale: 1./255.
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: 'hwc'
    - ToCHWImage:
    - KeepKeys:
        keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1  # must be 1
    num_workers: 2
```

```
# Copy the config into the corresponding directory
!cp ~/ch_PP-OCRv3_det_cml.yml ~/PaddleOCR/configs/det/ch_PP-OCRv3/
%env CUDA_VISIBLE_DEVICES=0,1,2,3

# Single-GPU training (commented out):
# !python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Optimizer.base_lr=0.0001

%cd ~/PaddleOCR/
!python3 -m paddle.distributed.launch --ips="localhost" --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Optimizer.base_lr=0.0001
```

V. Recognition Dataset Preparation
Convert the detection-format dataset into a recognition (rec) dataset for training the recognition model; see the hedged cropping sketch below, followed by the gen_label.py commands used in this article.
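PaddleOCR recognition training consumes cropped text-line images with `image_path\ttext` labels. This article relies on `gen_label.py` in the next cell; if cropped line images are needed instead, a rough sketch along these lines could be used. All paths and output names are hypothetical, only the 4-point quadrilateral labels are handled, and this is not the pipeline the original article describes.

```python
import json
import os

import cv2
import numpy as np

def crop_quad(img, quad):
    """Perspective-crop one 4-point text region (points given clockwise from top-left)."""
    pts = np.array(quad, dtype=np.float32)
    w = int(max(np.linalg.norm(pts[0] - pts[1]), np.linalg.norm(pts[2] - pts[3])))
    h = int(max(np.linalg.norm(pts[0] - pts[3]), np.linalg.norm(pts[1] - pts[2])))
    dst = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
    return cv2.warpPerspective(img, cv2.getPerspectiveTransform(pts, dst), (w, h))

def build_rec_set(det_label_file, img_dir, crop_dir, out_label):
    """Turn detection-format labels (name\t[{"points": ..., "transcription": ...}]) into rec labels."""
    os.makedirs(crop_dir, exist_ok=True)
    with open(det_label_file, encoding="utf-8") as fin, open(out_label, "w", encoding="utf-8") as fout:
        for line in fin:
            img_name, ann = line.rstrip("\n").split("\t")
            img = cv2.imread(os.path.join(img_dir, img_name))
            for idx, item in enumerate(json.loads(ann)):
                if img is None or len(item["points"]) != 4:
                    continue  # skip unreadable images and the few 16-point polygons
                crop_name = f"{os.path.splitext(img_name)[0]}_{idx}.jpg"
                cv2.imwrite(os.path.join(crop_dir, crop_name), crop_quad(img, item["points"]))
                fout.write(f"{crop_name}\t{item['transcription']}\n")

# Hypothetical paths, mirroring the directory layout used elsewhere in this notebook
build_rec_set("dataset/train/train.txt", "dataset/train/image",
              "dataset/train/rec_crops", "dataset/train/rec_crops_label.txt")
```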
```
# ppocr/utils/gen_label.py
# convert the official gt to rec_gt_label.txt
%cd ~/PaddleOCR
!python ppocr/utils/gen_label.py --mode="rec" --input_path="../dataset/train/train.txt" --output_label="../dataset/train/train_rec_gt_label.txt"
!python ppocr/utils/gen_label.py --mode="rec" --input_path="../dataset/train/eval.txt" --output_label="../dataset/train/eval_rec_gt_label.txt"
```

VI. Recognition Model Training
1. Download the Pretrained Model
```
%cd ~/PaddleOCR/pretrain_models
!wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
!tar -xvf ch_PP-OCRv3_rec_train.tar
```

2. Configure the Training Parameters
```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: 800
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v3_distillation
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  # pretrained model
  pretrained_model: pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  # custom character dictionary
  character_dict_path: ../mb.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_distillation.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Piecewise
    decay_epochs: [700, 800]
    values: [0.0005, 0.00005]
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: &model_type "rec"
  name: DistillationModel
  algorithm: Distillation
  Models:
    Teacher:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: SVTR
      Transform:
      Backbone:
        name: MobileNetV1Enhance
        scale: 0.5
        last_conv_stride: [1, 2]
        last_pool_type: avg
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 64
                depth: 2
                hidden_dims: 120
                use_guide: True
              Head:
                fc_decay: 0.00001
          - SARHead:
              enc_dim: 512
              max_text_length: *max_text_length
    Student:
      pretrained:
      freeze_params: false
      return_all_feats: true
      model_type: *model_type
      algorithm: SVTR
      Transform:
      Backbone:
        name: MobileNetV1Enhance
        scale: 0.5
        last_conv_stride: [1, 2]
        last_pool_type: avg
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 64
                depth: 2
                hidden_dims: 120
                use_guide: True
              Head:
                fc_decay: 0.00001
          - SARHead:
              enc_dim: 512
              max_text_length: *max_text_length

Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDMLLoss:
      weight: 1.0
      act: "softmax"
      use_log: true
      model_name_pairs:
      - ["Student", "Teacher"]
      key: head_out
      multi_head: True
      dis_head: ctc
      name: dml_ctc
  - DistillationDMLLoss:
      weight: 0.5
      act: "softmax"
      use_log: true
      model_name_pairs:
      - ["Student", "Teacher"]
      key: head_out
      multi_head: True
      dis_head: sar
      name: dml_sar
  - DistillationDistanceLoss:
      weight: 1.0
      mode: "l2"
      model_name_pairs:
      - ["Student", "Teacher"]
      key: backbone_out
  - DistillationCTCLoss:
      weight: 1.0
      model_name_list: ["Student", "Teacher"]
      key: head_out
      multi_head: True
  - DistillationSARLoss:
      weight: 1.0
      model_name_list: ["Student", "Teacher"]
      key: head_out
      multi_head: True

PostProcess:
  name: DistillationCTCLabelDecode
  model_name: ["Student", "Teacher"]
  key: head_out
  multi_head: True

Metric:
  name: DistillationMetric
  base_metric_name: RecMetric
  main_indicator: acc
  key: "Student"
  ignore_space: False

# Updated dataset paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/dataset/train/image
    ext_op_transform_idx: 1
    label_file_list:
    - /home/aistudio/dataset/train/train_rec_gt_label.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
    - RecAug:
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: true
    batch_size_per_card: 128
    drop_last: true
    num_workers: 4

# Updated dataset paths
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/dataset/train/image
    ext_op_transform_idx: 1
    label_file_list:
    - /home/aistudio/dataset/train/eval_rec_gt_label.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_sar
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 128
    num_workers: 4
```

```
# Copy the prepared config file into place
%cd ~
!cp ~/ch_PP-OCRv3_rec_distillation.yml ~/PaddleOCR/configs/rec/PP-OCRv3/
```

3. Model Training
```
%cd ~/PaddleOCR/
# Multi-GPU training; specify the card IDs with the --gpus flag
!python -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
```

VII. Cascaded Inference
1. Export the Inference Models
```
# Export the detection model
!python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./my_exps/det/best_accuracy Global.save_inference_dir=./inference/det

# Export the recognition model
!python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=./my_exps/rec/best_accuracy Global.save_inference_dir=./inference/rec
```

2. Run Cascaded Inference
```
!python tools/infer/predict_system.py \
    --det_model_dir=inference/det \
    --rec_model_dir=inference/rec \
    --image_dir="/home/aistudio/dataset/train/image/image_0.jpg" \
    --rec_image_shape=3,48,320
```

```python
# Show the visualized result
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 8))
img = plt.imread("./inference_results/test.jpg")
plt.imshow(img)
```

Run prediction on the test images as described above and submit the results.
- Tip: train on 4 GPUs; it is much faster, otherwise training can take several days.
This article is a repost; see the original project link.