OpenVINO Series 16. Handwriting Recognition OCR with OpenVINO
In this case study, we run OCR on handwritten Simplified Chinese and Japanese text. Each model can only process one line of characters at a time.
- handwritten-japanese-recognition-0001
- handwritten-simplified-chinese-recognition-0001
環(huán)境描述:
- 本案例運行環(huán)境:Win10,10代i5筆記本
- IDE:VSCode
- openvino版本:2022.1
- 代碼鏈接,11-OCR
Table of Contents
- OpenVINO Series 16. Handwriting Recognition OCR with OpenVINO
- 1 About the handwriting recognition models
- 1.1 `handwritten-japanese-recognition`
- 1.2 `handwritten-simplified-chinese`
- 2 Handwriting recognition code
- 2.1 Select and load a handwriting model
- 2.2 Load the image and resize it to fit the model input
- 2.3 Prepare the charlist
- 2.4 Run inference and decode the results
1 About the handwriting recognition models
In this case study, we run OCR on handwritten Simplified Chinese and Japanese text. Each model can only process one line of characters at a time.
The models used in this notebook are handwritten-japanese-recognition and handwritten-simplified-chinese. To decode the model outputs into readable text, the kondate_nakayosi and scut_ept character lists are used. Both models are available in the Open Model Zoo.
1.1 handwritten-japanese-recognition
We will not explain the model's underlying algorithm here; we only describe its input and output.
Input: [1,1,96,2000], corresponding to [B,C,H,W], i.e. B - batch size, C - number of channels, H - image height, W - image width.
Note: the source image is resized to the target height (96) while keeping its aspect ratio; the resized width must not exceed 2000, and the image is then padded on the right with edge values up to a width of 2000.
Output: [186,1,4442], corresponding to [W,B,L], i.e. W - output sequence length, B - batch size, L - confidence distribution across the supported symbols in Kondate and Nakayosi.
1.2 handwritten-simplified-chinese
Input: [1,1,96,2000], corresponding to [B,C,H,W], i.e. B - batch size, C - number of channels, H - image height, W - image width.
Note: the source image is resized to the target height (96) while keeping its aspect ratio; the resized width must not exceed 2000, and the image is then padded on the right with edge values up to a width of 2000 (see the sketch below).
Output: [186,1,4059], corresponding to [W,B,L], i.e. W - output sequence length, B - batch size, L - confidence distribution across the supported symbols in SCUT-EPT.
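The resize-and-pad note above is identical for both models. Below is a minimal sketch of that preprocessing step, assuming an OpenCV grayscale image; the helper name is my own, and section 2.2 performs the same operation inline:

```python
import cv2
import numpy as np

def preprocess_line(image: np.ndarray, target_h: int = 96, target_w: int = 2000) -> np.ndarray:
    """Resize a grayscale text-line image into the model input shape [1, 1, 96, 2000]."""
    h, _ = image.shape
    scale = target_h / h
    # Resize to the target height while keeping the aspect ratio
    resized = cv2.resize(image, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    # Pad on the right with edge values (assumes the resized width <= target_w)
    resized = np.pad(resized, ((0, 0), (0, target_w - resized.shape[1])), mode="edge")
    # Add batch and channel dimensions: [H, W] -> [1, 1, H, W]
    return resized[None, None, :, :]
```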
2 Handwriting recognition code
2.1 Select and load a handwriting model
First we import the dependencies, download the selected model with `omz_downloader`, and load it with OpenVINO Runtime:
```python
from collections import namedtuple
from itertools import groupby
from pathlib import Path

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.runtime import Core

# Directories where data will be placed
model_folder = "model"
data_folder = "data"
charlist_folder = f"{data_folder}/charlists"
# Precision used by the model
precision = "FP16"

# To group files, you have to define the collection. In this case, you can use `namedtuple`.
Language = namedtuple(
    typename="Language", field_names=["model_name", "charlist_name", "demo_image_name"]
)
chinese_files = Language(
    model_name="handwritten-simplified-chinese-recognition-0001",
    charlist_name="chinese_charlist.txt",
    demo_image_name="handwritten_chinese_test.jpg",
)
japanese_files = Language(
    model_name="handwritten-japanese-recognition-0001",
    charlist_name="japanese_charlist.txt",
    demo_image_name="handwritten_japanese_test.png",
)

print("1 - Choose a language model to download, either Chinese or Japanese.")
# Select the language by setting language='chinese' or language='japanese'
language = "chinese"
languages = {"chinese": chinese_files, "japanese": japanese_files}
selected_language = languages.get(language)

# Download the model
path_to_model_weights = Path(
    f"{model_folder}/intel/{selected_language.model_name}/{precision}/{selected_language.model_name}.bin"
)
if not path_to_model_weights.is_file():
    download_command = (
        f"omz_downloader --name {selected_language.model_name} "
        f"--output_dir {model_folder} --precision {precision}"
    )
    print(download_command)
    # Run the download command in the notebook shell
    ! $download_command
else:
    print("model has been downloaded.")

print("2 - Load the model, and print its input and output")
ie = Core()
path_to_model = path_to_model_weights.with_suffix(".xml")
model = ie.read_model(model=path_to_model)
# Select the device name
compiled_model = ie.compile_model(model=model, device_name="CPU")
recognition_output_layer = compiled_model.output(0)
recognition_input_layer = compiled_model.input(0)
print("- model input shape: {}".format(recognition_input_layer))
print("- model output shape: {}".format(recognition_output_layer))
```

Terminal output:
```
1 - Choose a language model to download, either Chinese or Japanese.
model has been downloaded.
2 - Load the model, and print its input and output
- model input shape: <ConstOutput: names[actual_input] shape{1,1,96,2000} type: f32>
- model output shape: <ConstOutput: names[output] shape{186,1,4059} type: f32>
```
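The cell above compiles the model for the CPU. As a side note (a small sketch, not in the original notebook): `Core` can report which inference devices are available on your machine, and you can compile for another device just by changing `device_name`:

```python
from openvino.runtime import Core

ie = Core()
# List the inference devices OpenVINO can see on this machine, e.g. ['CPU', 'GPU']
print(ie.available_devices)

# Compile the same model for another device by changing device_name,
# e.g. (assuming a GPU is present):
# compiled_model = ie.compile_model(model=model, device_name="GPU")
```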
2.2 Load the image and resize it to fit the model input
The next step is to load the image. The model expects a single-channel input, which is why we read the image in grayscale. After loading it, we compute the scale ratio: the ratio between the required input-layer height and the current image height. In the cell below, the image is resized and padded so that the characters keep their proportions while matching the input shape.
print("3 - load image to test.") # Read file name of demo file based on the selected model file_name = selected_language.demo_image_name # Text detection models expects an image in grayscale format # IMPORTANT!!! This model allows to read only one line at time # Read image image = cv2.imread(filename=f"{data_folder}/{file_name}", flags=cv2.IMREAD_GRAYSCALE) # Fetch shape image_height, _ = image.shape print("- Original image shape: {}".format(image.shape)) print("- Image scale needs to be reshaped into: {}".format(recognition_input_layer.shape)) # B,C,H,W = batch size, number of channels, height, width _, _, H, W = recognition_input_layer.shape print("- We need to first resize image then add paddings in order to align with model input size.") # Calculate scale ratio between input shape height and image height to resize image scale_ratio = H / image_height # Resize image to expected input sizes resized_image = cv2.resize(image, None, fx=scale_ratio, fy=scale_ratio, interpolation=cv2.INTER_AREA ) # Pad image to match input size, without changing aspect ratio resized_image = np.pad(resized_image, ((0, 0), (0, W - resized_image.shape[1])), mode="edge" ) # Reshape to network the input shape input_image = resized_image[None, None, :, :]## Visualise Input Image plt.figure() plt.axis("off") plt.imshow(image, cmap="gray", vmin=0, vmax=255); plt.figure(figsize=(20, 1)) plt.axis("off") plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255);Terminal 打印:
```
3 - load image to test.
- Original image shape: (115, 1250)
- Image scale needs to be reshaped into: {1, 1, 96, 2000}
- We need to first resize image then add paddings in order to align with model input size.
```

The original input image is shown below:
The image after resizing and padding:
2.3 Prepare the charlist
The model is now loaded and the image is ready. The next step is to load the charlist we downloaded. Before we can use it, a blank symbol must be added at the beginning of the charlist.
print("4 - Prepare Charlist, which is a ground truth list which we could match with our inference results.") # Get dictionary to encode output, based on model documentation used_charlist = selected_language.charlist_name # With both models, there should be blank symbol added at index 0 of each charlist blank_char = "~" with open(f"{charlist_folder}/{used_charlist}", "r", encoding="utf-8") as charlist:letters = blank_char + "".join(line.strip() for line in charlist)2.4 模型推理輸出結(jié)果
2.4 Run inference and decode the results
Now run inference. `compiled_model()` takes a list of inputs, in the same order as the model's inputs. We can then fetch the result from the output tensor.
The model output has the shape W x B x L, where:
- W - output sequence length
- B - batch size
- L - confidence distribution across the supported symbols (Kondate and Nakayosi for the Japanese model, SCUT-EPT for the Chinese one)
To get a more readable format, we pick the symbol with the highest probability at each step. Because of how CTC decoding works, we then collapse consecutive duplicate symbols and remove the blanks.
The final step is to look up the symbols at the resulting indexes in the charlist.
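The notebook cell below performs these steps inline. Written as a standalone helper, CTC greedy decoding is just an argmax per time step, a de-duplication pass, and a blank filter (a sketch; the function name is my own, equivalent in logic to the cell that follows):

```python
from itertools import groupby

import numpy as np

def ctc_greedy_decode(logits: np.ndarray, letters: str, blank_index: int = 0) -> str:
    """Decode a [W, L] matrix of per-step class scores into a text string."""
    best_path = np.argmax(logits, axis=1)           # highest-scoring class per time step
    collapsed = [k for k, _ in groupby(best_path)]  # merge consecutive duplicates
    return "".join(letters[i] for i in collapsed if i != blank_index)
```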
```python
# Run inference on the model
predictions = compiled_model([input_image])[recognition_output_layer]
print("5 - Model Inference. Prediction results shape: {}".format(predictions.shape))
# Remove the batch dimension
predictions = np.squeeze(predictions)
print("- We first squeeze the inference result into shape: {}".format(predictions.shape))
# Run argmax to pick the symbols with the highest probability
predictions_indexes = np.argmax(predictions, axis=1)
# Use groupby to remove concurrent letters, as required by CTC greedy decoding
output_text_indexes = list(groupby(predictions_indexes))
# Remove the grouper objects
output_text_indexes, _ = np.transpose(output_text_indexes, (1, 0))
print("- We find out the highest probability character, and remove concurrent letters and grouper objects into shape: {}".format(output_text_indexes.shape))
# Remove the blank symbols
output_text_indexes = output_text_indexes[output_text_indexes != 0]
print("- We remove blank symbols into shape: {}".format(output_text_indexes.shape))
# Look up the letters for the indexes in the output array
output_text = [letters[letter_index] for letter_index in output_text_indexes]
print("- Final results: {}".format(output_text))
# Show the input line alongside the prediction
plt.figure(figsize=(20, 1))
plt.axis("off")
plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255)
```

Terminal output:
```
5 - Model Inference. Prediction results shape: (186, 1, 4059)
- We first squeeze the inference result into shape: (186, 4059)
- We find out the highest probability character, and remove concurrent letters and grouper objects into shape: (32,)
- We remove blank symbols into shape: (20,)
- Final results: ['人', '有', '悲', '歡', '離', '合', ',', '月', '有', '陰', '睛', '圓', '缺', ',', '此', '事', '古', '難', '全', '。']
```
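The prediction comes back as a list of single characters; joining them yields the recognized line as one string (a trivial follow-up, not shown in the original output):

```python
# Join the decoded characters into one line of text
print("".join(output_text))
# -> 人有悲歡離合,月有陰睛圓缺,此事古難全。
```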