Handwritten recognition: resizing strokes instead of images
A straightforward algorithm for dealing with handwritten symbol recognition problems in machine learning
The recognition of handwritten symbols is one of the most popular and representative examples when dealing with many machine-learning problems. Even more so, when understanding deep neural-network concepts. For this reason, in addition to its obvious practical applications, it is a widely researched topic.
Recently, I developed a simple web-based calculator that evaluates handwritten numerical expressions (handcalc.io) using a deep neural network kernel, without the use of any existing external machine learning or image processing libraries. My initial approach was to train the model using existing databases found online, but this proved to be ineffective, considering there were significant discrepancies between these and the symbols written from within the web-app.
In this article I dive in more detail into the motivation and process of implementing a so-called stroke-scaling algorithm. This was a crucial piece to create an effective training dataset for the neural network, as well as a compatible pre-processing step for the web-app’s user inputs to work with the prediction model.
Recognising symbols outside the validation/testing datasets
The process of training the neural network model initially seemed to be a straightforward task, considering the availability of existing handwritten-symbol databases (e.g. the MNIST database). Unfortunately, although my first trained models achieved accuracies above 95% on their validation sets, they proved to be virtually useless when dealing with symbols coming from outside the dataset.
The reason is visually evident: the symbol data belonging to existing datasets usually originates from existing handwritten texts, where, in general, the stroke is rather thick compared to the symbol's proportions (Fig. 1). Each datapoint is a rather non-sparse matrix containing a swarm of agglomerated pixels of different intensities.
Fig. 1: Sample symbol ‘5’ from the MNIST dataset

The data-points coming from the web-application, however, were sparse, noiseless, and contained thin strokes. The pixels are either filled or unfilled, and thus embody a “purer” version of the symbols meant by the writer (Fig. 2).
Fig. 2: Sample symbol ‘5’ from the drawing board within the web-application

Both versions are clearly incompatible, considering their nature is inherently different, and the initial model was therefore doomed to be unsatisfactory for the intended application.
Facing the status quo, virtually the only viable solution was to create a new dataset from scratch, compatible with the type of data-points the users would generate when using the application (the dataset can be found here). For this, I created a simple image-processing library that allowed me to extract the data from photographed pieces of paper with thin handwritten symbols (the library, together with the custom neural network library can be found here).
Although the library contains many functionalities, it was the stroke-scaling algorithm that really ensured consistency across all data-points, whether it was during the creation of the model’s datasets, or during the image pre-processing within the web-app.
The stroke-scaling algorithm
The main idea behind this algorithm is to scale images without altering the information about the stroke itself. When we draw a symbol, our mind thinks only of the pure shape of the stroke to draw. The same symbol written with pens of different thicknesses should “contain” the same information, since the same thing was meant. This algorithm seeks to scale symbol-images while preserving the stroke's information.
In the case of scaling up the original image (Fig.3), every line of one pixel width may only change its length, but not thickness.
Fig. 3: Unscaled symbol (14x14 pixels)

The scaled-up image may look thinner, but in reality contains the same information, as if one had drawn the symbol with the same hand movement, just on a canvas of higher resolution (Fig. 4):
Fig. 4: Scaled symbol (28x28 pixels)圖4:縮放符號(hào)(28x28像素)During the scaling process, the following steps take place: first, every filled pixel of the original image is mapped once to a blank canvas of the desired size, and secondly, an interpolated line between every touching pixel is created, except when two pixels touch each other diagonally on a corner. As a consequence, lines may only change length, but not width. Since each pixel is always mapped, in the case of downscaling, the thin lines do not disappear, and remain at least of one pixel width.
在縮放過(guò)程中,將執(zhí)行以下步驟:首先,將原始圖像的每個(gè)填充像素映射到所需大小的空白畫布一次,其次,在每個(gè)觸摸像素之間創(chuàng)建一條內(nèi)插線,除非兩個(gè)像素觸摸在對(duì)角線上彼此對(duì)角。 結(jié)果,線只能改變長(zhǎng)度,而不能改變寬度。 由于每個(gè)像素始終被映射,因此在縮小比例的情況下,細(xì)線不會(huì)消失,并且至少保持一個(gè)像素寬度。
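The two steps above (map each filled pixel once, then interpolate between touching pixels) can be expressed as a minimal, self-contained sketch. The helper below is hypothetical and deliberately simplified: it uses a plain proportional scaling ratio and omits the corner-detection check described later.

```python
import math

def scale_stroke(image, new_w, new_h):
    """Sketch of stroke-preserving scaling: each filled pixel of `image`
    (a list of rows of 0/1) is mapped once onto a blank new_h x new_w
    canvas, then straight lines are interpolated between touching pixels.
    Corner detection is omitted for brevity."""
    old_h, old_w = len(image), len(image[0])
    canvas = [[0] * new_w for _ in range(new_h)]
    # Simplified scaling ratios (plain proportional mapping)
    sx, sy = new_w / old_w, new_h / old_h
    # Relative (dy, dx) offsets of neighbours checked for interpolation
    neighbours = [(-1, 1), (0, 1), (1, 1), (1, 0), (-1, -1)]
    for y in range(old_h):
        for x in range(old_w):
            if not image[y][x]:
                continue
            xs, ys = math.floor(x * sx), math.floor(y * sy)
            canvas[ys][xs] = 1  # step 1: map the pixel exactly once
            for dy, dx in neighbours:
                nx, ny = x + dx, y + dy
                if 0 <= nx < old_w and 0 <= ny < old_h and image[ny][nx]:
                    # step 2: interpolate a line towards the mapped neighbour
                    xs2, ys2 = math.floor(nx * sx), math.floor(ny * sy)
                    t_max = max(abs(xs2 - xs), abs(ys2 - ys))
                    for t in range(1, t_max):
                        xp = xs + round(t / t_max * (xs2 - xs))
                        yp = ys + round(t / t_max * (ys2 - ys))
                        canvas[yp][xp] = 1
    return canvas

# A one-pixel-wide diagonal stays one pixel wide after scaling up 3x
small = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
big = scale_stroke(small, 9, 9)
```

Note how the result is a continuous diagonal that is still exactly one pixel wide: the stroke's length scaled, its thickness did not.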
Algorithm implementation
To keep the code pragmatic, the functions presented hereunder are methods within more complex class implementations, and thus not standalone codebases. This keeps things simple, avoids passing too many arguments to the functions, and keeps the reader's focus on the key areas.
Originally, I implemented the code in Python for the preprocessing of data-points during the creation of the datasets, and in JavaScript for the pre-processing of user-inputs within the web-application. However, I also included an implementation in C++, offering a variant that prioritises performance.
All three implementations display different levels of abstraction: the JavaScript version deals with images using custom objects like grids, fields, and coordinates, offering a more intuitive approach; the Python implementation deals with the images as 2D NumPy arrays; and the C++ implementation uses basic 1D arrays, which offer better time and memory performance at the cost of less readable source code.
It is worth mentioning that the algorithm uses a method to recognise whether pixels touching diagonally are part of a corner (and should therefore skip the interpolation step). To understand the details behind this implementation, please visit the full version of each algorithm (linked in the title and caption of each version).
JavaScript (full implementation found here): As mentioned above, this implementation deals with a rather sophisticated abstraction of the image canvas. It uses the grid object, which essentially contains its dimensions as attributes and a 1D array of fields (another custom object), each of which owns an immutable 2D coordinate and a boolean indicating whether the field is filled or not.
Apart from the specifics of how the objects in question are created, accessed, or mutated, the algorithm remains essentially unchanged with respect to the Python and C++ implementations, and the naming conventions provide an intuitive understanding.
```javascript
scale(xFields, yFields) {
  // Scales a grid to fit the given dimensions. Only the stroke is scaled, meaning that
  // each pixel will only be mapped once to the destination. Its position will be scaled but
  // its thickness won't. Between non-corner filled pixels an interpolated line is
  // created to keep the stroke continuous.

  // Destination canvas (object containing an array of fields with a coordinate and a boolean)
  let scaledGrid = new Grid(xFields, yFields);
  // Dimensions of original grid/image
  const shapeX = this.grid.xFields;
  const shapeY = this.grid.yFields;
  // Scaled dimensions without extra width, in case there are filled pixels on the edge (they don't change width)
  const xFieldsAugmented = shapeX !== 1 ? Math.ceil(((xFields - 1) * shapeX) / (shapeX - 1)) : 0;
  const yFieldsAugmented = shapeY !== 1 ? Math.ceil(((yFields - 1) * shapeY) / (shapeY - 1)) : 0;
  // Scaling ratios
  const scalingX = xFieldsAugmented / shapeX;
  const scalingY = yFieldsAugmented / shapeY;
  // Array with relative positions of surrounding fields in relation to the current field
  const positions = [
    [-1, 1],
    [0, 1],
    [1, 1],
    [1, 0],
    [-1, -1],
  ];
  // Iteration through every field in the original grid.
  // If any of the surrounding fields in the original grid is filled, linear interpolation
  // between the current pixel and the surrounding one is performed, filling fields in between
  for (let y = 0; y < shapeY; y++) {
    for (let x = 0; x < shapeX; x++) {
      // Filled fields in the original grid get mapped into the destination (scaled) grid
      if (this.grid.getField(x, y).isFilled) {
        const xScaled = Math.floor(x * scalingX);
        const yScaled = Math.floor(y * scalingY);
        scaledGrid.getField(xScaled, yScaled).isFilled = this.grid.getField(x, y).isFilled;
        // Every position is checked
        for (let position of positions) {
          // Calculates the adjacent pixel, checking it's not out of bounds
          const xNext = 0 <= x + position[1] && x + position[1] < shapeX ? x + position[1] : x;
          const yNext = 0 <= y + position[0] && y + position[0] < shapeY ? y + position[0] : y;
          // Interpolation happens only if the next pixel is filled AND they're not in a corner
          // (to avoid lines between diagonally touching pixels in a corner)
          if (this.grid.getField(xNext, yNext).isFilled && !this._isCorner(x, y, position)) {
            const xScaledNext = Math.floor(xNext * scalingX);
            const yScaledNext = Math.floor(yNext * scalingY);
            // Linear interpolation between mapping of current pixel and adjacent pixel in the destination grid
            const tMax = Math.max(Math.abs(xScaledNext - xScaled), Math.abs(yScaledNext - yScaled));
            for (let t = 1; t < tMax; t++) {
              const xP = Math.floor(xScaled + (t / tMax) * (xScaledNext - xScaled));
              const yP = Math.floor(yScaled + (t / tMax) * (yScaledNext - yScaled));
              scaledGrid.getField(xP, yP).isFilled = this.grid.getField(x, y).isFilled;
            }
          }
        }
      }
    }
  }
  return this.grid.replaceFields(scaledGrid);
}
```

Python (full implementation found here): For performance and clear semantics, the natural way to approach this is by using NumPy arrays. These allow for 2D indexing and are optimised to perform faster and more memory-efficiently than lists, since they manage space similarly to classic C++ arrays, which use adjacent memory slots.
This implementation is somewhat less abstract than the JavaScript one, and belongs to a more complex implementation within a Python subclass, but it should be intelligible enough, despite being shown out of its full context.
```python
def scale(self, xFields, yFields):
    '''Scales an image to the specified dimensions, without keeping the
    aspect ratio. Only filled pixels are taken into consideration, and
    spaces in between are interpolated.'''
    # Destination canvas (numpy array containing 0's or 1's)
    scaledData = np.zeros((yFields, xFields))
    # Dimensions of original grid/image
    shapeX = self.imageData.data.shape[1]
    shapeY = self.imageData.data.shape[0]
    # Scaled dimensions without extra width, in case there are filled pixels on the edge (they don't change width)
    xFieldsAugmented = math.ceil((xFields - 1) * shapeX / (shapeX - 1)) if shapeX != 1 else 0
    yFieldsAugmented = math.ceil((yFields - 1) * shapeY / (shapeY - 1)) if shapeY != 1 else 0
    # Scaling ratios
    scalingX = xFieldsAugmented / shapeX
    scalingY = yFieldsAugmented / shapeY
    # Array with relative positions of surrounding fields in relation to the current field
    positions = [[-1, 1], [0, 1], [1, 1], [1, 0], [-1, -1]]
    # Iteration through every field in the original grid.
    # If any of the surrounding fields in the original grid is filled, linear interpolation
    # between the current pixel and the surrounding one is performed, filling fields in between
    for y in range(0, shapeY):
        for x in range(0, shapeX):
            # Filled fields in the original grid get mapped into the destination (scaled) grid
            if self.imageData.data[y][x]:
                xScaled = math.floor(x * scalingX)
                yScaled = math.floor(y * scalingY)
                scaledData[yScaled][xScaled] = self.imageData.data[y][x]
                # Every position is checked
                for position in positions:
                    # Calculates the adjacent pixel, checking it's not out of bounds
                    xNext = x + position[1] if 0 <= x + position[1] < shapeX else x
                    yNext = y + position[0] if 0 <= y + position[0] < shapeY else y
                    # Interpolation happens only if the next pixel is filled AND they're not in a corner
                    # (to avoid lines between diagonally touching pixels in a corner)
                    if self.imageData.data[yNext][xNext] and not self._isCorner(x, y, position):
                        xScaledNext = math.floor(xNext * scalingX)
                        yScaledNext = math.floor(yNext * scalingY)
                        # Linear interpolation between mapping of current pixel and adjacent pixel in the destination grid
                        tMax = max(abs(xScaledNext - xScaled), abs(yScaledNext - yScaled))
                        for t in range(1, tMax):
                            xP = math.floor(xScaled + (t / tMax) * (xScaledNext - xScaled))
                            yP = math.floor(yScaled + (t / tMax) * (yScaledNext - yScaled))
                            scaledData[yP][xP] = self.imageData.data[y][x]
    self.imageData.data = scaledData
    return self.imageData
```

C++ (full implementation found here): This is the most austere variant of all three implementations, and for that reason, the one that puts the most emphasis on performance, taking advantage of the low-level control that the language offers.
Recalling the Python implementation, using 2D arrays for the images may sound like the most intuitive approach for this third version. However, since we're allocating the information dynamically, nested arrays would not be allocated in adjacent memory slots: the outer array would hold pointers to inner arrays scattered, in general, throughout memory. This would significantly increase fetching times and render the algorithm rather inefficient. For this reason, both original and scaled images are stored in 1D arrays of fixed size. Accessing the array might be slightly more cumbersome, but worth the effort considering the indisputable performance improvement.
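The row-major index arithmetic this flat layout relies on is simple enough to sketch in a few lines (the helper name below is illustrative, not part of the project's code):

```python
def flat_index(x, y, width):
    """Row-major mapping of a 2D coordinate onto a 1D array, as used by
    the C++ implementation (gridArray[y * xFields + x])."""
    return y * width + x

# A 2x3 grid stored as a flat array of 6 booleans
width, height = 3, 2
grid = [False] * (width * height)
grid[flat_index(2, 1, width)] = True  # fill pixel (x=2, y=1)
```

Because rows are stored back-to-back, iterating y in the outer loop and x in the inner loop walks the array strictly left to right, which is what makes the access pattern cache-friendly.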
```cpp
void scaleStroke(const unsigned int xFieldsScaled, const unsigned int yFieldsScaled) {
  // Scales a grid to fit the given dimensions. Only the stroke is scaled, meaning that
  // each pixel will only be mapped once to the destination. Its position will be scaled but
  // its thickness won't. Between non-corner filled pixels an interpolated line is
  // created to keep the stroke continuous.

  // Destination canvas as a one-dimensional array (to optimise memory allocation and speed)
  bool* scaledData = new bool[xFieldsScaled * yFieldsScaled];
  // Initialisation of destination canvas
  for (unsigned int y = 0; y < yFieldsScaled; ++y) {
    for (unsigned int x = 0; x < xFieldsScaled; ++x) {
      scaledData[y * xFieldsScaled + x] = false;
    }
  }
  // Scaled dimensions without extra width, in case there are filled pixels on the edge (they don't change width)
  const int xFieldsAugmented = xFields != 1 ? ceil(1.0 * (xFieldsScaled - 1) * xFields / (xFields - 1)) : 0;
  const int yFieldsAugmented = yFields != 1 ? ceil(1.0 * (yFieldsScaled - 1) * yFields / (yFields - 1)) : 0;
  // Scaling ratios
  const double scalingX = 1.0 * xFieldsAugmented / xFields;
  const double scalingY = 1.0 * yFieldsAugmented / yFields;
  // Array with relative positions of surrounding fields in relation to the current field
  const int positions[5][2] = {{-1, 1}, {0, 1}, {1, 1}, {1, 0}, {-1, -1}};
  // Iteration through every field in the original grid.
  // If any of the surrounding fields in the original grid is filled, linear interpolation
  // between the current pixel and the surrounding one is performed, filling fields in between
  for (unsigned int y = 0; y < yFields; ++y) {
    for (unsigned int x = 0; x < xFields; ++x) {
      // Filled fields in the original grid get mapped into the destination (scaled) grid
      if (gridArray[y * xFields + x]) {
        const int xScaled = x * scalingX;
        const int yScaled = y * scalingY;
        scaledData[yScaled * xFieldsScaled + xScaled] = true;
        // Every position is checked
        for (unsigned int i = 0; i < 5; ++i) {
          // Calculates the adjacent pixel, checking it's not out of bounds
          // (cast to int so the bounds check is done in signed arithmetic)
          const int xCand = static_cast<int>(x) + positions[i][1];
          const int yCand = static_cast<int>(y) + positions[i][0];
          const int xNext = (0 <= xCand && xCand < static_cast<int>(xFields)) ? xCand : x;
          const int yNext = (0 <= yCand && yCand < static_cast<int>(yFields)) ? yCand : y;
          // Interpolation happens only if the next pixel is filled AND they're not in a corner
          // (to avoid lines between diagonally touching pixels in a corner)
          if (gridArray[yNext * xFields + xNext] && !isCorner(x, y, positions[i])) {
            const int xScaledNext = xNext * scalingX;
            const int yScaledNext = yNext * scalingY;
            // Linear interpolation between mapping of current pixel and adjacent pixel in the destination grid
            const int tMax = max(abs(xScaledNext - xScaled), abs(yScaledNext - yScaled));
            for (int t = 1; t < tMax; ++t) {
              const int xP = xScaled + (1.0 * t / tMax) * (xScaledNext - xScaled);
              const int yP = yScaled + (1.0 * t / tMax) * (yScaledNext - yScaled);
              scaledData[yP * xFieldsScaled + xP] = true;
            }
          }
        }
      }
    }
  }
  // Dynamic memory deallocation
  delete[] gridArray;
  // Update of object attributes
  gridArray = scaledData;
  xFields = xFieldsScaled;
  yFields = yFieldsScaled;
}
```

Pre-requisites and caveats
This algorithm was conceived to cater to specific requirements, and therefore works best when certain characteristics are fulfilled. To avoid pitfalls and ensure successful use, the following assumptions about the images to be processed should hold:
Pixels must be booleans (filled/unfilled):
The algorithm is not made to work with pixels of different intensities. When dealing with generic images, achieving the desired, compatible format will require desaturation and normalisation to make sure pixels are either black or white. Additionally, applying a denoising algorithm beforehand could prove beneficial.
The symbols contained should be made of thin strokes:
Since the algorithm maps all pixels to the resized canvas and interpolates a line between adjacent ones, lines two or more pixels wide will separate into independent, adjacent lines, and undesirable connecting lines will appear between the pixels of both lines due to the interpolation step. To avoid this, it is imperative to use thin strokes when creating datasets, making sure the symbols' proportions are significantly larger than their stroke width. Relative to the image's resolution, the stroke should be at most one pixel thick. Alternatively, one could include a simple, custom pre-processing algorithm to make strokes thinner.
The scaling ratio should not be too big:

The algorithm uses linear interpolation between mapped pixels. For this reason, resizing an image to a much bigger canvas will make the symbol look unnaturally angular. To overcome this flaw, one could easily change the type of interpolation used in the algorithm (e.g. polynomial interpolation, spline interpolation, etc.), which is left for the reader to implement if necessary.
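Swapping the interpolation type only requires changing one step of the algorithm. The sketch below (hypothetical names, not project code) isolates the segment-filling loop behind a pluggable interpolation function; replacing `interpolate_linear` with a smoother curve would address the angularity:

```python
def interpolate_linear(p0, p1, t):
    """Point at parameter t in [0, 1] on the straight line from p0 to p1."""
    return (round(p0[0] + t * (p1[0] - p0[0])),
            round(p0[1] + t * (p1[1] - p0[1])))

def draw_segment(canvas, p0, p1, interp=interpolate_linear):
    """Fill the pixels strictly between p0 and p1 (endpoints are mapped
    separately by the algorithm) using a pluggable interpolation function."""
    t_max = max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))
    for t in range(1, t_max):
        x, y = interp(p0, p1, t / t_max)
        canvas[y][x] = 1
    return canvas

canvas = [[0] * 8 for _ in range(8)]
draw_segment(canvas, (0, 0), (7, 7))
```

A spline variant would replace `interpolate_linear` with a function that also consults neighbouring stroke points, leaving `draw_segment` untouched.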
Conclusion
This article proposed an alternative algorithm for resizing images, which allows thin structures to preserve their stroke width. Whether the reader deals with deep-learning problems around handwriting recognition, or only needs an effective resizing algorithm of this kind, these implementations will hopefully simplify the creation of more effective custom datasets, or the processing of existing ones to make them compatible with each other.
Should the reader be interested in deep-learning problems involving handwritten symbols, I highly recommend visiting the related project links, which could well provide useful insight:
handCalc | Written Calculator (Project Repository):https://github.com/michheusser/handCalc
Image Processing and Training of Neural Network:https://github.com/michheusser/neural-network-training
Dataset Creation:https://www.kaggle.com/michelheusser/handwritten-digits-and-operators
Translated from: https://medium.com/@michheusser/handwritten-recognition-resizing-strokes-instead-of-images-b787af9935fc