
MLIR Operator Quantization

Published: 2023/11/28

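This document outlines the design of the MLIR quantization system. The term "quantization" is highly overloaded; here it refers to a fairly narrow set of techniques for converting floating point computations into corresponding, plausible variants expressed in integer math for inference, as has historically been supported by low-bit-depth inference engines (such as TFLite), various accelerator hardware, and many DSPs.
Much of this is inspired by the approach taken in a prior paper, with many extensions and adaptations folded in. It specifically documents the positions MLIR has taken on the topic and is not a general reference.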
• Uniform quantization
o Fixed point values
o Affine values
o Relation
o Converting between real and fixed point or affine
• Usage within MLIR
• Quantization Dialect
o Quantized type
o Quantized type conversion operations
o Instrumentation and constraint operations
• Integration with simulated quantization at training time
• TFLite native quantization
o General algorithm
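Uniform quantization
The primary quantization mechanism supported by MLIR is a scheme that can express fixed point and affine transformations via uniformly spaced points on the real number line.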

Further, the scheme can be applied:
• per-layer: Applying to every value within the target type.
• per-axis (also called per-channel): Applying individually to each index along a specific axis of a tensor type.
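As a rough illustration of the two granularities, the sketch below derives one scale for a whole tensor and one scale per row of the same tensor. The helper `choose_scale` and the rule it applies (largest magnitude mapped to 127) are assumptions made for this example only; the text above does not prescribe how scales are chosen.

```python
def choose_scale(values, qmax=127):
    """Hypothetical helper: pick a scale so the largest magnitude maps to qmax."""
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


# A tiny 2 x 3 "tensor"; axis 0 plays the role of the channel axis.
weights = [
    [0.5, -1.0, 0.25],    # channel 0: small values
    [10.0, -20.0, 5.0],   # channel 1: large values
]

# per-layer: a single scale shared by every value in the tensor.
per_layer_scale = choose_scale([v for row in weights for v in row])

# per-axis: one scale for each index along axis 0.
per_axis_scales = [choose_scale(row) for row in weights]

print(per_layer_scale, per_axis_scales)
```

With per-axis scales, the channel holding small values gets a finer step than it would under the shared per-layer scale.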
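Fixed point values
Fixed point values are a real number divided by a scale. We call the result of that division the scaled value.
$real\_value = scaled\_value * scale$
The scale can be interpreted as the distance, in real units, between neighboring scaled values. For example, if the scale is $\pi$, then fixed point values with this scale can only represent multiples of $\pi$, and nothing in between. The maximum rounding error when converting an arbitrary real number to a fixed point value with a given $scale$ is $\frac{scale}{2}$.
Continuing the previous example, when $scale = \pi$, the maximum rounding error is $\frac{\pi}{2}$.
Multiplication can be performed on scaled values with different scales, using the same algorithm as multiplication of real values (note that the product scaled value has $scale_{product} = scale_{left\ operand} * scale_{right\ operand}$).
Addition can be performed on scaled values, as long as they have the same scale, using the same algorithm as addition of real values. This makes it convenient to represent scaled values on a computer as signed integers and to perform arithmetic on those signed integers, because the results will be correct scaled values.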
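The rules above can be checked with a few lines of Python. This is only a sketch: the function names are made up for the example, Python integers stand in for signed machine integers, and the scales are chosen as powers of two so the float arithmetic is exact.

```python
def to_fixed_point(real_value: float, scale: float) -> int:
    """Scaled value nearest to real_value / scale (Python rounds halves to even)."""
    return round(real_value / scale)


def to_real(scaled_value: int, scale: float) -> float:
    """real_value = scaled_value * scale"""
    return scaled_value * scale


scale_a, scale_b = 0.5, 0.25
a = to_fixed_point(1.5, scale_a)    # scaled value of 1.5 at scale 0.5
b = to_fixed_point(0.75, scale_b)   # scaled value of 0.75 at scale 0.25
c = to_fixed_point(2.0, scale_a)    # scaled value of 2.0 at scale 0.5

# Multiplication: multiply the integers; the scales multiply as well.
product = a * b
scale_product = scale_a * scale_b

# Addition: only valid because a and c share the same scale.
total = a + c

# Rounding error never exceeds scale / 2.
error = abs(to_real(to_fixed_point(0.3, scale_a), scale_a) - 0.3)

print(to_real(product, scale_product), to_real(total, scale_a), error)
```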
Affine values
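Mathematically speaking, affine values are the result of adding a real-valued zero point to a scaled value. Equivalently, subtracting the zero point from an affine value yields a scaled value:
$real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale$
Essentially, affine values are the scaled values shifted by some constant amount. Arithmetic (i.e., addition, subtraction, multiplication, division) cannot, in general, be performed directly on affine values; they must first be converted to the equivalent scaled values.
As alluded to above, the motivation for using affine values is to represent more efficiently the real values that will actually be encountered during computation. Frequently, those real values are not symmetric around the real zero. We also assume that the real zero is encountered during computation and must therefore be representable.
In this case, it is inefficient to store scaled values represented by signed integers, because some of the signed integers will never be used. In effect, the bit patterns corresponding to those signed integers go to waste.
To represent the real zero exactly with an integral-valued affine value, the zero point must be an integer between the minimum and maximum affine value (inclusive). For example, for an affine value represented by an 8-bit unsigned integer, we have $0 \leq zero\_point \leq 255$. This matters because convolution-like operations in deep neural networks frequently need to zero-pad inputs and outputs, so zero must be exactly representable or the result will be biased.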
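To make the zero point constraint concrete, the sketch below picks a scale and an integer zero point for an asymmetric real range stored in 8-bit unsigned integers. The function `choose_affine_params` and its range-based rule are assumptions for illustration; the text only requires that the zero point be an integer inside the storage range.

```python
def choose_affine_params(real_min, real_max, qmin=0, qmax=255):
    """Hypothetical helper: derive (scale, zero_point) from a real range."""
    # Widen the range so that the real zero is always inside it.
    real_min = min(real_min, 0.0)
    real_max = max(real_max, 0.0)
    if real_max == real_min:
        return 1.0, qmin  # degenerate range: only zero was observed
    scale = (real_max - real_min) / (qmax - qmin)
    # Round to an integer and clamp so real zero maps to a storable value.
    zero_point = round(qmin - real_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


# Values in [-1, 3] are not symmetric around zero.
scale, zero_point = choose_affine_params(-1.0, 3.0)

# The real zero is stored as exactly zero_point and converts back to 0.0.
real_zero = (zero_point - zero_point) * scale

print(scale, zero_point, real_zero)
```

Because the zero point is an integer within the storage range, zero-padded elements carry no rounding error at all.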
Relation
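Real values, fixed point values, and affine values are related by the following equation, which shows how to convert one kind of number to another:
$real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale$
Computers generally store mathematical values using a finite number of bits. So although the conversions above are exact, storing the result in a finite number of bits generally requires rounding the result of the conversion (this applies to both floating point and fixed point storage). A full discussion of rounding behavior is outside the scope of this document; unless otherwise stated, it is safe to assume that rounding follows the IEEE 754 default of RNE (round to nearest, ties to even) where the hardware permits.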
Converting between real and fixed point or affine
To convert a real value to a fixed point value, we must know the scale. To convert a real value to an affine value, we must know the scale and the zero point.
Real to affine
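To convert an input tensor of real-valued elements (usually represented in a floating point format, frequently Single precision) to a tensor of affine elements represented by an integral type (e.g. 8-bit unsigned integer), the following conversion can be performed (it is not required that all representable values of the integral type are used):
$affine\_value_{uint8 \, or \, uint16} = clampToTargetSize(roundToNearestInteger(\frac{real\_value_{Single}}{scale_{Single}})_{sint32} + zero\_point_{uint8 \, or \, uint16})$
In the above, we assume that $real\_value$ is a Single, $scale$ is a Single, $roundToNearestInteger$ returns a signed 32-bit integer, and $zero\_point$ is an unsigned 8-bit or 16-bit integer.
The bit depths and number of fixed point values above are indicative of common types on typical hardware, but the scheme is not constrained to particular bit depths, nor does it require that the entire range of an N-bit integer be used.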
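The real-to-affine conversion can be sketched in Python as follows. The function names mirror the symbols in the formula but are otherwise made up; Python floats are double precision rather than Single, and Python integers are unbounded, so this only approximates the stated types.

```python
def clamp_to_target_size(value: int, qmin: int, qmax: int) -> int:
    """Saturate value to the storage range [qmin, qmax]."""
    return max(qmin, min(qmax, value))


def real_to_affine(real_value: float, scale: float, zero_point: int,
                   qmin: int = 0, qmax: int = 255) -> int:
    """clampToTargetSize(roundToNearestInteger(real / scale) + zero_point)."""
    rounded = round(real_value / scale)  # ties go to the even integer
    return clamp_to_target_size(rounded + zero_point, qmin, qmax)


scale, zero_point = 0.5, 128
in_range = real_to_affine(1.0, scale, zero_point)
too_small = real_to_affine(-100.0, scale, zero_point)  # saturates at qmin
too_large = real_to_affine(100.0, scale, zero_point)   # saturates at qmax

print(in_range, too_small, too_large)
```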
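Affine to real
To convert an output tensor of affine elements represented by uint8 or uint16 to a tensor of real-valued elements (usually represented in a floating point format, frequently Single precision), the following conversion can be performed:
$real\_value_{Single} = roundToNearestFloat((affine\_value_{uint8 \, or \, uint16} - zero\_point_{uint8 \, or \, uint16})_{sint32})_{Single} * scale_{Single}$
In the above, we assume that the result of the subtraction is in 32-bit signed integer format and that $roundToNearestFloat$ returns a Single.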
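A matching sketch for the opposite direction is shown below. As before, the function name is hypothetical, the Python integer subtraction stands in for the signed 32-bit intermediate, and the result is a double precision float rather than a Single.

```python
def affine_to_real(affine_value: int, scale: float, zero_point: int) -> float:
    """(affine_value - zero_point) converted to float, then multiplied by scale."""
    # The difference can be negative even though both operands are unsigned,
    # which is why the formula uses a signed intermediate.
    difference = affine_value - zero_point
    return float(difference) * scale


scale, zero_point = 0.5, 128
reals = [affine_to_real(q, scale, zero_point) for q in (0, 128, 130, 255)]

print(reals)
```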
Affine to fixed point
When the affine and fixed point scales are the same, subtract the zero point from the affine value to get the equivalent fixed point value.
$scaled\_value = affine\_value_{non\text{-}negative} - zero\_point_{non\text{-}negative}$
Fixed point to affine
When the affine and fixed point scales are the same, add the zero point to the fixed point value to get the equivalent affine value.
$affine\_value_{non\text{-}negative} = scaled\_value + zero\_point_{non\text{-}negative}$
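The two conversions are a plain shift by the zero point. The sketch below (function names are illustrative only) also shows why arithmetic has to be done on the scaled values rather than on the affine values directly.

```python
def affine_to_fixed(affine_value: int, zero_point: int) -> int:
    """scaled_value = affine_value - zero_point"""
    return affine_value - zero_point


def fixed_to_affine(scaled_value: int, zero_point: int) -> int:
    """affine_value = scaled_value + zero_point"""
    return scaled_value + zero_point


zero_point = 128
x_affine, y_affine = 130, 131

# Correct: convert to scaled values, add, convert back.
x_scaled = affine_to_fixed(x_affine, zero_point)
y_scaled = affine_to_fixed(y_affine, zero_point)
sum_affine = fixed_to_affine(x_scaled + y_scaled, zero_point)

# Incorrect: adding affine values directly counts the zero point twice.
naive_sum = x_affine + y_affine

print(sum_affine, naive_sum)
```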
Usage within MLIR
The quantization system being developed within MLIR has several components:
• Quantization dialect containing:
o A family of QuantizedTypes which represent the mapping between expressed values (typically of a floating point computer type) and storage values (typically of an integral computer type).
o Type conversion ops for converting between types based on a QuantizedType and its expressed and storage sub-types.
o Instrumentation ops for assigning instrumentation points within the computation where runtime statistics may help guide the quantization process.
• Integration with simulated quantization at training time
• TFLite native quantization
o The TFLite op-set natively supports uniform-quantized variants.
o Passes and tools exist to convert directly from the TensorFlow dialect to the TFLite quantized operation set.
Not every application of quantization will use all of these facilities. Specifically, the TensorFlow to TensorFlow Lite conversion uses the QuantizedTypes but has its own operations for type conversion and for expressing the supporting math.
Quantization Dialect
Quantized type
TODO: Flesh this section out.
• QuantizedType base class
• UniformQuantizedType
Quantized type conversion operations
• qcast : Convert from an expressed type to QuantizedType
• dcast : Convert from a QuantizedType to its expressed type
• scast : Convert between a QuantizedType and its storage type
Instrumentation and constraint operations
• const_fake_quant : Emulates the logic of the historic TensorFlow fake_quant_with_min_max_args operation.
• stats_ref : Declares that statistics should be gathered at this point with a unique key and made available to future passes of the solver.
• stats : Declares inline statistics (per layer and per axis) for the point in the computation. stats_ref ops are generally converted to stats ops once trial runs have been performed.
• coupled_ref : Declares points in the computation to be coupled from a type inference perspective based on a unique key.
Integration with simulated quantization at training time
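TensorFlow has historically used the tf.quantization.fake_quant_* family of operations to simulate the effect of quantization at training time.
As originally implemented, TensorFlow Lite was the primary consumer of such operations at inference time. When quantized inference was enabled, if every eligible tensor passed through an appropriate fake_quant node (the rules for which tensors can have fake_quant applied are somewhat involved), TensorFlow Lite would use the attributes of the fake_quant operations to decide how to convert the graph to use kernels from its quantized operation subset.
In MLIR-based quantization, fake_quant_* operations are handled by converting them to a sequence of qcast (quantize) followed by dcast (dequantize), with an appropriate UniformQuantizedType as the target of the qcast operation.

This allows subsequent compiler passes to preserve the knowledge that quantization was simulated in a certain way, while giving the compiler the flexibility to move the casts as it simplifies the computation and converts it to a form based on integer arithmetic.
It also naturally allows partially quantized computations, in which the parts that cannot be reduced to integer operations are still carried out in floating point, with appropriate conversions at the boundaries.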
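The numeric effect of a quantize-then-dequantize pair can be imitated in plain Python. The functions below are stand-ins named after the qcast and dcast ops, not calls into MLIR or TensorFlow, and the scale, zero point, and 8-bit signed storage range are assumed values chosen for the example.

```python
def qcast(real_value, scale, zero_point, qmin, qmax):
    """Stand-in for qcast: expressed (float) value to storage (integer) value."""
    return max(qmin, min(qmax, round(real_value / scale) + zero_point))


def dcast(storage_value, scale, zero_point):
    """Stand-in for dcast: storage (integer) value back to expressed value."""
    return (storage_value - zero_point) * scale


def fake_quant(real_value, scale=0.25, zero_point=0, qmin=-128, qmax=127):
    """Simulated quantization: the value stays a float but only takes
    values that the quantized type can represent."""
    stored = qcast(real_value, scale, zero_point, qmin, qmax)
    return dcast(stored, scale, zero_point)


snapped = fake_quant(0.3)      # snaps to the nearest multiple of the scale
saturated = fake_quant(100.0)  # saturates at the largest representable value
unchanged = fake_quant(0.5)    # already representable

print(snapped, saturated, unchanged)
```

Keeping the two casts as separate steps is what lets a compiler later move or cancel them, instead of treating the simulated quantization as one opaque operation.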
TFLite native quantization
TODO: Flesh this out
General algorithm

1. Take input min/max information and set the ArrayInfo (which really is InputOrOutputArrayInfo).
2. In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes (or tf.FakeQuant). Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).
3. Hardcoded logic/propagation needs to happen here.
4. Run TF constant folding.
5. In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).
6. Run a quantization pass that takes (tfl.DQ (for both input and weights) -> op -> tfl.Q) and replaces it with (op). Also replace (constant_float -> tfl.Q) with (constant_quant).
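The last two rewrites can be pictured with a toy example. The sketch below models a computation as a flat list of op names, which is a heavy simplification: real passes operate on a graph with typed operands, and nothing here uses MLIR or TFLite APIs. Both function names are hypothetical.

```python
def expand_fake_quant(chain):
    """Toy version of the PrepareTFL step: tf.FQ becomes tfl.Q followed by tfl.DQ."""
    result = []
    for op in chain:
        result.extend(["tfl.Q", "tfl.DQ"] if op == "tf.FQ" else [op])
    return result


def fuse_quantized_ops(chain):
    """Toy version of the quantization pass: (tfl.DQ -> op -> tfl.Q) becomes (op)."""
    casts = ("tfl.Q", "tfl.DQ")
    result, i = [], 0
    while i < len(chain):
        is_triple = (
            i + 2 < len(chain)
            and chain[i] == "tfl.DQ"
            and chain[i + 1] not in casts
            and chain[i + 2] == "tfl.Q"
        )
        if is_triple:
            result.append(chain[i + 1])  # the op now runs on quantized values
            i += 3
        else:
            result.append(chain[i])
            i += 1
    return result


chain = ["input", "tf.FQ", "conv", "tf.FQ"]
expanded = expand_fake_quant(chain)
fused = fuse_quantized_ops(expanded)

print(expanded)
print(fused)
```

In this toy chain, the casts that remain after fusion mark the boundaries between the floating point and quantized parts of the computation.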
