A Practical Introduction to Machine Learning
Following on from my earlier post on Data Science, here I will try to summarize and compile the major practical concepts of Machine Learning in a handy, easy-to-use, language-agnostic reference-guide format. Most of the information is presented as short and succinct bullet points. I expect this to be especially valuable to beginners, or as a quick look-up for those with a basic level of experience in data science and machine learning.
Introductory Concepts
Let us get some basic terminology out of the way first:
Structured data refers to data stored in a predefined format, e.g., tables, spreadsheets, or relational databases
Unstructured data, on the other hand, does not have a predefined format and, therefore, cannot be saved in a tabular form. Unstructured data can come in a variety of types, e.g., blobs of text, images, videos, audio files
Categorical data is any data that can be labeled and usually comprises a range of fixed values, e.g., gender, nationality, risk grades. Categorical data can be either nominal (without any inherent ordering, e.g., gender) or ordinal (ordered or ranked data, e.g., risk grades). These fixed values are known as classes or categories
Features or Predictors: input data/variables used by an ML model, usually denoted with X, to predict the target variable
Target variable: the variable that we want an ML model to predict, often represented with y
Classification problem involves predicting a discrete class of a categorical target variable, e.g., spam or not, default or non-default
Regression problem deals with predicting a continuous numeric value, e.g., sales, house price
Feature Engineering: transforming existing features or engineering new input features that can potentially be more useful during model training. E.g., calculating the number of months from today for a date variable
Training, Validation & Test data: Training data is used during initial model training/fitting. Validation data is used to evaluate the model, usually to fine-tune model parameters or identify the most suitable ML model among many. Test data is used for the final evaluation of a short-listed or fine-tuned model
Overfitting happens when a model performs well on the training data but poorly on the test/validation data, i.e., fails to generalize adequately on new and unseen data
Underfitting occurs when the model is not complex or robust enough to learn the variable relationships from the training data, and therefore has low accuracy even when applied to the data on which it was trained
Model bias and variance: A model is said to be biased when it performs poorly on the training dataset as a result of underfitting. Variance is associated with how well or poorly the model performs on the test/validation set, with high variance usually being caused by overfitting
Generalization, closely related to overfitting and model variance, refers to a model’s ability to make correct predictions on new, previously unseen data
Regularization techniques improve the generalizability of a model, e.g., through penalizing or shrinking regression coefficients towards zero
Ensemble Learning is a modeling technique that combines multiple models into one
Baseline Model is a naive model/heuristic used as a reference point to evaluate a conventional ML model (a minimal sketch follows this list)
Hyperparameters are the model configuration settings that are not learned from the data themselves, but are instead set and tuned by the practitioner during model training, e.g., the number of neighbors k in KNN
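To make a few of these terms concrete (train/test split, overfitting, generalization, and a baseline model), here is a minimal sketch assuming scikit-learn; the toy dataset and model choices are illustrative assumptions, not from the original article:

```python
# Hypothetical sketch: overfitting, generalization, and a baseline model.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data: X holds the features/predictors, y the target variable
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An unconstrained decision tree tends to overfit: near-perfect training
# accuracy but noticeably lower test accuracy (poor generalization)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))

# Baseline model: a naive heuristic that always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))
```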
Data Cleaning and Feature Engineering
Data cleaning transforms raw data into a form and format that can be effectively and efficiently processed by ML models. Despite the perceived intelligence and robustness of these models, the GIGO (garbage in, garbage out) principle remains valid in ML. Refer to my previous article for further details.
Deal with missing data:
- Drop all records with missing features — not recommended
- Heuristic-based imputation using domain knowledge
- Mean/median/mode imputation of missing values
- Use a random value or a constant to fill in missing data
- Utilize k-Nearest Neighbors or a linear regression model to predict and impute missing values (see the sketch after this list)
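A minimal imputation sketch, assuming scikit-learn and pandas; the toy DataFrame and column names are hypothetical:

```python
# Hypothetical sketch: median and KNN-based imputation of missing values.
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

df = pd.DataFrame({"age": [25, None, 40, 31],
                   "income": [50000, 62000, None, 48000]})

# Median imputation: fill each missing value with its column's median
median_imputed = SimpleImputer(strategy="median").fit_transform(df)

# KNN imputation: infer each missing value from the k most similar rows
knn_imputed = KNNImputer(n_neighbors=2).fit_transform(df)
```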
Some other typical data cleaning tasks include (see the sketch after this list):
- Identify and delete zero-variance features
- Identify, and potentially drop, features that exhibit multicollinearity, or a high degree of pairwise correlation
- Evaluate features with low variance or near-zero variance utilizing domain knowledge. Mostly applicable for numerical and nominal categorical data
- Drop, if applicable, duplicate records
- Identify outliers and determine an appropriate strategy to deal with them — either drop them, trim them, or leave them as they are, since some ML models can effectively deal with outliers
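A brief, hedged sketch of a few of these steps using pandas; the toy data and the 0.95 correlation threshold are illustrative assumptions:

```python
# Hypothetical sketch: duplicates, zero-variance features, pairwise correlation.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 4, 6, 8],    # perfectly correlated with "a"
                   "c": [7, 7, 7, 7]})   # zero variance

df = df.drop_duplicates()                            # drop duplicate records
df = df.drop(columns=df.var()[df.var() == 0].index)  # drop zero-variance features

# Flag highly correlated feature pairs for review (and possible removal)
corr = df.corr().abs()
pairs = [(i, j) for i in corr.columns for j in corr.columns
         if i < j and corr.loc[i, j] > 0.95]
print("highly correlated pairs:", pairs)  # [('a', 'b')]
```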
Feature Engineering
Feature engineering is more of an art than a science and relies predominantly on one’s domain knowledge. Done correctly, it has the potential to increase a model’s predictive power.
Feature engineering techniques for numerical data (a sketch follows the list):
- Scale, normalize, or standardize using log scales, z-scores, min-max
- Create new features using mathematical or statistical interaction(s) within raw numerical features, e.g., through addition, subtraction, or a statistical test
- Utilize statistical transformations to convert skewed distributions to Gaussian-like, e.g., log/power and Box-Cox transform
- Dimensionality reduction techniques, e.g., Principal Component Analysis (PCA)
- Binning a numerical feature into categories is generally not recommended. However, there are certain use cases (e.g., credit risk scoring) where it is a proven and well-researched industry best practice
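A minimal sketch of a few of these transformations, assuming scikit-learn and NumPy; the toy array is hypothetical:

```python
# Hypothetical sketch: scaling, standardization, and skew-reducing transforms.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 900.0], [3.0, 5000.0]])

X_z = StandardScaler().fit_transform(X)    # z-scores: mean 0, std 1
X_mm = MinMaxScaler().fit_transform(X)     # min-max scaling to [0, 1]
X_log = np.log1p(X)                        # log transform for skewed data
# Box-Cox transform towards a Gaussian-like shape (requires positive values)
X_bc = PowerTransformer(method="box-cox").fit_transform(X)
```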
Feature engineering techniques for categorical data (a sketch follows the list):
- Ordinal encoding: convert ordered categorical data into numerical values, e.g., Good, Bad, Worse converts to 1, 2, 3
- One-hot encoding for nominal categorical data. Each feature’s category is converted to a separate column where its presence is denoted by 1 and absence by 0. E.g., [USA, UK, Australia] is converted to [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
- Certain specific techniques that are widely utilized in Natural Language Processing, e.g., feature hashing scheme and word embeddings
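A minimal encoding sketch, assuming pandas and scikit-learn; the toy categories mirror the examples above:

```python
# Hypothetical sketch: ordinal encoding and one-hot encoding.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"grade": ["Good", "Bad", "Worse", "Good"],
                   "country": ["USA", "UK", "Australia", "UK"]})

# Ordinal encoding: preserve the inherent ordering of an ordinal feature
enc = OrdinalEncoder(categories=[["Good", "Bad", "Worse"]])
df["grade_enc"] = enc.fit_transform(df[["grade"]]).ravel()  # Good=0, Bad=1, Worse=2

# One-hot encoding: one binary indicator column per nominal category
df = pd.get_dummies(df, columns=["country"])
```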
Modeling Overview and Principles
So what exactly does a Machine Learning model do? Given a set of features, X, in the training data, an ML model tries to iteratively find an ideal statistical function (often called the training function) that maps X to the training data’s target variable, y, as accurately as possible. Finding the ideal training function usually involves making certain assumptions about the underlying data and its form. This training function is then used to predict y for any new data in the future, given X.
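In code, this idea reduces to the familiar fit/predict pattern; a minimal scikit-learn sketch with hypothetical toy data:

```python
# Hypothetical sketch: learn a training function mapping X to y, then predict.
from sklearn.linear_model import LinearRegression

X_train = [[1], [2], [3], [4]]   # features X
y_train = [2.1, 3.9, 6.2, 7.8]   # target variable y

model = LinearRegression().fit(X_train, y_train)  # find the training function
print(model.predict([[5]]))                       # predict y for new, unseen X
```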
Some basic modeling principles
Here we will touch on some general modeling-related principles and philosophies:
Model accuracy and its interpretability tradeoff
Better model accuracy usually results in relatively lower model interpretability.
Complex models, like deep neural networks and ensemble decision trees, usually perform better than simpler models. However, they are much less interpretable in the sense that the training function is not easy for a layperson to understand. Simpler models, like linear regression, logistic regression, and a single decision tree, are easily interpretable, at the cost of accuracy.
Consider a simple logistic regression model. It provides us with the coefficients for each feature that, in turn, provide us with insights as to how useful that feature is for the prediction problem.
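A minimal sketch of that insight, assuming scikit-learn; the data and feature names are hypothetical:

```python
# Hypothetical sketch: inspecting logistic regression coefficients.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X, y)

# Each coefficient shows the direction and (for comparably scaled features)
# the strength of a feature's effect on the log-odds of the positive class
for name, coef in zip(["feature_1", "feature_2", "feature_3"], clf.coef_[0]):
    print(f"{name}: {coef:.3f}")
```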
Therefore, model selection is often driven by the level of complexity and interpretability required. Some domains, like credit scoring, more or less mandate the use of an easily interpretable model. Hence, logistic regression has been historically and widely used in credit scoring problems. However, for image detection, recognition, and natural language processing, interpretability is less of a concern. Therefore, complex, deep neural networks can be safely deployed in these domains.
Bias/Variance tradeoff
Generally speaking, a model’s bias can be improved either by parameter tuning or by selecting an altogether different model, while variance can be reduced with more training data, regularization techniques, or by preventing any data leakage between the training and test sets. Bias and variance can be simultaneously improved only to a certain extent, beyond which an improvement in one will usually result in a deterioration of the other.
This tradeoff is very common and requires a delicate balance in practice. Note that there will always be some unavoidable variance due to random noise in data, which makes it practically impossible to reduce variance to 0.
Occam’s Razor
Occam’s Razor is a general philosophical principle that states: if there are two explanations for an event or a fact, the simpler one, making the fewest assumptions, is most likely to be correct. When applied to ML, the Occam’s Razor principle implies that, when comparing two models with similar predictive power or accuracy, we should select the simpler one.
No Free Lunch Theorem
No single machine learning model works best for all possible problems. Wolpert and Macready stated: “If an algorithm performs better than random search on some class of problems then it must perform worse than random search on the remaining problems”. It is therefore common to try multiple relevant models and pick the one that works best for your particular problem.
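In practice, the theorem translates into a simple loop: try several relevant candidates and compare them on your data. A hedged sketch, assuming scikit-learn; the dataset and candidate list are illustrative:

```python
# Hypothetical sketch: compare several candidate models on the same data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=1),
    "k-nearest neighbors": KNeighborsClassifier(),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```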
Modeling Taxonomy
Machine learning models can be classified in several ways, some of which are:
Parametric vs. Non-Parametric
Parametric models make strong assumptions about the training data to identify and simplify the training function to a known form, the parameters of which fully describe and capture the relationships between features and the target variable.
For example, a linear regression model assumes a linear relationship between the input features and the target variable and will try to find the best possible linear function for the same. However, if this assumption is invalid, the model will produce poor predictions.
Examples of parametric models include linear regression, logistic regression, and Naive Bayes.
Non-Parametric models don’t make strong assumptions about either the training data or the form of the training function and are, therefore, generally more flexible, at the cost of potential overfitting.
Examples of non-parametric models include k-Nearest Neighbors, decision trees, and Support Vector Machines (SVM).
Supervised vs. Unsupervised
Supervised models try to predict the known target variable(s), while unsupervised models do not have any knowledge about the target variable(s) beforehand. The goal of unsupervised learning is to try and understand the relationships between the variables or observations.
Supervised models include linear regression, logistic regression, k-Nearest Neighbors, SVM, and decision trees, while unsupervised models include k-Means clustering, isolation forest, and PCA
Blackbox vs. Descriptive
Blackbox models utilize multiple complex algorithms to make decisions, but we do not know how the decision/prediction is arrived at. E.g., deep learning and neural networks.
Descriptive models provide clear insight into why and how they make their decisions. E.g., linear regression, logistic regression, decision trees
Oft-used ML Models
Some of the widely used ML models, outside the deep machine learning domain, include:
Linear Regression is a simple and extensively used supervised learning model for regression problems. It predicts a numerical target variable on the assumption that there is a linear relationship between features and the target variable
Logistic Regression is extensively used to predict the class, or the probability of being assigned to it, given a set of features. Hence, it is a supervised model for classification problems. Logistic regression assumes that all features have a linear relationship with the log-odds (logit) of the target variable. One needs to be very careful when predicting imbalanced classes
k-Nearest Neighbors (KNN) is a supervised model that can be used for both classification and regression problems. It predicts through a simple majority vote (or average, for regression) among the k nearest neighbors of each observation, identified via a distance metric (most commonly Euclidean distance)
k-Means Clustering is an unsupervised clustering algorithm that assigns observations to groups in a manner that minimizes variance between individual observations within each cluster
Decision Trees are supervised models that can be utilized for both regression and classification problems using a sequence of rules. A single decision tree is rarely used in practice, given its risk of overfitting. Instead, ensemble or bagging concepts are utilized to minimize the model variance
Random Forest is an ensemble model that combines multiple decision trees through the concept of bagging to reduce model error
Support Vector Machine (SVM) is a supervised classification model that aims to find an ideal hyperplane or boundary between all the possible classes that maximizes the distance between them. This hyperplane is then used for classification
Model Evaluation
But how good is our model at making predictions? Model evaluation gives us the answer.
Evaluation Strategies
A brief overview of the various evaluation strategies follows:
Train/Test Split
Divide the complete dataset into two subsets, called training and test (usually an 80/20, 75/25, or 70/30 split). Train the model on the training set and apply evaluation metrics to model predictions made on the test set. This is not the ideal approach, as there is no separate dataset to test, evaluate, and compare model parameters (called hyperparameter optimization) or multiple models.
Conducting such evaluation on the test set and using the results to tune the model will result in data leakage from the test set to the training set and unreliable final evaluation metrics. This is because we are using information from the test set (which should be treated as the new, unseen data we will encounter in production) to train the model.
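A minimal train/test split sketch, assuming scikit-learn; the 80/20 split and toy data are illustrative:

```python
# Hypothetical sketch: an 80/20 train/test split with a single evaluation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# stratify=y keeps the class proportions similar in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```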
Train/Validation/Test Split
Divide the complete dataset into three subsets. Train single or multiple models on the training set and apply preliminary evaluation metrics to model predictions made on the validation set. Use these results to fine-tune a single model or select the optimal model. Once a final model has been selected, apply it to the yet untouched test dataset and evaluate thereon. This prevents any data leakage and is a better, but not the ideal, evaluation approach.
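scikit-learn has no dedicated three-way splitter, but two chained train_test_split calls achieve the same effect; a hedged sketch with an illustrative 60/20/20 split:

```python
# Hypothetical sketch: a 60/20/20 train/validation/test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Fine-tune/select models on (X_val, y_val);
# touch (X_test, y_test) only once, for the final evaluation
```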
Cross-Validation (CV)
CV is applied to the training set after a train/test split. CV splits the training set into multiple subsets (called folds), fits the model on all but one fold, and evaluates it on the holdout fold. This results in multiple evaluation metrics (dependent upon the number of folds), the average and standard deviation of which are used to select the final model.
Once the final model is selected, it is trained again on the whole training set and evaluated on the test set, which was left untouched during the entire process. CV is the ideal model evaluation approach.
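A minimal cross-validation sketch, assuming scikit-learn; the 5 folds and toy data are illustrative:

```python
# Hypothetical sketch: 5-fold CV yields five scores, summarized by mean/std.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X_train, y_train = make_classification(n_samples=800, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```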
Some of the standard CV techniques include (ready-made splitters for these are sketched after the list):
- Leave One Out CV (LOOCV): fit and train the model on all but one observation
- k-Fold CV: fit and train the model on k-1 folds and evaluate on the holdout fold
- Repeated k-Fold CV: similar to k-Fold CV but the process is repeated a specified number of times
- Stratified k-Fold CV: similar to k-Fold CV but here the folds are made by preserving the percentage of samples for each target class. Useful for imbalanced data
- Repeated Stratified k-Fold CV: a combination of Repeated k-Fold CV and Stratified k-Fold CV
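All of the variants above are available as ready-made splitters in scikit-learn; a brief sketch with illustrative parameter values:

```python
# Hypothetical sketch: ready-made CV splitters for the variants listed above.
from sklearn.model_selection import (LeaveOneOut, RepeatedKFold,
                                     RepeatedStratifiedKFold, StratifiedKFold)

loocv = LeaveOneOut()
rkf = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
# Any of these can be passed as the `cv` argument to cross_val_score
```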
Evaluation Metrics
There are tens of model evaluation metrics out there; some of the more widely used are described below:
Classification Metrics (a computation sketch follows the list):
Accuracy: the ratio of correct predictions to the total number of predictions. Not suitable for an imbalanced dataset
Precision: the ratio of true positives to the total number of predicted positives
Recall, also known as Sensitivity or True Positive Rate (TPR): the ratio of true positives to the actual number of positives
F-Score: a single score that combines Precision and Recall (their harmonic mean, in the case of F1)
Area Under the Receiver Operating Characteristic Curve (AUROC): a single number that summarizes the information of a ROC curve
Brier Score, Cohen’s Kappa Statistic, etc.
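A minimal sketch computing these metrics, assuming scikit-learn; the toy labels and probabilities are hypothetical:

```python
# Hypothetical sketch: common classification metrics on toy predictions.
from sklearn.metrics import (accuracy_score, brier_score_loss, cohen_kappa_score,
                             f1_score, precision_score, recall_score,
                             roc_auc_score)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                   # actual classes
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]                   # predicted classes
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # predicted P(class = 1)

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
print("AUROC:    ", roc_auc_score(y_true, y_prob))
print("Brier:    ", brier_score_loss(y_true, y_prob))
print("kappa:    ", cohen_kappa_score(y_true, y_pred))
```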
Refer to my previous article for further details on these metrics.
Regression Metrics (a computation sketch follows the list):
Mean Absolute Error (MAE): the average absolute difference between actual and predicted values
Median Absolute Error: the median of absolute differences between actual and predicted values
Mean Squared Error (MSE): the average of the squared differences between actual and predicted values
Root Mean Squared Error (RMSE): the square root of MSE
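A minimal sketch for the regression metrics, assuming scikit-learn and NumPy; the toy values are hypothetical:

```python
# Hypothetical sketch: common regression metrics on toy predictions.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.4, 2.0, 8.0]

print("MAE:  ", mean_absolute_error(y_true, y_pred))
print("MedAE:", median_absolute_error(y_true, y_pred))
mse = mean_squared_error(y_true, y_pred)
print("MSE:  ", mse)
print("RMSE: ", np.sqrt(mse))  # the square root of MSE
```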
Conclusion
I hope the above pointers will come in handy in your machine learning journey.
Feel free to reach out to me to discuss anything related to machine learning or data and financial analytics.
Learn on!
Translated from: https://towardsdatascience.com/a-practical-introduction-to-machine-learning-f43d8badc5a7