Supervised Learning with Azure
Machine learning sounds cool, doesn't it? I'm a biology student who had no idea about this branch of computer science until the lockdown gave me the time and energy to explore it. For those who need a layman's introduction to machine learning, here is an example. One day my dad asked me what I keep studying, and I didn't know how to explain it to him. The words going through my mind were normalization, overfitting, models, Azure, and so on. A minute later, he was dictating a text message to a friend using Google speech recognition on his phone. "That's what I am studying, Dad!" I said. The science behind this process is called machine learning. It is a subset of artificial intelligence that focuses on creating programs that are capable of learning without explicit instruction.
This article covers one of the basic concepts of machine learning: supervised learning. Hope you all enjoy it!
1. Supervised Learning: Classification
The first type of supervised learning that we'll look at is classification. The main distinguishing characteristic of classification is the type of output it produces: in a classification problem, the outputs are categorical or discrete. Within this broad definition, there are several main approaches, which differ in how many classes are used and in whether each output can belong to only one class or to multiple classes. Let's have a look.
Some of the most common types of classification problems include:
· Classification on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.
· Classification on image or sound data: The training data consists of images or sounds whose categories are already known.
· Classification on text data: The training data consists of texts whose categories are already known.
Machine learning algorithms require numerical data. This means that for images, sound, and text, several steps need to be performed during the preparation phase to transform the data into numerical vectors that the classification algorithms can accept.
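As a rough illustration of that preparation step for text, here is a minimal bag-of-words sketch in plain Python. The helper names `build_vocabulary` and `vectorize` and the three sample sentences are made up for this example and are not part of the course material; real pipelines usually add tokenization, normalization, and weighting on top of this.

```python
from collections import Counter

def build_vocabulary(texts):
    """Map each distinct lowercase word to a column index, in sorted order."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}

def vectorize(text, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts.get(word, 0) for word in vocabulary]

# Hypothetical sample texts, only for illustration.
texts = ["good movie", "bad movie", "good good plot"]
vocabulary = build_vocabulary(texts)
vectors = [vectorize(text, vocabulary) for text in texts]

print(vocabulary)  # word -> column index
print(vectors)     # one numerical vector per text
```

Each text becomes a fixed-length list of word counts, which is the kind of numerical vector a classifier can consume.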
[Image. Source: Udacity course for ML in Azure]
The following images are just an introduction to the various algorithms and their major characteristics. No need to get overwhelmed! Learning about algorithms is a slow and steady process.
[Images: classification algorithms and their major characteristics. Source: Udacity course for ML in Azure]
*One-vs-all method: A binary model is created for each of the multiple output classes. Each of these binary models is assessed against its complement (all other classes in the model) as though it were a binary classification problem. Prediction is then performed by running all of the binary classifiers and choosing the prediction with the highest confidence score.
In essence, an ensemble of individual models is created and their results are merged into a single model that predicts all classes. Thus, any binary classifier can be used as the basis for a one-vs-all model.
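The one-vs-all idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the binary model here is a small logistic regression trained by gradient descent (any binary classifier would do), and the function names, toy data points, and default settings are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_binary(points, targets, epochs=500, learning_rate=0.1):
    """Fit a tiny logistic-regression model with per-sample gradient steps."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(points, targets):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - t
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def confidence(model, x):
    """Confidence that x belongs to the class this binary model was trained for."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_one_vs_all(points, labels):
    """One binary model per class: that class (1) against all other classes (0)."""
    return {
        c: train_binary(points, [1 if label == c else 0 for label in labels])
        for c in sorted(set(labels))
    }

def predict(models, x):
    """Run every binary model and keep the class with the highest confidence."""
    return max(models, key=lambda c: confidence(models[c], x))

# Hypothetical toy data: three well-separated groups of 2-D points.
points = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 1), (5, 1), (0, 5), (1, 6), (1, 5)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
models = train_one_vs_all(points, labels)
print([predict(models, x) for x in [(0.5, 0.5), (5.5, 0.5), (0.5, 5.5)]])
```

Swapping `train_binary` for another binary learner leaves `train_one_vs_all` and `predict` unchanged, which is the point of the method.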
*SMOTE (synthetic minority oversampling technique) is one of the most commonly used oversampling methods for solving the class imbalance problem. It aims to balance the class distribution by increasing the number of minority class examples. Rather than simply replicating existing examples, SMOTE synthesizes new minority instances that lie between existing minority instances.
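A minimal sketch of the interpolation step behind SMOTE, in plain Python. The function name `smote`, its parameters, and the sample points are assumptions made for this example; production implementations add more careful neighbor search and handling of edge cases.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class examples with two features each.
minority = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0), (2.5, 2.5)]
new_points = smote(minority, n_new=6)
print(new_points)
```

Every synthetic point falls between two real minority examples, so the new data stays inside the region the minority class already occupies instead of duplicating rows.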
2. Multi-Class Algorithms
a) Multi-class Logistic Regression
*Logistic regression is a classification method that predicts the value of a categorical dependent variable from its relationship to one or more independent variables, modeling the class probabilities with a logistic function. If the dependent variable has only two possible values (for example, success/failure), the logistic regression is binary. If it has more than two possible values (for example, blood type given diagnostic test results), the logistic regression is multinomial.
Two key parameters to configure this algorithm are:
- Optimization tolerance: Controls when to stop the iterations. If the improvement between iterations is less than the specified threshold, the algorithm stops and returns the current model.
- Regularization weight: Regularization is a method to prevent overfitting by penalizing models with extreme coefficient values. This factor determines how much to penalize the models at each iteration.
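To make the two parameters concrete, here is a small multinomial (softmax) logistic regression in plain Python. It is a sketch under assumptions, not how Azure implements the algorithm: the names `tolerance` and `reg_weight`, the L2 form of the penalty, the plain gradient-descent optimizer, and the toy data are all chosen for illustration.

```python
import math

def class_scores(weights, x):
    """One linear score per class for feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(scores):
    """Turn scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss(weights, points, labels, reg_weight):
    """Mean cross-entropy plus an L2 penalty on every coefficient."""
    data_loss = -sum(
        math.log(softmax(class_scores(weights, x))[y])
        for x, y in zip(points, labels)
    ) / len(points)
    return data_loss + reg_weight * sum(w * w for row in weights for w in row)

def train(points, labels, n_classes, reg_weight=0.01, tolerance=1e-6,
          learning_rate=0.2, max_iterations=10000):
    points = [list(x) + [1.0] for x in points]  # constant feature acts as the bias
    weights = [[0.0] * len(points[0]) for _ in range(n_classes)]
    previous = loss(weights, points, labels, reg_weight)
    for iteration in range(1, max_iterations + 1):
        # Gradient of the penalty pulls every coefficient towards zero.
        grads = [[2 * reg_weight * w for w in row] for row in weights]
        for x, y in zip(points, labels):
            probs = softmax(class_scores(weights, x))
            for c in range(n_classes):
                error = probs[c] - (1.0 if c == y else 0.0)
                for j, xj in enumerate(x):
                    grads[c][j] += error * xj / len(points)
        weights = [[w - learning_rate * g for w, g in zip(row, grad_row)]
                   for row, grad_row in zip(weights, grads)]
        current = loss(weights, points, labels, reg_weight)
        if previous - current < tolerance:  # optimization tolerance reached
            break
        previous = current
    return weights, iteration

def predict(weights, x):
    scores = class_scores(weights, list(x) + [1.0])
    return scores.index(max(scores))

# Hypothetical toy data: three classes, two features.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.0), (0.9, 0.2), (0.0, 1.0), (0.1, 0.9)]
labels = [0, 0, 1, 1, 2, 2]
weights, iterations = train(points, labels, n_classes=3,
                            reg_weight=0.001, tolerance=1e-8)
print(iterations, [predict(weights, x) for x in points])
```

A looser tolerance stops the loop earlier, and a larger regularization weight keeps the coefficients smaller, which is the trade-off the two settings control.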
b) Multi-class Neural Network
Includes an input layer, a hidden layer, and an output layer. The relationship between input and output is learned by training the neural network on input data. Three key parameters are:
- The number of hidden nodes: Lets you customize the number of hidden nodes in the neural network.
- Learning rate: Controls the size of the step taken at each iteration before correction.
- The number of learning iterations: The maximum number of times the algorithm should process the training cases.
c) Multi-class Decision Forest
An ensemble of decision trees. It works by building multiple decision trees and then voting on the most popular output class. Five key parameters are:
- Resampling method: Controls the method used to create the individual trees.
- The number of decision trees: Specifies the maximum number of decision trees that can be created in the ensemble.
- Maximum depth of the decision trees: Limits the maximum depth of any decision tree.
- The number of random splits per node: The number of splits to use when building each node of the tree.
- The minimum number of samples per leaf node: Controls the minimum number of cases that are required to create any terminal node in a tree.
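The decision forest parameters above can be seen at work in a toy implementation. This is a simplified sketch, not Azure's algorithm: it assumes bagging (bootstrap sampling) as the resampling method, scores splits by how many samples the majority labels get right, and uses hypothetical function names and data.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(points, labels, rng, max_depth, n_random_splits, min_samples_leaf):
    """Return a leaf (a class label) or a node (feature, threshold, left, right).
    Labels must not be tuples; min_samples_leaf must be at least 1."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for _ in range(n_random_splits):  # number of random splits per node
        feature = rng.randrange(len(points[0]))
        threshold = rng.choice(points)[feature]
        left = [i for i, p in enumerate(points) if p[feature] <= threshold]
        right = [i for i, p in enumerate(points) if p[feature] > threshold]
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue
        # Score: samples classified correctly if each side predicts its majority.
        score = sum(Counter(labels[i] for i in side).most_common(1)[0][1]
                    for side in (left, right))
        if best is None or score > best[0]:
            best = (score, feature, threshold, left, right)
    if best is None:
        return majority(labels)
    _, feature, threshold, left, right = best
    children = [build_tree([points[i] for i in side], [labels[i] for i in side],
                           rng, max_depth - 1, n_random_splits, min_samples_leaf)
                for side in (left, right)]
    return (feature, threshold, children[0], children[1])

def tree_predict(tree, x):
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def train_forest(points, labels, n_trees=31, max_depth=3, n_random_splits=10,
                 min_samples_leaf=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample drawn with replacement.
        idx = [rng.randrange(len(points)) for _ in points]
        forest.append(build_tree([points[i] for i in idx], [labels[i] for i in idx],
                                 rng, max_depth, n_random_splits, min_samples_leaf))
    return forest

def forest_predict(forest, x):
    """Every tree votes; the most popular class wins."""
    return majority([tree_predict(tree, x) for tree in forest])

# Hypothetical toy data: three well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
forest = train_forest(points, labels)
print([forest_predict(forest, x) for x in [(0.5, 0.5), (10.5, 0.5), (0.5, 10.5)]])
```

Individual trees built from random splits on resampled data can be weak, but the vote across the ensemble is what produces the final class.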
3. Supervised Learning: Regression
In a regression problem, the output is numerical or continuous.
3.1 Introduction to Regression
Common types of regression problems include:
· Regression on tabular data: The data is available in the form of rows and columns, potentially originating from a wide variety of data sources.
· Regression on image or sound data: Training data consists of images or sounds whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform the images or sounds into numerical vectors accepted by the algorithms.
· Regression on text data: Training data consists of texts whose numerical scores are already known. Several steps need to be performed during the preparation phase to transform the text into numerical vectors accepted by the algorithms.
Examples: housing prices, customer churn, customer lifetime value, forecasting (time series), and anomaly detection.
3.2 Categories of Algorithms
Common machine learning algorithms for regression problems include:
· Linear Regression: fast training, linear model
· Decision Forest Regression: accurate, fast training times
· Neural Net Regression: accurate, long training times
[Image. Source: Udacity course for ML in Azure]
Numerical outcome: the dependent variable.
*Ordinary least squares method: Calculates the error as the sum of the squared distances from the actual values to the predicted line, and fits the model by minimizing this squared error. This method assumes a strong linear relationship between the inputs and the dependent variable.
*Gradient descent: Fits the model iteratively, adjusting the parameters at each step of the training process in the direction that reduces the error.
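Both fitting methods can be compared on a one-feature example. The sketch below is plain Python with made-up data; the learning rate and step count are assumptions picked so that gradient descent converges on this particular data set.

```python
def fit_ols(xs, ys):
    """Closed-form least-squares line: minimizes the sum of squared errors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Reduce the mean squared error a little at every step."""
    slope = intercept = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        slope -= learning_rate * 2 / n * sum(r * x for r, x in zip(residuals, xs))
        intercept -= learning_rate * 2 / n * sum(residuals)
    return slope, intercept

# Hypothetical data lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ols_slope, ols_intercept = fit_ols(xs, ys)
gd_slope, gd_intercept = fit_gradient_descent(xs, ys)
print(ols_slope, ols_intercept)
print(gd_slope, gd_intercept)
```

On this data the two approaches land on nearly the same line; the closed form gets there in one calculation, while gradient descent approaches it step by step.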
[Image. Source: Udacity course for ML in Azure]
Decision forest regression supports some of the same hyperparameters discussed for the multi-class decision forest algorithm, such as the number of trees and the maximum depth.
[Image. Source: Udacity course for ML in Azure]
Neural net regression is a supervised learning method, so it requires a labeled dataset whose label column is a numerical data type. The algorithm also supports the same hyperparameters that were described for the multi-class neural network algorithm: the number of hidden nodes, the learning rate, and the number of iterations.
*Regularization constrains, or shrinks, the coefficient estimates towards zero; its strength is set by a hyperparameter. This technique reduces the risk of overfitting by discouraging the model from becoming overly complex or flexible.
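The shrinking effect of regularization is easy to see in the simplest case, a one-feature linear model with an L2 penalty on the slope. This is a minimal sketch; the function name `ridge_slope`, the penalty values, and the data are assumptions for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Slope that minimizes sum((y - a - b*x)^2) + penalty * b^2,
    with the intercept a left unpenalized."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / (sxx + penalty)

# Hypothetical data lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slopes = [ridge_slope(xs, ys, penalty) for penalty in (0.0, 10.0, 100.0)]
print(slopes)
```

With no penalty the result is the ordinary least-squares slope; as the penalty grows, the coefficient is pulled towards zero and the model becomes less flexible.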
4. Automate the Training of Regressors
Key challenges in successfully training a machine learning model include:
- selecting features from the ones available in the datasets
- choosing the right algorithm for the task
- tuning the hyperparameters of the selected algorithm
- selecting the right evaluation metrics to measure the performance of the trained model
- dealing with the fact that the entire process is highly iterative
The idea behind Automated ML is to enable the automated exploration of the combinations needed to successfully produce a trained model. It intelligently tests multiple algorithms and hyperparameters in parallel and returns the best one. The next steps include deploying the model into production and, if needed, further customization or refinement to improve performance.
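The core loop of that idea can be sketched in plain Python: try several algorithm and hyperparameter combinations in parallel, score each on held-out data, and keep the best. This is not the Azure Automated ML API; the candidate models (a mean baseline and k-nearest-neighbor regression), the RMSE metric, and all names and data are assumptions for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def mean_model(train, _setting):
    """Baseline: always predict the mean training target."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

def knn_model(train, k):
    """Predict the average target of the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def rmse(model, data):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))

def auto_select(train, validation, candidates):
    """Evaluate every (name, builder, setting) candidate in parallel and
    return the (validation RMSE, name, setting) of the best one."""
    def evaluate(candidate):
        name, builder, setting = candidate
        return rmse(builder(train, setting), validation), name, setting
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(evaluate, candidates))
    return min(results, key=lambda result: result[0])

# Hypothetical data following y = 2x, split into training and validation sets.
train = [(float(x), 2.0 * x) for x in range(0, 20, 2)]
validation = [(float(x), 2.0 * x) for x in range(1, 20, 4)]
candidates = [("mean", mean_model, None),
              ("knn", knn_model, 1), ("knn", knn_model, 2), ("knn", knn_model, 5)]
best = auto_select(train, validation, candidates)
print(best)
```

The evaluation metric decides the winner, which is why choosing the right metric appears in the list of challenges above.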
[Images. Source: Udacity course for ML in Azure]
Material reference: Udacity Fundamental Course in Machine Learning for Microsoft Azure
https://docs.microsoft.com/en-us/azure/?product=featured
https://docs.microsoft.com/en-us/
Happy learning :)
Original article: https://medium.com/ml-course-microsoft-udacity/supervised-learning-with-azure-23204eae32d6