Facebook Uses Bayesian Optimization to Conduct Better Experiments in Machine Learning Models
I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
Hyperparameter optimization is a key aspect of the lifecycle of machine learning applications. While methods such as grid search are incredibly effective for optimizing the hyperparameters of specific, isolated models, they are very difficult to scale across large permutations of models and experiments. A company like Facebook operates thousands of concurrent machine learning models that need to be constantly tuned. To achieve that, Facebook engineering teams need to regularly conduct A/B tests to determine the right hyperparameter configurations. Data in those tests is difficult to collect, and the tests are typically conducted in isolation from each other, which ends up making them very computationally expensive exercises. One of the most innovative approaches in this area came from a team of AI researchers at Facebook, who published a paper proposing a method based on Bayesian optimization to adaptively design rounds of A/B tests based on the results of prior tests.
Why Bayesian Optimization?
Bayesian optimization is a powerful method for solving black-box optimization problems that involve expensive function evaluations. Recently, Bayesian optimization has evolved into an important technique for optimizing hyperparameters in machine learning models. Conceptually, Bayesian optimization starts by evaluating a small number of randomly selected function values and fitting a Gaussian process (GP) regression model to the results. The GP posterior provides an estimate of the function value at each point, as well as the uncertainty in that estimate. The GP works well for Bayesian optimization because it provides excellent uncertainty estimates and is analytically tractable. It provides an estimate of how an online metric varies with the parameters of interest.
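To make that first step concrete, here is a minimal sketch in Python of fitting a GP to a handful of randomly evaluated points, using scikit-learn. The objective `f` is a hypothetical stand-in for an expensive online metric; everything here is illustrative rather than Facebook's actual setup.

```python
# Minimal sketch of the first step of Bayesian optimization: evaluate a few
# randomly chosen points and fit a Gaussian process to the results.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Hypothetical expensive black-box function (stand-in for an online metric).
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(5, 1))   # a handful of random parameter values
y = f(X).ravel()                     # their observed outcomes (noiseless, for now)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

# The GP posterior gives both an estimate and its uncertainty at any point.
X_test = np.linspace(0, 2, 100).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
```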
Let’s imagine an environment in which we are conducting random and regular experiments on machine learning models. In that scenario, Bayesian optimization can be used to construct a statistical model of the relationship between the parameters and the online outcomes of interest, and to use that model to decide which experiments to run. The concept is well illustrated in the following figure, in which each data marker corresponds to the outcome of an A/B test of that parameter value. We can use the GP to decide which parameter to test next by balancing exploration (high uncertainty) with exploitation (good model estimate). This is done by computing an acquisition function that estimates the value of running an experiment with any given parameter value, as the sketch below illustrates.
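Continuing the sketch above, one common acquisition function is expected improvement (EI), which trades the posterior mean off against the posterior uncertainty. The `xi` exploration parameter and the maximization convention are assumptions of this sketch, not details taken from the paper.

```python
# Expected improvement (EI) computed from the GP posterior fitted above.
from scipy.stats import norm

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI at candidate points X_cand, relative to the best observed value y_best."""
    mean, std = gp.predict(X_cand, return_std=True)
    std = np.maximum(std, 1e-12)      # guard against division by zero
    z = (mean - y_best - xi) / std    # assumes we are maximizing the metric
    return (mean - y_best - xi) * norm.cdf(z) + std * norm.pdf(z)

# Pick the next A/B test: the candidate with the highest expected improvement.
ei = expected_improvement(X_test, gp, y_best=y.max())
x_next = X_test[np.argmax(ei)]
```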
Source: https://projecteuclid.org/download/pdfview_1/euclid.ba/1533866666

The fundamental goal of Bayesian optimization when applied to hyperparameter optimization is to determine how valuable an experiment is for a specific hyperparameter configuration. Conceptually, Bayesian optimization works very efficiently for isolated models, but its value proposition is challenged in scenarios that run random experiments. The fundamental challenge relates to the noise introduced in the observations.
Noise and Bayesian Optimization
Random experiments in machine learning systems introduce high levels of noise into the observations. Additionally, many of the constraints on a given experiment can be considered noisy data in and of themselves, which can affect the results of an experiment. Suppose that we are trying to evaluate the value of a function f(x) for a given observation x. With observation noise, we now have uncertainty not only in the value f(x), but also in which observation is the current best, x*, and in its value, f(x*).
Typically, Bayesian optimization models use heuristics to handle noisy observations, but those perform very poorly with high levels of noise. To address this challenge, the Facebook team came up with a clever answer: why not factor noise in as part of the observations?
Imagine that, instead of computing the expectation of observing f(x), we observe y_i = f(x_i) + ε_i, where ε_i is the observation noise. Mathematically, the GP works with noisy observations much as it does with noiseless data. Without going crazy about the math: in their research paper, the Facebook team showed that this type of approximation is very well suited to Monte Carlo optimization, which yields incredibly accurate estimates of the correct observation.
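As a rough illustration of this idea, continuing the sketch above, the GP can be told about the noise variance directly (the `alpha` argument in scikit-learn), and the uncertain "current best" can be integrated out by Monte Carlo sampling from the posterior at the observed points. This is a loose approximation of the paper's noisy expected improvement, not the authors' exact implementation, and the noise level is a placeholder.

```python
# Fold observation noise into the model: y_i = f(x_i) + eps_i. The GP accounts
# for the noise variance via `alpha`, and the "current best" value, now itself
# uncertain, is averaged over by Monte Carlo sampling from the posterior.
noise_var = 0.1 ** 2                                  # assumed noise level
y_noisy = y + rng.normal(0, np.sqrt(noise_var), size=y.shape)

gp_noisy = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=noise_var,
                                    normalize_y=True)
gp_noisy.fit(X, y_noisy)

# Each posterior sample of f at the observed points implies a different
# "current best"; averaging EI over the samples integrates that uncertainty out.
f_samples = gp_noisy.sample_y(X, n_samples=1000, random_state=0)  # (n_obs, 1000)
y_best_samples = f_samples.max(axis=0)

ei_mc = np.mean([expected_improvement(X_test, gp_noisy, y_best=b)
                 for b in y_best_samples], axis=0)
x_next = X_test[np.argmax(ei_mc)]
```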
Bayesian Optimization with Noisy Data in Action
The Facebook team tested their research in a couple of real-world scenarios at Facebook scale. The first was optimizing 6 parameters of one of Facebook's ranking systems. The second was optimizing 7 numeric compiler flags related to CPU usage in their HipHop Virtual Machine (HHVM). For that second experiment, the first 30 iterations were created randomly. From that point, the Bayesian optimization with noisy data method was able to identify CPU time as the hyperparameter configuration that needed to be evaluated, and it started running different experiments to optimize its value. The results are clearly illustrated in the following figure.
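Schematically, that experimental protocol (a batch of random configurations followed by acquisition-driven rounds) looks like the loop below, again continuing the earlier sketch. The budgets, dimensionality, and noise level are placeholders, not Facebook's actual numbers.

```python
# Schematic experiment loop: 30 random configurations first, then rounds
# chosen by the acquisition function fitted to all results so far.
n_random, n_bo = 30, 45
X_obs = rng.uniform(0, 2, size=(n_random, 1))
y_obs = f(X_obs).ravel() + rng.normal(0, 0.1, size=n_random)

for _ in range(n_bo):
    gp_loop = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=0.1 ** 2,
                                       normalize_y=True)
    gp_loop.fit(X_obs, y_obs)
    ei = expected_improvement(X_test, gp_loop, y_best=y_obs.max())
    x_next = X_test[np.argmax(ei)].reshape(1, -1)
    # Run the next "A/B test" (simulated here by evaluating f with noise).
    y_next = f(x_next).ravel() + rng.normal(0, 0.1, size=1)
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.concatenate([y_obs, y_next])
```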
Source: https://projecteuclid.org/download/pdfview_1/euclid.ba/1533866666

Techniques such as Bayesian optimization with noisy data are incredibly powerful for large-scale machine learning algorithms. While a lot of work has been done on optimization methods, most of those methods remain highly theoretical. It's nice to see Facebook pushing the boundaries of this nascent space.
Translated from: https://medium.com/dataseries/facebook-uses-bayesian-optimization-to-conduct-better-experiments-in-machine-learning-models-6f834169d005